Valdosta State University
UNIT 9
Chapter 14: Standardized Testing
1. Describe the two characteristics possessed by all standardized instruments.
2. List/describe the three uses of standardized tests.
3. Describe/explain the distinction(s) between group and individual standardized tests.
4. Describe/explain the difference(s) between criterion-referenced and norm-referenced instruments.
5. Standardized tests fit into two categories, aptitude and achievement. Define what an achievement test measures. Define what all types of aptitude tests measure, and list/describe the four types of aptitude tests.
6. Define the following terms: raw scores; standard/derived scores; percentiles; grade equivalents; deviation IQ scores; stanine; mean; median; mode; standard deviation; Σ; X; N; & reliability.
A. List and describe five types of standard scores.
B. On pages 551-552, the text describes several types of validity (content, construct, and criterion-related). Use the definitions for the various types of validity that are provided in this syllabus.
7. Describe the normal distribution and its two primary characteristics.
8. Given a set of raw scores, compute the three measures of central tendency and the three measures of variability.
9. "A test cannot be valid unless it is also reliable, but the reverse is not true." Explain what this means.
10. What are the "acceptable" levels of validity and reliability for standardized tests? How does one "find out" about the goodness of a test?
The Buros web site tells users how to access information on standardized tests: http://www.unl.edu/buros/home.html
11. What two criteria (usually) must be met for a student to be considered for a gifted educational program?
Review: Standardized Testing
Assessment devices are instruments used to determine how well a student has learned the material covered and/or how well the student will do in future endeavors. Assessment can be accomplished through tests, homework, seatwork, etc. Most formal assessments that are used to assign grades and/or for selection purposes or predictions involve tests. A test is a systematic method for measuring students' behaviors and evaluating these behaviors against standards and norms. Tests can be standardized or teacher-made.
I. Standardized tests are instruments that measure and predict ability/aptitude and achievement. Such tests are (a) normed on an appropriate reference group (e.g., a group of people similar to those that the test will be used with); and (b) always administered, scored, and interpreted in the same way.
A. Uses of standardized tests. (Note: standardized tests should never be used by themselves to make the following decisions; they should be one piece of information used to make a decision, but not the only piece.) These instruments are used in the following ways: (1) selection and placement of students (and others) into various programs (gifted, special ed., admittance into college, etc.); (2) diagnosis of specific strengths and weaknesses associated with learning, performance in school, emotional problems, etc. (when a test is used as an aid to identifying/diagnosing a problem, it will most likely also be useful in identifying necessary remediation for the deficit); (3) evaluation, the most common use in education, to determine how students are progressing compared to others (in the school, district, state, region, nationally) and to measure the effectiveness of the instruction and curriculum of the school.
B. Classes of standardized tests. There are literally thousands of standardized tests available for use in education, psychology, business, etc. Because of the number of tests, they are commonly classified in the following manner:
STANDARDIZED TESTS
Group / Individual
Norm-referenced / Criterion-referenced
Aptitude / Achievement
Types of aptitude tests: Personality; Projective; Interest inventory; Intelligence
Group tests are used to screen large groups of people to identify possible candidates for individual testing. Group tests are, compared to individual ones, inexpensive with respect to time and money; the trade-off for being less expensive is that they do not provide a great deal of information. Individual tests provide detailed information about the individual but cost a great deal. Financially, schools cannot afford to test all students individually, and it is unethical to collect the type of detailed, private information on students unless there is a professionally sound reason. Individual tests should only be used when there is a need (based on the results of a group test, parent/teacher referral, etc.); unless such a need exists, the use of an individual (or group) test may constitute an unwarranted invasion of privacy.
Norm-referenced tests compare one's abilities to those of the other individuals taking the test (currently or in the original norming group). Criterion-referenced instruments compare the individual's abilities against some established standard (not just the norming group's performance). Achievement tests measure what one has learned in a particular area (such as mathematics) that has been directly taught in a curriculum; they are often criterion-referenced. Aptitude tests are used for selection and to predict future performance in an area (an occupation, an educational program, etc.); they generally are norm-referenced and often do not cover information that was directly taught in a curriculum. There are several types of aptitude tests.
Personality tests describe the characteristic ways one behaves, while projective tests are used to help diagnose emotional disturbance, typically by presenting an ambiguous stimulus that the testee must interpret. How people interpret the stimulus reflects their underlying personality characteristics and pathology because they "project" these onto the stimulus. Interest inventories compare one's expressed likes/dislikes with those of successful people in various occupations; they are used extensively in career planning.
Intelligence tests (sometimes considered a variation of achievement instruments) very narrowly attempt to predict one's ability to learn new materials (based on what one has already learned, i.e., achievement, and how quickly one is able to learn).
Gifted and Talented. The "wanted" types of exceptionality. Question: Do we currently do as good a job with these kids as we do with the intellectually challenged?
Most definitions for giftedness require: (a) a testable IQ of 130 or higher, and (b) high performance in one or more of the following areas: academics; creative/productive thinking; leadership; or the visual/performing arts.
Standardized tests are evaluated on the basis of their validity and reliability. Validity, very generally, is concerned with how well a test measures what it is supposed to (i.e., accuracy), while reliability is the extent to which a test consistently measures what it is measuring. (Notice that a test can be reliable but not valid!) There are several types of validity: content (the degree of relationship between what's taught and what's tested), construct (whether the test measures the particular psychological domain, e.g., IQ, that it purports to), and criterion-related.
Criterion-related validity attempts to show that one's performance on a particular instrument/test is correlated with one's performance on some external standard, either right now (concurrent) or in the future (predictive). For example, if one scored high on a math achievement test, it would be reasonable to expect that that person would be earning a high grade in math (i.e., the student's performance in the math class is concurrently validating the score on the test). Predictive validity also attempts to relate test performance to an external standard of performance, but the external standard is temporally distant: one's performance on today's test is used to predict how one will perform on the external standard at some point in the future.
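One common way to put a number on the "correlated with" relationship described above is a Pearson correlation coefficient between test scores and the external standard. The sketch below is only an illustration: the notes do not say which statistic is used, the function name pearson_r is hypothetical, and the test scores and course grades are made-up values.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, with at least 2 scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from each mean.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (both must be non-zero for r to be defined).
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data: math achievement test scores and current math grades
# for the same five students (invented for illustration).
test_scores = [55, 60, 65, 70, 80]
math_grades = [70, 72, 80, 85, 90]

r = pearson_r(test_scores, math_grades)
print(f"correlation between test and grades: r = {r:.2f}")
```

A value of r near +1 would support the claim that the test tracks the external standard; a value near 0 would not.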
Interpreting Test Results. Tests are scored by hand or computer and a profile (or report if it is an individual test) is prepared for each student. Performance in each area and sub-area of the test is presented as both a raw score (which is usually meaningless unless one also has, and knows how to use, the test's measures of central tendency and variability) and a derived score (a raw score that has been transformed/converted so it has normative meaning by itself [many computer-scored tests also provide graphs and charts of the students' scores to aid in their interpretation]).
A. Standard scores are derived scores that are associated with statistical concepts including the normal distribution and standard deviation. There are several types of derived scores. Percentiles express the student's score as the percentage of students who took the test (or who took it as part of the norm group) and scored at or below where the student scored. So, if a student's score is reported as the 65th percentile rank, that means that 65% of the students scored at or below where this student did on the test; restated, only 35% scored higher than the student. Grade equivalents report the student's performance in terms of the grade level at which she scored. If the kid is in the 3rd grade and, on a standardized reading test, scored where the kids in the 5th grade WHO TOOK THE SAME TEST scored, her grade equivalent would be the fifth grade. (This form of derived score is especially meaningful for parents who are not statistically sophisticated; if their kid is in the 3rd grade and is reading at the 1st grade level, they understand there is a problem. The problem with grade equivalents is that teachers and parents often think it means that the 3rd grader, for example, can do 5th grade level math; not so. It means that the child did as well on the test as the average 5th grader would score if (s)he took that particular test.)
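The percentile definition above (the percentage of scores at or below the student's score) can be sketched in a few lines. This is a minimal illustration, not how any particular test publisher computes its norms; the function name percentile_rank and the norm-group scores are hypothetical.

```python
def percentile_rank(score, all_scores):
    """Percentage of scores in all_scores that are at or below `score`."""
    if not all_scores:
        raise ValueError("all_scores must not be empty")
    at_or_below = sum(1 for s in all_scores if s <= score)
    return 100.0 * at_or_below / len(all_scores)

# Hypothetical raw scores for a norm group of 20 students (made up).
norm_group = [12, 15, 18, 20, 21, 23, 24, 25, 27, 28,
              30, 31, 33, 34, 36, 38, 40, 42, 45, 48]

rank = percentile_rank(33, norm_group)
# 13 of the 20 scores are at or below 33, so this prints the 65th percentile.
print(f"A raw score of 33 is at the {rank:.0f}th percentile")
```

With these invented scores, a raw score of 33 lands at the 65th percentile, which means the remaining 35% of the norm group scored higher.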
Virtually all human characteristics are normally distributed in that if they are graphed, the graph looks like a bell-shaped curve that is symmetrical around the bisecting line. (See the accompanying example.) Notice that the curved lines on the graph descend as they approach the horizontal axis. This tells one that as you move away from the bisecting line (which in a normal distribution represents the mean, median, and mode) the area under the curve decreases.
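To see how quickly the area under a normal curve thins out away from the mean, one can compute the proportion of the distribution lying within a given number of standard deviations. The sketch below uses the standard relationship between the normal curve and the error function in Python's math module; the function name area_within is hypothetical and the notes themselves do not quote these proportions.

```python
import math

def area_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = area_within(k)
    # Whatever is not inside the band is split between the two tails.
    print(f"within {k} SD of the mean: {inside:.1%}   beyond: {1 - inside:.1%}")
```

Roughly 68% of scores fall within one standard deviation of the mean and roughly 95% within two, so very little area is left in the tails.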
Many standardized tests are "normalized" so that obtained scores "fit" the normal curve and always have the same standard deviation. (Standard deviation, an indicator of how much dispersion/variability a set of data possesses, is used to show how much distance there is between scores in a distribution and the mean of the distribution; S is the average amount of distance between each score in the distribution and the distribution's mean.)
Every set of scores has its own standard deviation that is fairly easy to compute. First, compute the variance of the distribution:
$$S^2 = \frac{\sum X^2 - \left[(\sum X)^2 / N\right]}{N - 1}$$
For example, assume you had 4 scores: 3, 5, 7, & 9 (so N = 4). To compute the standard deviation (SD), first add the scores (3+5+7+9 = 24). Then square each score and add the squares together (the Σ sign tells one to add what follows): 9+25+49+81 = 164. Then "plug" these into the formula:
$$S^2 = \frac{164 - (24^2 / 4)}{3} = \frac{164 - (576 / 4)}{3} = \frac{164 - 144}{3} = \frac{20}{3} \approx 6.67$$
THE STANDARD DEVIATION OF AN ARRAY IS THE SQUARE ROOT OF THE VARIANCE (S²). THE SQUARE ROOT OF 6.67 IS 2.58 = S
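The hand computation above can be checked with a short script that follows the same computational formula (sum of squares, minus the squared sum over N, all divided by N - 1). This is a sketch; the function name sample_variance is hypothetical.

```python
import math

def sample_variance(scores):
    """Variance via the computational formula:
    (sum of X^2 - (sum of X)^2 / N) / (N - 1)."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    sum_x = sum(scores)                     # 3 + 5 + 7 + 9 = 24
    sum_x_sq = sum(x * x for x in scores)   # 9 + 25 + 49 + 81 = 164
    return (sum_x_sq - sum_x ** 2 / n) / (n - 1)

scores = [3, 5, 7, 9]
s2 = sample_variance(scores)
s = math.sqrt(s2)
print(f"S^2 = {s2:.2f}, S = {s:.2f}")  # S^2 = 6.67, S = 2.58
```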
STATISTICS REVIEW
I. Statistics: Methods used to (a) summarize/organize and (b) analyze objective data.
A. Two types: descriptive (summarizes large quantities of data while accurately depicting their characteristics) and inferential (tests of significance used to determine if observed differences between groups as measured by the dependent variable are real or due to error).
1. Two categories of inferential tests: parametric and nonparametric.
B. Descriptive statistics has four subtypes: measures of central tendency; measures of variability; data display; and standard scores.
1. measures of central tendency:
a. mean: arithmetic average
b. median: the most central score, which divides a ranked array in half so that half the scores are larger than it and half are smaller
c. mode: the score that occurs more often than any other in the array
2. measures of variability:
a. range: the distance over which scores are spread (i.e., largest minus smallest)
b. variance: the average squared distance between each score and the mean of the array
c. standard deviation: the average amount of distance between each score and the mean of the array (the square root of the variance)
3. data display: charts, tables, graphs, etc.
4. standard scores: converted scores that have meaning by themselves, such as percentages and percentile ranks.
a. grade equivalents: express a score in terms of the average at a particular grade level (e.g., if a 2nd grader gets a grade equivalent of 4.5, it means she did as well as the average kid in the 4th grade, fifth month)
b. percentile rank: the score is expressed as the percentage of other scores that were at or below the student's raw score
c. stanine: the normal curve is divided into 9 bands (the mean is 5; lower scores have lower stanines)
d. other standard scores include deviation IQs, z-scores, and T-scores.
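The notes name z-scores, T-scores, and stanines without giving conversion rules. The sketch below uses the conventional definitions, which are assumptions not stated in this unit: a z-score is the distance from the mean in standard deviation units, a T-score rescales z to a mean of 50 and SD of 10, and a stanine rescales z to a mean of 5 and SD of 2, limited to the whole numbers 1 through 9. The function names and the example test (mean 50, SD 10) are hypothetical.

```python
import math

def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw - mean) / sd

def t_score(z):
    """T-score: z rescaled to a mean of 50 and a standard deviation of 10."""
    return 50 + 10 * z

def stanine(z):
    """Stanine: z rescaled to mean 5, SD 2, rounded and limited to 1..9."""
    band = math.floor(2 * z + 5.5)  # round half up to the nearest whole band
    return max(1, min(9, band))

# Hypothetical test with mean 50 and SD 10; a student earns a raw score of 65.
z = z_score(65, mean=50, sd=10)
print(f"z = {z}, T = {t_score(z)}, stanine = {stanine(z)}")
```

A student exactly at the mean gets z = 0, T = 50, and stanine 5, which matches the statement above that the mean stanine is 5.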
II. Populations: a population's mean and standard deviation are called parameters.
1. Parameters are always denoted by Greek letters.
a. a population's mean is µ ("mu")
b. a population's standard deviation is a lower-case sigma (σ); its variance is sigma-squared (σ²)
III. Samples: a randomly selected sample's mean and standard deviation are called statistics.
1. Statistics are always denoted by Roman (i.e., our alphabet's) letters.
a. a sample's mean is X̄ ("x-bar")
b. a sample's standard deviation is S; its variance is S²
IV. Other symbols:
a. N always represents how many scores are in an array (array=a group of numbers)
b. Σ (upper case sigma) tells you to add what follows; it means "the sum of what follows"
c. Xᵢ represents the scores in an array
V. Computing the mean, median, mode and range:
a. sample scores: 24, 10, 35, 9
b. The mean:
$$\bar{X} = \frac{\sum X_i}{N} = \frac{24 + 10 + 35 + 9}{4} = \frac{78}{4} = 19.5$$
c. Median: middle score in a ranked array.
1. arrange (rank) scores in order from low to high: 9, 10, 24, 35
2. If N is an even number, add the two central scores and divide by 2: (10+24)/2 = 34/2 = 17
d. Mode: the score that occurs more often than any other. In the example no score occurs more often than the others, so there is no mode.
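The three measures of central tendency worked out above can be sketched as small functions. This is an illustration under the definitions given in these notes (in particular, a tie for most frequent score is treated as "no mode"); the function names are hypothetical and the lists are assumed to be non-empty.

```python
from collections import Counter

def mean(scores):
    """Arithmetic average: the sum of the scores divided by N."""
    return sum(scores) / len(scores)

def median(scores):
    """Middle score of the ranked array (average of the two central scores if N is even)."""
    ranked = sorted(scores)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

def mode(scores):
    """The score occurring more often than any other, or None if there is a tie."""
    counts = Counter(scores).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no single score occurs more often than all the others
    return counts[0][0]

scores = [24, 10, 35, 9]
print(mean(scores), median(scores), mode(scores))  # 19.5 17.0 None
```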
VI. Measures of Variability: range, variance, and standard deviation
1. Range: the entire distance over which scores are spread; the distance between the smallest and largest score. Subtract the smallest score from the largest. Ex. 35-9 = 26 = range
2. Variance:
$$S^2 = \frac{\sum X^2 - \left[(\sum X)^2 / N\right]}{N - 1}$$
3. Standard deviation:
$$S = \sqrt{S^2}$$
Ex. Compute S and S² for the following: 3, 5, 8, 2.

 X     X²
 8     64
 5     25
 3      9
 2      4
ΣX = 18     ΣX² = 102

$$S^2 = \frac{102 - (18^2 / 4)}{3} = \frac{102 - (324 / 4)}{3} = \frac{102 - 81}{3} = \frac{21}{3} = 7$$

By taking the square root of the variance (i.e., 7) you get the standard deviation, S = 2.646.
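As a cross-check on the hand computation, Python's standard statistics module provides variance and stdev functions that use the same N - 1 denominator as the formula in these notes. The sketch below applies them, along with the range, to the example scores.

```python
import statistics

scores = [3, 5, 8, 2]

score_range = max(scores) - min(scores)   # largest minus smallest
variance = statistics.variance(scores)    # sample variance (N - 1 denominator)
std_dev = statistics.stdev(scores)        # square root of the sample variance

print(f"range = {score_range}, S^2 = {variance}, S = {std_dev:.3f}")
```

The library results agree with the worked example: a variance of 7 and a standard deviation of about 2.646.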
Last Updated: May 20, 1997