Criterion-Referenced Language Testing

Over the past decade, criterion-referenced testing (CRT) has become an emerging issue in language assessment. Most language testing books have hitherto focused almost exclusively on norm-referenced testing, in which test takers' scores are interpreted with reference to the performance of other test takers, and have ignored CRT, an approach that examines the level of knowledge of a specific domain of target behaviours. This volume is designed to address comprehensively the wide variety of CRT and decision-making needs that more and more language-teaching professionals must meet in their daily work. Criterion-referenced Language Testing is the first volume to create a nexus between the theoretical constructs and practical applications of this new area of language testing.
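To make the contrast concrete, here is a minimal sketch (not drawn from the book) of how the same raw scores can be read in the two ways described above: relative to the other test takers (norm-referenced) versus against a fixed cut-point on a sampled domain of target behaviours (criterion-referenced). The scores, the 40-item domain, the 80% cut-point, and the function names are all invented for illustration.

```python
# Hypothetical illustration of the two score interpretations described above.
# All numbers and names are assumptions, not taken from the book.

def percentile_rank(score, all_scores):
    """Norm-referenced reading: where does this score fall relative
    to the other test takers in the group?"""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

def domain_mastery(score, n_items, cut_point=0.80):
    """Criterion-referenced reading: what proportion of the sampled
    domain has the test taker answered correctly, and does that
    proportion reach a pre-set cut-point?"""
    proportion = score / n_items
    return proportion, proportion >= cut_point

# Invented scores for ten test takers on a 40-item test.
scores = [12, 18, 22, 25, 27, 29, 33, 35, 38, 40]

for s in (22, 35):
    pr = percentile_rank(s, scores)
    prop, mastered = domain_mastery(s, n_items=40)
    print(f"score {s}: percentile rank {pr:.0f} (NRT reading); "
          f"domain coverage {prop:.0%}, master={mastered} (CRT reading)")
```

Note that the norm-referenced reading changes whenever the comparison group changes, while the criterion-referenced reading depends only on the defined domain and the cut-point.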
Contents
1 | 1
Some useful definitions | 2
Criterion-referenced tests | 3
Differences and similarities between NRTs and CRTs in | 6
Criterion-referenced tests are different | 9
The place of CRTs in language testing theory and research | 14
What is language proficiency? | 16
What problems do CRT developers face? | 25
2 | 28
A closer look at objectives and criterion-referenced testing | 36
Performance objectives | 39
Experiential objectives | 46
Fitting assessment types to curriculum | 47
The role of feedback | 49
3 | 56
Some useful definitions | 57
Table 3.1 Linguistic and format confoundings | 61
Format confoundings | 62
Self-assessment | 64
reading listening grammar knowledge and phonemic | 69
Constructed-response items | 71
Personal-response items | 78
Criterion-referenced item format analysis | 86
Improving the specifications | 95
Item quality and content analyses | 98
4 | 101
Description of CRT score distributions | 102
Table 4.1 Calculating the mean for a set of CRT | 107
Understanding numerical descriptions | 111
Difference index | 120
all of those who failed the test answered item 3 | 126
Criterion-referenced item selection | 127
Item response theory and CRT | 128
The one-parameter model | 131
The two-parameter model | 132
The three-parameter model | 133
George Helga Dan Pat | 140
5 | 149
consistency reliability and | 150
NRT reliability | 151
A note on correlation | 152
Reliability | 153
Figure 5.1 Plot of listening by reading | 156
Table 5.3 Correlation matrix for the | 159
Equivalent forms reliability | 163
Threshold-loss agreement methods | 169
Masters | 171
Table 5.10 Calculating estimated variance components for persons | 179
Thus phi is the ratio of the persons variance | 185
Useful relationships among reliability and dependability | 198
Local independence | 206
Model-to-data fit | 207
6 | 212
Content validity | 213
Expert judgments approach to content validity | 220
Construct validity | 225
Differential-groups construct validity studies | 230
Expanded views of validity | 240
Cronbach's perspectives on questions about validity | 246
Making decisions with criterion-referenced tests | 248
Are traditional cut-points and grading on a curve justified? | 249
Are cut-points necessarily arbitrary? | 251
What is standards setting? | 253
acceptable | 261
What is the relationship between standards and | 264
How are validity and criterion-referenced decision making | 265
7 | 269
Team development of CRTs | 270
Marshaling adequate resources | 275
Counterbalancing criterion-referenced forms | 277
Who should get feedback? | 279
Interpreting gain scores | 288
Difficulties in reporting criterion-referenced results | 291
References | 292
Index | 310
Popular passages
Page 33 - And the Gileadites took the passages of Jordan before the Ephraimites : and it was so, that when those Ephraimites which were escaped said, Let me go over ; that the men of Gilead said unto him, Art thou an Ephraimite? If he said, Nay ; Then said they unto him, Say now Shibboleth : * and he said Sibboleth : for he could not frame to pronounce it right.
Page 35 - A criterion-referenced test is one that is deliberately constructed to yield measurements that are directly interpretable in terms of specified performance standards. Performance standards are generally specified by defining a class or domain of tasks that should be performed by the individual.
Page 272 - Messick (1989a:13) sums it up this way: "validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment".
Page 123 - They are signs, and they do no more than denote the objects to which they are attached. What we call a symbol is a term, a name, or even a picture that may be familiar in daily life, yet that possesses specific connotations in addition to its conventional and obvious meaning. It implies something vague, unknown, or hidden from us. Many Cretan monuments, for instance, are marked with the design of the double adze. This is an object that we know, but we do not know its symbolic implications. For another...
Page 59 - ... information lies in the standard used as a reference. The standard against which a student's performance is compared in order to obtain the first kind of information is the criterion behavior which defines subject matter competence.
Page 53 - ... an approach requiring an integrated, facile performance on the part of the examinee. It is conceivable that knowledge could exist without facility. If we limit ourselves to testing only one point at a time, more time is ordinarily allowed for reflection than would occur in a normal communication situation, no matter how rapidly the discrete items are presented. For this reason I recommend tests in which there is less attention paid to specific structure points or lexicon than to the total communicative...
Page 27 - Statistical Tables for Biological, Agricultural and Medical Research, by RA Fisher and F. Yates (Oliver and Boyd, London, 1938).
Page 35 - Along such a continuum of attainment, a student's score on a criterion-referenced measure provides explicit information as to what the individual can or cannot do. Criterion-referenced measures indicate the content of the behavioral repertory, and the correspondence between what an individual does and the underlying continuum of achievement. Measures which assess student achievement in terms of a criterion standard thus provide information as to the degree of competence attained by a particular student which is independent of reference to the performance of others.
Page 13 - Hudson, T. D. (1989b). Measurement approaches in the development of functional ability level language tests: norm-referenced, criterion-referenced, and item response theory decisions. Unpublished PhD dissertation, University of California at Los Angeles. Hudson, T., Detmer, E., & Brown, J. D. (1992).



