The Computer as a Silent Partner in Essay Scoring


  • File Size 235.31 KB
  • Create Date August 2, 2018
  • Last Updated August 2, 2018


Psychometric measurement based on subjective judgments of performance quality (e.g., essay ratings) is typically not very reliable. The subjective judgments are often integrated into a single score by means of the following scoring model: initially, two independent judgments are obtained; if the absolute difference between them is not too large, their mean is used as the score. Otherwise, an additional judgment is obtained, and the score is the mean of the third judgment and whichever of the original two is closest to it. Whenever the two judgments are sampled from the same distribution, their mean is an unbiased estimate of the true score. Quite surprisingly, however, substituting either of the judgments according to the scoring model described above results in increased error variance.

In some domains, such as the rating of short essays, it is possible to attain a high level of agreement between a human judgment and a mechanical judgment (Automatic Essay Scoring, AES) based on fairly simple considerations. Although it is not common practice to rely absolutely on AES, this high level of agreement suggests that a model employing the difference between a mechanically generated score and a score generated by a human judge is worth considering. Accordingly, we propose that the following model be put into practice: in the initial phase, two judgments are obtained, one human and the other mechanical. It follows from the logic described above that a large difference between the two scores indicates that the human-generated score is likely to be fairly far from the true score. Since, in some situations, the validity of correcting for this by averaging that score with the mechanically generated one is disputable, another human judge should be recruited. The overall cost of judgment would be substantially reduced by reducing the considerable rate of human-generated scores that have to be corrected in this manner.
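The classical third-rater resolution rule described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the function name, argument names, and the disagreement threshold are all assumptions chosen for clarity.

```python
def resolve_score(r1, r2, r3=None, threshold=1.0):
    """Third-rater resolution rule (illustrative sketch).

    If the two independent ratings agree within `threshold`, the score is
    their mean. Otherwise a third rating is required, and the score is the
    mean of the third rating and whichever original rating is closer to it.
    The threshold value of 1.0 is an assumption, not taken from the paper.
    """
    if abs(r1 - r2) <= threshold:
        return (r1 + r2) / 2
    if r3 is None:
        raise ValueError("ratings disagree; a third rating is required")
    closer = r1 if abs(r1 - r3) <= abs(r2 - r3) else r2
    return (closer + r3) / 2
```

For example, ratings of 3 and 4 agree within the threshold and yield their mean, 3.5; ratings of 2 and 5 disagree, so a third rating of 4 is combined with the closer original rating (5) to yield 4.5.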
The current study, which is based on simulated essays and scores, explores the benefits of using this model and the error of measurement associated with various scoring rules.
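A simulation of this kind can be sketched as a small Monte Carlo experiment comparing the root-mean-square error of the plain two-rater mean with that of the third-rater resolution rule. The Gaussian true-score and rater-error distributions, the sample size, and all parameter values below are illustrative assumptions, not the paper's simulation design.

```python
import random

def simulate_rmse(n=20000, sd=1.0, threshold=1.0, seed=0):
    """Monte Carlo sketch (assumed design, not the paper's):
    compare the RMSE of the two-rater mean with the RMSE of the
    third-rater resolution rule, using simulated true scores and
    independent Gaussian rater errors."""
    rng = random.Random(seed)
    se_mean = se_rule = 0.0
    for _ in range(n):
        t = rng.gauss(5, 1)  # simulated true score of one essay
        r1, r2, r3 = (t + rng.gauss(0, sd) for _ in range(3))
        se_mean += ((r1 + r2) / 2 - t) ** 2
        # resolution rule: mean if close, else third rating averaged
        # with whichever original rating is closer to it
        if abs(r1 - r2) <= threshold:
            s = (r1 + r2) / 2
        else:
            closer = r1 if abs(r1 - r3) <= abs(r2 - r3) else r2
            s = (closer + r3) / 2
        se_rule += (s - t) ** 2
    return (se_mean / n) ** 0.5, (se_rule / n) ** 0.5
```

Running the simulation returns the two RMSE estimates, which can then be compared across scoring rules and parameter settings.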

Attached Files

paper_1162a201e4.pdf