



Reliability-Based Feature Weighting for Automated Essay Scoring.

Yigal Attali¹

  • ¹Educational Testing Service, Princeton, NJ, USA.

Applied Psychological Measurement | June 9, 2018
PubMed
Summary
This summary is machine-generated.

Automated essay scoring (AES) systems can improve reliability by weighting features for internal consistency, not just human score prediction. This approach yields comparable or better validity coefficients in large-scale writing assessments.

Keywords:
automated scoring, essay assessment, reliability, validity, writing


Area of Science:

  • Natural Language Processing
  • Educational Measurement
  • Artificial Intelligence in Education

Background:

  • Traditional automated essay scoring (AES) systems prioritize emulating human scores, using statistical methods to determine feature importance.
  • This reliance on human score prediction may lead to feature weights reflecting statistical artifacts rather than genuine writing quality.
  • Machine essay evaluation differs fundamentally from human evaluation, suggesting a need for alternative scoring approaches.

Purpose of the Study:

  • To propose and evaluate alternative feature weighting schemes for AES systems.
  • To maximize the reliability and internal consistency of composite essay scores.
  • To investigate whether these alternative schemes outperform traditional human-prediction-based weights.

Main Methods:

  • Developed feature weighting schemes focused on optimizing score reliability and internal consistency.
  • Compared these novel weighting schemes against traditional human-prediction-based weights.
  • Utilized a large-scale writing assessment dataset for empirical evaluation.
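
The contrast between the two kinds of weighting can be sketched on synthetic data. This is a minimal illustration, not the paper's procedure: the summary does not say how the reliability-oriented weights were derived, so the sketch assumes one standard option, the first principal component of the feature correlation matrix, which maximizes Cronbach's alpha of a weighted composite of standardized features. The feature loadings, sample size, and all function names are hypothetical.

```python
import numpy as np


def standardize(X):
    """Column-wise z-scores so that weights are comparable across features."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def regression_weights(Z, human):
    """Traditional approach: least-squares weights that predict human scores."""
    w, *_ = np.linalg.lstsq(Z, human - human.mean(), rcond=None)
    return w


def consistency_weights(Z):
    """Assumed alternative: first principal component of the correlation matrix."""
    corr = np.corrcoef(Z, rowvar=False)
    _, eigvecs = np.linalg.eigh(corr)  # eigenvalues come back in ascending order
    w = eigvecs[:, -1]
    return w if w.sum() >= 0 else -w  # fix the arbitrary sign of the eigenvector


def weighted_alpha(Z, w):
    """Cronbach's alpha of the composite, treating each w_i * z_i as an item."""
    items = Z * w
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)


# Synthetic data: four hypothetical essay features driven by one latent ability.
rng = np.random.default_rng(0)
n = 500
ability = rng.normal(size=n)
loadings = np.array([0.8, 0.7, 0.6, 0.3])
noise = rng.normal(size=(n, 4)) * np.sqrt(1 - loadings**2)
Z = standardize(ability[:, None] * loadings + noise)
human = ability + rng.normal(scale=0.8, size=n)

w_reg = regression_weights(Z, human)
w_pc = consistency_weights(Z)
alpha_reg = weighted_alpha(Z, w_reg)
alpha_pc = weighted_alpha(Z, w_pc)
print("regression weights: ", np.round(w_reg, 3), "alpha:", round(alpha_reg, 3))
print("consistency weights:", np.round(w_pc, 3), "alpha:", round(alpha_pc, 3))
```

Because alpha does not change when all weights are rescaled by a common factor, only the relative sizes of the weights matter when comparing the two schemes.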

Main Results:

  • Alternative feature weighting schemes produced feature weights that differed substantially from those obtained with human-prediction methods.
  • The proposed schemes resulted in comparable or superior reliability coefficients.
  • Validity coefficients were also comparable or improved using the new weighting strategies.
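
A comparison like the one above needs a reliability and a validity coefficient for each weighting scheme. The sketch below shows one way such coefficients could be computed, under assumptions the summary does not confirm: reliability is taken as the correlation between composite scores on two essays by the same writers, and validity as the correlation between the composite on one essay and a human score. The data are simulated, and the weight vector `w` is hypothetical.

```python
import numpy as np


def composite(Z, w):
    """Weighted sum of feature values for each essay."""
    return Z @ w


def pearson(a, b):
    """Pearson correlation between two score vectors."""
    return float(np.corrcoef(a, b)[0, 1])


# Simulated data: each writer produces two essays scored on three features.
rng = np.random.default_rng(1)
n, k = 400, 3
ability = rng.normal(size=n)
loadings = np.array([0.8, 0.6, 0.4])


def simulate_essay():
    """Feature matrix for one essay per writer (hypothetical generating model)."""
    noise = rng.normal(size=(n, k)) * np.sqrt(1 - loadings**2)
    return ability[:, None] * loadings + noise


essay1, essay2 = simulate_essay(), simulate_essay()
human = ability + rng.normal(scale=0.7, size=n)  # simulated human score

w = np.array([0.5, 0.3, 0.2])  # hypothetical feature weights under evaluation
score1, score2 = composite(essay1, w), composite(essay2, w)

reliability = pearson(score1, score2)  # assumed: alternate-form reliability
validity = pearson(score1, human)      # assumed: agreement with human scores
print("reliability:", round(reliability, 3), "validity:", round(validity, 3))
```

Running the same evaluation once per weight vector gives the side-by-side coefficients needed to judge whether one scheme is comparable or superior to another.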

Conclusions:

  • Rethinking feature weighting in AES is crucial for developing more robust and reliable scoring systems.
  • Prioritizing internal consistency over direct human score emulation offers a promising alternative for AES.
  • This approach can lead to more effective automated assessment of writing skills.