Reliability-Based Feature Weighting for Automated Essay Scoring.

¹Educational Testing Service, Princeton, NJ, USA.

Applied Psychological Measurement

June 9, 2018

Summary

This summary is machine-generated.

Automated essay scoring (AES) systems can improve reliability by weighting features for internal consistency, not just human score prediction. This approach yields comparable or better validity coefficients in large-scale writing assessments.

Keywords:

automated scoring, essay assessment, reliability, validity, writing

Area of Science:

Natural Language Processing
Educational Measurement
Artificial Intelligence in Education

Background:

Traditional automated essay scoring (AES) systems prioritize emulating human scores, using statistical methods to determine feature importance.
This reliance on human score prediction may lead to feature weights reflecting statistical artifacts rather than genuine writing quality.
Machine essay evaluation differs fundamentally from human evaluation, suggesting a need for alternative scoring approaches.
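
The traditional approach described above can be sketched as an ordinary least-squares regression of human scores on standardized essay features. This is a minimal illustration, not the system evaluated in the study: the function name `human_prediction_weights`, the use of plain least squares, and the normalization to relative weights are all assumptions made for the example.

```python
import numpy as np

def human_prediction_weights(features, human_scores):
    """Relative feature weights from a least-squares fit to human scores.

    Hypothetical helper: features is an (n_essays, n_features) array,
    human_scores is a length-n_essays array.
    """
    X = np.asarray(features, dtype=float)
    y = np.asarray(human_scores, dtype=float)
    # Standardize each feature so the coefficients are comparable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Regress centered human scores on the standardized features.
    coef, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
    # Rescale so the absolute weights sum to one.
    return coef / np.abs(coef).sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(100, 3))           # simulated feature values
    scores = feats @ [0.6, 0.3, 0.1] + rng.normal(scale=0.5, size=100)
    print(human_prediction_weights(feats, scores))
```

Because the weights depend entirely on how well each feature predicts the human scores in the training sample, they can shift with the sample and with correlations among features, which is the concern raised above.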

Purpose of the Study:

To propose and evaluate alternative feature weighting schemes for AES systems.
To maximize the reliability and internal consistency of composite essay scores.
To investigate whether these alternative schemes outperform traditional human-prediction-based weights.

Main Methods:

Developed feature weighting schemes focused on optimizing score reliability and internal consistency.
Compared these novel weighting schemes against traditional human-prediction-based weights.
Utilized a large-scale writing assessment dataset for empirical evaluation.
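
One textbook way to weight features for internal consistency is to choose the weights that maximize coefficient alpha of the composite. For standardized features this is the leading eigenvector of the feature correlation matrix. The sketch below illustrates that idea only. The summary does not specify the schemes used in the study, so the choice of coefficient alpha and the function names `weighted_alpha` and `alpha_maximizing_weights` are assumptions.

```python
import numpy as np

def weighted_alpha(features, weights):
    """Coefficient alpha of a weighted composite of standardized features."""
    X = np.asarray(features, dtype=float)
    w = np.asarray(weights, dtype=float)
    k = X.shape[1]
    R = np.corrcoef(X, rowvar=False)  # feature correlation matrix
    # For unit-variance features: item variance sum is w'w, composite variance is w'Rw.
    return k / (k - 1) * (1.0 - (w @ w) / (w @ R @ w))

def alpha_maximizing_weights(features):
    """Weights maximizing coefficient alpha (leading eigenvector of R)."""
    X = np.asarray(features, dtype=float)
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)  # eigenvalues in ascending order
    w = eigvecs[:, -1]                    # eigenvector of the largest eigenvalue
    if w.sum() < 0:                       # eigenvector sign is arbitrary; fix it
        w = -w
    return w / np.abs(w).sum()

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    ability = rng.normal(size=500)        # simulated latent writing ability
    feats = np.column_stack(
        [load * ability + rng.normal(size=500) for load in (1.0, 0.8, 0.3, 0.6)]
    )
    w = alpha_maximizing_weights(feats)
    print("weights:", w)
    print("alpha, equal weights:", weighted_alpha(feats, np.ones(4)))
    print("alpha, optimized weights:", weighted_alpha(feats, w))
```

Note that no human scores enter this computation: the weights depend only on how the features relate to one another.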

Main Results:

Alternative feature weighting schemes produced feature weights that differed significantly from those of human-prediction methods.
The proposed schemes resulted in comparable or superior reliability coefficients.
Validity coefficients were also comparable or improved using the new weighting strategies.
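
A comparison of this kind can be sketched as follows, under stated assumptions. Reliability is taken here to be the correlation between composite scores on two essays written by the same test takers, and validity the correlation between the composite score and an external criterion. The summary does not say how the coefficients were defined in the study, and the names `composite`, `pearson`, and `evaluate_weights` are hypothetical.

```python
import numpy as np

def composite(features, weights):
    """Weighted sum of standardized features, one score per essay."""
    X = np.asarray(features, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    return Z @ np.asarray(weights, dtype=float)

def pearson(a, b):
    """Pearson correlation between two score vectors."""
    return float(np.corrcoef(a, b)[0, 1])

def evaluate_weights(feats_essay1, feats_essay2, criterion, weights):
    """Reliability and validity coefficients for one weighting scheme.

    Rows of feats_essay1 and feats_essay2 must refer to the same test takers.
    """
    s1 = composite(feats_essay1, weights)
    s2 = composite(feats_essay2, weights)
    return {
        "reliability": pearson(s1, s2),       # agreement across two essays
        "validity": pearson(s1, criterion),   # relation to an external measure
    }

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    ability = rng.normal(size=300)            # simulated latent ability
    loads = np.array([1.0, 0.7, 0.4])
    essay1 = ability[:, None] * loads + rng.normal(size=(300, 3))
    essay2 = ability[:, None] * loads + rng.normal(size=(300, 3))
    criterion = ability + rng.normal(size=300)
    for name, w in {"equal": [1/3, 1/3, 1/3], "unequal": [0.6, 0.3, 0.1]}.items():
        print(name, evaluate_weights(essay1, essay2, criterion, w))
```

Running the same evaluation for each candidate weighting scheme gives a side-by-side view of the two coefficients on the same sample.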

Conclusions:

Rethinking feature weighting in AES is crucial for developing more robust and reliable scoring systems.
Prioritizing internal consistency over direct human score emulation offers a promising alternative for AES.
This approach can lead to more effective automated assessment of writing skills.