QAScore-An Unsupervised Unreferenced Metric for the Question Generation Evaluation.

Tianbo Ji¹, Chenyang Lyu², Gareth Jones¹

  • ¹ ADAPT Centre, School of Computing, Dublin City University, 9 Dublin, Ireland.

Entropy (Basel, Switzerland) | November 11, 2022
Summary
This summary is machine-generated.

A new evaluation metric, QAScore, is proposed for Question Generation (QG) systems. This reference-free metric better aligns with human judgment than existing methods, improving evaluation accuracy for automated question generation.

Keywords:
question generation; question generation evaluation; reference-free evaluation

Area of Science:

  • Natural Language Processing
  • Artificial Intelligence

Background:

  • Automated Question Generation (QG) has advanced with neural models, but evaluation remains a challenge.
  • Current metrics like BLEU and BERTScore rely on references and show low agreement with human judgment.
  • Existing metrics for QG systems do not consider the passage or answer context.

Purpose of the Study:

  • To introduce QAScore, a novel reference-free evaluation metric for Question Generation.
  • To provide a more accurate and human-aligned method for assessing QG system performance.
  • To address the limitations of current QG evaluation metrics.

Main Methods:

  • QAScore evaluates generated questions by measuring a language model's ability to predict masked answer words.
  • The metric computes cross-entropy based on the probability of correctly generating masked words within the answer.
  • A human evaluation experiment was conducted to compare QAScore with existing metrics.
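
The scoring idea in the methods above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the score is the sum of log-probabilities of the answer words when each is masked in turn (the negated cross-entropy, so higher is better). The names `qascore_sketch`, `MaskedProb`, and `toy_masked_prob` are hypothetical, and the toy function is a placeholder standing in for a real masked language model.

```python
import math
from typing import Callable, Sequence

# Hypothetical interface (an assumption, not from the paper): given the passage,
# the generated question, the answer tokens, and the index of the masked token,
# return the probability the model assigns to the original token there.
MaskedProb = Callable[[str, str, Sequence[str], int], float]


def qascore_sketch(passage: str, question: str,
                   answer_tokens: Sequence[str],
                   masked_prob: MaskedProb) -> float:
    """Sum log-probabilities of answer tokens, masking one token at a time."""
    total = 0.0
    for i in range(len(answer_tokens)):
        p = masked_prob(passage, question, answer_tokens, i)
        # Clamp to avoid log(0) if the model returns a zero probability.
        total += math.log(max(p, 1e-12))
    return total


def toy_masked_prob(passage: str, question: str,
                    answer_tokens: Sequence[str], index: int) -> float:
    """Placeholder only: more 'confident' when the question shares words
    with the passage. This is not a real language model."""
    q_words = set(question.lower().split())
    p_words = set(passage.lower().split())
    overlap = len(q_words & p_words) / max(len(q_words), 1)
    return 0.1 + 0.8 * overlap


passage = "Dublin is the capital of Ireland"
answer = ["Dublin"]
on_topic = qascore_sketch(passage, "What is the capital of Ireland",
                          answer, toy_masked_prob)
off_topic = qascore_sketch(passage, "Who wrote this sentence",
                           answer, toy_masked_prob)
print(on_topic > off_topic)
```

In a real setting, `masked_prob` would wrap a pretrained language model; the point here is only that no reference question is needed, since the score depends on the passage, the answer, and the generated question.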

Main Results:

  • QAScore demonstrates a stronger correlation with human judgment compared to BLEU and BERTScore.
  • The proposed metric offers improved evaluation accuracy for Question Generation systems.
  • Human evaluation confirmed the superiority of QAScore in assessing question quality.
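
One common way to quantify agreement between a metric and human judgment is a correlation coefficient. The sketch below assumes Pearson correlation, which this summary does not specify. The function name `pearson` is illustrative, and no study data are reproduced here.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations, up to a shared 1/n factor.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)
```

To compare metrics, one would call `pearson(metric_scores, human_ratings)` once per metric over the same set of generated questions; the metric with the higher coefficient tracks the human ratings more closely.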

Conclusions:

  • QAScore provides a more reliable and accurate evaluation for Question Generation systems.
  • Reference-free evaluation using QAScore enhances the assessment of automated question quality.
  • This metric facilitates better development and benchmarking of QG technologies.