Search research articles

关于 JoVE

概览领导团队博客 JoVE 帮助中心

作者

出版流程编辑委员会范围与政策同行评审常见问题投稿

图书馆员

用户评价订阅访问资源图书馆顾问委员会常见问题

研究

JoVE Journal Methods Collections JoVE Encyclopedia of Experiments 存档

教育

JoVE Core JoVE Business JoVE Science Education JoVE Lab Manual 教师资源中心教师网站

使用条款与条件

相关概念视频

Spearman's Rank Correlation Test

Spearman's Rank Correlation Test

Spearman's rank correlation test, also known as Spearman's rho, is a nonparametric method for assessing the strength and direction of association between two variables. This test is particularly valuable when the data distribution is unknown or when the assumption of normality does not hold. Named after the English psychologist and statistician Dr. Charles Edward Spearman, it serves as the nonparametric counterpart to Pearson's correlation coefficient.
Spearman's test calculates...

Weighted Mean

Weighted Mean

While taking the arithmetic, geometric, or harmonic mean of a sample data set, equal importance is assigned to all the data points. However, all the values may not always be equally important in some data sets. An intrinsic bias might make it more important to give more weightage to specific values over others.
For example, consider the number of goals scored in the matches of a tournament. While computing the average number of goals scored in the tournament, it may be more important to...

Goodness-of-Fit Test

Goodness-of-Fit Test

The goodness-of-fit test is a type of hypothesis test which determines whether the data "fits" a particular distribution. For example, one may suspect that some anonymous data may fit a binomial distribution. A chi-square test (meaning the distribution for the hypothesis test is chi-square) can be used to determine if there is a fit. The null and alternative hypotheses may be written in sentences or stated as equations or inequalities. The test statistic for a goodness-of-fit test is given as...

Measures of Intelligence

Measures of Intelligence

Psychologists measure intelligence by using standardized tests that produce a score known as the intelligence quotient or IQ. To understand IQ tests, it's important to recognize the key principles behind their construction: validity, reliability, and standardization.
Validity refers to how well a test measures what it claims to measure. An intelligence test should accurately assess intelligence rather than another characteristic, like anxiety. Criterion validity is one way to evaluate this;...

Multiple Comparison Tests

Multiple Comparison Tests

Multiple comparison test, abbreviated as MCT, is a post hoc analysis generally performed after comparing multiple samples with one or more tests. An MCT will help identify a significantly different sample among multiple samples or a factor among multiple factors.
It would be easy to compare two samples using a significance alpha level of 0.05. In other words, there is only one sample pair to be compared. However, it would be difficult to identify a significantly different sample if the number...

Test for Homogeneity

Test for Homogeneity

The goodness–of–fit test can be used to decide whether a population fits a given distribution, but it will not suffice to decide whether two populations follow the same unknown distribution. A different test, called the test for homogeneity, can be used to conclude whether two populations have the same distribution. To calculate the test statistic for a test for homogeneity, follow the same procedure as with the test of independence. The hypotheses for the test for homogeneity can...

您也可能阅读

相关文章

通过共同作者、期刊和引用图与本文相关的文章。

排序

Same author

Developing and validating a frailty score based on patient-reported outcome 3 months after stroke: A Riksstroke-based study.

PloS one·2026

Same author

The bit scale: A metric score scale for unidimensional item response theory models.

Psychometrika·2025

Same author

Calculating Bias in Test Score Equating in a NEAT Design.

Applied psychological measurement·2025

Same author

An Information Manifold Perspective for Analyzing Test Data.

Applied psychological measurement·2024

Same author

Efficiency Analysis of Item Response Theory Kernel Equating for Mixed-Format Tests.

Applied psychological measurement·2023

Same author

Evaluating Equating Transformations in IRT Observed-Score and Kernel Equating Methods.

Applied psychological measurement·2023

Same journal

The EM Algorithm and Its Variants in Cognitive Diagnostic Models: Comparing Their Propensity for Boundaries, Extremes, Convergence, and Suboptimal Solutions.

Applied psychological measurement·2026

Same journal

When Perceptions of Social Desirability Differ: Implications for the Multidimensional Nominal Response Model of Faking.

Applied psychological measurement·2026

Same journal

csemGT: An R Package for Estimating Raw-Score Conditional Standard Errors of Measurement in Generalizability Theory.

Applied psychological measurement·2026

Same journal

Confirmatory Factor Analysis with Adaptive Quadrature Estimator Using Four Link Functions.

Applied psychological measurement·2026

Same journal

Automatic Item Generation Measurement Models Respecting the Stochastic Sampling Space for Cross-Classified and Two-Level Sampling of Subjects and Incidentals.

Applied psychological measurement·2026

Same journal

Multistage Testing for Cognitive Diagnosis Based on Skill-Space Partitioning.

Applied psychological measurement·2026

查看所有相关文章

Search research articles

相关实验视频

Updated: Sep 12, 2025

Problem-Solving Before Instruction PS-I: A Protocol for Assessment and Intervention in Students with Different Abilities

Problem-Solving Before Instruction PS-I: A Protocol for Assessment and Intervention in Students with Different Abilities

Published on: September 11, 2021

结合倾向分数和测试分数等同的常见项目

Inga Laukaityte¹, Gabriel Wallin², Marie Wiberg³

¹Department of Applied Educational Science, Umeå University, Sweden.

Applied psychological measurement

|August 4, 2025

概括

此摘要是机器生成的。

本研究引入了一种用于公平测试成绩比较的新统计方法. 将倾向分数与常见项目数据相结合,可以提高准确性,减少教育测试中的偏见.

关键词:

学术录取学术录取是什么教育测试教育测试教育测试在等同化方面,它是相当的.公平的公平的公平.没有等效的组与测试设计.

更多相关视频

A Tablet-Based Curriculum-Based Measurement Protocol for Kindergarten Writing

A Tablet-Based Curriculum-Based Measurement Protocol for Kindergarten Writing

Published on: February 7, 2025

Inverse Probability of Treatment Weighting Propensity Score using the Military Health System Data Repository and National Death Index

Inverse Probability of Treatment Weighting Propensity Score using the Military Health System Data Repository and National Death Index

Published on: January 8, 2020

相关实验视频

Last Updated: Sep 12, 2025

Problem-Solving Before Instruction PS-I: A Protocol for Assessment and Intervention in Students with Different Abilities

Problem-Solving Before Instruction PS-I: A Protocol for Assessment and Intervention in Students with Different Abilities

Published on: September 11, 2021

A Tablet-Based Curriculum-Based Measurement Protocol for Kindergarten Writing

A Tablet-Based Curriculum-Based Measurement Protocol for Kindergarten Writing

Published on: February 7, 2025

Inverse Probability of Treatment Weighting Propensity Score using the Military Health System Data Repository and National Death Index

Inverse Probability of Treatment Weighting Propensity Score using the Military Health System Data Repository and National Death Index

Published on: January 8, 2020

科学领域:

统计统计统计统计
教育测量教育的测量
心理测量心理测量心理测量

背景情况:

确保测试形式和组之间的得分可比性是教育测试中的一个关键挑战.
目前测试分数等同的方法通常依赖于共同的项目或组相似性的假设.
需要新的方法来提高测试成绩比较的公平性和准确性.

研究的目的:

开发和评估一种用于测试分数等同的新统计方法.
将基于背景共变量的倾向得分与常见项目信息相结合,以提高得分的可比性.
通过实证和模拟研究来评估这种综合方法的性能.

主要方法:

利用了从考生背景共变量中得出的倾向分数.
综合性倾向分数与使用内核平滑技术的共同项目信息.
进行了对高风险的大学入学考试和模拟研究的经验分析.

主要成果:

拟议的方法整合了倾向分数和常见项目数据,与单独使用任何来源相比,显示了较少的标准错误和偏差.
测试对象共变量的均衡组被证明可以提高分数比较的公平性和准确性.
该研究强调了利用所有可用的数据来提高得分可比性的好处.

结论:

这种新的方法通过整合倾向分数和常见项目信息,有效地提高了测试成绩的可比性.
这种方法提供了一种更强大,更准确的方式,以确保在不同群体的教育测试中公平.
考虑所有收集的数据,包括背景共变量,对于提高测试分数等级的精度至关重要.