Testing high-dimensional multinomials with applications to text analysis.

T Tony Cai¹, Zheng T Ke², Paxton Turner²

  • ¹Department of Statistics and Data Science, University of Pennsylvania, Philadelphia, PA, USA.

Journal of the Royal Statistical Society. Series B, Statistical Methodology | September 16, 2024
Summary
This summary is machine-generated.

We developed a new statistical test for comparing high-dimensional multinomial distributions, a task central to text mining and to inference on discrete distributions. The test achieves the optimal detection boundary and is applied to customer reviews and scientific abstracts.

Keywords:
authorship attribution, closeness testing, customer reviews, martingale central limit theorem, minimax optimality, topic model

Area of Science:

  • Multivariate statistics
  • Computational statistics
  • Machine learning

Background:

  • Comparing discrete probability distributions is vital for text mining, topic modeling, and authorship attribution.
  • Existing methods often require assumptions about parameter homogeneity or equal sample sizes, limiting their applicability.
  • High-dimensional multinomial distributions present unique challenges due to the curse of dimensionality.

Purpose of the Study:

  • To develop a novel statistical test for the equality of probability mass functions across K groups of high-dimensional multinomial distributions.
  • To establish the asymptotic properties of the proposed test statistic under the null hypothesis.
  • To demonstrate the test's optimality and practical utility in real-world scenarios.
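
The testing problem described in these bullets can be written out explicitly. The block below is a sketch in assumed notation: the symbols X_i, N_i, Ω_i, S_k and μ_k do not appear in this summary, and the choice of a length-weighted group average is an assumption made here for illustration. It shows how group-level PMFs can be equal even when the individual PMFs within a group differ.

```latex
% Assumed notation (not taken from the summary):
%   X_i       count vector of item i over p categories, with N_i total counts
%   \Omega_i  probability mass function (PMF) of item i
%   S_1,\dots,S_K  partition of the n items into K groups
\begin{align*}
  X_i &\sim \mathrm{Multinomial}(N_i, \Omega_i), \qquad i = 1, \dots, n, \\
  \mu_k &= \frac{\sum_{i \in S_k} N_i \, \Omega_i}{\sum_{i \in S_k} N_i},
          \qquad k = 1, \dots, K, \\
  H_0 &: \mu_1 = \mu_2 = \cdots = \mu_K
  \quad \text{versus} \quad
  H_1 : \mu_k \neq \mu_\ell \ \text{for some } k \neq \ell .
\end{align*}
```

Under this formulation the null hypothesis constrains only the group averages, so the Ω_i inside a group are free to vary.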

Main Methods:

  • A new test statistic is proposed for comparing multinomial probability mass functions.
  • The asymptotic null distribution of the test statistic is shown to be standard normal.
  • The test's ability to achieve the optimal detection boundary is theoretically established.
  • Simulation studies and real-world dataset analyses are conducted.
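
To make the general idea of comparing group-level multinomial PMFs concrete, here is a minimal Python sketch. It is not the statistic proposed in the paper: it uses a plain plug-in squared-distance statistic with no bias correction, and it calibrates by permuting group labels rather than by an asymptotic normal limit. Permutation calibration assumes the items are exchangeable under the null, which is a stronger assumption than the one described above. All function names are hypothetical.

```python
import random


def group_pmf(docs):
    """Pool counts across count vectors and normalise to a PMF estimate."""
    p = len(docs[0])
    totals = [sum(doc[j] for doc in docs) for j in range(p)]
    n_counts = sum(totals)
    return [t / n_counts for t in totals]


def l2_statistic(docs, labels, k):
    """Size-weighted squared L2 distance of each group PMF from the pooled PMF.

    Illustrative plug-in statistic only (assumption: no debiasing is applied).
    """
    overall = group_pmf(docs)
    stat = 0.0
    for g in range(k):
        members = [d for d, lab in zip(docs, labels) if lab == g]
        pmf = group_pmf(members)
        weight = sum(sum(d) for d in members)  # total counts in group g
        stat += weight * sum((a - b) ** 2 for a, b in zip(pmf, overall))
    return stat


def permutation_pvalue(docs, labels, k, n_perm=500, seed=0):
    """Permutation p-value: share of label shuffles at least as extreme."""
    rng = random.Random(seed)
    observed = l2_statistic(docs, labels, k)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps group sizes fixed
        if l2_statistic(docs, shuffled, k) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: two groups of five count vectors with clearly different PMFs.
    docs = [[18, 2, 0, 0]] * 5 + [[0, 0, 2, 18]] * 5
    labels = [0] * 5 + [1] * 5
    print(permutation_pvalue(docs, labels, k=2))
```

A permutation scheme is used here only because it needs no distributional theory; it scales poorly with the number of items and does not carry the optimality guarantees claimed for the proposed test.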

Main Results:

  • The proposed test statistic has an asymptotic standard normal distribution under the null hypothesis.
  • The limiting null distribution is parameter-free and does not require equal group sizes or identical parameters within groups.
  • The test achieves the optimal detection boundary across the parameter space.
  • Simulations confirm the test's performance, and applications show its utility in analyzing customer reviews and scientific abstracts.
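
Because the null distribution is standard normal, turning an observed value of the statistic into a decision needs only the normal tail probability. The sketch below assumes a one-sided test that rejects for large values of the statistic; that rejection direction is an assumption, not something stated in this summary, and the function names are hypothetical.

```python
import math


def upper_tail_pvalue(z):
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def reject_null(z, alpha=0.05):
    """One-sided decision rule (assumed): reject when the upper-tail p-value < alpha."""
    return upper_tail_pvalue(z) < alpha


if __name__ == "__main__":
    for z in (0.5, 1.0, 2.0, 3.0):
        print(z, upper_tail_pvalue(z), reject_null(z))
```

Since the limit is parameter-free, no nuisance parameters need to be estimated before computing the p-value.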

Conclusions:

  • A robust and asymptotically optimal test for comparing high-dimensional multinomial distributions is presented.
  • The method offers a powerful tool for applications in text mining, topic modeling, and discrete distribution analysis.
  • The test's parameter-free null distribution and optimality make it broadly applicable without stringent assumptions.