Related Concept Videos

Quantifying and Rejecting Outliers: The Grubbs Test

Sometimes, a data set can have a recorded numerical observation that greatly deviates from the rest of the data. Assuming that the data is normally distributed, a statistical method called the Grubbs test can be used to determine whether the observation is truly an outlier. To perform a two-tailed Grubbs test, first, calculate the absolute difference between the outlier and the mean. Then, calculate the ratio between this difference and the standard deviation of the sample. This...
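The first two steps above compute the Grubbs statistic G = max|xᵢ − x̄| / s, where s is the sample standard deviation. The Python sketch below shows that calculation. The function name `grubbs_statistic` and the readings are hypothetical and only illustrate the arithmetic. The sketch stops before the decision step, because that needs a t-distribution quantile. A commonly used two-sided critical value is G_crit = ((n − 1)/√n) · √(t² / (n − 2 + t²)), where t is the upper α/(2n) quantile of the t distribution with n − 2 degrees of freedom.

```python
import math
import statistics


def grubbs_statistic(data):
    """Return (G, index) for a two-sided Grubbs test (illustrative helper).

    G = max |x_i - mean| / s, where s is the sample standard deviation
    (n - 1 denominator); index points at the most extreme observation.
    """
    if len(data) < 3:
        raise ValueError("the Grubbs test needs at least 3 observations")
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    deviations = [abs(x - mean) for x in data]
    index = max(range(len(data)), key=lambda i: deviations[i])
    return deviations[index] / s, index


readings = [9.8, 10.1, 10.0, 9.9, 10.2, 12.5]  # hypothetical measurements
G, idx = grubbs_statistic(readings)
n = len(readings)
upper_bound = (n - 1) / math.sqrt(n)  # G can never exceed this value
print(f"suspect value {readings[idx]}, G = {G:.3f} (max possible {upper_bound:.3f})")
```

The suspect value is declared an outlier only if G exceeds the critical value for the chosen significance level.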

Frequency-dependent Selection

When the fitness of a trait depends on how common it is (i.e., its frequency) relative to other traits within a population, this is called frequency-dependent selection. Frequency-dependent selection may occur between species or within a single species. It can be positive, with more common phenotypes having higher fitness, or negative, with rarer phenotypes having higher fitness.

Expected Frequencies in Goodness-of-Fit Tests

A goodness-of-fit test determines whether the observed frequencies are statistically consistent with the frequencies expected for the dataset. Suppose the expected frequencies are equal, as when predicting how often each number appears when rolling a fair die. In that case, the expected frequency for each category is the total number of observations (n) divided by the number of categories (k).
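With equal expected frequencies, E = n / k for every category. The Python sketch below computes this for a hypothetical experiment of 120 die rolls. The helper name `equal_expected_frequencies` is illustrative only.

```python
def equal_expected_frequencies(n_observations, n_categories):
    """Expected count per category when all categories are equally likely: E = n / k."""
    if n_categories <= 0:
        raise ValueError("need at least one category")
    return [n_observations / n_categories] * n_categories


# Hypothetical example: a fair die rolled 120 times, k = 6 faces.
expected = equal_expected_frequencies(120, 6)
print(expected)  # each face is expected 120 / 6 = 20 times
```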

Goodness-of-Fit Test

The goodness-of-fit test is a type of hypothesis test that determines whether the data "fit" a particular distribution. For example, one might suspect that a set of anonymous data follows a binomial distribution. A chi-square test (meaning that the test statistic follows a chi-square distribution) can be used to determine whether there is a fit. The null and alternative hypotheses may be written in sentences or stated as equations or inequalities. The test statistic for a goodness-of-fit test is given as...
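The standard chi-square goodness-of-fit statistic is χ² = Σ (O − E)² / E, summed over all k categories, with k − 1 degrees of freedom when no parameters are estimated from the data. The Python sketch below computes it for hypothetical die-roll counts. The counts and the helper name `chi_square_statistic` are assumptions for illustration. The critical value 11.070 (df = 5, α = 0.05) is taken from standard chi-square tables rather than computed.

```python
def chi_square_statistic(observed, expected):
    """Goodness-of-fit statistic: sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [18, 22, 16, 25, 19, 20]  # hypothetical counts from 120 die rolls
expected = [sum(observed) / len(observed)] * len(observed)  # 20 per face
stat = chi_square_statistic(observed, expected)
df = len(observed) - 1
critical_0_05 = 11.070  # tabulated chi-square critical value, df = 5, alpha = 0.05
print(f"chi-square = {stat:.2f}, df = {df}")
print("reject H0" if stat > critical_0_05 else "fail to reject H0")
```

Here the statistic is well below the critical value, so these counts give no evidence that the die is unfair.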

Introduction to R

R is a powerful software environment for statistical computing and graphics. Originating as an implementation of the S language, which was developed at Bell Laboratories, R has evolved into a robust, open-source statistical environment used by statisticians and data scientists worldwide. It provides data manipulation, calculation, and graphical display capabilities, making it versatile for data analysis and visualization. Its programming language is at the core of R's...

Regression Analysis

Regression analysis is a statistical tool that describes a mathematical relationship between a dependent variable and one or more independent variables.
In regression analysis, the regression equation is determined from the line of best fit, the straight line that most closely fits the data points plotted on a graph. This line is also called the regression line, and its algebraic equation is called the regression equation. For one independent variable, it is represented as ŷ = a + bx, where ŷ is the predicted value of the dependent variable, a is the y-intercept, and b is the slope.
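For the simple linear case ŷ = a + bx, the usual way to find the line of best fit is ordinary least squares. This gives b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. The Python sketch below applies these formulas. The data points and the function name `least_squares_line` are hypothetical.

```python
def least_squares_line(xs, ys):
    """Fit y-hat = a + b*x by ordinary least squares; return (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept: the fitted line passes through (x_bar, y_bar)
    return a, b


xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]  # hypothetical data
a, b = least_squares_line(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x")
```

With more than one independent variable, the same least-squares idea applies, but the coefficients are usually found with matrix methods.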

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

MicroRNA Expression Analysis and Biological Pathways in Chemoresistant Non-Small Cell Lung Cancer.

Cancers·2025

Same author

A case study evaluating the effect of clustering, publication bias, and heterogeneity on the meta-analysis estimates in implant dentistry.

European journal of oral sciences·2023

Same author

XPF interacts with TOP2B for R-loop processing and DNA looping on actively transcribed genes.

Science advances·2023

Same author

Learning biologically-interpretable latent representations for gene expression data: Pathway Activity Score Learning Algorithm.

Machine learning·2023

Same author

Automated machine learning for genome wide association studies.

Bioinformatics (Oxford, England)·2023

Same author

A machine learning approach utilizing DNA methylation as an accurate classifier of COVID-19 disease severity.

Scientific reports·2022

Same journal

The Outcome of Cardiac Hydatid Surgery in The Iraqi Center of Heart Diseases.

F1000Research·2026

Same journal

Perception of body donation among the Phase-1 medical students, a questionnaire-based study.

F1000Research·2026

Same journal

Exploring Infertility in Saudi Arabia: Qualitative Insights into IVF Treatment Services and Policy Recommendations.

F1000Research·2026

Same journal

Cyber Military Operations under International Humanitarian Law: Interpreting the Concept of "Attack" and Challenges in Protecting Civilians.

F1000Research·2026

Same journal

Sentiment Analysis of Acceptance TVET Online Courses on the Skill Academy App from Google Play: Leveraging Text Mining with Comparison Machine Learning Model.

F1000Research·2026

Same journal

Emotional intelligence: An important skill to learn now more than ever.

F1000Research·2026

Related Experiment Video

Updated: Jan 5, 2026

Constructing and Visualizing Models using Mime-based Machine-learning Framework

Published on: July 22, 2025

Feature selection with the R package MXM.

Michail Tsagris^1,2,3, Ioannis Tsamardinos^2,4,5

¹Department of Economics, University of Crete, Rethymnon, 74100, Greece.

October 30, 2019

Summary

This summary is machine-generated.

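The R package MXM provides feature selection algorithms for many types of target variables and for large datasets. Compared with other R feature selection packages, it offers a broader choice of regression models and algorithms, including algorithms for detecting statistically equivalent feature sets and for handling big data.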

Keywords:

Feature selection, R package, algorithms, computational efficiency

More Related Videos

Selecting Multiple Biomarker Subsets with Similarly Effective Binary Classification Performances

Published on: October 11, 2018

Author Spotlight: Impact of Intergenic Interactions on Disease-Identifying Dark Biomarkers

Published on: March 1, 2024

Area of Science:

Computational statistics
Machine learning
Data science

Background:

Feature selection is crucial for identifying optimal predictors.
Existing R packages for feature selection have limitations in algorithm variety and data handling.
The MXM package aims to address these limitations.

Purpose of the Study:

To introduce and evaluate the R package MXM for feature selection.
To compare MXM's capabilities with existing feature selection packages.
To demonstrate MXM's utility with real-world high-dimensional data.

Main Methods:

Qualitative comparison of MXM with other R feature selection packages.
Demonstration of MXM's algorithms using diverse, high-dimensional datasets.
Utilizing memory-efficient algorithms for handling large-volume data in R.

Main Results:

MXM supports a wide array of target variable types (continuous, survival, categorical, etc.).
It integrates various regression models for different data types.
MXM includes algorithms for detecting statistically equivalent feature sets and handling big data.

Conclusions:

MXM offers a versatile and powerful feature selection solution.
Its unique features provide advantages for complex and large-scale data analysis.
The package enhances predictive modeling capabilities in R.