A greedy feature selection algorithm for Big Data of high dimensionality.

Ioannis Tsamardinos¹,², Giorgos Borboudakis¹, Pavlos Katsogridakis¹,³

¹Computer Science Department, University of Crete, Heraklion, Greece.

Machine Learning

March 26, 2019

Summary

This summary is machine-generated.

We introduce the Parallel, Forward-Backward with Pruning (PFBP) algorithm for efficient feature selection in high-dimensional Big Data. PFBP is designed for massive parallelization and, in the reported empirical analysis, scales well and outperforms the other feature selection algorithms it was compared with.

Keywords:

Big Data; Data analytics; Feature selection; Forward selection; Variable selection

Area of Science:

Machine Learning
Computational Statistics
Bioinformatics

Background:

High-dimensional Big Data presents significant challenges for traditional feature selection methods.
Scalability and computational efficiency are critical for analyzing large datasets.

Purpose of the Study:

To develop a massively parallel algorithm for feature selection (FS) in high-dimensional Big Data.
To introduce the Parallel, Forward-Backward with Pruning (PFBP) algorithm designed for efficiency and scalability.

Main Methods:

PFBP partitions the data matrix by both rows (samples) and columns (features), runs computations locally on each partition, and combines the local results through meta-analysis.
It works with the p-values of conditional independence tests, which keeps communication costs low.
It incorporates three heuristics, Early Dropping, Early Stopping, and Early Return, to make decisions efficiently.
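
To make the meta-analysis step more concrete, the sketch below combines p-values that were computed independently on separate row partitions into a single p-value. It uses Fisher's combined probability test, which is one standard meta-analysis technique; treating it as the combination rule here is an assumption for illustration, and the function name `fisher_combine` and the example inputs are hypothetical rather than taken from the PFBP implementation.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method (illustrative sketch).

    Under the null hypothesis, -2 * sum(log p_i) follows a chi-square
    distribution with 2k degrees of freedom, where k = len(p_values).
    """
    if not p_values:
        raise ValueError("need at least one p-value")

    # Clamp tiny values so that log(0) cannot occur.
    tiny = 1e-300
    # half_stat is the Fisher statistic divided by 2.
    half_stat = -sum(math.log(max(p, tiny)) for p in p_values)

    # Closed-form chi-square survival function for even degrees of freedom 2k:
    #   sf(x) = exp(-x / 2) * sum_{i=0}^{k-1} (x / 2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, len(p_values)):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical p-values for one candidate feature, one per row partition.
local_p_values = [0.04, 0.10, 0.02]
combined = fisher_combine(local_p_values)
```

In a setup like this, each worker would only need to send one number per candidate feature to the coordinator, which is consistent with the goal of keeping communication costs low.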

Main Results:

PFBP demonstrates super-linear speedup with increasing sample size and scales linearly with the number of features and the number of cores.
Empirical analysis confirms its effectiveness and superior performance compared to other feature selection algorithms.
Asymptotic guarantees of optimality are provided for data representable by causal networks.

Conclusions:

PFBP is a highly scalable and efficient algorithm for feature selection in Big Data.
Its parallel architecture and heuristics enable effective analysis of high-dimensional datasets.
The presented heuristics offer potential improvements for other greedy feature selection algorithms.
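
As a rough illustration of how such a heuristic could be added to a generic greedy forward selection loop, the sketch below implements an Early Dropping style step: any candidate whose conditional independence test is not significant in the current iteration is removed from all later iterations. This is a simplified, single-machine reading of the heuristic and not the PFBP algorithm itself; the function name, the `ci_pvalue` callable, and the threshold `alpha` are assumptions made for the example.

```python
def forward_select_early_dropping(features, ci_pvalue, alpha=0.05):
    """Greedy forward selection with an Early Dropping step (sketch).

    features  : iterable of candidate feature names
    ci_pvalue : hypothetical callable (feature, selected) -> p-value of a
                conditional independence test between `feature` and the
                target, given the already selected features
    alpha     : significance threshold (assumed value)
    """
    selected = []
    remaining = list(features)
    while remaining:
        # Test every remaining candidate given the current selection.
        pvals = {f: ci_pvalue(f, tuple(selected)) for f in remaining}

        # Early Dropping: candidates that look conditionally independent of
        # the target are removed from all later iterations of this run.
        remaining = [f for f in remaining if pvals[f] <= alpha]
        if not remaining:
            break

        # Greedy step: add the most significant surviving candidate.
        best = min(remaining, key=lambda f: pvals[f])
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy test function: "b" becomes redundant once "a" is selected,
# and "c" is never associated with the target.
calls = []


def toy_ci_pvalue(feature, selected):
    calls.append(feature)
    if feature == "b" and "a" in selected:
        return 0.9
    return {"a": 0.001, "b": 0.01, "c": 0.5}[feature]


result = forward_select_early_dropping(["a", "b", "c"], toy_ci_pvalue)
```

In this toy run, "c" is tested once and then dropped, so later iterations perform fewer tests. That saving is the point of the heuristic, although a dropped feature can no longer be reconsidered within the same run.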