



A greedy feature selection algorithm for Big Data of high dimensionality.

Ioannis Tsamardinos1,2, Giorgos Borboudakis1, Pavlos Katsogridakis1,3

  • 1Computer Science Department, University of Crete, Heraklion, Greece.

Machine Learning | March 26, 2019
Summary
This summary is machine-generated.

We introduce the Parallel, Forward-Backward with Pruning (PFBP) algorithm for efficient feature selection in high-dimensional Big Data. PFBP achieves massive parallelization and scalability, outperforming existing methods.

Keywords: Big Data, Data analytics, Feature selection, Forward selection, Variable selection


Area of Science:

  • Machine Learning
  • Computational Statistics
  • Bioinformatics

Background:

  • High-dimensional Big Data presents significant challenges for traditional feature selection methods.
  • Scalability and computational efficiency are critical for analyzing large datasets.

Purpose of the Study:

  • To develop a massively parallel algorithm for feature selection (FS) in high-dimensional Big Data.
  • To introduce the Parallel, Forward-Backward with Pruning (PFBP) algorithm designed for efficiency and scalability.

Main Methods:

  • PFBP partitions data matrices by rows and columns, utilizing local computations and meta-analysis.
  • Employs p-values of conditional independence tests to minimize communication costs.
  • Incorporates heuristics like Early Dropping, Early Stopping, and Early Return for efficient decision-making.
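
The methods above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that local p-values from row partitions are combined with Fisher's combined probability test, one common meta-analysis technique that this summary does not name, and it reads Early Dropping as discarding features whose combined p-value exceeds a significance threshold. The names split_indices, fisher_combine, forward_step, and alpha are hypothetical, and the local p-values are taken as given rather than computed by a real conditional independence test.

```python
import math


def split_indices(n, parts):
    """Split range(n) into `parts` nearly equal contiguous chunks.

    The same helper can partition rows (samples) or columns (features).
    """
    base, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method (assumed here).

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For even degrees
    of freedom the survival function has the closed form
    exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    # half_stat is X/2; clamp p to avoid log(0).
    half_stat = -sum(math.log(max(p, 1e-300)) for p in p_values)
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def forward_step(local_p, alpha=0.05):
    """One forward iteration over combined p-values.

    local_p maps each candidate feature to its list of local p-values,
    one per row partition. Features whose combined p-value exceeds alpha
    are dropped (a simplified reading of Early Dropping). Returns the
    most significant surviving feature and the set of survivors.
    """
    combined = {f: fisher_combine(ps) for f, ps in local_p.items()}
    survivors = {f: p for f, p in combined.items() if p <= alpha}
    best = min(survivors, key=survivors.get) if survivors else None
    return best, set(survivors)


# Hypothetical local p-values from two row partitions.
local_p = {"a": [0.01, 0.02], "b": [0.6, 0.7], "c": [0.03, 0.2]}
best, survivors = forward_step(local_p)
```

With these made-up inputs, feature "a" is selected, "c" remains a candidate, and "b" is dropped from later iterations. Only one p-value per feature and partition needs to leave each worker, which is the sense in which p-values keep communication costs low.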

Main Results:

  • PFBP demonstrates super-linear speedup with increasing sample size and linear scalability with features and cores.
  • Empirical analysis confirms its effectiveness and superior performance compared to other feature selection algorithms.
  • Asymptotic guarantees of optimality are provided for data representable by causal networks.
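
For readers unfamiliar with the scalability terms, the sketch below uses the standard definitions of speedup and parallel efficiency with respect to the number of cores. The function names and all timings are hypothetical and are not measurements from the study. This summary does not state how the authors measured speedup against sample size, so the example only shows what "linear" and "super-linear" mean in general.

```python
def speedup(baseline_seconds, parallel_seconds):
    """Ratio of baseline runtime to parallel runtime."""
    if baseline_seconds <= 0 or parallel_seconds <= 0:
        raise ValueError("runtimes must be positive")
    return baseline_seconds / parallel_seconds


def efficiency(baseline_seconds, parallel_seconds, cores):
    """Speedup per core: 1.0 is linear, above 1.0 is super-linear."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup(baseline_seconds, parallel_seconds) / cores


# Hypothetical timings, for illustration only.
linear_case = efficiency(120.0, 30.0, cores=4)        # 4x faster on 4 cores
super_linear_case = efficiency(120.0, 12.0, cores=8)  # 10x faster on 8 cores
```

In the first case efficiency is exactly 1.0 (linear scaling); in the second it is 1.25, meaning the run is faster than the core count alone would predict.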

Conclusions:

  • PFBP is a highly scalable and efficient algorithm for feature selection in Big Data.
  • Its parallel architecture and heuristics enable effective analysis of high-dimensional datasets.
  • The presented heuristics offer potential improvements for other greedy feature selection algorithms.