A balanced iterative random forest for gene selection from microarray data.

Ali Anaissi¹, Paul J Kennedy, Madhu Goyal

¹Centre for Quantum Computation & Intelligent Systems (QCIS), Faculty of Engineering and Information Technology (FEIT), University of Technology, Sydney (UTS), Broadway New South Wales 2007, Australia. ali.anaissi@uts.edu.au.

BMC Bioinformatics | August 29, 2013

Summary

This summary is machine-generated.

This study introduces the Balanced Iterative Random Forest (BIRF) algorithm for identifying disease biomarkers from imbalanced gene expression data. BIRF selects informative genes and outperforms the compared methods, especially on imbalanced datasets.

Area of Science:

Bioinformatics
Computational Biology
Genomics

Background:

High-throughput microarray technologies generate complex, high-dimensional gene expression datasets.
Imbalanced class distribution in biological data poses challenges for biomarker discovery.
Identifying informative genes is crucial for disease diagnosis and understanding.

Purpose of the Study:

Introduce the Balanced Iterative Random Forest (BIRF) algorithm.
Select relevant genes from imbalanced high-throughput gene expression microarray data.
Validate the selected genes as reliable biomarkers.
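
The "balanced" part of the name suggests that each training sample gives the classes equal weight. The summary does not state the exact scheme, so the following is only a minimal Python sketch of one common approach: a bootstrap draw in which every class contributes as many samples as the smallest class has members. The function name `balanced_bootstrap` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict

def balanced_bootstrap(labels, rng):
    """Return row indices with the same number of draws from every class.

    Assumption (not from the summary): each class is sampled with
    replacement, and the draw size equals the size of the smallest class.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    n_minor = min(len(members) for members in by_class.values())
    sample = []
    for members in by_class.values():
        sample.extend(rng.choices(members, k=n_minor))
    rng.shuffle(sample)
    return sample

# Hypothetical imbalanced label vector: 18 majority vs. 4 minority samples.
labels = ["major"] * 18 + ["minor"] * 4
sample = balanced_bootstrap(labels, random.Random(0))
counts = {c: sum(labels[i] == c for i in sample) for c in set(labels)}
print(counts)
```

Each draw then contains four samples per class, so a model trained on it cannot reach high accuracy by predicting the majority class alone.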

Main Methods:

Application of the BIRF algorithm on four cancer microarray datasets.
Comparison of BIRF performance against Support Vector Machine-Recursive Feature Elimination (SVM-RFE), Multi-class SVM-RFE (MSVM-RFE), Random Forest (RF), and Naive Bayes (NB).
Validation of selected informative biomarkers through repeated training experiments.
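
The summary names the algorithm but does not describe its steps. As a rough illustration of what an iterative, balanced forest-based selection could look like, the sketch below repeatedly fits a small ensemble on class-balanced bootstrap samples and drops genes that the ensemble never uses, stopping once no gene is dropped. Several simplifications are assumptions made here and not the authors' method: one-threshold decision stumps stand in for full trees, "times chosen" stands in for random forest variable importance, and all names and the toy data are hypothetical.

```python
import math
import random
from collections import defaultdict

def balanced_rows(y, rng):
    """Draw row indices with equal counts per class (with replacement)."""
    by_class = defaultdict(list)
    for i, lab in enumerate(y):
        by_class[lab].append(i)
    k = min(len(m) for m in by_class.values())
    return [i for m in by_class.values() for i in rng.choices(m, k=k)]

def stump_accuracy(values, labels):
    """Best accuracy of a single-threshold rule on one gene (labels 0/1)."""
    pairs = sorted(zip(values, labels))
    n, best = len(pairs), 0.0
    for cut in range(1, n):
        if pairs[cut - 1][0] == pairs[cut][0]:
            continue  # no threshold fits between equal values
        left_ones = sum(lab for _, lab in pairs[:cut])
        right_ones = sum(lab for _, lab in pairs[cut:])
        correct = (cut - left_ones) + right_ones  # predict 0 left, 1 right
        best = max(best, correct / n, 1 - correct / n)
    return best

def iterative_selection(X, y, n_trees=50, seed=0):
    """Drop genes never chosen by any stump; repeat until nothing is dropped."""
    rng = random.Random(seed)
    genes = list(range(len(X[0])))
    while True:
        used = defaultdict(int)
        mtry = max(1, int(math.sqrt(len(genes))))
        for _ in range(n_trees):
            rows = balanced_rows(y, rng)
            labels = [y[r] for r in rows]
            candidates = rng.sample(genes, mtry)
            best_gene = max(
                candidates,
                key=lambda g: stump_accuracy([X[r][g] for r in rows], labels),
            )
            used[best_gene] += 1
        kept = [g for g in genes if used[g] > 0]
        if len(kept) == len(genes):
            return genes
        genes = kept

# Hypothetical data: gene 0 separates the classes, genes 1-19 are noise.
data_rng = random.Random(1)
y = [0] * 30 + [1] * 10
X = [
    [(2.0 if lab else -2.0) + data_rng.gauss(0, 0.5)]
    + [data_rng.gauss(0, 1) for _ in range(19)]
    for lab in y
]
selected = iterative_selection(X, y)
print(selected)
```

On this toy data the informative gene (index 0) should survive every round, while some noise genes may survive too, because a gene is only dropped if no stump ever picks it. A real implementation would use a proper random forest and its importance scores.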

Main Results:

BIRF outperforms state-of-the-art methods, particularly on imbalanced datasets.
Achieved 7%-12% higher accuracy than MSVM-RFE on a childhood leukaemia dataset, improving prediction for the minor class.
64% of top genes consistently appeared across validation experiments, indicating robust biomarker selection.
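
Two quantities in these results can be made concrete: accuracy on the minor class, and the share of top genes that recur across repeated experiments. The sketch below shows one way to compute each. The summary does not define either measure exactly, so the definitions (per-class recall, and the fraction of the first run's genes present in every run) are assumptions, and the labels and gene names are hypothetical.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    total = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / total[c] for c in total}

def consistent_fraction(gene_lists):
    """Share of genes from the first run that appear in every run."""
    common = set(gene_lists[0]).intersection(*gene_lists[1:])
    return len(common) / len(set(gene_lists[0]))

# A classifier that always predicts the majority class looks accurate
# overall (8 of 10 correct) but never finds the minor class.
y_true = ["major"] * 8 + ["minor"] * 2
y_pred = ["major"] * 10
recall = per_class_recall(y_true, y_pred)
print(recall)  # {'major': 1.0, 'minor': 0.0}

# Top-4 gene lists from three hypothetical repeated selection runs.
runs = [
    ["g1", "g2", "g3", "g4"],
    ["g2", "g1", "g5", "g3"],
    ["g3", "g2", "g1", "g6"],
]
print(consistent_fraction(runs))  # 0.75
```

Reporting recall per class shows an improvement on the minor class that overall accuracy would hide.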

Conclusions:

The BIRF algorithm is effective for gene selection from imbalanced high-throughput gene expression data.
BIRF demonstrates superior performance compared to existing methods, especially in handling class imbalance.
BIRF facilitates distinguishing truly predictive genes from those that appear predictive by chance.