



AutoDC: an automatic machine learning framework for disease classification.

Yang Bai¹, Yang Li¹, Yu Shen¹

  • ¹Key Laboratory of High Confidence Software Technologies (MOE), School of CS, Peking University, Beijing, China.

Bioinformatics (Oxford, England) | May 18, 2022
Summary
This summary is machine-generated.

This article introduces AutoDC, a new automated machine learning system designed to improve how researchers classify diseases using complex gene expression data. By simultaneously optimizing feature selection and model parameters, it overcomes limitations in existing tools that often struggle with redundant information. Testing on public datasets demonstrates that this approach achieves higher predictive accuracy than current standard methods.

Keywords:
machine learning, genomics, feature selection, predictive modeling


Area of Science:

  • Bioinformatics and computational biology
  • AutoDC disease classification within clinical genomics

Background:

Next-generation sequencing technologies provide vast amounts of molecular data for understanding human health. Researchers frequently use automated machine learning (AutoML) platforms to interpret these complex genomic and epigenomic datasets. Existing pipelines often struggle with high-dimensional data because many of them do not remove redundant variables from the raw input. They also typically perform feature engineering before tuning algorithm parameters, and this fixed sequence of operations often results in less accurate predictions. According to the authors, no prior work had integrated these separate optimization steps into a single, cohesive process. That gap motivated the development of a more robust framework for analyzing omics data.

Purpose Of The Study:

The primary aim of this study is to introduce AutoDC, an automated machine learning framework designed for disease classification. Researchers identified a significant gap in existing tools regarding the management of high-dimensional omics data. Current frameworks often fail to remove redundant features, which compromises the quality of the final classification results. Furthermore, the standard practice of performing feature engineering before hyper-parameter tuning often leads to sub-optimal performance. This project seeks to overcome these limitations by proposing a more efficient, integrated approach. The authors designed two novel optimization strategies to enhance the predictive capabilities of their system. They specifically focused on improving the accuracy of disease diagnosis using gene expression datasets. This work addresses the urgent need for more sophisticated analytical tools in the field of clinical genomics.

Main Methods:

The research team developed a tailored automated machine learning framework for processing complex gene expression profiles. They evaluated the system against three existing state-of-the-art AutoML platforms, using two public datasets to validate its performance. The design incorporates a two-stage feature selection process to isolate highly informative variables. A two-layer multi-armed bandit strategy serves as the primary optimization engine for the entire pipeline. This approach enables feature engineering, algorithm selection, and hyper-parameter tuning to be adjusted together within one unified optimization structure. The authors support reproducibility by making all source code and data publicly accessible via an online repository.

Main Results:

The proposed framework consistently achieves higher predictive accuracy than three state-of-the-art automated machine learning systems. The reported results indicate that the two-stage feature selection method effectively isolates genes with high contribution scores. By integrating feature engineering and algorithm tuning, the system avoids the sub-optimal outcomes observed in sequential pipelines. The two-layer multi-armed bandit strategy coordinates these optimization tasks. Testing on two public gene expression datasets confirmed the higher accuracy of this approach, and the framework filters redundant features out of high-dimensional data. These results highlight the benefit of the joint optimization strategy in clinical classification tasks compared with tools that rely on rigid, staged workflows.

Conclusions:

The authors propose that their integrated optimization strategy enhances disease classification performance. This system addresses the limitations of sequential pipelines by performing joint tuning of all model components. The two-stage feature selection process successfully identifies variables with high biological relevance. Using a multi-armed bandit approach allows for simultaneous refinement of feature engineering and algorithm settings. Empirical testing shows that this method outperforms three established automated frameworks in predictive accuracy. These results suggest that joint optimization is superior to traditional staged approaches for high-dimensional data. The researchers conclude that their framework provides a more effective tool for analyzing gene expression profiles. This study demonstrates the potential for automated systems to improve diagnostic accuracy in clinical genomics.

The researchers propose a two-layer Multi-Armed Bandit framework. This mechanism allows the system to simultaneously refine feature engineering, select appropriate algorithms, and adjust hyper-parameters, avoiding the sub-optimal results common in traditional staged pipelines.
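To make the idea of a two-layer bandit concrete, the following is a minimal sketch and not the authors' implementation. It assumes a UCB1 rule at both layers, a top layer that picks an algorithm, and a bottom layer that picks a configuration for that algorithm. The algorithm names, configuration names, accuracy table, and the `evaluate` function are all hypothetical stand-ins for training and validating a real pipeline.

```python
import math
import random


class UCB1:
    """Minimal UCB1 bandit over a fixed list of arm names."""

    def __init__(self, arms):
        self.counts = {arm: 0 for arm in arms}
        self.totals = {arm: 0.0 for arm in arms}

    def select(self):
        # Try every arm once before applying the confidence bound.
        for arm, n in self.counts.items():
            if n == 0:
                return arm
        t = sum(self.counts.values())
        return max(
            self.counts,
            key=lambda a: self.totals[a] / self.counts[a]
            + math.sqrt(2.0 * math.log(t) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.totals[arm] += reward


# Hypothetical validation accuracies (illustrative only, not from the paper).
TRUE_ACCURACY = {
    "svm": {"top50_genes": 0.78, "top200_genes": 0.84},
    "random_forest": {"top50_genes": 0.80, "top200_genes": 0.75},
    "logistic": {"top50_genes": 0.70, "top200_genes": 0.72},
}


def evaluate(algorithm, config, rng):
    """Toy stand-in for training a pipeline: returns a noisy accuracy in [0, 1]."""
    noisy = TRUE_ACCURACY[algorithm][config] + rng.gauss(0.0, 0.02)
    return min(1.0, max(0.0, noisy))


def two_layer_search(budget=200, seed=0):
    rng = random.Random(seed)
    top = UCB1(list(TRUE_ACCURACY))  # layer 1: which algorithm
    bottom = {a: UCB1(list(c)) for a, c in TRUE_ACCURACY.items()}  # layer 2
    for _ in range(budget):
        algorithm = top.select()
        config = bottom[algorithm].select()
        reward = evaluate(algorithm, config, rng)
        # Both layers learn from the same evaluation.
        bottom[algorithm].update(config, reward)
        top.update(algorithm, reward)
    # Report the most frequently chosen algorithm and its favourite config.
    best_algo = max(top.counts, key=top.counts.get)
    best_cfg = max(bottom[best_algo].counts, key=bottom[best_algo].counts.get)
    return best_algo, best_cfg, top, bottom
```

Calling `two_layer_search()` spends the evaluation budget across both layers at once, so poorly performing algorithms receive fewer trials without ever being tuned in a separate stage. Feeding the bottom-layer reward straight back to the top layer is one simple design choice among several; the paper may aggregate rewards differently.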

AutoDC utilizes a two-stage feature selection method. This component specifically targets the identification of genes with high contribution scores, effectively filtering out redundant information from high-dimensional datasets before the classification process begins.
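A two-stage selection of this kind can be sketched as follows. This is an illustration under stated assumptions, not AutoDC's actual scoring: stage one is assumed to be a simple variance filter, and the "contribution score" in stage two is replaced by a class-separation statistic (absolute difference of class means divided by the combined spread). The gene names and expression values are invented toy data.

```python
from statistics import mean, pvariance


def two_stage_select(samples, labels, min_variance=1e-3, top_k=2):
    """samples: list of dicts mapping gene -> expression; labels: list of 0/1."""
    genes = list(samples[0])

    # Stage 1: drop genes whose expression barely varies across samples.
    kept = [g for g in genes if pvariance([s[g] for s in samples]) > min_variance]

    # Stage 2: rank the survivors by a stand-in contribution score.
    def contribution(gene):
        pos = [s[gene] for s, y in zip(samples, labels) if y == 1]
        neg = [s[gene] for s, y in zip(samples, labels) if y == 0]
        spread = (pvariance(pos) + pvariance(neg)) ** 0.5 or 1e-12
        return abs(mean(pos) - mean(neg)) / spread

    return sorted(kept, key=contribution, reverse=True)[:top_k]


# Toy expression matrix (hypothetical values).
samples = [
    {"GENE_A": 5.0, "GENE_B": 1.0, "GENE_C": 2.0, "GENE_D": 3.1},
    {"GENE_A": 5.2, "GENE_B": 1.0, "GENE_C": 2.9, "GENE_D": 3.0},
    {"GENE_A": 1.0, "GENE_B": 1.0, "GENE_C": 2.1, "GENE_D": 3.9},
    {"GENE_A": 1.1, "GENE_B": 1.0, "GENE_C": 3.0, "GENE_D": 4.0},
]
labels = [1, 1, 0, 0]

selected = two_stage_select(samples, labels)  # ['GENE_A', 'GENE_D']
```

Here `GENE_B` is constant and is removed in stage one, while `GENE_C` survives the filter but separates the classes poorly and ranks last in stage two.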

A two-layer structure is necessary to manage the complexity of joint optimization. According to the authors, this architecture allows the framework to balance feature engineering and algorithm tuning simultaneously, rather than treating them as independent, sequential tasks.

The framework processes gene expression data. This input type is critical for the system, as the two-stage selection method relies on calculating gene contribution scores to prioritize relevant biological signals over noise.

The researchers measure predictive accuracy. They compared their system against three state-of-the-art AutoML frameworks using two public datasets, finding that their approach consistently achieved higher classification performance than the alternative methods.
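For reference, predictive accuracy is the fraction of samples whose predicted class matches the true class. The short sketch below shows that calculation; the labels and predictions are made-up placeholders and do not reproduce any result from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels and predictions for four samples.
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
score = accuracy(y_true, y_pred)  # 0.75
```

Comparing frameworks then amounts to computing this value for each framework's predictions on the same held-out samples.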

The authors claim that joint optimization of all pipeline stages is superior to traditional methods. They suggest that decoupling feature engineering from hyper-parameter tuning often leads to sub-optimal outcomes in high-dimensional omics analysis.
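A small worked example shows how a staged search can miss the best pipeline. The score table below is hypothetical and chosen only to illustrate the effect: the feature set that looks best under a default hyper-parameter is not the one that wins after tuning. The feature-set and hyper-parameter names are placeholders.

```python
from itertools import product

# Hypothetical validation accuracy for each (feature set, hyper-parameter) pair.
SCORES = {
    ("top50_genes", "C=1"): 0.80,
    ("top50_genes", "C=10"): 0.81,
    ("top200_genes", "C=1"): 0.76,
    ("top200_genes", "C=10"): 0.88,
}
FEATURE_SETS = ["top50_genes", "top200_genes"]
HYPER_PARAMS = ["C=1", "C=10"]


def staged_search(default_hp="C=1"):
    # Stage 1: choose features while the hyper-parameter is left at its default.
    best_fs = max(FEATURE_SETS, key=lambda fs: SCORES[(fs, default_hp)])
    # Stage 2: tune the hyper-parameter with the feature set frozen.
    best_hp = max(HYPER_PARAMS, key=lambda hp: SCORES[(best_fs, hp)])
    return best_fs, best_hp, SCORES[(best_fs, best_hp)]


def joint_search():
    # Search feature sets and hyper-parameters together.
    best = max(product(FEATURE_SETS, HYPER_PARAMS), key=lambda pair: SCORES[pair])
    return best[0], best[1], SCORES[best]


staged = staged_search()  # ('top50_genes', 'C=10', 0.81)
joint = joint_search()    # ('top200_genes', 'C=10', 0.88)
```

The staged search commits to `top50_genes` in its first stage and can never reach the 0.88 configuration, whereas the joint search finds it. Exhaustive joint search is only feasible on a table this small, which is why a bandit-style strategy is a natural fit for larger search spaces.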