
CatBoost for big data: an interdisciplinary review.

John T Hancock¹, Taghi M Khoshgoftaar¹

¹Florida Atlantic University, 777 Glades Road, Boca Raton, FL USA.

Journal of Big Data

November 10, 2020

Summary

This summary is machine-generated.

This review surveys recent research on CatBoost, a Gradient Boosted Decision Trees (GBDT) algorithm, highlighting its strengths and weaknesses for Big Data machine learning tasks. It emphasizes best practices and hyper-parameter tuning for effective application.

Keywords:

Big data; CatBoost; Categorical variable encoding; Decision tree; Ensemble methods; Machine learning

Area of Science:

Machine Learning
Data Science
Computer Science

Background:

Gradient Boosted Decision Trees (GBDTs) are widely used for Big Data classification and regression.
Researchers need to understand how GBDT implementations such as CatBoost behave in order to apply them properly.
CatBoost, a GBDT technique, has gained traction in Big Data machine learning since late 2018.
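
As a rough illustration of the GBDT idea mentioned above, the following minimal sketch fits a sequence of one-split regression trees (stumps) to the residuals of the running prediction under squared error. It is not CatBoost's algorithm: CatBoost builds symmetric (oblivious) trees and uses ordered boosting, whereas this sketch assumes a single numeric feature. The names `fit_stump`, `fit_gbdt`, and `predict` are hypothetical helpers introduced only for this example.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]        # (threshold, left_value, right_value)
Model = Tuple[float, float, List[Stump]]  # (base prediction, learning rate, stumps)


def fit_stump(xs: List[float], residuals: List[float]) -> Stump:
    """Pick the single threshold on a 1-D feature that minimizes squared error."""
    best_error = float("inf")
    best_stump: Stump = (0.0, 0.0, 0.0)
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)  # left is never empty: threshold is in xs
        right_mean = sum(right) / len(right) if right else left_mean
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if error < best_error:
            best_error, best_stump = error, (threshold, left_mean, right_mean)
    return best_stump


def fit_gbdt(xs: List[float], ys: List[float],
             n_trees: int = 50, learning_rate: float = 0.1) -> Model:
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        # For squared error, the negative gradient is just the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left, right = fit_stump(xs, residuals)
        stumps.append((threshold, left, right))
        preds = [p + learning_rate * (left if x <= threshold else right)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model: Model, x: float) -> float:
    """Sum the shrunken contributions of all stumps on top of the base value."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps
    )
```

With a small learning rate each tree corrects only a fraction of the remaining error, which is why the number of trees and the learning rate have to be chosen together.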

Purpose of the Study:

To review recent interdisciplinary research on CatBoost in Big Data.
To identify best practices and limitations of CatBoost from diverse studies.
To provide researchers with a comprehensive understanding for proper application of CatBoost.

Main Methods:

Systematic literature review of studies involving CatBoost.
Analysis of research across multiple disciplines focusing on classification and regression tasks.
Examination of CatBoost's performance, including its suitability for categorical data.

Main Results:

CatBoost demonstrates effectiveness in various Big Data classification and regression tasks.
The algorithm is well-suited for heterogeneous and categorical data.
Sensitivity to hyper-parameters and the importance of tuning are recurring themes.
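
One reason CatBoost is associated with categorical data is its use of ordered target statistics to encode categories. The sketch below is a simplified illustration of that idea, not the library's implementation: each example is encoded using only the targets of examples that precede it in a permutation, smoothed toward a prior. The function name, its parameters, and the optional explicit `order` argument are assumptions made for this example.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence


def ordered_target_statistic(
    categories: Sequence[Hashable],
    targets: Sequence[float],
    prior: float,
    prior_weight: float = 1.0,
    order: Optional[Sequence[int]] = None,
    seed: int = 0,
) -> List[float]:
    """Encode each category value from the targets of earlier examples only.

    Because an example never sees its own target, the encoding avoids the
    target leakage that a plain per-category mean would introduce.
    """
    if order is None:
        # Hypothetical default: a seeded random permutation of the rows.
        order = list(range(len(categories)))
        random.Random(seed).shuffle(order)

    target_sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    encoded = [0.0] * len(categories)

    for idx in order:
        cat = categories[idx]
        # Smoothed mean of the targets seen so far for this category.
        encoded[idx] = (target_sums[cat] + prior_weight * prior) / (
            counts[cat] + prior_weight
        )
        # Only now add this example's target to the running statistics.
        target_sums[cat] += targets[idx]
        counts[cat] += 1
    return encoded
```

For example, with categories `["a", "a", "b", "a"]`, targets `[1, 0, 1, 1]`, a prior of 0.5, and the rows processed in their given order, the first occurrence of each category receives the prior itself, and later occurrences move toward the running mean of that category.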

Conclusions:

CatBoost is a valuable GBDT tool for Big Data, particularly with categorical features.
Awareness of its strengths, weaknesses, and hyper-parameter sensitivity is key for optimal use.
This survey offers an interdisciplinary perspective, consolidating knowledge on CatBoost applications.
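
Because hyper-parameter sensitivity is a recurring theme, a simple way to picture tuning is an exhaustive grid search over candidate settings. The sketch below is generic and does not call CatBoost: `grid_search` is a hypothetical helper, and `toy_validation_loss` is a made-up stand-in for "train a model with these settings and measure the loss on held-out data". The parameter names `learning_rate` and `depth` are used here purely as illustrative labels.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Tuple


def grid_search(
    param_grid: Dict[str, List[Any]],
    validation_loss: Callable[..., float],
) -> Tuple[Dict[str, Any], float]:
    """Evaluate every combination in the grid and keep the lowest loss."""
    names = list(param_grid)
    best_params: Dict[str, Any] = {}
    best_loss = float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        loss = validation_loss(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def toy_validation_loss(learning_rate: float, depth: int) -> float:
    """Made-up loss surface standing in for a real train-and-validate step."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (depth - 6) ** 2


best_params, best_loss = grid_search(
    {"learning_rate": [0.03, 0.1, 0.3], "depth": [4, 6, 8]},
    toy_validation_loss,
)
```

In practice the loss function would train and validate a real model, and the cost of the search grows with the product of the grid sizes, so coarse grids or random search are common starting points.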