A Framework for Learning from Distributed Data Using Sufficient Statistics and its Application to Learning Decision Trees

Doina Caragea¹, Adrian Silvescu, Vasant Honavar

¹Artificial Intelligence Research Laboratory, Computer Science Department, Iowa State University, 226 Atanasoff Hall, Ames, IA 50011-1040, USA. dcaragea@cs.iastate.edu, silvescu@cs.iastate.edu, honavar@cs.iastate.edu

International Journal of Hybrid Intelligent Systems

September 28, 2011

Summary

This summary is machine-generated.

This study introduces methods for machine learning from distributed data that produce decision trees identical to those learned by centralized methods. Under certain conditions, these algorithms are more efficient in time and communication than centralizing the data first.

Area of Science:

Machine Learning
Distributed Systems
Data Science

Background:

Traditional machine learning often relies on centralized data, which can be inefficient or impractical for large datasets.
Distributed data presents unique challenges for algorithm design and analysis.
Existing methods may not scale effectively or maintain data integrity in distributed environments.

Purpose of the Study:

To formulate the problem of learning from distributed data.
To develop a general strategy for adapting traditional machine learning algorithms to distributed settings.
To create exact algorithms for decision tree induction from distributed data.

Main Methods:

A general strategy for transforming centralized machine learning algorithms into distributed versions.
Application of this strategy to decision tree induction.
Analysis of time and communication complexity in distributed versus centralized settings.
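
The strategy above can be illustrated with a minimal sketch, assuming horizontally partitioned data (each site holds different rows with the same attributes) and an information-gain split criterion. The function names, the toy weather data, and the use of class-conditional counts as the sufficient statistics are illustrative assumptions, not the authors' implementation. Each site sends only counts, and the coordinator sums them, so the chosen split matches what a centralized learner would choose.

```python
from collections import Counter
from math import log2

def local_counts(rows, attributes, label):
    """Statistics computed at one site: counts of (attribute, value, class)."""
    counts = Counter()
    for row in rows:
        for a in attributes:
            counts[(a, row[a], row[label])] += 1
    return counts

def entropy(class_counts):
    """Shannon entropy (in bits) of a Counter mapping class -> count."""
    total = sum(class_counts.values())
    return -sum(c / total * log2(c / total) for c in class_counts.values() if c)

def best_attribute(counts, attributes):
    """Pick the attribute with the highest information gain, using counts only."""
    gains = {}
    for a in attributes:
        by_value = {}              # attribute value -> Counter of classes
        class_totals = Counter()   # overall class distribution
        for (attr, value, cls), n in counts.items():
            if attr == a:
                by_value.setdefault(value, Counter())[cls] += n
                class_totals[cls] += n
        total = sum(class_totals.values())
        remainder = sum(sum(cc.values()) / total * entropy(cc)
                        for cc in by_value.values())
        gains[a] = entropy(class_totals) - remainder
    return max(attributes, key=lambda a: gains[a])

# Hypothetical toy data split across two sites.
ATTRIBUTES, LABEL = ["outlook", "windy"], "play"
site_a = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rain", "windy": "no", "play": "no"},
]
site_b = [
    {"outlook": "rain", "windy": "yes", "play": "no"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "no", "play": "no"},
]

# Distributed: each site ships counts; the coordinator adds them.
merged = local_counts(site_a, ATTRIBUTES, LABEL) + local_counts(site_b, ATTRIBUTES, LABEL)
# Centralized: all rows are pooled before counting.
central = local_counts(site_a + site_b, ATTRIBUTES, LABEL)

distributed_choice = best_attribute(merged, ATTRIBUTES)
centralized_choice = best_attribute(central, ATTRIBUTES)
```

Because the summed counts equal the counts over the pooled data, any split criterion that depends only on those counts selects the same attribute in both settings. Repeating the exchange at each tree node would grow the full tree.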

Main Results:

Provably exact algorithms for decision tree induction from distributed data, yielding identical results to centralized approaches.
Identification of conditions where distributed algorithms outperform centralized ones in efficiency.
Demonstration of lower time and communication complexity for the proposed distributed algorithms under those conditions.
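
To see why efficiency depends on conditions, consider a rough back-of-the-envelope cost model. It is an assumption made for illustration, not the complexity analysis from the paper: it simply counts the values sent over the network. Centralizing ships every row once, while the count-based approach ships one count per (attribute, value, class) combination from each site for every tree node.

```python
def centralized_cost(n_rows, n_attributes):
    """Values shipped when every row (attributes plus class label) is pooled."""
    return n_rows * (n_attributes + 1)

def distributed_cost(n_sites, n_attributes, n_values, n_classes, n_tree_nodes):
    """Values shipped when each site sends counts for every tree node."""
    return n_sites * n_tree_nodes * n_attributes * n_values * n_classes

# Hypothetical setting: 4 sites, 10 attributes with 3 values each,
# 2 classes, and a tree with 50 nodes.
counts_cost = distributed_cost(n_sites=4, n_attributes=10, n_values=3,
                               n_classes=2, n_tree_nodes=50)

large_pool_cost = centralized_cost(n_rows=1_000_000, n_attributes=10)
small_pool_cost = centralized_cost(n_rows=100, n_attributes=10)

print(counts_cost, large_pool_cost, small_pool_cost)
```

Under this simplified model, sending counts is far cheaper than pooling a million rows, but pooling is cheaper when the data set is tiny. This is the kind of trade-off the identified conditions describe.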

Conclusions:

The proposed strategy effectively enables exact machine learning from distributed data.
The developed algorithms offer significant efficiency gains over centralized methods in distributed settings.
Extensions for heterogeneous data and privacy-preserving learning are feasible.