How big is big data?

Daniel Speckhard¹,², Tim Bechtel¹,², Luca M Ghiringhelli³

¹Physics Department and CSMB, Humboldt-Universität zu Berlin, Zum Großen Windkanal 2, 12489 Berlin, Germany. claudia.draxl@physik.hu-berlin.de.

Faraday Discussions

September 24, 2024

Summary

This summary is machine-generated.

Big data in materials science machine learning presents challenges beyond volume, including data quality, veracity, and infrastructure. Addressing these is crucial for advancing predictive modeling in the field.
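The title question can be made operational with a learning curve: train the same model on growing subsets of the data and score every fit on one fixed held-out set. If the error is still falling at the largest size, the data set is not yet "big" for that model; once it plateaus, more data of the same kind buys little. The sketch below is a generic illustration under stated assumptions, not the procedure used in the paper: it uses toy synthetic data, a one-feature least-squares line as a stand-in model, and hypothetical helper names (fit_line, mean_abs_error, learning_curve).

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, xs, ys):
    """Mean absolute error of a fitted line on (xs, ys)."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


def learning_curve(xs, ys, sizes, test_fraction=0.2, seed=0):
    """Test MAE for growing training subsets, all scored on one fixed held-out set."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test_idx, pool = idx[:n_test], idx[n_test:]
    test_x = [xs[i] for i in test_idx]
    test_y = [ys[i] for i in test_idx]
    curve = []
    for n in sizes:
        train = pool[:n]  # nested subsets: each larger set contains the smaller ones
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        curve.append((n, mean_abs_error(model, test_x, test_y)))
    return curve


# Toy data (not from the paper): y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(0.0, 10.0) for _ in range(500)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]

curve = learning_curve(xs, ys, sizes=[5, 20, 100, 400])
for n, err in curve:
    print(f"train size {n:4d}: test MAE {err:.3f}")
```

Keeping the test set fixed while the training subsets grow is what makes the points on the curve comparable; with a noise level of 0.5, the error cannot fall much below the noise floor however much data is added.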

Area of Science:

Materials Science
Computer Science
Data Science

Background:

Machine learning models are increasingly used for predictive tasks in materials science.
The definition and implications of 'big data' in this domain require careful examination.

Purpose of the Study:

To define 'big data' in the context of materials science machine learning.
To investigate challenges related to data volume, quality, veracity, and infrastructure.
To explore model generalization, data aggregation, feature engineering, and computational requirements.
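Data aggregation is where volume and veracity meet: sources name fields differently, use different units, and sometimes disagree. A minimal sketch, assuming two hypothetical sources with made-up field names (formula/composition, gap_eV/gap_meV), illustrative placeholder values rather than real measurements, and an arbitrary 0.1 eV agreement tolerance. It maps each source onto a common schema, merges agreeing entries by their median, and sets disagreeing ones aside for a veracity check instead of silently averaging them.

```python
from collections import defaultdict
from statistics import median

# Toy records from two hypothetical sources that name fields differently and
# report a band gap in different units (eV vs meV). Values are illustrative only.
SOURCE_A = [
    {"formula": "mat-1", "gap_eV": 1.80},
    {"formula": "mat-2", "gap_eV": 1.42},
]
SOURCE_B = [
    {"composition": "mat-1", "gap_meV": 1810.0},
    {"composition": "mat-2", "gap_meV": 520.0},
    {"composition": "mat-3", "gap_meV": 1120.0},
]


def harmonize(records, key_field, value_field, to_ev):
    """Map one source's schema onto common (key, value-in-eV) pairs."""
    return [(r[key_field], to_ev(r[value_field])) for r in records]


def aggregate(*sources, tol_ev=0.1):
    """Merge harmonized sources; keys whose values spread more than tol_ev are set aside."""
    grouped = defaultdict(list)
    for source in sources:
        for key, value in source:
            grouped[key].append(value)
    merged, conflicts = {}, {}
    for key, values in grouped.items():
        if max(values) - min(values) > tol_ev:
            conflicts[key] = values  # sources disagree: needs a veracity check
        else:
            merged[key] = median(values)
    return merged, conflicts


merged, conflicts = aggregate(
    harmonize(SOURCE_A, "formula", "gap_eV", lambda v: v),
    harmonize(SOURCE_B, "composition", "gap_meV", lambda v: v / 1000.0),
)
print(merged)
print(conflicts)
```

Flagging rather than averaging conflicting entries is a deliberate choice in this sketch: a mean of two disagreeing values looks like clean data but hides exactly the veracity problem that needs resolving.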

Main Methods:

Analysis of typical materials science machine learning problems.
Evaluation of model generalization across datasets.
Case studies on gathering high-quality data from heterogeneous sources.
Assessment of how feature-set and model complexity affect expressivity.
Examination of infrastructure needs for large-scale data and model training.
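One common way to evaluate generalization across datasets is to compare the error on an in-distribution hold-out with the error on data from a second source that covers a different region of input space. The sketch below assumes toy data following y = x**2, two hypothetical sources sampling x in [0, 1] and [2, 3], and a straight-line model; it illustrates the evaluation pattern only and is not the paper's protocol.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, xs, ys):
    """Mean absolute error of a fitted line on (xs, ys)."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)


def sample(lo, hi, n):
    """Draw n toy points with x uniform in [lo, hi] and y = x**2 plus small noise."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    ys = [x ** 2 + rng.gauss(0.0, 0.01) for x in xs]
    return xs, ys


train_x, train_y = sample(0.0, 1.0, 200)      # hypothetical source A
holdout_x, holdout_y = sample(0.0, 1.0, 50)   # unseen points from source A
other_x, other_y = sample(2.0, 3.0, 50)       # hypothetical source B: shifted inputs

model = fit_line(train_x, train_y)
err_in = mean_abs_error(model, holdout_x, holdout_y)
err_out = mean_abs_error(model, other_x, other_y)
print(f"in-distribution MAE: {err_in:.3f}; cross-source MAE: {err_out:.3f}")
```

A large gap between the two numbers is the signature of a model that interpolates well within one source but does not transfer to another; a random split of pooled data would hide it.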

Main Results:

'Big data' in materials science involves a complex interplay of data volume, quality, and veracity.
Model generalization varies significantly with dataset characteristics.
Effective aggregation of heterogeneous data sources is challenging but feasible.
Feature engineering and model complexity are critical for predictive accuracy.
Substantial infrastructure is required for managing and training on large materials datasets.
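Why feature engineering matters can be seen even with the simplest model. In this sketch (toy data, hypothetical helper names), the target depends on x only through x**2, so a straight line fitted on the raw descriptor x explains almost nothing, while the same line fitted on the engineered descriptor x**2 explains nearly all of the variance. R² is computed in-sample for brevity; a real comparison would score held-out data.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(model, xs, ys):
    """Coefficient of determination of a fitted line on (xs, ys)."""
    slope, intercept = model
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Toy target (not from the paper): y depends on x only through x**2.
rng = random.Random(1)
xs = [rng.uniform(-2.0, 2.0) for _ in range(300)]
ys = [3.0 * x ** 2 + rng.gauss(0.0, 0.1) for x in xs]

engineered = [x ** 2 for x in xs]  # hypothetical engineered descriptor

r2_raw = r_squared(fit_line(xs, ys), xs, ys)
r2_eng = r_squared(fit_line(engineered, ys), engineered, ys)
print(f"R^2 with raw x: {r2_raw:.3f}; with engineered x^2: {r2_eng:.3f}")
```

A more expressive model could recover the quadratic from raw x on its own, at the cost of more parameters and typically more data; this trade-off is one face of the interplay between feature engineering and model complexity noted above.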

Conclusions:

Big data in materials science machine learning poses multifaceted challenges.
Further research is needed to address data quality, infrastructure, and model development.
Optimizing data handling and model training is essential for unlocking predictive potential.
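On the data-handling side, one basic pattern for data sets too large for memory is to stream them in batches and keep only running statistics, for example the mean and variance needed to standardize features before training. A minimal sketch using Welford's one-pass algorithm; the batch size and the read_values generator are hypothetical stand-ins for reading from disk or a database, and nothing here describes the infrastructure examined in the paper.

```python
from typing import Iterable, Iterator, List


class RunningStats:
    """Welford's online algorithm: mean and variance in one pass with O(1) memory."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance; use self.m2 / (self.n - 1) for the sample variance.
        return self.m2 / self.n if self.n else 0.0


def batches(values: Iterable[float], size: int) -> Iterator[List[float]]:
    """Group a stream into lists of at most `size` items without materializing it."""
    batch: List[float] = []
    for v in values:
        batch.append(v)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def read_values() -> Iterator[float]:
    """Hypothetical stand-in for streaming a large data set from storage."""
    for i in range(1, 101):
        yield float(i)


stats = RunningStats()
for batch in batches(read_values(), size=16):
    for x in batch:
        stats.update(x)
print(stats.n, stats.mean, stats.variance)
```

Welford's update avoids the catastrophic cancellation of the naive sum-of-squares formula, which matters when millions of values are accumulated in floating point.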