How big is big data?

Daniel Speckhard¹,², Tim Bechtel¹,², Luca M Ghiringhelli³

  • ¹Physics Department and CSMB, Humboldt-Universität zu Berlin, Zum Großen Windkanal 2, 12489 Berlin, Germany. claudia.draxl@physik.hu-berlin.de.

Faraday Discussions | September 24, 2024
Summary
This summary is machine-generated.

Big data in materials science machine learning presents challenges beyond volume, including data quality, veracity, and infrastructure. Addressing these is crucial for advancing predictive modeling in the field.

Area of Science:

  • Materials Science
  • Computer Science
  • Data Science

Background:

  • Machine learning models are increasingly used for predictive tasks in materials science.
  • The definition and implications of 'big data' in this domain require careful examination.

Purpose of the Study:

  • To define 'big data' in the context of materials science machine learning.
  • To investigate challenges related to data volume, quality, veracity, and infrastructure.
  • To explore model generalization, data aggregation, feature engineering, and computational requirements.

Main Methods:

  • Analysis of typical materials science machine learning problems.
  • Evaluation of model generalization across datasets.
  • Case studies on gathering high-quality data from heterogeneous sources.
  • Assessment of how feature-set size and model complexity affect expressivity.
  • Examination of infrastructure needs for large-scale data and model training.

Main Results:

  • 'Big data' in materials science involves a complex interplay of data volume, quality, and veracity.
  • Model generalization varies significantly with dataset characteristics.
  • Effective aggregation of heterogeneous data sources is challenging but feasible.
  • Feature engineering and model complexity are critical for predictive accuracy.
  • Substantial infrastructure is required for managing and training on large materials datasets.
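
The point about generalization varying with dataset characteristics can be illustrated with a toy sketch. This is not the authors' method or data: the quadratic target, the input ranges, and the helper names (`make_dataset`, `fit_line`, `mean_abs_error`) are all assumptions chosen for illustration. A simple linear model is fitted on one synthetic dataset and then scored both on held-out data from the same input range and on a second dataset drawn from a different range.

```python
import random


def make_dataset(rng, low, high, n):
    """Hypothetical synthetic data: inputs uniform in [low, high], target y = x**2."""
    xs = [rng.uniform(low, high) for _ in range(n)]
    ys = [x ** 2 for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def mean_abs_error(model, xs, ys):
    """Mean absolute error of the fitted line on (xs, ys)."""
    a, b = model
    return sum(abs(a * x + b - y) for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# "Dataset A": training data and a held-out split from the same input range.
train_x, train_y = make_dataset(rng, 0.0, 1.0, 200)
test_in_x, test_in_y = make_dataset(rng, 0.0, 1.0, 200)

# "Dataset B": same underlying relation, but sampled from a different range.
test_out_x, test_out_y = make_dataset(rng, 2.0, 3.0, 200)

model = fit_line(train_x, train_y)
mae_in = mean_abs_error(model, test_in_x, test_in_y)
mae_out = mean_abs_error(model, test_out_x, test_out_y)

print(f"MAE on held-out data from dataset A: {mae_in:.3f}")
print(f"MAE on dataset B (shifted inputs):   {mae_out:.3f}")
```

In this toy setting the error on dataset B is much larger than on the held-out split of dataset A, because the linear fit only approximates the target well inside the range it was trained on. Real materials datasets differ in many more ways (composition space, calculation settings, noise), so this only sketches the general pattern.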

Conclusions:

  • Big data in materials science machine learning poses multifaceted challenges.
  • Further research is needed to address data quality, infrastructure, and model development.
  • Optimizing data handling and model training is essential for unlocking predictive potential.