Auto-generated database of semiconductor band gaps using ChemDataExtractor.

Qingyang Dong1, Jacqueline M Cole2,3,4

  • 1Cavendish Laboratory, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK.

Scientific Data, May 3, 2022
Summary
This summary is machine-generated.

This study created the largest open-source, non-computational database of semiconductor band gaps, extracting 100,236 records from 128,776 journal articles with ChemDataExtractor, a chemistry-aware natural language processing toolkit. The resource supports semiconductor research and materials discovery.

Area of Science:

  • Materials Science
  • Computational Chemistry
  • Data Science

Background:

  • Large-scale semiconductor band gap data is crucial for materials research and computational databases.
  • Existing curated databases are limited in scope and accessibility.
  • Automated extraction from scientific literature offers a scalable solution.

Purpose of the Study:

  • To develop the largest open-source, non-computational database of semiconductor band gap records.
  • To leverage Natural Language Processing (NLP) and machine learning for data extraction from scientific literature.
  • To provide a machine-readable dataset for data mining and semiconductor discovery.

Main Methods:

  • Utilized ChemDataExtractor version 2.0, a chemistry-aware software toolkit.
  • Employed an extended Snowball algorithm with nested models and hyperparameter optimization.
  • Processed 128,776 journal articles to extract 100,236 semiconductor band gap records with temperature information.
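To give a rough sense of what a single extracted record might contain, the sketch below pulls a compound, a band gap value, and an optional temperature out of one sentence. It uses a plain regular expression as a stand-in and is not the ChemDataExtractor or Snowball API; the pattern, the field names, and the example sentences are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: "<formula> has|shows|exhibits a band gap of <value> eV [at <T> K]".
# A plain regex stand-in, NOT the ChemDataExtractor or Snowball implementation.
BAND_GAP_PATTERN = re.compile(
    r"(?P<compound>[A-Z][A-Za-z0-9]*)\s+(?:has|shows|exhibits)\s+a\s+band\s+gap\s+of\s+"
    r"(?P<value>\d+(?:\.\d+)?)\s*eV"
    r"(?:\s+at\s+(?P<temperature>\d+(?:\.\d+)?)\s*K)?"
)


def extract_band_gaps(sentence):
    """Return one dict per band gap mention found in the sentence."""
    records = []
    for match in BAND_GAP_PATTERN.finditer(sentence):
        temperature = match.group("temperature")
        records.append({
            "compound": match.group("compound"),
            "band_gap_eV": float(match.group("value")),
            # Temperature is optional in the pattern, so it may be absent.
            "temperature_K": float(temperature) if temperature is not None else None,
        })
    return records


# Illustrative sentence written for this sketch, not taken from the corpus.
records = extract_band_gaps("GaAs has a band gap of 1.42 eV at 300 K.")
print(records)
```

A fixed pattern like this only matches one phrasing, which is why the actual work relies on a chemistry-aware toolkit and a learned, probabilistic approach rather than hand-written rules.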

Main Results:

  • Generated a database of 100,236 semiconductor band gap records.
  • Achieved a weighted precision of 84% and a weighted recall of 65% in data extraction.
  • The database is the largest open-source, non-computational band gap dataset to date.
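As a reminder of what the two reported metrics measure, the sketch below computes plain precision and recall from counts of correct, spurious, and missed records. The counts are hypothetical and were chosen only so that the ratios come out at 84% and 65%; the weighting scheme used in the study is not described here and is not reproduced.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Standard (unweighted) precision and recall from raw counts."""
    # Precision: share of extracted records that are correct.
    precision = true_positives / (true_positives + false_positives)
    # Recall: share of records present in the text that were extracted.
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# Hypothetical counts, picked only to match the reported 84% / 65% ratios.
precision, recall = precision_recall(
    true_positives=546, false_positives=104, false_negatives=294
)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Read this way, a precision well above recall suggests that the extracted records are mostly correct, while a sizeable share of band gap mentions in the literature is not captured.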

Conclusions:

  • The auto-generated database significantly enhances resources for semiconductor materials research.
  • The NLP-driven approach provides a scalable and efficient method for scientific data curation.
  • Machine-readable formats (CSV, JSON, MongoDB) facilitate data mining and accelerate materials discovery.
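A minimal sketch of how a machine-readable release could be mined, assuming JSON records with `compound`, `band_gap_eV`, and `temperature_K` fields. These field names and the three sample records are hypothetical and written for illustration; the published schema may differ.

```python
import csv
import io
import json

# Hypothetical records and field names; not drawn from the published database.
SAMPLE_JSON = """[
  {"compound": "GaAs", "band_gap_eV": 1.42, "temperature_K": 300},
  {"compound": "ZnO", "band_gap_eV": 3.37, "temperature_K": null},
  {"compound": "Si", "band_gap_eV": 1.12, "temperature_K": 300}
]"""

FIELDS = ["compound", "band_gap_eV", "temperature_K"]


def load_records(json_text):
    """Parse a JSON array of band gap records into a list of dicts."""
    return json.loads(json_text)


def filter_by_gap(records, low_eV, high_eV):
    """Keep records whose band gap lies within [low_eV, high_eV]."""
    return [r for r in records if low_eV <= r["band_gap_eV"] <= high_eV]


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = load_records(SAMPLE_JSON)
narrow_gap = filter_by_gap(records, 1.0, 1.5)
print(to_csv(narrow_gap))
```

The same filter would apply unchanged to records read from a CSV file or a MongoDB collection, provided they are first converted to dictionaries with numeric band gap values.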