MaRe: Processing Big Data with application containers on Apache Spark.

Marco Capuccini (1, 2), Martin Dahlö (2, 3, 4), Salman Toor (1)

  • (1) Department of Information Technology, Uppsala University, Box 337, 75105, Uppsala, Sweden.

GigaScience | May 6, 2020
Summary
This summary is machine-generated.

MaRe integrates Docker containers into Apache Spark for scalable Big Data analytics in life sciences. This open-source library enhances bioinformatics pipelines by enabling tool reuse and efficient data processing.

Keywords: Apache Spark, Big Data, MapReduce, application containers, workflows

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Data Science

Background:

  • Life science research heavily relies on Big Data analytics.
  • Existing MapReduce frameworks lack support for bioinformatics tool reuse and application containers.
  • Application containers are gaining traction in scientific data processing.

Purpose of the Study:

  • To introduce MaRe, an open-source library for integrating Docker containers with Apache Spark.
  • To enhance interoperability within the scientific software ecosystem.
  • To facilitate data-intensive analyses in life sciences.

Main Methods:

  • Developed MaRe, a programming library enabling Docker container support in Apache Spark.
  • Leveraged Apache Spark as the MapReduce framework and Docker as the container engine.
  • Demonstrated MaRe's functionality on two data-intensive life science applications.
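
The map-then-reduce pattern described under Main Methods can be illustrated with a small, self-contained sketch. This is not MaRe's API: MaRe is a Spark library that runs real Docker images. The sketch is a hypothetical Python stand-in, assuming that each data partition is written to an input file, processed by an external tool that writes an output file (roughly as a container would with mounted volumes), and that the partial results are then combined. The toy tool (a GC-base counter) and the function names `run_tool_on_partition` and `map_reduce` are invented for illustration, and partitions are processed sequentially rather than distributed across Spark workers.

```python
"""Hypothetical sketch of MapReduce over an external, container-like tool.

Not MaRe's API. Assumptions: partitions are lists of text lines, and the
"containerized tool" is simulated by a separate Python process that reads an
input file and writes an output file, mimicking mounted input/output volumes.
"""
import os
import subprocess
import sys
import tempfile
from functools import reduce

# Hypothetical toy tool: counts G and C bases in its input file and writes
# the count to its output file. Passed to the interpreter via "-c".
GC_COUNT_TOOL = (
    "import sys\n"
    "with open(sys.argv[1]) as src:\n"
    "    data = src.read()\n"
    "with open(sys.argv[2], 'w') as dst:\n"
    "    dst.write(str(data.count('G') + data.count('C')))\n"
)


def run_tool_on_partition(partition, tool_source):
    """Map step: stage the partition as a file, run the external tool on it
    in a separate process, and read back the tool's output file."""
    with tempfile.TemporaryDirectory() as workdir:
        in_path = os.path.join(workdir, "input")
        out_path = os.path.join(workdir, "output")
        with open(in_path, "w") as f:
            f.write("\n".join(partition))
        # With "-c", sys.argv[1] and sys.argv[2] inside the tool are the paths.
        subprocess.run(
            [sys.executable, "-c", tool_source, in_path, out_path], check=True
        )
        with open(out_path) as f:
            return int(f.read())


def map_reduce(partitions, tool_source, combine):
    """Run the tool on every partition (sequentially here), then fold the
    partial results pairwise with `combine`."""
    partials = [run_tool_on_partition(p, tool_source) for p in partitions]
    return reduce(combine, partials)


if __name__ == "__main__":
    toy_partitions = [["ACGT", "GGCC"], ["ATAT", "CGCG"], ["GATTACA"]]
    print(map_reduce(toy_partitions, GC_COUNT_TOOL, lambda a, b: a + b))  # 12
```

Running the script prints the total GC count across the three toy partitions (6 + 4 + 2 = 12). In an actual Spark deployment, the per-partition step would instead run in parallel on the worker nodes that hold each partition, and the tool would be a Docker image rather than an inline Python snippet.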

Main Results:

  • MaRe successfully integrates Docker containers into Apache Spark.
  • The library proved easy to use and scalable in life science applications.
  • MaRe provides interoperability with a wide range of scientific software.

Conclusions:

  • MaRe enables scalable, containerized data processing for life sciences using Apache Spark.
  • Compared with traditional workflow systems, MaRe offers advantages such as data locality and interactive processing.
  • MaRe is a generally applicable, open-source solution for modern bioinformatics pipelines.