MaRe: Processing Big Data with application containers on Apache Spark.

Marco Capuccini¹,², Martin Dahlö²,³,⁴, Salman Toor¹

¹Department of Information Technology, Uppsala University, Box 337, 75105, Uppsala, Sweden.

May 6, 2020

Summary

This summary is machine-generated.

MaRe integrates Docker containers into Apache Spark for scalable Big Data analytics in life sciences. This open-source library enhances bioinformatics pipelines by enabling tool reuse and efficient data processing.

Keywords:

Apache Spark, Big Data, MapReduce, application containers, workflows

Area of Science:

Bioinformatics
Computational Biology
Data Science

Background:

Life science research relies heavily on Big Data analytics.
Existing MapReduce frameworks lack support for bioinformatics tool reuse and application containers.
Application containers are gaining traction in scientific data processing.

Purpose of the Study:

To introduce MaRe, an open-source library for integrating Docker containers with Apache Spark.
To enhance interoperability within the scientific software ecosystem.
To facilitate data-intensive analyses in life sciences.

Main Methods:

Developed MaRe, a programming library enabling Docker container support in Apache Spark.
Leveraged Apache Spark as the MapReduce framework and Docker as the container engine.
Demonstrated MaRe's functionality on two data-intensive life science applications.
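
The general pattern behind these methods (partition the data, run an external tool on each partition, then combine the partial outputs with a second tool) can be illustrated with a minimal, single-machine sketch. This is not MaRe's API and uses neither Apache Spark nor Docker: the function names `run_tool`, `partition`, and `map_reduce` are hypothetical, the toy GC-counting task and input records are made up for illustration, and the two "tools" are small Python one-liners standing in for containerized commands, so that the example runs anywhere.

```python
import subprocess
import sys
from functools import reduce


def run_tool(command, stdin_text):
    """Run an external command, feed it stdin_text, and return its stdout."""
    result = subprocess.run(
        command, input=stdin_text, capture_output=True, text=True, check=True
    )
    return result.stdout


def partition(records, n):
    """Split a non-empty list into at most n contiguous chunks."""
    size = max(1, -(-len(records) // n))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


# Stand-ins for containerized tools (assumption for this sketch). In a
# container-based setup each command would instead start with something
# like ["docker", "run", "-i", "--rm", "<image>", ...].
COUNT_GC = [
    sys.executable, "-c",
    "import sys; print(sum(l.count('G') + l.count('C') for l in sys.stdin))",
]
SUM = [
    sys.executable, "-c",
    "import sys; print(sum(int(x) for x in sys.stdin.read().split()))",
]


def map_reduce(records, n_partitions):
    """Map COUNT_GC over each partition, then fold the outputs with SUM."""
    parts = partition(records, n_partitions)
    # Map step: one tool invocation per partition, records passed on stdin.
    mapped = [run_tool(COUNT_GC, "\n".join(p) + "\n") for p in parts]
    # Reduce step: combine two partial outputs at a time with the SUM tool.
    combined = reduce(lambda a, b: run_tool(SUM, a + b), mapped)
    return combined.strip()


if __name__ == "__main__":
    sequences = ["GATTACA", "CCGG", "ATAT", "GGC"]
    print(map_reduce(sequences, 2))  # prints 9
```

Because each step only exchanges text through standard input and output, the tools themselves stay unmodified, which is the property that makes reuse of existing command-line software possible. A distributed framework would run the map step on the nodes that already hold each partition instead of in a local loop.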

Main Results:

MaRe successfully integrates Docker containers into Apache Spark.
The library demonstrates ease of use and scalability in life science applications.
MaRe provides interoperability with a wide range of scientific software.

Conclusions:

MaRe enables scalable, containerized data processing for life sciences using Apache Spark.
MaRe offers advantages over traditional workflow systems, including data locality and interactive processing.
MaRe is a generally applicable, open-source solution for modern bioinformatics pipelines.