


Updated: Oct 29, 2025


A comparative experimental study of distributed storage engines for big spatial data processing using GeoSpark.

Hansub Shin¹, Kisung Lee², Hyuk-Yoon Kwon¹

  • ¹ Department of Industrial Engineering, Seoul National University of Science and Technology, Seoul, Republic of Korea.

The Journal of Supercomputing | July 6, 2021
Summary
This summary is machine-generated.

This study evaluates distributed storage engines for big spatial data processing with GeoSpark. HDFS and Amazon S3 generally outperform MongoDB, though spatial sharding improves MongoDB performance for large datasets.

Keywords:
Comparative study; Distributed storage engines; GeoSpark; Query performance; Spatial data


Area of Science:

  • Computer Science
  • Data Engineering
  • Geospatial Computing

Background:

  • Mobile devices generate vast amounts of spatial data.
  • Efficient management of this big spatial data is crucial.
  • The storage engine performance of distributed spatial systems such as GeoSpark (now Apache Sedona) has not been studied comprehensively.

Purpose of the Study:

  • To evaluate the performance of different distributed storage engines for large-scale spatial data processing using GeoSpark.
  • To compare HDFS, MongoDB, and Amazon S3 as storage backends for GeoSpark.
  • To identify optimal storage solutions for big spatial data.

Main Methods:

  • Utilized GeoSpark on Apache Spark for distributed spatial data processing.
  • Experimented with HDFS, MongoDB, and Amazon S3 as storage engines.
  • Generated large datasets (up to 1 billion records) with varied distributions and sizes.
  • Conducted experiments on Amazon EMR cloud instances.
  • Analyzed performance based on sharding strategies, caching, data characteristics, and system scale.
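
As a rough illustration of what "varied distributions" can mean for synthetic spatial data, the sketch below generates point sets that are either spread uniformly or clustered around a centre. This summary does not describe the study's actual generator, so the function name `generate_points`, the choice of uniform and Gaussian distributions, and the spread of one tenth of the bounding box are all assumptions made for the example.

```python
import random


def generate_points(n, distribution="uniform", seed=0,
                    bounds=(-180.0, -90.0, 180.0, 90.0)):
    """Return n (lon, lat) tuples inside bounds.

    'uniform' spreads points evenly over the bounding box.
    'gaussian' clusters them around the centre of the box
    (a hypothetical stand-in for skewed, real-world-like data).
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    min_x, min_y, max_x, max_y = bounds
    cx, cy = (min_x + max_x) / 2, (min_y + max_y) / 2
    # Assumed spread: one tenth of the box size in each dimension.
    sx, sy = (max_x - min_x) / 10, (max_y - min_y) / 10

    points = []
    while len(points) < n:
        if distribution == "uniform":
            x, y = rng.uniform(min_x, max_x), rng.uniform(min_y, max_y)
        elif distribution == "gaussian":
            x, y = rng.gauss(cx, sx), rng.gauss(cy, sy)
        else:
            raise ValueError(f"unknown distribution: {distribution}")
        # Reject the rare Gaussian samples that fall outside the box.
        if min_x <= x <= max_x and min_y <= y <= max_y:
            points.append((x, y))
    return points


if __name__ == "__main__":
    uniform_pts = generate_points(1000, "uniform", seed=1)
    skewed_pts = generate_points(1000, "gaussian", seed=1)
    print(len(uniform_pts), len(skewed_pts))
```

Skewed data matters for a storage comparison because clustered points can concentrate load on a few partitions or shards, whereas uniform data spreads it evenly.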

Main Results:

  • HDFS and Amazon S3 generally outperformed MongoDB for GeoSpark spatial data processing.
  • MongoDB performance improved with larger datasets and spatial proximity-based sharding.
  • HDFS and S3 demonstrated better scalability with increased executors and storage nodes.
  • Caching significantly enhanced overall spatial data processing performance.
  • HDFS and S3 exhibited comparable performance across tested environments.
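
To make the sharding result more concrete, here is a minimal sketch of why a key based on spatial proximity can help range queries: nearby points land on the same shard, so a small query window touches few shards, while a hashed key scatters them. This is not MongoDB's API and not necessarily the scheme used in the study; `grid_shard`, `hash_shard`, `shards_touched`, and the 10-degree cell size are hypothetical names and values chosen for illustration.

```python
import hashlib


def grid_shard(point, num_shards, cell_deg=10.0):
    """Proximity-based key: points in the same grid cell share a shard."""
    lon, lat = point
    col = int((lon + 180.0) // cell_deg)
    row = int((lat + 90.0) // cell_deg)
    cols_per_row = int(360.0 // cell_deg) + 1
    return (row * cols_per_row + col) % num_shards


def hash_shard(point, num_shards):
    """Hashed key: placement ignores where the point is located."""
    digest = hashlib.sha256(repr(point).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shards_touched(points, shard_fn, num_shards, window):
    """Set of shards holding at least one point inside the query window."""
    min_x, min_y, max_x, max_y = window
    return {
        shard_fn(p, num_shards)
        for p in points
        if min_x <= p[0] <= max_x and min_y <= p[1] <= max_y
    }


if __name__ == "__main__":
    # 100 points on a 1-degree lattice, all inside one 10-degree cell.
    pts = [(float(x), float(y)) for x in range(10) for y in range(10)]
    window = (0.0, 0.0, 9.0, 9.0)
    print("grid :", len(shards_touched(pts, grid_shard, 8, window)))
    print("hash :", len(shards_touched(pts, hash_shard, 8, window)))
```

In this toy setup the grid key serves the whole window from a single shard, while the hashed key spreads the same points over several shards, so a range query has to contact more of them.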

Conclusions:

  • Storage engine choice significantly impacts big spatial data processing performance.
  • HDFS and Amazon S3 are robust, scalable options for GeoSpark.
  • MongoDB can be viable for large-scale data with optimized sharding.
  • Caching is a critical factor for optimizing distributed spatial data processing.
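
The effect of caching can be sketched with a toy model in which every uncached query re-reads the full dataset from the storage engine, while a cached dataset is read once and then served from memory. In Spark this roughly corresponds to persisting a dataset with `cache()` or `persist()`; the summary does not say which mechanism the study used, and the classes `CountingStore` and `CachedDataset` below are hypothetical stand-ins, not part of GeoSpark or any storage engine.

```python
class CountingStore:
    """Stand-in for a remote storage engine that counts full reads."""

    def __init__(self, records):
        self._records = list(records)
        self.reads = 0

    def load(self):
        self.reads += 1  # each call models one full scan of the backend
        return list(self._records)


class CachedDataset:
    """Loads from the store on first access, then reuses the in-memory copy."""

    def __init__(self, store):
        self._store = store
        self._cache = None

    def records(self):
        if self._cache is None:
            self._cache = self._store.load()
        return self._cache


def range_query(records, window):
    """Return the points that fall inside an inclusive bounding box."""
    min_x, min_y, max_x, max_y = window
    return [p for p in records
            if min_x <= p[0] <= max_x and min_y <= p[1] <= max_y]


if __name__ == "__main__":
    data = [(float(i), float(i)) for i in range(100)]
    windows = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 99, 99)]

    uncached_store = CountingStore(data)
    for w in windows:
        range_query(uncached_store.load(), w)

    cached_store = CountingStore(data)
    dataset = CachedDataset(cached_store)
    for w in windows:
        range_query(dataset.records(), w)

    print("reads without cache:", uncached_store.reads)
    print("reads with cache   :", cached_store.reads)
```

With three queries, the uncached path scans the backend three times and the cached path scans it once, which is the kind of saving that grows with the number of queries run over the same data.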