Arabic punctuation dataset.

Sane Yagi (1), Ashraf Elnagar (2), Esra Yaghi (3)

  • (1) Department of Foreign Languages, University of Sharjah, United Arab Emirates.

Data in Brief | February 13, 2024
Summary
This summary is machine-generated.

Arabic punctuation inconsistency hinders NLP. The Arabic Punctuation Dataset (APD) offers annotated Modern Standard Arabic texts to train models for sentence boundary identification and punctuation prediction, improving Arabic NLP tasks.

Keywords:
Automatic punctuation; Punctuation corpus; Sentence boundary identification; Theme-rheme; Topic and comment

Area of Science:

  • Computational Linguistics
  • Natural Language Processing

Background:

  • Arabic text is punctuated inconsistently, which creates challenges for Natural Language Processing (NLP) applications.
  • Robust NLP tools for Arabic therefore need to handle this punctuation variability.

Purpose of the Study:

  • To introduce the Arabic Punctuation Dataset (APD), a novel resource for improving Arabic NLP.
  • To facilitate machine learning model training for sentence boundary identification and punctuation prediction in Modern Standard Arabic.
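
Punctuation prediction is commonly framed as token labeling: each word is tagged with the mark that follows it. The sketch below illustrates that framing only. The function names, the label scheme, and the set of marks are assumptions made for illustration; they are not taken from APD or its documentation.

```python
# Hypothetical label scheme: tag each token with the punctuation mark that
# immediately follows it, or "O" if none does. Not the APD file format.
PUNCT = {"،", "؛", "؟", ".", "!", ":"}   # Arabic comma, semicolon, question mark, plus shared marks
SENTENCE_END = {".", "؟", "!"}           # assumed sentence-final marks


def to_token_labels(text):
    """Turn punctuated text into (token, label) pairs."""
    strip_chars = "".join(PUNCT)
    pairs = []
    for raw in text.split():
        word = raw.rstrip(strip_chars)   # token without trailing marks
        trailing = raw[len(word):]       # the marks that were removed
        if not word:
            # Stand-alone mark: attach it to the previous unlabeled token.
            if pairs and pairs[-1][1] == "O":
                pairs[-1] = (pairs[-1][0], trailing[0])
            continue
        pairs.append((word, trailing[0] if trailing else "O"))
    return pairs


def sentence_boundaries(pairs):
    """Return indices of tokens that close a sentence."""
    return [i for i, (_, label) in enumerate(pairs) if label in SENTENCE_END]


if __name__ == "__main__":
    sample = "ذهب الولد إلى المدرسة، ثم عاد."
    labeled = to_token_labels(sample)
    for token, label in labeled:
        print(token, label)
    print(sentence_boundaries(labeled))
```

A model trained on such pairs would see the unpunctuated tokens as input and the labels as targets; sentence boundary identification then reduces to finding tokens whose label is a sentence-final mark.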

Main Methods:

  • The Arabic Punctuation Dataset (APD) was created using the "theme-rheme completion" principle, linking grammar to punctuation.
  • APD comprises 312 million words across 12 million sentences, including manually annotated book chapters (ABC), parallel translations (CBT), and scrambled sentences (SSAC-UNPC).
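
As a rough sanity check on scale, the two totals above imply an average sentence length. This is a back-of-the-envelope figure derived from the rounded numbers in this summary, not a statistic reported by the authors:

```latex
\[
\frac{312 \times 10^{6}\ \text{words}}{12 \times 10^{6}\ \text{sentences}} = 26\ \text{words per sentence}
\]
```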

Main Results:

  • APD provides a large-scale, annotated corpus for training NLP models specific to Arabic punctuation.
  • The dataset's diverse components cater to various NLP tasks, from basic boundary identification to complex punctuation restoration.

Conclusions:

  • The Arabic Punctuation Dataset (APD) is a foundational resource for advancing Arabic NLP.
  • APD's grammar-based approach enhances machine-generated text clarity, benefiting applications like machine translation and speech recognition.