A review of synthetic data terminology for privacy preserving use cases
Summary
This summary is machine-generated. Standardized terminology is crucial for the secure generation and use of synthetic data in research. This paper proposes definitions for key terms, including synthetic data, utility, utility measure, and fidelity, to improve consistency and knowledge sharing.
Area Of Science
- Health Informatics
- Data Science
- Privacy-Preserving Technologies
Background
- Synthetic data generation is vital for research using secure administrative and health data.
- Lack of standardized terminology impedes quality standards, governance, and knowledge sharing for synthetic data.
- This lack of standardization is particularly critical for synthetic data derived from protected personal information.
Purpose Of The Study
- To review existing literature on synthetic data terminology.
- To explore current definitions and identify inconsistencies, especially for privacy-preserving use cases.
- To propose standardized definitions for key synthetic data terms.
Main Methods
- Literature review of synthetic data generation and terminology.
- Analysis of term definitions in the context of privacy-preserving applications.
- Development of proposed broad definitions for core terminology.
Main Results
- Existing terminology for synthetic data properties is often inconsistent or incompletely defined.
- The diversity of synthetic data types and use cases complicates universal definitions.
- Privacy-preserving synthetic data presents unique definitional challenges due to protected source data characteristics.
Conclusions
- Consensus on terminology is essential for advancing synthetic data research and application.
- Future literature should describe synthetic data more clearly, specifying its intended use and the measures used to evaluate it.
- Proposed definitions for synthetic data, utility, utility measure, and fidelity aim to foster consistency.

