Real-World Data Integration in Practice: Why Communication with Domain Experts is Key
Summary
This summary is machine-generated. Informatics infrastructures enhance translational research by giving researchers access to health data. Addressing data quality and representation issues requires customized, use-case-specific solutions that are developed in alignment with domain experts and supported by agile ETL (extract, transform, load) processes, so that the integrated data can be reused meaningfully.
Area Of Science
- Health Informatics
- Translational Research
- Data Science
Background
- Informatics infrastructures are crucial for translational research.
- These systems provide researchers with self-service access to health data.
- Real-world data integration presents significant challenges.
Purpose Of The Study
- To identify common data quality and representation issues in health data integration.
- To present recommendations and best practices for addressing these issues.
- To emphasize the need for use-case-specific solutions in data integration.
Main Methods
- Analysis of common data quality and representation issues in real-world data integration.
- Development of recommendations and best practices.
- Emphasis on aligning technical solutions with domain expertise and use cases.
Main Results
- Data quality and representation issues are prevalent in health data integration.
- Technical solutions alone are insufficient; customization based on intended use case is essential.
- Alignment with domain experts and agile ETL processes are key for meaningful data reuse.
Conclusions
- Effective informatics infrastructures require addressing data quality and representation challenges.
- Solutions must be tailored to specific use cases and involve domain experts.
- Agile ETL processes facilitate the meaningful reuse of integrated health data.
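
As a rough illustration of what "use-case-specific" can mean inside an ETL transform step, the sketch below applies a different set of data quality rules depending on the intended use of the data. It is not taken from the study: the rule names, the use-case labels (`cohort_counting`, `outcome_analysis`), the record fields, and the plausibility range are all hypothetical assumptions that would in practice be agreed with domain experts.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when the record passes

# Hypothetical rules; real thresholds would come from domain experts.
has_unit = Rule("unit_present", lambda r: bool(r.get("unit")))
plausible_value = Rule(
    "value_in_range",
    lambda r: r.get("value") is not None and 0 < r["value"] < 1000,
)

# Assumption: each use case selects its own rule set.
RULES_BY_USE_CASE = {
    "cohort_counting": [],                         # any recorded measurement counts
    "outcome_analysis": [has_unit, plausible_value],  # values must be interpretable
}

def transform(records, use_case):
    """Split records into accepted and rejected for the given use case."""
    rules = RULES_BY_USE_CASE[use_case]
    accepted, rejected = [], []
    for rec in records:
        failed = [rule.name for rule in rules if not rule.check(rec)]
        if failed:
            # Keep the failure reasons so experts can review them.
            rejected.append((rec, failed))
        else:
            accepted.append(rec)
    return accepted, rejected

# Made-up example rows, for illustration only.
records = [
    {"patient_id": 1, "code": "glucose", "value": 95.0, "unit": "mg/dL"},
    {"patient_id": 2, "code": "glucose", "value": 5.3, "unit": None},
    {"patient_id": 3, "code": "glucose", "value": -1.0, "unit": "mg/dL"},
]

counting_ok, _ = transform(records, "cohort_counting")
analysis_ok, analysis_rejected = transform(records, "outcome_analysis")
```

Keeping the rules in a table keyed by use case means a rule can be added or relaxed after feedback from a domain expert without touching the transform logic, which is one way to read the "agile ETL" recommendation above.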

