Multi-view representation learning for tabular data integration using inter-feature relationships
Summary
This summary is machine-generated. Harmonizing data sources that lack metadata is crucial for building robust, generalizable algorithms. Inter-feature relationships can be used to map features across datasets, and contrastive learning showed the best performance in feature matching and reconstruction.
Area Of Science
- Data Science
- Machine Learning
- Bioinformatics
Background
- Harmonizing diverse data sources is a key challenge in data science, particularly in healthcare.
- Integrating data from multiple origins with unmapped features is essential for developing generalizable algorithms.
- Existing methods often rely on metadata that is ambiguous or unavailable, so approaches that work without it are needed.
Purpose Of The Study
- To design and evaluate methods for mapping structured tabular datasets from electronic health records (EHRs) independent of metadata.
- To identify effective strategies for feature mapping when only a small set of features is initially known.
Main Methods
- Comparison of contrastive learning, partial auto-encoders, mutual-information graph optimizers, and statistical baselines.
- Evaluation on simulated data, public datasets, MIMIC-III, and perioperative records.
- Performance assessment based on feature mapping accuracy and data reconstruction.
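To make the idea of matching by inter-feature relationships concrete, the following is a minimal sketch of a simple correlation-based baseline. It is not the study's implementation: the function names (`make_dataset`, `correlation_fingerprints`, `match_features`), the synthetic data, and the greedy assignment step are all assumptions chosen for illustration. Each unmapped feature is described by its correlations with the small set of already-mapped features, and features from the two datasets are paired when those correlation "fingerprints" are closest.

```python
import numpy as np

def make_dataset(rng, n_samples, loadings):
    """Synthetic data (illustrative assumption): 3 latent factors drive all columns."""
    z = rng.normal(size=(n_samples, loadings.shape[1]))
    known = z + 0.1 * rng.normal(size=z.shape)            # features with a given mapping
    unknown = z @ loadings.T + 0.3 * rng.normal(size=(n_samples, loadings.shape[0]))
    return known, unknown

def correlation_fingerprints(known, unknown):
    """Return an (m, k) array: correlation of each unmapped column with each known column."""
    k = known.shape[1]
    # rowvar=False tells NumPy that columns, not rows, are the variables.
    corr = np.corrcoef(np.hstack([known, unknown]), rowvar=False)
    return corr[k:, :k]

def match_features(fp_a, fp_b):
    """Greedy one-to-one matching by smallest Euclidean distance between fingerprints."""
    dist = np.linalg.norm(fp_a[:, None, :] - fp_b[None, :, :], axis=2)
    mapping = {}
    for _ in range(min(dist.shape)):
        i, j = np.unravel_index(np.argmin(dist), dist.shape)
        mapping[int(i)] = int(j)
        dist[i, :] = np.inf   # feature i of dataset A is now taken
        dist[:, j] = np.inf   # feature j of dataset B is now taken
    return mapping

rng = np.random.default_rng(0)
# Hypothetical loadings: how each unmapped feature depends on the latent factors.
loadings = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0],
                     [0.7, -0.7, 0.0]])

known_a, unknown_a = make_dataset(rng, 2000, loadings)
known_b, unknown_b = make_dataset(rng, 2000, loadings)   # different samples, same process

perm = [2, 0, 3, 1]                  # dataset B stores the unmapped columns in another order
unknown_b = unknown_b[:, perm]

mapping = match_features(correlation_fingerprints(known_a, unknown_a),
                         correlation_fingerprints(known_b, unknown_b))
print(mapping)                       # column index in A -> column index in B
```

The greedy assignment keeps the sketch dependency-free; an optimal assignment solver could be substituted for it. The learned approaches compared in the study (contrastive learning, partial auto-encoders) replace the fixed correlation fingerprint with a learned representation.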
Main Results
- Contrastive learning methods demonstrated superior performance in feature matching and reconstruction, especially on real-world data.
- Partial auto-encoders performed comparably to contrastive methods in many scenarios.
- A novel statistical method showed reasonable performance with less hyperparameter tuning.
Conclusions
- Inter-feature relationships are effective for identifying matching features across tabular datasets lacking metadata.
- Decoder architectures can effectively impute features when exact matches are not found.

