Deep Tobit model: an integrated framework for high-dimensional censored regression with variable selection
Summary
This summary is machine-generated. This study introduces the Deep Tobit model for analyzing high-dimensional, left-censored data. The framework improves both variable selection and prediction accuracy, and it outperformed existing methods in the reported comparisons.
Area of Science
- Statistics
- Machine Learning
- Data Science
Background
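- High-dimensional data with left-censored responses are difficult to analyze. A left-censored response is known only to lie at or below a threshold, for example an HIV viral load below an assay's detection limit.
- Existing methods have limitations. Classical Tobit models handle censoring but assume a linear relationship, whereas deep learning captures nonlinearity but offers limited variable selection and interpretability.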
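For reference, the classical (linear) Tobit model with left-censoring at a known threshold $c$ can be written as below. This is the standard textbook form, shown as an assumption about the starting point; the summary does not give the paper's exact specification. Here $\Phi$ and $\phi$ are the standard normal CDF and density, censored observations are recorded as $y_i = c$, and a deep variant would presumably replace the linear index $\mathbf{x}_i^\top\boldsymbol\beta$ with a network output $f_\theta(\mathbf{x}_i)$ and minimize $-\ell$.

$$
y_i^* = \mathbf{x}_i^\top\boldsymbol\beta + \varepsilon_i,\qquad \varepsilon_i \sim \mathcal{N}(0,\sigma^2),\qquad y_i = \max\left(c,\ y_i^*\right)
$$

$$
\ell(\boldsymbol\beta,\sigma) = \sum_{i:\,y_i = c} \log \Phi\!\left(\frac{c - \mathbf{x}_i^\top\boldsymbol\beta}{\sigma}\right) + \sum_{i:\,y_i > c} \left[\log \phi\!\left(\frac{y_i - \mathbf{x}_i^\top\boldsymbol\beta}{\sigma}\right) - \log\sigma\right]
$$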
Purpose of the Study
- To propose an integrated deep learning framework, the Deep Tobit model, for analyzing high-dimensional left-censored data.
- To develop a robust two-stage feature selection algorithm with theoretical guarantees.
- To improve both variable selection and prediction accuracy in censored data analysis.
Main Methods
- Developed the Deep Tobit model using the negative Tobit log-likelihood as its loss function to address data censoring.
- Implemented a two-stage feature selection algorithm with proven convergence rate and selection consistency.
- Validated the model through extensive simulation studies and real-world applications.
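The negative Tobit log-likelihood used as the loss can be sketched in plain Python as follows. This is a minimal sketch, not the authors' implementation. It assumes Gaussian latent errors with a single noise scale `sigma`, a known left-censoring threshold `c`, and censored responses recorded as `c`. The function names (`tobit_nll`, `_log_norm_cdf`, `_log_norm_pdf`) and the example data are hypothetical. `mu` stands in for the network's predictions $f(\mathbf{x}_i)$ of the latent mean.

```python
import math
from typing import Sequence


def _log_norm_cdf(z: float) -> float:
    """log Phi(z) for the standard normal, via erfc.

    The value is floored at 1e-300 so that very negative z (where erfc
    underflows to 0) gives a large finite loss instead of a math error.
    """
    return math.log(max(0.5 * math.erfc(-z / math.sqrt(2.0)), 1e-300))


def _log_norm_pdf(z: float) -> float:
    """log phi(z) for the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def tobit_nll(y: Sequence[float], mu: Sequence[float],
              sigma: float, c: float = 0.0) -> float:
    """Mean negative Tobit log-likelihood for left-censoring at c.

    y     : observed responses; censored values are recorded as c
    mu    : predicted latent means f(x_i), e.g. neural-network outputs
    sigma : noise standard deviation (must be positive)
    c     : known censoring threshold (assumed common to all observations)
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if len(y) != len(mu):
        raise ValueError("y and mu must have the same length")
    total = 0.0
    for yi, mi in zip(y, mu):
        if yi <= c:
            # Censored: the likelihood is P(y* <= c) = Phi((c - mu) / sigma).
            total += _log_norm_cdf((c - mi) / sigma)
        else:
            # Uncensored: the likelihood is the normal density of y at mu.
            total += _log_norm_pdf((yi - mi) / sigma) - math.log(sigma)
    return -total / len(y)


if __name__ == "__main__":
    # Hypothetical data: the first two responses sit at the limit c = 0.
    y = [0.0, 0.0, 1.3, 2.1]
    mu = [-0.4, 0.2, 1.1, 2.5]  # hypothetical model predictions
    print(tobit_nll(y, mu, sigma=1.0, c=0.0))
```

In a deep learning framework the same two branches would be written with tensor operations, so gradients reach the network parameters and, if desired, a learned log-scale for sigma. The explicit loop here only makes the censored and uncensored contributions easy to see.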
Main Results
- The Deep Tobit model demonstrated superior performance compared to state-of-the-art baselines.
- The framework achieved high accuracy in both variable selection and prediction.
- The model was successfully applied to aero-engine vibration and HIV viral load datasets.
Conclusions
- The Deep Tobit model offers a powerful and effective solution for analyzing complex censored data.
- The integrated approach balances prediction performance with essential variable selection and interpretability.
- This framework advances the analysis of high-dimensional left-censored data in various scientific fields.