Bayesian inference for Cox regression models using catalytic prior distributions

  • Department of Statistics & Data Science, National University of Singapore, 117546, Singapore.

Summary

This summary is machine-generated.

We introduce the Cox catalytic prior for Bayesian inference in Cox models, which improves estimation stability when the sample size is small relative to the model dimension. The method offers a robust alternative to standard maximum partial likelihood inference for survival data analysis.

Area Of Science

  • Statistics
  • Biostatistics
  • Survival Analysis

Background

  • The Cox proportional hazards model (Cox model) is widely used for survival data.
  • Standard maximum partial likelihood inference in Cox models is often problematic when the sample size is small relative to the model dimension.
  • Existing methods may not sufficiently stabilize complex parametric models in high-dimensional settings.
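
A minimal illustration of how maximum partial likelihood can break down in small samples: the sketch below (Python with NumPy; the function name and toy data are hypothetical, not taken from the paper) evaluates the Breslow form of the Cox log partial likelihood. In the toy data both observed events occur in the subjects with the smallest covariate values, so the log partial likelihood keeps increasing as beta decreases and no finite maximizer exists (a "monotone likelihood"), one well-known failure mode of standard inference with few observations.

```python
import numpy as np

def cox_log_partial_likelihood(beta, X, time, event):
    """Breslow log partial likelihood:
    sum over events i of  x_i'beta - log sum_{j in R(t_i)} exp(x_j'beta)."""
    eta = X @ np.asarray(beta, dtype=float)    # linear predictors x_i'beta
    ll = 0.0
    for i in np.flatnonzero(event):            # only observed events contribute terms
        at_risk = eta[time >= time[i]]         # linear predictors in the risk set R(t_i)
        m = at_risk.max()                      # shift for a numerically stable log-sum-exp
        ll += eta[i] - (m + np.log(np.sum(np.exp(at_risk - m))))
    return ll

# Hypothetical toy data: 3 subjects, 1 covariate, the last subject censored.
X = np.array([[0.0], [1.0], [2.0]])
time = np.array([1.0, 2.0, 3.0])
event = np.array([1, 1, 0])

# The log partial likelihood rises toward 0 as beta -> -infinity,
# so the maximum partial likelihood estimate does not exist.
for b in [0.0, -2.0, -5.0, -10.0, -20.0]:
    print(f"beta = {b:6.1f}   log PL = {cox_log_partial_likelihood([b], X, time, event):.9f}")
```

At beta = 0 every risk-set weight is equal, so the value is -log 3 - log 2 = -log 6, which is a quick sanity check on the implementation.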

Purpose Of The Study

  • To propose a novel Bayesian approach, the Cox catalytic prior, for enhancing Cox model inference.
  • To address the limitations of standard maximum partial likelihood inference in small-sample, high-dimensional settings.
  • To provide a stable and consistent estimation method for Cox models.

Main Methods

  • Formulation of the Cox catalytic prior as a weighted likelihood of the regression coefficients, based on synthetic data and a surrogate baseline hazard.
  • Generation of synthetic data from the predictive distribution of a simpler fitted model.
  • Derivation of an approximation to the marginal posterior mode that can be computed as a regularized log partial likelihood estimator.
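
The three steps above can be sketched in code, but only under assumptions that are ours and not the paper's: synthetic covariates are drawn by resampling each observed covariate column independently; the "simpler model" is an intercept-only exponential model whose fitted constant hazard also serves as the surrogate baseline hazard h0; synthetic event times are uncensored; and the prior weight tau and synthetic sample size M are fixed by hand. The log prior is the (tau/M)-weighted log-likelihood of the synthetic data under a Cox model with constant baseline hazard h0 (an exponential regression), and the estimate maximizes the log partial likelihood plus this log prior, i.e. a regularized log partial likelihood. The paper's approximate marginal posterior mode may differ in detail, and every function name here (make_synthetic, catalytic_log_prior, catalytic_cox_estimate, and so on) is hypothetical.

```python
import numpy as np

def cox_objective(beta, X, time, event):
    """Breslow log partial likelihood with its gradient and Hessian."""
    eta = X @ beta
    p = beta.size
    ll, grad, hess = 0.0, np.zeros(p), np.zeros((p, p))
    for i in np.flatnonzero(event):
        mask = time >= time[i]                       # risk set R(t_i)
        e = eta[mask]
        m = e.max()
        ll += eta[i] - (m + np.log(np.sum(np.exp(e - m))))
        w = np.exp(e - m)
        w /= w.sum()                                 # normalized weights exp(x_j'beta)
        Xr = X[mask]
        mean = w @ Xr                                # weighted covariate mean
        grad += X[i] - mean
        hess -= (Xr * w[:, None]).T @ Xr - np.outer(mean, mean)  # minus weighted covariance
    return ll, grad, hess

def make_synthetic(X, time, event, M, rng):
    """ASSUMED generator: resample each column independently; draw uncensored times
    from an intercept-only exponential fit (rate = events / total follow-up)."""
    X_syn = np.column_stack([rng.choice(X[:, k], size=M) for k in range(X.shape[1])])
    rate = event.sum() / time.sum()                  # MLE of a constant hazard
    y_syn = rng.exponential(1.0 / rate, size=M)      # numpy uses scale = 1 / rate
    return X_syn, y_syn, rate

def catalytic_log_prior(beta, X_syn, y_syn, h0, tau):
    """(tau/M)-weighted exponential-regression log-likelihood of the synthetic data."""
    M = y_syn.size
    eta = X_syn @ beta
    mu = h0 * y_syn * np.exp(eta)                    # cumulative hazard at y*_j
    lp = (tau / M) * np.sum(np.log(h0) + eta - mu)
    grad = (tau / M) * X_syn.T @ (1.0 - mu)
    hess = -(tau / M) * (X_syn * mu[:, None]).T @ X_syn
    return lp, grad, hess

def fit_newton(objective, p, max_iter=100, tol=1e-10):
    """Maximize a concave objective by Newton's method with step halving."""
    beta = np.zeros(p)
    val, grad, hess = objective(beta)
    for _ in range(max_iter):
        step = np.linalg.solve(hess, -grad)          # ascent direction -H^{-1} g
        t = 1.0
        while True:
            cand = beta + t * step
            new_val, new_grad, new_hess = objective(cand)
            if new_val >= val or t < 1e-8:
                break
            t /= 2.0
        beta, val, grad, hess = cand, new_val, new_grad, new_hess
        if np.max(np.abs(grad)) < tol:
            break
    return beta

def catalytic_cox_estimate(X, time, event, tau=1.0, M=400, seed=0):
    """Maximize log partial likelihood + catalytic log prior (a sketch)."""
    rng = np.random.default_rng(seed)
    X_syn, y_syn, h0 = make_synthetic(X, time, event, M, rng)
    def objective(beta):
        ll, g1, H1 = cox_objective(beta, X, time, event)
        lp, g2, H2 = catalytic_log_prior(beta, X_syn, y_syn, h0, tau)
        return ll + lp, g1 + g2, H1 + H2
    return fit_newton(objective, X.shape[1])

# Hypothetical toy data on which the unpenalized partial likelihood is monotone.
X = np.array([[0.0], [1.0], [2.0]])
time = np.array([1.0, 2.0, 3.0])
event = np.array([1, 1, 0])
beta_hat = catalytic_cox_estimate(X, time, event, tau=1.0)
print(beta_hat)
```

The design choice being illustrated: the synthetic-data term is strictly concave and tends to minus infinity in every direction whenever the synthetic covariate matrix has full column rank, so the penalized objective has a finite maximizer even on data where the maximum partial likelihood estimate diverges, and tau controls how strongly the estimate is pulled toward the simpler model.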

Main Results

  • The proposed Cox catalytic prior is proven to be proper under mild conditions.
  • The resulting estimator demonstrates consistency.
  • Simulation studies show that the method outperforms standard maximum partial likelihood inference and performs comparably to existing shrinkage methods.

Conclusions

  • The Cox catalytic prior offers a robust and effective Bayesian approach for Cox model inference, particularly when the sample size is small relative to the model dimension.
  • The method provides a stable and consistent estimator that outperforms standard maximum partial likelihood inference in simulation studies.
  • The approach is applicable to real-world survival data analysis.
