Predictive modeling of tax compliance risks: A comparative study of machine learning approaches

  • Southeast Bangkok University, Bangkok, Thailand.

Summary

This summary is machine-generated.

Machine learning models can assess tax risk accurately in the manufacturing and service sectors. Random Forest performed best, reaching 92.00% accuracy for manufacturing and 93.39% for services. These results support smarter auditing and more targeted risk management.

Area Of Science

  • Financial Risk Management
  • Applied Machine Learning
  • Taxation and Auditing

Background

  • Enterprises face complex financial data and interconnected risks.
  • Machine learning presents opportunities for enhanced tax risk assessment and auditing.

Purpose Of The Study

  • To evaluate the efficacy of three machine learning models (SVM, XGBoost, Random Forest) for tax risk assessment.
  • To identify key risk indicators in manufacturing and service sectors.

Main Methods

  • Analysis of 3,232 tax records from regional manufacturing and service sectors (2021-2023).
  • Comparative evaluation of Support Vector Machine (SVM), XGBoost, and Random Forest predictive models.
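
The comparative evaluation described above can be pictured as a small harness that fits each candidate model separately per sector and reports held-out accuracy. The following is only a minimal sketch under stated assumptions: the summary does not describe the study's actual pipeline, features, or train/test split, so the two classifiers here are toy stand-ins (not SVM, XGBoost, or Random Forest), and the function name `compare_by_sector`, the single feature, and all records are hypothetical.

```python
from collections import Counter, defaultdict


class MajorityClassifier:
    """Toy stand-in: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class ThresholdClassifier:
    """Toy stand-in: predicts 1 (risky) when feature 0 exceeds the training mean."""

    def fit(self, X, y):
        self.cut_ = sum(row[0] for row in X) / len(X)
        return self

    def predict(self, X):
        return [1 if row[0] > self.cut_ else 0 for row in X]


def compare_by_sector(models, train, test):
    """Fit every model per sector; return {sector: {model_name: accuracy}}.

    `train` and `test` are lists of (sector, features, label) tuples.
    `models` maps a name to a zero-argument factory returning an object
    with fit(X, y) and predict(X) methods.
    """
    def group(rows):
        grouped = defaultdict(lambda: ([], []))
        for sector, x, y in rows:
            grouped[sector][0].append(x)
            grouped[sector][1].append(y)
        return grouped

    train_groups, test_groups = group(train), group(test)
    results = {}
    for sector, (X_te, y_te) in test_groups.items():
        X_tr, y_tr = train_groups[sector]
        results[sector] = {}
        for name, make_model in models.items():
            preds = make_model().fit(X_tr, y_tr).predict(X_te)
            hits = sum(p == t for p, t in zip(preds, y_te))
            results[sector][name] = hits / len(y_te)
    return results


# Hypothetical records: one made-up feature per record, label 1 = risky.
train = [
    ("manufacturing", [0.10], 0), ("manufacturing", [0.20], 0),
    ("manufacturing", [0.15], 0), ("manufacturing", [0.80], 1),
    ("service", [0.30], 0), ("service", [0.90], 1), ("service", [0.85], 1),
]
test = [
    ("manufacturing", [0.70], 1), ("manufacturing", [0.05], 0),
    ("service", [0.95], 1), ("service", [0.20], 0),
]
models = {"majority": MajorityClassifier, "threshold": ThresholdClassifier}
results = compare_by_sector(models, train, test)
```

Because the harness only relies on `fit` and `predict`, estimators that follow the same convention (for example scikit-learn's `SVC` and `RandomForestClassifier`, or `XGBClassifier` from the xgboost package) could presumably be passed in place of the toy classes; whether that matches the authors' setup is an assumption.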

Main Results

  • Random Forest achieved superior accuracy: 92.00% (manufacturing) and 93.39% (service).
  • Key manufacturing risk indicators: tax burden rate, profit fluctuation, audit frequency.
  • Key service sector risk indicator: profit volatility.

Conclusions

  • Machine learning, especially Random Forest, is effective for tax risk analysis.
  • Findings provide regulators with intelligent tools for risk management and smart auditing.
