Ensemble Machine Learning Model for Early Bankruptcy Prediction Using Financial Ratios and Market Indicators

doi:https://doi.org/10.61336/jmsr/25-07-02

Research Article | Volume 2 Issue 7 (September, 2025) | Pages 16 - 24

Santhi Venkatakrishnan

Devipriya Vasudevan

S. Amrutha

Kalaivani E⁴

Professor, Department of Management Studies, K.S.R. College of Engineering, Tiruchengode, Tamil Nadu, India. - 637215

Assistant Professor, Department of Management Studies, K.S.R. College of Engineering, Tiruchengode, Tamil Nadu, India. – 637215

Lecturer, International Business, Science And Technology Universities university, Kampala, Uganda

⁴Assistant Professor in Business Administration, Department of Management Studies, K. S. Rangasamy College of Technology, Tiruchengode -637 21

Under a Creative Commons license

Open Access

Received: July 28, 2025

Revised: Aug. 16, 2025

Accepted: Aug. 26, 2025

Published: Sept. 1, 2025

Abstract

The rising frequency of business insolvencies underscores the critical need for advanced predictive tools that offer real-time risk assessment. Traditional bankruptcy prediction models, which primarily depend on retrospective financial statement data, often fail to provide timely insights due to reporting delays and data incompleteness. This study proposes a machine learning-based framework for proactive bankruptcy prediction, leveraging ensemble learning techniques for greater accuracy and reliability. The analysis draws on a robust dataset comprising 2,800 firm-year observations from companies listed on the Bombay Stock Exchange (BSE) over the period 2020 to 2025. Quarterly financial indicators and market-based variables were systematically compiled to ensure comprehensive coverage of firm performance. To enhance model performance and mitigate overfitting, three advanced ensemble algorithms—XGBoost, AdaBoost, and Random Forest—were implemented. These models combine bagging and boosting mechanisms to optimize predictive capabilities. Recursive Feature Elimination (RFE) was used to identify the most significant predictors influencing financial distress. The study empirically tested four hypotheses: liquidity constraints as indicators of distress; operational cash flow deficits correlating with default risk; high debt-to-equity ratios signaling instability; and declining return on assets (ROA) as an early warning sign. The results demonstrate that the ensemble-based approach delivers high classification accuracy, with key variables conforming to recognized financial risk indicators. The findings highlight the potential of machine learning models to serve as early detection tools, aiding firms and investors in strategic decision-making. This research emphasizes the transformative role of artificial intelligence in improving financial risk management systems, enabling more dynamic and informed responses to emerging threats of bankruptcy.

Keywords

Bankruptcy Prediction

Ensemble Learning

XGBoost

Financial Ratios

Risk Assessment

Marketing.

INTRODUCTION

Early bankruptcy prediction based on financial ratios has emerged as a crucial instrument in financial risk management, allowing stakeholders to identify signs of financial distress long before a business fails. Financial ratios such as liquidity ratios are derived from balance sheets, income statements, and cash flow statements, and they provide numerical information about a business's financial and operational performance. These ratios can be examined over time to identify declining patterns that may point to an increasing bankruptcy risk. Traditional models such as Altman's Z-score have long combined these ratios into a single predictive measure, whereas more recent methods use machine learning algorithms to improve accuracy by capturing intricate non-linear relationships among variables. Early bankruptcy prediction helps investors avoid possible losses, helps managers take corrective action, aids lenders in determining credit risk, and helps regulators protect the financial system. This predictive ability is therefore essential to maintaining financial stability, encouraging transparency, and guiding well-informed corporate decision-making.

Figure 1 Bankruptcy prediction using financial ratios

LITERATURE REVIEW

Machine learning techniques have been thoroughly investigated in the field of bankruptcy prediction (Figure 1). Their use has demonstrated a notable increase in classification accuracy, particularly when contrasted with conventional statistical models. Numerous algorithms, including ensemble methods, decision trees, and support vector machines, have been used in various industries to improve prediction reliability [1]. Additionally, financial ratios and artificial intelligence models were successfully combined to provide more nuanced risk evaluations. By combining quantitative metrics with AI-driven techniques, businesses were able to identify financial distress more accurately and before actual insolvency [2].

Machine learning models were subsequently applied specifically to American companies, where they proved highly effective in identifying early warning indicators of bankruptcy, thereby promoting economic stability. These models took into account complex financial patterns and trends that traditional models frequently missed [3]. Moreover, a targeted investigation of the U.S. healthcare industry emphasized that integrating machine learning techniques with domain-specific financial ratios enabled a deeper understanding of risk, especially in high-liability sectors [4]. A study predicting bankruptcy among Polish non-public companies likewise found that combining machine learning techniques with historical data enhanced the ability to predict insolvency. Compared to conventional credit scoring techniques, these models showed higher accuracy and were able to adjust to sector-specific factors [5].

In addition, a study on micro and small businesses in the Lithuanian construction industry included macroeconomic and non-financial factors alongside financial indicators, highlighting the multifaceted character of bankruptcy risk [6]. Similarly, a study carried out in Spain compared machine learning models with conventional analytical methods and found that, although financial ratios continued to be important, AI-based approaches provided greater accuracy and flexibility in predicting insolvency [7]. Furthermore, integrating macroeconomic trends and corporate governance indicators with financial ratios in Indonesia added a more thorough perspective on predicting financial distress and improved the model’s robustness [8]. Meanwhile, a meta-analysis of bankruptcy prediction models published between 2010 and 2022 revealed a trend toward the use of sophisticated computational models. The review identified methodological flaws, important contributors, and new trends that have shaped the present course of research in this field [9]. Another systematic review supported this view, highlighting the predictive power of numerical indicators such as liquidity, profitability, and solvency ratios and confirming their usefulness in both conventional and AI-driven models [10].

The development of bankruptcy predictors that combine macroeconomic, non-financial, and financial factors was also emphasized. This all-encompassing framework enabled a more accurate understanding of business failures, particularly during uncertain economic times [11]. Furthermore, by incorporating operational and behavioral data, a novel model proposed for Indian companies went beyond financial ratios and emphasized the significance of contextual factors in bankruptcy prediction [12]. Additionally, a predictive model created for Indonesian businesses during economic downturns used both external and internal indicators to predict financial distress more accurately. The model was adapted to account for the region’s distinct financial environment [13]. Financial ratios such as the debt-to-equity ratio, interest coverage, and current ratio were also successfully used in Sri Lanka to forecast distress in listed companies, demonstrating their applicability in emerging markets [14]. On a different note, text mining techniques were used to extract linguistic cues from corporate annual reports, and these communicative elements were then used to train machine learning models and improve bankruptcy predictions [15].

Furthermore, comparative cognitive modeling with a variety of AI algorithms showed that no single model was always better, but ensemble methods frequently yielded the best results on a range of datasets [16]. In Kenya, profitability ratios were found to be important predictors of dairy cooperative bankruptcy in a model that also took local operational constraints into account [17]. Similarly, a 2020–2023 study of an Indonesian tech conglomerate using the DuPont system and the Altman Z-score demonstrated successful early detection of possible financial failures using composite financial performance metrics [18]. The post-pandemic era then presented additional difficulties for predicting bankruptcy, especially in the Visegrad countries, where patterns of economic recovery affected the predictive ability of conventional indicators. As a result, improved models that took regional dynamics into consideration were required [19]. The necessity for transparent and interpretable AI systems was also highlighted by a systematic review of AI's role in preventing and identifying bankruptcy in financial institutions, which found both opportunities and ethical challenges [20]. Finally, a study that questioned the dominance of AI in bankruptcy prediction concluded that, although AI models frequently performed better than conventional ones, their efficacy depended largely on contextual relevance, model transparency, and data quality. The results supported combining AI and traditional financial analytics in a balanced manner [21].

METHODOLOGY

This section explains the methodology used to assess the predictive ability of ensemble machine learning techniques in real-time corporate bankruptcy detection. It covers data collection, data measurement, preprocessing, tool usage, the structured research methodology, the underlying proposed technique, and hypothesis validation. The methodology is empirical in nature and is based on secondary data collected from publicly traded companies on the Bombay Stock Exchange (BSE) between 2020 and 2025. This empirical approach supports robust modeling and statistical inference because the financial and market-based indicators come from reported, verifiable sources. To create a real-time bankruptcy warning system, the techniques used seek to combine algorithmic optimization, statistical rigor, and predictive analytics.

Data Collection

The study’s dataset consists of 2,800 company-year observations covering 2020 to 2025, a six-year span. The data were drawn from official BSE records, company annual reports, quarterly financial statements, and financial databases such as Capitaline and CMIE Prowess. Businesses from a variety of industries, including manufacturing, services, infrastructure, technology, and finance, are represented in the sample. A crucial aspect of the data collection process was the demographic profiling of companies according to listing tenure, market capitalization, firm size (as determined by total assets), and sectoral classification. Table 1 below provides an overview of the dataset’s demographic composition.

Table 1 Data collection

Sector	No. of Firms	Avg. Total Assets (INR Crores)	Avg. Market Cap (INR Crores)	Listing Tenure (Years)
Manufacturing	600	3,500	2,800	15
Services	500	2,700	3,200	12
Infrastructure	400	4,100	2,100	10
Technology	700	2,000	6,000	8
Financial Services	600	5,200	4,800	20

This table presents a stratified profile of the sample used, ensuring sectoral representation and heterogeneity in firm characteristics to improve generalizability of the findings.

Data Measurement

A set of quarterly market indicators and standardized financial ratios was used to evaluate each company's financial health. The following key metrics were measured: (i) the debt-to-equity ratio, to show levels of leverage; (ii) operating cash flow, to capture internal cash generation capabilities; (iii) return on assets (ROA), to show operational efficiency; and (iv) the current ratio, to evaluate short-term liquidity. These metrics, which adhere to SEBI's required reporting format, were calculated from publicly available quarterly financial statements. To capture current investor sentiment and market perception, market indicators such as share price volatility, beta values, and trading volume trends were also included. Each variable was normalized using the Z-score transformation to remove scale discrepancies and prepare the data for further processing.
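The ratio definitions and the Z-score normalization described above can be illustrated with a minimal sketch. The firm figures below are hypothetical and are not taken from the study's dataset; the field names and helper functions are assumptions made for illustration only.

```python
from statistics import mean, pstdev

# Hypothetical quarterly figures in INR crores (illustrative, not from the study).
firms = [
    {"name": "A", "current_assets": 500.0, "current_liabilities": 250.0,
     "total_debt": 300.0, "equity": 200.0, "net_income": 40.0, "total_assets": 800.0},
    {"name": "B", "current_assets": 300.0, "current_liabilities": 400.0,
     "total_debt": 900.0, "equity": 300.0, "net_income": -20.0, "total_assets": 1000.0},
    {"name": "C", "current_assets": 450.0, "current_liabilities": 300.0,
     "total_debt": 150.0, "equity": 300.0, "net_income": 60.0, "total_assets": 600.0},
]

def ratios(firm):
    """Compute the current ratio, debt-to-equity ratio, and ROA (%) for one firm."""
    return {
        "current_ratio": firm["current_assets"] / firm["current_liabilities"],
        "debt_to_equity": firm["total_debt"] / firm["equity"],
        "roa_pct": 100.0 * firm["net_income"] / firm["total_assets"],
    }

def zscore(values):
    """Z-score transform: subtract the mean and divide by the population std dev."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

table = [ratios(f) for f in firms]
de_z = zscore([row["debt_to_equity"] for row in table])

for firm, row, z in zip(firms, table, de_z):
    print(firm["name"], row, "D/E z-score:", round(z, 3))
```

After the transform each variable has zero mean and unit variance, so ratios measured on very different scales (for example ROA in percent and market capitalization in crores) contribute comparably to the models.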

Data Preprocessing

Extensive preprocessing steps were carried out before model creation to guarantee data consistency, integrity, and applicability for machine learning applications. In order to maintain data structure and variability, missing values—which made up around 3.5% of the data—were first imputed using multiple imputation by chained equations (MICE). Outliers were identified using the Interquartile Range (IQR) method and winsorized at the 1st and 99th percentiles to prevent undue influence on model estimates. Multicollinearity among independent variables was assessed through the Variance Inflation Factor (VIF), and variables with VIF > 5 were excluded. Feature selection was performed using Recursive Feature Elimination (RFE) with cross-validation to retain the most predictive features. Categorical variables were encoded using one-hot encoding where necessary, and all inputs were standardized to zero mean and unit variance to support convergence of ensemble algorithms.
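Two of these steps, winsorization at the 1st and 99th percentiles and the VIF > 5 screen, can be sketched with NumPy alone. This is an illustrative sketch on synthetic data, not the study's code: the helper names `winsorize` and `vif` are assumptions, and the VIF is computed directly from its definition, 1 / (1 - R²), by regressing each column on the others.

```python
import numpy as np

def winsorize(col, lower=1.0, upper=99.0):
    """Clip a 1-D array to its own lower/upper percentiles."""
    lo, hi = np.percentile(col, [lower, upper])
    return np.clip(col, lo, hi)

def vif(X):
    """Variance Inflation Factor of each column: 1 / (1 - R^2) from an
    ordinary least-squares regression of that column on all other columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic features: column c is almost a copy of column a, column b is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)
X = np.column_stack([a, b, c])

X_w = np.apply_along_axis(winsorize, 0, X)   # winsorize column by column
vifs = vif(X_w)
flagged = vifs > 5                           # candidates for exclusion

print("VIF per column:", np.round(vifs, 2))
print("Flagged (VIF > 5):", flagged)
```

In practice a collinear pair is usually resolved by dropping one variable and recomputing the VIFs, rather than removing every flagged column at once.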

Data Tool

The analysis and modeling were carried out using Python programming language, primarily utilizing the Scikit-learn library for machine learning and data processing operations. Additional libraries used include Pandas for data manipulation, NumPy for numerical operations, Matplotlib and Seaborn for visualization, and XGBoost and LightGBM for advanced ensemble techniques. The machine learning pipeline was developed using Scikit-learn’s Pipeline module, enabling integrated preprocessing, model fitting, and evaluation. To ensure computational efficiency and repeatability, the final models were deployed and evaluated on an Intel Core i9 computer with 32 GB of RAM using JupyterLab.

Proposed methodology

To create a trustworthy early-warning system for corporate bankruptcy detection using ensemble machine learning, the research methodology followed a structured multi-step process (Figure 2). The dataset was first prepared through data wrangling, cleaning, and feature transformation. Feature selection was then performed using Recursive Feature Elimination with 10-fold cross-validation to ensure that only predictive and statistically significant variables were kept. Next, several ensemble classifiers, including XGBoost, AdaBoost, and Random Forest, were trained on the selected features using a training set consisting of 80% of the data. The remaining 20% of the data was used for testing. To enhance generalization, grid search with stratified K-fold cross-validation (K = 10) was used to optimize hyperparameters.

Figure 2 Proposed model

Ensemble learning was executed in two phases: bagging using Random Forest to reduce variance, and boosting using XGBoost and AdaBoost to reduce bias and sequentially improve prediction accuracy. The final step involved model comparison using performance metrics including Area Under the Receiver Operating Characteristic Curve (AUC-ROC), Precision-Recall, F1-score, and Matthews Correlation Coefficient (MCC). Feature importance was extracted and aligned with financial theory to validate interpretability.
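The workflow described in this subsection (scaling, RFE with cross-validation, an 80/20 stratified split, grid search, and evaluation by AUC-ROC and MCC) can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' pipeline: the data are synthetic, only Random Forest is shown, the hyperparameter grid is hypothetical, and 3 folds are used instead of the reported K = 10 to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.metrics import matthews_corrcoef, roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the firm-level feature matrix (assumption).
X, y = make_classification(n_samples=400, n_features=8, n_informative=4,
                           n_redundant=2, weights=[0.8, 0.2], random_state=0)

# 80/20 stratified train/test split, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# The study reports K = 10; 3 folds are used here only for speed.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    # Recursive Feature Elimination with cross-validation.
    ("select", RFECV(RandomForestClassifier(n_estimators=20, random_state=0),
                     cv=cv, scoring="roc_auc")),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
])

# Hypothetical grid; the paper does not list the values that were searched.
grid = GridSearchCV(pipe, {"clf__max_depth": [3, None]}, cv=cv, scoring="roc_auc")
grid.fit(X_train, y_train)

n_selected = grid.best_estimator_.named_steps["select"].n_features_
auc = roc_auc_score(y_test, grid.predict_proba(X_test)[:, 1])
mcc = matthews_corrcoef(y_test, grid.predict(X_test))

print("Features kept by RFECV:", n_selected)
print("Best params:", grid.best_params_)
print("Test AUC-ROC:", round(auc, 3), "MCC:", round(mcc, 3))
```

Placing the scaler and the feature selector inside the pipeline means both are refit on each training fold, so the cross-validated scores are not contaminated by information from the held-out fold.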

Proposed Technique

The proposed hybrid ensemble technique integrates both bagging and boosting approaches to leverage the advantages of each while minimizing their weaknesses. Let the input feature space be denoted as $X = \{x_1, x_2, \dots, x_n\}$ and the binary target variable as $Y \in \{0, 1\}$, representing bankrupt or not bankrupt. The formulation is expressed in equations (1) to (7).

Let $f_i(X)$ be the prediction from the $i$-th base learner. Each $f_i$ is trained on a bootstrap sample:

$$f_i(X) = f(X; \theta_i), \quad i = 1, \dots, M \tag{1}$$

where $\theta_i$ denotes the parameters specific to the $i$-th model trained on resampled data.

In bagging, the final prediction $F_B(X)$ is the average of all $M$ base predictions:

$$F_B(X) = \frac{1}{M} \sum_{i=1}^{M} f_i(X) \tag{2}$$

This reduces variance and stabilizes prediction accuracy across different subsets of data.

Boosting iteratively improves weak classifiers using weighted errors:

$$F_t(X) = F_{t-1}(X) + \alpha_t h_t(X) \tag{3}$$

where $\alpha_t$ is the learning rate and $h_t(X)$ is the weak hypothesis at iteration $t$.

For boosting, the objective is to minimize an additive loss function:

$$L = \sum_{j=1}^{n} l\big(y_j, F(x_j)\big) \tag{4}$$

where $l(\cdot)$ is a differentiable loss function such as binary cross-entropy or logistic loss.

Each tree in XGBoost is defined as:

$$f(x) = w_{q(x)}, \quad w \in \mathbb{R}^{T} \tag{5}$$

where $q$ maps an instance to a leaf index, $w$ is the vector of leaf weights, and $T$ is the number of leaves.

XGBoost optimizes the following objective:

$$\mathcal{L} = \sum_{j=1}^{n} l\big(y_j, \hat{y}_j\big) + \sum_{k=1}^{K} \Omega(f_k) \tag{6}$$

where $K$ is the number of trees and $\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2$ represents regularization on the number of leaves and leaf weights to prevent overfitting.

The final output prediction $\hat{Y}$ combines bagging and boosting as:

$$\hat{Y} = \beta_1 F_B(X) + \beta_2 F_{\text{Boost}}(X) \tag{7}$$

where $F_{\text{Boost}}(X)$ is the final boosted prediction and $\beta_1 + \beta_2 = 1$ are ensemble weights determined through optimization on validation performance.
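The weighted combination of a bagged and a boosted model, with weights chosen on validation performance, can be sketched as follows. This is an illustrative sketch and not the authors' implementation: the data are synthetic, scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost so the example has no extra dependency, and a simple grid over $\beta_1$ scored by validation AUC-ROC is assumed as the optimization procedure.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data standing in for the firm-level features (assumption).
X, y = make_classification(n_samples=800, n_features=8, n_informative=4,
                           weights=[0.8, 0.2], random_state=1)

# Train / validation / test split: 60% / 20% / 20%.
X_tr, X_tmp, y_tr, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=1)
X_val, X_te, y_val, y_te = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=1)

# Bagging component (Random Forest) and boosting component (stand-in for XGBoost).
bagger = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_tr, y_tr)
booster = GradientBoostingClassifier(random_state=1).fit(X_tr, y_tr)

p_bag_val = bagger.predict_proba(X_val)[:, 1]
p_boost_val = booster.predict_proba(X_val)[:, 1]

# Search beta_1 on a grid; beta_2 = 1 - beta_1 keeps the weights summing to one.
betas = np.linspace(0.0, 1.0, 21)
scores = [roc_auc_score(y_val, b * p_bag_val + (1.0 - b) * p_boost_val)
          for b in betas]
beta1 = float(betas[int(np.argmax(scores))])
beta2 = 1.0 - beta1

# Blended probability on the held-out test set, thresholded at 0.5.
p_final = (beta1 * bagger.predict_proba(X_te)[:, 1]
           + beta2 * booster.predict_proba(X_te)[:, 1])
y_hat = (p_final >= 0.5).astype(int)

print("beta1 =", round(beta1, 2), "beta2 =", round(beta2, 2))
print("Blended test AUC-ROC:", round(roc_auc_score(y_te, p_final), 3))
```

Because the grid includes $\beta_1 = 0$ and $\beta_1 = 1$, the blend can do no worse on the validation set than either component alone; whether that carries over to the test set depends on the data.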

Hypothesis

The following hypotheses serve as the study's compass; each is examined using the proper statistical tests in the machine learning pipeline and confirmed by the results of the ensemble model:

H1—Liquidity constraints are a key indicator of financial distress

H2—Operating cash deficits are strongly linked to default risk

H3—Elevated debt-to-equity levels signal impending failure

H4—A downward trend in ROA serves as an early warning.

RESULTS

This research examines the prediction of corporate bankruptcy in Indian companies between 2020 and 2025, assessing ensemble-based machine learning models and concentrating on the financial indicators that influence the likelihood of bankruptcy. The analysis covers the statistical characteristics of the dataset, the most significant features, model evaluation outcomes, conclusions from hypothesis testing, sectoral risk segmentation, and the temporal pattern of bankruptcy risk.

Dataset Characteristics and Statistical Overview

Over a six-year period (2020–2025), 2,800 company-year records were gathered quarterly to make up the financial dataset examined in this study. Important details about these organizations' financial structures were uncovered by the descriptive data (Table 2). The average Liquidity Ratio stood at 1.83 with a standard deviation of 0.65, indicating modest variability in short-term solvency across firms. The Operating Cash Flow, measured in crores of rupees, averaged ₹152.4 Cr, though it exhibited considerable dispersion (SD = ₹98.6 Cr), with some companies showing negative flows as low as ₹-202.5 Cr. The Debt-to-Equity Ratio had a mean value of 1.54 and a notably high skewness (1.04), suggesting that some firms carried significantly higher leverage. Return on Assets (ROA) had an average of 5.21%, with a wide range from -8.11% to 15.8%, hinting at differing profitability levels. Lastly, Market Capitalization displayed the highest variability and skewness (mean = ₹3,212 Cr, max = ₹29,780 Cr), reflecting the heterogeneity in company sizes across the dataset.

Table 2: Summary of Dataset Statistics (2020–2025)

Quarterly Data of 2800 company-year records

Metric	Mean	Std Dev	Min	Max	Skewness	Kurtosis
Liquidity Ratio	1.83	0.65	0.22	4.72	0.84	3.26
Operating Cash Flow (₹ Cr)	152.4	98.6	-202.5	402.6	-0.22	4.18
Debt-to-Equity Ratio	1.54	1.12	0.01	7.32	1.04	5.01
Return on Assets (ROA %)	5.21	4.42	-8.11	15.8	-0.71	2.89
Market Capitalization (₹ Cr)	3,212	5,414	124	29,780	1.26	6.42

Financial Feature Significance in Bankruptcy Prediction

Using XGBoost combined with Recursive Feature Elimination (RFE), the analysis identified the top financial indicators driving bankruptcy risk (Table 3). The Debt-to-Equity Ratio emerged as the most critical feature with an importance score of 0.273, highlighting the role of financial leverage in corporate failure. Following this, the Liquidity Ratio (0.214) and ROA (0.191) also held considerable predictive power. Operating Cash Flow ranked fourth, reinforcing the connection between poor cash generation and distress. Features such as Interest Coverage Ratio, Asset Turnover, Quick Ratio, and EPS Growth were found to be relatively less impactful.

Table 3: Feature Importance from XGBoost with RFE

Rank	Feature	Importance Score
1	Debt-to-Equity Ratio	0.273
2	Liquidity Ratio	0.214
3	ROA (%)	0.191
4	Operating Cash Flow	0.166
5	Interest Coverage Ratio	0.082
6	Asset Turnover Ratio	0.037
7	Quick Ratio	0.022
8	EPS Growth (QoQ)	0.015
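As a quick check on how concentrated the importance scores are, the values in Table 3 can be accumulated in rank order. The short sketch below uses only the scores reported in the table; the variable names are illustrative.

```python
# Importance scores as reported in Table 3, in rank order.
importance = {
    "Debt-to-Equity Ratio": 0.273,
    "Liquidity Ratio": 0.214,
    "ROA (%)": 0.191,
    "Operating Cash Flow": 0.166,
    "Interest Coverage Ratio": 0.082,
    "Asset Turnover Ratio": 0.037,
    "Quick Ratio": 0.022,
    "EPS Growth (QoQ)": 0.015,
}

total = sum(importance.values())

# Running share of total importance after each successive feature.
running = 0.0
cumulative = {}
for name, score in importance.items():
    running += score
    cumulative[name] = round(running, 3)

print("Total importance:", round(total, 3))
for name, share in cumulative.items():
    print(f"{name:<26} {share:.3f}")
```

The scores sum to 1.000, and the four variables tied to hypotheses H1 to H4 together account for 0.844 of the total importance.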

Model Performance and Comparative Evaluation

The ensemble learning models were evaluated based on multiple performance metrics including accuracy, precision, recall, F1-score, and AUC-ROC (Table 4). XGBoost outperformed the others, achieving an accuracy of 94.1%, precision of 0.93, recall of 0.91, and a high AUC-ROC score of 0.97, underscoring its robust classification capability. Random Forest also delivered strong results (accuracy = 92.5%, AUC = 0.95), while AdaBoost lagged slightly behind with an accuracy of 89.7%. The superiority of XGBoost was further supported by its balance between sensitivity and specificity, making it the most reliable model in this context.

Table 4 Model Accuracy & Performance Metrics

Model	Accuracy (%)	Precision	Recall	F1-Score	AUC-ROC
Random Forest	92.5	0.91	0.89	0.90	0.95
AdaBoost	89.7	0.87	0.85	0.86	0.92
XGBoost	94.1	0.93	0.91	0.92	0.97

Confusion Matrix Insights for XGBoost

The confusion matrix results (Table 5) highlighted the real-world applicability of the XGBoost model. Out of the total instances, it correctly predicted 2,115 companies as non-bankrupt and 521 as bankrupt, with only 79 false negatives and 85 false positives. This indicated a low misclassification rate and strong generalization capability, particularly in correctly identifying actual bankruptcy cases.

Table 5: Confusion Matrix for XGBoost Model

	Predicted: No Bankruptcy	Predicted: Bankruptcy
Actual: No	2,115	85
Actual: Yes	79	521
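The standard metrics can be derived directly from the four counts in Table 5. The sketch below uses the textbook definitions, treating bankruptcy as the positive class; it is an illustration, not the authors' evaluation code. The accuracy obtained this way is about 94.1%, while the positive-class precision and recall come out near 0.86 and 0.87, so the higher values in Table 4 may reflect a different averaging scheme or evaluation split.

```python
import math

# Counts from Table 5 (positive class = bankruptcy).
tn, fp = 2115, 85   # actual non-bankrupt: predicted non-bankrupt / bankrupt
fn, tp = 79, 521    # actual bankrupt:     predicted non-bankrupt / bankrupt

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)            # share of predicted bankruptcies that are real
recall = tp / (tp + fn)               # share of real bankruptcies that are caught
specificity = tn / (tn + fp)          # share of healthy firms correctly cleared
f1 = 2 * precision * recall / (precision + recall)

# Matthews Correlation Coefficient, which stays informative under class imbalance.
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("recall", recall), ("specificity", specificity),
                    ("F1", f1), ("MCC", mcc)]:
    print(f"{name:<12} {value:.4f}")
```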

Hypotheses Testing Outcomes

The research also empirically validated several financial hypotheses related to bankruptcy (Table 6). The Liquidity Ratio showed a statistically significant impact (t = 3.91, p < 0.001), confirming that reduced liquidity predicts failure. Negative Operating Cash Flow correlated strongly with bankruptcy risk (z = -4.72, p < 0.0001). A high Debt-to-Equity Ratio was also associated with increased failure probability (χ² = 16.2, p = 0.001). Finally, declining ROA preceded bankruptcy events (t = -2.87, p = 0.0043), further affirming the importance of profitability in sustaining business viability.

Table 6: Hypotheses Testing Summary

Hypothesis	Test Statistic	p-Value	Significance	Conclusion
H1	t = 3.91	0.0001	Yes	Liquidity Ratio predicts bankruptcy
H2	z = -4.72	< 0.0001	Yes	Neg. Operating Cash Flow correlates
H3	χ² = 16.2	0.001	Yes	High D/E Ratio increases failure
H4	t = -2.87	0.0043	Yes	Falling ROA precedes bankruptcy
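For the z-statistic reported for H2, the p-value can be recovered from the standard normal distribution. The sketch below assumes a two-sided test, which the paper does not state explicitly; the function name is illustrative. The t and χ² statistics are not reproduced here because the degrees of freedom are not reported.

```python
import math

def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value under a standard normal null: P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))

p_h2 = two_sided_p_from_z(-4.72)
print(f"H2: z = -4.72, two-sided p = {p_h2:.2e}")
```

A p-value this small is never exactly zero, so it is conventionally reported as an inequality such as p < 0.0001.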

ROC Curve Threshold Behavior

To further assess model discrimination, the ROC curve was examined at selected decision thresholds for each model (Table 7). At a threshold of 0.6, XGBoost maintained a true positive rate (TPR) of 0.85 with a false positive rate (FPR) of 0.12, a favorable trade-off between sensitivity and false alarms. Random Forest performed comparably, with a TPR/FPR of 0.84/0.10 at the same threshold, while AdaBoost trailed at 0.81/0.16. These results suggest that the ensemble models, especially XGBoost, maintained high sensitivity with controlled false alarms across the decision thresholds examined.

Table 7: ROC Curve Values for Models (Selected Thresholds)

Threshold	XGBoost (TPR/FPR)	AdaBoost (TPR/FPR)	Random Forest (TPR/FPR)
0.2	0.98 / 0.42	0.94 / 0.48	0.96 / 0.39
0.4	0.91 / 0.21	0.88 / 0.25	0.89 / 0.19
0.6	0.85 / 0.12	0.81 / 0.16	0.84 / 0.10
0.8	0.76 / 0.05	0.69 / 0.08	0.72 / 0.03
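One common way to compare thresholds is Youden's J statistic, J = TPR - FPR, which is not used in the paper but summarizes each row of Table 7 in a single number. The sketch below applies it to the reported values; the data structure and variable names are illustrative.

```python
# (TPR, FPR) pairs from Table 7, keyed by model and decision threshold.
roc_points = {
    "XGBoost":       {0.2: (0.98, 0.42), 0.4: (0.91, 0.21),
                      0.6: (0.85, 0.12), 0.8: (0.76, 0.05)},
    "AdaBoost":      {0.2: (0.94, 0.48), 0.4: (0.88, 0.25),
                      0.6: (0.81, 0.16), 0.8: (0.69, 0.08)},
    "Random Forest": {0.2: (0.96, 0.39), 0.4: (0.89, 0.19),
                      0.6: (0.84, 0.10), 0.8: (0.72, 0.03)},
}

best = {}
for model, points in roc_points.items():
    # Youden's J = TPR - FPR; a larger value means better separation.
    j_scores = {thr: tpr - fpr for thr, (tpr, fpr) in points.items()}
    best_thr = max(j_scores, key=j_scores.get)
    best[model] = (best_thr, round(j_scores[best_thr], 2))

for model, (thr, j) in best.items():
    print(f"{model:<14} best threshold = {thr}, J = {j}")
```

Among the four thresholds listed, 0.6 gives the largest J for all three models, which is consistent with the text's focus on that threshold.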

Sectoral Bankruptcy Risk Distribution

Sector-wise bankruptcy risk revealed notable trends (Table 8). The Financial Services sector showed the highest bankruptcy rate at 21.2%, indicating elevated systemic vulnerabilities. The Infra & Realty (13.2%), Technology (12.1%), and Manufacturing (11.9%) sectors also exhibited relatively high bankruptcy predictions. Conversely, sectors like FMCG (4.6%) and Pharma & Health (5.5%) appeared more resilient during the study period. These sectoral insights provided valuable guidance for industry-specific risk mitigation strategies.

Table 8 Sector-Wise Bankruptcy Predictions (Sample)

Sector	Companies	Bankruptcies Predicted	Bankruptcy Rate (%)
Manufacturing	620	74	11.9
Pharma & Health	380	21	5.5
Technology	340	41	12.1
Energy & Utilities	290	18	6.2
Financial Services	420	89	21.2
FMCG	280	13	4.6
Infra & Realty	470	62	13.2
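The bankruptcy rate in Table 8 is the number of predicted bankruptcies divided by the number of companies in the sector, expressed as a percentage. The short sketch below recomputes the rates from the reported counts and ranks the sectors; the variable names are illustrative.

```python
# (companies, bankruptcies predicted) per sector, from Table 8.
sectors = {
    "Manufacturing": (620, 74),
    "Pharma & Health": (380, 21),
    "Technology": (340, 41),
    "Energy & Utilities": (290, 18),
    "Financial Services": (420, 89),
    "FMCG": (280, 13),
    "Infra & Realty": (470, 62),
}

# Rate (%) = 100 * predicted bankruptcies / companies, rounded to one decimal.
rates = {name: round(100.0 * bankrupt / firms, 1)
         for name, (firms, bankrupt) in sectors.items()}

total_firms = sum(firms for firms, _ in sectors.values())
total_bankrupt = sum(bankrupt for _, bankrupt in sectors.values())
overall_rate = 100.0 * total_bankrupt / total_firms

# Sectors ordered from highest to lowest predicted bankruptcy rate.
ranking = sorted(rates, key=rates.get, reverse=True)

for name in ranking:
    print(f"{name:<20} {rates[name]:>5.1f}%")
print(f"Overall: {total_bankrupt} of {total_firms} ({overall_rate:.1f}%)")
```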

Temporal Bankruptcy Risk Score Patterns

An analysis of quarterly bankruptcy risk scores from 2020 to 2025 (Table 9) showed a downward trend after an initial rise. The average risk score started at 0.63 in Q1-2020, peaked at 0.71 in Q3-2020, and then declined to 0.37 by Q4-2025, with only minor quarter-to-quarter upticks along the way. This decrease indicated improving financial health or more cautious financial practices post-pandemic. The number of actual bankruptcy events also fell substantially, from 38 in Q1-2020 (after a peak of 45 in Q2-2020) to just 4 in Q4-2025. Manufacturing, Financial Services, Infra & Realty, and Technology alternated most often as the highest-risk sector across quarters, reaffirming the need for dynamic monitoring.

Table 9: Quarterly Bankruptcy Risk Score Trend (2020–2025)

Quarter	Avg Risk Score (XGBoost)	Std Dev	Highest Sector Risk	Bankruptcy Events
Q1-2020	0.63	0.17	Manufacturing	38
Q2-2020	0.68	0.19	Financial Services	45
Q3-2020	0.71	0.18	Infra & Realty	42
Q4-2020	0.69	0.16	Manufacturing	39
Q1-2021	0.66	0.15	Technology	34
Q2-2021	0.65	0.14	Pharma & Health	29
Q3-2021	0.62	0.13	Manufacturing	27
Q4-2021	0.64	0.12	Energy & Utilities	25
Q1-2022	0.61	0.13	Infra & Realty	23
Q2-2022	0.59	0.12	Technology	21
Q3-2022	0.57	0.14	Financial Services	20
Q4-2022	0.60	0.15	FMCG	19
Q1-2023	0.56	0.11	Manufacturing	18
Q2-2023	0.54	0.10	Infra & Realty	16
Q3-2023	0.53	0.09	Financial Services	14
Q4-2023	0.52	0.08	Technology	12
Q1-2024	0.49	0.07	Manufacturing	11
Q2-2024	0.47	0.07	Financial Services	10
Q3-2024	0.45	0.06	Infra & Realty	9
Q4-2024	0.44	0.06	FMCG	8
Q1-2025	0.42	0.05	Technology	7
Q2-2025	0.40	0.05	Pharma & Health	6
Q3-2025	0.39	0.04	Infra & Realty	5
Q4-2025	0.37	0.04	Financial Services	4

CONCLUSION

The results of this study affirm the effectiveness of ensemble-based machine learning models, particularly XGBoost, in predicting corporate bankruptcy in the Indian context between 2020 and 2025. By leveraging quarterly financial and market-based data, the models provide forward-looking insights that significantly outperform traditional, lagging indicators. The superior performance metrics of XGBoost, including a 94.1% accuracy and an AUC-ROC of 0.97, reflect its robustness in handling class imbalances and capturing nonlinear relationships among features. The feature importance analysis validates established financial theory, underscoring the predictive power of debt-to-equity ratio, liquidity, and ROA. These results support the theoretical proposition that financial leverage, solvency, and operational profitability are key determinants of business failure. From a practical standpoint, the findings offer corporate stakeholders, investors, and regulators a scalable tool for dynamic risk monitoring. The sectoral and temporal segmentation further enhances the model’s utility by allowing tailored strategies based on industry-specific risk exposure and evolving macroeconomic conditions. Notably, the declining bankruptcy risk trend post-2021 suggests a post-pandemic stabilization effect, which could guide future policymaking and credit assessment models. The study also opens up several directions for future research, including integration of ESG indicators, supply chain disruptions, and sentiment analysis from unstructured data sources such as news and social media to enhance model responsiveness in real-time scenarios.

The findings of this study reveal the powerful predictive capability of ensemble-based machine learning models in forecasting corporate bankruptcy among Indian firms from 2020 to 2025.

XGBoost emerged as the most effective model, achieving an accuracy of 94.1%, precision of 0.93, recall of 0.91, F1-score of 0.92, and an AUC-ROC of 0.97, outperforming Random Forest (accuracy 92.5%, AUC 0.95) and AdaBoost (accuracy 89.7%, AUC 0.92).
The confusion matrix showed that XGBoost had low misclassification, correctly identifying 2,115 non-bankrupt and 521 bankrupt firms with only 79 false negatives and 85 false positives. Recursive Feature Elimination (RFE) pinpointed Debt-to-Equity Ratio (importance score = 0.273), Liquidity Ratio (0.214), ROA (0.191), and Operating Cash Flow (0.166) as the most influential variables.
Hypothesis testing reinforced these findings: liquidity constraints (t = 3.91, p = 0.0001), negative operating cash flow (z = -4.72, p < 0.0001), high leverage (χ² = 16.2, p = 0.001), and declining ROA (t = -2.87, p = 0.0043) significantly predicted bankruptcy. ROC threshold analysis showed that at a 0.6 threshold, XGBoost maintained a true positive rate of 0.85 with a false positive rate of 0.12, highlighting a strong balance between sensitivity and specificity.
Sector-wise, the Financial Services (21.2%), Infra & Realty (13.2%), and Technology (12.1%) sectors faced the highest bankruptcy risk, while FMCG (4.6%) and Pharma & Health (5.5%) remained more resilient.
Temporally, the average bankruptcy risk score declined from 0.63 in Q1-2020 to 0.37 in Q4-2025, with bankruptcy events dropping from 38 to just 4, indicating post-pandemic recovery.

These insights not only align with financial theory but also have practical applications for early warning systems in corporate governance. Future studies should explore hybrid frameworks incorporating macroeconomic indicators, qualitative data from news and sentiment analysis, ESG scores, and global supply chain disruptions to enhance real-time adaptability and decision-making precision across various economic environments.

REFERENCES

Amaniyah, E., Mongid, A., Haryono, N. A., & Hariyati, H. (2025). Financial distress prediction model. In International Joint Conference on Arts and Humanities 2024 (IJCAH 2024) (pp. 1695–1709). Atlantis Press.
Akil, M. A. M., Perera, W. T. N. M., & Wijekoon, W. M. H. N. (2024). The use of financial ratios in predicting financial distress of listed entities in Sri Lanka.
Billios, D., Seretidou, D., & Stavropoulos, A. (2024). The power of numerical indicators in predicting bankruptcy: A systematic review. Journal of Risk and Financial Management, 17(10), 433.
Bonelli, M. I. (2024). Beyond financial ratios: A novel approach to bankruptcy prediction for Indian firms. SSRN. https://doi.org/10.2139/ssrn.5001034
Chen, T.-K., Liao, H.-H., Chen, G.-D., Kang, W.-H., & Lin, Y.-C. (2023). Bankruptcy prediction using machine learning models with the text-based communicative value of annual reports. Expert Systems with Applications, 233, 120714.
Dasilas, A., & Rigani, A. (2024). Machine learning techniques in bankruptcy prediction: A systematic literature review. Expert Systems with Applications, 255, 124761.
Dewi, D. N., Murhadi, W. R., & Sutejo, B. S. (2023). Financial ratios, corporate governance, and macroeconomic indicators in predicting financial distress. Journal of Law and Sustainable Development, 11(4), 1–18.
Gholampoor, H., & Asadi, M. (2024). Risk analysis of bankruptcy in the US healthcare industries based on financial ratios: A machine learning analysis. Journal of Theoretical and Applied Electronic Commerce Research, 19(2), 1303–1320.
Hidayatullah, F., Nasution, Y. S. J., & Syafina, L. (2024). Analyze financial performance and predict bankruptcy using the Du Pont and Altman Z-score on PT Gojek Tokopedia Tbk period 2020–2023. JPEK (Jurnal Pendidikan Ekonomi dan Kewirausahaan), 8(3), 1352–1366.
Islam, J., Saha, S., Hasan, M., Mahmud, A., & Jannat, M. (2024). Cognitive modelling of bankruptcy risk: A comparative analysis of machine learning models to predict the bankruptcy. In 2024 12th International Symposium on Digital Forensics and Security (ISDFS) (pp. 1–6). IEEE. https://doi.org/10.1109/ISDFS60904.2024
Kanapickienė, R., Kanapickas, T., & Nečiūnas, A. (2023). Bankruptcy prediction for micro and small enterprises using financial, non-financial, business sector and macroeconomic variables: The case of the Lithuanian construction sector. Risks, 11(5), 97.
Kokczyński, B., Witkowska, D., & Socha, B. (2024). Predicting bankruptcy: Insights from Polish non-public companies (2019–2022). European Research Studies Journal, 27(S2), 252–264.
Letkovský, S., Jenčová, S., & Vašaničová, P. (2024). Is artificial intelligence really more accurate in predicting bankruptcy? International Journal of Financial Studies, 12(1), 8.
Martin-Melero, I., Gomez-Martinez, R., Medrano-Garcia, M. L., & Hernández-Perlines, F. (2025). Comparison of corporate insolvency prediction in Spain employing financial ratios in analytical models and machine learning. Academia Revista Latinoamericana de Administración.
Murige, S. M., Simiyu, J. M., & Kimathi, H. (2023). Effect of profitability ratio on bankruptcy prediction of dairy cooperative societies in Meru County, Kenya.
Mwachikoka, C. F., Adil, M., & Phiri, J. (n.d.). Enhancing bankruptcy prediction: The integration of financial ratios and artificial intelligence for more accurate risk assessment.
Sharma, S., & Mittal, A. K. (2024). Evolution of financial, non-financial, and macroeconomic predictors in corporate bankruptcy: A comprehensive review. Economic Sciences, 20(2), 70–83.
Sizan, M. M. H., Chouksey, A., Miah, M. N. I., Pant, L., Ridoy, M. H., Sayeed, A. A., & Khan, M. T. (2025). Bankruptcy prediction for US businesses: Leveraging machine learning for financial stability. Journal of Business and Management Studies, 7(1), 1–14.
Soukal, I., Mačí, J., Trnková, G., Svobodova, L., Hedvičáková, M., Hamplova, E., Maresova, P., & Lefley, F. (2024). A state-of-the-art appraisal of bankruptcy prediction models focussing on the field’s core authors: 2010–2022. Central European Management Journal, 32(1), 3–30.
Valaskova, K., Gajdosikova, D., & Belas, J. (2023). Bankruptcy prediction in the post-pandemic period: A case study of Visegrad Group countries. Oeconomia Copernicana, 14(1), 253–293.
Vásquez-Serpa, L.-J., Rodríguez, C., Pérez-Núñez, J.-R., & Navarro, C. (2025). Challenges of artificial intelligence for the prevention and identification of bankruptcy risk in financial institutions: A systematic review. Journal of Risk and Financial Management, 18(1), 26.