Please use this identifier to cite or link to this item:
https://repository.iimb.ac.in/handle/2074/11158
Title: Spotting earnings manipulation: using machine learning for financial fraud detection
Authors: Rahul, Kumar; Seth, Nandini; Dinesh Kumar, U
Keywords: Accrual Manipulation; Bagging; Boosting; Data Analytics; Earnings Manipulation; Ensemble Methods; Gaussian Model; Sampling; Simulation; Supervised Learning; Unsupervised Learning
Issue Date: 2018
Publisher: Springer Verlag
Abstract: Earnings manipulation and accounting fraud lead to reduced firm valuation in the long run and to public distrust in the company and its management. Yet manipulation of accruals to hide liabilities and inflate earnings has been a long-standing fraudulent practice among many listed firms. As auditing is time consuming and restricted to a sample of entries, fraud is either not detected or detected belatedly. We believe that supervised machine learning models can be used to identify high-risk firms early enough for auditing by the regulator. We also discuss unsupervised anomaly detection as an alternative methodology. Since the proportion of manipulators is much lower than that of non-manipulators, the biggest challenge in predicting earnings manipulation is the imbalance in the data, which leads to biased results for conventional statistical models. In this paper, we build ensemble models to detect accrual manipulation by borrowing theory from the seminal work done by Beneish. We also showcase a novel simulation-based sampling technique to efficiently handle imbalanced datasets and illustrate our results on data from listed Indian firms. We compare existing ensemble models, establishing the superiority of fairly simple boosting models whilst commenting on the shortfall of area under the ROC curve as a performance metric for imbalanced datasets. The paper makes two major contributions: (i) a functional contribution of suggesting an easily deployable strategy to identify high-risk companies; (ii) a methodological contribution of suggesting a simulation-based sampling approach that can be applied in other cases of highly imbalanced data to utilize the entire dataset in modeling.
URI: https://repository.iimb.ac.in/handle/2074/11158
ISBN: 9783030041908; 9783030041915
ISSN: 0302-9743
DOI: 10.1007/978-3-030-04191-5_29
Appears in Collections: 2010-2019