Enhancing Road Safety in India: A Predictive Analysis Using Machine Learning Algorithm for Accident Severity Modeling

Humera Khanum; Rushikesh Kulkarni; Anshul Garg; Mir Iqbal Faheem

doi:10.5772/intechopen.1006547

Abstract

This chapter presents a comprehensive study aimed at enhancing road safety in India through the development and application of a machine-learning predictive model for traffic accident severity on Indian highways. Road accidents are a leading cause of death and injury, claiming approximately 1.35 million lives globally each year. India faces a particularly acute challenge, reporting 449,002 road accidents in 2019 alone. This work leverages the adaptability and predictive accuracy of machine-learning algorithms to model accident severity, thereby providing a basis for understanding contributing factors and formulating effective preventive strategies. The study follows a multistep methodology: collection and preparation of data obtained from authorized organizations, feature selection, model training, parameter tuning, and model evaluation based on statistical accuracy metrics. The chapter concludes by highlighting the significant potential of integrating machine-learning techniques with enhanced data recording systems to improve road safety modeling, decision-making, and accident prevention, ultimately contributing to the reduction of road traffic accidents and their associated human and economic costs.

Keywords

road safety in India
machine-learning algorithms
accident severity modeling
traffic accident analysis
predictive analytics
accident prevention strategies

Author Information

Humera Khanum*
- Civil Engineering Department, Symbiosis Institute of Technology, Pune Campus, Symbiosis International (Deemed University), Pune, India
- School of Civil Engineering, Lovely Professional University, Phagwara, Punjab, India
Rushikesh Kulkarni
- Civil Engineering Department, Symbiosis Institute of Technology, Pune Campus, Symbiosis International (Deemed University), Pune, India
Anshul Garg
- School of Civil Engineering, Lovely Professional University, Phagwara, Punjab, India
Mir Iqbal Faheem
- Civil Engineering Department, Deccan College of Engineering and Technology, Hyderabad, India

*Address all correspondence to: author1@inbox.com

1. Introduction

1.1 Background

Road safety is one of the most significant public health and development issues worldwide, and India bears a disproportionate share of the burden [1]. According to the Global Status Report on Road Safety 2018, road accidents pose a major threat to life and health, calling for more comprehensive and effective interventions [2, 3]. The Indian subcontinent is particularly vulnerable to traffic accidents, which kill and maim many of those involved [4]. This situation arises from a myriad of factors, including driver behavior such as a lack of tolerance for other road users.

This chapter explores how machine-learning algorithms can enhance road safety in India by building a robust model for predicting the severity of an accident. The primary purpose of this study is to identify what makes accidents severe, using a large dataset to analyze relevant traffic and road characteristics as outlined in the chapter “The effect of traffic and road characteristics on road safety: A review and future research direction” [4]. Information from this analysis could be instrumental in guiding targeted improvements to road infrastructure and in establishing a safer transport system.

India is currently grappling with a severe road safety crisis and contributes substantially to worldwide road traffic fatalities. The high incidence of road accidents is attributed to a large and growing population, rising vehicle numbers, and poor roads that cannot support them. Despite having only 1% of the world’s vehicles, India accounted for 11% of global road accident deaths, according to the Global Status Report on Road Safety 2018 (World Health Organization) cited earlier [1]. There is therefore an urgent need for effective road safety mechanisms in the country.

Road accidents result from an interaction between different factors. Human error: speeding, driving under the influence of alcohol, using a phone while driving, and driver fatigue contribute significantly to road crashes. Poor infrastructure: bad road design, poor lighting, the absence of safety features such as pavements or guardrails, and inadequate maintenance also predispose roads to accidents. Vehicle condition: mechanical malfunctions such as brake failure and tire bursts may cause accidents. Weather conditions: rain, fog, or snow reduce visibility and make roads more slippery, increasing the chance of accidents. Traffic conditions: high traffic volumes, traffic jams, and aggressive driving by motorists lead to many road accidents every day.

Traditional road safety measures: several measures have been put in place over the years to reduce the occurrence of crashes. Some of these conventional measures have borne fruit in minimizing accidents, although many other methods could still be employed. However, conventional measures face the following difficulties:

Reactive approach: conventional interventions are largely responsive, dealing with incidents once they occur rather than preventing them.

Data limitations: focused interventions are hampered by a lack of adequate data on the accidents occurring along the road.

Enforcement challenges: the enforcement of traffic rules may be difficult in many countries, particularly in the developing world, owing to high populations and scarce resources.

Predictive analytics together with machine-learning were born as an alternative approach due to the limitations of traditional road safety interventions [5, 6]. These technologies have the following capabilities:

Historical records on accidents could be used to detect patterns using machine-learning algorithms. Once patterns are detected, they can be used to formulate predictive models.

Machine learning programs can examine big sets of data to find factors that increase the risk of accidents by studying past accidents. To prevent accidents on roads, decision-makers need to be aware of the likelihood of accidents occurring beforehand. Machine Learning models can help improve how money is spent on enforcing laws, running awareness campaigns, and other efforts to reduce traffic deaths. They are also effective in deciding how to allocate resources for law enforcement, improving roads, and educating people about road safety using data as a source [7].

The main goal of this work is to improve safety on Indian roads by building a machine-learning predictive model of accident severity in India.

This chapter addresses the pressing issue of road safety in India by exploring the potential of machine learning for predicting the extent of injury caused during an accident. It draws on the original work “The effect of traffic and road characteristics on road safety.” The methodology largely entails applying large datasets to pinpoint the major drivers of accident severity, leading toward preventive interventions based on data analysis. The focus of this chapter is thus to provide information on specific interventions and road safety measures aimed at enhancing the safety of the transportation system in India.

2. Literature review

This research builds on the existing body of knowledge about road safety and applies machine-learning techniques to gain valuable insights.

Globally, road traffic accidents are an urgent public health problem that results in many deaths, injuries, and significant economic losses. India is currently grappling with a major road safety issue owing to rapid population growth coupled with a rise in the number of vehicles on its roads [2, 8].

Traditional road safety measures, though not completely worthless, have tended toward a reactive and somewhat restricted approach. In several cases, this has resulted in the development of innovative solutions based on data, with machine learning showing potential for enabling proactive interventions.

The ramifications of road traffic accidents extend globally, mostly impacting less developed nations that account for over 85% of victims worldwide but only receive half of the licenses dispensed annually.

Road accidents result from a complex interplay of several factors. An analysis of traffic characteristics, road infrastructure, and their interaction is provided which underscores how such factors influence the frequency and severity of accidents [4]. The research argues that efficient safety strategies must take into consideration factors like traffic volume, speed, road geometry, and infrastructure design.

Human error, which encompasses behaviors such as speeding, drunk driving, distracted driving, and driving while fatigued, is still a major factor in road traffic crashes. Driving behavior and road safety are intricately connected. The cited work adopts a theoretical framework that relates personal tendencies to society’s expectations and to the cultural norms that govern driving conduct in different contexts. Targeted interventions to change such dangerous driving behaviors are therefore important for road safety policy formulation [9].

Improving road safety requires not only a focus on infrastructure and enforcement but also a safety culture among road users. In this regard, the need to incorporate road safety education in school curricula is underscored, as it equips young people with adequate information on how to use roads safely. In addition, targeted driver education programs, especially those aimed at professional drivers, are highlighted as leading to better safety outcomes on roads.

Successful road safety strategies need sound institutional frameworks and policy interventions based on solid evidence. One study examined how institutions and policy instruments affected road safety promotion in Uganda, focusing on public health outcomes. According to that research, collaborative approaches that involve stakeholders, together with evidence-based decision-making, lead to sustainable road safety improvement.

Advancements in technology, particularly in data science and machine learning, present opportunities for improving road safety. Data analysis plays a crucial role in pinpointing safety concerns within the road safety ecosystem. Prior work has underlined the importance of collecting and analyzing large-scale datasets for purposeful action planning by stakeholders. Overall, machine learning offers a powerful means of anticipating the seriousness of highway accidents, which can support both proactive safety measures and the identification of high-risk situations.

“Modelling Road Accident Severity with Comparisons of Logistic Regression, Decision Tree, and Random Forest” assesses the effectiveness of different machine learning techniques, namely logistic regression, classification and regression trees, and random forest, in predicting road accident severity. The study shows that random forests produce the most accurate estimates of accident severity among the methods compared [10, 11]. The article thus supports the use of this technique for forecasting accident severity.

“Identification of potential traffic accident hot spots based on accident data and GIS” shows the importance of integrating spatial attributes with crash data at both macro and micro levels when developing prediction models.

3. Road safety in India: an overview

Road safety remains a global concern, and India is among the countries facing the greatest challenges. The high number of road traffic crashes and associated injuries justifies the development of novel ways to prevent these incidents [12]. This chapter focuses on enhancing road safety in India through predictive analysis using machine learning algorithms for accident severity modeling.

India has one of the largest road networks in the world, and numerous accidents occur owing to driver behavior [13], road conditions, and vehicle-related causes, among other reasons. The World Health Organization’s Global Status Report on Road Safety points out that road traffic injuries are the leading cause of death for people aged 5–29 years globally. Specifically, India’s Motor Vehicle Accident Report by the Government reveals a worrying trend in the severity of road crashes and loss of lives.

Machine intelligence offers a way to manage such challenges effectively. It can sift through massive amounts of data to establish patterns that help predict where an accident is more likely to happen, what causes it, and how it can be contained. Some studies have endorsed the application of machine learning in traffic accident analysis and hotspot prediction using techniques such as decision trees, random forests, and logistic regression [14].

This chapter seeks to examine how machine learning algorithms can be used in predicting the seriousness of road accidents especially in India. Predictive analytics based on historical data is crucial for identifying factors associated with higher relative severity thereby allowing for identification of potential accident hotspots. The primary intent is not just about reducing these incidences but also about ensuring that they are less severe resulting in saving lives as well as minimizing economic losses from road traffic-related incidents.

Accident severity modeling with machine learning is only one part of a broader effort. The potential of predictive analytics to transform road safety strategies marks a shift from reactive to proactive approaches. This chapter thus provides the theoretical background to ML in accident analysis, reviews applied literature on the subject, introduces the methods used, and describes real-life examples of their application in the Indian context.

In short, advancing road safety through ML-based predictive analytics for severity estimation appears to be a promising avenue toward mitigating the heavy burden of road accidents in India. By using data and advanced analytics, stakeholders can develop more effective methods to prevent road accidents and make Indian roads safer.

4. Machine learning algorithms: a primer

Machine learning (ML) algorithms are driving the data science revolution and transforming various industries and fields of academic inquiry, since they can learn from data, identify patterns, and make decisions with minimal human intervention. This makes them very useful for tough problems that cannot be solved through traditional analytic methods. This section covers the foundations of ML algorithms, their types, and how they can be exploited in modeling accident severity for road safety.

In terms of how they learn, ML algorithms fall under three main categories.

Supervised learning: these algorithms are given labeled datasets and learn the relationship between the input features and the output. Once trained, they can predict outcomes for unseen data or make decisions based on new information. Examples include linear regression, decision trees, and support vector machines.

Unsupervised learning: these algorithms do not use labeled observations but look for patterns within the data itself, for example through clustering or dimensionality reduction. K-means clustering and principal component analysis (PCA) both fall under this category.

Reinforcement learning: this type of machine learning is based on the idea that an agent, like a living organism, learns how to act in an environment from the feedback it receives for its actions.

Foundational concepts: there are some foundational concepts behind how ML algorithms work.

Feature selection: feature selection is about choosing the most relevant input features to use in predicting.

Model training: this is the process of teaching or training ML algorithms how to make predictions or decisions based on data whose outcomes are already known.

Parameter tuning: this involves adjusting the settings within the ML model so that more accurate results can be achieved.

Model evaluation: this is the assessment of a machine learning model’s performance according to specific metrics, such as accuracy, precision-recall curves (PRCs), and F1 score for classification, or mean squared error (MSE) for regression.
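As a rough illustration of the feature selection concept listed above, the following sketch ranks candidate features by the absolute value of their Pearson correlation with a target. The feature names, the values, and the numeric target are invented for illustration, and correlation ranking is only one of several possible selection criteria.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def rank_features(features, target):
    """Return (name, |r|) pairs sorted from most to least correlated."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy data, not taken from the study.
features = {
    "speed_kmh": [40, 60, 80, 100, 120],
    "lane_count": [2, 2, 2, 2, 2],      # constant column
    "hour_of_day": [9, 14, 3, 22, 11],
}
severity_score = [1, 2, 3, 4, 5]
ranking = rank_features(features, severity_score)
```

In this toy data the perfectly linear column ranks first and the constant column last; a real dataset would call for checks suited to categorical variables as well.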

Applications in road safety: in the area of the road safety domain, machine-learning systems have been mainly employed to develop models that are capable of predicting accident severity levels. For example, based on historical accident data over time, these models can provide probabilities of future occurrences and their related injuries or deaths, hence enabling targeted measures. For instance, supervised learning algorithms could predict accident hotspots or severity levels through road status factors, including weather conditions and traffic volume, among many others. At the same time, unsupervised learning of hidden patterns in accident data, which would not be evident at first glance, can provide new insights into strategies for prevention.

To summarize, ML algorithms are highly effective at solving complex challenges across different sectors, especially in making roads safer. By learning from data, they can support a reduction in accident severity and yield specific recommendations for enhancing road safety and saving lives. As technology advances, further uses of AI on roads can be expected, such as addressing traffic congestion, which makes knowledge of ML algorithms and their applications necessary.

5. Methodology

The approach proposed for this investigation comprises the following stages for implementing a random forest (RF) technique to estimate accident severity using machine learning concepts.

Data preparation: the initial stage in building an RF model for predicting injury severity is to collect and prepare the data. Road accident data for specific road stretches can be sourced from organizations such as the Ministry of Road Transport and Highways (MoRTH) and the National Highways Authority of India (NHAI). Data wrangling and mining techniques are used to clean and preprocess the data.
Feature selection: once data preparation is complete, selecting appropriate features becomes essential. Feature selection refers to identifying and selecting the most useful predictors, or independent variables. There are many ways to choose features, including statistical tests, correlation analysis, and principal component analysis (PCA).
Model training: the preprocessed data are then used to train an RF model. The RF algorithm uses bootstrap aggregating together with random feature selection to build several decision trees, which are then combined to attain better performance.
RF algorithm formulation: the RF algorithm can be represented as:

$$RF(X) = \frac{1}{B} \sum_{b=1}^{B} T_b(X) \tag{1}$$

where $X$ is the vector of input features, $B$ is the number of trees, and $T_b(X)$ is the prediction of the $b$-th individual decision tree.
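The averaging over trees can be sketched in a few lines. This is a minimal illustration under stated assumptions: the three one-split "stumps" and their thresholds are hypothetical stand-ins for trained trees, and no bootstrap sampling or training is performed. For classification, implementations commonly average per-class probabilities or take a majority vote instead of averaging raw outputs.

```python
def make_stump(feature_index, threshold, low_value, high_value):
    """Build a one-split 'tree' that returns low_value or high_value."""
    def stump(x):
        return low_value if x[feature_index] <= threshold else high_value
    return stump

def forest_predict(trees, x):
    """Average the predictions of the B individual trees for input x."""
    return sum(tree(x) for tree in trees) / len(trees)

# Three hypothetical stumps standing in for trained trees T_1..T_3.
trees = [
    make_stump(0, 50.0, 0.0, 1.0),
    make_stump(0, 70.0, 0.0, 1.0),
    make_stump(1, 0.5, 0.0, 1.0),
]
x = [60.0, 1.0]
prediction = forest_predict(trees, x)  # (1.0 + 0.0 + 1.0) / 3
```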

Parameter tuning: tuning the settings of a random forest model is critical to its performance. Three key parameters dominate: the number of trees (n_estimators), the number of features considered at each node split (max_features), and the maximum tree depth (max_depth).

When building an RF model to predict traffic accident severity, Gini impurity is commonly used to evaluate the importance of the explanatory variables. Gini impurity is the criterion applied at each node split of the decision trees that serve as base learners within the RF framework, and it guides the choice of splitting feature. It quantifies how effectively a variable separates the target classes.

Mechanism of Gini impurity: for a node in a classification tree, the Gini impurity is calculated as:

$$I_G(p) = 1 - \sum_{k=1}^{n} p_k^2 \tag{2}$$

where $p_k$ is the proportion of samples at that node belonging to class $k$, and the summation runs over all $n$ classes. A lower Gini impurity indicates a purer node, that is, a cleaner separation of the classes.
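A small worked example may help. The sketch below computes Gini impurity in its standard form, one minus the sum of squared class proportions, for three hypothetical nodes; the label lists are invented and use the chapter's 1 to 4 severity coding only for flavor.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity, 1 - sum_k p_k^2, of a collection of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())

pure_node = gini_impurity([1, 1, 1, 1])      # single class: impurity 0.0
mixed_binary = gini_impurity([1, 1, 2, 2])   # two equal classes: 0.5
four_way = gini_impurity([1, 2, 3, 4])       # four equal classes: 0.75
```

A node holding a single class scores 0, while an even mix over four classes scores 0.75, the maximum for four classes.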

Gini importance in RF: Gini impurity serves two purposes in the developed RF model: node splitting and feature importance. In node splitting, the best variable at each node is identified by checking, across all potential splits, how much each split reduces impurity. For feature importance, the Gini importance of a feature is the average reduction in impurity attributable to that feature across all trained trees. Gini importance thus indicates which features carry more weight in making predictions for this task.
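The node-splitting role can be illustrated as follows. This sketch scores each candidate threshold on a single numeric feature by the drop from the parent's impurity to the size-weighted impurity of the two children, and keeps the best one. The data, the feature, and the helper names are hypothetical, and real tree implementations add further details such as random feature subsets.

```python
def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    total = len(labels)
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))

def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children

def best_split(rows, labels, feature_index):
    """Try each observed value as a threshold; return (threshold, decrease)."""
    best = (None, 0.0)
    for threshold in sorted({row[feature_index] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature_index] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature_index] > threshold]
        if not left or not right:
            continue  # a split must send samples to both sides
        gain = impurity_decrease(labels, left, right)
        if gain > best[1]:
            best = (threshold, gain)
    return best

# Hypothetical rows with one feature (speed in km/h); 1 = fatal, 4 = damage only.
rows = [[40], [50], [90], [110]]
labels = [4, 4, 1, 1]
threshold, decrease = best_split(rows, labels, 0)
```

Averaging such decreases per feature over every split in every tree gives the Gini importance described above.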

Model evaluation: after training the RF model and optimizing its parameters, it is important to evaluate the model’s performance. Evaluation metrics such as accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic curve (AUC-ROC) may be employed.
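To make these metrics concrete, the sketch below computes accuracy and per-class precision, recall, and F1 score from paired observed and predicted labels, treating each class one-vs-rest. The label lists are hypothetical and are not results from the study; AUC-ROC is omitted because it requires predicted probabilities.

```python
def per_class_metrics(y_true, y_pred, classes):
    """Return {class: (precision, recall, f1)} computed one-vs-rest."""
    metrics = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = (precision, recall, f1)
    return metrics

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the observed label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical labels using the chapter's 1-4 severity coding.
y_true = [1, 1, 2, 2, 2, 3, 3, 4]
y_pred = [1, 2, 2, 2, 2, 2, 3, 4]
acc = accuracy(y_true, y_pred)
metrics = per_class_metrics(y_true, y_pred, classes=[1, 2, 3, 4])
```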

Model implementation: after training and evaluation, the model is ready to be used for predicting the severity of road accidents. The procedure can be implemented as an algorithm in Python to help predict the severity of future road traffic accidents on highways in India.

6. Case study review: accident severity prediction

The study areas selected were the two stretches of Indian National Highways, (1) Pune-Sholapur Section of NH-9 in km 144/400 to km 249/000 in the State of Maharashtra, and (2) Six-Laning of Barwa-Adda-Panagarh Section of NH-2 from km 398.240 to km 521.120 including Panagarh Bypass in the States of Jharkhand and West Bengal [15] (Figure 1).

Figure 1.
Pune-Sholapur Section of NH-9 in the state of Maharashtra and Barwa-Adda-Panagarh Section of NH-2 in the states of Jharkhand and West Bengal.

The study areas were selected based on specific criteria. First, the researchers had prior experience working on one of the stretches, the Pune-Sholapur Section of NH-9, from km 144/400 to km 249/000 in the State of Maharashtra. This experience offered insights and knowledge that could be useful in conducting the study. Second, the same concessionaire provided data on request for another stretch, the Six-Laning of Barwa-Adda-Panagarh Section of NH-2 from km 398.240 to km 521.120, including the Panagarh Bypass, in the States of Jharkhand and West Bengal. These data were considered relevant to the research objectives and helpful in achieving the desired outcomes.

The primary aim of the research was to create a predictive model for the severity of traffic accidents on Indian highways utilizing Random Forest models, chosen for their precision and comprehensibility. The study’s results were employed to establish a predictive model for accident severity, which can contribute to the formulation of road safety strategies and measures. This model enables the identification of high-risk zones and the allocation of resources for accident prevention and mitigation.

6.1 Data collection and preparation

Source data: the analysis focused on the Pune-Solapur Section of NH-9, covering accident records from 2013 to 2018, and the Six-Laning of Barwa-Adda-Panagarh Section of NH-2, encompassing accident data from 2015 to 2019. Data on road accidents was collected from the Concessionaires of the National Highways Authority of India for these projects. Subsequently, exploratory data analysis was conducted on the raw data.

Data preparation: the secondary source data were utilized for exploration. The dataset comprises 3257 observations, with 1855 observations of the Bengal (BAEL) Section and 1402 observations related to Pune-Solapur. It includes 32 variables, among which is the target variable “accident severity.” Table 1 displays the attributes and their respective mappings.

Attributes	Mapping
Accident Index
Date
Day of week	1 = Sunday, 2 = Monday, 3 = Tuesday, 4 = Wednesday, 5 = Thursday, 6 = Friday, 7 = Saturday
Time of Accident
Accident Location-A	1 = Urban, 2 = Rural, 3 = Unallocated
Accident Location-A Chainage-km
Accident Location-A Chainage-km-RoadSide	LHS, RHS
Nature of Accident-B1, B2, B3	1 = Overturning, 2 = Head-on Collision, 3 = Rear End Collision, 4 = Collision Brush/Side Wipe, 5 = Right Turn Collision, 6 = Skidding, 7a = Others-Hit Cyclist, 7b = Others-Hit Pedestrian, 7c = Others-Hit Parked Vehicle, 7d = Others-Hit Fixed Object, 7e = Others-Wrong Side Driving, 7f = Others-Hit Animal, 7g = Others-Hit Two-Wheeler, 7h = Others-Unknown, 7i = Others-Fallen down, 8 = Overtaking vehicle, 9 = Left Turn Collision
Accident Severity-C	1 = Fatal, 2 = Grievous Injury, 3 = Minor Injury, 4 = Non-Injury (Damage only)
Classification of Accident-C1, C2, C3	1 = Fatal, 2 = Grievous Injury, 3 = Minor Injury, 4 = Non-Injury (Damage only)
Causes-D1, D2, D3, D4, D5	1 = Drunken, 2 = Overspeeding, 3 = Vehicle out of control, 4a = Fault of driver of motor vehicle, 4b = Driver of other vehicle, 4c = Cyclist, 4d = Pedestrian, 4e = Passenger, 4f = Animal, 5a = Defect in mechanical condition of motor vehicle, 5b = Road condition
Road Feature-E	1 = Single lane, 2 = Two lanes, 3 = Three lanes or more without central divider median, 4 = Four lanes or more with central divider along with carriageway width
Road Condition-F	1 = Straight Road, 2 = Slight Curve, 3 = Sharp Curve, 4 = Flat Road, 5 = Gentle incline, 6 = Steep incline, 7 = Hump, 8 = Dip
Intersection Type-G	1 = T Junction, 2 = Y Junction, 3 = Four arm junction, 4 = Staggered junction, 5 = Roundabout, 6 = Uncontrolled junction
Weather Conditions-H	1 = Fine, 2 = Mist/Fog, 3 = Cloud, 4 = Light Rain, 5 = Heavy Rain, 6 = Hail/sleet, 7 = Snow, 8 = Strong Wind, 9 = Dust Storm, 10 = Very Hot, 11 = Very Cold, 12 = Other extraordinary weather condition
Vehicle Type Involved-J-V1, V2, V3, V4	1 = Car/Jeep/Van, 2 = SUV, 3 = Bus, 4 = Mini Bus, 5 = Truck, 6 = Two-Wheeler, 7 = Three-Wheeler, 8 = Cycle, 9 = Pedestrian, 10 = Tractor, 11 = Unknown, 12 = Animal, 13 = Objects, 14 = LCV, 15 = MAV
Number of Vehicles
Number of Casualties-Fatal, Grievous Injury, Minor Injury, Non Injured

Table 1.

Dataset attributes and parameters mapping.
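One way to work with such coded attributes is to keep the mappings as lookup tables and decode records for inspection. The sketch below copies two of the mappings from Table 1; the record layout and field names are hypothetical, since the chapter does not specify the column names used in the source files.

```python
# Code-to-label mappings copied from Table 1 (subset).
DAY_OF_WEEK = {1: "Sunday", 2: "Monday", 3: "Tuesday", 4: "Wednesday",
               5: "Thursday", 6: "Friday", 7: "Saturday"}
ACCIDENT_SEVERITY = {1: "Fatal", 2: "Grievous Injury", 3: "Minor Injury",
                     4: "Non-Injury (Damage only)"}

def decode_record(record):
    """Return a copy of the record with coded fields replaced by labels."""
    decoded = dict(record)
    decoded["day_of_week"] = DAY_OF_WEEK.get(record["day_of_week"], "Unknown")
    decoded["accident_severity"] = ACCIDENT_SEVERITY.get(
        record["accident_severity"], "Unknown")
    return decoded

# Hypothetical record; the field names are illustrative only.
record = {"accident_index": 1, "day_of_week": 3, "accident_severity": 2}
decoded = decode_record(record)
```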

6.2 Data modeling

The RF classification algorithm has been employed in this study to forecast the severity of road traffic accidents in India. This section details the procedure for implementing the model and performance evaluation and discusses the results obtained.

The target variable for the RF model is “Accident Severity,” which has the classes Fatal, Grievous Injury, Minor Injury, and No Injury, indexed as [1 = Fatal, 2 = Grievous Injury, 3 = Minor Injury, 4 = No Injury]. The dataset is partitioned into training and testing sets with a ratio of 80% to 20%, respectively. The hyperparameters ‘n_estimators’ and ‘max_depth’ are specified, and a grid search is conducted with cross-validation (cv = 5) to identify the optimal hyperparameters. The best parameters and scores are obtained, and the best estimator is fit on the training data. Predictions are made on the test data, and the accuracy of the model is obtained. The algorithm and program for Accident Severity Modeling using RF is written in the Python programming language, and the code is made available to the public for further development. The source code can be accessed via the software availability statement. Accuracy analysis on test data: three metrics were employed to evaluate the effectiveness of the algorithms: accuracy, precision, and recall.
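The procedure described above might look roughly like the following with scikit-learn. This is a sketch under stated assumptions, not the study's published code: the data are synthetic, the candidate grid values are placeholders, and the stratified split and fixed random seeds are choices made here for reproducibility.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the accident data: 4 classes, like the 1-4 severity codes.
X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)
y = y + 1  # shift labels from 0-3 to 1-4

# 80/20 split; stratification (an assumption) keeps every class in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# Candidate values are placeholders, not the grid used in the study.
param_grid = {"n_estimators": [20, 50], "max_depth": [2, 5, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# With the default refit=True, the best estimator is refit on all training data.
best_model = search.best_estimator_
test_accuracy = accuracy_score(y_test, best_model.predict(X_test))
```

After fitting, `search.best_params_` and `search.best_score_` hold the selected hyperparameters and the mean cross-validated accuracy.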

6.3 Result and discussion

Model performance: the model used three hyperparameters: ‘max_depth’: 10, ‘max_features’: ‘sqrt’, and ‘n_estimators’: 100. The confusion matrix showed correct and incorrect classifications per class. The classification report displayed precision, recall, and F1-score per class, together with the support for each class. The model had high precision and recall for class 1 but low precision and recall for classes 2, 3, and 4. The overall accuracy was 67%, with a weighted average F1-score of 0.64. The macro average F1-score, which gives equal weight to each class, was 0.53.
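The gap between the weighted and macro averages follows from how each is computed. The sketch below uses hypothetical per-class F1 scores and supports, chosen only to mimic the pattern of one strong class and three weak ones; they are not the values from the study's classification report.

```python
def macro_f1(f1_by_class):
    """Unweighted mean of per-class F1 scores."""
    return sum(f1_by_class.values()) / len(f1_by_class)

def weighted_f1(f1_by_class, support):
    """Mean of per-class F1 scores weighted by each class's sample count."""
    total = sum(support.values())
    return sum(f1_by_class[c] * support[c] for c in f1_by_class) / total

# Hypothetical per-class scores and supports, not the study's actual report.
f1_by_class = {1: 0.9, 2: 0.5, 3: 0.4, 4: 0.4}
support = {1: 300, 2: 100, 3: 50, 4: 50}
macro = macro_f1(f1_by_class)              # (0.9 + 0.5 + 0.4 + 0.4) / 4
weighted = weighted_f1(f1_by_class, support)  # dominated by the largest class
```

When the best-performing class is also the largest, the weighted average sits above the macro average, which is why the macro figure is the more cautious summary under class imbalance.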

The RF classifier model was optimized using a grid search with parameters: max depth of 2, n estimators of 5000, and random state of 0. After applying the model to test data, predictions were saved in an Excel file for analysis. The model’s accuracy on the test data was approximately 41.47%, meaning that it predicted traffic accident severity correctly in 41.47% of cases.

6.3.1 Prediction output

6.3.1.1 Comparison between observed and predicted accident severity levels

The predicted values are generated by the RF model using the input features, while the actual accident severity indices are represented by the observed values. Figures 2 and 3 summarize the comparison between observed and predicted values.

Figure 2.
Comparison of accident severity as observed and predicted index.

Figure 3.
Comparative analysis of observed and predicted accident severity index against time.

The RF model accurately predicts the accident severity index on dates like 25-02-2017, 17-04-2017, and 22-04-2017. On 18-02-2017, 23-02-2017, and 27-03-2017, the model predicted lower accident severity index values than observed. On 24-05-2017 and 20-10-2017, the model occasionally overestimated the accident severity index by predicting a higher value than observed.

The model may have a bias due to an imbalance in the training dataset, with severity index 2 occurring more frequently than other categories. This bias is evident when the model often predicts a severity index of 2 for accidents, even when the observed values differ.
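A simple diagnostic for this kind of bias is to tally how often predictions match, fall below, or exceed the observed index, and to compare the class shares of observed and predicted values. The sketch below does this on invented indices; the lists are hypothetical and are not the study's 165 predicted rows.

```python
from collections import Counter

def compare_predictions(observed, predicted):
    """Count exact matches and predictions below or above the observed index."""
    tally = Counter()
    for obs, pred in zip(observed, predicted):
        if pred == obs:
            tally["exact"] += 1
        elif pred < obs:
            tally["lower_index"] += 1
        else:
            tally["higher_index"] += 1
    return tally

def class_share(labels):
    """Fraction of labels falling in each class."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in sorted(counts)}

# Hypothetical severity indices, invented for illustration.
observed = [2, 3, 2, 1, 4, 2, 3, 2]
predicted = [2, 2, 2, 2, 2, 2, 2, 2]
tally = compare_predictions(observed, predicted)
observed_share = class_share(observed)
predicted_share = class_share(predicted)
```

A predicted share concentrated on one class while the observed share is spread across several is a sign that the model is defaulting to the majority class.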

6.3.1.2 Comparative analysis of observed and predicted accident severity index against time

The plot for the 165 rows of predicted data does not fit in the A4 sheet. Figure 4 displays the date, day of the week, and time of the accident, as well as the observed and predicted accident severity indices. The data are published and the link is provided in the Tableau graphs visuals availability [A-i].

Figure 4.
Comparison between the actual and predicted severity of accidents based on location and chainages-(RHS).

The dataset contains accident data from February 18 to December 31, 2017, analyzed using Tableau from the Excel table provided.

The accident severity index ranges from 1 to 4, with 1 being the least severe and 4 the most severe.

The majority of accidents in the dataset have a severity index of 3 or 4. A severity index of 2 indicates a less severe accident, while 4 indicates a more severe one. Most accidents are predicted to have a severity index of 2 or 1. The predicted severity index is typically lower than the observed severity index, indicating room for improvement in the accuracy of the accident severity prediction model.
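The tendency to predict a lower index than observed can be quantified by tallying the direction of each error. This is a minimal sketch under the assumption that observed and predicted indices are available as two equal-length lists; the function name is hypothetical.

```python
def error_direction(observed, predicted):
    """Count under-, exact and over-predictions, plus the mean signed error.

    A negative mean signed error means the model tends to underestimate severity.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    diffs = [p - o for o, p in zip(observed, predicted)]
    return {
        "under": sum(d < 0 for d in diffs),
        "exact": sum(d == 0 for d in diffs),
        "over": sum(d > 0 for d in diffs),
        "mean_signed_error": sum(diffs) / len(diffs),
    }


# Hypothetical severity indices for illustration only
print(error_direction([3, 4, 2, 1], [2, 2, 2, 2]))
# {'under': 2, 'exact': 1, 'over': 1, 'mean_signed_error': -0.5}
```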

6.3.1.3 Comparison between the actual and predicted severity of accidents based on location and chainages on the right-hand side (RHS)

The Tableau plot (Figure 5) displays accident data on the right side of the road. It shows the date, day of the week, accident location, observed accident severity index, and predicted accident severity index for each incident.

Figure 5.
Comparison between the actual and predicted severity of accidents based on location and chainages on the LHS.

The data are published, and the link to the Tableau graphs is provided under Tableau graphs visual availability (Section 9.3) [A-ii]. The plot effectively shows the spatial distribution of accidents and their severity over time, helping identify patterns and trends. The Tableau plot does not fit on an A4 sheet.

The majority of accidents have an observed severity index of 2 or 3, indicating moderate severity. However, the predicted accident severity index largely remains at 2, suggesting somewhat conservative predictions that do not fully capture the observed severity range.

External factors like traffic patterns or weather conditions may have a greater impact on the occurrence and severity of accidents than the day of the week. There seems to be no correlation between the day of the week and the frequency or severity of accidents.
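The visual impression that day of the week and severity are unrelated can be checked with a simple association measure. The sketch below computes Cramér's V from the chi-square statistic of the weekday-by-severity contingency table; it is an illustrative addition, not an analysis performed in the chapter, and the function name is hypothetical. With small samples V is biased upward, so values should be read with caution.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Cramér's V between two categorical sequences (0 = no association, 1 = perfect)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # observed cell counts
    row = Counter(xs)
    col = Counter(ys)
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n  # count expected under independence
            chi2 += (joint[(x, y)] - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical weekday and severity values for illustration only
days = ["Mon", "Mon", "Tue", "Tue"]
print(cramers_v(days, [1, 2, 1, 2]))  # 0.0 (severity spread evenly across days)
print(cramers_v(days, [1, 1, 2, 2]))  # 1.0 (severity fully determined by day)
```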

6.3.1.4 Comparison between the actual and predicted severity of accidents based on location and chainages on the left-hand side (LHS)

The data are published, and the link is provided under Tableau graphs visual availability (Section 9.3) [A-iii]. The graph displays the date, day of the week, and accident location on the left-hand side (LHS) of the road, along with the observed and predicted accident severity indices. The plot of the predicted data does not fit on an A4 sheet.

The majority of accidents on the left side of the road had a severity index of 2 or 3, indicating that most collisions were of moderate severity. Few instances of severity index 1 and 4 were observed.

The predictive model may be biased toward predicting less severe accidents, as the majority of cases had a predicted accident severity index of 2, with only a few instances of values 3 and 4.

The day of the week may not be a significant predictor of accident severity on the left side of the road, as accidents appeared to occur every day without a discernible pattern or trend.

There may not be a specific accident hotspot or concentration on the left-hand side of the road, as the accident locations were scattered along the roadway at various distances, measured by Accident Location-A Chainage km.
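Whether accidents cluster along the roadway can be examined by counting them in fixed-length chainage segments. The following is a minimal sketch, assuming chainages are given in kilometres as plain numbers; the function name, the 1 km segment length, and the sample values are hypothetical.

```python
from collections import Counter


def accidents_per_segment(chainages_km, segment_km=1.0):
    """Count accidents in consecutive road segments of fixed length.

    Keys are the starting chainage (km) of each segment.
    """
    counts = Counter()
    for km in chainages_km:
        start = int(km // segment_km) * segment_km  # segment containing this chainage
        counts[start] += 1
    return dict(sorted(counts.items()))


# Hypothetical chainages for illustration only
print(accidents_per_segment([10.2, 10.8, 11.1, 15.0, 10.5]))
# {10.0: 3, 11.0: 1, 15.0: 1}
```

Segments whose counts stand well above the rest would be candidate hotspots; roughly even counts would support the observation that no concentration exists.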

6.3.1.5 Data recording and availability

To enhance road safety modeling accuracy, India needs a more advanced data recording system for road accidents. This system should comply with MoRTH and IRC guidelines and utilize the Road Accident Recording and Reporting Formats. Digital monitoring can increase data collection frequency and minimize missing information. Machine learning can help recover (impute) missing data, further improving road safety modeling accuracy.
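Before any imputation, it helps to know how much information is actually missing in each recorded field. The sketch below audits a list of accident records; the field names and records are hypothetical and do not reproduce the MoRTH or IRC reporting formats.

```python
def missing_share(records, fields):
    """Fraction of records in which each field is missing (absent, None or empty string)."""
    total = len(records)
    return {
        field: sum(1 for r in records if r.get(field) in (None, "")) / total
        for field in fields
    }


# Hypothetical accident records for illustration only
records = [
    {"date": "18-02-2017", "severity": 2},
    {"date": "", "severity": None},
    {"severity": 3},
]
print(missing_share(records, ["date", "severity"]))
```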

6.3.2 Conclusion

Improving the model’s performance will require correcting the dataset imbalance and tuning the hyperparameters. The RF classifier predicted traffic accident severity with 67% accuracy on the training set and around 41.47% on the test set. It tended to underestimate severity, possibly due to bias in the training data. No clear link was found between the day of the week and accident occurrence or severity.

No discernible patterns or trends were observed in terms of accident location. The model frequently underestimated accident severity, although it accurately predicted it in some instances. External factors may have a greater influence on the occurrence and severity of accidents. The observed and predicted accident severity indices were compared against variables such as dates, times, and locations on both sides of the road.

To enhance road safety modeling, adopting a sophisticated data recording system in line with MoRTH and IRC recommendations is crucial. Digital monitoring of road accidents can boost data collection frequency and prevent vital information loss. Incorporating machine learning techniques can improve interventions and decision-making in traffic accident prevention and mitigation.

Our research in accident severity modeling stands out for leveraging Artificial Intelligence (AI) models, specifically the Random Forest (RF) algorithm. We focus on improving accuracy and providing tailored solutions for India’s road safety challenges. Our work is a standard for precise and reliable accident severity predictions with global applicability. This study contributes to the literature in this field.

6.3.3 Future scope

The study presented offers a solid foundation for future research in the area of road safety modeling and accident prevention on Indian highways. Despite the constraints of the current study, it highlights potential areas for further research, which will be explored in subsequent studies.

The study has recognized the presence of dataset bias and imbalance that could impact the performance of the model. Subsequent research will prioritize enhancing both the quality and quantity of data to mitigate bias and enhance model performance. This will entail investigating alternative data sources, refining data collection techniques, and resolving data quality concerns.

The study employed the Random Forest (RF) algorithm to construct a predictive model for the severity of traffic accidents. Future research will investigate the utilization of alternative machine learning algorithms or ensemble models to enhance the model’s performance. Furthermore, efforts will be made to refine hyperparameters and rectify dataset imbalance to enhance the accuracy of the model.
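One common way to counter dataset imbalance during training is to weight each class inversely to its frequency. The sketch below computes such weights as n_samples / (n_classes × class_count), the same heuristic that scikit-learn documents for its "balanced" class weighting. It is an illustration of the idea, not the procedure used in this study, and the labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in sorted(counts)}


# Hypothetical severity labels dominated by class 2 (illustration only)
labels = [2, 2, 2, 2, 2, 2, 3, 3, 1, 4]
print(balanced_class_weights(labels))
```

Rare classes receive weights above 1 and the dominant class a weight below 1, so misclassifying a rare severity level costs the learner more.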

The analysis of external factors in accidents was emphasized in the study, focusing on their influence on predicting accident severity. Future research should investigate the effects of external factors like weather conditions, road infrastructure, and driver behavior on accident severity. This research can improve the precision of predictive models and provide valuable insights for decision-making in accident prevention strategies.

The study emphasized the necessity of implementing an advanced data recording system that complies with the guidelines established by MoRTH and IRC. Subsequent research could concentrate on the creation of a real-time monitoring system capable of collecting road safety data instantly and offering valuable information for initiatives aimed at preventing accidents.

7. Integrating machine learning in road safety applications: a paradigm shift toward taking preventative measures

The use of machine learning in road safety applications represents a fundamental shift from reactive approaches to the prevention of high-risk incidents through data-driven interventions. Smart systems that predict and prevent road accidents by applying machine learning to large datasets help save lives and promote safer transport systems in general.

Predictive analytics in accident prevention: Prediction is central to road safety applications because it estimates the likelihood of events before they occur. ML algorithms can combine historical accident records with traffic patterns and other environmental information to predict where and when accidents are more likely to happen, so that preventive measures can be put in place. One such intervention is predictive policing, which deploys law enforcement officers, on foot or in vehicles, to specific high-risk areas where accidents are most common. This deployment takes place during daily peak hours and extends late into the night, with the exception of major routes leading out of town. Dynamic traffic management promptly modifies speed limits, lane arrangements, or signal timing in response to current traffic conditions, such as congestion on highways. Customized driver notifications send personalized alerts to drivers, based on their location, driving mode, and other factors, to inform them about potential hazards.

Road infrastructure design and maintenance: Machine learning methods can improve the design and maintenance of road infrastructure. They promote driving safety by enabling a more comprehensive examination of accident data alongside pavement conditions, road geometry, and traffic flow patterns. Dangerous spots must first be identified; these are usually locations made accident-prone by poor road design, inadequate maintenance, or insufficient road signs. Road design optimization then provides guidance on road geometry, the implementation of road safety features, and measures that enhance visibility in order to prevent traffic accidents. Maintenance is emphasized for the same reason: road maintenance agencies can proactively address road defects and infrastructure failures that would otherwise lead to avoidable accidents.

Advanced Driver Assistance Systems: Machine learning (ML) enabled Advanced Driver Assistance Systems (ADAS) supplement driving safety by acting as protective equipment inside the driver’s car. These systems rely on sensors, cameras, and advanced algorithms. Lane Departure Warning, for example, alerts drivers when their vehicles move out of their designated lanes, averting unintended lane departures. Adaptive Cruise Control (ACC) varies a car’s speed to keep it at a safe distance from the vehicle in front and to slow it in time when sudden stops occur, thereby reducing the chance of rear-end collisions. Automatic Emergency Braking (AEB) detects potential collisions and applies the brakes to prevent them or lessen their impact.

Customized driver feedback and education: Customized driver feedback and education are essential in addressing road safety concerns, especially now that road casualties resulting from careless driving or mechanical problems have become so common. Hazards to be taken into account include fog, rain, and ice, as well as construction zones and animal crossings. Machine learning enables more personalized driver feedback and a better driving experience, helping individuals adopt safe driving habits. It uses data from smartphones, in-car sensors, and telematics devices to measure driver behavior on the most dangerous sections of road. This allows the detection of risky driving patterns such as speeding, abrupt stops, and distracted driving. Personalized feedback helps persuade individuals to abandon dangerous driving practices in favor of safer ones. Gamification can further enhance road safety by rewarding drivers who consistently drive safely.

Factors to consider: Ensuring road safety through machine learning involves taking several factors into account. The performance of machine learning depends heavily on data quality and quantity, so the data used for learning must be accurate, comprehensive, and unbiased. Bias and fairness are a major concern: models can replicate biases present in their training data, leading to unjust or discriminatory outcomes for marginalized communities. Addressing model fairness and bias should therefore be a priority. Collecting and analyzing large amounts of data also raises privacy concerns, so the data must be handled transparently and ethically.

To sum up, machine learning has great potential to reduce road accidents significantly and improve road safety. The primary focus of this study is to investigate how machine learning algorithms can be applied to that end. Machine learning could transform road safety projects: if data are used proactively, rather than waiting for accidents to happen, fatalities can be reduced and the transportation system made safer. The approach has several advantages. Predictive analytics can identify possible risks in advance, for example the elevated risk of accidents in rainy conditions; its applications include predictive policing, dynamic traffic control, and personalized driver notifications. Analyzing crash data together with infrastructure data using machine learning algorithms identifies dangerous spots on roads, which informs design improvements and optimizes maintenance activities, resulting in much safer roads. Vehicle systems that rely on machine learning, such as advanced driver assistance systems (ADAS), enhance driver awareness and improve road safety; they offer functions such as adaptive cruise control, autonomous emergency braking, and lane departure warning that work for all drivers regardless of sex or age.

8. Conclusions

Improving safety on Indian roads by using predictive analysis and machine learning algorithms for accident severity modeling is a promising avenue for addressing the crucial issue of road traffic accidents. This study focuses on using data analytics to develop better strategies for preventing accidents and improving safety.

Machine learning approaches, and Random Forest (RF) classifiers in particular, have considerable potential for predicting accident severity and identifying risky zones. Challenges remain, notably imbalanced datasets and poor-quality data collection methods, but this study indicates that they are manageable and that such models can guide interventions meant to enhance road safety.

The research suggests that, given an adequate database structure and continuous adjustment of the machine learning models, the accuracy and consistency of accident severity predictions can be improved. Such predictions can inform policy-making, resource allocation, and the design of safer roads, taking into account factors such as road types and intersections examined in this research.

Integrating machine learning with robust data collection and analysis initiatives can enable stakeholders to move from a reactive approach to traffic safety management to a predictive one. This shift will help reduce both the frequency and the severity of the accidents recorded annually, and with them the deaths and financial losses caused by road accidents.

9. Data availability, software availability, tableau graphs visual availability

Ref. [15].

9.1 Data availability

The data for the highway stretches in the Accident Severity Prediction Modeling for Indian Highways case study are available in the Zenodo open-access repository for further analysis at https://doi.org/10.5281/zenodo.7773156 [16].

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

9.2 Software availability

https://github.com/humera-k/RF_Accident_Severity

https://zenodo.org/badge/latestdoi/616376786

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

9.3 Tableau graphs visual availability

Ref. [16].

[A-i] https://public.tableau.com/app/profile/humera.khanum/viz/Accidental_Analysis_1/Sheet52 (Comparative analysis of observed and predicted accident severity index against time)
[A-ii] https://public.tableau.com/app/profile/humera.khanum/viz/Accidental_Analysis_1/Sheet3 (Comparative analysis of observed and predicted accident severity index against location and chainages, right-hand side (RHS))
[A-iii] https://public.tableau.com/app/profile/humera.khanum/viz/Accidental_Analysis_1/Sheet4 (Comparative analysis of observed and predicted accident severity index against location and chainages, left-hand side (LHS))

Acknowledgments

Our sincere thanks are extended to the National Highways Authority of India and ILFS Engineering and Construction Company for their invaluable assistance in providing us with raw data on accidents. Their crucial support has played a key role in facilitating the execution of this research and analysis.

Conflict of interest

The authors declare no conflict of interest.

Abbreviations

RF	random forest
MoRTH	Ministry of Road Transport and Highways
NHAI	National Highways Authority of India

References

1. Global Status Report on Road Safety 2018. Geneva: World Health Organization; 2018
2. Patel M, Patel R. A study on causes of road accidents in India. International Journal of Engineering Research and Applications. 2013;3(6):1386-1391
3. Yan M, Shen Y. Traffic accident severity prediction based on random forest. Sustainability (Switzerland). 2022;14(3):1729. DOI: 10.3390/su14031729
4. Wang C, Quddus MA, Ison SG. The effect of traffic and road characteristics on road safety: A review and future research direction. Safety Science. 2013;57:264-275. DOI: 10.1016/j.ssci.2013.02.012
5. Barbosa P, Andrade M, Ferreira S. Machine learning applied to road safety modeling: A systematic literature review. Journal of Traffic and Transportation Engineering (English Edition). 2020;7(6):775-790. DOI: 10.1016/j.jtte.2020.07.004
6. Al-Mistarehi BW, Alomari AH, Imam R, et al. Using machine learning models to forecast severity level of traffic crashes by R studio and ArcGIS. Frontiers in Built Environment. 2022;8:1-14. DOI: 10.3389/fbuil.2022.860805
7. Lord D, Mannering F. The statistical analysis of crash-frequency data: A review and assessment of methodological alternatives. Transportation Research Part A: Policy and Practice. 2010;44:291-305. DOI: 10.1016/j.tra.2010.02.001
8. Ramanujam V, Bhalla K. Speeding on Indian roads: A survey of Indian drivers. Accident Analysis and Prevention. 2009;41(3):527-532. DOI: 10.1016/j.aap.2009.01.009
9. Daniel MC, Woo KH. Risky behaviors and road safety: An exploration of age and gender influences on road accident rates. PLoS One. 2024;19(1):e0296663. DOI: 10.1371/journal.pone.0296663
10. Hoang Long V, Ahmed K, Ma W. A random forest approach to predicting traffic accident severity. IEEE Access. 2021;9:1219-1232. DOI: 10.1109/ACCESS.2021.3098040
11. Singh G, Kumar A. Random forest-based prediction model for traffic accident severity on Indian highways. Journal of Traffic and Transportation Engineering (English Edition). 2021;8(6):693-706. DOI: 10.1016/j.jtte.2021.05.012
12. Road Safety Manual: A Guide for Practitioners. Road Safety Management. PIARC. Version 1-20/10/2015; 2019. p. 36. Available from: https://roadsafety.piarc.org/en/road-safety-management
13. Damodariya SM, Patel CR. Identification of factors causing risky driving behavior on high-speed multi-lane highways in India through principal component analysis. International Journal of Engineering. 2022;35(11):2130-2138
14. Santos D, Saias J, Quaresma P, Nogueira VB. Machine learning approaches to traffic accident analysis and hotspot prediction. Computers. 2021;10(12):157
15. Humera K, Anshul G, Iqbal FM. Accident severity prediction modeling for road safety using random forest algorithm: An analysis of Indian highways. F1000Research. 2023;12:494. DOI: 10.12688/f1000research.133594.2
16. Khanum H, Garg A, Faheem MI. Data for Accident Severity Prediction Modelling for Indian Highways Case Study (Accidentdata_V1). Zenodo; 2023. DOI: 10.5281/zenodo.7773156

[1] 1. Global Status Report on Road Safety 2018. Geneva: World Health Organization; 2018

[2] 2. Patel M, Patel R. A study on causes of road accidents in India. International Journal of Engineering Research and Applications. 2013;3(6):1386-1391

[3] 3. Yan M, Shen Y. Traffic accident severity prediction based on random forest. Sustainability (Switzerland). 2022;14(3):2. DOI: 10.3390/su14031729

[4] 4. Wang C, Quddus MA, Ison SG. The effect of traffic and road characteristics on road safety: A review and future research direction. Safety Science. 2013. ISSN: 09257535;57:264-275. DOI: 10.1016/j.ssci.2013.02.012

[5] 5. Barbosa P, Andrade M, Ferreira S. Machine learning applied to road safety modeling: A systematic literature review. Journal of Traffic and Transportation Engineering (English Edition). 2020;7(6):775-790. DOI: 10.1016/j.jtte.2020.07.004

[6] 6. Al-Mistarehi BW, Alomari AH, Imam R, et al. Using machine learning models to forecast severity level of traffic crashes by R studio and ArcGIS. Frontiers in Built Environment. 2022;8:1-14. DOI: 10.3389/fbuil.2022.860805

[7] 7. Lord D, Mannering F. The statistical analysis of crash-frequency data: A review and assessment of methodological alternatives. Transportation Research Part A: Policy and Practice. 2010;44:291-305. DOI: 10.1016/j.tra.2010.02.001

[8] 8. Ramanujam V, Bhalla K. Speeding on Indian roads: A survey of Indian drivers. Accident Analysis and Prevention. 2009;41(3):527-532. DOI: 10.1016/j.aap.2009.01.009

[9] 9. Daniel MC, Woo KH. Risky behaviors and road safety: An exploration of age and gender influences on road accident rates. PLoS One. 2024;19(1):e0296663. DOI: 10.1371/journal.pone.0296663

[10] 10. Hoang Long V, Ahmed K, Ma W. A random forest approach to predicting traffic accident severity. IEEE Access. 2021;9:1219-1232. DOI: 10.1109/ACCESS.2021.3098040

[11] 11. Singh G, Kumar A. Random forest-based prediction model for traffic accident severity on Indian highways. Journal of Traffic and Transportation Engineering (English Edition). 2021;8(6):693-706. DOI: 10.1016/j.jtte.2021.05.012

[12] 12. Road Safety Manual a Guide for Practitioners: Road Safefy Management. PIARC. Version 1-20/10/2015; 2019. p. 36. Available from: https://roadsafety.piarc.org/en/road-safety-management

[13] 13. Damodariya SM, Patel CR. Identification of factors causing risky driving behavior on high-speed multi-lane highways in India through principal component analysis. International Journal of Engineering. 2022;35(11):2130-2138

[14] 14. Santos D, Saias J, Quaresma P, Nogueira VB. Machine learning approaches to traffic accident analysis and hotspot prediction. Computers. 2021;10(12):157

[15] 15. Humera K, Anshul G, Iqbal FM. Accident severity prediction modeling for road safety using random forest algorithm: An analysis of Indian highways. F1000Research. 2023;12:494. DOI: 10.12688/f1000research.133594.2

[16] 16. Khanum H, Garg A, Faheem MI. Data for Accident Severity Prediction Modelling for Indian Highways Case Study (Accidentdata_V1). Zenodo; 2023. DOI: 10.5281/zenodo.7773156