A Method for Predicting the Type and Severity of Freeway Accidents Based on XGBoost
Abstract: Freeway accidents are frequent, and previous studies have not adequately revealed how dynamic traffic flow characteristics affect accident type and severity. This study therefore develops a method for predicting the type and severity of freeway accidents from dynamic traffic flow data. Traffic flow variables, including volume, density, and speed, are extracted from freeway gantry data, together with temporal features and features describing temporal and spatial non-uniformity. These data are matched with accident records to form the full sample. Models based on the extreme gradient boosting (XGBoost) algorithm are built to predict whether an accident occurs, the accident type, and the accident severity. Two accident types (rear-end collisions and other accidents) and two severity levels (injury or fatal accidents and property-damage-only accidents) are considered. The results indicate that: (1) accident risk is higher when the speed difference between upstream and downstream traffic is large, speeds are low, and segment traffic volume is high with frequent diverging and merging; (2) rear-end accidents are more likely when speeds are low, many vehicles are on the segment, merging and diverging volumes are high, and the upstream-downstream speed difference is large; (3) rear-end collisions may be more severe when they occur on road segments with lower traffic volumes, or on weekends or at night. The area under the ROC curve (AUC) of the XGBoost-based accident type and accident severity models reached 0.76 and 0.88, respectively. Compared with other commonly used algorithms (Sequential Logistic, Gaussian Naive Bayes, linear support vector machine (SVM), random forest, and neural network), this is an average AUC improvement of 0.08 and 0.24, respectively, which indicates that the XGBoost-based models have better prediction performance. The findings provide a reliable means of issuing real-time warnings about traffic flow states on freeway segments, which could help improve driving safety.
Key words:
- traffic safety
- freeway
- prediction of accident types
- prediction of accident severity
- XGBoost
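Table 1. Description of variable meanings

| Name | Meaning |
| --- | --- |
| Night | Time of the sample (1 if night; 0 if daytime) |
| DayOfWeek | Date of the sample (1 if weekend; 0 if weekday) |
| LossNum | Traffic volume that passes this cross-section but not the downstream cross-section (can be regarded as diverging volume, veh/min) |
| NewNum | Traffic volume that does not pass this cross-section but passes the downstream cross-section (can be regarded as merging volume, veh/min) |
| Dens | Average density in the corresponding minute [veh/(km·lane)] |
| CellFlow | Total segment flow [veh/(h·lane)] |
| SecCount | Per-minute cross-section traffic volume (veh/min) |
| InputGini | Gini coefficient of vehicle arrivals (a proxy for headway; indicates temporal non-uniformity) |
| FlowRateRatio | Per-minute cross-section flow rate ratio (per-minute cross-section volume × 60 × 24 / daily cumulative volume) |
| DailyCountRatio | Daily cross-section volume ratio (ratio of the daily cumulative volume to its mean) |
| CvLaneNum | Coefficient of variation of lane volumes (indicates lateral spatial non-uniformity) |
| DensNum | Number of vehicles on the segment at the end of the corresponding minute (veh) |
| Conservation | Difference between total inflow and total outflow of the segment (veh/min; longitudinal spatial non-uniformity) |
| MeanSpeed | Mean speed at which vehicles passing this gantry reach the next gantry (km/h) |
| SpeedDifference | Speed difference between upstream and downstream (km/h) |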
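Several variables in Table 1 are defined by simple formulas. The following is a minimal Python sketch of four of them, under stated assumptions: the function names and their inputs are hypothetical, the Gini coefficient uses the common mean-absolute-difference form applied to arrival counts per sub-interval, and the coefficient of variation uses the population standard deviation. The source does not specify these choices.

```python
from statistics import mean, pstdev


def flow_rate_ratio(minute_count, daily_count):
    """FlowRateRatio: per-minute volume * 60 * 24 / daily cumulative volume."""
    return minute_count * 60 * 24 / daily_count


def lane_cv(lane_counts):
    """CvLaneNum: coefficient of variation of per-lane volumes.

    Assumption: population standard deviation divided by the mean.
    """
    return pstdev(lane_counts) / mean(lane_counts)


def gini(values):
    """InputGini: Gini coefficient of non-negative arrival counts.

    Assumption: G = sum_i sum_j |x_i - x_j| / (2 * n * sum(x)).
    Returns 0.0 for empty or all-zero input.
    """
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    abs_diff_sum = sum(abs(a - b) for a in values for b in values)
    return abs_diff_sum / (2 * n * total)


def conservation(total_inflow, total_outflow):
    """Conservation: total segment inflow minus total outflow (veh/min)."""
    return total_inflow - total_outflow


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the study's data.
    print(flow_rate_ratio(minute_count=10, daily_count=14400))
    print(lane_cv([12, 9, 6]))
    print(gini([0, 0, 0, 4]))
    print(conservation(total_inflow=35, total_outflow=32))
```

A Gini coefficient of 0 corresponds to perfectly even arrivals, and values closer to 1 indicate arrivals concentrated in a few sub-intervals, which is how the variable expresses temporal non-uniformity.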
Table 2. Feature selection results

| Features used by Model 1 | Features used by Model 2 | Features used by Model 3 |
| --- | --- | --- |
| NewNum | MeanSpeed | DensNum |
| LossNum | LossNum | DayOfWeek |
| MeanSpeed | FlowRateRatio | CellFlow |
| CvLaneNum | InputGini | Night |
| SecCount | SpeedDifference | DailyCountRatio |
| SpeedDifference | NewNum | LossNum |
| DayOfWeek | Dens | Conservation |
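Table 3. Optimal parameter values for XGBoost models

| Model | Number of base learners (n_estimators) | Learning rate (learning_rate) | Maximum tree depth (max_depth) | Minimum child weight (min_child_weight) | Fraction of training samples used (subsample) | Column sampling rate (colsample_bytree) | Minimum loss reduction (gamma) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Model 1 | 500 | 0.13 | 4 | 3 | 0.88 | 0.9 | 0.6 |
| Model 2 | 400 | 0.01 | 8 | 6 | 0.75 | 0.75 | 6 |
| Model 3 | 400 | 0.25 | 8 | 3 | 0.9 | 0.8 | 0.7 |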
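The parameter names in Table 3 match keyword arguments of `XGBClassifier` in the `xgboost` Python package, so the table can be transcribed into configuration dictionaries. The sketch below assumes that package and a binary classification setup; the dictionary name `TABLE3_PARAMS`, the keys `model_1` to `model_3`, and the helper `build_classifier` are hypothetical, and the source does not state which implementation the authors used.

```python
# Hyperparameters transcribed from Table 3 (one dictionary per model).
TABLE3_PARAMS = {
    "model_1": {
        "n_estimators": 500,
        "learning_rate": 0.13,
        "max_depth": 4,
        "min_child_weight": 3,
        "subsample": 0.88,
        "colsample_bytree": 0.9,
        "gamma": 0.6,
    },
    "model_2": {
        "n_estimators": 400,
        "learning_rate": 0.01,
        "max_depth": 8,
        "min_child_weight": 6,
        "subsample": 0.75,
        "colsample_bytree": 0.75,
        "gamma": 6,
    },
    "model_3": {
        "n_estimators": 400,
        "learning_rate": 0.25,
        "max_depth": 8,
        "min_child_weight": 3,
        "subsample": 0.9,
        "colsample_bytree": 0.8,
        "gamma": 0.7,
    },
}


def build_classifier(model_name):
    """Return an unfitted XGBClassifier configured with the Table 3 values.

    Assumption: the xgboost package is installed. It is imported here,
    inside the function, so the dictionaries above stay usable without it.
    """
    from xgboost import XGBClassifier

    return XGBClassifier(**TABLE3_PARAMS[model_name])
```

A classifier built this way would still need to be fitted on the feature columns listed for the corresponding model in Table 2, for example `build_classifier("model_2").fit(X_train, y_train)` with user-supplied training data.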
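Table 4. Comparison of the models' results

| Prediction algorithm | Accident risk model: accuracy | Accident risk model: AUC | Accident type model: accuracy | Accident type model: AUC | Accident severity model: accuracy | Accident severity model: AUC |
| --- | --- | --- | --- | --- | --- | --- |
| XGBoost | 0.97 | 0.96 | 0.72 | 0.76 | 0.94 | 0.88 |
| Sequential Logistic | 0.92 | 0.70 | 0.65 | 0.71 | 0.94 | 0.72 |
| Gaussian Naive Bayes | 0.83 | 0.67 | 0.62 | 0.69 | 0.84 | 0.54 |
| Linear SVM | 0.91 | 0.72 | 0.64 | 0.69 | 0.94 | 0.61 |
| Random Forest | 0.92 | 0.76 | 0.61 | 0.66 | 0.94 | 0.67 |
| Neural Network | 0.92 | 0.74 | 0.61 | 0.67 | 0.88 | 0.66 |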
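The average AUC improvements quoted in the abstract can be checked against Table 4. The short sketch below recomputes them as the XGBoost AUC minus the mean AUC of the five other algorithms; the names `AUC` and `mean_auc_gain` and the task keys are hypothetical, and the values are transcribed from the table.

```python
from statistics import mean

# AUC values transcribed from Table 4, keyed by algorithm and prediction task.
AUC = {
    "XGBoost": {"risk": 0.96, "type": 0.76, "severity": 0.88},
    "Sequential Logistic": {"risk": 0.70, "type": 0.71, "severity": 0.72},
    "Gaussian Naive Bayes": {"risk": 0.67, "type": 0.69, "severity": 0.54},
    "Linear SVM": {"risk": 0.72, "type": 0.69, "severity": 0.61},
    "Random Forest": {"risk": 0.76, "type": 0.66, "severity": 0.67},
    "Neural Network": {"risk": 0.74, "type": 0.67, "severity": 0.66},
}


def mean_auc_gain(task, reference="XGBoost"):
    """AUC of the reference algorithm minus the mean AUC of all other algorithms."""
    baselines = [scores[task] for name, scores in AUC.items() if name != reference]
    return AUC[reference][task] - mean(baselines)


if __name__ == "__main__":
    for task in ("type", "severity"):
        print(task, round(mean_auc_gain(task), 2))
```

For the accident type and accident severity models the gains round to 0.08 and 0.24, which agrees with the figures reported in the abstract.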