A Short-term Prediction Model for Taxi Speed Based on XGBoost
-
Abstract: Accurate short-term prediction of taxi speed is a prerequisite for identifying drivers' abnormal acceleration and deceleration behavior in advance, and it helps improve passenger safety and comfort. Taking the real-time speed of taxis in a city as the research object, this study proposes a short-term taxi speed prediction model based on Extreme Gradient Boosting (XGBoost). The taxi speed dataset is divided into a training set and a test set, and a sliding time window is constructed: the time series of historical taxi speeds within the window is the input variable, and the taxi speed at the current time is the output variable. The model is evaluated by walk-forward validation. The hyperopt module, which is based on a Bayesian optimization algorithm, is used to optimize the model parameters quickly and obtain the optimal parameter combination. A case study is conducted on a taxi GPS trajectory dataset collected in Shenzhen on October 22, 2013, and the predictions of the proposed model are compared with those of a non-parametric regression model and a neural network model. The results show that the proposed model achieves a mean absolute error (MAE) of 9.841 and a root mean square error (RMSE) of 12.711, both lower than those of the non-parametric regression model and the neural network model, which improves the accuracy of taxi speed prediction. Because the taxi speed sequence lacks regularity, the adjusted R² (R²_adjusted) is 0.592, comparable to that of the non-parametric regression model (0.595) and higher than that of the neural network model (0.504). In addition, compared with the two other models, the XGBoost model fits better around the time points at which the taxi speed changes sharply, avoiding the loss of prediction accuracy caused by overfitting.
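The sliding-window framing and walk-forward validation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the window length of 3, the toy speed values, and the helper names `make_windows`, `walk_forward` and `last_value_model` are hypothetical, the expanding-window refit is one common variant of walk-forward validation (the paper's exact variant is not stated here), and a naive last-value predictor stands in for XGBoost so the sketch runs with the standard library only.

```python
from typing import Callable, List, Sequence, Tuple

Window = List[float]
# A "fit" function takes training pairs and returns a one-step predictor.
FitFn = Callable[[List[Window], List[float]], Callable[[Window], float]]


def make_windows(speeds: Sequence[float], window: int) -> Tuple[List[Window], List[float]]:
    """Turn a speed series into (input window, current speed) pairs.

    Each input holds the `window` most recent historical speeds; the target
    is the speed at the time step immediately after them.
    """
    X, y = [], []
    for t in range(window, len(speeds)):
        X.append(list(speeds[t - window:t]))
        y.append(speeds[t])
    return X, y


def last_value_model(X: List[Window], y: List[float]) -> Callable[[Window], float]:
    """Hypothetical stand-in for XGBoost: ignores training data, repeats the latest speed."""
    return lambda x: x[-1]


def walk_forward(speeds: Sequence[float], window: int, n_train: int, fit: FitFn) -> List[float]:
    """Walk-forward validation with an expanding training set.

    The first `n_train` windowed samples form the initial training set. Each
    later sample is predicted by a model fitted only on the samples before it,
    so no future information leaks into training.
    """
    X, y = make_windows(speeds, window)
    predictions = []
    for i in range(n_train, len(X)):
        predict = fit(X[:i], y[:i])   # train on the past only
        predictions.append(predict(X[i]))
    return predictions


speeds = [30.0, 32.0, 35.0, 33.0, 20.0, 18.0, 25.0, 40.0]  # toy speed values
preds = walk_forward(speeds, window=3, n_train=2, fit=last_value_model)
print(preds)  # [20.0, 18.0, 25.0]
```

Refitting at every step mirrors walk-forward validation exactly but can be costly with a 1 000-tree XGBoost model; refitting only every few steps is a possible compromise.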
-
Key words:
- urban transportation
- taxi speed
- short-term prediction
- XGBoost
-
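Table 1. XGBoost model parameter settings

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Booster | GBtree | Gamma | 0.05 |
| Learning rate | 0.05 | Column sampling ratio | 0.6 |
| Number of iterations | 1000 | Subsample ratio | 0.6 |
| Maximum tree depth | 6 | Minimum sum of leaf-node weights | 3 |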
Table 2. Model prediction performance evaluation

| Model | MAE | RMSE | R²_adjusted |
|---|---|---|---|
| Non-parametric regression model | 13.660 | 20.279 | 0.595 |
| Neural network model | 13.817 | 18.347 | 0.504 |
| XGBoost model | 9.841 | 12.711 | 0.592 |
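The three criteria in Table 2 can be computed as in the following sketch. MAE is the mean of |yᵢ − ŷᵢ|, RMSE is the square root of the mean of (yᵢ − ŷᵢ)², and the adjusted R² applies the standard correction R²_adj = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of test samples and p the number of predictors. Taking p as the sliding-window length is an assumption, and the toy numbers are illustrative only, unrelated to the Shenzhen results.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(y_true: Sequence[float], y_pred: Sequence[float], n_predictors: int) -> float:
    """R² penalized for the number of predictors (assumed: the window length)."""
    n = len(y_true)
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more samples than predictors + 1")
    return 1.0 - (1.0 - r2(y_true, y_pred)) * (n - 1) / (n - n_predictors - 1)


y_true = [10.0, 20.0, 30.0, 40.0, 50.0]   # toy observed speeds
y_pred = [12.0, 18.0, 33.0, 37.0, 50.0]   # toy predicted speeds
print(mae(y_true, y_pred), rmse(y_true, y_pred), adjusted_r2(y_true, y_pred, n_predictors=2))
```

Because the adjusted R² shrinks as p grows, a longer input window must improve the raw fit enough to offset the penalty.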