Identification of Bunching State of Bus Lines Based on a LightGBM Model
-
Abstract: The actual headway between adjacent buses on the same line can become significantly shorter than the scheduled departure interval because of road conditions and other factors, so that adjacent buses arrive at the same stop within a short time. This phenomenon is known as "bus bunching". Identifying the bunching state of a bus line (bunching or non-bunching) is key to improving the operational stability of urban bus services. A LightGBM model whose hyperparameters are tuned by Bayesian optimization is proposed and applied to identify the bunching state. First, 20 candidate factors that may influence the bunching state are selected from five aspects: stops, operation, passengers, time, and weather. The Spearman correlation test and the variance inflation factor are used to diagnose multicollinearity among them. A binary Logit model is then built to identify the significant factors, and these factors are used to construct the LightGBM model. The hyperparameters of the LightGBM model, which determine the model's attributes and its training process, are tuned by Bayesian optimization and by random search, respectively. Finally, bus operation data from Xi'an, China, are used to validate the model. The efficiency of the two tuning methods is compared, and the identification accuracy of the proposed LightGBM model is compared with that of XGBoost, random forest (RF), decision tree (DT), and AdaBoost models. The results show that: (1) the number of boarding passengers, the number of signal lights, the number of commercial districts within a short range, the short-range driving distance on main roads, and the congestion delay index have a significant effect on the bunching state; (2) the LightGBM model tuned by Bayesian optimization is 1.31 percentage points more accurate than the same model tuned by random search; (3) the Bayesian-optimized LightGBM model correctly identifies the bunching state (bunching or non-bunching) in 82.89% of cases, outperforming the comparison models.
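The abstract defines bunching as an actual headway that is much shorter than the scheduled departure interval. A minimal Python sketch of that labelling step is given here; the arrival times are made up for illustration, and the `ratio` cut-off is a hypothetical value, since the abstract does not state the threshold the authors used.

```python
from datetime import datetime


def headways_minutes(arrivals):
    """Return headways (in minutes) between consecutive arrivals at one stop."""
    ordered = sorted(arrivals)
    return [(b - a).total_seconds() / 60.0 for a, b in zip(ordered, ordered[1:])]


def label_bunching(headways, scheduled_interval, ratio=0.25):
    """Label each headway 1 (bunching) or 0 (non-bunching).

    `ratio` is a hypothetical cut-off, not a value taken from the paper:
    a headway below ratio * scheduled_interval counts as bunching.
    """
    threshold = ratio * scheduled_interval
    return [1 if h < threshold else 0 for h in headways]


# Illustrative arrival times of four buses of one line at the same stop.
arrivals = [
    datetime(2017, 7, 5, 8, 0),
    datetime(2017, 7, 5, 8, 9),
    datetime(2017, 7, 5, 8, 10),
    datetime(2017, 7, 5, 8, 20),
]
headways = headways_minutes(arrivals)
labels = label_bunching(headways, scheduled_interval=10.0)
```

With a 10-minute scheduled interval, only the second headway (1 minute) falls below the assumed cut-off and is labelled as bunching.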
-
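Table 1. Description of influencing factors

| Symbol | Factor | Aspect | Type | Description | Source |
|---|---|---|---|---|---|
| F | Boarding passengers (persons) | Passenger | Cumulative | Total boardings on all buses of the line during the observation period, from the origin stop to the stop immediately before the observed stop | Refs. [8-10] |
| f | Short-range boarding passengers (persons) | Passenger | Immediate | Between the previous and the current observed stop: total boardings on all buses of the line during the observation period over the nearest 50% of upstream stops, up to the stop immediately before the observed stop | Refs. [8-10] |
| T | Time period | Time | | 0 = off-peak; 1 = peak | Ref. [10] |
| L | Bus lane length (km) | Operation | Cumulative | Length of dedicated bus lanes traversed from the origin stop to the observed stop | Ref. [6] |
| lp | Short-range bus lane proportion | Operation | Immediate | Between the previous and the current observed stop: ratio of dedicated bus lane length to total route length over the nearest 50% of upstream stops | Ref. [6] |
| R | Main-road driving distance (km) | Operation | Cumulative | Distance driven on main roads from the origin stop to the observed stop | Expert knowledge and experience, selected through interviews |
| r | Short-range main-road driving distance (km) | Operation | Immediate | Between the previous and the current observed stop: distance driven on main roads over the nearest 50% of upstream stops | Expert knowledge and experience, selected through interviews |
| TL | Left turns (count) | Operation | Cumulative | Number of left turns made from the origin stop to the observed stop | Ref. [6] |
| tl | Short-range left turns (count) | Operation | Immediate | Between the previous and the current observed stop: number of left turns made over the nearest 50% of upstream stops | Ref. [6] |
| S | Signal lights (count) | Operation | Cumulative | Number of signal lights passed from the origin stop to the observed stop | Refs. [7-8] |
| s | Short-range signal lights (count) | Operation | Immediate | Between the previous and the current observed stop: number of signal lights passed over the nearest 50% of upstream stops | Refs. [7-8] |
| M | Mileage (km) | Operation | Cumulative | Route length travelled from the origin stop to the observed stop | Ref. [7] |
| m | Short-range mileage (km) | Operation | Immediate | Between the previous and the current observed stop: route length travelled over the nearest 50% of upstream stops | Ref. [7] |
| B | Commercial districts (count) | Operation | Cumulative | Number of commercial districts larger than 30,000 m² passed from the origin stop to the observed stop | Expert knowledge and experience, selected through interviews |
| b | Short-range commercial districts (count) | Operation | Immediate | Between the previous and the current observed stop: number of commercial districts larger than 30,000 m² passed over the nearest 50% of upstream stops | Expert knowledge and experience, selected through interviews |
| c | Congestion delay index | Operation | Immediate | Between the previous and the current observed stop: mean congestion delay index of the road segments over the nearest 50% of upstream stops, up to the stop immediately before the observed stop | Ref. [8] |
| D | Stops from the origin (count) | Stop | Cumulative | Number of stops passed from the origin stop to the observed stop | Ref. [7] |
| U | Metro transfer stops (count) | Stop | Cumulative | Number of bus stops with a metro transfer point within 500 m, passed from the origin stop to the observed stop | Expert knowledge and experience, selected through interviews |
| u | Short-range metro transfer stops (count) | Stop | Immediate | Between the previous and the current observed stop: number of bus stops with a metro transfer point within 500 m over the nearest 50% of upstream stops | Expert knowledge and experience, selected through interviews |
| W | Weather | Weather | | 0 = sunny; 1 = rainy | Expert knowledge and experience, selected through interviews |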
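Table 2. Bus IC card data description

| Field name | Meaning | Description | Example |
|---|---|---|---|
| BUSID | Vehicle ID | ID number of the bus | 5083 |
| TRADEDATETIME | Transaction date | Time at which the passenger swiped the card on boarding | 2017/7/5 08:04:45 |
| LINENAME | Line name | Line on which the bus operates | 38路 (Route 38) |
| STATIONNAME | Station name | Name of the bus stop | 朝阳门 (Chaoyangmen) |
| STATIONID | Station ID | ID number of the bus stop | 66 |
| USERTYPE | Discount type | Type of transit card used by the passenger | 1 |
| BUSNAME | Vehicle number | Vehicle information (licence plate) | 陕AK7160 |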
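Table 3. Spearman correlation test results

| | r | b | S | F | tl | lp | s | u | c | W | T | f | L | R | TL | M | m | B | D | U |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| r | 1.000 | 0.476 | 0.136 | -0.217 | 0.296 | -0.474 | 0.367 | 0.412 | 0.233 | -0.023 | -0.117 | 0.217 | 0.393 | 0.380 | -0.003 | 0.098 | 0.856 | 0.292 | 0.005 | 0.485 |
| b | | 1.000 | -0.291 | -0.329 | 0.172 | -0.379 | 0.471 | 0.340 | 0.139 | 0.005 | -0.010 | 0.270 | -0.285 | -0.285 | 0.003 | -0.386 | 0.859 | -0.153 | -0.380 | -0.079 |
| S | | | 1.000 | 0.374 | 0.398 | -0.453 | 0.059 | -0.412 | 0.225 | -0.004 | -0.038 | -0.158 | 0.479 | 0.933 | 0.490 | 0.572 | -0.150 | 0.955 | 0.944 | 0.356 |
| F | | | | 1.000 | 0.028 | -0.178 | -0.158 | -0.397 | 0.173 | 0.252 | 0.297 | 0.604 | 0.329 | 0.461 | 0.413 | 0.538 | -0.216 | 0.493 | 0.567 | 0.655 |
| tl | | | | | 1.000 | -0.384 | 0.217 | -0.340 | 0.025 | -0.009 | 0.019 | -0.030 | 0.336 | 0.396 | 0.786 | 0.396 | 0.301 | 0.408 | 0.339 | 0.082 |
| lp | | | | | | 1.000 | -0.178 | 0.395 | 0.044 | 0.163 | -0.026 | -0.077 | -0.323 | -0.355 | -0.485 | -0.392 | -0.360 | -0.620 | -0.348 | -0.203 |
| s | | | | | | | 1.000 | 0.303 | -0.082 | -0.142 | -0.024 | 0.031 | -0.069 | -0.015 | 0.250 | -0.010 | 0.635 | 0.035 | -0.074 | -0.010 |
| u | | | | | | | | 1.000 | 0.254 | 0.001 | -0.113 | -0.062 | -0.193 | -0.193 | -0.483 | -0.400 | 0.412 | -0.469 | -0.400 | 0.215 |
| c | | | | | | | | | 1.000 | 0.034 | 0.011 | -0.096 | 0.461 | 0.468 | -0.034 | 0.353 | -0.043 | 0.245 | 0.244 | 0.350 |
| W | | | | | | | | | | 1.000 | 0.078 | 0.330 | 0.042 | 0.007 | -0.008 | 0.008 | -0.018 | -0.008 | 0.008 | -0.007 |
| T | | | | | | | | | | | 1.000 | 0.302 | -0.109 | -0.090 | 0.101 | -0.058 | -0.086 | -0.007 | 0.004 | -0.195 |
| f | | | | | | | | | | | | 1.000 | 0.287 | 0.259 | 0.143 | 0.227 | 0.069 | 0.054 | 0.084 | -0.398 |
| L | | | | | | | | | | | | | 1.000 | 0.986 | 0.361 | 0.950 | -0.068 | 0.844 | 0.846 | 0.813 |
| R | | | | | | | | | | | | | | 1.000 | 0.457 | 0.964 | -0.082 | 0.871 | 0.895 | 0.758 |
| TL | | | | | | | | | | | | | | | 1.000 | 0.554 | 0.003 | 0.665 | 0.649 | 0.151 |
| M | | | | | | | | | | | | | | | | 1.000 | 0.192 | 0.927 | 0.931 | 0.628 |
| m | | | | | | | | | | | | | | | | | 1.000 | -0.086 | -0.325 | 0.210 |
| B | | | | | | | | | | | | | | | | | | 1.000 | 0.930 | 0.462 |
| D | | | | | | | | | | | | | | | | | | | 1.000 | 0.415 |
| U | | | | | | | | | | | | | | | | | | | | 1.000 |

Note: Shaded cells in the original table mark significant strong correlations.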
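The coefficients in Table 3 are Spearman rank correlations, that is, Pearson correlations computed on the ranks of the observations. A minimal, dependency-free Python sketch of that computation is shown here; the function names are illustrative and the authors' actual tooling is not stated in the source.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Applied to every pair of factor columns, `spearman` would fill one cell of the upper triangle of a matrix such as Table 3.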
Table 4. Variance inflation factor test results

| Variable | Tolerance | VIF |
|---|---|---|
| r | 0.103 | 9.671 |
| b | 0.202 | 4.952 |
| S | 0.185 | 5.413 |
| F | 0.387 | 2.583 |
| tl | 0.350 | 2.857 |
| lp | 0.122 | 8.171 |
| s | 0.360 | 2.779 |
| u | 0.103 | 9.750 |
| c | 0.596 | 1.677 |
| W | 0.773 | 1.294 |
| T | 0.753 | 1.328 |
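The tolerance and VIF columns of Table 4 are reciprocals of one another: VIF = 1 / tolerance, where tolerance = 1 − R² from regressing one factor on all the others. A small Python sketch, using only the values printed in Table 4, checks that relationship up to rounding; the variable names are illustrative.

```python
# (tolerance, VIF) pairs as printed in Table 4.
TABLE_4 = {
    "r": (0.103, 9.671),
    "b": (0.202, 4.952),
    "S": (0.185, 5.413),
    "F": (0.387, 2.583),
    "tl": (0.350, 2.857),
    "lp": (0.122, 8.171),
    "s": (0.360, 2.779),
    "u": (0.103, 9.750),
    "c": (0.596, 1.677),
    "W": (0.773, 1.294),
    "T": (0.753, 1.328),
}


def vif_from_r_squared(r_squared):
    """VIF of a factor whose regression on the other factors has this R^2."""
    return 1.0 / (1.0 - r_squared)


def is_consistent(tolerance, vif, half_unit=0.0005):
    """True if 1 / VIF matches the tolerance once rounded to three decimals."""
    return abs(1.0 / vif - tolerance) <= half_unit


consistency = {name: is_consistent(tol, vif) for name, (tol, vif) in TABLE_4.items()}
largest_vif = max(vif for _, vif in TABLE_4.values())
```

Every pair in the table passes the check, and the largest VIF among the retained factors is 9.750 (factor u).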
Table 5. Results of the HL test

| Statistic | Value |
|---|---|
| Chi-square | 10.461 |
| Degrees of freedom | 8 |
| Significance | 0.234 |
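The significance reported for the HL test (taken here to be the Hosmer-Lemeshow goodness-of-fit test) is the upper-tail probability of a chi-square distribution. For an even number of degrees of freedom this tail has a closed form, so the reported value can be reproduced with a short Python sketch; the function name is illustrative.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with even df.

    For df = 2m the tail equals exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


# Statistic and degrees of freedom as printed in Table 5.
p_value = chi2_sf_even_df(10.461, 8)
```

The result agrees with the significance of 0.234 in Table 5 to three decimals. Because this exceeds the conventional 0.05 level, the test does not reject the hypothesis that the Logit model fits the data adequately.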
Table 6. Logit model regression results

| Variable | β | Sig. |
|---|---|---|
| F | 0.006 | < 0.001 |
| T(1) | 0.303 | 0.362 |
| lp | -1.261 | 0.511 |
| r | -1.644 | 0.007 |
| tl | 0.063 | 0.834 |
| S | 0.093 | 0.037 |
| s | 0.114 | 0.269 |
| b | 1.107 | 0.001 |
| c | 0.449 | 0.004 |
| u | 0.823 | 0.447 |
| W(1) | 0.102 | 0.716 |
| Constant | -2.669 | 0.150 |
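In a binary Logit model, exp(β) is the factor by which the odds of the positive outcome change when a variable increases by one unit, all else equal. The Python sketch here converts the five coefficients that are significant at the 0.05 level in Table 6 into odds ratios. It assumes that bunching is coded as the positive outcome, which the table itself does not state.

```python
import math

# Coefficients of the factors with Sig. < 0.05 in Table 6.
SIGNIFICANT_BETAS = {
    "F": 0.006,   # boarding passengers
    "r": -1.644,  # short-range main-road driving distance
    "S": 0.093,   # signal lights
    "b": 1.107,   # short-range commercial districts
    "c": 0.449,   # congestion delay index
}


def odds_ratio(beta):
    """Multiplicative change in the odds for a one-unit increase."""
    return math.exp(beta)


odds_ratios = {name: odds_ratio(beta) for name, beta in SIGNIFICANT_BETAS.items()}

# Factors whose increase raises the odds of the positive outcome.
raising = sorted(name for name, ratio in odds_ratios.items() if ratio > 1.0)
```

Under that coding assumption, only r has an odds ratio below 1, so a longer short-range distance on main roads is associated with lower odds of bunching, while the other four factors are associated with higher odds.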
Table 7. Optimal parameter combination

| Parameter | Range | Random search | Bayesian optimization |
|---|---|---|---|
| max_depth | (3, 10) | 7 | 8 |
| num_leaves | (10, 200) | 109 | 150 |
| learning_rate | (0.01, 0.3) | 0.06 | 0.05 |
| subsample | (0.7, 1) | 0.85 | 0.85 |
| colsample_bytree | (0.7, 1) | 0.95 | 0.95 |
| min_data_in_leaf | (20, 25) | 24 | 21 |
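Random search, one of the two tuning methods compared in Table 7, draws each hyperparameter independently from its range and keeps the best-scoring draw. A minimal Python sketch over the ranges of Table 7 is given here. The objective is a hypothetical stand-in so that the example runs without any library; in practice it would be something like the cross-validated accuracy of a LightGBM classifier trained with the sampled parameters.

```python
import random

# Search ranges from Table 7 as (low, high, type).
SEARCH_SPACE = {
    "max_depth": (3, 10, int),
    "num_leaves": (10, 200, int),
    "learning_rate": (0.01, 0.3, float),
    "subsample": (0.7, 1.0, float),
    "colsample_bytree": (0.7, 1.0, float),
    "min_data_in_leaf": (20, 25, int),
}


def sample_params(rng):
    """Draw one parameter set uniformly from SEARCH_SPACE."""
    params = {}
    for name, (low, high, kind) in SEARCH_SPACE.items():
        if kind is int:
            params[name] = rng.randint(low, high)  # inclusive bounds
        else:
            params[name] = rng.uniform(low, high)
    return params


def random_search(objective, n_iter=50, seed=0):
    """Return the best (params, score) found in n_iter random draws."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = sample_params(rng)
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    """Hypothetical stand-in score, NOT a model accuracy."""
    return -abs(params["learning_rate"] - 0.05)


best_params, best_score = random_search(toy_objective, n_iter=200, seed=1)
```

Bayesian optimization differs in that each new draw is guided by a surrogate model fitted to the scores observed so far, rather than being sampled independently.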
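Table 8. Optimal parameter combinations of the comparison models

Model columns, in order: XGBoost, RF, DT, AdaBoost.

| Parameter | Values (left to right, blank entries omitted) |
|---|---|
| Number of classifiers | 45, 41, 37 |
| Maximum number of features | 5, 4, 5 |
| Random seed | 13, 8, 6, 1 |
| Maximum depth | 5, 5 |
| Minimum samples per leaf | 6, 1 |
| Learning rate | 0.01, 0.8 |

Note: A blank entry in the table means that the model does not set the parameter or uses its default value.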
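Table 9. Comparative analysis of recognition results

Unit: %

| Model | Accuracy (train) | Accuracy (test) | Precision (train) | Precision (test) | Recall (train) | Recall (test) | F1 score (train) | F1 score (test) |
|---|---|---|---|---|---|---|---|---|
| Bayesian-optimized LightGBM | 85.11 | 82.89 | 84.17 | 79.31 | 81.45 | 76.67 | 82.79 | 77.97 |
| Random-search-optimized LightGBM | 84.40 | 81.58 | 83.90 | 76.67 | 79.84 | 76.67 | 81.82 | 76.67 |
| XGBoost | 81.56 | 76.32 | 82.69 | 75.00 | 71.67 | 70.59 | 76.79 | 72.73 |
| RF | 80.77 | 73.61 | 80.73 | 70.97 | 72.13 | 68.75 | 76.19 | 69.84 |
| DT | 79.37 | 72.22 | 77.12 | 65.71 | 73.98 | 74.19 | 75.52 | 69.70 |
| AdaBoost | 76.22 | 72.22 | 72.36 | 66.67 | 72.36 | 70.97 | 72.36 | 68.75 |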
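The F1 scores in Table 9 follow from the precision and recall columns as their harmonic mean, F1 = 2PR / (P + R). A short Python sketch recomputes the test-set F1 scores from the printed precision and recall values; the dictionary keys are shortened model names chosen for illustration.

```python
# Test-set (precision, recall, reported F1) in percent, as printed in Table 9.
TEST_METRICS = {
    "LightGBM-Bayes": (79.31, 76.67, 77.97),
    "LightGBM-Random": (76.67, 76.67, 76.67),
    "XGBoost": (75.00, 70.59, 72.73),
    "RF": (70.97, 68.75, 69.84),
    "DT": (65.71, 74.19, 69.70),
    "AdaBoost": (66.67, 70.97, 68.75),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


recomputed = {
    model: f1_score(p, r) for model, (p, r, _) in TEST_METRICS.items()
}

# Largest gap between a recomputed and a reported F1 score.
max_gap = max(
    abs(recomputed[model] - reported)
    for model, (_, _, reported) in TEST_METRICS.items()
)
```

The recomputed values match the reported ones to within the rounding of the inputs, which confirms the column order of the table.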
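Table 10. Recognition accuracy of each type of bus bunching state

Values are proportions of each actual state.

| Model | Actual non-bunching, identified as non-bunching | Actual non-bunching, identified as bunching | Actual bunching, identified as non-bunching | Actual bunching, identified as bunching |
|---|---|---|---|---|
| Bayesian-optimized LightGBM | 0.87 | 0.13 | 0.24 | 0.76 |
| Random-search-optimized LightGBM | 0.85 | 0.15 | 0.24 | 0.76 |
| XGBoost | 0.81 | 0.19 | 0.29 | 0.71 |
| RF | 0.76 | 0.24 | 0.31 | 0.69 |
| DT | 0.71 | 0.29 | 0.26 | 0.74 |
| AdaBoost | 0.73 | 0.27 | 0.29 | 0.71 |

-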
[1] NEWELL G F. Control of pairing of vehicles on a public transportation route: Two vehicles, one control point[J]. Construction and Building Materials, 1974, 8(3): 248-264.
[2] MOREIRA-MATIAS L, GAMA J, MENDES-MOREIRA J, et al. An incremental probabilistic model to predict bus bunching in real-time[J]. Advances in Intelligent Data Analysis XIII, 2014(8819): 227-238.
[3] JIAO D T. Study on the mechanism of multi-line bus bunching based on intelligent bus data[D]. Chengdu: Southwest Jiaotong University, 2019. (in Chinese)
[4] SCHMOCKER J D, SUN W, FONZONE A, et al. Bus bunching along a corridor served by two lines[J]. Transportation Research Part B: Methodological, 2016(93): 300-317.
[5] ZHANG H, CUI H, SHI B. A data-driven analysis for operational vehicle performance of public transport network[J]. IEEE Access, 2019(7): 96404-96413.
[6] RASHIDI S, RANJITKAR P, CSABA O, et al. Using automatic vehicle location data to model and identify determinants of bus bunching[J]. Transportation Research Procedia, 2017(25): 1444-1456.
[7] ARRIAGADA J, GSCHWENDER A, MUNIZAGA M A. Modeling bus bunching using massive location and fare collection data[J]. Journal of Intelligent Transportation Systems, 2019, 23(4): 332-344. doi: 10.1080/15472450.2018.1494596
[8] DENG Y J, LIU X H, HU X B, et al. Reduce bus bunching with a real-time speed control algorithm considering heterogeneous roadway conditions and intersection delays[J]. Journal of Transportation Engineering Part A-Systems, 2020, 146(7): 04020048. doi: 10.1061/JTEPBS.0000358
[9] ZHANG H, LIU Y J, SHI B Y, et al. Analysis of spatial-temporal characteristics of operations in public transport networks based on multisource data[J]. Journal of Advanced Transportation, 2021(11): 1-15.
[10] MOOSAVI, SEYED M H, WAH Y C. Measuring bus running time variation during high-frequency operation using automatic data collection systems[J]. ITE Journal-Institute of Transportation Engineers, 2020, 90(1): 45-49.
[11] ZHANG J, DING J X, LONG J C, et al. The exploration of time-headway characteristic and operation status on the bus route[J]. Journal of Transportation Systems Engineering and Information Technology, 2015, 15(6): 220-226. (in Chinese) doi: 10.3969/j.issn.1009-6744.2015.06.033
[12] HANS E, CHIABAUT N, LECLERCQ L, et al. Real-time bus route state forecasting using particle filter and mesoscopic modeling[J]. Transportation Research Part C: Emerging Technologies, 2015(61): 121-140.
[13] DENG Y J, LUO X, HU X B, et al. Modeling and prediction of bus operation states for bunching analysis[J]. Transportation Engineering Journal of ASCE, 2020, 146(9): 04020106.
[14] SUN W Z, SCHMOCKER J D, NAKAMURA T. On the tradeoff between sensitivity and specificity in bus bunching prediction[J]. Journal of Intelligent Transportation Systems, 2021, 25(4): 384-400.
[15] YU H Y, CHEN D W, WU Z H, et al. Headway-based bus bunching prediction using transit smart card data[J]. Transportation Research Part C: Emerging Technologies, 2016(72): 45-59.
[16] ZHAO J H, LI Z H, YU H Y, et al. Bus bunching prediction based on genetic algorithm and LS-SVM[C]. 13th Annual Conference of ITS China, Tianjin: ITS China, 2018. (in Chinese)
[17] ZHANG J, LI M T, RAN B, et al. Causes and forecast modeling of conventional bus bunching[J]. Journal of Southeast University (Natural Science Edition), 2017, 47(6): 1269-1273. (in Chinese) https://www.cnki.com.cn/Article/CJFDTOTAL-DNDX201706029.htm
[18] ANDRES M, NAIR R. A predictive-control framework to address bus bunching[J]. Transportation Research Part B: Methodological, 2017(104): 123-148.
[19] YU H Y, WU Z H, CHEN D W. Probabilistic prediction of bus headway using relevance vector machine regression[J]. IEEE Transactions on Intelligent Transportation Systems, 2017, 18(7): 1772-1781.
[20] ZHANG X F. Data driven prediction of bus bunching and the control strategies[D]. Beijing: Beijing Jiaotong University, 2021. (in Chinese)
[21] Transportation Research Board, YANG P K. Transit capacity and quality of service manual[M]. Beijing: China Architecture & Building Press, 2010. (in Chinese)
[22] MA X W, JI Y J, JIN X, et al. Analysis on travel characteristics of bike-sharing users and influence factors on way to travel[J]. Journal of Zhejiang University (Engineering Science), 2020, 54(6): 1202-1209. (in Chinese) https://www.cnki.com.cn/Article/CJFDTOTAL-ZDZC202006018.htm
[23] YU Q, HUANG X L. Classification of heart sound signals based on LightGBM[J]. Journal of Shaanxi Normal University (Natural Science Edition), 2020, 48(6): 47-55. (in Chinese) https://www.cnki.com.cn/Article/CJFDTOTAL-SXSZ202006008.htm