A Small-scale Pedestrian Detection Method Based on Fused Residual Networks and Feature Pyramids
Abstract: Small-scale pedestrian detection suffers from overfitting, feature misalignment, and neglect of multi-scale features. To address these problems, a small-scale pedestrian detection method that fuses a residual network with a feature pyramid is proposed. Because the original residual network relies too heavily on the training set and overfits when detecting small-scale pedestrians, residual blocks with a dropout layer replace the standard residual blocks in the network; the regularization effect of the dropout layer is also used to reduce computational complexity. A feature selection module and a feature alignment module are embedded in the lateral connections of the feature pyramid network (FPN) to strengthen and align the important pedestrian features in the input image. These modules improve the model's ability to learn multi-scale pedestrian features, compensate for the FPN's tendency to misalign features and overlook multi-scale features, and thereby raise the detection accuracy for small-scale pedestrians. The model is trained, tested, and validated on the Caltech Pedestrian dataset. Experimental results show that the detection accuracy for small-scale pedestrians (APS) is 73.6% and AP50 is 95.6%.
Compared with the baseline built on the same 50-layer residual network and FPN, the improved model raises AP (average precision) by 17.2 percentage points, AP50 (average precision at an intersection-over-union threshold of 0.5) by 7.8 percentage points, and the detection accuracy for small-scale pedestrians by 21.6 percentage points. With the 101-layer residual network and FPN, it raises AP by 24.5 percentage points, AP50 by 8.2 percentage points, and the detection accuracy for small-scale pedestrians by 32.3 percentage points. Compared with the RefineDet512 and GHM800 algorithms, AP improves by 20.8 and 17.7 percentage points, AP50 by 5.5 and 3.6 percentage points, and the detection accuracy for small-scale pedestrians by 26.8 and 20.6 percentage points, respectively. These results indicate that the proposed model outperforms the classical detection algorithms and effectively improves the accuracy of small-scale pedestrian detection.
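The idea of a residual block with a dropout layer can be illustrated with a minimal sketch. The abstract does not say where the dropout layer sits inside the block, what drop rate is used, or how the convolutions are configured, so everything below is an assumption made for illustration: small fully connected layers stand in for convolutions, dropout is applied inside the residual branch, and the function names and the rate `p=0.5` are hypothetical.

```python
import random


def relu(values):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]


def linear(values, weights):
    """Multiply a weight matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def dropout(values, p, rng, training):
    """Inverted dropout: zero each value with probability p and rescale the
    survivors by 1 / (1 - p); behaves as the identity at inference time."""
    if not training or p == 0.0:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def residual_block_with_dropout(x, w1, w2, p=0.5, rng=None, training=True):
    """Compute relu(x + F(x)), with dropout inside the residual branch F.

    Assumption: the placement of dropout after the first activation is a
    guess for illustration, not the layout used in the paper.
    """
    rng = rng or random.Random()
    hidden = dropout(relu(linear(x, w1)), p, rng, training)
    branch = linear(hidden, w2)
    # Skip connection: add the unchanged input back to the branch output.
    return relu([a + b for a, b in zip(x, branch)])


identity = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, -2.0]
# Inference mode: dropout is disabled, so the result is deterministic.
print(residual_block_with_dropout(x, identity, identity, training=False))
```

During training the skip connection always passes the input through unchanged, while the residual branch is randomly thinned, which is the regularizing behavior the abstract relies on to reduce overfitting.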
Table 1. Ablation experiment results on the Caltech Pedestrian dataset (values in %)

| Method | Backbone | AP | AP50 | AP75 | APS |
|---|---|---|---|---|---|
| ResNet-FPN | ResNet-50-FPN | 52.6 | 87.0 | 58.2 | 42.5 |
| ResNet-FPN | ResNet-101-FPN | 52.8 | 87.4 | 58.5 | 41.3 |
| IResNet-FPN | IResNet-50-FPN | 57.2 | 89.3 | 66.8 | 48.1 |
| IResNet-FPN | IResNet-101-FPN | 57.9 | 91.0 | 67.0 | 48.6 |
| ResNet-IFPN | ResNet-50-IFPN | 58.8 | 91.4 | 68.1 | 50.0 |
| ResNet-IFPN | ResNet-101-IFPN | 59.4 | 92.0 | 69.0 | 50.9 |
| FRN-FP | IResNet-50-IFPN | 69.8 | 94.8 | 82.7 | 64.1 |
| FRN-FP | IResNet-101-IFPN | 77.3 | 95.6 | 89.0 | 73.6 |

Table 2. Comparison of the proposed method with classical detection algorithms (values in %)

| Method | Backbone | AP | AP50 | AP75 | APS |
|---|---|---|---|---|---|
| RefineDet512 | ResNet-101 | 56.5 | 90.1 | 64.5 | 46.8 |
| GHM800 | ResNet-101 | 59.6 | 92.0 | 70.0 | 53.0 |
| FRN-FP (ours) | IResNet-50-IFPN | 69.8 | 94.8 | 82.7 | 64.1 |
| FRN-FP (ours) | IResNet-101-IFPN | 77.3 | 95.6 | 89.0 | 73.6 |
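The improvements quoted in the abstract can be reproduced as simple differences between rows of Tables 1 and 2. The short check below copies the AP, AP50, and APS values from the tables; the dictionary keys and the `gain` helper are hypothetical names chosen for this sketch.

```python
# AP, AP50, and APS values (in %) copied from Tables 1 and 2.
RESULTS = {
    "baseline_50": {"AP": 52.6, "AP50": 87.0, "APS": 42.5},   # ResNet-50 + FPN
    "baseline_101": {"AP": 52.8, "AP50": 87.4, "APS": 41.3},  # ResNet-101 + FPN
    "refinedet512": {"AP": 56.5, "AP50": 90.1, "APS": 46.8},  # ResNet-101 backbone
    "ghm800": {"AP": 59.6, "AP50": 92.0, "APS": 53.0},        # ResNet-101 backbone
    "frn_fp_50": {"AP": 69.8, "AP50": 94.8, "APS": 64.1},     # proposed, 50 layers
    "frn_fp_101": {"AP": 77.3, "AP50": 95.6, "APS": 73.6},    # proposed, 101 layers
}


def gain(ours, reference):
    """Absolute improvement of `ours` over `reference`, in percentage points."""
    return {
        metric: round(RESULTS[ours][metric] - RESULTS[reference][metric], 1)
        for metric in ("AP", "AP50", "APS")
    }


for ours, reference in [
    ("frn_fp_50", "baseline_50"),
    ("frn_fp_101", "baseline_101"),
    ("frn_fp_101", "refinedet512"),
    ("frn_fp_101", "ghm800"),
]:
    print(f"{ours} vs {reference}: {gain(ours, reference)}")
```

The differences match the figures stated in the abstract, which shows that the reported gains are absolute differences in percentage points rather than relative changes, and that the comparison with the two classical detectors uses the 101-layer variant of the proposed model.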