“CNN-based People Detection in Voxel Space using Intensity Measurements and Point Cluster Flattening”

Authors: Joacim Dybedal and Geir Hovland
Affiliation: University of Agder
Reference: 2021, Vol 42, No 2, pp. 37-46.

Keywords: Human detection, point clouds, flattening, convolutional neural network

Abstract: In this paper, real-time people detection is demonstrated in a relatively large indoor industrial robot cell as well as in an outdoor environment. Six ceiling-mounted depth sensors are used to generate a merged point cloud of the cell. The merged point cloud is segmented into clusters, and each cluster is flattened into gray-scale 2D images in the xy and xz planes. These images are then used as input to a classifier based on convolutional neural networks (CNNs). The final output is the 3D position (x, y, z) and bounding box of each detected human. The system is able to detect and track multiple humans in real time, both indoors and outdoors. The positional accuracy of the proposed method has been verified against several ground-truth positions and was found to be within the voxel size of the point cloud, i.e., 0.04 m. Tests on outdoor datasets yielded a detection recall of 76.9 percent and an F1 score of 0.87.
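The flattening step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' published code, and several details are assumptions because the abstract does not state them: each point is taken to carry (x, y, z, intensity), each pixel keeps the largest normalised intensity of the points projecting into it, the image has a hypothetical fixed size of 64 x 64 pixels, and the 0.04 m voxel size is reused as the pixel pitch. The names `flatten_cluster` and `bounding_box`, and the use of the box centre as the position estimate, are likewise illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float, float]  # (x, y, z, intensity); x, y, z in metres
Image = List[List[float]]

VOXEL_SIZE = 0.04  # metres; the voxel size from the abstract, used as pixel pitch
IMG_SIZE = 64      # hypothetical fixed CNN input size (64 * 0.04 m = 2.56 m per side)


def flatten_cluster(points: Sequence[Point], axes: Tuple[int, int],
                    voxel: float = VOXEL_SIZE, size: int = IMG_SIZE,
                    max_intensity: float = 255.0) -> Image:
    """Project one point cluster onto the plane spanned by two coordinate axes.

    axes=(0, 1) yields the xy image (top view); axes=(0, 2) the xz image
    (side view). Each pixel keeps the largest normalised intensity of the
    points that fall into it (an assumed rule); empty pixels stay 0.0.
    Row 0 corresponds to the smallest coordinate along axes[1].
    """
    if not points:
        raise ValueError("cluster is empty")
    a, b = axes
    # Anchor the cluster at its minimum corner, so all indices are >= 0.
    min_a = min(p[a] for p in points)
    min_b = min(p[b] for p in points)
    img = [[0.0] * size for _ in range(size)]
    for p in points:
        col = int((p[a] - min_a) / voxel)
        row = int((p[b] - min_b) / voxel)
        if row < size and col < size:  # drop points outside the fixed image
            value = min(p[3] / max_intensity, 1.0)
            img[row][col] = max(img[row][col], value)
    return img


def bounding_box(points: Sequence[Point]) -> Tuple[Tuple[float, ...],
                                                   Tuple[float, ...],
                                                   Tuple[float, ...]]:
    """Axis-aligned 3D box of a cluster and its centre (assumed position estimate)."""
    lo = tuple(min(p[i] for p in points) for i in range(3))
    hi = tuple(max(p[i] for p in points) for i in range(3))
    centre = tuple((lo_i + hi_i) / 2.0 for lo_i, hi_i in zip(lo, hi))
    return centre, lo, hi


if __name__ == "__main__":
    # Three made-up points of one cluster: (x, y, z, intensity).
    cluster = [(1.00, 2.00, 0.00, 100.0), (1.06, 2.00, 1.70, 255.0),
               (1.00, 2.10, 0.90, 51.0)]
    xy = flatten_cluster(cluster, (0, 1))  # top-view image for the classifier
    xz = flatten_cluster(cluster, (0, 2))  # side-view image for the classifier
    centre, lo, hi = bounding_box(cluster)
    print(centre, lo, hi)
```

Keeping the maximum intensity per pixel is only one plausible choice; a point count or a height value per pixel would fit the same structure by changing the line that updates `img[row][col]`.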

DOI: 10.4173/mic.2021.2.1

BibTeX:
@article{MIC-2021-2-1,
  title={{CNN-based People Detection in Voxel Space using Intensity Measurements and Point Cluster Flattening}},
  author={Dybedal, Joacim and Hovland, Geir},
  journal={Modeling, Identification and Control},
  volume={42},
  number={2},
  pages={37--46},
  year={2021},
  doi={10.4173/mic.2021.2.1},
  publisher={Norwegian Society of Automatic Control}
}