Weakly Supervised 2D Pose Adaptation and Body Part Segmentation for Concealed Object Detection †
Abstract
:1. Introduction
- We propose two methods for adapting an RGB pretrained pose detector to estimate accurate 2D poses in backscatter images, including a weakly supervised 3D–2D pose correction network that is optimized without ground-truth (GT) 2D or 3D pose annotations. Our proposed 2D pose refinement significantly improves the correctness of 2D poses in backscatter images.
- We introduce an unsupervised procedure for segmenting human body parts in MWS images by estimating bounding polygons for each body part using keypoints of refined 2D poses. Our heuristic approach is motivated by the absence of body part annotations in the dataset and is effective at segmenting all body parts of a person in images captured from different viewpoints.
- We then propose a weakly supervised, ROI-attentive, dual-chain CNN classifier that detects anomalies given multi-view images of a cropped body part. Our anomaly detection network processes segmented images, their region of interest (ROI) masks, and derived region composite vectors (RCV) to decide if a detected anomaly should be attributed to a given body part. This ensures the network not only detects anomalies but also learns to associate anomalies with affected body parts, ultimately leading to more precise detection.
TSA Passenger Screening Dataset
2. Related Work
3. Method
3.1. Unsupervised Body Part Segmentation from 2D Poses
3.1.1. Keypoint Selection from HPE Confidence Maps
3.1.2. Multi-View Coherent Pose Optimization
Algorithm 1 Per Keypoint RANSAC Bundle Adjustment | |
Input: | ▹ 2D pixel positions of keypoint in each frame |
Output: | ▹ 2D pixel positions of keypoint in each frame after bundle adjustment |
1: , | |
2: while do | ▹ iterate over subset of keypoints for 3D bundle adjustment |
3: | ▹ randomly selected subset, 1 every 4 consecutive frames |
4: | ▹ 3D point is regressed from R via least squares optimization |
5: | ▹ 2D positions after projecting to each frame |
6: | ▹ note inlier points based on Euclidean dist. between P & |
7: if then | |
8: | ▹ retain the largest inlier set |
9: end if | |
10: end while | |
11: | ▹ least squares bundle adjusted 3D point regressed from I 2D points |
12: | ▹ final 2D positions after projecting to each frame |
3.1.3. Estimating Bounding Polygons for Body Segmentation
Limb Body Part Segmentation
Torso Body Part Segmentation
3.2. RaadNet: ROI Attentive Anomaly Detection Network
Training and Inference with RaadNet Ensemble Classifiers
3.3. Extension: Learning Local Optimal 2D Pose Correction
3.3.1. Weakly Supervised 3D to 2D Pose Correction Network
3.3.2. Weakly Supervised Loss Terms
Reprojected 2D Loss (R2D)
Multi-View Pose Consistency Loss
Multi-View Posture Consistency Loss
3.3.3. 3D Pose Prior Regularizers
Bone Proportion Constraint (BPC)
Joint Mobility Constraint (JMC)
4. Results
4.1. Evaluation of 2D Pose Refinement for MWS Images
4.2. Concealed Item Detection with RaadNet
4.3. Comparison to TSA-PSD Proprietary Classifiers
5. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
HIPAA | Health Insurance Portability and Accountability Act |
TSA | Transport Security Administration |
TSA-PSD | Transport Security Administration Passenger Screening Challenge Dataset |
HD-AIT | High Definition-Advanced Imaging Technology system |
MWS | Millimeter Wave Scan images |
H36M | Human3.6m 3D pose dataset |
SOTA | State of the Art |
ROI | Region of Interest |
RCV | Region Composite Vector |
CNN | Convolutional Neural Network |
FCN | Fully Connected Network |
CBI | Cropped Body part Image |
GT | Ground Truth |
RaadNet | ROI Attentive Anomaly Detection Network |
W-MPJPE | Weighted Mean Per Joint Position Error |
VPose | VideoPose3D Estimator Network |
ICPR | International Conference on Pattern Recognition |
NSF | National Science Foundation |
References
- Güler, R.A.; Trigeorgis, G.; Antonakos, E.; Snape, P.; Zafeiriou, S.; Kokkinos, I. DenseReg: Fully Convolutional Dense Shape Regression In-the-Wild. In Proceeding of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2614–2623. [Google Scholar]
- Sun, K.; ** Network. ar**+Network&author=Gong,+K.&author=Liang,+X.&author=Li,+Y.&author=Chen,+Y.&author=Yang,+M.&author=Lin,+L.&publication_year=2018&journal=ar** by Semantic Bundle Adjustment. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 1538–1545. [Google Scholar]
- Triggs, B.; McLauchlan, P.; Hartley, R.; Fitzgibbon, A. Bundle Adjustment—A Modern Synthesis. In Proceedings of the Workshop on Vision Algorithms, Corfu, Greece, 21–22 September 1999. [Google Scholar]
- Grisetti, G.; Guadagnino, T.; Aloise, I.; Colosi, M.; Corte, B.D.; Schlegel, D. Least Squares Optimization: From Theory to Practice. Robotics 2020, 9, 51. [Google Scholar] [CrossRef]
- Curtis, A.R.; Powell, M.J.D.; Reid, J.K. On the Estimation of Sparse Jacobian Matrices. IMA J. Appl. Math. 1974, 13, 117–119. [Google Scholar] [CrossRef]
- Sandler, M.; Howard, A.G.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the CVPR09, Miami, FL, USA, 20–25 June 2009; Available online: http://image-net.org/index (accessed on 20 February 2018).
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. ar**v 2014, ar**v:1412.6980. [Google Scholar]
- Pavllo, D.; Feichtenhofer, C.; Grangier, D.; Auli, M. 3D human pose estimation in video with temporal convolutions and semi-supervised training. In Proceedings of the Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
- Ionescu, C.; Papava, D.; Olaru, V.; Sminchisescu, C. Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 1325–1339. [Google Scholar] [CrossRef] [PubMed]
- Richer, P. Richer’s Average Human Proportions—7.5 Heads. Artistic Anatomy: The Great French Classic on Artistic Anatomy. Available online: https://www.proko.com/human-figure-proportions-average-richer (accessed on 10 August 2019).
- Hale, R.B. Hales’s Cranial Method for Human Proportions. Available online: https://www.proko.com/human-figure-proportions-cranium-unit-hale (accessed on 10 August 2019).
- Loomis, A. Loomis Idealistic Proportions. Available online: https://www.proko.com/human-figure-proportions-idealistic-loomis (accessed on 10 August 2019).
mm | R.Sh | R.Eb | R.Wr | L.Sh | L.Eb | L.Wr | R.Hp | R.Ke | R.Ak | L.Hp | L.Ke | L.Ak | Avg. |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Generic Pose | 115.7 | 191.5 | 119.1 | 100.4 | 173.4 | 117.6 | 88.49 | 115.3 | 137.3 | 79.91 | 105.8 | 134.5 | 123.2 |
Confined Pose | 55.2 | 36.9 | 35.2 | 52.3 | 44.4 | 42.6 | 65.1 | 38.5 | 42.9 | 61.6 | 36.9 | 43.6 | 46.3 |
Coherent Pose-1 | 40.2 | 32.5 | 19.9 | 38.1 | 31.4 | 23.6 | 82.8 | 38.9 | 22.4 | 78.1 | 35.8 | 21.0 | 38.7 |
Coherent Pose-2 | 37.4 | 30.2 | 19.5 | 35.8 | 31.5 | 20.2 | 64.5 | 38.3 | 21.1 | 61.2 | 35.3 | 20.4 | 34.6 |
Coherent Pose-3 | 35.6 | 28.0 | 17.1 | 35.2 | 17.5 | 17.3 | 61.8 | 36.0 | 20.2 | 61.2 | 35.1 | 20.6 | 32.1 |
Body Part Anomaly Detection Methods | Validation Set | Test | |||||
---|---|---|---|---|---|---|---|
Avg.F1 ↑ | F1.Score ↑ | Precision ↑ | Recall ↑ | Accuracy ↑ | Logloss ↓ | Logloss ↓ | |
FastNet (*) [14] | 0.8890 | - | - | - | - | - | - |
AlexNet-1 (*) [18] | - | - | - | - | - | 0.0088 | - |
AlexNet-2 (*) [19] | 0.9828 | - | - | - | - | - | 0.0913 |
RaadNet + FixedSeg. [18] + Mask + RCV (Bsl-1) | 0.9761 | 0.9555 | 0.9628 | 0.9487 | 0.9723 | 0.0201 | 0.1384 |
RaadNet + Unref.PoseSeg. + Mask + RCV (Bsl-2) | 0.9184 | 0.7526 | 0.8652 | 0.6659 | 0.9108 | 0.1282 | 0.1608 |
Ours RaadNet + Co.PoseSeg. (Abl-1) | 0.9775 | 0.9581 | 0.9655 | 0.9505 | 0.9766 | 0.0143 | 0.0934 |
Ours RaadNet + Co.PoseSeg. + Mask (Abl-2) | 0.9637 | 0.9540 | 0.9550 | 0.9531 | 0.9687 | 0.0131 | 0.0886 |
Ours RaadNet + Co.PoseSeg. + Mask + RCV (Opt.) | 0.9859 | 0.9738 | 0.9941 | 0.9544 | 0.9946 | 0.0097 | 0.0751 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Amadi, L.; Agam, G. Weakly Supervised 2D Pose Adaptation and Body Part Segmentation for Concealed Object Detection. Sensors 2023, 23, 2005. https://doi.org/10.3390/s23042005
Amadi L, Agam G. Weakly Supervised 2D Pose Adaptation and Body Part Segmentation for Concealed Object Detection. Sensors. 2023; 23(4):2005. https://doi.org/10.3390/s23042005
Chicago/Turabian StyleAmadi, Lawrence, and Gady Agam. 2023. "Weakly Supervised 2D Pose Adaptation and Body Part Segmentation for Concealed Object Detection" Sensors 23, no. 4: 2005. https://doi.org/10.3390/s23042005