Towards Multi-Objective Object Push-Grasp Policy Based on Maximum Entropy Deep Reinforcement Learning under Sparse Rewards
(This article belongs to the Section Multidisciplinary Applications)
Abstract
1. Introduction
- (1) Design a maximum entropy deep reinforcement learning grasping method based on an attention mechanism to address complex, sparse-reward tasks while eliminating the trouble of tuning hyper-parameters in unstructured grasping environments.
- (2) Design an experience replay mechanism to reduce data correlation, combined with advantage functions to enhance reasoning and decision-making in complex environments.
- (3) Design object affordance perception based on space-channel attention to make robots more flexible in handling varied and complex grasping tasks.
- (4) Our proposed method generalizes from simulation to the real world. In cluttered scenes, the experimental results show grasping success rates for unknown objects of up to 100% in single-object and 91.6% in multi-object settings.
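The core of contribution (1) is a soft Bellman backup: where standard DQN bootstraps from the hard maximum over next-state action values, a maximum entropy variant bootstraps from a temperature-weighted log-sum-exp, which keeps the induced policy stochastic and exploratory. The sketch below is a generic illustration of that idea, not the authors' implementation; the names `soft_value`, `soft_td_target`, `softmax_policy`, and the temperature `alpha` are assumptions for exposition.

```python
import numpy as np

def soft_value(q_next, alpha):
    """Soft (maximum entropy) state value: V(s') = alpha * logsumexp(Q(s', a) / alpha).

    As alpha -> 0 this approaches max_a Q(s', a), recovering the standard DQN target;
    larger alpha weights the entropy bonus more heavily.
    """
    z = q_next / alpha
    m = z.max()  # subtract the max before exponentiating for numerical stability
    return alpha * (m + np.log(np.exp(z - m).sum()))

def soft_td_target(reward, q_next, gamma=0.99, alpha=0.1, done=False):
    """One-step soft Bellman target y = r + gamma * V_soft(s') for a single transition."""
    return reward + (0.0 if done else gamma * soft_value(q_next, alpha))

def softmax_policy(q, alpha=0.1):
    """Boltzmann action distribution induced by the soft Q-values at temperature alpha."""
    z = (q - q.max()) / alpha
    p = np.exp(z)
    return p / p.sum()
```

Because `soft_value` is always at least the hard maximum, the soft target upper-bounds the standard DQN target; tuning (or automatically adapting) `alpha` trades off that entropy bonus against greedy exploitation.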
2. Related Work
3. Preliminaries and Problem Formulation
3.1. Model Description
3.2. Prioritized Experience Replay
3.3. Reward Reshaping
4. Push-Grasp Policy Design
4.1. Affordance Perception
4.2. Maximum Entropy DQN
5. Experiment Analysis
5.1. Experimental Setup
5.2. Training
5.3. Object Grasping Simulation Experiments
5.4. Ablation Experiment
5.5. Physical Experiment
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zhang, H.; Lan, X.; Bai, S.; Zhou, X.; Tian, Z.; Zheng, N. ROI-based Robotic Grasp Detection for Object Overlapping Scenes. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 4768–4775. [Google Scholar] [CrossRef]
- Zhou, X.; Lan, X.; Zhang, H.; Tian, Z.; Zhang, Y.; Zheng, N. Fully Convolutional Grasp Detection Network with Oriented Anchor Box. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 7223–7230. [Google Scholar] [CrossRef]
- Chen, T.; Shenoy, A.; Kolinko, A.; Shah, S.; Sun, Y. Multi-Object Grasping—Estimating the Number of Objects in a Robotic Grasp. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 4995–5001. [Google Scholar] [CrossRef]
- Liu, S.; Wang, L.; Vincent Wang, X. Multimodal Data-Driven Robot Control for Human–Robot Collaborative Assembly. ASME. J. Manuf. Sci. Eng. May 2022, 144, 051012. [Google Scholar] [CrossRef]
- Valencia, D.; Jia, J.; Hayashi, A.; Lecchi, M.; Terezakis, R.; Gee, T.; Liarokapis, M.; MacDonald, B.A.; Williams, H. Comparison of Model-Based and Model-Free Reinforcement Learning for Real-World Dexterous Robotic Manipulation Tasks. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 871–878. [Google Scholar] [CrossRef]
- Yu, K.-T.; Bauza, M.; Fazeli, N.; Rodriguez, A. More than a million ways to be pushed. A high-fidelity experimental dataset of planar pushing. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Republic of Korea, 9–14 October 2016; pp. 30–37. [Google Scholar] [CrossRef]
- Bauza, M.; Rodriguez, A. A probabilistic data-driven model for planar pushing. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 3008–3015. [Google Scholar] [CrossRef]
- Palleschi, A.; Angelini, F.; Gabellieri, C.; Park, D.W.; Pallottino, L.; Bicchi, A.; Garabini, M. Grasp It Like a Pro 2.0: A Data-Driven Approach Exploiting Basic Shape Decomposition and Human Data for Grasping Unknown Objects. IEEE Trans. Robot. 2023, 39, 4016–4036. [Google Scholar] [CrossRef]
- Lee, M.A.; Zhu, Y.; Srinivasan, K.; Shah, P.; Savarese, S.; Fei-Fei, L.; Garg, A.; Bohg, J. Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 8943–8950. [Google Scholar] [CrossRef]
- Takahashi, K.; Ko, W.; Ummadisingu, A.; Maeda, S.-I. Uncertainty-aware Self-supervised Target-mass Grasping of Granular Foods. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi'an, China, 30 May–5 June 2021; pp. 2620–2626. [Google Scholar] [CrossRef]
- Zeng, A.; Song, S.; Welker, S.; Lee, J.; Rodriguez, A.; Funkhouser, T. Learning Synergies Between Pushing and Grasping with Self-Supervised Deep Reinforcement Learning. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 4238–4245. [Google Scholar] [CrossRef]
- Berscheid, L.; Meißner, P.; Kröger, T. Robot Learning of Shifting Objects for Grasping in Cluttered Environments. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 612–618. [Google Scholar] [CrossRef]
- Liu, H.; Yuan, Y.; Deng, Y.; Guo, X.; Wei, Y.; Lu, K.; Fang, B.; Guo, D. Active Affordance Exploration for Robot Grasping. In Intelligent Robotics and Applications. ICIRA 2019; Yu, H., Liu, J., Liu, L., Ju, Z., Liu, Y., Zhou, D., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2019; Volume 11744, pp. 426–438. [Google Scholar] [CrossRef]
- Peng, G.; Liao, J.; Guan, S.; Yang, J.; Li, X. A pushing-grasping collaborative method based on deep Q-network algorithm in dual viewpoints. Sci. Rep. 2022, 12, 3927. [Google Scholar] [CrossRef] [PubMed]
- Chen, Y.; Ju, Z.; Yang, C. Combining Reinforcement Learning and Rule-based Method to Manipulate Objects in Clutter. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–6. [Google Scholar] [CrossRef]
- Mohammed, M.Q.; Kwek, L.C.; Chua, S.C.; Aljaloud, A.S.; Al-Dhaqm, A.; Al-Mekhlafi, Z.G.; Mohammed, B.A. Deep Reinforcement Learning-Based Robotic Grasping in Clutter and Occlusion. Sustainability 2021, 13, 13686. [Google Scholar] [CrossRef]
- Lu, N.; Lu, T.; Cai, Y.; Wang, S. Active Pushing for Better Grasping in Dense Clutter with Deep Reinforcement Learning. In Proceedings of the 2020 Chinese Automation Congress (CAC), Shanghai, China, 6–8 November 2020; pp. 1657–1663. [Google Scholar] [CrossRef]
- Kiatos, M.; Sarantopoulos, I.; Koutras, L.; Malassiotis, S.; Doulgeri, Z. Learning Push-Grasping in Dense Clutter. IEEE Robot. Autom. Lett. 2022, 7, 8783–8790. [Google Scholar] [CrossRef]
- Lu, N.; Cai, Y.; Lu, T.; Cao, X.; Guo, W.; Wang, S. Picking out the Impurities: Attention-based Push-Grasping in Dense Clutter. Robotica 2023, 41, 470–485. [Google Scholar] [CrossRef]
- Kalashnikov, D.; Irpan, A.; Pastor, P.; Ibarz, J.; Herzog, A.; Jang, E.; Quillen, D.; Holly, E.; Kalakrishnan, M.; Vanhoucke, V.; et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation. arXiv 2018, arXiv:1806.10293. [Google Scholar]
- …Policies. In Proceedings of the 2022 Sixth IEEE International Conference on Robotic Computing (IRC), Rome, Italy, 5–7 December 2022; pp. 156–163. [Google Scholar] [CrossRef]
- Sarantopoulos, I.; Kiatos, M.; Doulgeri, Z.; Malassiotis, S. Split Deep Q-Learning for Robust Object Singulation. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 6225–6231. [Google Scholar] [CrossRef]
Layer Name | Output Size | Output Feature Maps
---|---|---
Conv | 112 × 112 | 64
Pooling | 56 × 56 | 64
Attention block_1 | 56 × 56 | 256
Transition layer_1 | 56 × 56 | 128
 | 28 × 28 | 128
Attention block_2 | 28 × 28 | 512
Transition layer_2 | 28 × 28 | 256
 | 14 × 14 | 256
Attention block_3 | 14 × 14 | 1024
Transition layer_3 | 14 × 14 | 512
 | 7 × 7 | 512
Attention block_4 | 7 × 7 | 1024
Module | GS cub (%) | GS cy (%) | GS o (%) | GE cub (per hour) | GE cy (per hour) | GE o (per hour) | GT cub (s) | GT cy (s) | GT o (s)
---|---|---|---|---|---|---|---|---|---
DenseNet-201 | 78.5 | 75.1 | 68.5 | 800 | 642 | 590 | 4.5 | 5.6 | 6.1
DenseNet-169 | 89.2 | 85.7 | 80.3 | 947 | 734 | 679 | 3.8 | 4.9 | 5.3
DenseNet-121 (Ours) | 100 | 100 | 100 | 972 | 782 | 750 | 3.7 | 4.6 | 4.8
Structure | GS (%) | GE (Number per Hour) | GT (s)
---|---|---|---
Same structure | 93.1 | 702 ± 3 | 7.9
Different structure | 92.4 | 519 ± 3 | 10.8
Methods | Completion (Mean %) | GS (Mean %)
---|---|---
Dual viewpoint [14] | 92 | 83.2
Rule-based method [15] | 90 | 72.8
VPG-only depth [30] | 96 | 74.6
VPG [11] | 90 | 86.9
Ours | 98 | 92.4
Module | 60% | 70% | 80% | 90% |
---|---|---|---|---|
DQN (DenseNet121) | 185 | 525 | - | - |
ME-DQN-noAF | 269 | 337 | 402 | - |
ME-DQN-noattention | 213 | 286 | 592 | - |
ME-DQN (ours) | 287 | 368 | 435 | 711 |
Methods | Attempts | Average Success Rate (Time per Object) | Workspace-Clearing Success (10 Objects) | Workspace-Clearing Success (20 Objects) | Workspace-Clearing Success (30 Objects)
---|---|---|---|---|---
UCB [32] | 523 | 82% (15.8 s) | 89% | 83% | 75%
3DCNN [33] | 471 | 87% (12.7 s) | 92.5% | 89.5% | 79%
Coordinator [34] | 509 | 85% (17.3 s) | 94.5% | 81% | 79.5%
VPG [11] | 497 | 82.9% (10.9 s) | 94.8% | 83.6% | 70.3%
Ours | 511 | 91.6% (8.9 s) | 96% | 88% | 87.2%
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhang, T.; Mo, H. Towards Multi-Objective Object Push-Grasp Policy Based on Maximum Entropy Deep Reinforcement Learning under Sparse Rewards. Entropy 2024, 26, 416. https://doi.org/10.3390/e26050416