MURM: Utilization of Multi-Views for Goal-Conditioned Reinforcement Learning in Robotic Manipulation
Abstract
1. Introduction
- We propose MURM, a novel framework for goal-conditioned RL that leverages images from multiple viewpoints through two implementation methods: the dropout method and the separated Q-functions method.
- We empirically demonstrate that MURM outperforms single-view baselines on complex manipulation tasks.
2. Related Work
3. Preliminaries
3.1. Variational Autoencoders (VAE)
3.2. Goal-Conditioned Reinforcement Learning
3.3. Offline Reinforcement Learning
4. Methods
4.1. Designing Demo Dataset for Offline RL
4.2. Representation Learning with VQVAE
4.3. Utilizing Multi-Views for Goal-Conditioned RL in Offline Settings
Algorithm 1 MURM
Require: dataset D, policy π, Q-function Q, RL algorithm A, replay buffer B, state s, goal g
1: Collect demos of D from a noisy expert
2: Learn a state encoder φ_i for each viewpoint i
3: Replace raw-image states with latent states z_i = φ_i(s_i)
4: if method = Separated Q-functions then
5:    Instantiate a separate Q-function Q_i for each viewpoint i
6: end if
7: Initialize π and Q by A
8: for episode = 1, …, N do
9:    Sample goals for the used i views: g_1, …, g_i
10:   for t = 0, …, T − 1 do
11:      Sample a_t ~ π(· | z_t, g)
12:      Sample s_{t+1} ~ p(· | s_t, a_t)
13:   end for
14:   Store trajectory in replay buffer B
15:   if method = Dropout then
16:      Mask view latents with m_i ~ Bernoulli(p): z̃_i = m_i · z_i
17:   end if
18:   Update π and Q with sampled batches using A
19: end for
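To make the two implementation methods concrete, the following is a minimal PyTorch sketch of one step of the loop in Algorithm 1. The dimensions, network sizes, the `dropout_views` helper, and the per-view Q-network layout are illustrative assumptions, not the authors' released implementation; random tensors stand in for the VQVAE-encoded camera images and latent goals.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the paper's actual dimensions are not reproduced here.
NUM_VIEWS, LATENT_DIM, ACTION_DIM = 2, 8, 4

# Policy conditioned on all view latents and their latent goals (Alg. 1, line 11).
policy = nn.Sequential(
    nn.Linear(2 * NUM_VIEWS * LATENT_DIM, 128), nn.ReLU(),
    nn.Linear(128, ACTION_DIM), nn.Tanh(),
)

# Separated Q-functions method (Alg. 1, lines 4-6): one Q-network per viewpoint,
# each scoring its own (latent, goal, action) triple.
q_funcs = [
    nn.Sequential(nn.Linear(2 * LATENT_DIM + ACTION_DIM, 128), nn.ReLU(),
                  nn.Linear(128, 1))
    for _ in range(NUM_VIEWS)
]

def dropout_views(latents, p=0.5):
    """Dropout method (Alg. 1, lines 15-17): zero out entire per-view latents
    with Bernoulli masks so the policy cannot over-rely on a single view."""
    keep = torch.bernoulli(torch.full((len(latents),), 1.0 - p))
    return [z * m for z, m in zip(latents, keep)]

# Random stand-ins for encoded camera images (z_i) and latent goals (g_i).
latents = [torch.randn(LATENT_DIM) for _ in range(NUM_VIEWS)]
goals = [torch.randn(LATENT_DIM) for _ in range(NUM_VIEWS)]

action = policy(torch.cat(dropout_views(latents) + goals))
q_values = [q(torch.cat([z, g, action])) for q, z, g in zip(q_funcs, latents, goals)]
print(action.shape, torch.stack(q_values).mean().item())
```

Masking whole per-view latents (rather than individual units, as in standard dropout) is what lets the trained policy keep working when a camera view is missing; the separated Q-functions variant instead pays for an extra critic per added viewpoint.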
5. Experimental Evaluation
5.1. Experimental Setups
5.2. Results and Analysis
5.3. Ablation Experiments
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| GCRL | Goal-Conditioned Reinforcement Learning |
| RL | Reinforcement Learning |
| VQVAE | Vector Quantized Variational Autoencoder |
| MURM | Multi-view Unified Reinforcement Learning for Manipulation |
| MDP | Markov Decision Process |
| IQL | Implicit Q-Learning |
| HER | Hindsight Experience Replay |
| SOTA | State-of-the-Art |
| MLP | Multi-Layer Perceptron |
| Neurons (MLP) | Success Rate (%) | Time Consumption (relative) |
|---|---|---|
| 32 | 0.52 | – |
| 64 | 0.66 | – |
| 128 | 33.14 ± 2.73 | 1.0 |
| 256 | 2.64 | – |
| Epsilon Values | 0.1 | 1.0 | 2.0 | 3.0 | 5.0 |
|---|---|---|---|---|---|
| Std (σ) mean | ±5.64 | ±2.70 | ±2.97 | ±1.94 | ±2.86 |
| Success Rate (%) | 29.04 ± 0.89 | 29.97 ± 2.08 | 34.49 ± 3.08 | 33.14 ± 2.73 | 27.92 ± 3.43 |
| Viewpoints | Task 1 Success Rate (%) | Task 2 Success Rate (%) | Task 3 Success Rate (%) |
|---|---|---|---|
| Global-view | 34.49 | 46.63 | 15.74 |
| Adjacent-view | 33.76 | 39.50 | 14.65 |
| Top-view | 18.84 | 38.65 | 0.17 |
| Side-view | 17.99 | 33.86 | 12.74 |
| Active-view | 12.51 | 8.68 | 3.63 |
| Methods | Task 1 Success Rate (%) | Task 2 Success Rate (%) | Task 3 Success Rate (%) |
|---|---|---|---|
| Single-view Baseline (50%) | 0.0 | 3.17 | 0.0 |
| Single-view Baseline (10%) | 11.35 | 20.1 | 7.85 |
| Multi-view Concatenated | 4.22 | 11.65 | 3.93 |
| MURM-Dropout | 36.07 | 40.2 | 9.67 |
| MURM-Separated Q-functions | 38.35 | 43.96 | 14.88 |
| Methods | Time Consumption (increment if a viewpoint is added) |
|---|---|
| Single-view Baselines | 1.0 |
| MURM-Dropout | 1.05 (+0.46) |
| MURM-Separated Q-functions | 1.55 (+0.5) |
| Viewpoint Added | Success Rate (%) |
|---|---|
| Top-view | 30.23 (−8.12) |
| Side-view | 32.87 (−5.48) |
| Active-view | 23.07 (−15.28) |
| Demo Episodes | 250 | 500 | 1000 | 2000 | 3000 |
|---|---|---|---|---|---|
| Success Rates (%) | 29.11 | 31.22 | 38.35 | 35.84 | 34.75 |
| Noisy Experts (k%) | 0% | 25% | 50% | 100% |
|---|---|---|---|---|
| Success Rates (%) | 26.57 | 31.85 | 33.73 | 38.35 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).