Natural Language Generation and Understanding of Big Code for AI-Assisted Programming: A Review
Abstract
1. Introduction
2. Background
2.1. Main Big Code Dataset
2.2. Tokenization
2.3. Language Models on Software Naturalness
2.4. Measurement of Language Models with Entropy
3. AI-Assisted Programming Tasks
3.1. Code Generation
3.2. Code Completion
3.3. Code Translation
3.4. Code Refinement
3.5. Code Summarization
3.6. Defect Detection
3.7. Clone Detection
Framework | Year | Task(s) | Baseline(s) | Supported Language(s) | Open Sourced |
---|---|---|---|---|---|
Refactory [137] | 2019 | Defect Detection | BLEU | Java | ✗ |
CuBERT [138] | 2020 | Code Refinement, Defect Detection | BERT | Python | ✓ |
CugLM [139] | 2020 | Code Completion | BERT | Java, TypeScript | ✓ |
IntelliCode [140] | 2020 | Code Generation, Code Completion | GPT-2 | Python, C#, JavaScript, and TypeScript | ✗ |
GREAT [141] | 2020 | Defect Detection | Vanilla Transformers | Python | ✓ |
TreeGen [51] | 2020 | Code Generation | Vanilla Transformers | Python | ✓ |
C-BERT [127] | 2020 | Defect Detection | BERT | C | ✗ |
TransCoder [142] | 2020 | Code Translation | Vanilla Transformers | C++, Java, and Python | ✗ |
GraphCodeBERT [143] | 2020 | Code Summarization, Code Refinement | BERT | Java | ✗ |
Codex [35] | 2021 | Code Generation, Code Completion, Code Summarization, Benchmark | GPT-3 | JavaScript, Go, Perl, and 6 more | ✗ |
Copilot [144] | 2021 | Code Generation, Code Completion | Codex | Java, PHP, Python, and 5 more | ✗ |
CodeT5 [145] | 2021 | Code Summarization, Code Generation, Code Translation, Code Refinement, Defect Detection, Clone Detection | T5 | Python, Java | ✓ |
TFix [146] | 2021 | Code Refinement, Defect Detection | T5 | JavaScript | ✓ |
CodeRL [147] | 2021 | Code Summarization, Code Generation, Code Translation, Code Refinement, Defect Detection, Clone Detection | T5 | Java | ✓ |
TreeBERT [148] | 2021 | Code Summarization | Vanilla Transformers | Python, Java | ✓ |
BUGLAB [149] | 2021 | Code Refinement, Defect Detection | GREAT | Python | ✓ |
TBCC [150] | 2021 | Clone Detection | Vanilla Transformers | C, Java | ✓ |
APPS [36] | 2021 | Benchmark | N/A | Python | ✓ |
CodeXGLUE [34] | 2021 | Benchmark | N/A | Python | ✓ |
CoTexT [151] | 2021 | Code Summarization, Code Generation, Code Refinement, Defect Detection | T5 | Python, Java, JavaScript, PHP, Ruby, Go | ✓ |
SynCoBERT [152] | 2021 | Code Translation, Defect Detection, Clone Detection | BERT | Ruby, JavaScript, Go, Python, Java, PHP | ✗ |
TravTrans [153] | 2021 | Code Completion | Vanilla Transformers | Python | ✗ |
CCAG [154] | 2021 | Code Completion | Vanilla Transformers | JavaScript, Python | ✗ |
DeepDebug [155] | 2021 | Defect Detection | Reformer | Java | ✓ |
Recoder [93] | 2021 | Defect Detection | TreeGen | Java | ✓ |
PLBART [156] | 2021 | Code Summarization, Code Generation, Code Translation, Code Refinement, Clone Detection, Defect Detection | BART | Java, Python | ✗ |
CODEGEN [157] | 2022 | Code Generation | GPT-NEO & GPT-J | Python | ✓ |
GPT-2 for APR [158] | 2022 | Code Refinement | GPT-2 | JavaScript | ✓ |
CERT [39] | 2022 | Code Generation | CODEGEN | Python | ✓ |
PyCoder [87] | 2022 | Code Generation | GPT-2 | Python | ✓ |
AlphaCode [38] | 2022 | Code Generation | GPT | Java | ✗ |
InCoder [40] | 2022 | Code Generation, Code Completion, Code Summarization | GPT-3 | Java, JavaScript, Python | ✓ |
RewardRepair [159] | 2022 | Code Refinement, Defect Detection | T5 | Java | ✓ |
CodeParrot [37] | 2022 | Code Generation | GPT-2 | Python | ✓ |
AlphaRepair [160] | 2022 | Code Refinement, Defect Detection | CodeBERT | Java | ✓ |
CodeReviewer [128] | 2022 | Code Summarization, Code Refinement, Defect Detection | CodeT5 | Java | ✓ |
TransRepair [161] | 2022 | Code Refinement, Defect Detection | BLEU | Java | ✗ |
NatGen [162] | 2022 | Code Generation, Code Translation, Code Refinement | CodeT5 | Java, Python, Go, JavaScript, Ruby, PHP | ✓ |
DualSC [163] | 2022 | Code Generation, Code Summarization | T5 | Shellcode | ✓ |
VulRepair [164] | 2022 | Code Refinement, Defect Detection | T5 | C, C++ | ✓ |
CoditT5 [165] | 2022 | Code Summarization, Defect Detection | CodeT5 | Java, Python, Ruby, PHP, Go, JavaScript | ✓ |
C4 [166] | 2022 | Clone Detection | CodeBERT | C++, C#, Java, Python | ✓ |
SPT-Code [167] | 2022 | Code Summarization, Code Completion, Code Refinement, Code Translation | CodeBERT & GraphCodeBERT | Python, Java, JavaScript, PHP, Go | ✓ |
ExploitGen [168] | 2023 | Code Generation | CodeBERT | Python, Assembly | ✓ |
SantaCoder [169] | 2023 | Code Summarization, Code Generation | GPT-2 | Python, Java, and JavaScript | ✓ |
xCodeEval [42] | 2023 | Benchmark | N/A | Python, Java, C++, PHP, and 8 more | ✓ |
StarCoder [170] | 2023 | Code Generation, Code Completion, Code Summarization | BERT & SantaCoder | HTML, Python, Java, and 83 more | ✓ |
4. Challenges and Opportunities
4.1. Computational Expense
4.2. Quality Measurement
4.3. Software Security
4.4. Software Piracy
4.5. Integration with Existing Tools
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Vechev, M.; Yahav, E. Programming with “Big Code”. Found. Trends® Program. Lang. 2016, 3, 231–284.
- Hindle, A.; Barr, E.T.; Su, Z.; Gabel, M.; Devanbu, P. On The Naturalness of Software. In Proceedings of the 34th International Conference on Software Engineering (ICSE), Zurich, Switzerland, 2–9 June 2012; pp. 837–847.
- Goodman, J.T. A bit of progress in language modeling. In Computer Speech & Language; Elsevier: Amsterdam, The Netherlands, 2001; pp. 403–434.
- Dijkstra, E.W. A Preliminary Investigation into Computer Assisted Programming; The University of Texas: Austin, TX, USA, 2007.
- Rajamani, S. AI Assisted Programming. In Proceedings of the 15th Annual ACM India Compute Conference, Jaipur, India, 9–11 November 2022; p. 5.
- Dijkstra, E.W. The Humble Programmer. Commun. ACM 1972, 15, 859–866.
- Ji, Y.; Bosselut, A.; Wolf, T.; Celikyilmaz, A. The Amazing World of Neural Language Generation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts, Virtual, 19–20 November 2020; pp. 37–42.
- Surameery, N.M.S.; Shakor, M.Y. Use ChatGPT to Solve Programming Bugs. Int. J. Inf. Technol. Comput. Eng. (IJITC) 2023, 3, 17–22.
- Talamadupula, K. Applied AI Matters: AI4Code: Applying Artificial Intelligence to Source Code. AI Matters 2021, 7, 18–20.
- Ross, S.I.; Martinez, F.; Houde, S.; Muller, M.; Weisz, J.D. The Programmer’s Assistant: Conversational Interaction with a Large Language Model for Software Development. In Proceedings of the 28th International Conference on Intelligent User Interfaces, Sydney, Australia, 27–31 March 2023; pp. 491–514.
- Mehrabi, N.; Morstatter, F.; Saxena, N.; Lerman, K.; Galstyan, A. A Survey on Bias and Fairness in Machine Learning. ACM Comput. Surv. (CSUR) 2021, 54, 1–35.
- Carvalho, D.V.; Pereira, E.M.; Cardoso, J.S. Machine Learning Interpretability: A Survey on Methods and Metrics. Electronics 2019, 8, 832.
- Tjoa, E.; Guan, C. A Survey on Explainable Artificial Intelligence (XAI): Toward Medical XAI. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4793–4813.
- Beigi, G.; Liu, H. A Survey on Privacy in Social Media: Identification, Mitigation, and Applications. ACM Trans. Data Sci. 2020, 1, 1–38.
- Allamanis, M.; Barr, E.T.; Devanbu, P.; Sutton, C. A Survey of Machine Learning for Big Code and Naturalness. ACM Comput. Surv. (CSUR) 2018, 51, 1–37.
- Lin, G.; Wen, S.; Han, Q.L.; Zhang, J.; …
- Iyer, S.; Konstas, I.; Cheung, A.; Zettlemoyer, L. … Language to Code in Programmatic Context. arXiv 2018.
- … Program Repair Space with Existing Patches and Similar Code. In Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis, Amsterdam, The Netherlands, 16–21 July 2018; pp. 298–309.
- Han, X.; Zhang, Z.; Ding, N.; Gu, Y.; Liu, X.; Huo, Y.; Qiu, J.; Yao, Y.; Zhang, A.; Zhang, L.; et al. Pre-trained Models: Past, Present and Future. AI Open 2021, 2, 225–250.
- Lin, H.; Bilmes, J. How to Select a Good Training-Data Subset for Transcription: Submodular Active Selection for Sequences; Technical report; Washington University: Washington, DC, USA, 2009.
- Liang, W.; Zou, J. MetaShift: A Dataset of Datasets for Evaluating Contextual Distribution Shifts and Training Conflicts. In Proceedings of the International Conference on Learning Representations, Virtual, 25–29 April 2022.
- Yin, Y.; Chen, C.; Shang, L.; Jiang, X.; Chen, X.; Liu, Q. AutoTinyBERT: Automatic Hyper-parameter Optimization for Efficient Pre-trained Language Models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Bangkok, Thailand, 1–6 August 2021; pp. 5146–5157.
- OpenAI. CHATGPT: Optimizing Language Models for Dialogue. 2023. Available online: https://online-chatgpt.com/ (accessed on 16 May 2023).
- Serban, I.V.; Sankar, C.; Germain, M.; Zhang, S.; Lin, Z.; Subramanian, S.; Kim, T.; Pieper, M.; Chandar, S.; Ke, N.R.; et al. A Deep Reinforcement Learning Chatbot. arXiv 2017, arXiv:1709.02349.
- Christiano, P.F.; Leike, J.; Brown, T.; Martic, M.; Legg, S.; Amodei, D. Deep Reinforcement Learning from Human Preferences. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
- Ling, L.; Tan, C.W. Human-assisted Computation for Auto-grading. In Proceedings of the IEEE International Conference on Data Mining Workshops, Singapore, 17–20 November 2018; pp. 360–364.
- Ziegler, D.M.; Stiennon, N.; Wu, J.; Brown, T.B.; Radford, A.; Amodei, D.; Christiano, P.; Irving, G. Fine-tuning Language Models from Human Preferences. arXiv 2019, arXiv:1909.08593.
- Stiennon, N.; Ouyang, L.; Wu, J.; Ziegler, D.; Lowe, R.; Voss, C.; Radford, A.; Amodei, D.; Christiano, P.F. Learning to Summarize with Human Feedback. Adv. Neural Inf. Process. Syst. 2020, 33, 3008–3021.
- Ouyang, L.; Wu, J.; Jiang, X.; Almeida, D.; Wainwright, C.; Mishkin, P.; Zhang, C.; Agarwal, S.; Slama, K.; Ray, A.; et al. Training Language Models to Follow Instructions with Human Feedback. Adv. Neural Inf. Process. Syst. 2022, 35, 27730–27744.
- Hendler, J. Understanding the Limits of AI coding. Science 2023, 379, 548.
- Chen, B.; Zhang, F.; Nguyen, A.; Zan, D.; Lin, Z.; Lou, J.G.; Chen, W. CodeT: Code Generation with Generated Tests. In Proceedings of the International Conference on Learning Representations, Virtual, 25–29 April 2022.
- White, A.D.; Hocky, G.; Ansari, M.; Gandhi, H.A.; Cox, S.; Wellawatte, G.P.; Sasmal, S.; Yang, Z.; Liu, K.; Singh, Y.; et al. Assessment of Chemistry Knowledge in Large Language Models That Generate Code. Digit. Discov. 2023, 2, 368–376.
- Howard, J.; Ruder, S. Universal Language Model Fine-tuning for Text Classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 15–20 July 2018; pp. 328–339.
- Wei, J.; Bosma, M.; Zhao, V.; Guu, K.; Yu, A.W.; Lester, B.; Du, N.; Dai, A.M.; Le, Q.V. Finetuned Language Models are Zero-Shot Learners. In Proceedings of the International Conference on Learning Representations, Virtual, 25–29 April 2022.
- Kingma, D.P.; Welling, M. Auto-encoding Variational Bayes. arXiv 2013, arXiv:1312.6114.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. Commun. ACM 2020, 63, 139–144.
- Settles, B. Active Learning Literature Survey; University of Wisconsin: Madison, WI, USA, 2009.
- Cohn, D.A.; Ghahramani, Z.; Jordan, M.I. Active Learning with Statistical Models. J. Artif. Intell. Res. 1996, 4, 129–145.
- Settles, B.; Craven, M.; Friedland, L. Active Learning with Real Annotation Costs. In Proceedings of the NIPS Workshop on Cost-sensitive Learning, Vancouver, BC, Canada, 8–13 December 2008.
- He, J.; Vechev, M. Large Language Models for Code: Security Hardening and Adversarial Testing. arXiv 2023, arXiv:2302.05319.
- Pearce, H.; Ahmad, B.; Tan, B.; Dolan-Gavitt, B.; Karri, R. Asleep at the Keyboard? Assessing the Security of Github Copilot’s Code Contributions. In Proceedings of the IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 22–26 May 2022; pp. 754–768.
- Peace, A.G.; Galletta, D.F.; Thong, J.Y. Software Piracy in the Workplace: A Model and Empirical Test. J. Manag. Inf. Syst. 2003, 20, 153–177.
- Reavis Conner, K.; Rumelt, R.P. Software piracy: An Analysis of Protection Strategies. Manag. Sci. 1991, 37, 125–139.
- Limayem, M.; Khalifa, M.; Chin, W.W. Factors Motivating Software Piracy: A Longitudinal Study. IEEE Trans. Eng. Manag. 2004, 51, 414–425.
- De Laat, P.B. Copyright or Copyleft?: An Analysis of Property Regimes for Software Development. Res. Policy 2005, 34, 1511–1532.
- Kelty, C.M. Culture’s Open Sources: Software, Copyright, and Cultural Critique. Anthropol. Q. 2004, 77, 499–506.
- The United States Copyright Office, Library of Congress. Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence. 2023. Available online: https://www.federalregister.gov/d/2023-05321 (accessed on 26 April 2023).
- Zheng, L.; Joe-Wong, C.; Tan, C.W.; Chiang, M.; Wang, X. How to Bid the Cloud. In Proceedings of the ACM Conference on Special Interest Group on Data Communication (SIGCOMM), London, UK, 17–21 August 2015; pp. 71–84.
- Zheng, L.; Joe-Wong, C.; Brinton, C.; Tan, C.W.; Ha, S.; Chiang, M. On the Viability of a Cloud Virtual Service Provider. In Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science, Antibes Juan–les–Pins, France, 14–18 June 2016; pp. 235–248.
- Guo, S. INTITNI/CopilotForXcode: The Missing GitHub Copilot and ChatGPT Xcode Source Editor Extension. Available online: https://github.com/intitni/CopilotForXcode (accessed on 18 May 2023).
Title | Year | Focus Area |
---|---|---|
A Survey of Machine Learning for Big Code and Naturalness [15] | 2018 | Big Code and Naturalness |
Software Vulnerability Detection Using Deep Neural Networks: A Survey [16] | 2020 | Security |
A Survey on Machine Learning Techniques for Source Code Analysis [17] | 2021 | Code Analysis |
Deep Security Analysis of Program Code: A Systematic Literature Review [18] | 2022 | Security |
A Survey on Pretrained Language Models for Neural Code Intelligence [19] | 2022 | Code Summarization, Generation, and Translation |
Deep Learning Meets Software Engineering: A Survey on Pre-trained Models of Source Code [20] | 2022 | Software Engineering |
Software as Storytelling: A Systematic Literature Review [21] | 2023 | Storytelling |
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing [22] | 2023 | Prompt-based Learning |
Dataset Name | Year | Sample Size | Language(s) | Supported Task(s) | Online URL |
---|---|---|---|---|---|
GitHub Java Corpus [23] | 2013 | 14.7K | Java | Code Completion | https://groups.inf.ed.ac.uk/cup/javaGithub/ |
Description2Code [24] | 2016 | 7.6K | Java, C# | Code Generation, Code Summarization | https://github.com/ethancaballero/description2code |
BigCloneBench [25] | 2015 | 5.5K | Java | Defect Detection, Clone Detection | https://github.com/clonebench/BigCloneBench |
CodRep [26] | 2018 | 58K | Java | Code Refinement, Defect Detection | https://github.com/ASSERT-KTH/CodRep-competition |
CONCODE [27] | 2018 | 104K | Java | Code Generation | https://github.com/sriniiyer/concode |
WikiSQL [28] | 2018 | 87K | SQL | Code Summarization | https://github.com/salesforce/WikiSQL |
Bugs2Fix [29] | 2019 | 122K | Java | Defect Detection, Code Refinement | https://sites.google.com/view/learning-fixes |
Devign [30] | 2019 | 26.4K | C | Code Generation, Defect Detection | https://sites.google.com/view/devign |
CodeSearchNet [31] | 2019 | 2M | Python, JavaScript, Ruby, Go, Java, PHP | Code Generation, Code Summarization, Code Translation | https://github.com/github/CodeSearchNet |
The Pile [32] | 2020 | 211M | Python | Code Generation | https://pile.eleuther.ai |
CodeNet [33] | 2021 | 13M | C++, C, Python, Java | Code Generation, Code Refinement | https://github.com/IBM/Project_CodeNet |
CodeXGLUE [34] | 2021 | 176K | Python, Java, PHP, JavaScript, Ruby, Go | Code Generation, Code Completion, Code Summarization, Defect Detection | https://github.com/microsoft/CodeXGLUE |
HumanEval [35] | 2021 | 164 | Python | Code Generation | https://github.com/openai/human-eval |
APPS [36] | 2021 | 10K | Python | Code Generation | https://github.com/hendrycks/apps |
CodeParrot [37] | 2022 | 22M | Python | Code Generation | https://hf.co/datasets/transformersbook/codeparrot |
CodeContests [38] | 2022 | 13.6K | C++, Java, JavaScript, C#, and 8 more | Code Generation | https://github.com/deepmind/code_contests |
CERT [39] | 2022 | 5.4M | Python | Code Generation | https://github.com/microsoft/PyCodeGPT |
InCoder [40] | 2022 | 670K | Python, JavaScript, HTML, and 24 more | Code Generation, Code Summarization | https://github.com/dpfried/incoder |
PolyCoder [41] | 2022 | 1K | C, C++, Java, JavaScript, C#, Go, and 6 more | Code Generation | https://github.com/VHellendoorn/Code-LMs |
ExecEval [42] | 2023 | 58K | Ruby, JavaScript, Go, C++, C, and 6 more | Code Summarization, Code Generation, Code Translation | https://github.com/ntunlp/xCodeEval |
Model | Type | AI-Assisted Programming Tasks |
---|---|---|
Encoder-only | Understanding | Code Summarization, Code Translation |
Decoder-only | Generation | Code Generation, Code Completion |
Encoder–decoder | Generation and Understanding | Code Generation, Code Refinement, Defect Detection, Clone Detection |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wong, M.-F.; Guo, S.; Hang, C.-N.; Ho, S.-W.; Tan, C.-W. Natural Language Generation and Understanding of Big Code for AI-Assisted Programming: A Review. Entropy 2023, 25, 888. https://doi.org/10.3390/e25060888