Transfer Learning and Analogical Inference: A Critical Comparison of Algorithms, Methods, and Applications
Abstract
1. An Introduction to Human and Artificial Intelligence Learning
- Identify critical differences and similarities between transfer learning and analogical inference;
- Review relevant evidence from the fields of computer vision and natural language processing;
- Make recommendations for future research integrating transfer learning and analogical inference.
2. Brief Overview of Machine Learning
3. What Is Transfer Learning?
4. What Is Analogical Inference?
- Retrieval—accessing a similar scenario from long-term memory;
- Mapping—the aligning of elements, structures, and concepts between the source and target;
- Evaluation—judgment of the quality of specific aspects, such as the inferences made, the mapping created, and/or the analogy in general.
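The three stages above can be sketched as a toy program. The scenario encoding (relation–filler dictionaries) and the scoring heuristics below are illustrative assumptions for this sketch only, not any published cognitive architecture:

```python
def retrieve(target, memory):
    """Retrieval: access the stored scenario sharing the most relations with the target."""
    return max(memory, key=lambda case: len(set(case["relations"]) & set(target["relations"])))

def map_elements(source, target):
    """Mapping: align source and target fillers that play the same relational role."""
    return {src: target["bindings"][rel]
            for rel, src in source["bindings"].items()
            if rel in target["bindings"]}

def evaluate(mapping, source):
    """Evaluation: fraction of source roles that found a target counterpart."""
    return len(mapping) / len(source["bindings"])

# Toy long-term memory: the solar system plus an unrelated scenario.
solar = {"relations": ["attracts", "orbits"],
         "bindings": {"attracts": "sun", "orbits": "planet"}}
memory = [solar, {"relations": ["chases"], "bindings": {"chases": "dog"}}]

# Target scenario: the Rutherford atom.
atom = {"relations": ["attracts", "orbits"],
        "bindings": {"attracts": "nucleus", "orbits": "electron"}}

source = retrieve(atom, memory)       # retrieves the solar-system scenario
mapping = map_elements(source, atom)  # {'sun': 'nucleus', 'planet': 'electron'}
print(mapping, evaluate(mapping, source))
```

The classic solar-system/atom analogy falls out directly: the sun maps to the nucleus and the planet to the electron, and every source role finds a counterpart, so the evaluation score is 1.0.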
5. Comparisons of Transfer Learning and Analogical Inference in Two Application Domains
5.1. Computer Vision
5.2. Natural Language Processing
Category | Cognitive-Science-Inspired Architectures | Vector Space Models (VSMs) | Transformer Language Models |
---|---|---|---|
Primary task(s) | Map elements from a base domain to a target domain and/or infer new elements within the target domain | Derive quantitative relationships (typically cosine similarity) between two bodies of text via static word embeddings | Derive contextual meaning for completing general verbal tasks such as text generation, translation, etc. |
Solvable Types of Analogy | | | |
Word-based | Yes | Yes | Yes |
Sentence-based | Yes * | No | Maybe |
Story-based | Yes * | No | Maybe |
Example algorithms/models | | | |
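The word-based analogies that all three families can solve are typically posed as A:B::C:?. For VSMs, the standard approach is the vector-offset ("parallelogram") method: find the vocabulary word whose vector is closest to vec(B) − vec(A) + vec(C). The sketch below uses hand-made 3-D toy embeddings for self-containment; real VSMs such as word2vec or GloVe use learned vectors of a few hundred dimensions:

```python
import numpy as np

# Toy embeddings constructed so that gender and royalty are separate axes.
emb = {
    "man":   np.array([1.0, 0.0, 0.0]),
    "woman": np.array([1.0, 1.0, 0.0]),
    "king":  np.array([1.0, 0.0, 1.0]),
    "queen": np.array([1.0, 1.0, 1.0]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def solve_analogy(a, b, c):
    """Return the word d (excluding a, b, c) maximizing cos(vec(d), vec(b) - vec(a) + vec(c))."""
    query = emb[b] - emb[a] + emb[c]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], query))

print(solve_analogy("man", "woman", "king"))  # → queen
```

Note that, as the table indicates, this offset trick works only at the word level; sentence- and story-based analogies require relational structure that static embeddings do not capture.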
5.3. Summary
6. Future Directions
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Family of Learning | Description |
---|---|
Symbolist (AKA inductive) | An inductive-reasoning approach that represents concepts through “symbols” which can be manipulated mathematically |
Connectionism | A bottom-up approach inspired by the brain’s ability to function based on firing neurons (“activation”) |
Bayesian | A technique that uses estimated event probabilities to predict outcomes, popular in causal settings |
Evolutionary | A biologically inspired approach based on the ability to evolve and mutate in hopes of achieving better performance |
Analogizers | A method based on identifying patterns and similarities between two domains and learning through various types of reasoning |
Type of Learning | Description |
---|---|
Deep | A subset of ML that utilizes artificial neural networks (ANNs) with three or more layers, sometimes called “deep neural networks (DNNs)” |
Supervised | Prediction of a classification or value (in the case of regression) where the data (used for training and testing) are all labeled and known |
Unsupervised | Utilizes only unlabeled data typically for clustering or association problems |
Semi-supervised | A hybrid approach between supervised and unsupervised involving labeled and unlabeled data |
Self-supervised | An unsupervised and/or semi-supervised approach in which arbitrary labels are created for each data instance, allowing the problem to be treated as supervised learning |
Contrastive | A type of self-supervised learning in which data instances are compared to one another to determine their clustering into classes |
Reinforcement | Approach utilizing a reward/penalty system upon evaluation of the environment and its goal(s) |
Multi-view | An approach for problems that can be accurately represented (“viewed”) in two or more different manners |
Multi-task | Technique focused on solving multiple related problems/tasks simultaneously |
Ensemble | A method that combines individual models into a larger model to improve performance |
Active | A specific type of semi-supervised learning where an “oracle,” commonly a human, is used to derive information about or assess the algorithm and/or its performance |
Online | A method best applied when training data are introduced sequentially, with the algorithm’s parameters fine-tuned as each instance arrives |
Zero-shot | Learning in which an algorithm must accurately identify and classify “unseen” data from outside its initial training dataset |
Few-shot/low-shot/one-shot/multi-shot | A semi-supervised approach that uses a very small dataset of labeled instances (or only one instance in the case of one-shot) |
Transfer | A technique where some portion of knowledge from one domain is transferred and applied to a different but related domain |
Analogical inference | A method in which an analogy is formed between two different but related domains so that inferences can be drawn between them, particularly about the target domain |
| Transfer Learning | Analogical Inference |
---|---|---|
Transferred Knowledge Representation | Feature spaces | Relational structures |
Primary Tasks | Classification; Generation | Classification; Generation; Mapping; Retrieval |
Amount of Required Training Data | Large amounts | Small amounts |
Scope of Transfer | Near | Near and Far |
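The first row of the comparison, the transferred knowledge representation, can be made concrete in code. The toy encodings below are assumptions chosen only for contrast: transfer learning moves a learned feature space (numeric vectors), while analogical inference moves a relational structure (predicates over entities) that can be aligned across domains:

```python
# Transfer learning: knowledge as a learned feature space -- e.g., the output
# vector of a pre-trained encoder, reused for a new but related task.
car_features = [0.9, 0.1, 0.4]  # toy activations; a real encoder yields hundreds of values

# Analogical inference: knowledge as a relational structure -- predicates over
# entities that can be aligned with another domain regardless of surface features.
car_relations = {("has_part", "car", "wheel"), ("count", "wheel", 4)}
truck_relations = {("has_part", "truck", "wheel"), ("count", "wheel", 4)}

# Relational structures support mapping: the two domains align through their
# shared predicates ('has_part' and 'count'), not through feature similarity.
shared_predicates = {r[0] for r in car_relations} & {r[0] for r in truck_relations}
print(shared_predicates)
```

This difference is what drives the other rows of the table: feature spaces need large datasets to estimate, whereas a relational structure can be stated, and transferred, from a handful of examples.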
Task Number | Task | Computer Vision | Natural Language Processing |
---|---|---|---|
1 | Using any portion of a pre-trained model | Utilizing a pre-trained ANN architecture with frozen layers to predict a subset of the classes it was originally trained on | Utilizing a pre-trained vector space model to determine the similarity between two documents |
2 | Parameter fine-tuning | Utilizing a pre-trained ANN architecture with unfrozen layers to predict image classes in the target dataset (a more specific case of Task 1) | Adjusting the parameter values of a support vector machine to classify positive and negative customer reviews |
3 | Feature extraction on similar/analogous data | Transfer learning: identifying important features in a target dataset given the features of a source dataset. Analogical inference: given the important visual elements of the source portion of an analogy, identifying the important elements of its target counterpart | Transfer learning: identifying the most relevant words within a text document. Analogical inference: given the important textual elements of the source portion of an analogy, identifying the important elements of its target counterpart |
4 | Drawing inferences on new data based on a similar knowledge space | Transfer learning: identifying the locations of four wheels on a truck given their locations on a car. Analogical inference: inferring that since trucks have four wheels and cars are like trucks, cars must also have four wheels | Identifying the best D-word to complete the given incomplete textual analogy, A:B::C:? |
5 | Mapping elements between two domains | Deciding which geometric elements of a source image correspond best to the geometric elements of a target image (for example, utilizing A and C of Figure 6a, the large triangle of A can map to either the large or small square of C) | Given two analogous stories, identifying which elements (characters, plot, setting, etc.) in the source story map to which in the target story (for example, in Figure 6e, determining whether to map the military general to the surgeon or the patient) |
6 | Re-representation | Identifying where a knee would be on a tree given its location on a human | Identifying the underlying common relationship between two different words that are usually not synonymous (for example, the two sentences “The runner trained for the marathon” and “The student studied for the test” could be re-representations of the word “preparing”) |
8 | Schema abstraction (generalization) | Creation of a general representational structure (or schema) that derives from the identified relationships within the source and target data | |
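Tasks 1 and 2 above can be sketched together in a few lines. In this toy version, the "pre-trained" extractor is faked with a fixed random projection so the example is self-contained (in practice it would be, e.g., the convolutional layers of a pre-trained network such as VGG-16), and fine-tuning is restricted to a newly added logistic-regression head while the extractor stays frozen:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pre-trained" feature extractor: a frozen map from 8 raw inputs to 4 features.
W_frozen = rng.normal(size=(8, 4))
extract = lambda X: np.tanh(X @ W_frozen)  # frozen: never updated below (Task 1)

# Small labeled target dataset: the class is the sign of the first raw input.
X = rng.normal(size=(64, 8))
y = (X[:, 0] > 0).astype(float)
F = extract(X)  # features produced by the frozen extractor

# Task 2, restricted to the new head: train a logistic-regression head by
# gradient descent on top of the frozen features.
w, b = np.zeros(4), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))  # sigmoid predictions
    grad = p - y                            # gradient of the cross-entropy loss
    w -= 0.1 * F.T @ grad / len(y)
    b -= 0.1 * grad.mean()

accuracy = ((1.0 / (1.0 + np.exp(-(F @ w + b))) > 0.5) == y).mean()
print(f"training accuracy on the target task: {accuracy:.2f}")
```

Unfreezing `W_frozen` and updating it alongside `w` and `b` would turn this from pure feature reuse (Task 1) into full fine-tuning (Task 2) of the transferred model.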
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Combs, K.; Lu, H.; Bihl, T.J. Transfer Learning and Analogical Inference: A Critical Comparison of Algorithms, Methods, and Applications. Algorithms 2023, 16, 146. https://doi.org/10.3390/a16030146