4.1. Advantages
GPT-4V demonstrates a remarkable capability for retrieving information from maps, especially embedded textual information. Throughout all prompts in the map reading and analysis sections (Section 2 and Section 3), GPT-4V consistently and successfully extracted the textual information and provided a precise description of map elements, such as legend items, figure captions, and scales. This skill is particularly important when reading domain-specific maps with complicated scales and legend items, where precision is crucial.
GPT-4V can connect places observed on a map to its pre-trained geographic knowledge. For instance, as demonstrated in Prompt 2.5, GPT-4V was able to extend beyond the map’s displayed content and reference geographical information about locations not shown in the image, such as Cypress near the Houston area. Similarly, in Prompt S2, the geographical relationship between Hollywood and North Miami Beach was not explicitly shown on the map, yet GPT-4V was able to supply position information about the two places. Although the response in our experiment contained an error (the correct statement is that Hollywood is north of North Miami Beach), such a capability in map reading is rare and noteworthy. This suggests that GPT-4V’s training data encompass the geographic locations of these places and that the model captures how such information is interrelated. Consequently, GPT-4V is able to establish spatial connections, which proves very useful in map reading and analysis. By drawing on its extensive background knowledge, GPT-4V can relate to and incorporate external geographic information, thereby enhancing the depth and accuracy of its interpretations within the map’s framework.
3. Comprehending Complex Symbology
GPT-4V’s advanced capabilities in map reading, particularly in understanding complex map symbology (e.g., thematic maps with different symbology types, color schemes, and classifications), give it a significant advantage over traditional methods. Complex legends can often confuse human readers, but GPT-4V, supported by a large pre-trained language model, consistently identifies similar or analogous information to use as a reference. This allows for a more accurate interpretation of the information in the legends. For instance, as seen in Prompts 1.3 and 2.4, GPT-4V was able to discern the colors corresponding to different grades and the symbols representing different kinds of data. GPT-4V goes beyond merely extracting this information; its real value lies in using the large language model to comprehend and explain the significance of this information. In Prompt S6, for example, GPT-4V did not just list the content indicated by the legend but also drew conclusions about the data, such as associating darker shades with higher income levels.
4. Spatial Pattern Recognition
GPT-4V also exhibits an outstanding capacity for pattern recognition, as evidenced by its performance in experiments involving point patterns. In scenarios such as those presented in Prompts 2.1 and 2.2, GPT-4V successfully identified different point patterns (dispersed, regular, and random), matching the discernment capabilities of the human eye. GPT-4V also performed well in comparison tasks, as seen in Prompt 2.5, where it accurately detected changes in nighttime light before and during the winter storm. Although it occasionally failed to provide a response to image comparison requests, GPT-4V could generally recognize and compare images effectively when given sufficient context and after some prompt engineering.
GPT-4V shows an exceptional ability to process maps with high-resolution and precise information, outperforming human capabilities in certain aspects. For example, in Prompt S6, where maps contain an overwhelming number of elements, humans may struggle to quickly locate the necessary information amidst the complexity. GPT-4V, on the other hand, can identify textual information promptly with a wide range of associated background knowledge. This capability presents a significant opportunity, particularly considering the vast archives of historical maps that remain unused due to their complexity. The widespread application of GPT-4V to such historical maps could extract and weave together a wealth of information, creating a new network of interconnected map information. This network has the potential to inspire novel findings, transforming the way we comprehend historical cartography and its narratives.
6. Understanding Domain-Specific Maps
GPT-4V has the potential to greatly assist laymen in reading domain-specific maps, which often come with a steep learning curve due to map complexity. Maps rich in specialized content, like Local Indicators of Spatial Association (LISA) maps (Prompt S4) or those employing the Köppen climate classification (Prompt S3), typically require a solid background in the subject matter to be fully understood. Prior to GPT-4V, a person would need to turn to search engines to supplement their understanding with background knowledge. GPT-4V, however, can streamline this process by directly providing relevant information, paving a new pathway for understanding domain-specific maps. By leveraging its pre-trained large language model, it can offer comprehensive and related information, aiding people from non-specialized fields in quickly grasping the content of complex maps. This feature of GPT-4V not only enhances the accessibility of specialized geographical data but also enriches the user’s learning experience by simplifying the acquisition of domain knowledge.
GPT-4V improves the efficiency of reading and analyzing maps when compared with humans. In most of our tests, the response time from the OpenAI API was less than 20 s, which is notably faster than the time a person typically needs to observe a map and type a description, given typing speeds ranging from 30 to 95 words per minute (wpm) with a mean of around 50 wpm [35]. A sample response in our tests is around 200 words, which would take about 4 min to type at the mean speed, so GPT-4V’s response is approximately 12 times faster than a human’s. This calculation does not even factor in the additional time humans require for organizing and structuring their responses. Furthermore, in scenarios that require reading or analyzing multiple maps, the time saving becomes even more pronounced. Humans may require substantially more time to comprehend each map individually, whereas GPT-4V can be set to process multiple requests simultaneously through the GPT-4 API.
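The speed comparison above can be reproduced with a few lines of arithmetic. The following is a minimal sketch that uses only the figures quoted in the text (50 wpm mean typing speed, a 200-word response, and a 20 s upper bound on API response time); the function and constant names are illustrative and not part of the original study.

```python
# Figures quoted in the text; the names are illustrative assumptions.
MEAN_TYPING_SPEED_WPM = 50   # mean typing speed reported in [35]
RESPONSE_LENGTH_WORDS = 200  # approximate length of a sample response
API_RESPONSE_TIME_S = 20     # upper bound on the observed API response time


def typing_time_seconds(words: int, wpm: float) -> float:
    """Return the time needed to type `words` words at `wpm` words per minute."""
    return words / wpm * 60


def speedup(human_seconds: float, model_seconds: float) -> float:
    """Return how many times faster the model is than the human."""
    return human_seconds / model_seconds


human_s = typing_time_seconds(RESPONSE_LENGTH_WORDS, MEAN_TYPING_SPEED_WPM)
print(f"Human typing time: {human_s / 60:.1f} min")                # 4.0 min
print(f"Speed-up: {speedup(human_s, API_RESPONSE_TIME_S):.0f}x")  # 12x
```

At the 30 wpm and 95 wpm ends of the reported range, the same calculation gives speed-ups of roughly 20 and 6, respectively, so the factor of 12 should be read as a typical value rather than a fixed one.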
4.2. Disadvantages
In the evaluation of GPT-4V’s performance, a notable concern is its accuracy in quantitative assessments. In tests such as those associated with Prompt S2, GPT-4V showed limitations in providing precise scale measurements from two maps. By contrast, its response to Prompt 2.6 shows that GPT-4V performs well in qualitative analysis, suggesting that it is better suited to descriptive tasks than to accurate quantitative evaluations. GPT-4V itself consistently indicated that it does not support quantitative analysis on maps, which implies that it can effectively interpret and describe map content but remains limited for quantitative tasks. Users should be aware of this limitation and may prefer to use GPT-4V in contexts where descriptive insight and qualitative interpretation are the primary objectives.
2. Dependence on Prompt Engineering
The application of GPT-4V in map analysis often requires careful prompt engineering. Initial inquiries rarely yielded comprehensive answers (e.g., Prompt 2.1). Thus, refining the prompts is essential to guide the AI toward the desired range of responses, enhancing the relevance and accuracy of the information provided. In other words, prompt engineering is a cornerstone of effective AI deployment. This iterative refinement of prompts is a critical step in leveraging GPT-4V’s capabilities in map reading and analysis, and it adds significant effort to the workflow.
3. Difficult Results Validation
The current version of GPT-4V encounters technical difficulties, particularly with analyzing graphs or text involving varying colors or styles, such as solid, dashed, or dotted lines. This challenge might be caused by the way data are parsed and fed into the GPT model. While immediate improvements on this front may be challenging, future updates and releases from OpenAI could potentially address and alleviate these issues.
The validation of results provided by GPT-4V presents a concern. Currently, the validation of its experimental outcomes is conducted by the authors, which could potentially introduce biases. Consequently, our experiments do not definitively conclude that GPT-4V always delivers high-quality results. A solution in future studies could be to increase the number of experiments to further explore the stability and reliability of GPT-4V.
GPT-4V is like a “black box”, with its underlying principles and logic challenging to explain. In essence, it relies on its large pre-trained language model and transformer-based deep neural networks to generate responses. The architecture of GPT-4V is conceptually straightforward, yet providing a clear explanation of its internal process is complex. The responses it generates can vary, sometimes depending more on the information extracted directly from the map and at other times leaning heavily on its pre-trained knowledge. The balance between extracting visual content from the map and using information from its large language model training is not easily controllable. This variation in results further complicates the explanation of GPT-4V’s mechanism.
6. Reproducibility Concern
The reproducibility and performance of GPT-4V in our experiments raise concerns. Because large language models such as GPT generate their output by predicting the next word, identical results cannot be expected across runs. Thus, there is a possibility that our current findings could be coincidental. Moreover, the scope of our experiments may be limited, reflecting GPT-4V’s capabilities only under specific conditions. Its performance in more complex tasks remains uncertain and requires further experiments. Even so, GPT-4V is likely to produce similar, though not identical, outcomes when given the same prompts and images. More experiments are still needed to better understand and validate GPT-4V’s capabilities in map reading and interpretation.
It is noteworthy that in our experiment, there were several instances where GPT-4V refused to respond to our map-related queries. As outlined in the GPT-4V system card, such non-responsiveness can be triggered by issues like harmful content, privacy concerns, cybersecurity, or multimodal jailbreaks [22]. Yet, in our case, these triggers did not evidently apply to our prompts. This discrepancy suggests a possible misidentification of such triggers by the system. While revising the phrasing of our prompts sometimes resulted in an effective response from GPT-4V, this inconsistency highlights a need for clearer guidelines in GPT-4V’s future documentation, particularly regarding what types of prompts the system can process.
4.3. Recommendations
Based on the aforementioned advantages and disadvantages, we offer the following recommendations for when and how to best use GPT-4V for map reading and analysis:
First, GPT-4V can be a great assistant for fast information retrieval from large-volume, high-frequency, high-resolution maps with complex symbology. Our experiments have demonstrated its capability to correctly read maps of various types, styles, and topics. Because it can be programmed and automated, we recommend using GPT-4V to read and analyze batches of maps concurrently. This capability is particularly valuable in fields that require processing a large volume of map data in a short time, such as real-time monitoring of hotspots of crime incidents, car accidents, or disease outbreaks. Similarly, it is beneficial for long-term, large-scale map-based monitoring, such as coastal erosion detection from time-series satellite-image maps or drought monitoring from daily precipitation maps. Automating the process of examining batches of maps and summarizing observable patterns in writing using the GPT API is a practical, low-hanging fruit of this new technology.
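The batch workflow described above could be organized as in the following minimal sketch. It assumes one request per map image and uses a thread pool to issue the requests concurrently. The function `describe_map` is a hypothetical placeholder, not a real API call: it only echoes its inputs so that the sketch runs without network access or credentials, and it would need to be replaced with an actual request to a vision-language model. The file names are likewise invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def describe_map(image_path: str, prompt: str) -> str:
    """Hypothetical placeholder for a request to a vision-language model.

    Replace the body with a real API request; this stub only echoes its inputs.
    """
    return f"[description of {image_path} for prompt: {prompt}]"


def describe_maps(
    image_paths: list[str],
    prompt: str,
    describe: Callable[[str, str], str] = describe_map,
    max_workers: int = 4,
) -> dict[str, str]:
    """Send one request per map concurrently and collect responses by path."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its response.
        responses = pool.map(lambda path: describe(path, prompt), image_paths)
        return dict(zip(image_paths, responses))


if __name__ == "__main__":
    maps = ["precip_day1.png", "precip_day2.png", "precip_day3.png"]
    for path, text in describe_maps(maps, "Summarize the spatial pattern.").items():
        print(path, "->", text)
```

Threads are a reasonable choice here because each request spends most of its time waiting on the network; in practice, `max_workers` would also need to respect whatever rate limits the API provider imposes.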
Second, GPT-4V significantly lowers the learning curve of map reading for many users. Proper map reading requires basic knowledge of cartography and geography, and it takes practice. Even for someone who is skillful and experienced, it is still challenging to read reference maps of unfamiliar geographic regions or thematic maps of unfamiliar topics. GPT-4V can address such challenges well thanks to its pre-trained geographic and domain knowledge. For instance, GPT-4V can serve as a “local guide” to explain map labels of unfamiliar places or place names in a foreign language. It can also translate jargon that appears in maps into layman’s terms to make spatial patterns easier to understand.
Third, though GPT-4V showed spectacular performance in most tasks tested, it still presents some limitations in recognizing patterns in maps (Figure S8). Such mistakes do not always occur, making it hard to identify the core issue. Thus, our recommendation for the use of GPT-4V in reading and analyzing maps is to run it more than once and synthesize the results across responses, so that an occasional false interpretation from the model is not taken at face value.
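One simple way to synthesize repeated runs is a majority vote. The sketch below assumes that each response has already been reduced to a short categorical label (for example, by asking the model to answer with a single pattern name); that reduction step, the function name, and the `min_share` threshold are assumptions made for illustration and are not part of the original experiments.

```python
from collections import Counter
from typing import Optional


def synthesize_labels(labels: list[str], min_share: float = 0.5) -> Optional[str]:
    """Return the label given by more than `min_share` of the runs, else None."""
    if not labels:
        return None
    # Normalize case and whitespace so trivially different answers are merged.
    normalized = [label.strip().lower() for label in labels]
    label, count = Counter(normalized).most_common(1)[0]
    return label if count / len(normalized) > min_share else None


# Five hypothetical runs of the same prompt on the same map.
runs = ["Clustered", "clustered", "random", "clustered ", "dispersed"]
print(synthesize_labels(runs))  # clustered
```

Returning `None` when no label reaches the threshold makes disagreement between runs explicit, which flags the map for manual inspection rather than silently accepting one of the answers.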
Fourth, GPT-4V can facilitate the research process in geographic information science. Spatial pattern recognition from maps often serves as the first step of exploratory spatial data analysis, especially in the big data era. Various research hypotheses can be formed based on the observed patterns. These hypotheses can then be validated through confirmatory analysis using real-world data to form new empirical findings or even new theories. GPT-4V can significantly enhance this process by mining spatial patterns from maps, summarizing the patterns in writing, and comparing them with the literature in its pre-trained knowledge to suggest which patterns are worth further study.