The lexicographic potential of artificial intelligence: a case study of English loanwords in the Croatian language

Authors

  • Katica Balenović University of Zadar
  • Jakov Proroković University of Zadar

DOI:

https://doi.org/10.33604/sl.19.36.3

Keywords:

ChatGPT, lexicography in language contact, overgeneralisation errors, corpus-based sampling, loanwords

Abstract

The advent of generative artificial intelligence (AI) and large language models (LLMs) has introduced new possibilities in lexicography, particularly in defining dictionary entries with precision while reducing the time cost compared with traditional methods and software tools. To test the linguistic capabilities of AI, our study goes beyond monolingual dictionary compilation and investigates the potential of the ChatGPT model to distinguish between specific senses of loanwords in an L2 context. Corpus-based sampling of target English words was used to assess ChatGPT’s ability to delineate the different senses in which regularly occurring loanwords are realised in the Croatian language context. The findings indicate that the model demonstrates notable proficiency in providing definitions in general, albeit with observable flaws when responding to prompts that specifically inquire about the possible senses or word classes of the targeted loanwords in their L2 setting. Its accuracy diminishes with less frequently used loanwords, where it often overgeneralises from English (L1) to Croatian (L2). The model’s tendency to produce erroneous examples, suggesting usages that lack attestation in language corpora, is discussed in detail; the results support the notion that it primarily interprets loanwords from an English perspective, regardless of the language used in the prompt. A comparison between AI responses from early 2024 and early 2025 suggests an improvement in the 2025 model, which handles ambiguous cases with more nuance. However, inconsistencies persist, particularly in how frequency of use correlates with the number of senses, much of which is interpreted as ChatGPT’s tendency to prioritise generating a response at the cost of accuracy.
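For readers who want a concrete picture of the kind of check described above, the following minimal Python sketch (our illustration, not code or data from the study; all function names and sense labels are hypothetical) builds a Croatian-language prompt for a target loanword and flags any senses a model proposes that lack attestation in a corpus sample:

    # Illustrative sketch only: hypothetical helpers, not the authors' code.
    # It mirrors the evaluation idea in the abstract: prompt the model in Croatian
    # for the senses of a loanword, then flag proposed senses without corpus attestation.

    def build_prompt(loanword: str) -> str:
        """Build a Croatian prompt asking the model to list the senses in which
        the loanword is used in Croatian, one sense per line."""
        return (
            f"U kojim se značenjima posuđenica '{loanword}' koristi u hrvatskom "
            "jeziku? Navedi svako značenje u zasebnom retku."
        )

    def flag_overgeneralisation(proposed: set[str], attested: set[str]) -> set[str]:
        """Return model-proposed senses lacking attestation in the corpus sample,
        i.e. candidates for overgeneralisation from English usage."""
        return proposed - attested

    if __name__ == "__main__":
        # Placeholder data: real sense inventories would come from corpus annotation
        # and from parsing the model's reply to build_prompt(...).
        proposed = {"sense attested in Croatian", "English-only sense"}
        attested = {"sense attested in Croatian"}
        print(build_prompt("loanword"))
        print(flag_overgeneralisation(proposed, attested))  # -> {'English-only sense'}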

Published

2025-06-16

Issue

Section

Original scientific paper