The lexicographic potential of artificial intelligence: a case study of English loanwords in the Croatian language

Authors

  • Katica Balenović University of Zadar
  • Jakov Proroković University of Zadar

DOI:

https://doi.org/10.33604/sl.19.36.3

Keywords:

ChatGPT, lexicography in language contact, overgeneralisation errors, corpus-based sampling, loanwords

Abstract

The advent of generative artificial intelligence (AI) and large language models (LLMs) has introduced new possibilities in lexicography, particularly in defining dictionary entries with precision while reducing the time cost compared with traditional methods and software tools. To test the linguistic capabilities of AI, our study goes beyond monolingual dictionary compilation and investigates the potential of the ChatGPT model to distinguish between specific senses of loanwords in an L2 context. Corpus-based sampling of target English words was used to assess ChatGPT’s ability to delineate the different senses in which regularly occurring loanwords are realised in the Croatian language context. The findings indicate that the model demonstrates notable proficiency in providing definitions in general, albeit with observable flaws when responding to prompts that specifically inquire about the possible senses or word classes of the targeted loanwords in their L2 setting. Its accuracy diminishes with less frequently used loanwords, where it often overgeneralises from English (L1) to Croatian (L2). The model’s tendency to produce erroneous examples, suggesting usages that lack attestation in language corpora, is discussed in detail; the results support the notion that it primarily interprets loanwords from an English perspective, regardless of the language used in the prompt. A comparison between AI responses from early 2024 and early 2025 suggests an improvement in the 2025 model, which handles ambiguous cases with more nuance. However, inconsistencies persist, particularly in how frequency of use correlates with the number of senses, much of which is interpreted as ChatGPT’s tendency to prioritise generating a response at the cost of accuracy.
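For readers who want a concrete picture of the kind of check described above, the following minimal Python sketch (our illustration, not code or data from the study; all function names and sense labels are hypothetical) builds a Croatian-language prompt for a target loanword and flags any senses a model proposes that lack attestation in a corpus sample:

    # Illustrative sketch only: hypothetical helpers, not the authors' code.
    # It mirrors the evaluation idea in the abstract: prompt the model in Croatian
    # for the senses of a loanword, then flag proposed senses without corpus attestation.

    def build_prompt(loanword: str) -> str:
        """Build a Croatian prompt asking the model to list the senses in which
        the loanword is used in Croatian, one sense per line."""
        return (
            f"U kojim se značenjima posuđenica '{loanword}' koristi u hrvatskom "
            "jeziku? Navedi svako značenje u zasebnom retku."
        )

    def flag_overgeneralisation(proposed: set[str], attested: set[str]) -> set[str]:
        """Return model-proposed senses lacking attestation in the corpus sample,
        i.e. candidates for overgeneralisation from English usage."""
        return proposed - attested

    if __name__ == "__main__":
        # Placeholder data: real sense inventories would come from corpus annotation
        # and from parsing the model's reply to build_prompt(...).
        proposed = {"sense attested in Croatian", "English-only sense"}
        attested = {"sense attested in Croatian"}
        print(build_prompt("loanword"))
        print(flag_overgeneralisation(proposed, attested))  # -> {'English-only sense'}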

Published

2025-06-16

Issue

Section

Original scientific paper