Words Lost and Found
The Diachronic Dynamics of the Arabic Lexicon
Petr Zemánek and Jiří Milička
This volume examines the Arabic lexicon from the point of view of change along a chronological axis. Our approach is both lexical and dynamic, since several of its parts address the dynamic behavior of lexical items and their subsystems. We regard the lexicon not as a mere list of words but as an object that can be viewed from several angles, only one of which concerns words as such. We also observe the composition of the words, especially the phonemes (in our case, consonants) that make up the word stock as a whole. In addition, various samples of words, such as selected parts of speech and morphological classes such as plurals, are taken into consideration.
At the same time, we work with quantitative data. Drawing on a corpus that covers all the historical phases of Arabic (cf. below), we were able to obtain data for specific historical phases of Arabic as used in written records, and we have used this data as the basis of our considerations.
We consider only lexemes with a frequency of at least 20 occurrences in the whole corpus. This threshold is arbitrary, but it takes into account the time span and the number of chronological divisions, so that there is still a chance that the occurrences are not isolated and concentrated in a single work or in titles by a single author.
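The frequency threshold can be illustrated with a minimal sketch. It assumes the corpus has already been lemmatized into a flat sequence of lemma strings; the function name `frequent_lexemes` and the toy data are hypothetical and not part of the study.

```python
from collections import Counter

MIN_FREQUENCY = 20  # threshold stated in the text


def frequent_lexemes(lemmas, min_frequency=MIN_FREQUENCY):
    """Return a dict of lemmas whose corpus-wide count reaches the threshold.

    `lemmas` is assumed to be an iterable of lemma strings covering
    the whole corpus (an assumption made for this illustration).
    """
    counts = Counter(lemmas)
    return {lemma: n for lemma, n in counts.items() if n >= min_frequency}


# Toy data: one lemma reaches the threshold, the other falls just below it.
toy_corpus = ["kitab"] * 20 + ["qalam"] * 19
kept = frequent_lexemes(toy_corpus)
```

Note that such a filter checks only the total count. Whether the occurrences are spread across works and authors is a separate question that a raw count cannot answer.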
The behavior of the sets of lexemes is investigated under the hypothesis that there ought to be some general patterns that make it possible to see the types of change that have occurred in Arabic throughout the ages.
We are interested in the relation between frequency and stability, since frequency is often posited as a factor influencing the survival of a given lexeme in the lexicon. At the same time, the usage of even highly frequent words can vary over time; this will also be one of our points of interest.
The loss of words and the creation of new words are processes that occur in all languages. Arabic is certainly no exception, although in the case of Classical Arabic there are many signs of a markedly conservative character.
The question of which words are characteristic of a given century was also investigated: a keyness metric based on Relative Risk was used to obtain the keywords for each century. The word distribution in each century was compared with the rest of the corpus, and the most overrepresented words were retrieved. For this purpose, we used the Average Reduced Frequency (ARF) of the words instead of the plain word frequency, which was necessary to filter out words that are locally frequent but marginal from a broader point of view, i.e. word types that are prominent in only one or two texts and scarce in the rest of the sample. A detailed description of these metrics can be found in Chapter 7.
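The exact formulation of the metrics is given in Chapter 7. The following sketch only illustrates the general idea, assuming the commonly used definition of ARF (distances between successive occurrences, each capped at the average distance) and Relative Risk as a simple ratio of relative frequencies. The function names, the handling of zero counts, and the toy positions are assumptions made for illustration.

```python
def average_reduced_frequency(positions, corpus_size):
    """ARF of a word, given the token positions of its occurrences.

    Uses the common definition: with f occurrences in N tokens and
    v = N / f, ARF = (1 / v) * sum(min(d_i, v)), where d_i are the
    distances between successive occurrences, taken cyclically.
    """
    f = len(positions)
    if f == 0:
        return 0.0
    pos = sorted(positions)
    v = corpus_size / f
    distances = [pos[i] - pos[i - 1] for i in range(1, f)]
    # Close the cycle: distance from the last occurrence back to the first.
    distances.append(pos[0] + corpus_size - pos[-1])
    return sum(min(d, v) for d in distances) / v


def relative_risk(target_value, target_size, rest_value, rest_size):
    """Ratio of the word's rate in one century to its rate in the rest.

    Returning infinity when the word is absent from the rest of the
    corpus is an assumption of this sketch, not the authors' choice.
    """
    target_rate = target_value / target_size
    rest_rate = rest_value / rest_size
    if rest_rate == 0:
        return float("inf") if target_rate > 0 else 0.0
    return target_rate / rest_rate


# Two words with the same plain frequency (4) in a 100-token sample:
arf_spread = average_reduced_frequency([0, 25, 50, 75], 100)   # evenly spread
arf_clustered = average_reduced_frequency([0, 1, 2, 3], 100)   # one local burst
```

In this toy case the evenly spread word keeps an ARF equal to its plain frequency (4.0), while the clustered word drops to about 1.12. Feeding ARF values rather than raw counts into the ratio is what keeps a word prominent in only one or two texts from ranking as a keyword for a whole century.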
Some of the data we obtained is available in the appendix, namely the words identified as loans, followed by an overview of names identified as Turkic.
Authors’ contribution: Jiří Milička prepared the raw data for the study, designed, implemented and described the method of keyword extraction, prepared the lemmatization and the metric. Petr Zemánek checked the raw data and formulated the vast majority of the text of the study.
We are grateful to a number of students and other co-workers who took part in the preparation of the data, namely Adéla Provazníková, Martina Hainová, and Michal Láznička.