The Surrey Lexical Splits Project

Many languages express different grammatical meanings through changes in word form, a phenomenon known as inflection. Often this is perfectly transparent: adding -s to a verb like ‘talk’ gives the third person singular present tense (s/he talks), and adding -ed gives the past tense (s/he talked). But sometimes the process is more involved, as with ‘go’, whose past tense went has an entirely different form. We say that such a word is “split” between two distinct stems. Splits can involve much more than different forms, however. Russian verbs, for instance, display a present–past split in the categories of meaning that the forms represent: the present tense marks person, while the past tense marks gender, so that verbs are split in their use of the person and gender features. A lexical split, then, is any inconsistency in the paradigm of an inflected word. The Lexical Splits project, funded by the Arts and Humanities Research Council (AHRC), is a cross-linguistic investigation of such splits.

As part of this project, we developed the following open-access resources: (1) a cross-linguistic database of lexical splits; (2) interactive visualisations of the verb paradigms of Chichimec (Otomanguean, Mexico); and (3) interactive visualisations of the verb paradigms of Skolt Saami (Finno-Ugric, Finland).

Surrey Lexical Splits Database

Chichimec verb paradigm visualisations

Skolt Saami verb paradigm visualisations

The Surrey Lexical Splits Database presents examples of lexical splits from around the world. Splits are represented at three levels of abstraction: (i) a surface realisation is a concrete example of a lexical split encountered in a language, always presented together with an example lexeme; (ii) a component split is an abstract representation of the factor, or factors, that contribute to one or more surface realisations; (iii) a shared pattern identifies component splits that affect an identical set of cells. Viewing splits in this way allows users to explore the diverse ways in which contributing factors interact to produce different surface realisations. A detailed description of how to use and understand the database is provided on the database homepage.
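The three levels can be pictured as a small data model. The following is only a minimal sketch: the class names, field names, cell labels and example records are all hypothetical and invented for illustration, and they do not reflect the database's actual schema or contents.

```python
from collections import defaultdict
from dataclasses import dataclass

# All names and records here are hypothetical illustrations,
# not the real schema or data of the Surrey Lexical Splits Database.

@dataclass(frozen=True)
class ComponentSplit:
    factor: str        # the contributing factor, e.g. "stem alternation"
    cells: frozenset   # the paradigm cells this factor sets apart

@dataclass(frozen=True)
class SurfaceRealisation:
    lexeme: str        # the example lexeme shown with the record
    components: tuple  # the ComponentSplit objects that contribute to it

def shared_patterns(components):
    """Group component splits that affect an identical set of cells."""
    groups = defaultdict(list)
    for component in components:
        groups[component.cells].append(component)
    # Only sets of cells shared by two or more component splits count.
    return {cells: members for cells, members in groups.items()
            if len(members) > 1}

# Invented records for an imaginary language.
stem = ComponentSplit("stem alternation", frozenset({"1sg.past", "3sg.past"}))
gender = ComponentSplit("gender marking", frozenset({"3sg.past", "1sg.past"}))
tone = ComponentSplit("tone change", frozenset({"3sg.present"}))

example = SurfaceRealisation("imaginary-verb", (stem, gender, tone))
patterns = shared_patterns(example.components)

for cells, members in patterns.items():
    print(sorted(cells), [m.factor for m in members])
```

In this toy example the stem alternation and the gender marking single out the same two past-tense cells, so they fall under one shared pattern, while the tone change stands alone.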

The value of breaking down lexical splits into component parts is vividly illustrated on our two interactive websites, which present visualisations of a diverse sample of lexical splits from Chichimec and Skolt Saami. Our understanding of inflectional paradigms is all too often constrained by the medium in which they appear (e.g. static, black-and-white tables in journal articles), which can make the data difficult to interpret or, worse still, obscure interesting patterns and observations. Our visualisations of Chichimec and Skolt Saami verbs illustrate how technology can aid us in viewing complex linguistic data. Rows and columns can be rearranged by dragging and dropping, and the whole paradigm can be transposed with a single click. To aid in finding patterns, cells can be grouped, coloured in, and highlighted in various ways. For complex data, which we find in abundance in Chichimec and Skolt Saami, individual layers of analysis can be viewed independently of the rest by sliding them in and out. This allows researchers and speakers to build up a picture of the contribution each layer makes to the overall complexity of the paradigm, and to appreciate the elegance underlying these complex data.
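The rearranging and transposing operations described above are simple to state on a paradigm held as a table. The sketch below is an assumption-laden illustration, not the code behind the visualisations: the `Paradigm` layout and the function names `transpose` and `move_row` are hypothetical, and the data is the English verb ‘talk’ from the introduction rather than Chichimec or Skolt Saami.

```python
from typing import List, Tuple

# Hypothetical layout: (row labels, column labels, cells as a list of rows).
Paradigm = Tuple[List[str], List[str], List[List[str]]]

def transpose(paradigm: Paradigm) -> Paradigm:
    """Swap rows and columns, as a single 'transpose' click would."""
    rows, cols, cells = paradigm
    return cols, rows, [list(column) for column in zip(*cells)]

def move_row(paradigm: Paradigm, src: int, dst: int) -> Paradigm:
    """Move one row to a new position, as dragging and dropping would."""
    rows, cols, cells = paradigm
    rows, cells = rows[:], cells[:]   # copy so the input is left unchanged
    rows.insert(dst, rows.pop(src))
    cells.insert(dst, cells.pop(src))
    return rows, cols, cells

talk: Paradigm = (
    ["1sg", "3sg"],
    ["present", "past"],
    [["talk", "talked"],
     ["talks", "talked"]],
)

flipped = transpose(talk)     # tenses become rows, persons become columns
reordered = move_row(talk, 1, 0)   # 3sg row moved above 1sg

print(flipped)
print(reordered)
```

Keeping each operation as a function that returns a new table, rather than changing the table in place, makes it easy to undo a rearrangement or to compare two arrangements side by side.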