Code-Switching in Tunisian Arabic: A Multifactorial Random Forest Analysis
Published in Corpora, 2023
Recommended citation: Ben Youssef, Chadi & Stefan Th. Gries. (to appear). Code-switching in Tunisian Arabic: A multifactorial random forest analysis. Corpora. http://cbyoussef.github.io/files/2023_CBY-STG_CSinTunAr_Corpora.pdf
Abstract
This paper explores the morphosyntactic and cognitive principles that influence code-switching (CS) from Tunisian Arabic to French. We annotate data from the TuniCo corpus for many variables and fit a Random Forest to overcome the methodological challenges typically associated with low-resource languages and imbalanced data. We find that CS is not affected by any single factor in isolation but by a constellation of interactions among factors. Our results partially confirm previous findings: (i) to maintain code integrity at the phrase and discourse levels, speakers tend to switch dependent parts of speech when their head is switched; (ii) NPs are a prime location for CS; and (iii) speakers are attuned to the cognitive load they impose on themselves and/or on listeners.