What is the Estonian Interlanguage Corpus (EIC)?
The Estonian interlanguage (learner language) corpus of Tallinn University is a collection of written texts produced by learners of Estonian as a second or foreign language (L2). The corpus contains writings from Estonian language proficiency examinations, language course assignments, and texts written by secondary school students participating in the Olympiad of Estonian as L2. The subcorpora also include Russian-language texts written both by native speakers and by Estonian-speaking L2 learners, as well as a reference corpus of argumentative newspaper articles.
EIC features a user interface, a multi-level annotation and tagging scheme, a statistics module, and various language processing tools. The user interface supports flexible text search by subcorpus, textual features (e.g., language, proficiency level, genre), author metadata (e.g., first language, age, gender, social status), and error types.
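To make the idea of searching by several criteria at once more concrete, here is a minimal Python sketch of filtering texts on metadata and error types. It does not reflect the actual EIC software or its data model: the `LearnerText` record, its field names, the `search` function, and the sample records are all hypothetical and invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LearnerText:
    """Hypothetical record for one corpus text (not the real EIC schema)."""
    text_id: str
    subcorpus: str
    language: str
    proficiency_level: str
    genre: str
    first_language: str
    error_types: frozenset = frozenset()


def search(texts, *, error_type=None, **criteria):
    """Return the texts whose fields equal every given criterion.

    `criteria` maps field names to required values; `error_type`, if given,
    must be among the error types annotated on the text.
    """
    results = []
    for t in texts:
        # Skip the text if any metadata field differs from the requested value.
        if any(getattr(t, name) != value for name, value in criteria.items()):
            continue
        # Skip the text if the requested error type is not annotated on it.
        if error_type is not None and error_type not in t.error_types:
            continue
        results.append(t)
    return results


# Made-up sample data, for illustration only.
sample = [
    LearnerText("t1", "exam", "Estonian", "B1", "letter", "Russian",
                frozenset({"case"})),
    LearnerText("t2", "exam", "Estonian", "B2", "essay", "Russian",
                frozenset({"word_order"})),
    LearnerText("t3", "olympiad", "Estonian", "B2", "essay", "Finnish",
                frozenset({"case", "spelling"})),
]

hits = search(sample, proficiency_level="B2", error_type="case")
print([t.text_id for t in hits])  # prints ['t3']
```

Each keyword argument narrows the result set further, which mirrors how combining subcorpus, textual features, author metadata, and error types in a search form would narrow the list of matching texts.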