EMIC
The English-Medium Instruction Corpus (EMIC) is a multimodal and interdisciplinary spoken academic corpus developed at Middle East Technical University (METU), Türkiye. It offers an extensive and principled collection of naturally occurring EMI classroom interactions across a wide range of disciplines and instructional formats.
Built from data collected between 2021 and 2025, EMIC includes transcriptions of more than 90 hours of video-recorded EMI classroom interactions. The speech event types represented in EMIC include lectures, seminars, and sessions in active learning environments such as labs and studios, capturing the diversity of instructional discourse and pedagogical practices in real time.
The corpus currently includes data from more than 30 departments across six major academic disciplines:
• Arts and Humanities
• Design and Architecture
• Educational and Applied Sciences
• Engineering and Technology
• Natural and Life Sciences
• Social Sciences and Management
These categories were defined by cross-referencing course codes, instructional practices, and disciplinary discourse norms, acknowledging the complexity and intersectionality of disciplinary classifications (Işık-Güler, Turan, Şimşek-Tontuş & Köse, 2024).
With its comprehensive scope, EMIC is comparable to other available academic corpora (e.g., BASE, MICASE, ELFA, EmiBO), but it differs from them in its recency and in its distinctive approach to representativeness, interactivity, and multimodality. The EMIC data is transcribed in its entirety using Jeffersonian transcription conventions. A selected portion is currently being annotated to include gestures, spatial movement, and other non-verbal modes (see EMIGeCo). This approach to transcription and annotation enables not only corpus linguistic analyses but also multimodal conversation analysis (CA) of academic communication.
EMIC offers researchers and educators insights into:
• Real-time EMI instructional language and pedagogical discourse
• Disciplinary variations in EMI classroom interaction
• Patterns of student participation and lecturer questioning
• Turn-taking, word count, and interactional density
• Integration of multimodal teaching resources
Through its robust data architecture and interdisciplinary design, EMIC contributes to a deeper understanding of EMI discourse and provides an empirical foundation for pedagogy, policy, and further corpus-based research in multilingual higher education.
For the time being, the corpus is available for in-house access and use only. Please email emi@metu.edu.tr with any questions.