Centre of Indian Language Data (COIL-D)
COIL-D is a project funded in consortium mode, led by IIT Patna with MIT Manipal and other national partners. The project develops resources for Human Language Technology, including standards, guidelines, benchmarks, and language resources for Machine Translation and NLP tools.
- MIT Manipal contributes parallel corpora for Dravidian languages: Kannada, Tamil, Malayalam, and Telugu.
- The broader language-technology ecosystem connects with open multilingual translation resources such as AI4Bharat's IndicTrans2.
- COIL-D datasets are also visible through the coild-aikosh collection on Hugging Face.