India sits at a rare crossroads of linguistic diversity and rapid technology adoption. This combination is reshaping how artificial intelligence is developed, deployed, and scaled worldwide. With a population of more than 1.4 billion and a multitude of languages and dialects, the Indian linguistic landscape is both a challenge and an opportunity for AI researchers seeking to build human-centric, accessible AI systems.

The Multilingual Terrain: A Complex Dataset

The most striking feature of India's linguistic diversity is its sheer scale: the country officially recognises 22 languages, alongside many more regional and tribal languages. Language surveys estimate that more than 400 distinct languages are spoken on the subcontinent, each with its own syntax, writing system, and phonetic structure.

This numerical richness is not matched by digital presence, and Indian languages still make up a very small share of global AI training data. Text published online in Indic languages accounts for only about 0.1% of web content, while English accounts for almost 59%. This imbalance has direct consequences for AI: models trained mainly on English corpora tend to perform poorly in low-resource…

Inc42 Media