BharatGen Leads India’s Sovereign AI Mission with Multimodal Models for Indian Languages – PIB

Published By: BharatGen
BharatGen Sovereign AI Models initiative is India’s first government-supported effort building foundational AI models across text, speech and vision, tailored to Indian languages and societal needs, as highlighted by PIB.

BharatGen Sovereign AI Models: India’s Government-Backed Multimodal AI Initiative

BharatGen is the first government-supported national initiative to develop a range of sovereign foundational AI models tailored to Indian languages and societal contexts. It spans multiple modalities, including text (via Large Language Models), speech (Text-to-Speech and Automatic Speech Recognition), and vision-language systems.

BharatGen’s AI models currently support 15 Indian languages: Hindi, Assamese, Bengali, Gujarati, Kannada, Maithili, Malayalam, Marathi, Nepali, Oriya, Punjabi, Sanskrit, Sindhi, Tamil and Telugu. Coverage is planned to expand soon to all 22 scheduled Indian languages.

BharatGen has released domain-specific fine-tuned models for Ayurveda (Ayur Param), Indian agriculture (Agri Param) and the Indian legal domain (Legal Param). Beyond these, all BharatGen models (text, speech and vision) can be applied in areas such as healthcare, agriculture, education and governance.

Two Technology Innovation Hubs under the National Mission on Interdisciplinary Cyber-Physical Systems (NM-ICPS) of the Department of Science and Technology (DST) are currently active as part of the BharatGen network: the TIH Foundation for IoT and IoE at IIT Bombay, and IITM Pravartak Technologies Foundation at IIT Madras.

The following institutions are a part of the BharatGen consortium:

Indian Institute of Technology, Bombay: Lead institution, guiding research and integration across consortium partners
International Institute of Information Technology, Hyderabad: Vision-language document modeling
Indian Institute of Technology, Madras: Speech foundation model development and evaluation
Indian Institute of Technology, Kanpur: Legal AI research, domain-specific datasets, and tokenization strategies for multilingual models
Indian Institute of Technology, Hyderabad: Advanced tokenization and vocabulary optimization for large multilingual LLMs
Indian Institute of Technology, Mandi: Inclusive multilingual model development and research on efficient training strategies for LLMs
Indian Institute of Management, Indore: Bharat-centric evaluation and benchmarking of LLMs, and multilingual and multimodal data collection