The BharatGen sovereign multilingual AI initiative is gaining momentum. On DD News, Prof. Arnab Bhattacharya explains how India is building an indigenous, multilingual large language model rooted in Indian languages, legal systems and cultural knowledge.
India’s AI story is entering a decisive phase. The question is no longer whether India will use artificial intelligence. The real question is whether India will build it on its own terms.
In a recent DD News interview, Prof. Arnab Bhattacharya of IIT Kanpur shared the vision behind BharatGen, a multilingual, India-centric large language model designed for the country’s linguistic diversity, legal systems, cultural heritage, and knowledge traditions.
BharatGen is not positioned as just another AI model. It is being built as a foundational ecosystem for India’s digital future.
A Multilingual Model for a Multilingual Nation
India is home to 22 scheduled languages and hundreds of dialects. Most global AI systems struggle with this level of diversity. BharatGen takes this challenge head-on.
The model began with Hindi and English. It is now available in 15 Indian languages and is expanding toward all 22 scheduled languages. Work is also progressing on tribal languages such as Santali and on languages outside the scheduled list.
The approach is strategic. Indian languages share deep structural similarities. By leveraging advances in computational linguistics, BharatGen can scale across languages without starting from scratch each time.
The long-term goal is simple and powerful:
Every Indian should be able to interact with AI in their own language.
Built on India-Centric Data
Most global AI models are trained primarily on Western datasets. BharatGen is different.
Its training data includes:
- Indian literature
- Government documents
- Legal records
- Newspaper archives
- OCR-digitized heritage content
This India-first data foundation is meant to help the model understand Indian contexts, laws, governance systems, and cultural references more accurately than models trained mainly on Western data.
This is not just about translation. It is about contextual intelligence.
Transforming the Legal Ecosystem
One of the most compelling applications discussed was in the legal domain.
India follows a precedent-based legal system. Courts face massive case backlogs. Legal language is complex and inaccessible to many citizens.
BharatGen can support:
- Citizens by simplifying legal language
- Judges by summarizing case documents and extracting key points
- Lawyers by identifying relevant past precedents quickly
Importantly, the model does not replace human judgment. Decisions remain with judges. AI becomes an assistive intelligence layer that improves efficiency and reduces pendency.
This is where AI moves from hype to real public value.
Alt: AI Impact Summit: A special conversation with BharatGen's Prof. Arnab Bhattacharya | AI Startups in India
Integrating the Indian Knowledge System
Another defining feature of BharatGen is the integration of India’s knowledge heritage.
Prof. Bhattacharya highlighted examples that reflect how generative thinking has long existed in Indian intellectual traditions:
- Panini’s Ashtadhyayi as a rule-based generative grammar system
- Baudhayana Sulba Sutra describing geometric principles
- The Virahanka series, known globally as the Fibonacci sequence
- Historical texts describing early mechanical concepts
Many global AI systems do not adequately represent these contributions. BharatGen aims to ensure that India’s knowledge systems are preserved, referenced, and accessible in the AI age.
This is not about nostalgia. It is about intellectual balance.
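To make the Virahanka example concrete, here is a minimal Python sketch of the counting problem from Sanskrit prosody that the series is traditionally associated with: how many metres of a given total length can be built from short syllables (1 beat) and long syllables (2 beats). The function names `virahanka_count` and `metres` are illustrative choices for this article and have nothing to do with BharatGen's code.

```python
def virahanka_count(beats: int) -> int:
    """Number of metres of total length `beats` using short (1) and long (2) syllables."""
    if beats < 0:
        raise ValueError("beats must be non-negative")
    a, b = 1, 1  # counts for lengths 0 and 1
    for _ in range(beats):
        # A metre either starts with a short syllable (length - 1 remains)
        # or with a long syllable (length - 2 remains).
        a, b = b, a + b
    return a


def metres(beats: int) -> list[str]:
    """List every metre of the given length; 'S' = short, 'L' = long."""
    if beats == 0:
        return [""]
    if beats == 1:
        return ["S"]
    return ["S" + m for m in metres(beats - 1)] + ["L" + m for m in metres(beats - 2)]


if __name__ == "__main__":
    print([virahanka_count(n) for n in range(1, 7)])
    print(metres(4))
```

The counts for lengths 1 to 6 come out as 1, 2, 3, 5, 8, 13, which is the sequence later known in Europe as the Fibonacci numbers.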

Sovereign AI: Control Over Data and Infrastructure
A central theme of the interview was Sovereign AI.
Sovereignty in AI means:
- Data remains under national control
- Infrastructure and servers are managed within India
- Research and model development are indigenous
- The ecosystem includes academia, industry, startups, and government
India’s population of 140 crore (1.4 billion) represents not just scale, but strength. The diversity of language, knowledge, and lived experiences provides one of the richest AI training environments in the world.
However, without ownership and infrastructure control, that strength cannot translate into leadership.
BharatGen is an attempt to build that foundation.
Preserving Oral and Folk Traditions
India’s cultural wealth is not limited to written texts. Many traditions are oral. Folk knowledge in villages often remains undocumented.
Through speech models and field recordings, BharatGen can help:
- Capture endangered languages
- Preserve tribal dialects
- Digitally archive oral heritage
This expands the idea of AI beyond productivity tools into cultural preservation.
AI becomes not just a technology layer, but a memory layer for the nation.
An Ecosystem, Not Just a Model
BharatGen did not emerge from a single institution. It began with academicians but is now envisioned as a national ecosystem.
Engineers, researchers, data annotators, startups, media, students, and policymakers all have roles to play.
The message from the interview was clear:
If India does not build its own AI systems, others will build them for us. And those systems may not reflect India’s priorities.
BharatGen is therefore both a technological initiative and a collective responsibility.
India’s Moment in AI
Global AI development is accelerating. Countries are investing heavily in models that reflect their economic and strategic priorities.
For India, the opportunity is unique.
With linguistic diversity, demographic scale, deep civilizational knowledge, and a strong technology base, the building blocks are already present.
BharatGen represents a step toward ensuring that India’s AI future is not imported, but built.
The journey is still unfolding. But the direction is clear.
India is not just adopting AI. India is shaping it.


