The Challenge: AI That Understands India
That question became the spark for what they’re unveiling today—a framework that could fundamentally change how Indians access legal information.
The Awkward Truth About AI in India
As Pankaj pointed out during our conversations, this gap isn’t just a technical problem; it’s a business and equity problem. While global AI companies optimize for English-speaking users in developed markets, hundreds of millions of Indians remain locked out of AI-powered solutions simply because their language, their legal system, and their problems aren’t part of the training data.
But here’s the thing: we can’t wait for perfect models to emerge organically. Citizens are getting wronged today. People are being scammed right now. And most of them have no idea what the law says about it.





Enter: InstructLab
What if there was a way to take an existing AI model and make it genuinely good at understanding Indian legal scenarios—without needing a team of 50 PhDs?
That’s the bet they’re making with InstructLab, an open-source framework developed collaboratively by IBM Research, IIT Bombay, and BharatGen. The core idea is elegant: democratize model alignment.
Watch It Work: Real Scenarios
During today’s presentation, Pankaj Singh walked us through live demonstrations of what this actually looks like in practice. His live narration highlighted not just the technical capability, but the human implications—how each improvement in the model translates into real empowerment for Indian citizens. Amit provided the technical deep-dives, explaining the architecture and the reasoning behind design choices.
Scenario 1: Property Damage
Scenario 2: The Legal Loophole Nobody Knows About
Here’s where it gets interesting. The model surfaced a clause that most Indians—including lawyers—don’t immediately recall: a child under seven years of age cannot be prosecuted for any crime, regardless of the act committed.
That’s not just information. That’s potentially life-changing legal knowledge, served instantly in plain language.
As Pankaj pointed out during the demo, this is exactly what breaks traditional legal gatekeeping—it empowers citizens with knowledge they would otherwise never discover, no matter how many legal forums they browse or how much they’re willing to pay for a lawyer’s initial consultation.
Amit added the technical context: “This nuance emerges because we’ve trained the model specifically on Indian legal scenarios, with domain experts validating edge cases. It’s not magic—it’s careful data work and alignment.”
Scenario 3: Breaking the Language Barrier
They asked the same query in Hindi: “Mere car ka sheesha kisi ne toda.” (“Someone broke my car’s window.”)
And the system responded—in Hindi—with all relevant clauses also in Hindi.
Think about what this means. Legal access in India has historically been gatekept by English proficiency. With this framework, a farmer in Maharashtra, a shopkeeper in Tamil Nadu, or a homemaker in Odisha can understand what the law says about their specific situation, in their own language.
This multilingual capability wasn’t accidental—it was a deliberate focus. Pankaj championed this from a user accessibility standpoint: “If we’re building this for India, it has to work in Indian languages. Period.” Amit and the team then solved the technical challenges of making synthetic data generation work across Indic languages—a problem that’s far harder than it sounds.
As Pankaj noted, this single feature potentially unlocks legal access for hundreds of millions of Indians who’ve never had it before.
Scenario 4: The WhatsApp Scam (The One That Started It All)
This is the situation Amit’s mother faced: someone posing as her sister-in-law claimed to be in crisis and asked for money over WhatsApp. The scam is so common in India that it’s almost ubiquitous.
The model pulled relevant sections from the Bharatiya Nyaya Samhita on impersonation and fraud. Yes, there’s more in the IT Act. Yes, this isn’t the complete legal picture. But for a citizen wanting to understand what just happened and what the law says about it, this is transformative.
How It Actually Works
InstructLab has two main components—one that Amit’s team architected, and one that Pankaj ensured would be usable by organizations beyond IBM:
1. Synthetic Data Generation Pipeline
This is where domain expertise meets machine learning. Instead of manually labeling thousands of examples, the system generates synthetic training data aligned with Indian legal scenarios, Indic languages, and cultural contexts. Amit’s team has been continuously improving this for languages like Hindi, Tamil, Telugu, and others—with Pankaj pushing for benchmarks that prove it actually works in production.
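To make the idea of seed-driven synthetic data generation concrete, here is a minimal sketch in Python. It is not InstructLab's actual API: the `SeedExample` class, both function names, the prompt wording, and the `teacher` callable are all hypothetical, and the answer text is a placeholder rather than real legal content. It only illustrates the general pattern of turning a few expert-written examples per language into a prompt that a larger teacher model could expand into many more.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SeedExample:
    """One hand-written question/answer pair from a domain expert (hypothetical type)."""
    language: str
    question: str
    answer: str


def build_generation_prompt(seeds: List[SeedExample], language: str, n_new: int) -> str:
    """Assemble a few-shot prompt asking a teacher model for n_new examples."""
    shots = [s for s in seeds if s.language == language]
    if not shots:
        raise ValueError(f"no seed examples for language {language!r}")
    lines = [
        f"Write {n_new} new question/answer pairs about Indian law in {language}.",
        "Follow the style of these examples:",
    ]
    for s in shots:
        lines.append(f"Q: {s.question}")
        lines.append(f"A: {s.answer}")
    return "\n".join(lines)


def generate_synthetic(
    seeds: List[SeedExample],
    languages: List[str],
    n_new: int,
    teacher: Callable[[str], str],
) -> Dict[str, str]:
    """Call the teacher once per language and collect its raw output."""
    return {
        lang: teacher(build_generation_prompt(seeds, lang, n_new))
        for lang in languages
    }


if __name__ == "__main__":
    seeds = [
        SeedExample(
            language="Hindi",
            question="Mere car ka sheesha kisi ne toda.",
            answer="(expert-written answer goes here)",  # placeholder, not legal text
        )
    ]
    # Stand-in teacher that just echoes the prompt; a real one would call a model.
    result = generate_synthetic(seeds, ["Hindi"], n_new=3, teacher=lambda p: p)
    print(result["Hindi"])
```

In a real pipeline the teacher's output would still need parsing, filtering, and expert review before being used for training; the sketch stops at prompt construction on purpose.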
2. Alignment Pipeline
Why This Matters Beyond Law
They picked legal access as their initial use case because it’s urgent and universally relatable. But the framework extends far beyond law.
Imagine aligned models for:
- Healthcare guidance in Indian languages
- Agricultural advisory tailored to regional climates
- Financial literacy for underbanked populations
- Tax and compliance guidance for small businesses
The bottleneck was never “can we build these?” It was always “can we make this accessible to teams that don’t have massive budgets?” InstructLab answers that question.
This is where Pankaj’s vision for scalability and Amit’s technical roadmap converge: building infrastructure, not just proof-of-concepts. The goal is to make InstructLab the default way teams in India align AI models for their specific contexts.
This Is Just the Beginning
Here’s what they’re not claiming: this is perfect. Legal guidance from an AI model should never replace actual legal counsel. Their demo came with that disclaimer, and it bears repeating.
What they are saying is that this makes it far easier for Indians to understand their legal rights, in their language, without having to navigate the labyrinth of finding and affording a lawyer for basic questions.
And critically—they’re open-sourcing this. The community can contribute. Organizations can build on it. The framework gets better collectively, not just at BharatGen or IBM or IIT Bombay, but across the entire Indian tech ecosystem.
Pankaj’s final point on this was powerful: “Open-sourcing isn’t altruism. It’s pragmatism. The best ideas will come from communities we haven’t even thought of yet. Our job is to build the foundation and get out of the way.”
What's Next?
Because the real value isn’t in what BharatGen, IBM, or IIT Bombay builds alone.
It’s in what we build together.
The conclave reinforced a clear national consensus. (Image; source: PM Modi, LinkedIn)


