Demystifying Legal Access: How AI Can Empower Every Indian Citizen

Published By: BharatGen

The Challenge: AI That Understands India

Today, we’re thrilled to share a groundbreaking initiative that brings together two visionary leaders from different corners of India’s AI ecosystem.
Amit Singhee, Director of IBM Research, brings deep technical expertise in generative AI and a relentless focus on making AI work for India’s unique context. His curiosity about real-world problems—like what the law says when you’re scammed—drives the innovation behind InstructLab.
Pankaj Singh, VP of Business and Data at BharatGen, is the pragmatist who ensures brilliant ideas don’t stay confined to research papers. With a sharp eye for scalability, user adoption, and business impact, Pankaj has been instrumental in translating technical innovation into a framework that actual organizations can use. His perspective on how AI solutions reach the people who need them most has shaped every aspect of this project.
Together with teams from IBM Research, IIT Bombay, and BharatGen, they’ve spent the last six months building something that could fundamentally change how Indians access legal information.
Three days before presenting at this year’s AI Alliance summit, Amit got a panicked message. Someone claiming to be his mother’s sister-in-law had texted his mother on WhatsApp, saying she was in trouble and asking for money urgently. It was a scam, but it got Amit thinking: What would most Indians do in this situation? And more importantly, what does the law actually say about it?

That question became the spark for the framework they’re unveiling today.

The Awkward Truth About AI in India

Let’s be honest. If you’ve used an AI model trained primarily on Western data to understand Indian legal nuances, you know the feeling. It’s like asking someone who’s never been to India to explain local property laws. They’ll give you something, but it’ll be riddled with irrelevant clauses, cultural misunderstandings, and frankly, noise.

As Pankaj pointed out during our conversations, this gap isn’t just a technical problem; it’s a business and equity problem. While global AI companies optimize for English-speaking users in developed markets, hundreds of millions of Indians remain locked out of AI-powered solutions simply because their language, their legal system, and their problems aren’t part of the training data.

But here’s the thing: we can’t wait for perfect models to emerge organically. Citizens are getting wronged today. People are being scammed right now. And most of them have no idea what the law says about it.


Enter: InstructLab

What if there was a way to take an existing AI model and make it genuinely good at understanding Indian legal scenarios—without needing a team of 50 PhDs?

That’s the bet they’re making with InstructLab, an open-source framework developed collaboratively by IBM Research, IIT Bombay, and BharatGen. The core idea is elegant: democratize model alignment.

Amit’s insight was the technical foundation: how do we make model alignment accessible? Pankaj’s insight was equally crucial: how do we ensure this actually gets built and used in the real world?
Instead of organizations hoarding the ability to fine-tune AI models, they’re building tools that let anyone—domain experts, community organizations, even solo entrepreneurs—take a base model, feed it domain-specific data, and create something genuinely specialized.

Watch It Work: Real Scenarios

During today’s presentation, Pankaj Singh walked us through live demonstrations of what this actually looks like in practice. His live narration highlighted not just the technical capability, but the human implications—how each improvement in the model translates into real empowerment for Indian citizens. Amit provided the technical deep-dives, explaining the architecture and the reasoning behind design choices.

Scenario 1: Property Damage

Imagine neighbors from your building deliberately damage your car. You want to know: What does the law say?
When they fed this query to the aligned model, it returned ranked, relevant clauses from the Bharatiya Nyaya Samhita. Each result was contextually appropriate—no noise, no irrelevant offences.
Compare that to the unaligned model. Same query. Different result. It pulled clauses about “offences affecting the human body” and religious damages—information that’s technically in the legal system but completely irrelevant to your situation.
This is where Pankaj’s observation became sharp: “Look at what just happened. The unaligned model gives you information, but it doesn’t help you. It’s noise. An ordinary citizen would give up. With the aligned model, they have answers in seconds.” That’s the problem they’re solving.
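
To make "ranked, relevant clauses" concrete, here is a minimal sketch of how a retriever scores and orders candidate clauses against a citizen's query. It is an illustration under stated assumptions, not the demo's implementation: the clause texts and ids are hypothetical placeholders (not actual Bharatiya Nyaya Samhita text), and a simple word-count vector stands in for the learned embeddings an aligned model would produce.

```python
import math
from collections import Counter

# Hypothetical clause summaries for illustration only. These are NOT actual
# text or section numbers from the Bharatiya Nyaya Samhita.
CLAUSES = {
    "clause_a": "intentionally causing damage to property belonging to another person",
    "clause_b": "causing hurt to the body of another person",
    "clause_c": "damage to a place of worship",
}


def embed(text):
    """Stand-in embedding: a bag-of-words count vector (assumption, not a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_clauses(query, clauses, top_k=3):
    """Return (clause_id, score) pairs sorted from most to least similar."""
    q = embed(query)
    scored = [(cid, cosine(q, embed(text))) for cid, text in clauses.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


results = rank_clauses("neighbors deliberately damage my car property", CLAUSES)
for clause_id, score in results:
    print(f"{clause_id}: {score:.3f}")
```

In this toy setup the property-damage clause ranks first and the bodily-hurt clause scores zero. With a real retriever, alignment changes the embeddings so that this kind of ordering holds for queries phrased in everyday language, not just ones that share keywords with the statute.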

Scenario 2: The Legal Loophole Nobody Knows About

Then they tweaked the scenario. What if it wasn’t deliberate? What if children playing nearby broke your car’s glass?

Here’s where it gets interesting. The model surfaced a clause that most Indians—including lawyers—don’t immediately recall: a child under seven years of age cannot be prosecuted for any crime, regardless of the act committed.

That’s not just information. That’s potentially life-changing legal knowledge, served instantly in plain language.

As Pankaj pointed out during the demo, this is exactly what breaks traditional legal gatekeeping—it empowers citizens with knowledge they would otherwise never discover, no matter how many legal forums they browse or how much they’re willing to pay for a lawyer’s initial consultation.

Amit added the technical context: “This nuance emerges because we’ve trained the model specifically on Indian legal scenarios, with domain experts validating edge cases. It’s not magic—it’s careful data work and alignment.”

Scenario 3: Breaking the Language Barrier

They asked the same query in Hindi: “Mere car ka sheesha kisi ne toda.” (“Someone broke my car’s glass.”)

And the system responded—in Hindi—with all relevant clauses also in Hindi.

Think about what this means. Legal access in India has historically been gatekept by English proficiency. With this framework, a farmer in Maharashtra, a shopkeeper in Tamil Nadu, or a homemaker in Odisha can understand what the law says about their specific situation, in their own language.

This multilingual capability wasn’t accidental—it was a deliberate focus. Pankaj championed this from a user accessibility standpoint: “If we’re building this for India, it has to work in Indian languages. Period.” Amit and the team then solved the technical challenges of making synthetic data generation work across Indic languages—a problem that’s far harder than it sounds.

As Pankaj noted, this single feature potentially unlocks legal access for hundreds of millions of Indians who’ve never had it before.

Scenario 4: The WhatsApp Scam (The One That Started It All)

Amit’s mother’s situation: someone pretending to be her sister-in-law in crisis, asking for money via WhatsApp. It’s a scam that has become almost ubiquitous in India.

The model pulled relevant sections from the Bharatiya Nyaya Samhita on impersonation and fraud. Yes, there’s more in the IT Act. Yes, this isn’t the complete legal picture. But for a citizen wanting to understand what just happened and what the law says about it, this is transformative.

Pankaj framed it perfectly: “Three days ago, this happened to Amit’s mother. Tomorrow, it’ll happen to someone watching this presentation. The day after, to someone else. This tool doesn’t replace a lawyer, but it gives ordinary people a fighting chance to understand what happened to them.”

How It Actually Works

InstructLab has two main components—one that Amit’s team architected, and one that Pankaj ensured would be usable by organizations beyond IBM:

1. Synthetic Data Generation Pipeline

This is where domain expertise meets machine learning. Instead of manually labeling thousands of examples, the system generates synthetic training data aligned with Indian legal scenarios, Indic languages, and cultural contexts. Amit’s team has been continuously improving this for languages like Hindi, Tamil, Telugu, and others—with Pankaj pushing for benchmarks that prove it actually works in production.
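
As a rough illustration of the idea, the sketch below expands a few expert-written seeds into many (query, relevant clause) training pairs. Everything in it is an assumption made for illustration: the clause ids, scenario fragments, and templates are hypothetical, and simple string templates stand in for whatever generator the actual pipeline uses. It does not use InstructLab's real interfaces.

```python
import itertools
import random

# Hypothetical seed data written by a domain expert. Clause ids and scenario
# fragments are placeholders, not real legal references.
SEEDS = {
    "clause_property_damage": {
        "actors": ["my neighbor", "a stranger"],
        "events": ["scratched my car on purpose", "broke my shop window"],
    },
    "clause_impersonation": {
        "actors": ["someone posing as a relative", "a caller claiming to be my bank"],
        "events": ["asked me for money on WhatsApp", "asked for my account details"],
    },
}

# Hypothetical question templates; a real pipeline would likely use a
# generative model rather than fixed templates.
TEMPLATES = [
    "{actor} {event}. What does the law say?",
    "What can I do if {actor} {event}?",
]


def generate_pairs(seeds, templates, seed=0):
    """Expand seeds into shuffled (query, positive clause id) training pairs."""
    rng = random.Random(seed)
    pairs = []
    for clause_id, parts in seeds.items():
        combos = itertools.product(parts["actors"], parts["events"], templates)
        for actor, event, template in combos:
            query = template.format(actor=actor, event=event)
            query = query[0].upper() + query[1:]  # capitalize the first letter
            pairs.append({"query": query, "positive": clause_id})
    rng.shuffle(pairs)
    return pairs


pairs = generate_pairs(SEEDS, TEMPLATES)
print(len(pairs), "synthetic pairs, e.g.", pairs[0])
```

The point of the sketch is the leverage: two seeds with a handful of fragments each yield sixteen labeled pairs, so the expert's time goes into writing and validating seeds rather than labeling every example by hand.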

2. Alignment Pipeline

Once you have quality training data, this pipeline tunes your base model. They started with EM25, a capable multilingual retriever, but the framework works with any base model. The result: a model that understands Indian legal contexts.
Pankaj’s contribution here was critical: ensuring the alignment pipeline is simple enough that teams without PhDs can use it, yet powerful enough to deliver production-grade results.
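
For readers curious what "tuning" a retriever optimizes, here is a small sketch of one common objective: a contrastive loss with in-batch negatives. The post does not say which objective the alignment pipeline uses, so treat this as an assumption chosen for illustration. The function name and the toy two-dimensional vectors are hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(query_vecs, clause_vecs, temperature=0.1):
    """Mean contrastive loss where clause_vecs[i] is the positive for query_vecs[i].

    Every other clause in the batch acts as a negative for that query.
    """
    total = 0.0
    for i, q in enumerate(query_vecs):
        logits = [dot(q, c) / temperature for c in clause_vecs]
        # Numerically stable log-sum-exp over all clauses in the batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching clause as the correct class.
        total += log_sum_exp - logits[i]
    return total / len(query_vecs)


clauses = [[1.0, 0.0], [0.0, 1.0]]
aligned_queries = [[1.0, 0.0], [0.0, 1.0]]     # each query points at its own clause
misaligned_queries = [[0.0, 1.0], [1.0, 0.0]]  # each query points at the wrong clause

loss_aligned = info_nce_loss(aligned_queries, clauses)
loss_misaligned = info_nce_loss(misaligned_queries, clauses)
print(f"aligned: {loss_aligned:.5f}, misaligned: {loss_misaligned:.5f}")
```

The loss is near zero when each query embedding sits closest to its correct clause and large when it sits closest to a wrong one. Training on pairs like those from the synthetic data step pushes the model toward the first situation.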

Why This Matters Beyond Law

They picked legal access as their initial use case because it’s urgent and universally relatable. But the framework extends far beyond law.

Imagine aligned models for healthcare, finance, agriculture, or any other domain where Indian context matters.

The bottleneck was never “can we build these?” It was always “can we make it accessible to teams that don’t have massive budgets?”

InstructLab answers that question.

This is where Pankaj’s vision for scalability and Amit’s technical roadmap converge: building infrastructure, not just proofs of concept. The goal is to make InstructLab the default way teams in India align AI models for their specific contexts.

This Is Just the Beginning

Here’s what they’re not claiming: that this is perfect. Legal guidance from an AI model should never replace actual legal counsel. Their demo came with that disclaimer, and it bears repeating.

What they are saying is that this makes it far easier for Indians to understand their legal rights, in their language, without having to navigate the labyrinth of finding and affording a lawyer for basic questions.

And critically—they’re open-sourcing this. The community can contribute. Organizations can build on it. The framework gets better collectively, not just at BharatGen or IBM or IIT Bombay, but across the entire Indian tech ecosystem.

Pankaj’s final point on this was powerful: “Open-sourcing isn’t altruism. It’s pragmatism. The best ideas will come from communities we haven’t even thought of yet. Our job is to build the foundation and get out of the way.”

What's Next?

They’re opening InstructLab for the community. If you’re building in the legal space, healthcare, finance, agriculture, or any domain where Indian context matters—they want to hear from you.
The framework is there. The synthetic data pipeline is there. What they need now is an ecosystem of builders, domain experts, and organizations who see this not as their project, but as our collective infrastructure.

Because the real value isn’t in what BharatGen, IBM, or IIT Bombay builds alone.

It’s in what we build together.

