Introducing AgriParam-1: India’s First Agriculture-Focused Instruction-Tuned Foundational LLM

Published By: BharatGen

At BharatGen, we’re proud to unveil Param-1 (2.9B) an instruction-tuned foundational language model purpose-built for India’s agriculture domain. This model isn’t just a technical achievement, it’s a step toward democratizing AI for Bharat’s most foundational sector: Agriculture.

Architecture: A Lean Model with a Purpose

Param-1 is a 2.9B parameter decoder-only transformer model hosted on Hugging Face. Unlike general-purpose models, Param-1 has been fine-tuned from the ground up to understand, process, and generate agriculture-specific content tailored for Indian contexts, in both English and Hindi.

➡️ Model Card & Weights

Data Preparation: Grounding the Model in Indian Agriculture

We knew that building a truly useful model meant building the right data pipeline first. Here’s how we approached it:

17,000 open-source agricultural news articles and knowledge passages were scrapped.
Each passage was annotated with 5 diverse questions using an open-source LLM.
We constructed a comprehensive India-centric taxonomy spanning crops, weather, soil, pests, fertilizers, market conditions, and government schemes.
Created agricultural personas such as smallholder farmers, agri-extension officers, policymakers, etc.
From this framework, we generated 2 million QnA pairs, each grounded in taxonomy and persona contexts.
All content was translated into Hindi, ensuring bilingual robustness.
Finally, we generated 6 million multi-turn conversations, simulating real agri-dialogues in rural and advisory settings.

The result? A massively domain-aligned dataset that captures not just facts, but also the intent and context of agricultural communication in India.

Training at Scale: From Raw Data to Instruction Finesse

Training Param-1 was a large-scale, precision-engineered process:

Prompt Template: We designed a custom instruction prompt template optimized for multi-turn, context-rich inference.
Training Framework: Leveraged Hugging Face with torchrun for multi-node distributed training.
Training Samples: 12 million samples
Test Set: 1.2 million examples
Epochs: 3
Scheduler: Linear with warmup
Learning Rate:
- Base: 5e-6
- Min: 5e-7
Batch Configuration:
- Global Batch Size: 1024
- Micro Batch Size: 4
- Gradient Accumulation: 32
Tokens Added: <user>, <assistant>, <context>, <system_prompt> to enable clean instruction formatting
Vocabulary Size: 256k + 4

This setup ensured efficient convergence, high instruction alignment, and strong multi-turn memory essential for the conversational use cases we’re targeting.

Evaluation Strategy: Beyond BLEU and Accuracy

Standard LLM benchmarks often miss the mark for specialized domains. That’s why we’re evaluating Param-1 on two complementary tracks:

BhashaBench-Krishi (BBK): The Agricultural AI Gold Standard

We evaluate AgriParam-1 on BhashaBench-Krishi, India’s most comprehensive agricultural knowledge benchmark. Unlike generic benchmarks testing factual recall, BBK measures an AI’s ability to provide actionable, localized agricultural advice that Indian farmers actually need.

BBK’s Unique Value:

Authentic Content: 15,405 questions from 55+ real government agricultural exams (NABARD, ICAR, MP RAEO to PhD entrance)
Domain Depth: 25+ agricultural domains, 270+ topics covering soil science to government schemes
Bilingual Testing: English (12,648) + Hindi (2,757) questions matching AgriParam-1’s capabilities
Real-World Complexity: Easy/Medium/Hard questions plus diverse formats (MCQ, Assertion-Reasoning, Match-the-Column)

AgriParam-1’s BBK Advantages:

Domain Specialization: Superior performance on Agricultural Biotechnology, Plant Sciences, and Agronomy vs. generic models
Bilingual Consistency: Minimal English-Hindi performance gap unlike general LLMs (10-15% drops)
Contextual Understanding: Excels at medium-complexity questions mirroring real farmer queries
Regional Intelligence: Strong on Indian crop varieties, soil types, agro-climatic zones, and government schemes

What BBK Tests That Others Miss:

Policy Navigation: Government schemes and subsidy eligibility
Regional Adaptation: State-specific practices and crop calendars
Seasonal Intelligence: Kharif/Rabi/Zaid-appropriate advice
Practical Application: Scientific knowledge → actionable farming decisions

Why This Matters

Agriculture employs over 50% of India’s workforce, yet most LLMs today are optimized for global English-speaking audiences and generic use cases. Param-1 represents a shift in perspective: from global-first to Bharat-first, from general-purpose to sector-specialized.

Whether it’s powering Kisan helplines, chatbots for Krishi Vigyan Kendras, or voice assistants for rural India, Param-1 is our first step toward domain-native LLMs that serve real people with real needs.

📥 Want early access, partner with us, or contribute?
Follow our updates on Hugging Face

Got an issue or feedback? Our team is here to help.

Reach us at: