Senior Linguist (Text LLM – Model Evaluation)
Build India’s sovereign AI stack for a billion people and shape the future of technology


Job Summary
As the Senior Linguist for Text LLM Model Evaluation, you will own the end-to-end process of evaluating BharatGen’s text-based large language models. You will design human evaluation frameworks, test sets, rubrics, and metrics that assess model outputs across multiple tasks and languages. Working closely with ML engineers, linguists, and data operations, you’ll ensure that every model iteration is measured with rigor, fairness, and linguistic precision.
Key Responsibilities
- Model Evaluation Design:
- Design and manage evaluation frameworks for BharatGen’s Text LLM, covering diverse tasks such as summarization, dialogue, question answering, and reasoning.
- Define evaluation dimensions (e.g., coherence, factuality).
- Develop human evaluation rubrics and task-specific test sets for multiple languages.
- Establish evaluation workflows using human judgment that complement automated metrics.
- Create documentation, checklists, and SOPs to ensure replicability of evaluations across model versions.
- Human Evaluation Pipeline Oversight:
- Collaborate with the Data Ops Manager to execute large-scale human evaluations across languages, aligning on throughput, timelines, and cost controls.
- Review and refine annotation guidelines to ensure inter-annotator consistency.
- Design sampling and spot-checking methods for maintaining high data integrity.
- Implement inter-annotator agreement tracking and quality audits (see the agreement-tracking sketch after this list).
- Analytical Evaluation & Reporting:
- Conduct error and trend analysis on model outputs across evaluation rounds.
- Interpret results to highlight strengths, regressions, or recurring weaknesses.
- Present findings and recommendations to ML engineers and leadership in structured, data-backed reports.
- Collaborate with the ML team to refine models based on evaluation results and feedback loops.
- Metrics & Tooling:
- Identify or adapt suitable automatic evaluation metrics (e.g., BLEU, ROUGE, BERTScore, toxicity classifiers) to complement human evaluation; a minimal scoring sketch follows this list.
- Use simple scripts/dashboards to track scores, trends, and evaluation throughput.
- Partner with Data Ops and product engineers to improve internal tools for managing evaluation tasks and results visualization.
- Cross-functional Collaboration:
- Train and mentor junior linguists in designing high-quality evaluation schemes.
- Participate in design reviews to ensure evaluation insights are integrated into model training and product goals.
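
To make the agreement-tracking responsibility concrete, here is a minimal sketch of pairwise weighted Cohen’s kappa over a small rating batch, assuming the pandas and scikit-learn packages; the annotator names, item IDs, and 1–5 quality scale are illustrative placeholders, not BharatGen internals.

```python
# Minimal sketch: pairwise Cohen's kappa for annotators on a shared batch.
# Labels, annotator names, and the rating scale are illustrative placeholders.
from itertools import combinations

import pandas as pd
from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings: one row per (item, annotator), on a 1-5 quality scale.
ratings = pd.DataFrame({
    "item_id":   [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "annotator": ["A", "B", "C"] * 3,
    "score":     [4, 4, 5, 2, 3, 2, 5, 5, 5],
})

# Pivot to items x annotators so each column holds one annotator's labels.
matrix = ratings.pivot(index="item_id", columns="annotator", values="score")

# Report weighted kappa for every annotator pair on the items both rated.
for a, b in combinations(matrix.columns, 2):
    pair = matrix[[a, b]].dropna()
    kappa = cohen_kappa_score(pair[a], pair[b], weights="quadratic")
    print(f"{a} vs {b}: weighted kappa = {kappa:.2f} over {len(pair)} items")
```

In practice, scores like these would feed a dashboard or a quality-audit threshold (for example, flagging batches that fall below an agreed kappa floor); the exact tooling is for this role to define.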
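
Likewise, for the automatic metrics listed above, the sketch below scores a toy hypothesis against a reference with corpus-level BLEU and ROUGE-L, assuming the open-source sacrebleu and rouge_score packages; the texts are placeholders, and the choice of metrics would depend on the task.

```python
# Minimal sketch: corpus-level BLEU and ROUGE-L to complement human judgments.
# Model outputs and references here are toy placeholders.
import sacrebleu
from rouge_score import rouge_scorer

references = ["The cabinet approved the new railway budget on Friday."]
hypotheses = ["The cabinet approved a new railway budget on Friday."]

# Corpus BLEU: sacrebleu takes a list of hypothesis strings and a list of
# reference streams (one stream per reference set).
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.1f}")

# Sentence-level ROUGE-L F1, averaged over the evaluation set.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_l = sum(
    scorer.score(ref, hyp)["rougeL"].fmeasure
    for ref, hyp in zip(references, hypotheses)
) / len(references)
print(f"ROUGE-L F1: {rouge_l:.3f}")

# BERTScore or a toxicity classifier could slot into the same loop; both
# require model downloads, so they are omitted from this sketch.
```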
Minimum Qualifications and Experience
- Master’s or PhD in Linguistics/Computational Linguistics and 3+ years of experience on NLP or GenAI projects, including exposure to model evaluation, test-set design, or linguistic quality assessment.
Required Expertise
- Proficiency in Python and agentic frameworks such as LangGraph, DSPy, AutoGen, or CrewAI.
- Experience collaborating with ML or data science teams on evaluation or model analysis workflows.
- Experience managing multi-language annotation or evaluation projects is preferred.
- Experience with multilingual LLMs or Indian language NLP.
- Exposure to instruction-tuning, safety evaluation, or RLHF workflows.
- Familiarity with bias detection, toxicity analysis, or fairness evaluation.
- Prior experience training or mentoring annotation/evaluation teams.
