Senior Linguist (Speech)
Build India’s sovereign AI stack for a billion people and shape the future of technology


Job Summary
The Speech Linguist Lead will own both pretraining data quality validation and model output evaluation for BharatGen’s speech technology efforts. You will design validation frameworks for large-scale ASR/TTS datasets, define linguistic and acoustic quality standards, and evaluate model outputs for intelligibility, fluency, and naturalness. This role requires a linguist or speech technologist who can bridge linguistic theory, acoustic data understanding, and operational execution, and who can collaborate closely with ML engineers, data collection teams, and freelance linguists.
Key Responsibilities
- Speech Data Quality & Pretraining Validation:
  - Define and operationalize quality standards for large-scale speech datasets across multiple Indian languages.
  - Validate audio-text alignment, transcription accuracy, and phonetic coverage using structured sampling strategies.
  - Establish quality review loops for vendor- or freelancer-generated ASR and TTS data.
  - Recommend data cleaning, filtering, or balancing strategies to ML teams to improve pretraining corpora.
- Model Evaluations:
  - Design and operationalize evaluation frameworks for speech models, covering both recognition (ASR) and generation (TTS).
  - Define metrics and rubrics for intelligibility, fluency, pronunciation accuracy, prosody, naturalness, and contextual appropriateness.
  - Develop and maintain human evaluation pipelines, including listening tests, MOS (Mean Opinion Score) surveys, and structured rating rubrics.
  - Implement inter-annotator agreement tracking and quality audits.
- Process Design & Collaboration:
  - Collaborate with the Data Operations Manager to structure validation and evaluation workflows across multiple languages.
  - Train and mentor linguists and annotators on speech data quality and evaluation standards.
Minimum Qualifications and Experience
- Master’s or PhD in Linguistics or Computational Linguistics with a focus on speech, plus 3+ years of experience in speech technology projects (ASR/TTS).
Required Expertise
- Experience in designing or supervising speech annotation or evaluation workflows.
- Experience collaborating with ML teams to interpret or improve model performance.
- Experience in designing or evaluating ASR or TTS systems in Indian languages.
- Familiarity with speech corpus design, such as phoneme balancing, prosody control, or expressive speech.
- Prior work with human evaluation frameworks (listening tests, MOS, A/B testing).
