BharatGen Datasets

Patram 7B by BharatGen is India’s first document foundation model, built for visual document understanding. It combines a vision transformer with a 7B-parameter language model to analyse images and documents, supporting tasks like visual question answering and document analysis.

Our datasets span legal, financial, agricultural and Ayurvedic knowledge; mental-health question answering drawn from medical literature; benchmarks for low- and extremely low-resource Indic languages built from official examination corpora; and more. All datasets are openly available for research.

1. BharatGen: MHQA Dataset
2. AVVP (Audio-Visual Video Parsing)
3. A Benchmark and Dataset for Post-OCR Text Correction in Sanskrit
4. MALTA
5. Math Word Problems in Hindi and English
6. Multilingual Table Detection (MTD)
7. S3VQA (Select, Substitute and Search for Open-Domain Visual Question Answering)
8. Saamayik-master
9. Vāksañcayaḥ - Sanskrit_ASR_Corpus
10. DictDis Dataset
11. RUDDER_DATASET
12. CATALIST
13. QA Dharmapal
14. LexGen
15. Indic Q&A Benchmark
16. IKSwiki
17. Bhashabench-Krishi

Application Process

Current Openings

Gen AI Engineer

Research & Development

Design and implement advanced generative AI models optimized for Indian languages and cultural contexts.

LLMs, PyTorch, Transformers, NLP, Python

AI Stack Engineer

Engineering

Build and maintain scalable AI infrastructure, MLOps pipelines, and deployment systems.

Kubernetes, Docker, MLOps, Python, AWS

Linguist Manager

Language Research

Lead linguistic research and development for Indian languages, dialects, and cultural nuances.

Linguistics, NLP, Team Management, Research, Indic Languages

AI/ML Intern

Internship Program

Build and maintain scalable AI infrastructure, MLOps pipelines, and deployment systems.

Kubernetes, Docker, MLOps, Python, AWS

Please use the application form below to express your interest in joining BharatGen.

Contact Us
