The Moat Is the Dataset, Not the Model: Building an AI History Tutor That Runs Offline

Since January 2025, raw model performance has saturated. Every week there is a new benchmark, a new state-of-the-art, a new demo that trends for a day. Almost none of them have users. Almost none of them make money. That gap — between an impressive POC and a product someone pays for — is the whole game now.

So when I entered Google's Gemma 3n Impact Challenge on Kaggle, I didn't set out to build a demo for the judges. I set out to build the first version of a real product, for a real audience, with a defensible reason to exist. The hackathon was just a deadline. I've been teaching this exact discipline in my AI sessions since 2024, so I treated my own submission the way I tell engineers and founders to treat theirs.

This is the build. More usefully, it's the framework.

Think before building: the feasibility test

This is a framework I teach. In my AI trainings, before anyone learns product building, we learn what product-market fit is and how to test for it — before a line of code is written. The most expensive mistake in AI is building first and asking who it's for later. So when I entered this hackathon I ran the idea through the same feasibility test I teach my own students: a short workflow that decides whether it can work as a business, not just as a demo. Test before you spend. If an idea can't pass these, no amount of engineering saves it later.

Here's the test, evaluated step by step against History Study Buddy:

Feature — what is the one thing it does better than anything else? An AI history teacher that makes NCERT fun through role-play — not a general chatbot you have to interrogate. One sharp capability, not a feature list.
Audience size — who is it for, how many, and can I reach them? NCERT grades 6–12: ~12 crore students (6.5 crore in classes 9–12 alone, UDISE+ 2024–25), plus UPSC and state-exam aspirants. Large, specific, reachable.
Comparison — why would they switch from what they use today? Not by being a cheaper Vedantu or another lecture library. It's a complementary, different kind of learning — you don't watch a chapter, you live it (how, below). And the distinction is commercial, not just pedagogical: you only move an audience with a step-change, never a marginal gain. Think SpaceX — a slightly cheaper rocket would have moved no one; collapsing the cost of orbit by an order of magnitude moved the whole market. A different kind of learning is that step-change; a nicer lecture library isn't.
Defensibility — once it works, what stops someone copying it? A custom dataset that's genuinely hard to reproduce, served by a small offline model that's cheap to run. This is the whole thesis of the post — more below.
Monetization — does the math close without leaning on a discount? At India's price points the plan has to reach profit on its own economics, not on a coupon. If profitability depends on a discount, the product is weak — a real 10× doesn't need one.

Pass every step and you have an idea worth building; fail one and you've saved yourself months. History Study Buddy cleared all five. So this isn't a checklist I ran once — it's the same test, run harder, that later pushed my next build, Aitihasika, to pivot its audience from school students to UPSC aspirants before a single asset was generated. Think before building.

The model is a replaceable abstraction. The dataset is not.

Most people building on LLMs obsess over the model. Which base? Which size? Which provider? It's the wrong fixation. The base model is a replaceable abstraction — Gemma 3n today, Qwen tomorrow, a paid API the day after. You can swap it in an afternoon. Nobody has a moat there.

The moat is the IP you build on top — and for a vertical product, that IP is a domain dataset nobody else has. So that's where I spent the effort.

The training approach: not another course — history you discover

This part is personal. Somewhere along the way, we made all of education boring. Science got reduced to memorization — and history, which is nothing but stories, got reduced to dates to mug up the night before an exam. I never accepted that. I love learning, and since 2016 I've carried a make-learning-fun mindset into every room I've taught in — across a decade of training, that's 300,000+ developers. I'm a developer and a teacher, both. A great teacher doesn't just impart facts; they weave narratives — turning dry dates into quests, and turning students who are merely educated into ones who are enlightened. That conviction is what I built into this product — starting with the dataset itself.

Because every existing product teaches history the same way: a lecture, some notes, a bank of MCQs. That's telling. I wanted the opposite — one principle, taken straight from my master prompt: "show history through their eyes, not tell it" — make the facts feel discovered, not listed. The model isn't trained to answer history questions. It's trained to let a student live the chapter.

Hand-authoring 50,000 such examples is impossible, so I used meta-prompting: a master prompt that instructs a strong LLM to act as a "master historical storyteller and educational narrative designer" and turn each NCERT chapter into an immersive, multi-character role-play. Every chapter follows the same five-part structure:

Meet the characters — two or three people from different social backgrounds (a potter, an overseer's daughter, a soldier)
Set the scene — an atmospheric paragraph: the sights, sounds, and tensions of the period
The story unfolds — a first-person narrative where dialogue and conflict reveal the bolded key terms in context, never as a list
Mission log — a short recap of what was just experienced
The crossroads — a role-play dilemma that forces the student to choose

Here's what that produces. In the Harappan chapter you're Amri, a young potter from Mohenjodaro's Lower Town, carrying a painted pot up to the Citadel. The overseer's daughter hands you a red carnelian bead "from Lothal," shows you a steatite seal carved with a unicorn and a script no one can read, and walks you past the Great Bath. A guard mutters that fewer ships are arriving from Meluhha and the river is lower than it used to be. You've just absorbed urban planning, long-distance trade, the undeciphered script, and the theories of decline — without being taught any of them. Then the crossroads: a trader offers your father a faster potter's wheel from Mesopotamia — double the output, but the elders warn it breaks tradition. Do you take it?

And Amri isn't hardcoded. He's one instance, not a fixed script — every time a student opens the chapter, the fine-tuned model generates a fresh cast and a new story: a different character, a different journey through the same key terms. No two sessions are alike. The five-part structure stays constant; the characters and plot are generated live, every time.

That nesting isn't decorative. It forces the model through a chain of reasoning — character → scene → narrative → recap → dilemma — far richer than flat Q&A pairs, and it teaches the tutor to teach in scenes. And it doesn't only narrate — it turns the question back on you. Ask why the Indus cities built grand drains and public baths but left almost no trace of kings or wars, and it won't just answer; it asks what that tells you about what the society valued. That Socratic turn builds critical thinking, not recall — courses test what you remember; this trains how you think.

How much data this takes

This story-shaped dataset is the moat. I've seeded it with 2,600 manually vetted examples across 88 NCERT chapters, grades 6 to 12, with a roadmap to 50,000+ — at which point it becomes the largest curated dataset of its kind, and the real asset of the company. You don't need a million rows to begin: a rule of thumb worth internalizing is that 5,000–10,000 expert-annotated examples is enough to start a serious prototype. Which is also why domain experts aren't optional — the fastest prototype comes from a domain expert and an AI engineer side by side: the expert supplies judgment the model can't, the engineer the leverage the expert can't.

Fine-tuning over RAG — a deliberate choice

With the dataset built, the next stage is standard: take the capability of a larger model and transfer it into a smaller one. There were two routes — retrieval (RAG) or fine-tuning. I chose fine-tuning, because I wanted the model to internalize behavior and nuance — how to narrate, how to stay in character, how to make history feel alive — not just look facts up at query time. RAG is a lookup. I wanted a teacher.

The technical stack, kept deliberately boring and reproducible:

Base model: Gemma 3n (2B-it)
Method: QLoRA via unsloth — only adapter weights train, the base stays frozen, no catastrophic forgetting
Epochs: 3 — enough to learn the style, few enough to avoid rote regurgitation
Hardware: two scripts, one CPU-local for testing and one Colab-GPU (fp16, pinned memory, larger batches, auto-save to Drive against session disconnects)

unsloth cuts memory and roughly triples training speed, which is what makes this viable on free Colab. None of this is exotic — and that's the point. The clever part was the dataset; the training is plumbing.

Shipping it: offline-first, sub-500ms, no internet required

The product is a Next.js front end talking to a locally running Ollama instance over an /api/chat route, with the fine-tuned Gemma 3n serving responses in under 500ms. A keyword fallback keeps the pipeline demonstrable even if Ollama isn't running.

Running offline isn't a limitation here — it's a feature that maps directly to the defensibility row of the framework. 60% of rural Indian students lack reliable internet. An on-device model means no data leaves the device, no per-query API bill, and deployment by USB drive or local network. It's cheap to serve at India scale precisely because there's no cloud GPU in the loop. The thing that makes it defensible (custom data + small offline model) is the same thing that makes it cheap.

Why this matters beyond one hackathon

India doesn't need more AI demos. It needs profitable AI products — built on models like Gemma 3n, fine-tuned for a specific job, served free or cheap, at quality. The whole stack to do this now exists in the Gemini and Claude ecosystems; the constraint isn't capability, it's the discipline to pick an audience and build a moat.

And the economics need honest framing. At India's price points, this is a 10–20× return business, not a 100× one — revenue in year one, profit in year two. That's not a weakness. It's a real company serving real students. Capital that prices every market like Silicon Valley will keep missing it.

The lesson I keep coming back to: don't fall in love with the model. Build the dataset nobody else has, and make it cheap to serve. That's the part competitors can't copy in a weekend. And design the feature, from day one, so that it carries defensibility and productization potential — not a demo you abandon when the hackathon ends, but a first version you can keep compounding.

That's exactly the trajectory this is on. It started as a hackathon project built on 7 August 2025, evaluated from the very start through the product lens above. It didn't stay a submission — the hackathon took it from a pre-seed idea to an actual seed-stage product. I'm building it now, and I'll keep refining it over time, accumulating features chapter by chapter, dataset by dataset, toward something students open every day. The hackathon wasn't the finish line. It was version one — and I'm still building.

The framework above — Feature → Audience → Comparison → Defensibility — is the spine of the AI Product-Building workshop I run through Purna Medha. We teach engineering teams and founders to ship products, not POCs: how to pick a defensible audience, build domain IP, and fine-tune small models that are cheap to serve. If your team keeps building demos that never reach users, that's exactly the gap we close.

Project: History Study Buddy — Kaggle submission (7 August 2025). Built for Google's Gemma 3n Impact Challenge, Kaggle.