Prompt Engineering
Zero-shot, few-shot, chain-of-thought, ReAct — the patterns that consistently improve LLM outputs. With real before/after examples for every technique.
The same LLM gives completely different answers to the same question depending on how the question is phrased. Prompt engineering is the discipline of phrasing questions to get reliably correct answers.
An LLM is a function that maps text to text. The input is the prompt. The output quality depends almost entirely on the prompt quality. A vague prompt produces a vague answer. A specific, structured prompt with context, examples, and output format constraints produces a specific, structured, correct answer.
This is not about tricks or jailbreaks. It is about understanding how LLMs process instructions and giving them what they need to perform well: role context, task clarity, examples of desired output, constraints on format, and explicit reasoning instructions for complex tasks. Every pattern in this module has been tested in production NLP systems across Indian tech companies.
You hire a brilliant new analyst at Razorpay. On day one you ask: "analyse the data." They stare at you. Which data? What kind of analysis? What format should the output be? The analyst is capable — your instruction was the problem.
A good manager says: "Analyse last month's payment failure rates by city. I need a table with city, failure rate, and top failure reason. Flag anything above 5%. Here is an example of what I expect: [example]." Same analyst, dramatically better output. That is prompt engineering.
Zero-shot vs few-shot — when examples make all the difference
Zero-shot prompting gives the LLM a task with no examples — just a description of what to do. It works for common, well-defined tasks where the LLM has strong priors. Few-shot prompting adds 2–5 examples of (input, desired output) pairs before the actual query. The model infers the pattern from the examples and applies it.
Few-shot is dramatically more effective than zero-shot for tasks with specific output formats, domain-specific terminology, or nuanced classification boundaries that are hard to describe in words. At Swiggy, classifying complaint severity (P1/P2/P3) depends on exact boundary definitions — examples teach those boundaries faster than prose descriptions can.
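The pattern is mechanical enough to sketch. Below is a minimal few-shot prompt builder for the severity-classification task; the example complaints, labels, and the `build_few_shot_prompt` helper are illustrative, not from any real Swiggy system.

```python
# Hypothetical (input, label) pairs covering the P1/P2/P3 boundaries.
FEW_SHOT_EXAMPLES = [
    ("Order not delivered for 2 hours, customer threatening chargeback", "P1"),
    ("Wrong item in order, customer wants a refund", "P2"),
    ("App font looks too small on my phone", "P3"),
]

def build_few_shot_prompt(examples, query):
    """Format each example in a fixed layout, then append the real query."""
    lines = ["Classify the complaint severity as P1, P2, or P3.\n"]
    for text, label in examples:
        lines.append(f"Complaint: {text}\nSeverity: {label}\n")
    # End mid-pattern so the model's completion is just the label.
    lines.append(f"Complaint: {query}\nSeverity:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    FEW_SHOT_EXAMPLES,
    "Payment deducted twice, no order confirmation",
)
print(prompt)
```

Ending the prompt mid-pattern ("Severity:") nudges the model to emit only the label, which keeps parsing trivial.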
Chain-of-thought — tell the model to think before answering
Chain-of-thought (CoT) prompting asks the LLM to show its reasoning step by step before giving the final answer. This dramatically improves performance on tasks that require multi-step reasoning — maths, logic, policy interpretation, risk assessment. Without CoT, the LLM jumps directly to an answer and often gets complex reasoning wrong. With CoT, it works through the problem systematically.
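The change itself is one line. A minimal before/after sketch, with a made-up payments task for illustration:

```python
# Hypothetical multi-step task: volume growth plus a failure rate.
task = (
    "A merchant processed 1,200 payments yesterday. Today volume grew 10% "
    "and the failure rate was 5%. How many payments failed today?"
)

# Without CoT: the model jumps straight to a number and often gets it wrong.
direct_prompt = task + "\nAnswer with just the number."

# With CoT: force the intermediate steps before the final answer.
cot_prompt = (
    task
    + "\nLet's think step by step: first compute today's volume, "
    "then apply the failure rate, then state the final number."
)
```

The reasoning instruction costs a few tokens but makes the intermediate arithmetic (1,320 payments, 5% of that) explicit, where errors are visible and checkable.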
Structured output — get JSON every time, not sometimes
Production systems need machine-readable output from LLMs — JSON that can be parsed, validated, and inserted into a database. Asking for JSON without enforcement produces JSON sometimes and prose sometimes. Three techniques make it reliable: explicit format instruction, a JSON example in the prompt, and output parsing with retry on failure.
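All three techniques, plus the retry loop, fit in a short sketch. The schema, the complaint text, and the `call_llm` callable are assumptions; here the LLM is stubbed with canned replies so the example runs offline, but the same `get_json` wrapper works with any chat-completion client.

```python
import json
import re

# Explicit instruction + schema + concrete example, all in one prompt.
PROMPT = """Extract the fields from the complaint below.
Return ONLY JSON matching this schema, no prose:
{"city": str, "severity": "P1"|"P2"|"P3", "summary": str}

Example output:
{"city": "Pune", "severity": "P2", "summary": "Wrong item delivered"}

Complaint: {complaint}"""

def strip_fences(text):
    """Remove the ```json ... ``` fences some models wrap around output."""
    return re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip())

def get_json(call_llm, prompt, retries=2):
    messages = [{"role": "user", "content": prompt}]
    for _ in range(retries + 1):
        raw = call_llm(messages)
        try:
            return json.loads(strip_fences(raw))
        except json.JSONDecodeError:
            # Resend with the bad output and a correction message appended.
            messages.append({"role": "assistant", "content": raw})
            messages.append({"role": "user",
                             "content": "That was not valid JSON. "
                                        "Return ONLY the JSON object."})
    raise ValueError("LLM never returned valid JSON")

# Offline stub: answers with prose once, then with fenced JSON.
replies = iter([
    "Sure! Here is the data you asked for.",
    '```json\n{"city": "Mumbai", "severity": "P1", "summary": "Double charge"}\n```',
])

def fake_llm(messages):
    return next(replies)

result = get_json(fake_llm, PROMPT.replace("{complaint}", "Charged twice in Mumbai"))
print(result["severity"])  # prints P1, after one retry
```

Note the fence-stripping happens before every parse attempt: models often obey "ONLY JSON" yet still wrap the object in markdown fences.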
ReAct — Reasoning + Acting — the pattern behind AI agents
ReAct (Reasoning + Acting) interleaves the LLM's reasoning with tool calls. The LLM thinks about what to do, calls a tool to get information, observes the result, then thinks about the next step. This loop continues until the LLM has enough information to answer. ReAct is the foundation of every AI agent — the pattern behind LangChain, LlamaIndex, and production agentic systems.
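The loop is small enough to show whole. This is a minimal sketch, not the LangChain implementation: the tool, the question, and the canned model completions are all hypothetical, standing in for a real completion call made with `stop=["Observation:"]`.

```python
import re

# One hypothetical tool the agent can call.
TOOLS = {"lookup_refund_policy": lambda q: "Refunds allowed within 7 days."}

def react(llm, question, max_steps=5):
    """Thought -> Action -> Observation loop until a Final Answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # model emits a Thought plus Action or Final Answer
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[1].strip()
        match = re.search(r"Action: (\w+)\[(.*)\]", step)
        if match:
            tool, arg = match.groups()
            # The agent, not the model, produces the Observation line.
            transcript += f"Observation: {TOOLS[tool](arg)}\n"
    return "Stopped: max_steps reached"

# Canned completions standing in for the model.
steps = iter([
    "Thought: I need the refund policy.\n"
    "Action: lookup_refund_policy[refund window]",
    "Thought: The policy allows 7 days; the order is 3 days old.\n"
    "Final Answer: Yes, eligible for refund.",
])
answer = react(lambda t: next(steps), "Order placed 3 days ago: refund eligible?")
print(answer)  # prints: Yes, eligible for refund.
```

The key design point is that the Observation line is written by the harness from the real tool result; stopping generation before "Observation:" is what prevents the model from inventing one.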
System prompts — set role, tone, constraints, and output format once
The system prompt runs before every user message. It sets the LLM's persona, constraints, output format, and domain knowledge once — rather than repeating instructions in every user prompt. A well-written system prompt is the single highest-leverage prompt engineering investment for any production application.
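In the common chat-message format this is just the first message of every conversation. The persona, constraints, and escalation rule below are illustrative, not from any real deployment:

```python
# A hypothetical system prompt for a payments support assistant.
SYSTEM_PROMPT = """You are a payments support assistant for an Indian fintech.
Role: answer merchant questions about settlements and payment failures.
Tone: concise, professional, no speculation.
Output: a short answer followed by a 'Next steps' bullet list.
Constraint: never quote exact fee percentages; point users to the dashboard.
Escalation: if the user mentions fraud or legal action, reply only with
'Escalating to a human agent.'"""

def build_messages(user_query):
    # The system prompt is set once; only the user turn varies per request.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_query},
    ]

messages = build_messages("Why was my settlement delayed?")
```

Because every rule lives in one place, changing the assistant's behaviour means editing one string, not hunting through every call site that builds a prompt.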
Every common prompt engineering mistake — explained and fixed
You can now prompt any LLM effectively. Next: building systems where LLMs use tools autonomously to complete multi-step tasks.
Module 53 showed ReAct as a prompting pattern — manually implemented in Python. Module 54 covers LLM Agents properly: function calling (structured tool use), memory across turns, multi-agent coordination, and the frameworks (LangChain, LlamaIndex) that make building agents practical in production.
Function calling, memory, multi-agent coordination, and the architecture behind every production AI agent.
🎯 Key Takeaways
- ✓ Zero-shot prompting works for simple, well-defined tasks. Few-shot adds 2–5 (input, output) examples for tasks with specific output formats, domain terminology, or nuanced boundaries. Use 3–5 diverse examples covering edge cases — not just typical cases.
- ✓ Chain-of-thought (CoT) dramatically improves multi-step reasoning. Add "Let's think step by step:" to any complex prompt. For arithmetic, always verify with code — LLM arithmetic is unreliable in production. CoT is most valuable for policy interpretation, risk assessment, and constraint satisfaction.
- ✓ Structured output requires three reinforcements: an explicit "return ONLY JSON" instruction, a complete schema with field names and types, and a concrete example output. Set temperature=0. Always strip markdown fences before parsing. Add retry logic — resend with a correction message on parse failure.
- ✓ ReAct (Reasoning + Acting) interleaves LLM reasoning with tool calls. The loop: Thought → Action → Observation → repeat until Final Answer. Always set max_steps. Use stop=["Observation:"] to prevent the LLM from generating fake observations. Detect and break loops when the same tool is called with the same args twice.
- ✓ The system prompt is the highest-leverage prompt engineering investment. Set role, persona, output format, constraints, and escalation rules once in the system prompt rather than repeating them in every user prompt. A well-crafted system prompt eliminates the need for most per-request instructions.
- ✓ Prompt templates with named placeholders make prompts reusable, testable, and maintainable. Store templates separately from code. Version them like code. Test them with a diverse evaluation set before deploying. Small prompt changes can have large output effects — always A/B test prompt changes before full rollout.