LLM Agents and Tool Use
Function calling, memory, multi-agent coordination, and the architecture behind every production AI agent.
A chatbot answers questions. An agent takes actions — it calls APIs, runs code, searches the web, writes files, and coordinates with other agents to complete multi-step tasks.
Module 53 showed ReAct as a prompting pattern — manually parsing tool calls from LLM text output. That works but is fragile. Modern LLM APIs support native function calling: you define tools as JSON schemas, the LLM returns a structured tool call object (not text to parse), you execute the function, return the result, and the LLM continues. No regex. No parsing failures. The LLM decides which tool to call, with which arguments, at each step of a multi-step task.
At Razorpay, an agent handling merchant disputes can: look up transaction details in the database, check the dispute deadline, draft a response email, send it via the email API, and update the CRM — all from a single natural language request from a support engineer. What took 20 minutes of copy-pasting across four tabs takes 30 seconds.
A chatbot is a knowledgeable advisor — it can tell you what to do, but you have to do it yourself. An agent is a capable employee — you tell it what you want and it figures out the steps, uses the right tools, and hands you the result. The difference is agency: the ability to act in the world, not just generate text about acting.
The key constraint: agents are only as reliable as the LLM driving them. Every tool call is an LLM decision — and LLMs make mistakes. Production agents need guardrails, confirmation steps for irreversible actions, and human-in-the-loop for high-stakes decisions.
Function calling — structured tool invocation without text parsing
Function calling lets you define tools as JSON schemas and pass them to the LLM alongside the user message. When the LLM decides to use a tool it returns a structured tool_call object — not text. You execute the function with the provided arguments, return the result as a tool message, and the LLM generates its next response informed by the result.
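The loop above can be sketched in a few dozen lines. This is a minimal illustration, not a production client: the message shapes mirror the OpenAI-style tool-calling format, and `fake_llm`, `get_weather`, and `run_agent` are hypothetical names, with a stub standing in for a real chat-completions API so the sketch runs offline.

```python
import json

# Tool defined as a JSON schema, in the shape most chat APIs accept.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    # Stand-in for a real weather API call.
    return json.dumps({"city": city, "temp_c": 31, "condition": "humid"})

TOOL_REGISTRY = {"get_weather": get_weather}

def fake_llm(messages, tools):
    """Stub standing in for a real chat-completions call: returns a
    structured tool_call first, then a final answer once a tool result
    appears in the history."""
    tool_msgs = [m for m in messages if m["role"] == "tool"]
    if not tool_msgs:
        return {"role": "assistant", "content": None, "tool_calls": [
            {"id": "call_1", "function": {
                "name": "get_weather",
                "arguments": json.dumps({"city": "Mumbai"})}}]}
    result = json.loads(tool_msgs[-1]["content"])
    return {"role": "assistant",
            "content": f"It is {result['temp_c']}°C and "
                       f"{result['condition']} in {result['city']}."}

def run_agent(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    while True:
        reply = fake_llm(messages, TOOLS)
        messages.append(reply)
        if not reply.get("tool_calls"):      # no tool call -> final answer
            return reply["content"]
        for call in reply["tool_calls"]:     # execute each requested tool
            fn = TOOL_REGISTRY[call["function"]["name"]]
            args = json.loads(call["function"]["arguments"])
            messages.append({"role": "tool",
                             "tool_call_id": call["id"],
                             "content": fn(**args)})
```

Note the four message roles in play: user, assistant-with-tool_calls, tool, and the final plain assistant message — the same loop a real agent runs, just with the model stubbed out.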
Agent memory — short-term, long-term, and semantic memory
A stateless agent forgets everything between conversations. Production agents need memory: what the user said earlier in this conversation, what this user has asked about in past sessions, and relevant facts retrieved from external storage. Three types of memory serve different purposes.
Short-term: the full message history of the current conversation, passed in every API call.
Long-term: facts about the user, preferences, and important outcomes from past sessions, stored in a database.
Semantic: past conversations and documents stored as embeddings, retrieved by semantic similarity.
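The three memory types can be sketched as three small classes. This is an illustrative toy, not a library: the class names are hypothetical, the long-term store is an in-process dict where production would use a database, and `embed` is a deliberately crude bag-of-letters vector standing in for a real embedding model.

```python
import math
from collections import defaultdict

class ShortTermMemory:
    """Conversation buffer: keep the last N messages under the context limit."""
    def __init__(self, max_messages=20):
        self.max_messages = max_messages
        self.messages = []
    def add(self, role, content):
        self.messages.append({"role": role, "content": content})
        if len(self.messages) > self.max_messages:
            # Naive compression: drop the oldest turns (real systems summarise).
            self.messages = self.messages[-self.max_messages:]

class LongTermMemory:
    """Per-user facts persisted across sessions (a database in production)."""
    def __init__(self):
        self.store = defaultdict(dict)
    def remember(self, user_id, key, value):
        self.store[user_id][key] = value
    def recall(self, user_id):
        return dict(self.store[user_id])

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class SemanticMemory:
    """Vector store of past texts; retrieval by embedding similarity."""
    def __init__(self):
        self.items = []
    def embed(self, text):
        # Toy letter-frequency "embedding"; a real system uses a model.
        vec = [0.0] * 26
        for ch in text.lower():
            if ch.isalpha():
                vec[ord(ch) - 97] += 1
        return vec
    def add(self, text):
        self.items.append((text, self.embed(text)))
    def search(self, query, k=1):
        qv = self.embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(qv, it[1]),
                        reverse=True)
        return [text for text, _ in ranked[:k]]
```

In a real agent the three work together: the short-term buffer is sent on every call, long-term facts are loaded at session start, and semantic search runs against the current query before each turn.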
Multi-agent coordination — orchestrator and specialist agents
Complex tasks exceed what a single agent can reliably handle. A multi-agent system uses an orchestrator agent that plans and delegates to specialist agents — each with its own tools, context, and expertise. The orchestrator never executes tools directly. It breaks the task into sub-tasks and routes each to the right specialist.
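The orchestrator/specialist split can be sketched as follows. This is a skeleton under simplifying assumptions: each specialist here is a plain function where production would run a separate LLM with its own tools, and the orchestrator's "plan" is a keyword rule standing in for an LLM planning step. All names are hypothetical.

```python
# Shared context dict flows between specialists so later steps can see
# what earlier steps found (and specialists do not contradict each other).

def transactions_specialist(task, context):
    # Pretend database lookup; a real specialist would call DB tools.
    context["transaction"] = {"id": "txn_42", "amount": 1500}
    return f"Looked up {context['transaction']['id']}"

def email_specialist(task, context):
    txn = context.get("transaction", {})
    return f"Drafted email about {txn.get('id', 'unknown transaction')}"

SPECIALISTS = {"lookup": transactions_specialist, "email": email_specialist}

def orchestrator(request):
    """Plans sub-tasks and routes each to a specialist.
    Never executes tools itself."""
    # Stand-in plan; in production this decomposition is an LLM call.
    plan = ["lookup", "email"] if "dispute" in request else ["lookup"]
    context, results = {}, []
    for step in plan:
        results.append(SPECIALISTS[step](step, context))
    return results
```

The key design choice is the shared `context`: the email specialist never re-queries the database, it reads what the transactions specialist already established.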
Guardrails — preventing agents from doing the wrong thing
Agents that can take real-world actions — send emails, call APIs, write to databases — need guardrails. Without them a hallucinating agent can send wrong emails to customers, corrupt database records, or call expensive APIs in loops. Three layers of protection are standard in production agent systems.
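The first two layers can be sketched as a wrapper around any tool function. This is a minimal illustration with hypothetical names (`GuardedTool`, `GuardrailError`): call budgets and dry-run mode cover tool-level validation, and a confirmation callback gates irreversible actions; output validation would sit outside the wrapper, checking the agent's answer before anything acts on it.

```python
class GuardrailError(Exception):
    pass

class GuardedTool:
    """Wraps a tool with a call budget, dry-run mode, and a confirmation
    requirement for irreversible actions."""
    def __init__(self, fn, *, irreversible=False, max_calls=5,
                 dry_run=False, confirm=lambda action: False):
        self.fn = fn
        self.irreversible = irreversible
        self.max_calls, self.calls = max_calls, 0
        self.dry_run = dry_run
        self.confirm = confirm  # human-in-the-loop hook; denies by default

    def __call__(self, **kwargs):
        self.calls += 1
        if self.calls > self.max_calls:          # layer 1: stop runaway loops
            raise GuardrailError(f"{self.fn.__name__}: max_calls exceeded")
        if self.dry_run:                         # layer 1: log, don't execute
            return f"[dry-run] {self.fn.__name__}({kwargs})"
        if self.irreversible and not self.confirm(
                f"{self.fn.__name__}({kwargs})"):
            raise GuardrailError("irreversible action not confirmed")  # layer 2
        return self.fn(**kwargs)

def send_email(to, body):
    # Stand-in for a real email API call.
    return f"sent to {to}"

# In development: dry_run=True, so nothing real ever fires.
safe_send = GuardedTool(send_email, irreversible=True, dry_run=True)
```

Because the default `confirm` callback denies everything, an irreversible tool fails closed: shipping it without wiring up a real approval step makes it refuse, not misfire.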
Every common agent mistake — explained and fixed
The NLP section is complete. Section 9 — Computer Vision — begins next.
You have completed the full NLP section: tokenisation, BERT, PEFT/LoRA, RAG, prompt engineering, and agents. Section 9 goes deeper into computer vision beyond the CNNs of Module 46 — image fundamentals, data augmentation, object detection with YOLO, and semantic segmentation. Every module builds directly on the deep learning foundation from Section 7.
How computers see images. Pixel values, colour channels, image tensors, normalisation, and the preprocessing pipeline every vision model expects.
🎯 Key Takeaways
- ✓ An agent is an LLM that can take actions — call APIs, run code, write files — not just generate text. Function calling is the reliable way to implement this: define tools as JSON schemas, the LLM returns structured tool_call objects (not text to parse), you execute the function, return the result as a tool message, repeat.
- ✓ The function calling message loop has four message types: user (question), assistant with tool_calls (LLM decides to call a tool), tool (result of the function execution), assistant without tool_calls (final answer). Pass all messages in every API call — the full history is the agent's working memory.
- ✓ Three types of agent memory: short-term (conversation buffer — pass all messages each turn, compress when approaching the context limit), long-term (persist key facts in a database across sessions, retrieve at session start), semantic (vector store of past conversations, retrieve by similarity to the current query).
- ✓ Multi-agent systems use an orchestrator that plans and delegates to specialist agents. The orchestrator never calls tools directly — it breaks the task into sub-tasks and routes each to the right specialist. Pass shared context between specialists so they do not contradict each other.
- ✓ Production agents need three guardrail layers: tool-level validation (max call counts, logging, dry-run mode), confirmation for irreversible actions (send email, process refund, delete record — always require human approval), and output validation (check for hallucination signals before acting on agent output).
- ✓ The biggest agent failure mode is irreversible actions based on hallucinated data. Classify every tool as reversible or irreversible. Queue irreversible actions for human review rather than executing synchronously. In development, always use dry_run=True. Never ship an agent that can take irreversible real-world actions without a confirmation step.
Discussion
Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.