Agents, MCP, skills, harnesses, orchestration, super agents, OpenClaw — the space moves faster than anyone can track. This page gives you the mental model that makes everything else click. Read this first, then explore the tools.
At the highest level, AI is shifting from chat to agents. This is the single most important concept to understand before looking at any tools.
A chat model is simple: you ask a question, it gives an answer. It's question → answer. You do the work. The AI helps you think.
An agent is different: you give it a goal, it figures out the steps, takes action, uses tools, checks its work, and keeps going until it reaches an outcome. It's goal → result. The AI helps you do.
The difference isn't between you asking "summarize this email" and an AI summarizing it for you. The real shift is saying: "Review my inbox, identify urgent items, draft replies, pull meeting notes, generate a proposal, create a payment link, update my tracker, and send me the final summary." That's not intelligence — that's workflow execution.
Once you understand these eight components, you can make sense of any agent platform — and move between them easily.
The Model
The language model that does the reasoning. Claude, GPT, Gemini, Llama — this is the intelligence layer. Different models have different strengths: some are smarter, some are better at execution, some are cheaper. In practice, the model that reliably completes agentic tasks often beats the one that scores highest on benchmarks.
The "smartest" model isn't always the best — the one that consistently finishes tasks wins.

The Loop
The reason an agent keeps working until the task is done. Without the loop, it's just a one-shot response. The loop is what makes it agentic — sense, think, plan, act, repeat until the goal is met.
This is the difference between "answer my question" and "complete my task."

Tools
The things an agent can use to act in the world: read email, create calendar events, search databases, write files, call APIs, browse the web, run code. Without tools, an agent can only think — not do.
MCP is the standard that connects agents to tools.

Context
What the agent knows about you, your business, your preferences, and the current task. Stored in files like CLAUDE.md or agents.md. Think of it as the onboarding document you'd give a new employee.
Good context makes simple prompts powerful. Bad context makes clever prompts fail.

Memory
What the agent remembers across sessions. Enterprise-grade agents implement a hierarchical memory architecture inspired by human cognition: short-term memory (the current step and recent context — like your working RAM), episodic memory (what happened in this session — a conversation summary or task log), semantic memory (general domain knowledge retrieved via RAG from vector databases), procedural memory (learned skills and workflows), and long-term memory (accumulated preferences, past decisions, and patterns stored permanently). Without memory, every session starts from zero.
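As a rough sketch, those tiers map onto plain data structures. The class and field names below are illustrative stand-ins, not any specific framework's API:

```python
from collections import deque

class AgentMemory:
    """Illustrative hierarchy of memory tiers (names are hypothetical)."""

    def __init__(self):
        self.short_term = deque(maxlen=10)  # working context: recent steps only
        self.episodic = []                  # this session's full task log
        self.semantic = {}                  # domain knowledge, keyed for retrieval
        self.procedural = {}                # learned skills and workflows
        self.long_term = {}                 # permanent preferences and decisions

    def record_step(self, step: str) -> None:
        # Short-term memory forgets automatically; the episodic log keeps everything.
        self.short_term.append(step)
        self.episodic.append(step)

mem = AgentMemory()
for i in range(12):
    mem.record_step(f"step {i}")
# The oldest working-memory entries have fallen off; the session log is complete.
```

The point of the tiers is exactly what the `deque(maxlen=...)` makes visible: working memory is deliberately small and forgetful, while the other tiers persist.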
A memory.md file is the simplest pattern. Daily journal entries create searchable long-term recall. Hierarchical tiers are the production pattern.

Skills
Reusable instruction sets that tell the agent exactly how to handle specific tasks. If you've ever spent 20 minutes guiding an AI through creating a proposal or analyzing data — that should be a skill, not repeated work.
Skills are Standard Operating Procedures for AI.

The Harness
The platform that brings everything together. Claude Code, Cursor, OpenClaw, Lovable, Bolt.new, Replit — these are all different harnesses. Different environments for the same core idea. A well-architected harness provides modularity, context-sharing between components, and clear interfaces.
Once you understand the concepts, you can move between harnesses easily.

Alignment & Safety
Surrounding all modules is an alignment and safety layer that ensures the agent stays within boundaries. This includes an AI constitution (standing policies and rules), content filtering on inputs and outputs, permission management and sandboxing, and safe tool-use interfaces. The principle: governance is designed into the architecture, not bolted on later.
Higher stakes = tighter leash. Calibrate autonomy to risk.

Reasoning Frameworks
Not all reasoning is equal. Modern agents use structured reasoning frameworks that match the complexity of the task. Chain-of-Thought (CoT) handles basic step-by-step problem solving. ReAct interleaves reasoning with tool use — think, act, observe, repeat. Tree-of-Thought explores multiple solution paths in parallel for branching problems. Graph-of-Thought handles the most complex scenarios with interconnected reasoning nodes. The practitioner's rule: start with the simplest technique that could work, and escalate only when evidence shows you need more.
Every agent — regardless of the platform — runs on a Sense → Think → Plan → Act loop. Inside that loop, four cognitive modules work together, supported by memory and wrapped in governance. This is Diagram 1.4.0 from The Agentic Enterprise Strategy.
Perceive — The agent takes in the user's request through multimodal input processing. It reads the text, scans the workspace for existing files, checks if there's a design reference or screenshot attached. Everything the agent can sense about the request and its environment gets processed here. The output: a structured understanding of what's being asked.
Reason — Now the LLM core does its work. It figures out what this means and what needs to happen. "The user wants a portfolio site. I'll need HTML, CSS, maybe React. There's no existing code, so I'm starting from scratch. I should use a clean layout with a hero section, project cards, and a contact form." The reasoning layer doesn't have a plan yet — it doesn't know the sequence or the tools. But it knows what needs to happen.
Plan — Planning takes the reasoning output and puts things in order. It knows what needs to happen — now it figures out how and in what sequence. Step 1: scaffold the project structure. Step 2: build the layout components. Step 3: add styling. Step 4: populate with placeholder content. Step 5: test in browser. The planner also selects which tools and skills to use — the code editor skill, the file creation tool, the browser preview tool.
Act — The agent executes the plan. Writes the HTML file. Creates the CSS. Generates the component structure. Each action calls a tool — file write, code execution, terminal command. The actions are the agent's hands doing the work the plan laid out.
Observe — This is the step most people miss, and it's where most agents fail. After acting, the agent inspects the result. It takes a screenshot of the rendered page. "The hero section looks good, but the project cards are overlapping on mobile. The contact form is missing a submit handler." The observation feeds back into the reasoning layer — now the agent has new information.
Memory — Throughout the entire loop, memory is working. It stores context after perception, retrieves relevant knowledge during reasoning, records the plan for tracking, logs every action outcome, and saves observations for the next iteration. If the agent loops back to fix the mobile layout, it remembers what it already tried — it doesn't start from scratch.
Loop back → The observation ("cards overlapping on mobile") triggers a new reasoning cycle. The agent reasons about the CSS fix, plans the specific changes, acts by editing the stylesheet, observes again — cards look good now. Loop complete. The Alignment & Safety layer governs every step of every loop — filtering inputs, constraining tool permissions, validating outputs. That full cognitive loop — Perceive → Reason → Plan → Act → Observe → Memory → Loop — is what separates an agent from a chatbot.
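As a minimal illustration, the whole cycle can be reduced to a toy program. The "world" below is a dict standing in for the portfolio page, the goal is a set of checks, and each iteration fixes one failing check — every name here is a hypothetical stand-in, not a framework API:

```python
# Toy world: a "page" with two requirements, mirroring the example above.
page = {"hero": False, "cards_fixed": False}

checks = {  # the goal, expressed as verifiable conditions
    "hero": lambda: page["hero"],
    "cards_fixed": lambda: page["cards_fixed"],
}
tools = {  # the actions the agent can take
    "hero": lambda: page.update(hero=True),
    "cards_fixed": lambda: page.update(cards_fixed=True),
}

log = []  # memory carried across iterations
for _ in range(10):  # the loop, with an iteration budget
    failing = [name for name, ok in checks.items() if not ok()]  # Perceive/Observe
    if not failing:
        break                        # goal met: exit the loop
    step = failing[0]                # Reason + Plan: pick the next fix
    tools[step]()                    # Act: call the matching tool
    log.append(f"fixed {step}")      # Memory: record the outcome
```

The structural point is the `if not failing: break` line — a chatbot has no equivalent of it, because a chatbot never checks its own work against the goal.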
The problem: LLMs by themselves are limited. They don't know your private data, and they can't take actions in the world. They're brains without hands.
MCP (Model Context Protocol) is the solution. It's a standard translator between your agent and your tools. Your LLM speaks one language. Gmail, Slack, Notion, GitHub, Stripe, and databases all speak different languages. MCP translates so the agent can talk to all of them in a standard way.
The analogy: MCP is the USB of AI. Before USB, every device needed its own cable, port, and driver. USB created one universal interface. MCP does the same — one protocol that lets any model connect to any tool.
Governed by the Agentic AI Foundation (AAIF) under the Linux Foundation. Co-founded by Anthropic, OpenAI, and Block. No single company controls the plumbing of the agent era.
With MCP connected, an agent can: read your Gmail, create calendar events, check Slack alerts, search Notion, create Stripe payment links, open GitHub issues, query databases, interact with cloud services — all through a single standard protocol. One integration, works across every AI model.
The most common misconception: MCP and Skills are the same thing. They're not. MCP gives agents hands. Skills give agents expertise. They sit at different layers of the cognitive architecture — and understanding where each lives is the key to building agents that actually work in production.
MCP has two sides: the Server and the Client. An MCP Server exposes a tool — it's the adapter that sits in front of Salesforce, Gmail, a database, or any system and translates its capabilities into the MCP standard. An MCP Client is what the agent uses to discover and call those servers. When you connect Claude to a Salesforce MCP server, Claude is the client and Salesforce is the server. The beauty: any client works with any server. Build one MCP server for your CRM and every agent on every platform can use it.
Your agent can also be both a server and a client. As a client, it calls MCP servers to read data and take actions. As a server, it exposes its own capabilities to other agents or systems — turning your agent into a tool that other agents can use. This is how multi-agent systems compose: Agent A calls Agent B via MCP, which in turn calls a database MCP server. Each layer speaks the same protocol.
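To make the client/server split concrete, here is a plain-Python sketch that mimics the shape of the pattern. It is not the real MCP SDK, and every name in it (CrmServer, lookup_account, the account id) is hypothetical:

```python
class CrmServer:
    """Stands in for an MCP server: an adapter that exposes a system's
    capabilities behind a standard discover/call interface."""

    def list_tools(self):
        return [{"name": "lookup_account", "args": ["account_id"]}]

    def call_tool(self, name, **kwargs):
        if name == "lookup_account":
            # A real adapter would translate this into the CRM's own API call.
            return {"account_id": kwargs["account_id"], "status": "active"}
        raise ValueError(f"unknown tool: {name}")

class AgentClient:
    """Stands in for an MCP client: any agent that speaks the same interface
    can drive any server, which is what makes integrations portable."""

    def __init__(self, server):
        self.server = server

    def run(self):
        tool = self.server.list_tools()[0]                            # discovery
        return self.server.call_tool(tool["name"], account_id="ACME-42")  # invocation

result = AgentClient(CrmServer()).run()
```

Swap in a different client and the server code doesn't change; swap in a different server and the client code doesn't change. That decoupling is the entire value of the standard.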
There are now over 10,000 pre-built MCP servers available — covering everything from Google Workspace and GitHub to Stripe, Notion, Jira, SAP, and cloud databases. For most integrations, you don't build the server from scratch. You configure an existing one with your credentials and connect it to your agent. For proprietary systems, building a custom MCP server is a well-documented process — you're essentially writing an adapter that translates your system's API into MCP's standard format.
contract-review.md says "extract obligation clauses, compare against approved library, flag deviations with severity ratings, produce redline summary."

Look at the diagram above. The LLM is the brain at the center, running the Sense → Think → Plan → Act loop. Context feeds it what it needs to know (your business, your rules). Tools via MCP give it hands to act on the world — and your agent can be both a server (exposing capabilities) and a client (consuming tools). Skills encode your team's expertise so the agent gets better at repeated tasks. Memory persists learning across sessions. A2A lets agents delegate to each other. And the Guardrails layer wraps everything — no action without authorization, no output without validation. That's the complete agent system.
Prompt engineering is becoming less important than context engineering. Instead of writing one magical prompt every time, the better approach is to load your agent with the right context so your prompts can stay simple. Context engineering is the art and science of giving AI agents the right information at the right time — and it's emerged as the #1 job of engineers building AI agents.
If you hired a real executive assistant, you wouldn't expect them to do great work on day one without understanding your business, customers, tools, and preferences. Agents are the same — they need onboarding.
Stored in files like CLAUDE.md, agents.md, or similar context documents.
Old way: Spend 10 minutes crafting a perfect prompt every time you need something. New way: Spend an hour setting up your context once, then use simple one-line prompts forever. "Draft a proposal for the Acme deal" works perfectly when the agent already knows your pricing, tone, templates, and CRM data. Every piece of context must earn its place — curate ruthlessly, reinforce key points, and continuously monitor what the agent actually uses.
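One way to sketch that one-time setup step (file names follow the conventions mentioned earlier; the loader itself is an illustrative sketch, not any platform's actual mechanism):

```python
from pathlib import Path
import tempfile

CONTEXT_FILES = ["CLAUDE.md", "agents.md", "MEMORY.md"]  # standing context

def build_system_prompt(workspace: Path) -> str:
    """Concatenate whatever context files exist into one system prompt,
    so per-task prompts can stay as short as one line."""
    sections = []
    for name in CONTEXT_FILES:
        f = workspace / name
        if f.exists():  # every context file is optional
            sections.append(f"## {name}\n{f.read_text().strip()}")
    return "\n\n".join(sections)

# Example: a workspace with a single context file.
with tempfile.TemporaryDirectory() as d:
    ws = Path(d)
    (ws / "CLAUDE.md").write_text("Tone: direct. Pricing: see rate card.")
    prompt = build_system_prompt(ws)
```

The design choice to notice: the per-task prompt stays trivial ("Draft a proposal for the Acme deal") because everything durable lives in files that are loaded every time.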
You've seen how MCP connects agents to tools and Skills encode expertise. The next question: where does all of this actually run? The agent landscape is splitting into two camps — and the breakout story of 2026 is leading the charge for one of them.
250,000+ GitHub stars in 4 months · Fastest-growing open-source project in history
Created by Peter Steinberger in November 2025 — originally called "Clawdbot" (a play on Claude), renamed "Moltbot" after Anthropic's trademark complaint, then "OpenClaw" in January 2026. Jensen Huang called it "probably the single most important release of software ever" and said OpenClaw is "the operating system for personal AI."
What it is: A free, open-source AI agent that runs locally on your machine and connects to your chat apps (WhatsApp, Telegram, Slack, Discord, iMessage) as its interface. It's NOT a language model — it's an agent runtime that wraps around any LLM you choose: Claude, GPT, Gemini, DeepSeek, or local models via Ollama. You text it "clear my inbox of spam and summarize urgent messages" — and it actually does it.
OpenClaw Is The 8 Building Blocks in Action
OpenClaw implements everything we covered in Section 02 — all eight building blocks that make up an agent. This isn't theory. It's a working system you can download and run today.
Perception: Multimodal input via chat apps — WhatsApp, Telegram, Slack, Discord, iMessage. Text, images, files, voice notes. The agent perceives through whatever channel you text it.
The Model: Any LLM you choose — Claude Opus for orchestration, Sonnet for coding, GPT for research, DeepSeek for cost efficiency, Ollama for local. The brain is swappable.
Memory: Dual-layer, 100% local. Short-term: daily Markdown logs (memory/YYYY-MM-DD.md) — auto-loads today + yesterday. Long-term: curated MEMORY.md — organized knowledge base. SQLite + vector search for retrieval. No cloud. Your data stays on your hard drive.
The Loop: Task queue system (TASK_QUEUE.md). Main agent decomposes goals into steps, assigns to sub-agents, tracks progress. Re-plans on failure. The cognitive loop in action.
Tools: MCP native — connects to 10,000+ MCP servers. Plus 50+ built-in integrations: Gmail, Calendar, GitHub, file system, terminal, browser, databases. The agent's hands.
Skills: 100+ AgentSkills — same concept as Claude's SKILL.md. Install from registry, write your own, or let the agent generate skills from observed patterns. Self-evolving SOPs.
Context: CLAUDE.md, agents.md, USER.md — context files that tell the agent your business rules, preferences, communication style. The onboarding that makes the agent yours.
Guardrails: Permission sandboxing, workspace access controls (read-only/write/none), skill vetting. NemoClaw adds OpenShell sandbox, privacy routers, and network guardrails for enterprise.
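The task-queue pattern from the list above can be sketched in a few lines. The statuses, the single-retry rule, and the simulated failure below are all illustrative, not OpenClaw's actual file format or behavior:

```python
# A TASK_QUEUE.md-style planner as plain data: decompose a goal into steps,
# dispatch each one, and re-queue a failed step once before giving up.
queue = [
    {"task": "scan inbox", "status": "pending", "retries": 0},
    {"task": "draft replies", "status": "pending", "retries": 0},
]

def dispatch(task_name: str) -> bool:
    # Stand-in for handing the step to a sub-agent. We pretend the first
    # attempt at "draft replies" fails so the re-plan path is visible.
    dispatch.attempts[task_name] = dispatch.attempts.get(task_name, 0) + 1
    return not (task_name == "draft replies" and dispatch.attempts[task_name] == 1)
dispatch.attempts = {}

while any(t["status"] == "pending" for t in queue):
    task = next(t for t in queue if t["status"] == "pending")
    if dispatch(task["task"]):
        task["status"] = "done"
    elif task["retries"] < 1:
        task["retries"] += 1       # re-plan: try the step once more
    else:
        task["status"] = "failed"  # give up and surface the failure
```

The queue is what turns "goal → result" into something trackable: progress, failure, and retries are all visible state rather than hidden inside one long model response.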
Layer 1 — Short-Term (Daily Logs): Every day, OpenClaw creates a Markdown file (memory/2026-03-20.md) and appends everything — conversations, decisions, preferences, task outcomes. It auto-loads today's log and yesterday's for immediate context continuity. Like a work notebook.
Layer 2 — Long-Term (Curated Knowledge): Important patterns, confirmed decisions, and repeated preferences get organized into MEMORY.md — a structured knowledge base the agent can reference anytime. This is the agent's institutional memory.
Retrieval: SQLite with vector search (sqlite-vec) + full-text search (FTS5). Hybrid BM25 + semantic retrieval finds relevant memories even when wording differs. No external database. No cloud. One .sqlite file on your disk.
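The full-text half of that retrieval stack can be sketched with nothing but Python's built-in sqlite3 module. The table name and entries below are hypothetical, and the vector-search half (sqlite-vec) is omitted:

```python
import sqlite3

db = sqlite3.connect(":memory:")  # the real system keeps one .sqlite file on disk
db.execute("CREATE VIRTUAL TABLE memory USING fts5(day, entry)")
db.executemany(
    "INSERT INTO memory VALUES (?, ?)",
    [
        ("2026-03-19", "User prefers short, bulleted status updates."),
        ("2026-03-20", "Confirmed: invoices go out via Stripe on Fridays."),
        ("2026-03-20", "Fixed the mobile card overlap with a CSS grid change."),
    ],
)

def recall(query: str, k: int = 2) -> list[str]:
    """Return the k best-matching entries, ranked by BM25 relevance."""
    rows = db.execute(
        "SELECT entry FROM memory WHERE memory MATCH ? ORDER BY bm25(memory) LIMIT ?",
        (query, k),
    )
    return [entry for (entry,) in rows]

top = recall("invoices stripe")
```

FTS5's default tokenizer is case-insensitive, so "stripe" still finds "Stripe" — the semantic (vector) layer is what catches matches where the wording differs entirely.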
The reality: OpenClaw launched with 512 security vulnerabilities (Kaspersky audit). Gartner called its design "insecure by default." Cisco found third-party skills exfiltrating data without user awareness. An agent created a dating profile without its owner's permission. China banned it from government systems.
What's built in: Workspace access controls (read-only, write, none), permission sandboxing for skills, session isolation, configurable tool access. The agent can be constrained to specific directories and specific MCP servers. But these controls are opt-in — the default is wide-open access.
NemoClaw's answer: NVIDIA wraps OpenClaw with OpenShell (sandboxed runtime), privacy routers (control data flow), network guardrails (limit what the agent can reach), and policy engines (enterprise rules enforcement). This is Stage 3 — Govern & Secure — applied to the local agent model.
The lesson: OpenClaw proves the concept works — 250K+ people running always-on AI agents from their laptops. It also proves why the Agentic Engineering discipline exists. The excitement is real. The governance gap is equally real. Don't skip Stage 3.
Build an MCP server for your CRM once — it works in OpenClaw on your laptop AND Claude in the cloud AND any future agent runtime. The choice of local vs cloud is a deployment decision, not an architecture decision. Most production teams will use both: local agents for development, sensitive data, and personal productivity; cloud agents for complex reasoning, enterprise orchestration, and always-on workflows at scale.
What People Are Actually Doing with OpenClaw
"Clear my inbox of spam, unsubscribe from newsletters, and summarize urgent messages." Agents processing thousands of emails while you sleep.
Main orchestrator (Opus) delegates coding to sub-agents (Sonnet). Ships features in 45 minutes that would take 6 hours solo.
Connected to Calendar, Notes, Reminders, Notion. Manages schedules, builds meal plans, tracks health metrics — all via WhatsApp.
Monitors news, builds knowledge bases from URLs, writes weekly research digests. Always-on via cron jobs.
Contract review, invoice processing, customer onboarding — the same O2C pattern you saw in the cognitive architecture, running locally.
When Agents Work Together: Five Coordination Patterns
Whether local or cloud, as you scale beyond a single agent, you need a coordination model:
Orchestrator-worker: One master agent assigns tasks to workers. Easiest to start with — clear control, easy governance. OpenClaw's default pattern.
Event-driven: No boss. Agents communicate via events on a message bus. More robust, no bottleneck. Harder to debug.
Blackboard: Shared workspace all agents can read/write. Agents contribute solutions when they can. Classic collaborative problem-solving.
Marketplace: Agents bid for tasks. Best-suited agent wins. Elegant for dynamic load balancing. Complex to implement.
Hierarchical: Combines patterns. Top orchestrator delegates to sub-orchestrators managing their own teams. How most production systems work.
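The first pattern, the easiest to start with, can be sketched like this. The workers and their outputs are hypothetical stand-ins for sub-agents:

```python
def research_worker(subtask: str) -> str:
    return f"research notes for: {subtask}"   # stand-in for a research sub-agent

def writer_worker(subtask: str) -> str:
    return f"draft for: {subtask}"            # stand-in for a writing sub-agent

WORKERS = {"research": research_worker, "write": writer_worker}

def orchestrate(goal: str) -> dict:
    # The master agent's plan: decompose the goal into (worker, subtask) pairs,
    # delegate each step, and aggregate the results. Clear control, easy audit.
    plan = [("research", goal), ("write", goal)]
    return {name: WORKERS[name](subtask) for name, subtask in plan}

digest = orchestrate("weekly competitor digest")
```

Because one component holds the plan and sees every result, this is also the easiest pattern to govern: there is exactly one place to log, approve, or veto each delegation.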
OpenClaw proved the concept — and exposed the risk. It launched with 512 security vulnerabilities. Gartner called its risks "unacceptable." Cisco found third-party skills performing data exfiltration without user awareness. China banned it from government systems. An agent created a dating profile and started screening matches without its owner's permission.
The fundamental principle hasn't changed: the more power an agent has, the more intentional you need to be about access, prompts, and workflow design. But the urgency has. With 250,000+ stars and people granting agents access to their email, calendar, files, and financial accounts — governance isn't a future concern. It's a today problem.
A well-governed agent operates on the principle of least privilege: it has access only to the data and tools necessary for its role. Governance isn't bolted on after the fact — it's designed into the architecture through the Guardrails layer you saw in the cognitive architecture diagram.
Ask: "What is the worst-case harm if this agent makes an unchecked wrong decision?"
In the Agentic Engineering lifecycle, Govern & Secure comes before Build & Integrate — not after. The teams that skip governance and jump straight to building are the ones who end up with agents that delete email libraries, exfiltrate crypto wallet keys, or create unauthorized dating profiles. The Toolkit has a Governance Policy Template with 32 pre-production gates and an Identity & Trust Template with 49 security controls. Use them before you deploy.
You've seen the building blocks, the cognitive loop, MCP, Skills, OpenClaw, and the security reality. The question now: which use case is right for your team today? Agent use cases follow a natural progression — and the teams that succeed are the ones that start at the right level, not the most exciting one. The nineteen out of twenty that fail? They jumped to Level 3 before mastering Level 1.
Level 1: Read-only. Low risk. Start here.
Building blocks: Perception + Reasoning + MCP (data in only)
Autonomy: Full — agent reads and summarizes, never writes or acts
Level 2: Read + Write. Medium risk. Add governance.
Building blocks: Full cognitive loop + Skills + MCP (data in AND commands out)
Autonomy: Supervised — agent drafts and executes, human approves high-stakes actions
Level 3: Multi-agent. High complexity. Full governance required.
Building blocks: Everything — cognitive loop + Skills + MCP + A2A + Memory + Guardrails
Autonomy: Calibrated — different autonomy levels for different steps in the workflow
Before you pick a framework, before you install OpenClaw, before you connect a single MCP server — ask: which level is this use case? If it's Level 1, you can move fast with minimal governance. If it's Level 2, you need the Skills and approval workflows designed first. If it's Level 3, you need the full lifecycle — Justify, Architect, Govern, Build, Gate, Operate — and the operational playbooks from the Toolkit. The Use Case Discovery & Prioritization Workbook helps you make this assessment systematically.
If you remember one thing from everything we just covered, make it this:
An agent is an LLM wrapped in a cognitive loop (Sense → Think → Plan → Act → Observe → Repeat). Context feeds it what it needs to know. Memory gives it continuity across sessions. Skills encode your team's expertise into the Plan layer. MCP connects it to the outside world — reading data in during Sense and sending commands out during Act. A2A lets agents delegate to each other. And Guardrails wrap everything — no action without authorization.
Whether it's OpenClaw on your laptop or Claude in the cloud — whether it's a Level 1 research agent or a Level 3 multi-agent O2C system — the architecture is the same. Once you see this structure, every new tool, framework, and platform you encounter is just a different implementation of these same building blocks.
That's the mental model. That's the noise filter. When someone pitches you a new agent product tomorrow, ask: which building block is this? Where does it sit in the cognitive loop? What MCP servers does it need? What Skills guide it? What's the governance model? If they can't answer, the product isn't ready. If you can't ask, the mental model isn't there yet. Now it is.
Every failed agent project I've seen started with someone picking a tool — LangGraph, CrewAI, Bedrock, OpenClaw — before they had this mental model. They built without understanding where their use case sat on the maturity curve. They skipped governance. They didn't design the observation step in their cognitive loop. They didn't think about which Skills to encode or which MCP servers to connect.
The mental model comes first. The tools come second. And that's exactly where we're going next — the complete builder's stack, 30+ tools across 7 layers, and the framework for choosing the right one without falling into the tool-first trap.
Go Deeper
This page covers the concepts. For the full enterprise playbook — cognitive architecture, reasoning frameworks, governance, protocols, and orchestration at production scale — explore The Agentic Enterprise Strategy book and the 28 operational tools that accompany it.
You've got the mental model. Now let's talk about tools.
Explore the Builder's Stack →