How We Use AI Agents to Build Agent Shelf
A behind-the-scenes look at the specific AI agents the Agent Shelf team uses daily, what works well, and where human judgment still matters.
We eat our own cooking
Building an agent registry means we have opinions about what makes a good agent. But opinions only go so far. The best way to learn what works is to use agents every day, notice where they save time, and notice where they get in the way.
Here's a look at the four agents we rely on most, what they actually do for us, and what we've learned from using them.
The code review agent
Our code review agent runs during every pull request. Its frontmatter defines it as a senior engineer with a focus on security, performance, and consistency. The instructions tell it to follow a specific review order: understand the PR context first, check for correctness, flag security issues, then assess readability.
The agent's workflow section is detailed. It tells the AI to read the full diff before commenting, to group feedback by severity (critical, suggestion, nit), and to always include a corrected code example when flagging a problem.
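The structure described above can be sketched as a markdown agent definition with YAML frontmatter. This is an illustrative outline, not our literal file; the field names and wording are assumptions:

```markdown
---
name: code-reviewer
category: coding
description: Senior engineer reviewing for security, performance, and consistency.
---

## Workflow
1. Read the full diff before commenting.
2. Understand the PR context, then check correctness.
3. Flag security issues (see auth patterns below), then assess readability.
4. Group feedback by severity: critical, suggestion, nit.
5. Every flagged problem must include a corrected code example.
```

The severity grouping matters in practice: it lets reviewers scan the criticals first and defer the nits.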
What works well: It catches things humans skip during Friday afternoon reviews: unused imports, missing error handling, inconsistent naming. It's especially good at spotting when a new API endpoint is missing authentication checks, because we explicitly listed our auth patterns in the agent's instructions.
What still needs a human: Architectural decisions. The agent can tell you a function is too long, but it can't tell you whether a feature belongs in the API layer or should be a background job. It also struggles with context that spans multiple PRs. If you're refactoring across three pull requests, the agent reviews each one in isolation.
We publish agents like this one in the coding category so other teams can adapt them.
The SEO content agent
Every blog post on this site (including this one) starts with our content agent. It's configured with our brand voice guidelines, a list of banned words (goodbye "delve" and "landscape"), and our internal linking rules. The frontmatter sets its category to marketing and tags it for content and SEO work.
The agent's instructions include our content structure: start with a problem or question, use ## for main sections, keep paragraphs short, and always include internal links to relevant pages. We also give it a list of our site's URL patterns so it can suggest appropriate links.
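A content agent configured this way might look like the following sketch. The frontmatter fields and rule phrasing are illustrative assumptions, not a copy of our file:

```markdown
---
name: seo-content-writer
category: marketing
tags: [content, seo]
---

## Voice
- Follow the brand voice guidelines below.
- Banned words: delve, landscape, ...

## Structure
- Open with a problem or question.
- Use ## for main sections; keep paragraphs short.
- Include internal links matching our URL patterns (e.g. /agents/<category>).
```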
What works well: First drafts come out 80% there. The structure is right, the tone is close, and the internal links are relevant. It saves about two hours per post compared to starting from scratch. It's also good at suggesting titles and meta descriptions that hit our target keywords without sounding forced.
What still needs a human: Voice. The agent writes competent content, but it doesn't write our content until we edit it. It also can't fact-check itself. If we mention a statistic or claim, someone on the team verifies it. We've found agents in the marketing category that complement ours well for different content types.
The DevOps agent
Our deployment pipeline uses a DevOps agent that generates and validates configuration files. It knows our infrastructure stack (Vercel for the frontend, MongoDB Atlas for the database, GitHub Actions for CI) and has instructions for each environment.
The agent's workflow is prescriptive: when asked to update a config, it first reads the existing configuration, identifies what's changing, generates the new version, and then outputs a diff. We included rules about secrets management (never hardcode, always reference environment variables) and naming conventions for our environments.
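The secrets rule is easiest to show with a GitHub Actions fragment. This is a hypothetical step (the secret name `MONGODB_URI` is an assumption) illustrating the pattern the agent enforces:

```yaml
# Hypothetical deploy step from a generated workflow.
- name: Deploy to staging
  run: ./scripts/deploy.sh
  env:
    # Always reference the stored secret; never hardcode the value.
    MONGODB_URI: ${{ secrets.MONGODB_URI }}
    DEPLOY_ENV: staging   # environment names follow our naming convention
```

Because the agent outputs a diff rather than a whole new file, a human can verify exactly what changed before merging.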
What works well: Consistency. Before this agent, our staging and production configs would drift apart in subtle ways. The agent enforces the same patterns everywhere. It's also good at generating GitHub Actions workflows, because we included three annotated examples in its instructions. More agents like this are available in the DevOps category.
What still needs a human: Anything involving cost. The agent will happily suggest configurations that scale beautifully but cost three times our budget. It also can't evaluate whether a performance optimization is worth the added complexity. Those tradeoffs require context about our traffic patterns and team capacity that doesn't fit neatly into agent instructions.
The documentation agent
Our docs agent keeps our documentation current when the codebase changes. Its instructions include our docs structure, our writing conventions, and a rule that every API endpoint must have a request example, response example, and error case.
The workflow tells it to compare the current docs against recent code changes and flag anything outdated. It also generates first drafts of docs for new features, following the same template every time.
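The per-endpoint rule translates into a template the agent fills out the same way every time. A minimal sketch of what such a template might contain (the endpoint and field names are made up for illustration):

```markdown
## POST /api/example

Brief description of what the endpoint does.

### Request example
    POST /api/example
    { "name": "value" }

### Response example
    200 OK
    { "id": "abc123", "name": "value" }

### Error case
    400 Bad Request — returned when a required field is missing.
```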
What works well: It eliminates the "docs are outdated" problem almost entirely. When someone adds a new API parameter, the agent catches that the docs don't mention it. The templates also mean our docs are consistent. Every endpoint page looks the same, which users appreciate.
What still needs a human: Explaining why. The agent documents what a feature does, but it can't explain why you'd want to use it or when to choose it over an alternative. The best documentation tells a story, and agents aren't great storytellers yet.
Lessons learned from daily agent use
After months of using these agents, here's what we've taken away:
One job per agent. Early on, we tried building a single "dev assistant" that could review code, write docs, and manage deployments. It was mediocre at all three. Splitting into focused agents with clear responsibilities made each one dramatically better. We wrote more about this in our guide on writing effective agent definitions.
Examples beat instructions. Telling an agent to "write clear error messages" produces generic results. Showing it three examples of error messages you consider good produces messages that match your style. We include at least two examples for every major task in our agent definitions.
Version your agents. We treat agent definitions like code. When we change the code review agent's severity levels, that's a minor version bump. When we restructure its workflow, that's a major version. This matters because team members need to know when the agent they rely on has changed behavior.
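In practice this can be as simple as a version field in the agent's frontmatter, bumped under semver-style rules. A hypothetical sketch:

```yaml
---
name: code-reviewer
version: 2.1.0   # minor bump: adjusted severity levels
# version: 3.0.0 would signal a restructured workflow (breaking behavior change)
---
```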
Review the agent's work, don't skip it. Agents make us faster, not infallible. Every output gets reviewed by a human before it ships. The review is usually quick, but it's never skipped. This is especially true for the content and docs agents, where subtle inaccuracies can erode trust.
Share what works. When we build an agent that solves a real problem, we publish it. Other teams adapt it to their needs, and sometimes their improvements make it back to us. That feedback loop is the whole reason Agent Shelf exists.
If you want to build agents like these, start with the documentation and publish your first agent. The best way to learn agent design is to use your own agents every day and iterate on what doesn't work.
Written by Agent Shelf Team
The Agent Shelf team builds open infrastructure for AI agent discovery and distribution. We maintain the Agent Shelf registry, MCP server, and publish skill.