We started the Deployed podcast to share practical lessons learned from engineering & product leaders who are building AI systems at scale, and this conversation comes at an interesting time. In the past couple of months there’s been an explosion of interest in building AI agents. They’re far more complicated than simple single-prompt applications or even RAG chatbots!
One company that’s been way out in front creating enterprise agents is Sierra – the customer service agent company co-founded by Bret Taylor, former co-president of Salesforce and current OpenAI board chair.
Arya Asemanfar leads product engineering at Sierra, and we’re fortunate to talk with him in this episode. He shares a ton of practical wisdom gained from building enterprise agents that actually work. Arya covered a range of useful topics with us — from how Sierra designs custom agents for customers, to the details of their agent development stack, their eval system, human review process, and how they build feedback loops into their product.
A few highlight clips below should be especially valuable for teams building production AI agents (or thinking about it).
Read on for our key takeaways, or check out the full episode on Spotify, Apple Podcasts, or YouTube.
Bringing AI Agents to Life With Customers
Anyone building an agent product will want to think about Sierra’s approach to onboarding and working with customers. Rather than trying to build a one-size-fits-all solution, they've embraced the unique needs of each business through what Arya calls a "co-design process."
"Each agent is its own product," Arya explains. Sierra pairs each customer with an agent developer who works to understand the customer's brand voice, policies, and desired customer experience. This isn't just initial setup – it's an ongoing partnership to shape how the agent behaves and evolves.
This approach has helped Sierra navigate one of the trickier aspects of enterprise AI adoption: setting appropriate expectations. "Early on when AI was still fairly new... a lot of customers would show up and be kind of unsure about what to expect," Arya notes. "I think that we've actually seen that start to change fairly rapidly... customer expectations are starting to be much more grounded."
The Agent Development Stack
Everyone is trying to figure out the basic patterns, tooling, and infrastructure needed to build great agents. As Arya puts it, "If you think about Google in its early days, they not only built a great product in their search engine, but they also got really good at building internet scale products (and invented tech like MapReduce to help)... I think the same is going to be true for agents."
Arya describes the critical components of their platform, which include tools for:
Agent Design: New tools for modeling complex conversational flows
Testing: For both unit-test style/prompt-level evaluations, and integration tests for full conversation flows
QA: Multiple layers of quality checks before agents go live, including human-in-the-loop reviews
Monitoring: Runtime supervision of agent behavior
Observability: Understanding what's happening in production
Having these components has been essential for scaling their ability to build and maintain multiple enterprise agents. "We're not only sort of building AI agents for our customers, we're also investing in building the tools and processes and ultimately a product that is the best way to build the best agents."
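The "Monitoring" layer above — runtime supervision of agent behavior — can be sketched as a check that runs on a draft reply before it reaches the customer. This is a minimal illustrative sketch, not Sierra's actual implementation: the policy rules, function names, and keyword checks are all assumptions, and a real system would likely call a separate supervisor model rather than keyword rules.

```python
# Hypothetical sketch of a runtime supervisor check on a draft agent reply.
# In production this would likely invoke a supervisor model; keyword rules
# are used here only to keep the example self-contained and runnable.
from dataclasses import dataclass

@dataclass
class SupervisorVerdict:
    allowed: bool
    reason: str

# Illustrative policy: phrases the agent should never emit.
FORBIDDEN_PHRASES = ["guaranteed refund", "legal advice"]

def supervise_reply(draft_reply: str, context: dict) -> SupervisorVerdict:
    """Run lightweight policy checks on a draft reply before sending it."""
    lowered = draft_reply.lower()
    for phrase in FORBIDDEN_PHRASES:
        if phrase in lowered:
            return SupervisorVerdict(False, f"blocked: contains '{phrase}'")
    # Illustrative rule: don't discuss refunds without an order on file.
    if "refund" in lowered and not context.get("order_id"):
        return SupervisorVerdict(False, "blocked: refund mentioned without an order ID")
    return SupervisorVerdict(True, "ok")

verdict = supervise_reply("I can offer a guaranteed refund today.", {})
print(verdict.allowed, verdict.reason)
```

The value of a layer like this is that it runs on every live conversation, catching behavior that offline tests never exercised.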
Building a Comprehensive Evaluation System
One of the most valuable parts of any AI system is its evals, and Arya breaks down how Sierra approaches testing and evaluating their agents:
Unit-test style evals for specific prompt behaviors
Integration-style tests for end-to-end conversation flows and skills an agent might perform
Runtime checks using supervisor models
Human review as a critical backstop
Importantly, Sierra runs both platform-level evals (testing their core abstractions) and customer-specific evals (verifying specific behaviors or tone).
Arya describes this combination as a necessary "constellation" of evaluation methods.
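The first two layers of that constellation could look something like the sketch below — a unit-test style check on a single prompt-level behavior, and an integration-style check over a scripted multi-turn flow. The stubbed `agent_reply` function and the return-flow scenario are illustrative assumptions; a real eval would call the actual agent.

```python
# Illustrative sketch (not Sierra's tooling) of unit-test style and
# integration-style evals, using a stubbed agent so the example runs alone.

def agent_reply(message: str) -> str:
    """Stand-in for a real agent call; production evals would hit the model."""
    if "return" in message.lower():
        return "I can help with that return. Could you share your order number?"
    return "Happy to help! What can I do for you today?"

# Unit-test style eval: assert one specific prompt-level behavior.
def test_return_request_asks_for_order_number():
    reply = agent_reply("I'd like to return my shoes")
    assert "order number" in reply.lower()

# Integration-style eval: script a multi-turn flow and check each step.
def test_return_flow():
    transcript = [(turn, agent_reply(turn))
                  for turn in ["Hi there", "I want to return an item"]]
    # The agent should greet first, then ask for the order number.
    assert "help" in transcript[0][1].lower()
    assert "order number" in transcript[1][1].lower()

test_return_request_asks_for_order_number()
test_return_flow()
print("all eval checks passed")
```

Runtime supervisor checks and human review then sit on top of these offline layers, catching what scripted tests miss.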
The Essential Role of Domain Experts
A particularly important part of Sierra's approach (and that of most AI teams) is how they've integrated domain expertise into the development process. Arya describes technical expertise as especially important early on, and then emphasizes that domain expertise becomes increasingly critical as systems scale.
We’ve been building Freeplay to make sure it works well for less-technical domain experts, and Arya describes well why we do this: "The domain expert is probably the more important one (vs. technical experts) once you have some level of scale... Knowing what the right customer experience should be is more important because ultimately you have to give that feedback. If the behavior is incorrect because the behavior was not correctly specified, the only person who's going to know that is the person who knows what should happen instead."
Designing Effective Feedback Loops
Another big question we hear a lot is how to build effective feedback loops. Arya talks about some of the challenges to doing this well, including that effective feedback loops require more structure than simple freeform comments. Different types of issues need different types of feedback:
For tone/phrasing issues, it’s helpful to collect direct examples of better responses
For behavioral issues, you end up wanting something closer to a detailed spec of how the interaction should work
For system-level issues, sometimes what’s most valuable is complete workflow documentation
"Having an opinion about the kinds of feedback that you want to collect and how to act on those things in particular is important," Arya notes. This structured approach helps ensure feedback can be effectively acted upon rather than creating "whack-a-mole" situations.
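One way to act on that advice is to make each feedback type a distinct structure that carries the artifact needed to act on it. The sketch below is a hypothetical illustration of that idea — the class names, fields, and routing targets are assumptions, not Sierra's schema.

```python
# Hypothetical sketch: structure reviewer feedback by type, so each kind of
# issue carries what's needed to act on it (all names are illustrative).
from dataclasses import dataclass
from typing import Union

@dataclass
class ToneFeedback:
    conversation_id: str
    original_response: str
    better_response: str    # a direct example of preferred phrasing

@dataclass
class BehaviorFeedback:
    conversation_id: str
    expected_behavior: str  # closer to a spec of how the interaction should work

@dataclass
class SystemFeedback:
    conversation_id: str
    workflow_doc: str       # complete workflow documentation

Feedback = Union[ToneFeedback, BehaviorFeedback, SystemFeedback]

def route_feedback(item: Feedback) -> str:
    """Send each feedback type to the process that can act on it."""
    if isinstance(item, ToneFeedback):
        return "add to phrasing examples"
    if isinstance(item, BehaviorFeedback):
        return "update agent spec"
    return "revise workflow docs"

print(route_feedback(ToneFeedback("c1", "No.", "I'm sorry, we can't do that today.")))
```

Typing the feedback up front is what prevents the "whack-a-mole" pattern: every item arrives with a clear next action instead of a freeform comment someone must triage.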
Looking Ahead
We covered a bunch of other topics, including Arya’s excitement about the future of AI product development. He believes we're still early in discovering new paradigms for how people will interact with AI-powered products. Just like mobile devices led to new interaction patterns like pull-to-refresh, AI may enable entirely new ways of engaging with software.
For teams looking to build in this space, his parting advice emphasized the importance of staying grounded in specific customer needs: "It's too easy to get in your head too much about what is the right solution to a particular problem in the abstract... keeping it hyper focused on one customer or one concrete example... helps you build a foundation from which you can now have a more informed intuitive mental model."
Thank you to Arya and the Sierra team for sharing these insights. If you're interested in learning more about Sierra's work with enterprise AI agents, visit sierra.ai.
Want to hear more conversations like this one? Subscribe to Deployed on Spotify, Apple Podcasts, or YouTube.
Categories: Podcast

Authors: Ian Cairns