tl;dr: This year we’ve talked to >100 leaders & builders creating products with LLMs, ranging from seed stage to public companies and across engineering, product & design. We’ve also spent a lot of time building alongside early customers, feeling the pain points of building with LLMs today, and understanding where they need help.
What we've learned feels useful to anyone building software products with LLMs today, so we’re sharing some of it here.
A new hierarchy of needs for LLM adoption
The risks from building with LLMs can be high, the costs can add up fast, and the ability to drive real outcomes and ROI can feel elusive. CTOs, CPOs and other leaders describe a hierarchy of needs for incorporating LLMs into their products.
First, they need to feel confident that LLMs won’t cause major problems (or cost them their jobs) due to safety issues, bias, hallucinations, legal issues, or other embarrassing failures. Then, they need to know their business can afford to use LLMs the way they want. Almost no one we’ve talked to had generative AI models in their 2023 Cost of Goods Sold (COGS) budget, and the bills add up quickly; the margin impact is real.
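To make the margin point concrete, here's a back-of-envelope sketch of how per-request token costs compound at product scale. All prices and volumes below are hypothetical placeholders, not any provider's actual rates:

```python
# Illustrative LLM COGS estimate. Prices and usage numbers are
# hypothetical assumptions for the sketch, not real provider rates.

def monthly_llm_cost(
    requests_per_day: int,
    input_tokens: int,           # avg input tokens per request
    output_tokens: int,          # avg output tokens per request
    input_price_per_1k: float,   # assumed USD per 1K input tokens
    output_price_per_1k: float,  # assumed USD per 1K output tokens
) -> float:
    """Rough monthly spend: per-request cost * daily volume * 30 days."""
    per_request = (
        input_tokens / 1000 * input_price_per_1k
        + output_tokens / 1000 * output_price_per_1k
    )
    return per_request * requests_per_day * 30

# Example: 50K requests/day, 1,500 input + 500 output tokens each,
# at assumed prices of $0.01 / $0.03 per 1K tokens.
cost = monthly_llm_cost(50_000, 1_500, 500, 0.01, 0.03)
print(f"${cost:,.0f}/month")  # $45,000/month
```

Three cents a request sounds small until you multiply it out; that's the kind of line item that surprises teams whose budgets predate generative AI.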
Both of those challenges are table stakes and really just get a team to neutral, before improving the customer experience can become a primary focus.
Only when they’re addressed can product teams really start to optimize for what matters most: moving meaningful metrics for their customers. That’s the enduring need for any product company, and yet, many teams we’ve talked to feel like they can’t devote significant energy to it because they’re still stuck solving the first two.
Moving teams through this maturity curve faster is one of the most valuable things the industry needs to accelerate the adoption and usefulness of LLMs, and it’s a core objective of Freeplay.
Shifting roles & tools
Skills are converging, roles are changing, and teams are stitching together a piecemeal set of tools to try to keep it all straight.
A year ago, most product development teams had never touched an LLM. Today, teams in every software category and vertical are being brought together to develop new product experiences using AI — many of whom have never worked with AI or machine learning at all in the past, much less LLMs.
From a tooling standpoint, we’ve seen a lot of fragmentation. A framework for this, a proxy for that, some open source, some closed… There’s a lot to make sense of, and it can feel overwhelming. Especially at larger companies, leaders are wrestling with whether to let their teams pick whatever tools they want or to standardize on a common toolset they can trust.
Also, team composition and roles look different. Engineers are still driving integrations and implementation, but PMs and designers are often experimenting with (or even owning) prompts, which often evolve at a different speed than other software. Customer service teams and subject-matter experts might also have a ton of value to bring to QA given their understanding of customer expectations, and a critical role to play in data annotation to improve prompts & pipelines. We’ve found this especially true at companies operating in niche domains.
More than just a “tool,” these teams are in need of new patterns for how to work with LLMs, and new workflows for how to collaborate across job functions.
The best way to set up a team?
Lots of leaders are asking how to staff their teams. We’re observing two dominant models inside companies.
1. Driven entirely by developers. Some teams are treating working with LLMs as entirely the domain of software engineers (or software engineers + data scientists). The way they tend to build brings the benefits of traditional software engineering to building with LLMs, but it can also make the whole pipeline less approachable to others. This can make sense especially in organizations where LLMs are producing outputs that can be evaluated well by engineers or data scientists alone. A downside we hear from some leaders here is that they’re worried about the cost of highly-paid engineers doing prompt engineering and QA on LLMs.
2. Collaborations/shifting roles. On other teams, we see developers focusing on what they're best at, but also working closely with PMs, domain experts, and other “non-engineering” roles to collaborate on prompt engineering, evaluations, data annotation, etc. This seems especially common when companies are building niche products, where other roles might have better perspective than most engineers on what exactly makes for a quality LLM response. These teams might create more leverage from empowering other roles outside engineering, but they also face a learning curve.
At Freeplay, we’ve chosen to focus first on the needs of the second group – providing ways to encourage collaboration and the adoption of new workflows in the Freeplay product.
In either model, we also see an eagerness to bring in at least some ML talent if teams don’t already have it, so that while other people on the team inevitably learn new techniques, there’s in-house subject matter expertise to help spot issues and make recommendations from experience. The good news with off-the-shelf LLMs and emerging tooling is that you don't have to stack your team with experienced ML engineers & researchers to build a valuable product. For the leaders we've talked to who have it, though, there's been great value in having in-house experts to lean on while others ramp up and learn.
—————
The products we’re building at Freeplay are intended to directly address these challenges, and the ones around the corner. We’ll keep sharing what we see, and invite anyone working through these - or other areas of friction - to get in touch.