The rise of large language models (LLMs) like GPT-4 has been one of the biggest product development breakthroughs in recent memory. As software product teams look to leverage LLMs to create value for their customers, many are wondering how to build the right skills and processes within their organizations. Most don't have significant prior experience with AI.
To provide some practical perspectives, we spoke with Nick Nieman, VP of Product at Sprout Social (NASDAQ: SPT), about his teams' journey ramping up LLM use in production. Nick shares insights into why & how they got started, what his teams have learned, and how he envisions needs evolving across product, data science, and engineering functions.
A key bit of advice: If you haven't already, get started and learn by doing.
Read on for insights from a product leader in the trenches.
———————————————————————————
Ian Cairns: Early this year you asked several of your product teams to get started building with LLMs, and I don’t think you’re alone among leaders who have done that this year. What was the motivation initially, and how has that played out so far?
Nick Nieman: We've invested in AI/ML at Sprout Social for years and have been keeping an eye on generative AI progress. As the development of large language models became highly visible late last year—and progress accelerated—it caused another round of internal questions about the fit and readiness of the technology for primetime.
Sprout Social helps brands create marketing content, engage with their customers, and analyze trends on social media, and at that point we saw clearly that the trajectory of LLM improvement was incredible and would add value for our customers on a long enough time horizon. Still, there was no shortage of questions about how to build around LLMs, how we could take a unique approach in our industry, what the risks were, etc. But we ultimately didn't feel capable of answering those questions from the sidelines and instead believed we needed to learn through doing.
With that perspective, we set out to have a few teams across different Sprout Social product areas experiment with LLMs. We embraced the goal of building something potentially shippable in a budgeted time period rather than working backward from an ideal scope. This really pushed us directly into hands-on learning—literally, lots of trying ideas in OpenAI's sandbox or ChatGPT—to validate ideas rapidly. It helped demystify for everyone what LLMs are and how accessible they are to the typical product team, whereas they might have assumed it was out of their reach without very specialized AI/ML skills or help.
Some of the resulting product iterations we didn't ship but still learned a lot from, and many did make it into our customers' hands. Examples of releases have included suggested social post captions, rewriting of social messages for tone, and suggested keywords for social listening queries.
You can see a little more on these features here or check them out directly by signing up for a Sprout Social trial.
Ian: If one goal initially was learning about the tech, what do you feel like the main takeaways have been so far for your PMs? What’s been surprisingly easy or positive about working with LLMs, and what’s been challenging?
Nick: The flexibility and ease of starting with LLMs relative to other AI/ML models is a big positive. The initial cost is zero. In evangelizing experimentation with LLMs inside the company, I continually reiterate that using a model like OpenAI’s GPT-4 is a simple API request containing a text field. You can rapidly answer, "Is an LLM broadly capable of this thing?" It's fun, and through hands-on play our ideas about what the technology can do (and can't) keep expanding.
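The "simple API request containing a text field" Nick describes can be sketched in a few lines. The payload shape below follows OpenAI's chat completions API; the model name and prompt text are illustrative.

```python
import json

def build_chat_request(prompt: str, model: str = "gpt-4") -> dict:
    """Build the JSON body for a chat completions call: essentially one text field."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

body = build_chat_request("Suggest three captions for a post about our spring sale.")
print(json.dumps(body, indent=2))
# Sending it is a single POST to https://api.openai.com/v1/chat/completions
# with an "Authorization: Bearer <API key>" header.
```

That the whole interface reduces to a short JSON document is what makes the "is an LLM broadly capable of this?" question so cheap to answer.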
It gets harder to answer "How can I get even better results?" and "How does this perform across a wide range of scenarios?" As easy as it is to experiment with LLMs and various prompts, confidently optimizing is challenging. Prompt engineering is such a novel domain, and we often make poorly educated guesses at what will improve results (much less why it works). That challenge is multiplied when you must test a prompt against various customer inputs. The models are stochastic, so tests produce different results with the same inputs. And then, how do you really assess the quality of a large set of outputs? Like other AI/ML tasks, it can be subjective, and even human reviewers won't always agree. Our toolchain needs to evolve to let us learn at a larger scale. In the interim, you understandably see a lot of product teams releasing their work in more guarded beta periods to mitigate risks.
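The scale problem Nick describes (each prompt variant, times many customer inputs, times repeated stochastic runs) can be sketched as a simple sampling harness. Everything here is illustrative: `call_llm` is a placeholder that just echoes its input, and in practice the collected samples would go to human reviewers or an automated rubric.

```python
from collections import defaultdict

def call_llm(prompt: str) -> str:
    # Placeholder for a real (stochastic) model call.
    return f"output for: {prompt}"

def sample_prompt_variants(variants, customer_inputs, runs_per_input=3):
    """Collect repeated samples of each prompt variant over each customer input."""
    samples = defaultdict(list)
    for name, template in variants.items():
        for text in customer_inputs:
            for _ in range(runs_per_input):
                samples[name].append(call_llm(template.format(input=text)))
    return samples

variants = {
    "v1": "Rewrite this social post in a friendly tone: {input}",
    "v2": "Make this social post warmer and more casual: {input}",
}
inputs = ["Our sale ends Friday.", "New feature: keyword listening."]
results = sample_prompt_variants(variants, inputs)
print(len(results["v1"]))  # 2 inputs x 3 runs = 6 samples to review per variant
```

Even this toy setup shows why output volume explodes: two variants over a realistic input set quickly produce more samples than a team can eyeball, which is exactly the tooling gap Nick points to.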
Ian: Everyone’s trying to figure out what roles & responsibilities look like when working with LLMs, since there’s a shift both in what’s needed (new processes & skills around prompt engineering, testing, etc.) and what’s possible (e.g. non-engineers can contribute in new ways, since prompts are English). I know you’ve mentioned at least a couple of cases where PMs have been involved in prompt engineering and testing different prompt versions. What’s the role your PMs have played in these projects so far, and how do you see that role evolving over time?
Nick: So far, we have mostly seen PMs take on the prompt engineering work for their team. That's happened organically, and teammates in design, engineering, and data science have also contributed. The fact that the medium for interacting with LLMs is plain language is such a unique dynamic within the development process. As a PM, I don't pretend I can write production-quality code, but I do write in English a lot. Having a chance to meaningfully, directly improve the product by iterating on an LLM prompt is fun and scratches a particular "maker" itch for me and probably other PMs.
In plain terms, our PMs typically touch the prompt at various points in the development process:
initial validation that an LLM can handle a task; usually this is happening in early product discovery
experimentation with major variations in prompts to select top options; usually happening early in product execution
decisions about what to track in terms of customer context, prompt inputs, prompt outputs, etc. for post-release learning; happening as early as possible in product execution
looking at prompt outputs and customer utilization post-release, selecting "winning" prompts, considering new prompt iterations to try
At least at Sprout Social, I suspect that PMs will be the main stewards of prompts for a while. I'm intrigued by the idea of having someone truly expert in prompt construction (i.e. a prompt engineer) to take things to another level, but that's a scarce skillset and practically I think it's something we'll need to build collectively in the organization.
Ian: From an organizational standpoint, you started off with each product team working independently to build with LLMs, and you mentioned recently that you’re ramping up investment from your central data science team to improve practices around LLMs. What are the goals there, and how do you see the organization developing with regard to skills & tools for building with LLMs?
Nick: Given some basic inputs on data governance, I think any given product team at Sprout Social is capable of solving customer problems using an LLM. But there are ways we can give them more leverage and allow them to solve even more complex problems, and we're leaning on our data science and infrastructure teams to help there.
Some of the things we're asking these more centralized teams to think about and work on include:
experimenting with open source and self-hosting of various foundational LLMs
fine-tuning LLMs to perform better at specific tasks in our domain
developing machine learning approaches around LLMs, e.g. automatically optimizing which prompts are used in which customer contexts based on product feedback loops
increased privacy measures around LLMs
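One of the items above, routing prompts based on product feedback loops, can be sketched as a simple bandit. This is a generic epsilon-greedy illustration, not Sprout Social's actual approach; the feedback signal is assumed to be binary (e.g. whether the customer kept a suggestion).

```python
import random

class PromptSelector:
    """Epsilon-greedy selection among prompt variants based on acceptance feedback."""

    def __init__(self, prompts, epsilon=0.1):
        self.prompts = prompts
        self.epsilon = epsilon                     # how often to explore at random
        self.counts = {p: 0 for p in prompts}      # times each prompt was tried
        self.wins = {p: 0 for p in prompts}        # times its output was accepted

    def choose(self):
        # Explore randomly sometimes (and always before any feedback exists);
        # otherwise exploit the prompt with the best acceptance rate so far.
        if random.random() < self.epsilon or not any(self.counts.values()):
            return random.choice(self.prompts)
        return max(self.prompts, key=lambda p: self.wins[p] / max(self.counts[p], 1))

    def record(self, prompt, accepted):
        self.counts[prompt] += 1
        self.wins[prompt] += int(accepted)

selector = PromptSelector(["friendly-tone prompt", "formal-tone prompt"])
chosen = selector.choose()
selector.record(chosen, accepted=True)
```

Over time the selector converges on the better-performing prompt per context while still occasionally re-testing the alternatives, which is one concrete shape a "machine learning approach around LLMs" could take.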
I think we’ll see this kind of pattern of individual product team empowerment and some measure of centralized AI/ML development continue for a while. I believe the endgame for a product like ours is that LLM capabilities are present in a very large percentage of our customer workflows and thus it’s impractical to try to fully centralize the product development to a small part of the organization. But we still need specialized skills to move the toolkit forward.
Ian: Every PM & Product leader right now is thinking about what advances in AI mean for them, and you’ve had the opportunity to learn across several teams and initiatives so far. Any other advice or suggestions you’d give to Product people who are ramping up with LLMs?
Nick: If you have an instinct that LLMs could be important to your customers but you haven’t gotten started yet, I want to relate again how valuable it’s been for us to just dive in head first and experiment.
I think there can be a tendency to mystify anything AI/ML as being out-of-reach for all but the most specialized people. But getting started with an LLM is something that’s within the reach of most product development teams, and curiosity and willingness to immerse yourself can outweigh historical knowledge in these moments of rapid development.
At Sprout, I know we still have so much to learn and we won’t predict LLM-oriented opportunities or practices perfectly, but our early experimentation has given us confidence we can build upon. We couldn’t have gotten there only waiting, watching, and talking about the potential.
———————————————————————————
Big thanks to Nick for sharing these perspectives! If you're curious to learn more about what Sprout Social is building for customers with AI, check out sproutsocial.com/ai.