Prompt engineering, testing & evaluation tools for product teams. Build better features with LLMs.
How do you discover the best prompts to power your product features? How can you trust a model will consistently do the right thing and delight your customers?
When software teams start building with large language models (LLMs), they quickly realize they need new tools to ship production-ready software. And they need more than just “developer tools,” since the whole team’s getting involved.
We’ve spent the past 6 months talking with dozens of leaders, from startups to Fortune 100 companies, about the challenges of building with LLMs. That group included CEOs & founders, CTOs & tech leads, product & design leaders, and security pros. A clear picture emerged for us.
Developers play a key role in integrating LLMs into existing software. But product managers, designers, and other domain experts are increasingly involved in prompt engineering and QA too. They crave the freedom to experiment, without distracting engineers — especially when code needs to run to generate realistic test results. And everyone involved is eager to find faster & less-manual ways to test, and to make sure their systems are improving (not regressing) as they iterate.
Enter Freeplay.
We’re building a lightweight, flexible platform that lets product managers, designers, domain experts, and developers collaboratively experiment, test and deploy LLM-powered features in their software.
With Freeplay, you get:
- Complete visibility into LLM responses across environments.
- An intuitive prompt management and version control system.
- Automated testing and evaluation tools.
- A range of enterprise-ready features.
- A simple integration with your existing codebase.
Here’s a snapshot of how it works so you know what’s coming:
Visibility into every LLM response
Freeplay logs complex chains or chat interactions together as "sessions." This way, you can view the actual customer experience, not just one-off prompts. And any session can be quickly labeled and saved for future use in testing (see below).
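To make the idea of a "session" concrete, here is a minimal sketch in Python. It is not Freeplay's SDK: the `Session` and `LoggedCall` classes and their fields are hypothetical names chosen for illustration. It only shows how several prompt/response pairs might be grouped under one session ID and labeled for later use in testing.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LoggedCall:
    """One prompt/response pair inside a longer chain or chat (hypothetical)."""
    prompt: str
    response: str


@dataclass
class Session:
    """Groups related LLM calls so they can be reviewed together (hypothetical)."""
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    calls: List[LoggedCall] = field(default_factory=list)
    label: Optional[str] = None

    def record(self, prompt: str, response: str) -> None:
        # Append the call in the order it happened.
        self.calls.append(LoggedCall(prompt, response))

    def transcript(self) -> str:
        # Render the whole interaction, not just a single prompt.
        lines = []
        for call in self.calls:
            lines.append(f"> {call.prompt}")
            lines.append(call.response)
        return "\n".join(lines)


session = Session()
session.record("Where is my order?", "Could you share your order number?")
session.record("It's 1234.", "Thanks, order 1234 ships tomorrow.")
session.label = "good-example"  # saved for future testing
print(session.transcript())
```

Keeping the calls in order under a single ID is what lets a reviewer read the interaction the way the customer experienced it.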
Prompt management & version control
Freeplay takes your prompts out of code and into our version control system. You choose which version of a prompt to run via our SDK. This means a PM can tweak a series of prompts and push them out to test in a notebook or staging environment — without disrupting prod or requiring a deploy.
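As a rough illustration of prompts living outside code, the sketch below keeps numbered prompt versions in a small in-memory registry and lets each environment point at a different version. Everything here is an assumption made for the example, including the `PromptRegistry` class and its `save`, `deploy`, and `get` methods. Freeplay's real SDK may look quite different.

```python
from typing import Dict, List, Tuple


class PromptRegistry:
    """Hypothetical in-memory store of prompt versions per environment."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}
        self._deployed: Dict[Tuple[str, str], int] = {}

    def save(self, name: str, template: str) -> int:
        # Store a new version and return its 1-based version number.
        versions = self._versions.setdefault(name, [])
        versions.append(template)
        return len(versions)

    def deploy(self, name: str, version: int, env: str) -> None:
        # Point an environment at a specific, already-saved version.
        if not 1 <= version <= len(self._versions.get(name, [])):
            raise ValueError(f"unknown version {version} for prompt {name!r}")
        self._deployed[(name, env)] = version

    def get(self, name: str, env: str) -> str:
        # Application code asks for "the prompt for this environment".
        version = self._deployed[(name, env)]
        return self._versions[name][version - 1]


registry = PromptRegistry()
v1 = registry.save("summarize", "Summarize this text: {text}")
registry.deploy("summarize", v1, "prod")

# A PM tweaks the prompt and tries it in staging only.
v2 = registry.save("summarize", "Summarize this text in two sentences: {text}")
registry.deploy("summarize", v2, "staging")

print(registry.get("summarize", "prod"))     # still the first version
print(registry.get("summarize", "staging"))  # the tweaked version
```

Because the application only asks for a prompt by name and environment, changing what staging runs does not touch prod and does not need a code deploy.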
Automated testing & evaluation
Once initial test results look positive, you’ll want more thorough testing to gain confidence that you’re ready for production. Freeplay lets you save test cases (including from observed customer sessions in production), then replay dozens or hundreds of examples whenever you change your prompts or code. We combine AI and human-in-the-loop workflows to make it faster to review results and get new versions live.
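The replay idea can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not how Freeplay works internally: `SavedCase`, `fake_model`, and `pass_rate` are hypothetical names, the model is a stub that upper-cases its prompt, and a simple substring check stands in for the AI and human review steps described above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SavedCase:
    """A saved example: template inputs plus a simple expectation (hypothetical)."""
    inputs: Dict[str, str]
    must_contain: str


def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call: it just returns the prompt in upper case.
    return prompt.upper()


def pass_rate(template: str, cases: List[SavedCase],
              call_model: Callable[[str], str]) -> float:
    # Replay every saved case against one prompt version and score it.
    if not cases:
        return 0.0
    passed = 0
    for case in cases:
        output = call_model(template.format(**case.inputs))
        if case.must_contain in output:
            passed += 1
    return passed / len(cases)


cases = [
    SavedCase({"city": "Denver"}, "DENVER"),
    SavedCase({"city": "Boulder"}, "BOULDER"),
]

old_version = "Describe the weather."
new_version = "Describe the weather in {city}."

print("old:", pass_rate(old_version, cases, fake_model))
print("new:", pass_rate(new_version, cases, fake_model))
```

Running the same saved cases against both versions gives a direct comparison, which is the signal a team needs to tell whether a change improved things or caused a regression.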
Enterprise-ready from day 1
We've built Freeplay with B2B software teams in mind. We offer an option for a dedicated environment to protect your data, access controls to manage your whole team, and a fast, flexible developer integration that works with any code your team writes, with minimal additional latency.
And this is just the start. We’re already considering related problems like: How to write a good prompt in the first place? How to pick the best model, or train your own that you can run for less money? How to enforce standards across teams?
It’s still early, but we’ve begun testing with private beta partners and we’re excited to share it with the world soon. We're also onboarding more beta partners – reach out if you’re interested.
You can sign up for our product waitlist here.
Want to keep in touch? You can stay connected with us on Twitter or LinkedIn for the latest news and updates.
Who we are
Our founding team has been building data & developer products together for more than a decade. Co-founders Ian Cairns & Eric Ryan first partnered at Gnip, then led product & engineering for Twitter’s developer platform after Twitter acquired Gnip. Our founding engineers Brian Newsom & Chris Hogue built and scaled products that powered the nearly $400M Twitter enterprise data licensing business and served hundreds of thousands of developers via public APIs. We’ve seen what it looks like for a whole ecosystem of product companies to develop around a new platform, and we’re bringing that experience to help product teams succeed with LLMs.