Optimizing for Developer Control: Designing An SDK To Stay Out Of The Way

Apr 11, 2024

In the spectrum of tools to help developers build with LLMs, there are two primary approaches to delivering abstractions: route traffic through a proxy service, or simplify complexity via an SDK. Both have benefits on the surface but also introduce tradeoffs. At the end of the day, developers need tools that solve their problems and limit downside risk. The challenge is finding the right balance.

Threading that needle well means giving developers tools that simplify building, avoid risk from production downtime or code fragility, and ultimately let people build whatever they want and need.

At Freeplay we’ve prioritized building for established teams operating at scale in growth stage and enterprise companies, who generally know what they want to do and have strong opinions about how their code should work. That’s led us to design a developer experience that prioritizes SDKs with a set of simple, atomic functions that give developers the maximum amount of control and freedom to build however they want — and without creating risk on the production hot path.

Want the freedom to call any model? Want to write your own business logic and code for interacting with LLMs? Or, prefer to use a framework like LangChain or LlamaIndex? Like using SDKs, or prefer DIY in your language of choice (Go? Rust? Elixir?) using an API? Our approach gives developers full freedom to choose.

This post walks through our philosophy and some examples to illustrate.

Simplicity vs. control

One-line code integrations make for fun Twitter demos and short product highlight reels that generate a lot of buzz. Almost everyone loves the magic of trying a new system with just a line or two of code.

At the same time, when it comes to complex interactions like LLM observability and evals, they often come with tradeoffs:

  • Third-party proxies are subject to downtime and introduce latency, plus they only provide whatever functionality that third party supports

  • Monkey-patching popular SDKs like OpenAI’s and Anthropic’s is fragile and prone to breakage — and limiting when it comes to adopting new or less-popular LLMs

  • The more business logic that’s handled by an SDK or framework, the less freedom you have to solve your own unique problems

Simple sounds great at first, and it’s what we prioritized with the first version of our SDK. But “simple” on the surface also means limits on what you can do (or complex workarounds to get around them).

The larger the customers we work with, the less likely they are to want to deal with frameworks and third-party proxies. Growth-stage and enterprise customers consistently express a desire for more direct control over their LLM interactions to support more complex usage patterns.

As a result, we started running our developer experience design decisions through what we call the "Chief Architect Test."

As one Staff Engineer at a growth-stage tech company told us:

"The LLM interaction is the most core part of our product experience. The last thing we want is an early stage startup to be in the hot path."

Instead, they framed their goals in two ways:

  1. Give me tools I can use without introducing any risk in production — if it’s down, it should fail gracefully and not interrupt the customer experience (see the sketch after this list).

  2. As the generative AI ecosystem evolves and new functionality constantly emerges, give me the freedom to adopt any new models, parameters, tools, etc. without waiting for SDKs to catch up.
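
A minimal sketch of that first goal, building on the prompt fetch call shown later in this post: if Freeplay is unreachable, fall back to a prompt template bundled with the application rather than failing the request. The fallback messages and helper name here are illustrative, not part of our SDK.

import logging
import os

# Hypothetical fallback prompt checked into the application's own codebase
FALLBACK_MESSAGES = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "{keyA}"},
]

def get_messages_with_fallback(fp_client, prompt_vars):
    try:
        formatted_prompt = fp_client.prompts.get_formatted(
            project_id=os.getenv("FREEPLAY_PROJECT_ID"),
            template_name="template_name",
            environment="latest",
            variables=prompt_vars,
        )
        return formatted_prompt.messages
    except Exception:
        # Fail gracefully: log the problem and keep serving the customer
        logging.exception("Freeplay unavailable; using bundled fallback prompt")
        return [
            {"role": m["role"], "content": m["content"].format(**prompt_vars)}
            for m in FALLBACK_MESSAGES
        ]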

At the end of the day, they want foundational parts of their development workflow to be easy — things like prompt management, observability, testing & evals — while maintaining as much control as possible.

Our philosophy

At Freeplay, our team has spent a significant portion of our careers building and designing developer APIs and SDKs. One of our guiding principles has always been to "Make the easy things easy and the hard things possible."

As we thought about the feedback from our customers, we found ourselves inspired by the philosophy behind the early UNIX tools like grep, sed, and awk. These tools each focus on doing one thing really well (text searching, text editing, and text processing, respectively). By providing a focused set of capabilities, they can be composed together in powerful ways to handle a wide variety of tasks. This composability has allowed them to remain relevant and useful for decades in a changing technology landscape.

Freeplay fundamentally does two things with production LLM traffic:

  1. We make it easy to manage & deploy prompt templates and model config

  2. We let customers record LLM completions for observability (and then run evals on them, curate them into datasets, etc.)

The early version of our SDK encapsulated both of these together into one call, along with the call to an LLM. This made things easy if you were calling popular providers like OpenAI or Anthropic. But it made other things hard: calling self-hosted models, managing complex fallback logic, etc. Plus, it introduced a dependency on each of those providers’ SDKs.

We realized we needed to rethink our approach, and the result is our current SDK, which decomposes each of those steps into its own piece:

  1. Fetch prompt templates & model config from Freeplay: Centralizing prompt experimentation in Freeplay allows the whole team to version control, test, and collaborate on prompts. People can launch new prompt or model changes from the server, and our SDKs make it easy to select the right version using an environment variable — just like a feature flag (see the sketch just after this list). (Plus, you can use our CLI for prompt bundling to check prompt templates out into source code to minimize latency and further increase control in production.)

  2. Making the LLM call: This core interaction is managed directly by our customers’ code; Freeplay is not in the loop at all. Our SDKs help translate prompt template formats and model config, but then developers are directly in control to manage LLM calls however they want — switching providers, managing fallbacks, etc.

  3. Recording the response back to Freeplay: Tracking the inputs, outputs, and metadata around each LLM interaction lays the foundation for testing and evaluation. Our customers record LLM results back to Freeplay (including to their own privately-hosted database when needed). These calls can be made fully async to further limit latency and risk (see the sketch after the example below).
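
As a concrete sketch of step 1, the environment a prompt is fetched from can be driven by configuration, just like a feature flag. The FREEPLAY_PROMPT_ENV variable name and the environment values below are only an illustration:

import os

# e.g. "latest" in staging, "prod" in production, switched without a code change
prompt_environment = os.getenv("FREEPLAY_PROMPT_ENV", "prod")

formatted_prompt = fp_client.prompts.get_formatted(
    project_id=os.getenv("FREEPLAY_PROJECT_ID"),
    template_name="template_name",
    environment=prompt_environment,
    variables={"keyA": "valueA"}
)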

By decomposing our SDK into these atomic steps, we give teams the flexibility they need while still providing powerful tools to manage the end-to-end lifecycle. And developers can do all the same things through our API in any language they choose.

An example

Here’s what the code might look like using our Python SDK (we also offer SDKs for Node and JVM languages; see the full SDK docs here):

import os
import time

from freeplay import Freeplay, RecordPayload  # import paths may vary by SDK version
from openai import OpenAI

# create a Freeplay client object
fp_client = Freeplay(
    freeplay_api_key=os.getenv("FREEPLAY_API_KEY"),
    api_base="https://acme.freeplay.ai/api"
)

# create an OpenAI client for the LLM call below
openaiClient = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

## PROMPT FETCH ##
# set the prompt variables
prompt_vars = {"keyA": "valueA"}
# get a formatted prompt
formatted_prompt = fp_client.prompts.get_formatted(
    project_id=os.getenv("FREEPLAY_PROJECT_ID"),
    template_name="template_name",
    environment="latest",
    variables=prompt_vars
)

## LLM CALL ##
# Make an LLM call to your provider of choice
start = time.time()  # capture timing around the call if you want to track latency
chat_response = openaiClient.chat.completions.create(
    model=formatted_prompt.prompt_info.model,
    messages=formatted_prompt.messages,
    **formatted_prompt.prompt_info.model_parameters
)
end = time.time()

# add the response to your message set
all_messages = formatted_prompt.all_messages(
    {'role': chat_response.choices[0].message.role, 
     'content': chat_response.choices[0].message.content}
)

## RECORD ##
# create a session
session = fp_client.sessions.create()

# build the record payload
payload = RecordPayload(
    all_messages=all_messages,
    inputs=prompt_vars,
    session_info=session, 
    prompt_info=formatted_prompt.prompt_info
)
# record the LLM interaction
fp_client.recordings.create(payload)

This decomposed approach is more verbose, but it puts teams in full control of their LLM interactions while still getting the benefits of Freeplay's prompt management, observability, and testing workflows.
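
To keep recording off the hot path entirely, the record call from the example above can also be pushed onto a background thread so it never blocks the response to the user. Here's one minimal sketch using Python's standard library; it isn't a prescribed pattern, just one way to make the call async:

from concurrent.futures import ThreadPoolExecutor
import logging

# A small shared pool; recording failures are logged, never raised to the caller
record_executor = ThreadPoolExecutor(max_workers=2)

def record_async(fp_client, payload):
    def _record():
        try:
            fp_client.recordings.create(payload)
        except Exception:
            logging.exception("Failed to record completion to Freeplay")
    record_executor.submit(_record)

# instead of calling fp_client.recordings.create(payload) inline:
record_async(fp_client, payload)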

We continue to get great feedback from enterprise developers and other serious production software teams that this is the approach they’re looking for — giving them full control over the critical LLM interaction, while providing a powerful set of tools to manage the end-to-end experimentation and development lifecycle.

If you're building a product with LLMs, we'd love to show you how Freeplay can help you iterate faster and ship with confidence without getting in your way. Please get in touch here to learn more.
