Mocking OpenAI for testing
Nico Lutz
Easily monkeypatch OpenAI calls for testing with pytest
Some of my recent applications use parts of the OpenAI API or its derivatives on
Azure. Test development should also be part of every software development
process. There are many reasons to mock calls to OpenAI: it drives down cost,
speeds up the test suite, and gives you control over your application during
testing. But be aware: since you are mocking the response from OpenAI, you are
only testing your application's logic and expecting OpenAI to behave
deterministically, which can pose problems whenever you change models or do
complex logic on top of the Large Language Model responses. In this short blog
post I show how you can mock calls to OpenAI. In this particular case the
client is AsyncAzureOpenAI, but the approach should also work for the standard
non-async client; just mock a different class.
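For the synchronous client that patch would presumably look like the sketch
below, assuming the openai v1.x module layout where Completions is the sync
counterpart of AsyncCompletions; the patching happens inside a pytest fixture
just like the async version that follows.

from openai.resources.chat.completions import Completions

def mock_create(*args, **kwargs):
    # Return canned ChatCompletion objects here instead of hitting the API.
    ...

# Inside a pytest fixture that receives `monkeypatch`:
monkeypatch.setattr(Completions, "create", mock_create)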
Without further ado, here is my mock that takes care of ChatCompletion
responses.
import pytest

from openai.resources.chat.completions import AsyncCompletions


@pytest.fixture
def mock_openai_chatcompletion(monkeypatch):
    mock_responses = []

    async def mock_acreate(*args, **kwargs):
        if mock_responses:
            return mock_responses.pop(0)
        else:
            raise ValueError("No mock response available for the call.")

    # Patch `AsyncCompletions.create`, which the client calls under the hood
    monkeypatch.setattr(AsyncCompletions, "create", mock_acreate)

    class MockChatCompletion:
        def __init__(self):
            self.responses = []

        @property
        def responses(self):
            return mock_responses

        @responses.setter
        def responses(self, value):
            # Extend the list of mock responses for multiple calls
            mock_responses.extend(value)

    return MockChatCompletion()
The whole thing is pretty basic: my whole application goes async, and therefore
I need to mock AsyncCompletions and monkeypatch the create function that I use
throughout my code. Here I decided to return an object that holds a list of
responses. My reasoning will become clear once I show how I use this mock inside
my tests. Let us for example imagine I have an app that in essence proxies calls
to OpenAI/Azure and applies some logic to its output. A test could look like
this:
import pytest

from openai.types.chat import ChatCompletion, ChatCompletionMessage
from openai.types.chat.chat_completion import Choice, CompletionUsage


@pytest.mark.asyncio
async def test_handlers_api_chat(client, mock_openai_chatcompletion):
    mock_openai_chatcompletion.responses = [
        ChatCompletion(
            # id, object and created are required by the pydantic model
            id="chatcmpl-123",
            object="chat.completion",
            created=1700000000,
            choices=[
                Choice(
                    index=0,
                    finish_reason="stop",
                    message=ChatCompletionMessage(
                        content="Whispers of the wind",
                        role="assistant",
                    ),
                )
            ],
            model="gpt-4o-2024-05-13",
            usage=CompletionUsage(
                completion_tokens=5,
                prompt_tokens=36,
                total_tokens=41,
                completion_tokens_details=None,
            ),
        ),
    ]

    res = client.post("api/chat", json={})

    assert res.status_code == 200
As you can see, I simply set my responses inside the test via the provided
pydantic OpenAI models. This gives me clear control over which response happens
in which test. Using a list also lets the mock be called multiple times while
keeping complete control over its output, for example if one of my endpoints
calls OpenAI more than once, as sketched below.
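Here is a minimal sketch of that multi-call case. The make_completion helper
and its contents are illustrative, not part of my app; it just keeps the test
short.

from openai.types.chat import ChatCompletion, ChatCompletionMessage
from openai.types.chat.chat_completion import Choice

def make_completion(content: str) -> ChatCompletion:
    # Hypothetical helper that builds a minimal ChatCompletion for the mock.
    return ChatCompletion(
        id="chatcmpl-mock",
        object="chat.completion",
        created=1700000000,
        model="gpt-4o-2024-05-13",
        choices=[
            Choice(
                index=0,
                finish_reason="stop",
                message=ChatCompletionMessage(content=content, role="assistant"),
            )
        ],
    )

# The fixture pops from the front of the list, so the first create() call
# returns the first entry, the second call the second entry, and so on.
mock_openai_chatcompletion.responses = [
    make_completion("first draft"),
    make_completion("revised answer"),
]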
By the way, the same approach also works for streams and embeddings (for
streams, see the sketch at the end of this post). For example, here is a mock
for embeddings coming from OpenAI.
import pytest

from openai.resources.embeddings import AsyncEmbeddings


@pytest.fixture
def mock_openai_embeddings(monkeypatch):
    mock_responses = []

    async def mock_acreate(*args, **kwargs):
        if mock_responses:
            return mock_responses.pop(0)
        else:
            raise ValueError("No mock response available for the call.")

    # Patch `AsyncEmbeddings.create`, which the client calls under the hood
    monkeypatch.setattr(AsyncEmbeddings, "create", mock_acreate)

    class MockEmbedding:
        def __init__(self):
            self.responses = []

        @property
        def responses(self):
            return mock_responses

        @responses.setter
        def responses(self, value):
            # Extend the list of mock responses for multiple calls
            mock_responses.extend(value)

    return MockEmbedding()
And its usage looks like this:
import pytest

from openai.types import CreateEmbeddingResponse, Embedding
from openai.types.create_embedding_response import Usage


@pytest.mark.asyncio
async def test_handlers_api_chat_200_case_6(
    client, mock_openai_chatcompletion, mock_openai_embeddings
):
    # One Chat Message
    mock_openai_embeddings.responses = [
        CreateEmbeddingResponse(
            data=[Embedding(embedding=[123.0, 123.0], index=0, object="embedding")],
            model="text-embedding-ada-002",
            object="list",
            usage=Usage(prompt_tokens=11, total_tokens=11),
        )
    ]

    # ... do stuff.
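Finally, the promised stream sketch. For streams, the mocked create has to
return an async iterator of ChatCompletionChunk objects instead of a single
completion. This is a minimal sketch following the same pattern as above; the
fixture name, the make_chunk helper, and the chunk contents are my own
illustrations, not part of the app.

import pytest

from openai.resources.chat.completions import AsyncCompletions
from openai.types.chat import ChatCompletionChunk
from openai.types.chat.chat_completion_chunk import Choice, ChoiceDelta


@pytest.fixture
def mock_openai_chat_stream(monkeypatch):
    # Each entry is a list of chunks to be streamed for one create() call.
    mock_streams = []

    async def mock_acreate(*args, **kwargs):
        if not mock_streams:
            raise ValueError("No mock stream available for the call.")
        chunks = mock_streams.pop(0)

        async def stream():
            for chunk in chunks:
                yield chunk

        return stream()

    monkeypatch.setattr(AsyncCompletions, "create", mock_acreate)
    return mock_streams


def make_chunk(content: str) -> ChatCompletionChunk:
    # Illustrative helper that builds a minimal chunk.
    return ChatCompletionChunk(
        id="chatcmpl-mock",
        object="chat.completion.chunk",
        created=1700000000,
        model="gpt-4o-2024-05-13",
        choices=[
            Choice(index=0, delta=ChoiceDelta(content=content), finish_reason=None)
        ],
    )

Application code that awaits the create call and then does an async for over
the result consumes this generator just like the real stream.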