Mocking OpenAI for testing

Nico Lutz

Easily monkeypatch OpenAI calls for testing with pytest

Some of my recent applications use parts of the OpenAI API or its Azure-hosted counterpart. Testing should also be part of every software development process, and there are many reasons to mock calls to OpenAI: it cuts costs, speeds up the test suite, and gives you control over your application during testing. But be aware that since you are mocking the response from OpenAI, you are only testing your application's logic and expecting OpenAI to behave deterministically, which can become a problem whenever you change models or build complex logic on top of the Large Language Model responses. In this short blog post I show how you can mock calls to OpenAI. In this particular case the client is AsyncAzureOpenAI, but the approach should also work for the standard non-async client; you just mock a different class.
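
For context, here is a minimal sketch of the kind of application code this post assumes. The endpoint, key and deployment name are placeholders, not my real configuration; the only thing that matters is that the chat call goes through client.chat.completions.create, which is exactly what the fixture below intercepts.

from openai import AsyncAzureOpenAI

# Placeholder configuration -- substitute your own endpoint, key and deployment.
client = AsyncAzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",
    api_key="...",
    api_version="2024-02-01",
)

async def summarize(text: str) -> str:
    # This is the call path the pytest fixture below monkeypatches.
    response = await client.chat.completions.create(
        model="my-gpt-4o-deployment",  # on Azure this is the deployment name
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return response.choices[0].message.content or ""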

Without further ado, here is my mock that takes care of ChatCompletions responses.

import pytest

from openai.resources.chat.completions import AsyncCompletions

@pytest.fixture
def mock_openai_chatcompletion(monkeypatch):

    mock_responses = []

    async def mock_acreate(*args, **kwargs):
        if mock_responses:
            return mock_responses.pop(0)
        else:
            raise ValueError("No mock response available for the call.")

    # Patch `AsyncCompletions.create`, which `client.chat.completions.create` calls under the hood
    monkeypatch.setattr(AsyncCompletions, "create", mock_acreate)

    class MockChatCompletion:
        def __init__(self):
            self.responses = []

        @property
        def responses(self):
            return mock_responses

        @responses.setter
        def responses(self, value):
            # Extend the list of mock responses for multiple calls
            mock_responses.extend(value)

    return MockChatCompletion()

The whole thing is pretty basic: my whole application is async, so I mock AsyncCompletions and monkeypatch the create method that I use throughout my code. Here I decided to return an object that holds a list of responses; my reasoning will become clear once I show how I use this mock inside my tests. Imagine, for example, an app that in essence proxies calls to OpenAI/Azure and applies some logic to the output. A test could look like this:

import pytest

from openai.types.chat import ChatCompletion, ChatCompletionMessage
from openai.types.chat.chat_completion import Choice, CompletionUsage

@pytest.mark.asyncio
async def test_handlers_api_chat(client, mock_openai_chatcompletion):
    mock_openai_chatcompletion.responses = [
        ChatCompletion(
            choices=[
                Choice(
                    index=0,
                    finish_reason="stop",
                    message=ChatCompletionMessage(
                        content="Whispers of the wind",
                        role="assistant",
                    ),
                )
            ],
            model="gpt-4o-2024-05-13",
            usage=CompletionUsage(
                completion_tokens=5,
                prompt_tokens=36,
                total_tokens=41,
                completion_tokens_details=None,
            ),
        ),
    ]

    res = client.post("api/chat", json={})
    assert res.status_code == 200

As you can see, I simply set my responses inside the test via the pydantic models that OpenAI provides. This gives me clear control over which response is returned in which test. Using a list also lets the mock be called multiple times with full control over its output, for example when one of my endpoints calls OpenAI more than once.
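
To sketch what that looks like, here is a hypothetical test for an endpoint that makes two chat calls per request. The helper function and all field values are placeholders of mine, only there to keep the example short.

import pytest
from openai.types.chat import ChatCompletion, ChatCompletionMessage
from openai.types.chat.chat_completion import Choice

def make_completion(content: str) -> ChatCompletion:
    # Helper to build a minimal valid ChatCompletion; id, created and object are arbitrary.
    return ChatCompletion(
        id="chatcmpl-mock",
        object="chat.completion",
        created=1700000000,
        model="gpt-4o-2024-05-13",
        choices=[
            Choice(
                index=0,
                finish_reason="stop",
                message=ChatCompletionMessage(content=content, role="assistant"),
            )
        ],
    )

@pytest.mark.asyncio
async def test_endpoint_calling_openai_twice(client, mock_openai_chatcompletion):
    # The first queued response answers the first create() call,
    # the second one answers the next call.
    mock_openai_chatcompletion.responses = [
        make_completion("First draft"),
        make_completion("Polished answer"),
    ]
    res = client.post("api/chat", json={})
    assert res.status_code == 200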

By the way, the same approach also works for streams and embeddings. For example, here is a mock for embeddings coming from OpenAI.

from openai.resources.embeddings import AsyncEmbeddings

@pytest.fixture
def mock_openai_embeddings(monkeypatch):
    mock_responses = []

    async def mock_acreate(*args, **kwargs):
        if mock_responses:
            return mock_responses.pop(0)
        else:
            raise ValueError("No mock response available for the call.")

    # Patch `AsyncEmbeddings.create`, which `client.embeddings.create` calls under the hood
    monkeypatch.setattr(AsyncEmbeddings, "create", mock_acreate)

    class MockEmbedding:
        def __init__(self):
            self.responses = []

        @property
        def responses(self):
            return mock_responses

        @responses.setter
        def responses(self, value):
            # Extend the list of mock responses for multiple calls
            mock_responses.extend(value)

    return MockEmbedding()

and its usage like so:

from openai.types import CreateEmbeddingResponse, Embedding
from openai.types.create_embedding_response import Usage

@pytest.mark.asyncio
async def test_handlers_api_chat_200_case_6(
    client, mock_openai_chatcompletion, mock_openai_embeddings
):
    # One Chat Message
    mock_openai_embeddings.responses = [
        CreateEmbeddingResponse(
            data=[Embedding(embedding=[123., 123.], index=0, object="embedding")],
            model="text-embedding-ada-002",
            object="list",
            usage=Usage(prompt_tokens=11, total_tokens=11),
        )
    ]

    ... do stuff.
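
For the stream case I will only give a sketch: assuming the application does nothing with the real AsyncStream other than iterate over it with async for, the patched create can simply return an async generator that yields ChatCompletionChunk objects. The fixture name and all chunk values below are placeholders, not part of the fixtures above.

from openai.resources.chat.completions import AsyncCompletions
from openai.types.chat import ChatCompletionChunk
from openai.types.chat.chat_completion_chunk import Choice as ChunkChoice, ChoiceDelta

@pytest.fixture
def mock_openai_chat_stream(monkeypatch):
    # Chunks queued by the test, yielded one by one when the stream is consumed.
    mock_chunks = []

    async def mock_acreate(*args, **kwargs):
        async def stream():
            for chunk in mock_chunks:
                yield chunk
        # An async generator supports `async for`, which is all the
        # application does with the real AsyncStream.
        return stream()

    monkeypatch.setattr(AsyncCompletions, "create", mock_acreate)
    return mock_chunks

Inside a test you can then queue chunks on the returned list, for example:

    mock_openai_chat_stream.append(
        ChatCompletionChunk(
            id="chatcmpl-mock",
            object="chat.completion.chunk",
            created=1700000000,
            model="gpt-4o-2024-05-13",
            choices=[
                ChunkChoice(index=0, delta=ChoiceDelta(content="Whispers "), finish_reason=None)
            ],
        )
    )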