
Text classification with GPT-4 | Philipp Gross

Text classification is one of the oldest applications of Natural Language Processing. With the advent of large language models like GPT-4, you can get powerful classifiers with little effort.

Setup


Install dependencies:

!pip install openai python-dotenv

Setup helper function llm():

import os
import openai
from textwrap import dedent
import dotenv

dotenv.load_dotenv("../.env", override=True)

openai.api_key = os.environ["OPENAI_API_KEY"]


def llm(prompt: str) -> str:
    """Send a single prompt to GPT-4 and return the model's reply as plain text."""
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": dedent(prompt).strip()},
        ],
        temperature=0.1,
    )
    return response["choices"][0]["message"]["content"]

Text classification with prompting

When you want to classify a text, the least you need is a list of labels (Zero-Shot Prompting):

print(llm("""
    Categorize the text (elephant or whale).
    Text: A grey heavy animal with a trunk
    Label:
    """))
Elephant
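
If you plan to classify more than one text this way, the zero-shot prompt is easy to wrap in a small helper. Below is a minimal sketch, where classify() is a hypothetical convenience wrapper around the llm() function from the setup, not part of any library API:

from typing import Sequence


def classify(text: str, labels: Sequence[str]) -> str:
    """Zero-shot classification: ask the model to pick one of the given labels."""
    prompt = f"""
        Categorize the text ({' or '.join(labels)}).
        Text: {text}
        Label:
        """
    return llm(prompt).strip()


print(classify("A grey heavy animal with a trunk", ["elephant", "whale"]))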

Such a classification task would benefit from an annotation guideline that explains how to resolve ambiguities and collects edge-case examples. In fact, those examples can be turned into an improved prompt, which leads to the technique of Few-Shot Prompting:

print(llm("""
    Categorize the text (elephant or whale).
    Text: The world's largest land animal
    Label: Elephant

    Text: The world's largest ocean animal
    Label: Whale

    Text: Gigantic, social, intelligent mammals with large size, complex social behavior, long lifespan, remarkable maternal care, and a gestation period of up to 18 months.
    Label:
    """))
Elephant
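
With more than a couple of examples it becomes convenient to assemble the few-shot prompt programmatically from the labeled examples. A minimal sketch, using a hypothetical few_shot_prompt() helper on top of llm():

def few_shot_prompt(labels: list[str], examples: list[tuple[str, str]], text: str) -> str:
    """Build a few-shot prompt from a label set and (text, label) example pairs."""
    lines = [f"Categorize the text ({' or '.join(labels)})."]
    for example_text, label in examples:
        lines += [f"Text: {example_text}", f"Label: {label}", ""]
    lines += [f"Text: {text}", "Label:"]
    return "\n".join(lines)


examples = [
    ("The world's largest land animal", "Elephant"),
    ("The world's largest ocean animal", "Whale"),
]
print(llm(few_shot_prompt(["elephant", "whale"], examples, "A grey heavy animal with a trunk")))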

Clearly, there are limits to these approaches: larger label sets or longer example texts bloat the prompt, eating away at the precious but limited context window. Then you need to get more serious about labeling training examples, and either train a simple model like logistic regression (sketched below) or finetune a large language model of your choice.
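
For comparison, such a classical baseline needs labeled examples but no prompt at all. A minimal sketch with scikit-learn (installed separately), using a hypothetical toy training set just for illustration:

# Classical baseline: TF-IDF features + logistic regression on hypothetical labeled data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = ["A grey heavy animal with a trunk", "The world's largest ocean animal"]
train_labels = ["elephant", "whale"]

baseline = make_pipeline(TfidfVectorizer(), LogisticRegression())
baseline.fit(train_texts, train_labels)
print(baseline.predict(["A heavy animal with a trunk"]))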

But if you are interested in a well-known classification standard that was published well before the training cutoff date of the language model, there is a tiny escape hatch: you can even get away with a shorter prompt by just referring to the standard by its name!

print(llm("""
    Text: Gigantic, social, intelligent mammals with large size, complex social behavior, long lifespan, remarkable maternal care, and a gestation period of up to 12 months.
    Task: Categorize the life form mentioned in the text above with respect to the Linnaean taxonomy and return a JSON list of match objects for the top 3 matches, having the properties name (string), confidence (float), and taxonomy, which is an object with the string or null valued properties kingdom, phylum, class, order, family, genus and species. You should use null values in the taxonomy if there is no information in the given text.
    """))
[
  {
    "name": "Elephant",
    "confidence": 0.9,
    "taxonomy": {
      "kingdom": "Animalia",
      "phylum": "Chordata",
      "class": "Mammalia",
      "order": "Proboscidea",
      "family": "Elephantidae",
      "genus": null,
      "species": null
    }
  },
  {
    "name": "Whale",
    "confidence": 0.8,
    "taxonomy": {
      "kingdom": "Animalia",
      "phylum": "Chordata",
      "class": "Mammalia",
      "order": "Cetacea",
      "family": null,
      "genus": null,
      "species": null
    }
  },
  {
    "name": "Dolphin",
    "confidence": 0.7,
    "taxonomy": {
      "kingdom": "Animalia",
      "phylum": "Chordata",
      "class": "Mammalia",
      "order": "Cetacea",
      "family": "Delphinidae",
      "genus": null,
      "species": null
    }
  }
]
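
Since the model returns plain text, the JSON still has to be parsed (and ideally validated) before it can be used downstream. A minimal sketch, assuming the model output from above is stored in a variable response instead of being printed directly:

import json

# `response` is assumed to hold the raw model output from the taxonomy prompt above.
matches = json.loads(response)  # raises json.JSONDecodeError if the model returned malformed JSON
best = max(matches, key=lambda m: m["confidence"])
print(best["name"], best["taxonomy"]["order"])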

Conclusion

The results are not perfect, but given the difficult question it is astonishing how good the answers are. This shows that it is really easy these days to prototype text classifiers with varying degrees of effort spent on data labeling. Of course, before releasing such a system to the public, it is mandatory to test it extensively, depending on the potential impact of wrong predictions.

Large language models are amazing machines. I can't imagine the agony of labeling a training set to get a classical model like logistic regression into the same ballpark.