I made an AI chatbot answering questions for employees at our company

2023-10-21 · 9 minute read · last updated 2023-11-22

In this article, I'll explain how I used OpenAI's embeddings and completions APIs to implement an AI chatbot that answers questions for employees at our company. The code examples are written in TypeScript, but the same principles apply to any programming language.

Short intro about the problem we're trying to solve here

I work at an IT consulting company called Fink AS. We have around 60 employees and a fairly large internal document we call The Handbook, which contains a lot of useful information about employee benefits, salary, how to book travel, how to use our internal systems, and so on. As the company grows, this document grows with it, and finding the right information gets increasingly difficult.

Maybe a chatbot can make this information more available for our employees?

Let's try using ChatGPT!

I guess the rudimentary solution would be to simply paste the contents of our handbook into a chat with ChatGPT from OpenAI, followed by a question, like so:

Given the following internal document in our company:

"""
<PASTE_HANDBOOK_HERE>
"""

Answer the following question from an employee: "How do I book travel?"

This, however, did not produce satisfactory results. When we asked a question about something in the first chapters of the handbook, the AI simply did not answer correctly.

The AI can only read a limited amount of text before answering. This is more precisely called the "context size" of the language model.

The context size for GPT-3.5 is 4 096 tokens, which is far less than the handbook at well above 30 000 tokens. The newer GPT-4 model is available with a context size of 32 768 tokens, which would be enough today, but not in the future.
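To get a feel for these numbers, here is a rough sketch for checking whether a text fits in a given context size. It assumes the common rule of thumb of roughly 4 characters per token for English text, which is only an approximation and not a real tokenizer. The names `estimateTokens` and `fitsInContext`, and the default of 500 tokens reserved for the answer, are made up for this example.

```typescript
// Rough heuristic (an assumption, not an exact tokenizer):
// about 4 characters per token for English text.
const APPROX_CHARS_PER_TOKEN = 4;

// Estimates how many tokens a piece of text will use.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / APPROX_CHARS_PER_TOKEN);
}

// Checks whether the text, plus some room reserved for the answer,
// fits within a model's context size (given in tokens).
function fitsInContext(
  text: string,
  contextSize: number,
  reservedForAnswer = 500,
): boolean {
  return estimateTokens(text) + reservedForAnswer <= contextSize;
}
```

For exact counts you would need a tokenizer that matches the model, but an estimate like this is enough to see that a whole handbook will not fit.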

It seems like we have to limit the amount of text the AI has to read before answering. To put it simply, we need to implement some kind of search, and provide the search results to the AI.

Replacing keyword-based search with AI-enabled Semantic Search

If you've implemented a search engine before, you know that there are many ways to do this. The most common way is to implement a keyword-based search where you simply look for the exact words from the query in the database. If you want to get fancy, you could also weight the words. For instance, you could make prepositions less important than names. From there, you could continue making the search engine more and more advanced, but it would still be based on keyword matching.

With the advent of AI, we can now implement a search engine that is not based on keywords, but on semantics. This means that we can search for the meaning of the query, instead of the exact words. This is called semantic search.

To be precise, the Large Language Model (LLM) can read some text and produce a vector representation of the meaning of the text. This vector is called an embedding.

We can then use this embedding to compare the meaning of the query with the meaning of the text in our database. The result is a list of documents sorted by how similar they are to the query.

(Figures borrowed from OpenAI docs)
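To make "comparing the meaning" a bit more concrete, here is a minimal sketch of ranking texts by cosine similarity, a commonly used measure for comparing embeddings. In practice a vector database does this for you, so this is only for illustration. The function names and the `{ text, embedding }` shape are assumptions made for this example.

```typescript
// Cosine similarity: 1 means same direction, 0 means unrelated,
// -1 means opposite direction.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  // A zero vector has no direction, so we treat it as "not similar"
  return denominator === 0 ? 0 : dot / denominator;
}

// Sorts documents by how similar their embedding is to the query embedding,
// most similar first.
function rankBySimilarity(
  queryEmbedding: number[],
  docs: { text: string; embedding: number[] }[],
): { text: string; score: number }[] {
  return docs
    .map((doc) => ({
      text: doc.text,
      score: cosineSimilarity(queryEmbedding, doc.embedding),
    }))
    .sort((x, y) => y.score - x.score);
}
```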

Let's implement this semantic search!

Here is a high-level overview of what we're going to build.

The process goes like this:

  1. An employee writes a query to the AI, and we send it to OpenAI to receive an embedding vector for it.
  2. We then query our vector database (for example Qdrant) to find the pieces of text most similar to this query. This is the semantic search part.
  3. We then send the search results to OpenAI's completions API to produce an answer.

This implies that we've split our internal documentation into smaller chunks and created embeddings for these in our database. Creating an embedding is somewhat expensive, so I recommend writing a script for the initial creation of the embeddings, and then only updating the embeddings when the chunks change.
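One way to find out which chunks have changed is to keep a hash of each chunk next to its embedding, and only create new embeddings for chunks whose hash is not stored yet. This is only a sketch under that assumption: `hashChunk`, `findChunksToEmbed` and the `storedHashes` set are hypothetical names, and where you keep the hashes is up to you.

```typescript
import { createHash } from 'node:crypto';

// Produces a stable fingerprint of a chunk's content.
function hashChunk(chunk: string): string {
  return createHash('sha256').update(chunk, 'utf8').digest('hex');
}

// Returns the chunks that do not have an embedding yet.
// `storedHashes` is assumed to contain the hashes of all chunks
// that were embedded in a previous run.
function findChunksToEmbed(
  chunks: string[],
  storedHashes: Set<string>,
): string[] {
  return chunks.filter((chunk) => !storedHashes.has(hashChunk(chunk)));
}
```

Chunks that have disappeared from the document would still need to be deleted from the database separately.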

Splitting the contents into smaller chunks

In our case, the handbook content was stored in Notion. I'm not going to go into detail on how to fetch the content from Notion, but I'll assume you know how to fetch content from your CMS.

At this point you'll have a large piece of text, or several large ones. However, we need to split it into smaller chunks. In our case, we simply split the document into paragraphs. The result is a list of chunks that looks something like this:

[
    "We have a lot of benefits for our employees. Here are some of them...",
    "If you are on sick leave, you will get 100% of your salary...",
    ...
]

This particular step is kind of important to get right. In my case, a lot of effort went into this. However, it's also very specific to our case and Notion APIs, and thus outside the scope of this article.
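Leaving the Notion specifics aside, the basic idea of splitting into paragraphs can be sketched like this. It assumes you already have the content as plain text with paragraphs separated by blank lines, which is a simplification. `splitIntoParagraphs` is a name made up for this example.

```typescript
// Splits plain text into paragraphs, where paragraphs are assumed
// to be separated by one or more blank lines.
function splitIntoParagraphs(text: string): string[] {
  return text
    .replace(/\r\n/g, '\n') // normalize Windows line endings
    .split(/\n\s*\n/) // split on blank lines
    .map((paragraph) => paragraph.trim())
    .filter((paragraph) => paragraph.length > 0);
}
```

Single line breaks inside a paragraph are kept, so a paragraph that was wrapped over several lines still ends up as one chunk.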

Create embeddings for all the chunks

We'll be using OpenAI to create embeddings. You could interface with their REST API directly using fetch, but for simplicity's sake we'll use the openai npm package.

npm install --save openai

Then let's write a function that takes in the text content and returns an embedding vector:

import OpenAI from 'openai';

const OPENAI_API_KEY = process.env.OPENAI_API_KEY;
const EMBEDDING_MODEL = "text-embedding-ada-002";

// Takes a piece of text and returns its embedding vector
async function createEmbedding(content: string): Promise<number[]> {
  const openai = new OpenAI({
    apiKey: OPENAI_API_KEY,
  });

  const response = await openai.embeddings.create({
    model: EMBEDDING_MODEL,
    input: content,
  });

  // We sent a single input, so there is a single embedding in the response
  return response.data[0].embedding;
}

In the code example above, we chose to use the embedding model called text-embedding-ada-002. You could replace this with any other embedding model from OpenAI.

If you wonder which embedding model to choose, I recommend just testing them out and seeing which one performs best for your use case. Start with the cheapest one and move to a heavier one if it's not good enough 💸
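When creating embeddings for all the chunks, it can help to send several chunks per request, since the embeddings endpoint also accepts an array of inputs. Below is a sketch of that. The `EmbeddingsClient` interface is a minimal shape I made up for this example so it can run without the real client; the client from the openai package should fit it, but treat that as an assumption. The batch size of 100 is arbitrary.

```typescript
// Minimal shape of the client used in this sketch (hypothetical type,
// written to match the part of the openai package we rely on).
interface EmbeddingsClient {
  embeddings: {
    create(params: { model: string; input: string[] }): Promise<{
      data: { index: number; embedding: number[] }[];
    }>;
  };
}

// Creates embeddings for many chunks, a batch at a time.
// The returned vectors are in the same order as the chunks.
async function createEmbeddings(
  client: EmbeddingsClient,
  chunks: string[],
  batchSize = 100,
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (let start = 0; start < chunks.length; start += batchSize) {
    const batch = chunks.slice(start, start + batchSize);
    const response = await client.embeddings.create({
      model: 'text-embedding-ada-002',
      input: batch,
    });
    // Each item carries the index of the input it belongs to,
    // so we sort by it to be sure the order matches the batch
    const sorted = [...response.data].sort((a, b) => a.index - b.index);
    for (const item of sorted) {
      vectors.push(item.embedding);
    }
  }
  return vectors;
}
```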

Storing the embeddings in a database

There are many options for storing the embeddings. In our case, we chose to use Qdrant. You can get a Qdrant server up and running very quickly using Docker:

docker run --rm -p 6333:6333 -v $(pwd)/qdrant_storage:/qdrant/storage qdrant/qdrant

Now you have the Qdrant API listening on port 6333 locally. You can interface with Qdrant directly using fetch, like this:

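import { randomUUID } from 'node:crypto';

// QDRANT_URL, QDRANT_API_KEY and the Embedding type are assumed to be defined elsewhere
export function insertEmbeddings({
  collectionName,
  embeddings,
}: {
  collectionName: string;
  embeddings: Embedding[];
}) {
  // Inserts (or overwrites) one point per embedding in the collection
  return fetch(`${QDRANT_URL}/collections/${collectionName}/points`, {
    method: 'PUT',
    headers: {
      'Content-Type': 'application/json',
      'api-key': QDRANT_API_KEY,
    },
    body: JSON.stringify({
      points: embeddings.map((e) => ({
        id: randomUUID(),
        vector: e.embedding,
        payload: {
          // This is where you can store metadata about the embedding,
          // for example the original chunk text, so that a search can return it
        },
      })),
    }),
  });
}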

Vectors are stored in separate collections, so you have to create a collection using their PUT /collections/:collectionName endpoint beforehand.
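As a sketch of what creating the collection could look like: Qdrant's documentation describes this as a PUT request to /collections/:collectionName, with a body that states the vector size and the distance metric. The vector size must match the embedding model, and text-embedding-ada-002 produces vectors with 1536 dimensions. The helper name `buildCreateCollectionRequest` and the choice of cosine distance are assumptions for this example.

```typescript
// text-embedding-ada-002 produces vectors with 1536 dimensions
const ADA_002_DIMENSIONS = 1536;

// Builds the URL and fetch options for creating a Qdrant collection.
// Nothing is sent here; pass the result to fetch to perform the request.
function buildCreateCollectionRequest(
  qdrantUrl: string,
  collectionName: string,
  apiKey?: string,
) {
  const headers: Record<string, string> = {
    'Content-Type': 'application/json',
  };
  if (apiKey) {
    headers['api-key'] = apiKey;
  }
  return {
    url: `${qdrantUrl}/collections/${collectionName}`,
    init: {
      method: 'PUT',
      headers,
      body: JSON.stringify({
        vectors: { size: ADA_002_DIMENSIONS, distance: 'Cosine' },
      }),
    },
  };
}
```

With the local Docker setup you would use it roughly like `const { url, init } = buildCreateCollectionRequest('http://localhost:6333', 'handbook')` followed by `await fetch(url, init)`, where `handbook` is just an example collection name.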

Performing the semantic search

At this point you have chunked your content into smaller pieces, created embeddings for each of them, and stored them in Qdrant. Now it's time to perform the semantic search. Let's reuse the createEmbedding function defined in the snippet above, and use fetch again to interface with Qdrant:

async function search({
  query,
  collectionName,
}: {
  query: string;
  collectionName: string;
}) {
  // First we create an embedding for the query
  const vector = await createEmbedding(query);

  // Then we find the top 5 most semantically similar chunks to our query
  const searchResponse = await fetch(`${QDRANT_URL}/collections/${collectionName}/points/search`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'api-key': QDRANT_API_KEY,
    },
    body: JSON.stringify({
      vector,
      limit: 5,
      with_payload: true, // This will return the metadata we stored in the payload
    }),
  });

  const searchResponseJson = await searchResponse.json();

  // The matching points, most similar first
  return searchResponseJson.result;
}

Using this function you'll get a list of chunks that most likely contain an answer for the query. Now it's time to send these chunks to OpenAI to get an answer.

async function getAnswer({
  query,
  collectionName,
  openai,
}: {
  query: string;
  collectionName: string;
  openai: OpenAI;
}): Promise<string> {
  // First we perform the semantic search
  const searchResults = await search({ query, collectionName });

  /**
   * Then we put the search results together with the question
   * from the employee and send it to OpenAI for answering.
   * This assumes the chunk text was stored in the payload as input_text.
   */
  const response = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages: [
      {
        role: 'system',
        content:
          'You are an assistant who answers questions for employees in a consultancy firm called Fink. You do not ask any questions.',
      },
      {
        role: 'user',
        content: `${searchResults
          .map(
            (r: { payload: { input_text: string } }) =>
              `"""Here are some texts from our handbook:\n${
                r.payload.input_text
              }\n"""\n`
          )
          .join('')}\n\nQuestion: ${query}`,
      },
    ],
    temperature: 0.1,
  });

  // The content can be null, so fall back to an empty string
  return response.choices[0].message?.content ?? '';
}
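Since the whole point of the search is to stay within the context size, it can be worth capping how much text goes into the prompt. Here is a sketch of building the user message as a separate function with a character budget. `buildUserPrompt` and the default of 8 000 characters are assumptions made for this example, and the chunks are expected to be ordered with the most similar first.

```typescript
// Builds the user message from the most relevant chunks and the question.
// Chunks are added in order until the character budget is used up.
function buildUserPrompt(
  chunks: string[],
  question: string,
  maxChars = 8000,
): string {
  const selected: string[] = [];
  let used = 0;
  for (const chunk of chunks) {
    // Stop before the budget is exceeded, keeping the best matches
    if (used + chunk.length > maxChars) break;
    selected.push(chunk);
    used += chunk.length;
  }

  // Each chunk is wrapped in triple quotes to separate it from the rest
  const context = selected.map((chunk) => `"""\n${chunk}\n"""`).join('\n');

  return `Here are some texts from our handbook:\n${context}\n\nQuestion: ${question}`;
}
```

Keeping this as a pure function also makes it easy to inspect exactly what is sent to the model when an answer looks wrong.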

And that's it! There is a lot of glue code left out in this article for brevity, but in short this is what we've done:

  • Split the contents into smaller chunks
  • Create embeddings for all the chunks
  • Store the embeddings in a database
  • Perform the semantic search
  • Put the search results together with the question and send it to OpenAI for answering

We also made a simple UI for our AI, and this is the result:

Hey, I'm Magnus, a developer from Norway.

I'm currently employed at Fink AS.
