Welcome to Ragu

Ragu is a platform for creating LLM assisted workflows with the aim to automate tedious and repetitive tasks in order to save time and sanity.

It is composed of two parts: a chat application for executing workflows, intended for your end users, and an administration application intended for Ragu administrators.

This booklet is designed to get you up and running with Ragu and it will explain the core concepts necessary to customize Ragu for your needs.

A primer on large language models

This section is a bit technical. If you are not interested or you already know how an LLM works, feel free to skip to the introduction.

Large language models are a technology that has recently taken the world by storm. They are powerful, but it is important to remember they are not magic.

LLMs are as good as the data they are trained on and the data they are given to work with. This is important to keep in mind when you create knowledge bases and ultimately when you converse with LLMs.

Training LLMs is lengthy and expensive (in the sense of hardware, electricity and labor). This is due to the fact that these models are trained on vast amounts of data and they have an enormous number of parameters which require specialized hardware to train efficiently.

Context enrichment

Once an LLM is trained, it will reflect the data it has been trained on. In other words, it will only regurgitate what it already knows. This raises the question: how do we teach it concepts from our domain? How do we provide it with additional data so it better suits our needs?

One technique, called fine-tuning, can be used to adjust the parameters of an LLM using custom data. This technique readjusts the model's parameters and makes it more likely to respond with the information we have trained it with. This is a very powerful technique, however it has its caveats.

Fine-tuning is an intensive process which, like training, requires a significant amount of data and computing power. It is not as intense as training, but it is still an expensive process and requires specialized hardware (if the goal is to do it in a reasonable amount of time). Additionally, once the model has been fine-tuned, new information that needs to be added to it will require a whole new round of fine-tuning.

While fine-tuning is an extremely valuable technique for smaller models, it is not the primary way of enriching LLM contexts that Ragu uses. Instead, Ragu uses retrieval augmented generation (RAG).

Don't be scared by the big words; the technique itself is much simpler than it sounds.

The gist of it is the following:

  1. User prompts the LLM with a question.
  2. A text embedding model is used to embed the prompt.
  3. The prompt embeddings are used to perform semantic similarity search in a vector database.
  4. An arbitrary amount of data is retrieved from the vector database (retrieval).
  5. The data is prepended to the prompt to form a context enriched prompt (augmented).
  6. The context enriched prompt is then sent to the LLM that generates a final response for the user (generation).

The beauty of RAG is that we get to fully utilise the power of an already trained LLM without having to go tinker with its underlying parameters. Instead, at inference (prompting) time we grab the most relevant information from the knowledge base and feed it to the LLM.
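
To make those six steps concrete, here is a minimal sketch of the loop in Python. The embed, vector_search and llm_complete functions are hypothetical stand-ins, faked here so the example runs; a real deployment would call a text embedding model, a vector database and an LLM in their place.

def embed(prompt: str) -> list[float]:
    # Fake embedding: one dimension per word length. A real system calls
    # a text embedding model here (step 2).
    return [float(len(word)) for word in prompt.split()]

def vector_search(query: list[float], limit: int) -> list[str]:
    # A real vector database returns the stored chunks whose embeddings
    # are closest to `query` (steps 3 and 4); here we return canned text.
    return ["(a chunk of Raguru Labamba's biography)"][:limit]

def llm_complete(prompt: str) -> str:
    # Stand-in for the actual LLM call (step 6).
    return f"(LLM answer generated from)\n{prompt}"

def answer(prompt: str, top_k: int = 1) -> str:
    query_vector = embed(prompt)                    # steps 1-2: embed the prompt
    chunks = vector_search(query_vector, top_k)     # steps 3-4: retrieval
    enriched = "\n".join(chunks) + "\n\n" + prompt  # step 5: augmentation
    return llm_complete(enriched)                   # step 6: generation

print(answer("Who is Raguru Labamba?"))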

Text embeddings (vectors) and vector databases

A text embedding model converts text into a vector representation; embedding a text means creating a vector that represents it. These vectors represent the text in a way that allows us to calculate the "similarity" of the original pieces of text.

More specifically, "similarity" is the distance between two such vectors. Imagine two vectors on a coordinate plane, each representing a piece of text. If the vectors are close together, i.e. if they are pointing in a similar direction, the texts are considered semantically "similar". If they are pointing in opposite directions, the texts are considered semantically "different".

The same text embedding model that embeds the prompt is also used to embed documents you store in collections. The prompt is embedded (transformed into its vector representation based on the embedding model) and then used as a reference vector to calculate the distance between it and other vectors in the vector database. The closest vectors' text contents are then retrieved and added to the prompt.

A vector database stores these vector representations in a way that allows efficient retrieval. However, unlike traditional databases, they do not search for exact matches, they search for the most similar matches. The similarity is defined by the embedding model, and is represented by the distance between the search vector (the prompt) and other vectors stored in the database (the documents).

To further clarify the concept of semantic similarity, think of the words "cat", "car", and "dog". A traditional database would group "cat" and "car" together, since they are lexicographically similar. A vector database, on the other hand, would group "cat" and "dog" together, since they are semantically similar.
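
To see what this looks like in practice, here is a toy computation with hand-picked 2-dimensional vectors. Real embedding models produce vectors with hundreds or thousands of dimensions, but the arithmetic is the same; the vectors below are purely illustrative.

import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Similarity of two vectors based on the angle between them.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-picked toy embeddings: "cat" and "dog" point in a similar
# direction, while "car" points elsewhere.
cat = [0.9, 0.1]
dog = [0.8, 0.2]
car = [0.1, 0.9]

print(cosine_similarity(cat, dog))  # ~0.99: semantically "similar"
print(cosine_similarity(cat, car))  # ~0.22: semantically "different"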

Conversing with LLMs

LLMs have a specific way of accepting messages. Most LLMs will use the following format and message types:

  • system - A message that is sent to the LLM at the beginning of the conversation. This is typically used to give the LLM instructions on how to behave. A user never sends this manually; it is constructed automatically depending on the context you set up for the agent.

  • user - A message that is sent to the LLM by the user as part of a conversation.

  • assistant - An LLM generated message that is sent back to the user.

A system message is typically sent to the LLM at the beginning of a conversation, while user and assistant messages alternate for the rest of it: a user message is always followed by an assistant message, never the reverse.
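
In practice, the whole conversation is sent to the LLM as an ordered list of such messages. A sketch of what that list might look like, with the exact wire format varying between LLM providers:

conversation = [
    # Constructed automatically from the agent's context.
    {"role": "system", "content": "You are Radical Ragu, a helpful assistant."},
    # Sent by the user, optionally enriched with retrieved chunks.
    {"role": "user", "content": "Who is Raguru Labamba?"},
    # Generated by the LLM and appended so the next turn sees the history.
    {"role": "assistant", "content": "Raguru Labamba is ..."},
]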

Keep in mind, this whole primer is an oversimplification for the sake of brevity. Whole papers have been written on these subjects and it's not feasible to explain these concepts in a single page. Nevertheless, these are the essential concepts that should help you when you interact with Ragu.

Introduction

This chapter will guide you through creating your first agent and knowledge base.

Creating your first agent

Now that you're familiar with some of the basic concepts of Ragu, let's go through the process of creating your first agent.

Head on to the admin panel, click on Agents in the sidebar and click on the Create agent button at the top right of the page.

TODO: NEEDS GIF

You should now see a page where you can give your agent a name, description, context and various kinds of instructions. For now, we'll be ignoring the instructions; those are described in more detail in the Agents section.

Give your agent a memorable name (if you can't think of one, call it 'Radical Ragu') and a short description that specifies its purpose.

Now set the agent's context. This is one of its most important parameters, as it will dictate how the agent behaves. You always want to write the context as though you are addressing the agent directly, i.e. in the second person.

One example of a context would be

You are Radical Ragu, a helpful assistant that answers all questions related to ragu.
If you receive a question not related to ragu, let the user know you only talk about
ragu.

Finally, you need to set the agent's LLM, indicated by the model parameter. The available models will vary depending on how Ragu was configured. For now, we'll be using OpenAI's GPT-4, but if you don't have that one, don't worry, just use any model that's available. For the model parameter, select openai/gpt-4.

Optionally, you can set the model's temperature. This is a value between 0 and 1 that controls the "creativity" of the model's output. The higher the value, the more "creative" the output. If you don't want your agent's responses to be too crazy, we suggest keeping this at the default value of 0.1.

Press the Create agent button at the bottom of the page. Voila! You created your first agent. Easy, right?

Well, OK, we're not done yet. You might be wondering why the agent is inactive. When you create a new agent, you most likely do not want it to be active until you've configured its knowledge base, which is why agents are always inactive by default.

Go ahead and activate the agent by clicking the Activate button on its page. Switch to user mode via the profile icon at the top right of the page. You should see the agent on the dashboard. Try sending it a message.

Try asking it "Who is Raguru Labamba?".

It doesn't know? Let's fix that!

Switch back to admin mode via the profile icon at the top right of the page and head over to the Collections page, located in the sidebar. Click on Create collection in the top right corner of the page and give it a name. Collection names can only contain letters, numbers and underscores.

Next, pick an embedding model. Embedding models aren't important for now; they are explained in more detail in the Collections section. Again, the available models will vary depending on your configuration. For now, we'll be using OpenAI's text-embedding-ada-002, so go ahead and pick that one.

You should now see it in your collection list. Click on it to open it.

Each collection starts out empty; this page is where you add documents to it. On the left is a list of documents not yet added to the collection, while on the right is a list of those currently in it. If you have uploaded documents previously, they should be visible on the left side. You should also see the default document RaguruLabamba.txt on the left-hand side of the Add documents to collection section.

Add the RaguruLabamba.txt document to your collection by clicking on it, then clicking submit. Radical Ragu now contains Raguru Labamba's biography as part of its knowledge base.

Now that you have a document in your collection, let's assign it to your agent. Go back to the agent's page and click on the Assign collection button at the bottom of the page. Select the collection you just created. You'll see two additional parameters:

The instruction parameter is where you tell your agent what to do with the data it obtains from the collection. In this case, you can instruct it to use the information to answer any ragu related questions.

Give it the following instruction:

Use the following information to answer any ragu related questions.

The Retrieval amount parameter determines how many chunks will be retrieved when you prompt the agent. For now, you can set this to 1, since our collection is very small and contains only a single chunk.

Click on the Assign collection button at the bottom of the page.

Switch back to user mode and see if Radical Ragu is able to answer your question.

Congratulations! You just created your first agent in Ragu and enriched it with knowledge!

Next steps

  • Try uploading some documents to Ragu.

  • Try following the same steps as above, but this time using the documents you uploaded. Keep in mind, agents are intended to communicate with your end users, so you should be mindful of what you put in their collections.

This was a very simple example with a very small document. Chances are your documents will be much larger than a single paragraph. Next up you'll learn how to use Ragu with larger and more complex documents.

Creating a knowledge base

You already know how to create an agent, now it's time to create a knowledge base. In order to do that though, we need to go over some simple concepts on how the agent interacts with collections.

Reading the LLM primer will definitely help you in the following steps.

As you already know, a collection contains documents. What you might not know is that those documents are chunked beforehand. Yes, even the small example from the introduction was chunked, although only one chunk was produced (the document itself), so you might not have noticed it.

Chunking a document is a way of breaking it up into smaller pieces that can be fed to an LLM. Pasting a 400-page PDF into a prompt is just not going to work: the prompt will be too long and the LLM will not be able to process it, hence we need to chunk.

Remember, the quality of an LLM's response is directly related to the quality of its input. For that reason, it's important to:

  1. Have a good selection of documents.

    If your agent is fed mumbo jumbo, you shouldn't be surprised when it spits out mumbo jumbo. Unless you are creating an agent for entertainment purposes (which is always fun), descriptive and clear-cut documents are advised.

  2. Ensure those documents are chunked in a manner that preserves their context.

    Due to the varying nature of documents, this is the tricky bit. Not all chunks have to be perfect, but they should generally be descriptive and retain the semantics of the original document.

The first step is really up to you and your choice of documents. For the second step, Ragu provides a user interface where you can quickly iterate on document chunks.

For the remainder of this chapter, we'll be learning how to upload and chunk documents. You already know how to assign documents to collections, but you'll also learn what exactly happens during this assignment.

Uploading documents

On the admin page, in the sidebar you will see the Documents section. Clicking on it will take you to a page where you see all documents available, as well as a form to upload new ones. Clicking on the Upload button will open up a small window where you can drag and drop your files or use the built-in file picker to upload them. After selecting and uploading your document, you will be redirected to the document's page where you can configure how it will be parsed and chunked.

Processing documents

On the document's page you will see two sections: parsing and chunking.

Parsing a document

A document's parsing configuration will determine which parts of the document you want to include in whichever collection you are putting it in.

For example, if you are uploading a PDF, you might want to skip the first or last few pages. Usually PDF documents have a cover page and a table of contents as their starting pages, but that information is not of particular use for agents, so you can usually skip it. You can configure all of this in the parsing configuration.

The parsing parameters are as follows:

  • start - Determines the number of pages to skip at the start of the document. For example, a value of 5 skips the first 5 pages of a document.
  • end - Determines the number of pages to skip from the end of the document. For example, a value of 5 skips the last 5 pages of a document.
  • range - If selected, instead of skipping start or end pages, it will select a range of pages to include. For example, if start is 3 and end is 5, it will include pages 3, 4 and 5.
  • filter - A list of regular expressions used to exclude certain parts of the document.
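
To illustrate how start, end and range compose, here is a small sketch using a list of pages. It is illustrative only, not Ragu's implementation, and the filter parameter is omitted for brevity.

def parse(pages: list[str], start: int = 0, end: int = 0,
          use_range: bool = False) -> list[str]:
    if use_range:
        # start and end select an inclusive range of 1-indexed pages.
        return pages[start - 1:end]
    # Otherwise they are the number of pages to skip at each end.
    return pages[start:len(pages) - end]

pages = ["cover", "toc", "intro", "body", "appendix"]
print(parse(pages, start=1, end=1))                  # skips cover and appendix
print(parse(pages, start=3, end=5, use_range=True))  # pages 3, 4 and 5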

Chunking a document

Once you've decided which parts of the document you want to include in your collections, you can now configure how those parts will be chunked. This process involves a lot of trial and error. Due to the varying nature of documents, there is no magic configuration that will fit all of them, so you're going to have to play around a bit until you get the chunks you want.

The chunkers available are described in detail in the chunkers chapter, so we're only going to provide a quick overview here.

  • Sliding window - The most straightforward way of chunking a document, but it produces the lowest quality chunks. You select a base size, i.e. how many characters will be in each chunk, and an overlap that each chunk shares with the previous and the next one. This chunker is useful when a whole document can fit into one chunk and should rarely be used otherwise.

  • Snapping window - Works on the same principle as sliding window, except it's aware of sentence stops. This chunker, along with the semantic window, produces the best results for textual documents because it's aware of sentence boundaries and will not produce chunks that start or end in the middle of sentences.

  • Semantic window - Similar to snapping window, but groups chunks based on the semantics of the text. In other words, more similar chunks will be grouped together. It's worth noting that this chunker is embedding based, meaning that if you use a third party embedding service (such as OpenAI), it will spend tokens during previews and actual chunking.

Play around with different types of chunking configurations until you find one that suits the document in question. Once you're satisfied with the results, click on the Save button in the respective configuration sections to save the configurations.

Both the parsing and chunking configurations are applied whenever you add the document to any collection, so you only have to configure them once.

Assigning documents to collections

If you've followed the Creating your first agent section of the introduction, you've already done this step.

Once you're satisfied with a document's resulting chunks and have saved its configuration, you can assign that document to any collection you want. Whenever you open a collection's page, you will see a list of all documents assigned to it, as well as a menu where you can add new documents to it.

When you add a document to a collection, what you're really adding are its chunks. These chunks are then retrieved when your users converse with the agent and are used to enrich the agent's context. That's why it's important to have good chunks: if the chunks do not retain the clarity of the original information, the agent will simply not have the necessary context to answer questions in a useful manner.

There are a few important things to remember when assigning documents to collections:

  • Once a document has been added to a collection, any changes to its configuration will not influence the existing chunks. If you want to update existing chunks, you will need to remove the document from the collection and re-add it.

  • If you delete a document from Ragu, its chunks will be removed from all collections.

Next, you'll learn what an agent is and the various settings you can adjust.

Core concepts

This chapter will guide you through the core concepts of Ragu.

Users

Users are the people who interact with Ragu. Users come with roles attached, which determine the level of access they have to the platform.

Ragu administrators have the highest level of access, and are the only users who can access the back office. Administrators are responsible for setting up the various different components of the platform, such as workflows, agents and knowledge bases.

Ragu users are the people who utilise the platform to assist them in the various workflows administrators set up.

All Ragu users belong to specific groups. A user's groups dictate what agents or collections they can access.

Workflow

Workflows can be thought of as any process that can be broken down into discrete steps.

For example, the creation and finalization of travel orders, submitting JIRA hours, and inventory management can all be thought of as workflows: they have clear indications of when they start, what steps are required to perform them, and when they end.

A chat can also be thought of as a workflow. Unlike structured workflows tailored for specific tasks, chats do not have a clear indication of when they end. This makes them useful for processes such as onboarding employees or customer support. With chats, users can complete these processes in a conversational manner instead of digging through company documentation.

Agents

Every workflow consists of agents. Agents are large language models (LLMs) that have a specific context associated with them. An agent's context instructs its LLM on how it should behave and what it should expect from users. If you've ever customized ChatGPT (told it your interests, what it should call you), you have worked with contexts before.

An agent can also have tools associated with it. Tools allow agents to integrate with other systems. They can transform agents from regular chat bots into problem solving powerhouses.

Any workflow can have an arbitrary number of agents associated with it, however in this guide we will focus on the simplest workflow - chats. A chat workflow has a single agent whose context is enriched with a knowledge base.

Knowledge base

A knowledge base consists of one or several collections that contain documents.

Both collections and documents are stand-alone entities that are managed independently of agents.

By having standalone knowledge bases, Ragu administrators have the ability to assign them to agents as they see fit with the click of a button.

Collections

Collections, as the name implies, are collections of documents that are available to an agent when it's asked to perform some task.

Whenever you assign a collection to an agent, you can instruct the agent on how to use the data it obtains from it at conversation time.

Documents

Documents are the basic building blocks of a collection, and in turn, knowledge bases. Currently, Ragu supports most textual document types.

Agents

Agents are the main way for your users to interact with knowledge bases in Ragu. They are essentially LLMs with some additional plumbing to enrich their context whenever you prompt them.

Metadata

Each agent has a name, description, and language. None of these settings influence the agent's behavior and exist solely for display purposes. They are intended to hint to users what an agent's purpose is, as well as the expected language of the agent.

Model

Each agent has a model. This is the LLM that will be used by the agent to answer questions. An LLM can receive general instructions on how to behave. We call these general instructions the agent's context.

The context is always sent as a system message to the LLM. This is a special kind of message the LLM will treat differently from chat messages and will use it to guide its behavior.

The context is written in the second person, as though you are addressing the agent directly. The bulk of an agent's behaviour is defined by its context.

The temperature of an agent's LLM is a value between 0 and 1 that controls the "creativity" of the agent's output. The higher the value, the more unhinged the agent will be. You usually want to keep this at a low value since high temperatures can make the agent start to make things up, or in other words, hallucinate.

Instructions

If you created your first agent, you might have seen some instruction settings that we overlooked for the sake of brevity. These instructions are intended for the agent's underlying LLM and tell it how to behave in specific situations.

All instructions have defaults. There is no need to set them explicitly unless you wish to fully customize the agent's behaviour.

The available instructions are as follows:

  • Prompt instruction - Gives the agent additional directives for answering prompts. Sent alongside every prompt as part of the user message and instructs the LLM on what to do with the prompt.

  • Language instruction - Gives the agent additional directives for answering prompts in a specific language. Sent alongside the context in the system message.

  • Title instruction - Gives the agent additional directives for generating titles. Whenever a conversation is started, the initial user prompt will be used to generate a title for it. This instruction can be used to customise the way the title is generated.

  • Summary instruction - Gives the agent additional directives for generating summaries. Whenever a conversation becomes too large, the agent will use this instruction to generate a summary of it. The summary replaces the current conversation history; it will inevitably omit some information, but keeps the context size under control.

Message evaluation

Every message sent by an agent can be evaluated. You can use these evaluations to reason about an agent's performance.

Documents

Documents are the building blocks of knowledge bases. Anything that adds value and enriches an LLM's context can be thought of as a document.

Why chunking is important

If a document is small enough, the whole of it can be used to enrich an LLM's context. However, if a document contains more than a few pages it is usually a good idea, and often necessary, to break it down into smaller chunks.

Imagine if someone were to ask you a question about dandelions. Do you think you would be able to answer the question faster and more accurately if you had a whole book about horticulture, or just 5 excerpts from it specifically related to dandelions?

This is essentially how context enrichment works with LLMs, and it is exactly why big documents need to be chunked. When documents are chunked, only the chunks most relevant to the prompt can be retrieved, instead of the whole document.

This is one of the reasons why it's important to chunk your documents; the other is the limited context window of the LLM. This is fancy talk for the fact that LLMs can only process a limited amount of words (more specifically, tokens) at a time. If an LLM can only handle 100 words at a time, then giving it a 200-word document means 100 of those words will be cut off and their context lost.

Parser

Documents come in many different shapes and sizes. A parser is a tool used to transform various different document types to textual formats that are usable by LLMs.

Before we can actually start using a document, we must parse it. It is in this process that we specify which parts of the document will be added to the knowledge base.

A generic parser for any document type looks like the following (note that elements are document specific, e.g. pages in a PDF, paragraphs in DOCX, etc.):

  • start - The number of elements to skip at the start of the document.
  • end - The number of elements to skip from the end of the document.
  • range - If selected, instead of skipping start or end elements, it will select a range of elements to include. The range is always inclusive.
  • filter - A list of regular expressions used to exclude undesirable parts of the document, such as signatures and page numbers.

Additionally, parsers for specific file types are also available.

Chunkers

Ragu offers a variety of chunkers for splitting up larger documents into smaller chunks LLMs can handle.

The quality of the generated chunks is important. Imagine if those 5 dandelion excerpts all started and ended in the middle of a sentence. It would be a bit difficult to reason about the context from which the chunks were made.

Ragu chunkers are designed to preserve the most context while still being fast enough to enable easy prototyping.

Sliding window

The most basic of chunkers. Usually good for adding whole documents to collections as a single chunk if they are small enough. Usage is not recommended for most documents.

  • size - The number of characters to fit in each chunk.
  • overlap - The number of characters to overlap between chunks. For each chunk, an overlap amount of characters will be prepended and appended from the previous and next chunk, respectively.
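
A minimal sketch of the idea, illustrative only and not Ragu's implementation:

def sliding_window(text: str, size: int, overlap: int) -> list[str]:
    # Cut the text every `size` characters, extending each chunk by
    # `overlap` characters into its neighbours on both sides.
    chunks = []
    for start in range(0, len(text), size):
        lo = max(0, start - overlap)
        hi = min(len(text), start + size + overlap)
        chunks.append(text[lo:hi])
    return chunks

print(sliding_window("The quick brown fox jumps over the lazy dog.",
                     size=16, overlap=4))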

Snapping window

Similar to sliding window, but aware of sentence boundaries. Very useful for prose and documentation. This is the default chunker used for newly uploaded documents.

  • size - The number of characters to fit in each chunk.
  • overlap - The number of sentences to overlap between chunks.
  • delimiter - The sentence delimiter (sentence stop) to use. Usually you want to keep this set to the full stop (.).
  • skip_forward - A list of patterns that make the chunker skip a sentence stop if the pattern trails the delimiter. Useful for abbreviations, or when you don't want to treat particular delimiters as sentence stops.
  • skip_back - A list of patterns that make the chunker skip a sentence stop if the pattern leads the delimiter. Useful for abbreviations, or when you don't want to treat particular delimiters as sentence stops.
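
A simplified sketch of the snapping behaviour, covering the delimiter and skip_forward parameters; overlap and skip_back are omitted for brevity, and this is illustrative only, not Ragu's implementation.

def snapping_window(text: str, size: int, delimiter: str = ".",
                    skip_forward: tuple[str, ...] = ()) -> list[str]:
    # Split into sentences, treating the delimiter as a sentence stop
    # unless one of the skip_forward patterns immediately trails it.
    sentences, current = [], ""
    for i, ch in enumerate(text):
        current += ch
        if ch == delimiter:
            rest = text[i + 1:]
            if not any(rest.startswith(p) for p in skip_forward):
                sentences.append(current)
                current = ""
    if current:
        sentences.append(current)

    # Pack whole sentences into chunks of roughly `size` characters,
    # so no chunk starts or ends in the middle of a sentence.
    chunks, chunk = [], ""
    for sentence in sentences:
        if chunk and len(chunk) + len(sentence) > size:
            chunks.append(chunk)
            chunk = ""
        chunk += sentence
    if chunk:
        chunks.append(chunk)
    return chunks

# "Dr." is not treated as a sentence stop thanks to skip_forward.
print(snapping_window("Dr. Who arrived. He waved. The crowd cheered.",
                      size=30, skip_forward=(" Who",)))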

Semantic Window

Similar to snapping window, but groups chunks based on the semantics of the text. This chunker is also aware of sentence boundaries. It is important to note that when using this chunker, a text embedding model will be used to generate embeddings for each chunk and will spend tokens.

The chunker will first chunk the whole text in a similar fashion to the snapping window. The chunks are then embedded using a text embedding model and the distance between neighbouring chunks is calculated (using the configured distance function). If two chunks are similar enough, i.e. their similarity meets the threshold, they are grouped together.

  • size - The number of sentences to fit in each chunk.
  • threshold - The similarity threshold to use when grouping chunks. This is a number from 0 to 1. The larger the threshold, the more similar two chunks have to be in order to be grouped into one.
  • distance function - The distance function to use when calculating the distance between chunks. Cosine distance is the default.
  • delimiter - The same as in snapping window. Used for the initial chunking, before the similarity is calculated.
  • skip_forward - The same as in snapping window. Used for the initial chunking, before the similarity is calculated.
  • skip_back - The same as in snapping window. Used for the initial chunking, before the similarity is calculated.
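
A sketch of the grouping pass, assuming the initial snapping-style chunking has already happened and each chunk has been embedded. The toy vectors and the cosine helper are purely illustrative; Ragu's actual implementation may differ.

import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def group_semantically(chunks: list[str], embeddings: list[list[float]],
                       threshold: float) -> list[str]:
    # Merge neighbouring chunks whose similarity meets the threshold;
    # otherwise start a new group.
    groups = [chunks[0]]
    for i in range(1, len(chunks)):
        if cosine_similarity(embeddings[i - 1], embeddings[i]) >= threshold:
            groups[-1] += " " + chunks[i]
        else:
            groups.append(chunks[i])
    return groups

chunks = ["Cats purr.", "Dogs bark.", "Engines rev."]
embeddings = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]  # toy embeddings
print(group_semantically(chunks, embeddings, threshold=0.9))
# ['Cats purr. Dogs bark.', 'Engines rev.']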

Technical

Developer documentation.

Collections

Collection schemas are shared between Ragu's applications.

Chonkit is responsible for creating collections, while Kappi is responsible for assigning them to agents. Chonkit has no concept of agents; it is solely a document and collection management system.

Access control

Each collection can be assigned a list of groups. If the list is empty or absent, the collection is considered accessible by all Ragu users. If the list does contain entries, access is restricted to users who are members of those groups. A user only has to be a member of one of the listed groups in order to access the collection.
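
The rule is small enough to express in a few lines. This is a sketch of the logic, not Ragu's actual code:

def can_access(user_groups: set[str], collection_groups: set[str] | None) -> bool:
    # An empty or missing group list means the collection is accessible
    # to all Ragu users.
    if not collection_groups:
        return True
    # Otherwise, membership in any one listed group grants access.
    return bool(user_groups & collection_groups)

print(can_access({"sales"}, None))             # True: unrestricted collection
print(can_access({"sales"}, {"hr", "legal"}))  # False: no shared group
print(can_access({"hr"}, {"hr", "legal"}))     # True: member of "hr"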

Identity vector

Each collection in a vector database created by Chonkit will contain a metadata vector, also known as an identity vector. This vector contains metadata about the collection: its name, embedding model, the vector database implementation identifier (vector provider), the embedding implementation identifier (embedding provider) and an optional list of groups.

The following is a JSON schema representing the above description.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "collection_id": {
      "type": "string",
      "description": "The UUID of the collection. Relevant to Chonkit."
    },
    "name": {
      "type": "string",
      "pattern": "^[A-Z]{1}[a-zA-Z0-9_]*$",
      "minLength": 1,
      "description": "Collection name. Cannot contain special characters. Must begin with a capital ASCII letter and contain only alphanumeric characters and underscores."
    },
    "model": {
      "type": "string",
      "description": "Collection embedding model."
    },
    "vectorProvider": {
      "type": "string",
      "description": "Vector database provider."
    },
    "embeddingProvider": {
      "type": "string",
      "description": "Embeddings provider."
    },
    "groups": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "minItems": 1,
      "description": "Optional collection groups that indicate which user groups can use it. If this is not defined, the collection is visible to everyone."
    }
  },
  "required": [
    "collection_id",
    "name",
    "model",
    "vectorProvider",
    "embeddingProvider"
  ],
  "additionalProperties": false
}
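
For illustration, a payload conforming to this schema might look like the following. All values are hypothetical; the provider identifiers in particular depend on your configuration.

identity_vector_payload = {
    "collection_id": "00000000-0000-0000-0000-000000000000",  # placeholder UUID
    "name": "Tutorial_collection",
    "model": "text-embedding-ada-002",
    "vectorProvider": "example_vector_db",    # hypothetical provider identifier
    "embeddingProvider": "example_embedder",  # hypothetical provider identifier
    "groups": ["hr"],  # optional; omit for an unrestricted collection
}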

It is important for all Ragu applications to strictly follow this schema so the created collections are compatible. It is up to the applications to interpret these parameters as they see fit.

Each vector stored in the collection will have an associated payload, depending on the vector payload type (i.e. whether it is text, an image, etc.).

Vector payloads

It is important to note that since we use the concept of an identity vector, each subsequent vector inserted into the collection will also contain these properties. Therefore, it is important to pass property selectors when querying, so that only the anticipated properties are returned.

Text vector payload

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "content": {
      "type": "string",
      "description": "The original text of the embedding (vector)."
    },
    "document_id": {
      "type": "string",
      "description": "The UUID of the document. Relevant to Chonkit."
    }
  },
  "required": [
    "content",
    "document_id"
  ]
}

Workflows

FAQ