Revisiting Retrieval Augmented Generation


I’ve been kinda hot and cold on Retrieval Augmented Generation (RAG).

I rushed in and experimented early, using an overview of Singapore law. After seeing other locals try to implement it, I scoffed at the approach, dismissing it as “grab three relevant articles from my vector db and ask #ChatGPT to write an answer based on them”. Now I am going to have a second go.

One of the problems I raised with a simple implementation of RAG concerned the embeddings used for search:

The granularity of the embeddings made from your data store also seems to have a significant impact. An embedding that takes too wide or too narrow a snapshot of your data might miss or jumble the point.

A straightforward way to be more flexible is to divorce the embeddings used for search from the context used to generate an answer.
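Here’s a rough sketch of what I mean, with a toy embed() function and naive sentence splitting standing in for the real thing (both are placeholders I made up): the index is built from small, focused chunks, but each hit hands back its larger parent passage for the prompt.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for a real embedding model: hash words into a
    fixed-size vector and normalise. Swap in a real API in practice."""
    vec = np.zeros(64)
    for word in text.lower().split():
        vec[hash(word) % 64] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def split_into_chunks(passage: str, sentences_per_chunk: int = 2) -> list[str]:
    """Cut a passage into small, search-friendly chunks."""
    sents = [s.strip() for s in passage.split(".") if s.strip()]
    return [
        ". ".join(sents[i : i + sentences_per_chunk])
        for i in range(0, len(sents), sentences_per_chunk)
    ]

def build_index(passages: list[str]) -> list[tuple[np.ndarray, str]]:
    """Embed fine-grained chunks, but keep a pointer to the parent passage."""
    index = []
    for passage in passages:
        for chunk in split_into_chunks(passage):
            index.append((embed(chunk), passage))
    return index

def retrieve(index: list[tuple[np.ndarray, str]], question: str, k: int = 3) -> list[str]:
    """Search with the small vectors; return the larger parent passages."""
    q = embed(question)
    ranked = sorted(index, key=lambda pair: -float(q @ pair[0]))
    parents: list[str] = []
    for _, parent in ranked:
        if parent not in parents:  # several chunks can share one parent
            parents.append(parent)
        if len(parents) == k:
            break
    return parents

# Usage: build once, then ask questions.
# index = build_index(passages)
# context = retrieve(index, "What does the Act say about consent?")
```

The point is that the unit you search over and the unit you generate from no longer have to be the same thing.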

Once you have more flexibility with the context, there are various ways to play with what is passed to the large language model.

For example, now that OpenAI models such as gpt-3.5-turbo-16k can accept 16,000 or more tokens, it may be useful to play with how the context is presented to the model.
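One simple way to fill that bigger window is to greedily pack ranked passages under a token budget. A sketch, assuming the tiktoken tokenizer and a pack_context() helper I’ve made up:

```python
import tiktoken

# cl100k_base is the tokenizer used by recent OpenAI chat models.
enc = tiktoken.get_encoding("cl100k_base")

def pack_context(passages: list[str], budget: int = 12_000) -> str:
    """Greedily fit retrieved passages into a token budget, leaving
    headroom in a 16k window for the question and the model's answer."""
    picked, used = [], 0
    for passage in passages:  # assumes passages are already ranked by relevance
        cost = len(enc.encode(passage))
        if used + cost > budget:
            break
        picked.append(passage)
        used += cost
    # Separators are one small way to "present" the context to the model.
    return "\n\n---\n\n".join(picked)
```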

Currently, I am thinking of an “Ask a Judgement” Chatbot. Court judgements present an interesting kind of document: they are long, contain several sections, and are also interlinked in complex ways.
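Before chunking anything, it may help to model that structure explicitly. A minimal sketch of how I might represent a judgement (the field names are my own guesses, and the citation is a made-up example):

```python
from dataclasses import dataclass, field

@dataclass
class Section:
    heading: str           # e.g. "Facts", "Issues", "Holding"
    paragraphs: list[str]  # numbered paragraphs, the unit courts actually cite

@dataclass
class Judgement:
    citation: str                 # e.g. "[2023] SGCA 1" (illustrative only)
    sections: list[Section]
    cited_cases: list[str] = field(default_factory=list)  # the interlinking
```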

Although smooshing a judgement down into a vector database is going to be interesting, I guess I also have to think about what a user might want to ask a judgement. It’s still worth building a prototype! I am especially curious to see whether it can resolve my qualms about how RAG performs on legal documents.
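As a first stab at the smooshing, here’s what loading one judgement, section by section, into a local vector store might look like. This sketch assumes Chroma (chromadb) with its default embedding function; the case citation and the section text are placeholders:

```python
import chromadb

# An in-memory Chroma client; its default embedding function embeds the documents.
client = chromadb.Client()
collection = client.create_collection("judgements")

sections = {
    "facts": "The appellant entered into a contract ...",      # placeholder text
    "issues": "Whether the exclusion clause was incorporated ...",
    "holding": "The appeal is dismissed because ...",
}

collection.add(
    ids=[f"2023_SGCA_1/{name}" for name in sections],  # made-up citation
    documents=list(sections.values()),
    metadatas=[{"case": "[2023] SGCA 1", "section": name} for name in sections],
)

# The sort of question a user might actually ask a judgement:
results = collection.query(
    query_texts=["What did the court decide, and why?"],
    n_results=2,
)
print(results["documents"])
```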

Let’s keep building!

Love.Law.Robots. – A blog by Ang Hou Fu