Augmenting ChatGPT with Your Data Can Improve Its Performance

I discuss how ChatGPT can be augmented with custom data to improve its performance using Singapore law as an example.

My DALL-E prompting still needs improving.

🔷 Update (31 May 2023): (1) Added a reference to Intellex's Scott; (2) Streamlit now embeds your apps, so there's a convenient way to access the Compare app.

#ChatGPT is a language model developed by OpenAI, with 1.5 billion parameters. It is capable of generating high-quality text in response to prompts and has been used for a variety of natural language processing tasks. It was trained on a large corpus of text from the internet and has achieved state-of-the-art performance on a number of benchmark datasets. Its versatility and accuracy make it a powerful tool for a wide range of applications.

It’s made me excited, and also many others in the legal profession as well.

One of the biggest questions I had was how it would perform on the subject of Singapore law. Law is a specialised area, and Singapore law is an even smaller niche in that. I suspected that statistically, ChatGPT wouldn't know much about it. I wouldn’t be holding my breath.

Could we help ChatGPT? One of the ways we could egg it along is to use the dark arts of prompt engineering. This is not about asking ChatGPT nicely. Instead, we’d tell ChatGPT and GPT-3 what it needs to answer the question using a simple series of prompts.

chat_plus = ChatOpenAI(temperature=0, openai_api_key=os.getenv('OPENAI_KEY'))

messages = [
	SystemMessage(content="You are a helpful assistant."
						"Use the following pieces of context to answer the question at the end."
                        "If the context does not help, don't use them."
						" Don't try to make up an answer")
]

for context in custom_sources[index]:
	messages.append(SystemMessage(content=f"Context: \n {context}")

messages.append(HumanMessage(content=f"Question: \n {question}"))

chat_plus(messages)

The original inspiration for this pipeline came from langchain. (It is the basis of the Custom or GPT-3+ model)

Thus, we look into our library of Singapore law information to find the four most relevant pieces of information (”context”) to the question asked and feed them to ChatGPT. After that, we ask the question we would like to answer. ChatGPT, having regard to the context, generates an answer.

To make a quick experiment, I used the overview of Singapore law section provided by the Singapore Academy of Law. It is focused on commercial law so it’s not likely to turn up something on divorces or criminal law, but it is acceptable for my experiment.

Curious Questions

Here are a few questions I would like to clarify in my experiment:

(GPT-3 and ChatGPT are both language models developed by OpenAI, but they differ in their size and purpose. GPT-3 is a much larger model with 175 billion parameters, while ChatGPT has 1.5 billion parameters. GPT-3 is designed to be a powerful general-purpose language model that can perform a wide range of natural language processing tasks, while ChatGPT is specifically designed for conversational AI and chatbot applications. ChatGPT is optimized for generating high-quality text in response to prompts, and is trained on a large corpus of conversational data from the internet. This focus on chatbot applications makes ChatGPT well-suited for tasks such as customer service, support, and chat-based interfaces.)

This experiment is important because I would like to know whether collecting small amounts of data and leveraging large language models like GPT-3 and ChatGPT is useful. The alternatives of collecting a huge amount of data and training a new large language model might not be feasible or practical for a small subject like Singapore law.

On the other hand, if some data and magical prompt engineering can produce results which are good enough, then this is an exciting opportunity to make a low-cost solution which can increase access to justice here. Such as 🥁🥁🥁… a chatbot. 😞

Findings

After running a series of questions through OpenAI’s ChatGPT and GPT-3, I knocked out a simple Streamlit app to display the results in a way which is easy to compare.

Unsurprisingly, if your data has information that answers a question directly, it can “augment” the answer appropriately.

Here both “augmented” models were able to cite the part of the Banking Act in Singapore, which ChatGPT on its own, wouldn’t.

If your data doesn’t have something which answers a question directly, it may come up with strange results.

In this section, the augmented GPT-3 model suggested that you take out a Mareva injunction against your neighbour. 😂 ChatGPT+’s answer looks more reasonable. No model referred to the Protection from Harassment Courts in Singapore.

Surprisingly, ChatGPT+ and GPT-3+ models have very similar answers. The easy explanation was that they were both fed the same data. On the other hand, given that GPT-3 had more parameters and so on, I expected more differences.

Since ChatGPT’s API cost is lower than GPT-3 (da vinci), we should consider using ChatGPT’s API more often.

Using your own data to augment a model in this manner can limit the model, although I am not sure why or how to resolve it (RHLF?).

When I asked for a letter, I should get a letter. Instead, GPT-3 gives me a lecture on how to bankrupt your tenant (it might have been influenced by the information on bankruptcies). ChatGPT+ slyly gives me something, but I was expecting the original ChatGPT’s answer. Not sure how moderation kicked in for 1 ChatGPT answer and not the other too.

☝🏼You can see the rest of the questions I experimented here in my streamlit app, as well as the answers generated by the models. You can see and compare more “edge” cases and interesting output there.

Conclusion

The primary contention is definitely proven — you can improve the results of ChatGPT or GPT-3 with your own data. The process to transform such data into vectors for search is definitely cheaper than trying to train and host your own model. This doesn’t make the answers wholly reliable — the model is still affected by what’s in your dataset as well as the gaps. Some of this can be helped with more data (the data used is a high-level summary of commercial law in Singapore), or maybe a bit more engineering (moderation pipeline?). Setting user expectations might also be important. (You can ask me questions on my niche topic but don’t ask me anything else) In any case, experimenting and getting feedback is probably going to be very important.

🤔A beta app from a LegalTech company in Singapore touches many aspects of what I am thinking about here: Intellex's Scott. Give it a go!

What are you building today?

Author Portrait Love.Law.Robots. – A blog by Ang Hou Fu