What's intelligence?

Watching large language models spew words at each other hasn't improved my understanding of them.


Since I wrote that post on the court hearing simulator for Prompt Engineering for Lawyers, I've been obsessed with how I would implement it.

As a lesson on prompt engineering, I aimed to show what you can do with one #ChatGPT prompt or chat. To be fair, it got very far: ChatGPT was able to argue with itself, and the hearing reached its termination point fairly often.

However, I wouldn't implement a court hearing simulator with ChatGPT this way. If I could create separate chats using the API, I would want to tweak each one to my favourite settings and watch them run. There are also other features, such as recording results, which were out of scope in the original experiment.

That's what I did with my casually named “AI Lawyers Battle Royale” project. I'm still working on it, so I only have a GitHub link. However, it already boasts a cleaner interface and gives me scope to use more fact patterns and scenarios. It also has an autopilot mode, so if you are not fond of arguing, you can get ChatGPT to do everything for you. Do give it a shot if you know how to run #streamlit apps and have an #OpenAI key.
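To make the idea of separately configured chats with recorded results concrete, here is a minimal sketch. It is not the project's actual code. The names `Participant`, `run_hearing` and `echo_generate` are all hypothetical, and `generate` is a stand-in for whatever function really calls a language model API with that participant's settings.

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable, Dict, List


@dataclass
class Participant:
    """One chat in the hearing, with its own settings (all hypothetical)."""
    name: str
    instructions: str
    temperature: float = 0.7


# A generate function takes the speaker and the transcript so far and
# returns that speaker's next contribution. In a real app this would
# wrap an API call; here it is left pluggable on purpose.
Generate = Callable[[Participant, List[Dict[str, str]]], str]


def run_hearing(applicant: Participant, respondent: Participant,
                judge: Participant, generate: Generate,
                rounds: int = 2) -> dict:
    """Alternate the two litigants for a fixed number of rounds,
    then ask the judge for a decision. Returns a recordable dict."""
    transcript: List[Dict[str, str]] = []
    for _ in range(rounds):
        for speaker in (applicant, respondent):
            text = generate(speaker, transcript)
            transcript.append({"speaker": speaker.name, "text": text})
    decision = generate(judge, transcript)
    transcript.append({"speaker": judge.name, "text": decision})
    return {
        "participants": [asdict(p) for p in (applicant, respondent, judge)],
        "transcript": transcript,
    }


def echo_generate(participant: Participant,
                  transcript: List[Dict[str, str]]) -> str:
    """Placeholder that calls no model, so the sketch runs offline."""
    return f"{participant.name} speaks (turn {len(transcript) + 1})"


if __name__ == "__main__":
    record = run_hearing(
        Participant("Applicant", "Argue for the application."),
        Participant("Respondent", "Argue against the application."),
        Participant("Judge", "Decide who wins and say why.", temperature=0.0),
        generate=echo_generate,
    )
    # Recording results is then just a matter of saving the dict.
    print(json.dumps(record, indent=2))
```

Keeping the model call behind a plain function means each participant can carry different settings, and the same loop can be tested without an API key.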

Eventually, I hope to emulate an experiment like the one in “Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback”. https://arxiv.org/abs/2305.10142

To summarise the paper, the authors managed to get various language models to negotiate the price of a balloon and to learn from their own attempts how to improve their positions. (Spoiler: only some models can do this, and returns diminish really fast.)

Let's see how some of these models do on arguing legal applications, a truly word-generating task. For now, the AI is able to come up with a pretty cogent hearing.

I'll experiment some more and tell you about what I find if it's interesting.

For now, I am getting a bit sceptical about whether the AI was really persuaded by one argument or the other, or whether this is all statistical mumbo jumbo.

I was alerted to this possibility when I asked one litigant to write in the voice of a ten-year-old while the other wrote normally. Besides my impression that there seem to be limits to how far you can direct ChatGPT's way of arguing, I found that the ChatGPT judge selected the ten-year-old to win in a few rounds.

This means that either (a) ChatGPT has no bias in deciding who wins or (b) ChatGPT doesn't care who wins. Of course it is also possible that ChatGPT was able to reach a decision based on the arguments.

Hopefully with more data, I can reach some conclusions. For example, if it is true that decisions are random, we would see the same distribution for difficult hearings and straightforward hearings. If ChatGPT can refer to arguments, we would expect that capable lawyers (assisted by a coach, perhaps) will outperform those with less ability.
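One way to check the "same distribution" idea is a small exact test on a 2×2 table of outcomes. This is only a sketch of how I might do it. The function name `fisher_exact_two_sided` is my own, the table layout (hearing type by winner) is an assumption, and the counts in the usage example are made up purely for illustration.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table

                        applicant wins   respondent wins
        difficult             a                 b
        straightforward       c                 d

    Returns the probability, under the hypothesis that the win rate
    does not depend on hearing type, of a table at least as unlikely
    as the observed one (with the row and column totals held fixed).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x applicant wins in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = prob(a)
    # Small tolerance so ties with the observed table are included.
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p_value)


if __name__ == "__main__":
    # Made-up counts: 20 difficult and 20 straightforward hearings.
    p = fisher_exact_two_sided(11, 9, 17, 3)
    print(f"p-value: {p:.4f}")
```

A large p-value would be consistent with the judge deciding the same way regardless of how strong the case is. A small one would suggest the outcome does track the hearing. With only a handful of rounds, though, the test has little power either way, so this needs more data before it says much.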

I would also like to try some open source models for comparison and experience.

Let the battles begin!

Love.Law.Robots. – A blog by Ang Hou Fu