Love.Law.Robots. by Ang Hou Fu

MachineLearning

Houfu has dreamed of having his own #MachineLearning tool for years, but he will have to wait a little longer.




In 2021, I discovered something exciting — an application of machine learning that was both mind-blowing and practical.

The premise was simple. Type a description of the code you want in your editor, and GitHub Copilot will generate the code. It was terrific, and many people, including myself, were excited to use it.

🚀 I just got access to @github Copilot and it's super amazing!!! This is going to save me so much time!! Check out the short video below! #GitHubCopilot I think I'll spend more time writing function descriptions now than the code itself :D pic.twitter.com/HKXJVtGffm

— abhishek (@abhi1thakur) June 30, 2021

The idea that you can prompt a machine to generate code for you is obviously interesting for contract lawyers. I believe we are getting closer every day. I am waiting for my early access to Spellbook.

As a poorly trained and very busy programmer, I feel like the target audience for GitHub Copilot. The cost was also not so ridiculous (Spellbook Legal costs $89 a month compared to Copilot's $10 a month). Even so, I haven't tried it for over a year. I wasn’t comfortable with the idea, and I wasn’t sure how to express why.

Now I can. I recently came across a website proposing to investigate GitHub Copilot. The main author is Matthew Butterick. He’s the author of Typography for Lawyers, and the site proudly uses the Equity typeface.

GitHub Copilot investigation · Joseph Saveri Law Firm & Matthew Butterick

In short, training GitHub Copilot on the open-source repositories that GitHub hosts probably raises the question of whether such use complies with those repositories' licenses. Is it fair use to use publicly accessible code for computational analysis? You might recall that Singapore recently passed an amendment to the Copyright Act providing an exception for computational data analysis. If GitHub is right that it is fair use, any code anywhere is fair game to be consumed by the learning machine.

Of course, the idea that it might be illegal hasn’t exactly stopped me from trying.

The key objection to GitHub Copilot is that it is not open source. Because it packages the world’s open-source code in an AI model and spits it out to the user with no context, the user only ever interacts with GitHub Copilot. It is, in essence, a coding walled garden.

Copilot introduces what we might call a more selfish interface to open-source software: just give me what I want! With Copilot, open-source users never have to know who made their software. They never have to interact with a community. They never have to contribute.

For someone who wants to learn to code, this enticing idea is probably a double-edged sword. You could probably swim around using prompts with your AI pair programmer, but without any context, you are not learning much. If I wanted to know how something works, I would like to run it, read its code and interact with its community. I am a member of a group of people with shared goals, not someone who just wants to consume other people’s work.

Matthew Butterick might end up with enough material to sue Microsoft, and the legal issues raised will be interesting for the open-source community. For now, though, I am going to stick to programming the hard way.

#OpenSource #Programming #GitHubCopilot #DataMining #Copyright #MachineLearning #News #Newsletter #tech #TechnologyLaw

Author Portrait Love.Law.Robots. – A blog by Ang Hou Fu


I love playing with legal data. For me, books specialising in legal data are uncommon, especially those dealing with what’s available on the wild world of the internet today.

That’s why I snapped up Sarah Sutherland’s “Legal Data and Information in Practice”. Ms Sutherland was CEO of CanLII, one of the most admirable LIIs. CanLII is extensive, comprehensive, and packed with great features like noting up and keywords. It even comes in two languages.

Legal Data and Information in Practice: How Data and the Law Interact. Legal Data and Information in Practice provides readers with an understanding of how to facilitate the acquisition, management, and use of legal data in organizations such as libraries, courts, governments, universities, and start-ups. Presenting a synthesis of information about legal data that will… (Routledge & CRC Press, Sarah A. Sutherland)

The book’s blurb describes it as “essential reading for those in the law library community who are based in English-speaking countries with a common law tradition”.

Having finished the book, I find the blurb’s focus far too narrow. This is a book for anyone who loves legal data.

For one, I enjoyed the approachable language. My interaction with legal data has always been pragmatic. Either I was studying for some course, or I needed to find an answer quickly. If you’ve done either of those things, you will be able to appreciate the book. I liked that it didn’t baffle me with impossible or theoretical language. I found myself nodding at several junctures as I reflected on my own experience of interacting with legal data.

Furthermore, it’s effectively a primer:

  • It’s short. I took a month to finish it at a leisurely pace (i.e., in between taking care of children, making sure the legal department runs smoothly, and programming). Oh, and unlike most law books, it has pictures.
  • It effectively explains a broad range of topics. It talks about the challenges of AI and the political and administrative backgrounds of how legal data is provided without overwhelming you. More impressively, I found new areas in this field that I didn’t know about before reading the book, such as the various strategies to acquire legal data and an overview of statistical and machine learning techniques on data.

So, even if you are not a librarian or a legal technologist by profession, this book is still handy for you. I would love more depth, and maybe that’s some scope for a 2nd edition. In any case, Sarah Sutherland’s “Legal Data and Information in Practice” is a great starting point for everyone. Reading it will level up your ability to discuss and evaluate what’s going on in this exciting field.

* * *

I am sorry for being a sucker: I am the kind of guy who watches movies to swoon at sweeping vistas of my home jurisdiction, Singapore. I enjoyed Crazy Rich Asians, even though it’s fake.

So, I couldn’t resist looking for references to Singapore in the book. Luckily for me, Singapore is mentioned several times. It’s described as “an interesting example of what can happen if a government is willing to invest heavily in developing capacity in legal computing and data use”. I’m not convinced that LawNet is like an LII, but the other points raised hold up: the infrastructure, availability and formats are still much better here than in the rest of the common law world.

The more interesting point is that Singapore, as a small jurisdiction, will usually have a smaller dataset to work with. That’s why experimenting with ways to make models trained on other kinds of data work well on your own is crucial. (I think the paper cited in the book is an excellent example of this.) Other questions become relevant when you have less data and fewer resources: which kinds of legal data to focus on, and which strategies to use to acquire them.

The challenges of a smaller dataset seem to be less exciting because fewer people are staring at them. However, I would suggest that these challenges are more prevalent than you would expect — companies and organisations also have smaller datasets and fewer resources. What would work for Singapore should be of interest to many others.

There’s always something to be excited about in this field. What do you think?

#BookReview #ArtificalIntelligence #DataMining #Law #LegalTech #MachineLearning #NaturalLanguageProcessing #Singapore #TechnologyLaw
