I love playing with legal data. Books specialising in legal data are uncommon, especially ones dealing with what's available in the wild world of the internet today.
That's why I snapped up Sarah Sutherland's "Legal Data and Information in Practice". Ms Sutherland was CEO of CanLII, one of the most admirable LIIs (legal information institutes). CanLII is extensive, comprehensive, and packed with great features like noting up and keywords. It even comes in two languages.
The book's blurb describes it as "essential reading for those in the law library community who are based in English-speaking countries with a common law tradition".
Having finished the book, I find the blurb's focus far too narrow. This is a book for anyone who loves legal data.
For one, I enjoyed the approachable language. My interaction with legal data has always been pragmatic: either I was studying for a course, or I needed to find an answer quickly. If you've done either of those things, you'll have enough background to appreciate the book. I liked that it didn't baffle me with impenetrable or overly theoretical language. I found myself nodding at several junctures as I reflected on my own experience of working with legal data.
Furthermore, it’s effectively a primer:
- It's short. I took a month to finish it at a leisurely pace (i.e., in between taking care of children, making sure the legal department runs smoothly, and programming). Oh, and unlike most law books, it has pictures.
- It clearly explains a broad range of topics. It covers the challenges of AI and the political and administrative background to how legal data is provided, without overwhelming you. More impressively, it introduced me to areas of this field I didn't know about before, such as the various strategies for acquiring legal data and an overview of statistical and machine learning techniques applied to data.
So, even if you are not a librarian or a legal technologist by profession, this book is still handy for you. I would have loved more depth; perhaps there's scope for a second edition. In any case, Sarah Sutherland's "Legal Data and Information in Practice" is a great starting point for everyone. Reading it will level up your ability to discuss and evaluate what's going on in this exciting field.
I'll admit I'm a sucker: I'm the kind of guy who watches movies to swoon at sweeping vistas of my home jurisdiction, Singapore. I enjoyed Crazy Rich Asians, even though it's fake.
So, I couldn't resist looking for references to Singapore in the book. Luckily for me, Singapore comes up several times. It's described as "an interesting example of what can happen if a government is willing to invest heavily in developing capacity in legal computing and data use". I'm not convinced that LawNet is like an LII, but the book's other points hold up: the infrastructure, availability and formats of legal data here are still much better than in the rest of the common law world.
The more interesting point is that Singapore, as a small jurisdiction, usually has a smaller dataset. That's why it is crucial to experiment with making models trained on other kinds of data work effectively on your own. (I think the paper cited in the book is an excellent example of this.) Other questions become relevant when you have less data and fewer resources: which kinds of legal data to focus on, and which strategies to use to acquire them.
The challenges of a smaller dataset seem less exciting because fewer people are staring at them. However, I would suggest that these challenges are more prevalent than you would expect: companies and organisations also have smaller datasets and fewer resources. What works for Singapore should interest many others.
There’s always something to be excited about in this field. What do you think?