Ready to mine free online legal materials in Singapore? Not so fast!

I have been mulling over developing an extensive online database of free legal materials in the flavour of OpenLawNZ or an LII for the longest time. Free access to such materials is one problem to solve, but I'm also hoping to compile a dataset to develop AI solutions. I have tried and demonstrated this with PDPC's data previously, and I am itching to expand the project sustainably.

However, being a lawyer, I am concerned about the legal implications of scraping government websites. Would using these materials be a breach of copyright law? In other countries, people accept that the public should generally be allowed to use such public materials. However, I am not very sure of this here.

The text steps highlightedPhoto by Clayton Robbins / Unsplash

I was thus genuinely excited about the amendments to the Copyright Act in Singapore this year. According to the press release, they will be operational in November, so they will be here soon.

Copyright Bill – Singapore Statutes OnlineSingapore Statutes Online is provided by the Legislation Division of the Singapore Attorney-General’s ChambersSingapore Statutes OnlineThe Copyright Bill is expected to be operationalised in November 2021.

[ Update 21 November 2021: The bill has, for the most part, been operationalised.]

Two amendments are particularly relevant in my context:

Using publicly disclosed materials from the government is allowed

In sections 280 to 282 of the Bill, it is now OK to copy or communicate public materials to facilitate more convenient viewing or hearing of the material. It should be noted that this is limited to copying and communicating it. Presumably, this means that I can share the materials I collected on my website as a collection.

Computational data analysis is allowed.

The amendments expressly say that using a computer to extract data from a work is now permitted. This is great! At some level, the extraction of the material is to perform some analysis or computation on it — searching or summarising a decision etc. I think some limits are reasonable, such as not communicating the material itself or using it for any other purpose.

However, one condition stands out for me — I need “lawful access” to the material in the first place. The first illustration to explain this is circumventing paywalls, which isn’t directly relevant to me. The second illustration explains that obtaining the materials through a breach of the terms of use of a database is not “lawful access”.

That’s a bit iffy. As you will see in the section surveying terms, a website’s terms are not always clear about whether access is lawful or not. The “terms of use” of a website are usually given very little thought by its developers or implemented in a maximal way that is at once off-putting and misleading. Does trying to beat a captcha mean I did not get lawful access? Sure, it’s a barrier to thwart robots, but what does it mean? If a human helps a robot, would it still be lawful?

A recent journal article points to “fair use” as the way forward

I was amazed to find an article in the SAL Journal titled “Copying Right in Copyright Law” by Prof David Tan and Mr Thomas Lee, which focused on the issue that was bothering me. The article focuses on data mining and predictive analytics, and it substantially concerns robots and scrapers.

Singapore Academy of Law Journale-First MenuLink to the journal article on E-First at SAL Journals Online.

On the new exception for computational data analysis, the article argues that the two illustrations I mentioned earlier were “inadequate and there is significant ambiguity of what lawful access means in many situations”. Furthermore, because the illustrations were not illuminating, it might create a situation where justified uses are prohibited. With much sadness, I agree.

More interestingly, based on some mathematics and a survey, the authors argue that an open-ended general fair use defence for data mining is the best way forward. As opposed to a rule-based exception, such a defence can adapt to changes better. Stakeholders (including owners) also prefer it because it appeals to their understanding of the economic basis of data mining.

You can quibble with the survey methodology and the mathematics (which I think is very brave for a law journal article). I guess it served its purpose in showing the opinion of stakeholders in the law and the cost analysis very well. I don’t suspect it will be cited in a court judgement soon, but hopefully, it sways someone influential.

We could use a more developer-friendly approach.

Photo by Mimi Thian / Unsplash

There was a time when web scraping was dangerous for a website. In those times, websites can be inundated with requests by automated robots, leading them to crash. Since then, web infrastructure has improved, and techniques to defeat malicious actors have been developed. The great days of “slashdotting” a website has not been heard of for a while. We’ve mostly migrated to more resilient infrastructure, and any serious website on the internet understands the value of having such infrastructure.

In any case, it is possible to scrape responsibly. Scrapy, for example, allows you to queue requests regularly or identify yourself as a robot or scraper, respecting robots.txt. If I agreed not to degrade a website’s performance, which seems quite reasonable, shouldn’t I be allowed to use it?

Being more developer-friendly would also help government agencies find more uses for their works. For now, most legal resources appear to cater exclusively for lawyers. Lawyers will, of course, find them most valuable because it’s part of their job. However, others may also need such resources because they can’t afford lawyers or have a different perspective on how information can be helpful. It’s not easy catering to a broader or other audience. If a government agency doesn’t have the resources to make something more useful, shouldn’t someone else have a go? Everyone benefits.

Surveying the terms of use of government websites

RTK survey in quarryPhoto by Valeria Fursa / Unsplash

Since “lawful access” and, by extension, “terms of use” of a website will be important in considering the computational data analysis exceptions, I decided to survey the terms of use of various government agencies. After locating their treatment of the intellectual property rights of their materials, I gauge my appetite to extract them.

In all, I identified three broad categories of terms.

Totally Progressive: Singapore Statutes Online 👍👍👍

Source: https://sso.agc.gov.sg/Help/FAQ#FAQ_8 (Accessed 20 October 2021)

Things I like:

Things I don’t like:

Comments:

Totally Bonkers: Personal Data Protection Commission 😖😖😖

Source: https://www.pdpc.gov.sg/Terms-and-Conditions (Accessed 20 October 2021)

Things I like:

Things I don’t like:

Comments:

Totally Clueless: Strata Titles Board 🎈🎈🎈

Materials, including source code, pages, documents and online graphics, audio and video in The Website are protected by law. The intellectual property rights in the materials is owned by or licensed to us. All rights reserved. (Government of Singapore © 2006).
Apart from any fair dealings for the purposes of private study, research, criticism or review, as permitted in law, no part of The Website may be reproduced or reused for any commercial purposes whatsoever without our prior written permission.

Source: https://www.stratatb.gov.sg/terms-of-use.html# (Accessed 20 October 2021)

Things I like:

Things I don’t like:

Comments:

Conclusion

One might be surprised to find that terms of using a website, even when supposedly managed by lawyers, feature unclear, problematic, misleading, and unreasonable terms. As I mentioned, very little thought goes into drafting such terms most of the time. However, they provide obstacles to others who may want to explore new uses of a website or resource. Hopefully, more owners will proactively clean up their sites once the new Copyright Act becomes effective. In the meantime, this area provides lots of risks for a developer.

#Law #tech #Copyright #DataScience #Government #WebScraping #scrapy #Singapore #PersonalDataProtectionCommission #StrataTitlesBoard #DataMining

Author Portrait Love.Law.Robots. – A blog by Ang Hou Fu