Love.Law.Robots. by Ang Hou Fu

Decisions


Regular readers might have noticed the disappearance of articles relating to the Personal Data Protection Commission’s decisions lately. However, as news of the “largest” data breach in Singapore came out, I decided to look into this area again.

My lack of interest paralleled a changing environment, which no longer allowed me to keep up to date on them:

  1. The PDPC removed their RSS feed for the latest updates;
  2. I am not allowed to monitor their website manually; and
  3. The PDPC started issuing shorter summaries of their decisions, which makes their work more opaque and less interesting.

Looking at this area again, I wanted to see whether the insights I gleaned from my earlier data project might hold and what would still be relevant going forward.

Related: Data Science with Judgement Data – My PDPC Decisions Journey. An interesting experiment to apply what I learnt in Data Science to the area of law.

Something big struck, well, actually not much.


The respondent in the case that attracted media attention is RedDoorz, which operates a hotel booking platform in the budget hotel space. The cause of the breach is as sad as it is unremarkable: they had left the keys to their production database in the code of a disused but still available version of their mobile app. Using those keys, bad actors probably exfiltrated the data. This is yet another example of how lazy practices in developing apps can translate to real-world harm. Even their pen tests missed the exposed keys because that version of the app was old.

Read the PDPC’s enforcement decision: Breach of the Protection Obligation by Commeasure.

The data breach is the “largest” because it involved nearly 6 million customers. Given that the population of Singapore is roughly 5.5 million, this probably includes people from around our region.

The PDPC penalised the respondent with a $74,000 fine. This works out to about 1 cent per person. Even though this is the “largest” data breach handled under the PDPA, the PDPC did not use its full power to issue a penalty of up to $1 million. Under the latest amendments, which have yet to take effect, the potential might of the PDPC can be even greater than that.
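To make the arithmetic concrete, here is a minimal sketch. The figures are the ones quoted above, with “nearly 6 million” rounded to 6,000,000 (an assumption), and the function name is made up for illustration.

```python
def penalty_per_person(penalty_dollars: float, people_affected: int) -> float:
    """Return the penalty per affected person, in cents."""
    return penalty_dollars * 100 / people_affected


# $74,000 spread over roughly 6 million affected customers (rounded figure).
cents = penalty_per_person(74_000, 6_000_000)
print(f"{cents:.2f} cents per person")

# Share of the current $1 million cap that this penalty represents.
share_of_cap = 74_000 / 1_000_000
print(f"{share_of_cap:.1%} of the cap")
```

On these rounded inputs, the penalty comes to a little over 1 cent per person and well under a tenth of the cap.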

The decision states that the PDPC took into account the COVID-19 situation and its impact on the hospitality industry in reducing the penalty amount. It would have been helpful to know how much this factor had reduced the penalty to have an accurate view of it.

In any case, this is consistent with several PDPC decisions. Using the filters on the PDPC’s website, only three decisions doled out more than $75,000 in penalties, and a further four doled out more than $50,000. This is among more than 100 decisions with a financial penalty. Even among these rare few cases, only one exercised more than 25% of the current penalty limit. The next highest penalty amounts to only $120,000 (in a high-profile health-related case, too!).

The top of the financial penalty list (As of November 2021). Take note of the financial penalty filters at the bottom left corner.

This suggests that the penalties are, in practice, quite limited. What would it take for the PDPC to penalise an offender? Probably not the number of records breached. Maybe public disquiet?

In a world without data breaches


While the media focuses on financial penalties, I am not a big fan of them.

While doling out “meaningful” penalties strikes a balance between compliance with the law and business interests, there are limits to this approach. As mentioned above, dealing with a risk of $5,000 fines may not be sufficient for a company to hire a team of specialists or even a professional Data Protection Officer. If a company’s best strategy is not to get caught for a penalty, this does not promote compliance with the law at all.

Unfortunately, we don’t live in a world without data breaches. The decisions, including those mentioned above, are filled with human errors. Waiting to get caught for such mistakes is not a responsible strategy. Luckily, the PDPA doesn’t require the organisation to provide bulletproof security measures, only reasonable ones. Then, the crux is figuring out what the PDPC thinks is enough to be reasonable.

So while all these data protection decisions and financial penalties are interesting in showing how others get it wrong, the real gem for the data protection professional in Singapore is finding someone who got it right.

And here’s the gem: Giordano. Now I am sorry I haven’t bought a shirt from them in decades.

There was a data breach, and the suspected cause was compromised credentials. However, the perpetrator did not get far:

  • The organisation deployed various endpoint solutions
  • The organisation implemented real-time system monitoring of web traffic abnormalities
  • Data was regularly and automatically backed up and encrypted anyway

Kudos to the IT and data protection team!

Compared to other “Not in Breach” decisions, this decision is the only one I know to directly link to one of the many guides made by the PDPC for organisations. “How to Guard Against Common Types of Data Breaches” makes a headline appearance in the Summary when introducing the reasonable measures that Giordano implemented.

The close reference to the guides signals that organisations following them can have a better chance of being in the “No Breach” category.

An approach that promotes best practices is arguably more beneficial to society than one that penalises others for making a mistake. Reasonable industry practices must include encrypting essential data and other recommendations from the PDPC. It would need leaders like Giordano, an otherwise ordinary apparel store in many shopping malls, to make a difference.

A call from the undertaking


The final case in this post isn’t found in the regular enforcement decisions section of the PDPC’s website. It is an undertaking.

If you view a penalty as recognising a failure of data protection and no breach as an indicator of its success, the undertaking is that weird creature in between. It rewards organisations that have a data protection system in place for taking the initiative to settle with the PDPC early, but recognises that there are still gaps in its implementation.

I was excited about undertakings and called them the “teeth of the accountability principle”. However, I haven’t found much substance in my excitement, and the parallel with US anti-corruption practices appears unfounded.

Between February 2021, when the undertaking procedure was given legislative force, and November 2021, 10 organisations spanning different industries went through this procedure. In the meantime, the PDPC delivered 21 decisions with a financial penalty, direction or warning. I reckon roughly 30% (10 of the 31 outcomes in that period) is a good indicator that organisations use this procedure when they can.

My beef is that very little information is provided on these undertakings, which appear even shorter than the summaries of enforcement decisions. With so little information, it isn’t clear why these organisations get undertakings rather than penalties.

Take the instant case in November as an example. Do they have superior data protection structures in their organisations? (The organisation didn’t have any and had to undertake to implement something.) Are they all Data Protection Trust Mark organisations? (Answer: No.) Are they minor breaches? (On the surface, I can’t tell. 2,771 users were affected in this case.)

My hunch is that (like the Guide to Active Enforcement says) these organisations voluntarily notified the PDPC with a remediation plan that the PDPC could accept. This is not as easy as it sounds, as you would probably need to engage lawyers and other professionals to navigate your way to that remediation plan.

With very little media attention and even a separate section away from the good and the ugly on the PDPC’s website, the undertaking is likely to be practically the best way for organisations to deal with the consequences of a data breach. Whether the balance goes too far in shielding organisations from them remains to be seen.

Conclusion

Having peeked back at this area, I am still not sure I like what I find. There was a time when there was excitement about data protection in Singapore, and becoming a data protection professional was seen as a viable way to find employment. It would be fascinating to see how much this industry develops. Whether it does or it doesn’t, I believe that the actions and the approach of the PDPC to organisations with data breaches would be a fundamental cause.

Until there is information on how many data protection professionals there are in Singapore and what they are doing, I don’t think you will find many more articles in this area on this blog.



This Features article is a work in progress. If you have any feedback or suggestions, please feel free to contact me!

What's the Point of this List?


Unlike other jurisdictions, Singapore does not have a legal information institute like AustLII or CanLII. Legal information institutes, as defined in the Free Access to Law Movement Declaration:

  • Publish via the internet public legal information originating from more than one public body;
  • Provide free and anonymous public access to that information;
  • Do not impede others from obtaining public legal information from its sources and publishing it; and
  • Support the objectives set out in this Declaration.

We do have an entry on CommonLII, but the resources are not always up to date. Furthermore, the features and usability are worlds apart. (If you wanted to know what AustLII looked like over ten years ago, look at CommonLII.)

This does not mean that free legal resources are non-existent in Singapore. It's just that they are scattered around the internet, with varying levels of availability, coverage and features. Oh, there's also no guarantee they will be around now or in the future.

Related: Ready to mine free online legal materials in Singapore? Not so fast! Amendments to Copyright Act might support better access to free online legal materials in Singapore by robots. I survey government websites to find out how friendly they are to this.

Amendments to the Copyright Act have cleared some air regarding mining, but questions remain.

This post tries to gather all the resources I have found and benchmark them. With some idea of how to extract them, you can plausibly start a project like OpenLawNZ. If you're interested in, say, data protection commission decisions and are toying with the idea of NLPing them, you know where to find the source. Even if you aren't ambitious, you can browse them and add them to your bookmarks. Maybe even archive them if you are so inclined.

Related: Data Science with Judgement Data – My PDPC Decisions Journey. An interesting experiment to apply what I learnt in Data Science to the area of law.

It might be surprising to some, but there's a wealth of material out there if you can find it!

Your comments are always welcome.

Options that aren't free or online


The premier resource for research into Singapore law is LawNet. It offers a pay per use option, but it's not cheap (at minimum $57 for pay per use). There's one terminal available for LawNet at the LCK Library if you can travel to the National Library. I haven't used LawNet since I left practice several years ago. From following the news of its developments, it hasn't departed much from its core purpose and has added several collections that can be very useful for practitioners.

Source: https://eresources.nlb.gov.sg/main/Browse?browseBy=type&filter=10&page=2 (accessed 22 October 2021)

There are also law libraries at the Supreme Court (Level 1) and State Courts (B1) if you're into physical things. They have reasonably good resources for their size, but if you are looking for something very specialized, you might be trying your luck here.

Supreme Court of Singapore


As the apex court in Singapore, the resources available for free here are top-notch. The Supreme Court covers the entire gamut, from the High Court and the Court of Appeal to the Singapore International Commercial Court and all other courts in between.

The Supreme Court has been steadily (and stealthily) expanding its judgements section. They now go back to 2000, and have basic search functionality and some tagging. Judgements only cover written judgements, which are “generally issued for more complex cases or where they involve questions of law which are of public interest”. In other words, the High Court prepares them for possible appeals, and the Court of Appeal prepares them for stare decisis. As such, they don't cover all the work that the courts here do. Relying on them to study the court's work (beyond the development of law) can give a biased picture. There's no API access.

Hearing lists are available for the current week and the following week and then sorted by judges. You can download them in PDF. Besides information relating to when the hearing is fixed, you can see who the parties are and skeletal information on the purpose of the hearing. There's no API access.

Court records aren't available to the public online. Inspection of case files by the public requires permission, and fees apply.

Related: New homes for judgements in the UK... and Singapore? I look at envy in the UK while exploring some confusing changes in the Singapore Supreme Court website.

The Supreme Court may be the apex court in Singapore, but its judgements reveal that there is a real mess in here.

State Courts

A rung lower than the Supreme Court, the State Courts generally deal with more down-to-earth civil and criminal matters. They long felt neglected in an older building (though an interesting one for an architecture geek), but they changed their name (from Subordinate Courts to State Courts) and moved to a spanking new nineteen-storey building in the last few years. If you watch a lot of local television, this is the court where embarrassed respondents dash past the media scrum.

Unfortunately, judgements are harder to find at this level. The only free resource is a LawNet section that covers written judgements for the last three months.

Written judgements are prepared pretty much only when a decision is appealed to the Supreme Court. This means that the judgements you can see there represent a relatively small and biased microcosm of work in the State Courts. In short, appeals at this level are restricted by law, and these restrictions are significant barriers in civil cases where costs are an issue. Such restrictions are less pronounced in criminal cases. The Public Prosecutor appeals every case that does not meet its expectations. The accused appeals every case... well, because they might want to see the written judgment so that they can decide if they're going to appeal. This might explain why there are several more criminal cases available than civil matters. On the other hand, the accused or litigant who wants to get the case over and done with doesn't appeal.

Related: NUS cases show why judge analytics is needed in Singapore. Throwing anecdotes around fails to convince any side of the situation in Singapore. The real solution is more data.

Due to the lack of public information on how judges decide cases, it's difficult to get a common understanding of what they do.

Hearing lists are available for civil trials and applications, criminal trials and tribunal matters in the coming week. It looks like an ASP.NET frontend with a basic search function. Besides information relating to when the hearing is fixed, you can see who the parties are and very skeletal information on what the hearing is about. There's no API access.

Court records aren't available to the public online. Inspection of case files by the public requires permission, and fees apply.

The State Courts have expanded their scope with several new courts in recent years, such as the Protection from Harassment Courts, Community Dispute Resolution Centre and Labour Claims Tribunal. None of these courts publishes its judgements on a regular basis. As they rarely get appealed, you will also not find them in the free section of LawNet.

Legislation


Singapore Statutes Online is the place to get legislation in Singapore. It contains historical versions of legislation, current editions, repealed versions, subsidiary legislation and bills.

When the first version was released in 2001, it was quite a pioneer. Today many countries provide their legislation in snazzier forms. (I am a fan of the UK's version.)

While there isn't API access (and extraction won't be easy due to the extensive use of not so semantic HTML), you can enjoy the several RSS feeds littered around every aspect of the site.

I consider SSO to be very fast and regularly updated. However, if you need an alternative site for bills and acts, you can consider Parliament's website.



Introduction

Over the course of 2019 and 2020, I embarked on a quest to apply the new things I was learning in data science to my field of work in law.

The dataset I chose was the enforcement decisions from the Personal Data Protection Commission in Singapore. The reason I chose it was quite simple. I wanted a simple dataset that covers a limited number of issues and is pretty much independent (not affected by stare decisis or extensive references to legislation or other cases). Furthermore, during that period, the PDPC was furiously issuing several decisions.

This experiment proved to be largely successful, and I learned a lot from the experience. This post gathers all that I have written on the subject at the time. I felt more confident to move on to more complicated datasets like the Supreme Court Decisions, which feature several of the same problems faced in the PDPC dataset.

Since then, a lot has changed. The website, for one, is different, so your extraction methods would be different too. I haven't really maintained the code, so it is not intended for creating your own dataset and analysis today. However, the techniques are still relevant, and I hope they still point you in a good direction.

Extracting Judgement Data


The first step in any data science journey is to extract data from a source. In Singapore, one can find judgements from courts on websites for free. You can use such websites as the source of your data. API access is usually unavailable, so you have to look at the webpage to get your data.

It's still possible to download everything by clicking on it. However, you wouldn't be able to do this for an extended period of time. Automate the process by scraping it!

Automate Boring Stuff: Get Python and your Web Browser to download your judgements

I used Python and Selenium to access the website and download the data I want. This included the actual judgement. Metadata, such as the hearing date etc., are also available conveniently from the website, so you should try and grab them simultaneously. In Automate Boring Stuff, I discussed my ideas on how to obtain such data.
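To illustrate the idea of grabbing the judgement link and its metadata in one pass, here is a minimal sketch. It is not the Selenium code from the original project: it swaps in the standard library's html.parser so that it runs without a browser, and the HTML snippet, class name and file paths are all made up for illustration. A real listing page will be structured differently.

```python
from html.parser import HTMLParser


class DecisionListParser(HTMLParser):
    """Collect one record per <li class="decision"> in a listing page."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None   # record being filled in, if any
        self._field = None     # text field we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "decision":
            self._current = {"title": "", "date": "", "pdf": None}
        elif self._current is not None:
            if tag == "a":
                self._current["pdf"] = attrs.get("href")
                self._field = "title"
            elif tag == "time":
                self._field = "date"

    def handle_data(self, data):
        if self._current is not None and self._field:
            self._current[self._field] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("a", "time"):
            self._field = None
        elif tag == "li" and self._current is not None:
            self.records.append(self._current)
            self._current = None


# Hypothetical listing page; the real site's markup is an unknown here.
SAMPLE_PAGE = """
<ul>
  <li class="decision"><a href="/files/decision-a.pdf">Example Org A</a>
      <time>2021-11-01</time></li>
  <li class="decision"><a href="/files/decision-b.pdf">Example Org B</a>
      <time>2021-10-15</time></li>
</ul>
"""

parser = DecisionListParser()
parser.feed(SAMPLE_PAGE)
for record in parser.records:
    print(record)
```

Keeping the title, date and download link together in one record means the metadata never gets separated from the judgement it describes.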

Processing Judgement Data in PDF


Many judgements available online are in PDF format. They look great on your screen but are very difficult for robots to process. You will have to transform this data into a format that you can use for natural language processing.

I took a lot of time on this as I wanted the judgements to read like a text. The raw text that most (free) PDF tools can provide consists of the various text boxes the tool can find, joined together. This worked all right for the most part, but if the text ran across pages, it would get mixed up with the headers and footers. Furthermore, the extraction produced lines of text, not paragraphs. As such, additional work was required.

Firstly, I used regular expressions. This allowed me to detect unwanted data such as carriage returns, headers and footers in the raw text matched by the search term.
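As a rough idea of what this regular expression pass can look like, here is a minimal sketch. The two patterns are hypothetical: real headers and footers depend on the documents, and the sample text is made up.

```python
import re

# Hypothetical patterns; adjust them to whatever your PDFs actually contain.
PAGE_NUMBER = re.compile(r"^\s*(Page\s+)?\d+(\s+of\s+\d+)?\s*$", re.IGNORECASE)
RUNNING_HEADER = re.compile(r"^\s*\[\d{4}\]\s+SGPDPC\s+\d+\s*$")


def clean_raw_text(raw: str) -> list[str]:
    """Drop carriage returns, blank lines, page numbers and running headers."""
    kept = []
    for line in raw.replace("\r", "").split("\n"):
        if not line.strip():
            continue
        if PAGE_NUMBER.match(line) or RUNNING_HEADER.match(line):
            continue
        kept.append(line.strip())
    return kept


# Made-up raw text, standing in for what a PDF tool might return.
raw_text = (
    "[2099] SGPDPC 1\r\n"
    "The Organisation failed to put in place\r\n"
    "reasonable security arrangements.\r\n"
    "Page 2 of 10\r\n"
)
print(clean_raw_text(raw_text))
```

Note that this still leaves you with lines rather than paragraphs, and a legitimate line consisting only of a number would be thrown away too.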

I then decided to use machine learning to train my computer to decide whether to keep a line or reject it. This required me to create a training dataset and tag which lines should be kept as the text. This was probably the fastest machine-learning exercise I ever came up with.

However, I didn't believe I was getting significant improvements from these methods. The final solution was actually fairly obvious. Using the formatting information of how the text boxes were laid out in the PDF, I could make reasonable conclusions about which text was a header or footer, a quote or a start of a paragraph. It was great!
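To give a flavour of the layout-based approach, here is a minimal sketch. It is not the original code: the TextBox class, the page height of 842 points (A4) and the margin and indent thresholds are all assumptions made up for illustration, and the coordinates are hard-coded rather than read from a real PDF.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    text: str
    x0: float  # left edge, in points from the left of the page
    y0: float  # bottom edge, in points from the bottom of the page
    y1: float  # top edge, in points from the bottom of the page


def classify(box: TextBox, page_height: float = 842.0,
             margin: float = 60.0, body_indent: float = 72.0) -> str:
    """Label a text box using only where it sits on the page."""
    if box.y0 >= page_height - margin:
        return "header"            # sits entirely in the top margin
    if box.y1 <= margin:
        return "footer"            # sits entirely in the bottom margin
    if box.x0 >= body_indent + 30:
        return "quote"             # indented well past the body text
    if box.x0 > body_indent:
        return "paragraph_start"   # slight first-line indent
    return "body"


# Hypothetical boxes, as a layout-aware PDF tool might report them.
boxes = [
    TextBox("[2099] SGPDPC 1", 72, 800, 812),
    TextBox("The Organisation operates a website", 80, 700, 712),
    TextBox("which collects personal data.", 72, 686, 698),
    TextBox("An organisation shall protect personal data", 108, 660, 672),
    TextBox("5", 290, 30, 42),
]
labels = [classify(box) for box in boxes]
print(labels)
```

The appeal of this approach is that the rules describe the page rather than the words, so they carry over to every decision that shares the same template.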

Natural Language Processing + PDPC Decisions = 💕


With a dataset ready to be processed, I decided that I could finally use some of the cutting-edge libraries I have been raring to use, such as spaCy and Hugging Face.

One of the first experiments was to use spaCy's rule-based Matcher to extract enforcement information from the summary provided by the authorities. As the summary was fairly formulaic, it was possible to extract whether the authorities imposed a penalty or took other enforcement actions.
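The rule-based extraction described above can be sketched as follows. To keep the example dependency-free, it uses plain regular expressions as a stand-in for spaCy, so it is not the original approach. The summary wording, the keyword list and the function name are all assumptions for illustration.

```python
import re

# Hypothetical rules, modelled on the formulaic style of the summaries.
PENALTY = re.compile(r"financial penalty of \$([\d,]+)", re.IGNORECASE)
OTHER_ACTIONS = ("directions", "warning")


def extract_enforcement(summary: str) -> dict:
    """Pull the penalty amount and other enforcement actions from a summary."""
    match = PENALTY.search(summary)
    amount = int(match.group(1).replace(",", "")) if match else 0
    lowered = summary.lower()
    actions = [word for word in OTHER_ACTIONS if word in lowered]
    return {"penalty": amount, "actions": actions}


# Made-up summaries, not quotations from actual decisions.
penalty_summary = (
    "A financial penalty of $74,000 was imposed on the organisation for "
    "failing to put in place reasonable security arrangements."
)
warning_summary = "A warning was issued to the organisation."

print(extract_enforcement(penalty_summary))
print(extract_enforcement(warning_summary))
```

A token-based matcher is more forgiving of variations in spacing and wording than these patterns, which is one reason to prefer it once the summaries stop being perfectly formulaic.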

I also wanted to undertake key NLP tasks using my prepared data. This included tasks like Named Entity Recognition (does the sentence contain any special entities), summarisation (extract key points in the decision) and question answering (if you ask the machine a question, can it find the answer in the source?). To experiment, I used the default pipelines from Hugging Face to evaluate the results. There are clearly limitations, but the results are very exciting as well!

Visualisations


Visualisations are very often the result of the data science journey. Extracting and processing data can be very rewarding, but you would like to show others how your work is also useful.

One of my first aims in 2019 was to show how PDPC decisions have changed since they were first issued in 2016. Decisions became greater in number, more frequent, and shorter in length. There was clearly a shift and an intensifying of effort in enforcement.

I also wanted to visualise how the PDPC was referring to its own decisions. Such visualisation would allow one to see which decisions the PDPC was relying on to explain its decisions. This would definitely help to narrow down which decisions are worth reading in a deluge of information. As such, I created a network graph and visualised it. I called the result my “Star Map”.
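The data behind such a network graph can be sketched with the standard library alone. This is a minimal illustration, not the original code: the decision texts and the citations between them are invented (note the deliberately impossible years), and only the counting step is shown, not the drawing.

```python
import re
from collections import Counter

CITATION = re.compile(r"\[\d{4}\] SGPDPC \d+")

# Invented decision texts, keyed by their own (equally invented) citation.
decisions = {
    "[2094] SGPDPC 3": "As held in [2091] SGPDPC 1 and [2092] SGPDPC 4, ...",
    "[2093] SGPDPC 2": "Following [2091] SGPDPC 1, the organisation ...",
    "[2092] SGPDPC 4": "See [2091] SGPDPC 1.",
    "[2091] SGPDPC 1": "The Commission finds ...",
}

# Directed edges: citing decision -> cited decision (self-citations dropped).
edges = [
    (source, target)
    for source, text in decisions.items()
    for target in sorted(set(CITATION.findall(text)))
    if target != source
]

# In-degree: how often each decision is relied on by other decisions.
in_degree = Counter(target for _, target in edges)
for citation, count in in_degree.most_common():
    print(citation, count)
```

The decisions with the highest in-degree are the ones most worth reading first, and they are the ones that end up as the brightest points on a map like this.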

Data continued to be very useful in shaping the conclusions I drew about the enforcement situation in Singapore. For example, how great an impact would the increase in maximum penalties in the latest amendments to the law have? Short answer: Probably not much, but they still have a symbolic effect.

What's Next?

As mentioned, I have been focusing on other priorities, so I haven't been working on PDPC-Decisions for a while. However, my next steps were:

  • I wanted to train a machine to process judgements for named entity recognition and summarization. For the second task, one probably needs to use a transformer in a pipeline and experiment with what works best.
  • Instead of using Selenium and Beautiful Soup, I wanted to use scrapy to create a sustainable solution to extract information regularly.

Feel free to let me know if you have any comments!



This post is part of a series relating to the amendments to the Personal Data Protection Act in Singapore in 2020. Check out the main post for more articles!

When the GDPR made its star turn in 2018, the jaw-dropping penalties drew a lot of attention. Up to €20 million, or up to 4% of the annual worldwide turnover of the preceding financial year, whichever is greater, was at stake. Several companies scrambled to get their houses in order. For the most part, the authorities have followed through. We are expecting more too. Is this the same with the Personal Data Protection Act in Singapore too?

Penalties will increase under the latest PDPA amendments.

The financial penalties under Singapore’s Personal Data Protection Act probably garner the most attention. They are still newsworthy even though they have been issued regularly since 2016. The most famous data breach concerning SingHealth resulted in a total penalty of S$1 million. The maximum penalty of $1 million is not negligible. It’s not hypothetical either.

The newest PDPA amendments will now increase the maximum penalty to up to 10% of an organisation’s annual gross turnover in Singapore. To help imagine what this means: According to Singtel’s Annual Report in 2020, operating revenues for Singapore consumers were S$2.11b. The maximum penalty would be at least S$200m.
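The estimate above can be checked with a few lines. This is only a sketch: it applies the 10% rate to the single revenue figure quoted here, which is an assumption, since an organisation's full annual turnover in Singapore would be larger than one segment.

```python
singapore_consumer_revenue = 2.11e9   # S$, the figure quoted above
cap_rate = 0.10                       # 10% of annual turnover in Singapore
old_cap = 1_000_000                   # S$, the maximum penalty before the amendments

max_penalty = singapore_consumer_revenue * cap_rate
print(f"S${max_penalty / 1e6:.0f}m")

# How many times larger than the old cap this would be.
print(f"about {max_penalty / old_cap:.0f} times the old cap")
```

Because only one revenue segment goes in, the result is a floor rather than the actual cap, which is why the text says “at least”.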

Is this the harbinger of doom and gloom for local companies? Will local companies scramble to hire personal data specialists like for the GDPR? Will an army of lawyers be groomed to fine-comb previous PDPC decisions to distinguish their clients' cases? Is my CIPP/A finally worth something?

Penalties imposed under the PDPA appear limited.

Before trying to spend on compliance, savvier companies would want to find out more about how the Personal Data Protection Commission enforces the PDPA. This makes sense. The costs of compliance have to be rational in light of the risks. If the dangers of being susceptible to a financial penalty are valued at $5,000, it makes no sense to hire a professional at $80,000 a year. If liability for data breaches is a unique and rare event, hiring a firm of lawyers to defend you in that event is better than hiring a professional every day to prevent it.

So here is the big question: What’s the risk of being penalised $1 million or gasp(!) at least $200 million?

Unfortunately, one does not need a big data science chart to realise that being penalised $1 million is a rare event. Being penalised $100,000 is also a rare event. Using the filters from the PDPC’s decisions database reveals a total of 2 cases with financial penalties greater than $75,000 since 2016.

Screen capture of filters of PDPC decisions with financial penalties of more than $75,000. (As of October 2020)

However, if you insist on having a “big data science chart”, here’s one I created anyway:

Histogram of the number of cases binned on enforcement value.

Notes:

  • I excluded the SingHealth penalties ($750K and $250K) because they were outliers.
  • It’s named “enforcement value” and not “penalty sum” because I considered warnings and directions to have $0 as a financial penalty.

The “big data science chart” tells the same story as the PDPC’s website. Most financial penalties fall within the $0 to $35,000 range, with the mean penalty being less than $10,000. While the PDPC certainly has the power to impose a $1 million penalty, it appears to flex around 1% of its capabilities most of the time.
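For anyone who wants to build a similar chart, here is a minimal sketch of the binning and the mean. The enforcement values are invented for illustration and are not the PDPC's actual figures. The $5,000 bin width is also an assumption.

```python
from collections import Counter
from statistics import mean

# Invented enforcement values in dollars; warnings and directions count as 0.
enforcement_values = [0, 0, 0, 3_000, 5_000, 5_000, 8_000, 12_000, 20_000, 34_000]

BIN_WIDTH = 5_000


def bin_label(value: int) -> str:
    """Return the label of the $5,000-wide bin that contains value."""
    low = (value // BIN_WIDTH) * BIN_WIDTH
    return f"${low:,} to ${low + BIN_WIDTH - 1:,}"


# Count how many cases fall into each bin, then print a text histogram.
histogram = Counter(bin_label(v) for v in enforcement_values)
for label, count in histogram.items():
    print(f"{label:>20} {'#' * count}")

print(f"mean: ${mean(enforcement_values):,.0f}")
```

Treating warnings and directions as $0 pulls the mean down, which is worth remembering when comparing it with an average computed over financial penalties only.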

Past performance does not represent future returns. However, the amendments to the PDPA were not supposed to represent a change to the PDPC’s practices. They are for “flexibility” and to match other areas like the Competition Act. There is very little indication that an increase in the financial cap now means that companies will be liable for more.

Why are the penalties so low?

The decisions cite several factors in determining the amount of penalty – the number of individuals affected, the significance of the data lost and even whether the respondent cooperated with the PDPC.

In Horizon Fast Ferry, the PDPC cited the “ICO Guidance on Monetary Penalties” as a principle in determining monetary penalties:

The Commissioner’s underlying objective in imposing a monetary penalty notice is to promote compliance with the DPA or with PECR. The penalty must be sufficiently meaningful to act both as a sanction and also as a deterrent to prevent non-compliance of similar seriousness in the future by the contravening person and by others.

The key phrase in the quote is “sufficiently meaningful”. Given the PDPC’s desire to promote businesses, the PDPC would not like to kill off a company by imposing a crippling penalty. The penalties serve a signalling purpose. As they continue to attract public attention and encourage companies to comply, penalties are the most effective tool in the PDPC’s arsenal.

However, even if the penalties are “sufficiently meaningful” in an objective sense, they may still be meaningless subjectively. $5,000 might be peanuts to a large business. Some businesses may even treat it as a cost of “innovation”. PDPC decisions are replete with “repeat” offenders. Breaking the PDPA, for example, seems to be a habit for Grab.

While doling out “meaningful” penalties strikes a balance between compliance with the law and business interests, there are limits to this approach. As mentioned above, dealing with a risk of $5,000 fines may not be sufficient for a company to hire a team of specialists or even a professional Data Protection Officer. If a company’s best strategy is not to get caught for a penalty, this does not promote compliance with the law at all.

Moving beyond penalties

I am not a fan of financial penalties. I have always viewed them as a “transaction”, so they never really serve the spirit of compliance.

Asking companies to comply with directions may be far more punishing than doling out a fine. A law firm might help you negotiate the best directions you can get, but the company has to implement them through its employees. The company will need data protection specialists. This approach is more effective than just essentially issuing a company a ticket.

For this reason, I was pretty excited about the PDPC’s Active Enforcement guidelines. Here’s something to watch out for: a new section on undertakings appeared last month.

Conclusion

Still, I am probably an outlier in this regard. The increased penalty cap has repeatedly featured as one of the most critical changes in the PDPA. Experience does not suggest that a higher cap will change much. Nevertheless, as a signal, the news would probably make management sit up and review their data protection policies. Data Protection Officers should take advantage of the new attention to polish up their data protection policies and practices.

This post is part of a series on my Data Science journey with PDPC Decisions. Check it out for more posts on visualisations, natural language processing, data extraction and processing!
