Love.Law.Robots. by Ang Hou Fu



Bar admission applications in Singapore are mainly administrative and symbolic affairs. If you missed the big one in July/August, you would gather in a chamber and have your admission acknowledged by a High Court judge. For some, that would be the last time they would ever see a court robe.

In a rare show of drama, six applicants had to wait for their admission to the bar. Five of them had cheated on the bar exam in 2020 by sharing their answers on WhatsApp; one colluded with another but fought the charges. All of them had since retaken their exams and passed. The Attorney-General proposed that their admission to the bar be delayed by six months or a year so that they could “reflect on their error”. Choo J agreed.

You can read the full facts and reasons in Choo Han Teck J’s judgement of the case, [2022] SGHC 87, here.

Update: Choo J originally decided to anonymise and seal the case so that the applicants’ identities would not be revealed: “strong sentiments may sometimes interfere with the proper understanding of the idea of second chances.” He reversed his decision on 27 April.

Choo J’s concluding remarks, in his characteristic brevity, are worth reproducing:

Measuring justice is never an easy task. Judges are ever mindful not to set standards that they themselves cannot achieve. They are loathe to shut the door on a wrongdoer with no prospects of redemption. But they also have a duty to prevent a repeat of the wrong, and to do so without breaking young backs in the process.

Some might claim that their treatment is too lenient. Don’t we expect lawyers to represent the highest standards of honesty and integrity? Wouldn’t cheating in an exam for bar admission strike at the heart of all that?

However, if the bar exam is supposed to show one’s readiness to become a lawyer, I start to feel conflicted. Do we expect lawyers to collaborate, or to show off their mettle doggedly on their own? Which approach would likely result in a better product for the court or the client is obvious.

If you start walking down that path, how we conduct bar exams becomes questionable. How much of the civil procedure we learnt then is relevant today? Does everyone need to know family law when only a small subset of us will specialise in it? Do we need to test people who recently graduated from law school on all the things they learnt in law school again (or find something they might have missed)?

I remember very little of what I studied or was tested on in my bar exam. Indeed, this shows how little bearing the bar exam has on my work in the law today.

This incident is an awkward reminder of how relevant the bar exam is today. Interestingly, other jurisdictions are radically relooking at the bar exam, though they have not taken that step yet. I like how this Above the Law article summarised the nub of the issue.

The bar exam has been a rite of passage barrier to entry for lawyers in America since the late 1800s. After more than 130 years of forcing would-be lawyers to go through months of intense study of laws they’ll never need to know in actual practice, the bar exam will finally be changing — four years from now.

Ideally, the new test will focus on seven skill areas, including client counselling and advising; client relationships and management; legal research; legal writing; and negotiation. It hasn’t been implemented yet, and it’s easy to be cynical about this.

Cheating should not be allowed on a test to assess your capability. But unwittingly these applicants might have drawn attention to something worth considering: what is the place of the bar exam, and is it instrumental in transitioning a student to practice? The absurd result is that those who wish to be admitted to the bar might have to learn to cheat on the bar exam to prepare them for the real world.

#Singapore #Law #Lawyers #Training #Ethics #SupremeCourtSingapore #Judgements #Updated

Author Portrait Love.Law.Robots. – A blog by Ang Hou Fu


Update 11 May 2020 : A few days after I wrote this post, Pi-Hole released version 5.0. Some of the new features impact the content here. Since it’s only been days, I have updated the content accordingly.

It was a long weekend, so it was time to play. Ubuntu 20.04 LTS had just come out. This is important because of the “LTS” at the back of its name. I took the opportunity to upgrade “Ursula”, my home server. I haven’t been installing OSes as often as I change clothes since high school, but I had big plans for this one.

Ad Blocking on a Network Level

Securing your internet is tough. I have “fond” memories of the earlier days of the internet, when merely browsing could expose you to porn, or to Flash movies that installed software on your computer. It now seems quaint that people are surprised they can be tricked over the internet with phishing and social engineering.

I value my privacy, and I would like to control what goes on about me and my computers. I don’t like ads or tracking technologies. More people seem to be on my side on this one, with every browser now claiming that it will block ads or trackers.

Browsers are important because they are the main window for ads and trackers. However, other devices also generate such risks: handphones, smart gadgets, and other internet-connected devices.

If you are accessing the internet outside of your browser, your browser won’t protect you. The more comprehensive solution is to protect yourself at the network level.

To protect yourself at the network level, you adjust your internet router settings and how your internet traffic is processed so that all requests are caught. A blacklist of trackers and suspicious websites is usually maintained; if a query matches the blacklist, it is not processed.
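The decision at the heart of this is simple enough to sketch. Here is a toy model in Python (the domains and addresses are made up for illustration; real blockers like Pi-Hole work at the DNS protocol level, typically by answering blocked queries with a sinkhole address):

```python
# A toy model of network-level ad blocking: every DNS query is
# checked against a blacklist before it is resolved.
BLACKLIST = {"tracker.example.com", "ads.example.net"}


def resolve(domain: str) -> str:
    """Return an address for allowed domains, a sinkhole for blocked ones."""
    if domain in BLACKLIST:
        return "0.0.0.0"  # blocked: the request goes nowhere
    # Stand-in for a real upstream DNS lookup.
    return "93.184.216.34"


print(resolve("ads.example.net"))  # blocked
print(resolve("example.com"))      # allowed
```

Because every device on the network sends its DNS queries through the same resolver, one blacklist protects phones, smart gadgets, and computers alike.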

As you might expect, fiddling with your internet router settings, finding out what your ISP’s upstream servers are, or even niggling around config files is very daunting for most users.

Enter the Pi-Hole

I first learned about Pi-Hole through the DigitalOcean Marketplace. It was great that it was designed for containers from the start, because I wanted “Ursula” to serve services using containers instead of the complexity of figuring out Ubuntu Linux’s oddities.

Pi-hole Web Page: you can run Pi-hole in a container, or deploy it directly to a supported operating system via their automated installer.

Previously, I implemented my internet blacklist using response policy zones in a bind9 server. I am not entirely sure how I did it… which would be a disaster if my server ever got wiped out.

The best thing about Docker is that you write the configuration in one file (a docker-compose.yml for me) and it’s there. Once you have reviewed the configuration, you just call docker-compose up and the program starts up for you.

Once you have the server running, you can ogle at its work with pi-hole’s gorgeous dashboard:

So many queries, so many blocked. ( Update 11/5/20 : Screenshot updated to show the new version 5.0 interface. So many bars now!)

I could make a few conclusions from the work of my Pi-Hole server so far:

  • Several queries were blocked from my handphone. This shows that phones are a hotbed for ad trackers. Since most of us use our phones for web browsing, advertising on the internet has not taken a hit even though more browsers feature some form of adblocking.
  • The second chart (labelled “Clients (over time)”) roughly corresponds to the computers used during the day. During this circuit breaker period, you can see your work computers dialling “home”. At night, more home computers are sending queries.

Installation Headaches

Using Pi-Hole as a local LAN DNS server

My previous LAN DNS server was meant to serve DNS queries for my home network. My home server and Network Attached Storage device were its main customers. I also exposed some of the services (like my Plex) to the outside world. If my LAN DNS server were not around, I would have to remember many octets (read: IP addresses).

Update 11/5/2020: In the original post, I complained that setting local LAN hostnames was hidden away. Version 5.0 now allows you to set hostnames through the admin dashboard. This is one feature that I will be using! It turns out it was quick and easy!

The dashboard used to add local DNS domains. New in version 5.0.

Installing Pi-Hole Behind a Traefik Server/Reverse Proxy

I didn’t wreck my Ubuntu 18.04 LTS server just so that I could install Pi-Hole. I wanted to serve several services through my home server without being limited by one set of 80 (HTTP) and 443 (HTTPS) ports. Pi-Hole uses both of those ports; without a workaround, I would not be able to run any more web servers.

A reverse proxy routes a request to the correct server. My forays with Nginx and Traffic Server had not been successful. Traefik got me curious because it claimed it could figure out configurations automatically. If I could get Traefik to work, it could sort out how to have several applications on one host!
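The routing idea itself fits in a few lines. This toy Python sketch (the hostnames and ports are illustrative, and this is not how Traefik is implemented) maps the Host header of an incoming request to the internal service that should handle it:

```python
# One public IP and one set of ports, many internal services:
# the reverse proxy picks a backend based on the requested hostname.
ROUTES = {
    "pihole.home": "http://127.0.0.1:8052",   # Pi-Hole admin page
    "plex.home": "http://127.0.0.1:32400",    # Plex media server
}


def route(host: str, default: str = "http://127.0.0.1:8080") -> str:
    """Pick the backend that should serve a request for this Host header."""
    return ROUTES.get(host, default)


print(route("pihole.home"))
```

Traefik builds this table for you by watching Docker events and reading container labels, which is exactly why it suited my docker-compose setup.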

Traefik, The Cloud Native Application Proxy | Traefik Labs: Traefik is the leading open-source reverse proxy and load balancer for HTTP and TCP-based applications that is easy, dynamic and full-featured.

So getting Traefik to work was a priority, but I also really wanted to set up Pi-Hole first. Curiously, there aren’t many resources on getting both to work together correctly. Since this was the first time I was using both Traefik and Pi-Hole, I had to experiment a lot. In the end, I settled on this configuration in my docker-compose file:

version: '3'

services:
  reverse-proxy:
    # The official v2 Traefik docker image
    image: traefik:v2.2
    container_name: traefik
    # Enables the web UI and tells Traefik to listen to docker
    command: --api.insecure=true --providers.docker
    ports:
      # The HTTP/HTTPS ports
      - "80:80"
      - "443:443"
      # The Web UI (enabled by --api.insecure=true)
      - "8080:8080"
    volumes:
      # So that Traefik can listen to the Docker events
      - /var/run/docker.sock:/var/run/docker.sock
      - /home/houfu/traefik/:/etc/traefik/
    environment:
      DO_AUTH_TOKEN: [... Token provided by Digital Ocean for SSL certificate generation]
    restart: unless-stopped

  ### pi-hole
  pihole:
    container_name: pihole
    domainname: xxx.home
    hostname: pihole
    image: pihole/pihole:latest
    dns:
      - 127.0.0.1
      - 1.1.1.1
    ports:
      - '0.0.0.0:53:53/tcp'
      - '0.0.0.0:53:53/udp'
      # - '0.0.0.0:67:67/udp'
      - '0.0.0.0:8052:80/tcp'
      - "0.0.0.0:8443:443/tcp"
    volumes:
      - ./etc-pihole/:/etc/pihole/
      - ./etc-dnsmasqd/:/etc/dnsmasq.d/
      # run `touch ./pihole.log` first unless you like errors
      # - ./pihole.log:/var/log/pihole.log
    environment:
      ServerIP: 192.168.2.xxx
      PROXY_LOCATION: pihole
      VIRTUAL_HOST: pihole.xxx
      VIRTUAL_PORT: 80
      TZ: 'Asia/Singapore'
      WEBPASSWORD: PASSWORD
      DNS1: [VQ Server 1]
      DNS2: [VQ Server 2]
    restart: unless-stopped
    labels:
      # required when using --docker.exposedbydefault=false
      - "traefik.enable=true"
      # https://www.techjunktrunk.com/docker/2017/11/03/traefik-default-server-catch-all/
      - "traefik.frontend.rule=HostRegexp:pihole.xxx,{catchall:.*}"
      - "traefik.frontend.priority=1"
      - "traefik.backend=pihole"
      - "traefik.port=80"
      - "traefik.port=443"

(Some private information, like the names of my private servers and the IP of my ISP’s DNS servers, have been anonymised.)

Conclusion

I could not have done this without the copious time at home created by the circuit breaker. For now, though, I hope I can run this and many other experiments on this server and report on them on this blog. Is there something I should try next? Let me know in the comments!

#blog #tech #docker #DigitalOcean #Updated #OpenSource



This post is part of a series on my Data Science journey with PDPC Decisions. Check it out for more posts on visualisations, natural language processing, data extraction and processing!

Update 13 June 2020: “At least until the PDPC breaks its website.” How prescient… about three months after I wrote this post, the structure of the PDPC’s website was drastically altered. The concepts and the ideas in this post haven’t changed, but the examples are outdated. This gives me a chance to rewrite this post. If I ever get round to it, I’ll provide a link.

Regular readers will already know that I run a GitHub repository which tries to compile all personal data protection decisions in Singapore. Decisions are useful resources teeming with information: they have statistics, insights into what factors are relevant in decision-making, and show that data protection enforcement is active in Singapore. Even basic statistics about decisions make newspaper stories locally. It would be great if there were a way to mine all that information!

houfu/pdpc-decisions (GitHub): Data Protection Enforcement Cases in Singapore.

Unfortunately, using the Personal Data Protection Commission in Singapore’s website to download judgements can be painful.

This is our target webpage today – Note the website has been transformed.

As you can see, you can view no more than 5 decisions at one time. As the first decision dates back to 2016, you will have to go through several pages to grab everything (23 pages, actually). I am sure you can do all that in one night, right? Right?

If you are not inclined to do it, then get your computer to do it. Using selenium, I wrote a python script to automate the whole process of finding all the decisions available on the website. What could have been a tedious night’s work was accomplished in 34 seconds.

Check out the script here.

What follows here is a step by step write up of how I did it. So hang on tight!

Section 1: Observe your quarry

Before setting your computer loose on a web page, it pays to understand its structure and inner workings. Open the page in your favourite browser. For Chrome, this is Developer Tools, and in Firefox, this is Web Developer. You will be looking for a tab called Sources, which shows you the HTML code of the web page.

Play with the structure of the web page by hovering over various elements of the web page with your mouse. You can then look for the exact elements you need to perform your task:

  • In order to see a new page, you will have to click on the page number in the pagination. This is under a section (a CSS class) called group__pages. Each page-number is under a section (another CSS class) called page-number.
  • Each decision has its own section (a CSS class) named press-item. The link to the download, which is either to a text file or a PDF file, is located in a link in each press-item.
  • Notice too that each press-item also has other metadata regarding the decision. For now, we are curious about the date of the decision and the respondent.

Section 2: Decide on a strategy

Having figured out the website, you can decide on how to achieve your goal. In this case, it would be pretty similar to what you would have done manually.

  1. Start on a page
  2. Click on a link to download
  3. Go to the next link until there are no more links
  4. Move on to the next page
  5. Keep repeating steps 1 to 4 until there are no more pages
  6. Profit!

Since we did notice the metadata, let’s use it. If you don’t use what is already in front of you, you will have to read the decision to extract such information. In fact, we are going to use the metadata to name our decisions.

Section 3: Get your selenium on it!

Selenium drives a web browser. It mimics user interactions on the web browser, so our strategy in Step 2 is straightforward to implement. Instead of moving our mouse like we ordinarily would, we would tell the web driver what to do instead.

WebDriver :: Documentation for Selenium

Let’s translate our strategy to actual code.

Step 1: Start on a page

We are going to need to start our web driver and get it to run on our web page.

from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options

PDPC_decisions_site = "https://www.pdpc.gov.sg/Commissions-Decisions/Data-Protection-Enforcement-Cases"

# Setup webdriver
options = Options()
# Uncomment the next three lines for a headless chrome
# options.add_argument('--headless')
# options.add_argument('--disable-gpu')
# options.add_argument('--window-size=1920,1080')
driver = Chrome(options=options)
driver.get(PDPC_decisions_site)

Step 2: Download the file

Now that you have prepared your page, let’s drill down to the individual decisions themselves. As we figured out earlier, each decision is found in a section named press-item. Get selenium to collect all the decisions on the page.

judgements = driver.find_elements_by_class_name('press-item')

Recall that we are not just going to download the file; we will also be using the date of the decision and the respondent to name it. For the date, I found that under each press-item there is a press_date element which gives us the text of the decision date; we can easily convert this to a Python datetime so we can format it any way we like.

from datetime import datetime
from selenium.webdriver.remote.webelement import WebElement

def get_date(item: WebElement):
    item_date = datetime.strptime(item.find_element_by_class_name('press_date').text, "%d %b %Y")
    return item_date.strftime("%Y-%m-%d")

For the respondent, the heading (which is written in a fixed format and also happens to be the link to the download — score!) already gives you the information. Use a regular expression on the text of the link to suss it out. (One of the decisions does not follow the format of “Breach … by respondent”, so the alternative is also catered for.)

import re

def get_respondent(item):
    text = item.text
    # re.split's third positional argument is maxsplit, so the flags
    # must be passed by keyword
    return re.split(r"\s+[bB]y|[Aa]gainst\s+", text, flags=re.I)[1]

You are now ready to download a file! Using the metadata and the link you just found, you can come up with meaningful names to download your files. Naming your own files will also help you avoid the idiosyncratic ways the PDPC names its own downloads.

Note that some of the files are not PDF downloads but short texts on web pages. Using the earlier strategies, you can figure out what information you need. This time, I used BeautifulSoup to get the information, as I did not want to use selenium for any unnecessary navigation. Treat PDFs and web pages differently.

import re
import wget
from bs4 import BeautifulSoup
from urllib.request import urlopen

# SOURCE_FILE_PATH is the destination folder, defined elsewhere

def download_file(item, file_date, file_respondent):
    url = item.get_property('href')
    print("Downloading a File: ", url)
    print("Date of Decision: ", file_date)
    print("Respondent: ", file_respondent)
    if url[-3:] == 'pdf':
        dest = SOURCE_FILE_PATH + file_date + ' ' + file_respondent + '.pdf'
        wget.download(url, out=dest)
    else:
        with open(SOURCE_FILE_PATH + file_date + ' ' + file_respondent + '.txt', "w") as f:
            soup = BeautifulSoup(urlopen(url), 'html5lib')
            text = soup.find('div', class_='rte').getText()
            lines = re.split(r"\n\s+", text)
            f.writelines([line + '\n' for line in lines if line != ""])

Steps 3 to 5: Download every item on every page

The next steps follow a simple idiom — for every page and for every item on each page, download a file.

for page_count in range(len(pages)):
    pages[page_count].click()
    print("Now at Page ", page_count)
    pages = refresh_pages(driver)
    judgements = driver.find_elements_by_class_name('press-item')
    for judgement in judgements:
        date = get_date(judgement)
        link = judgement.find_element_by_tag_name('a')
        respondent = get_respondent(link)
        download_file(link, date, respondent)

Unfortunately, once selenium clicks to a new page, the element references go stale and need to be refreshed. We are going to need a new group__pages and its page-number elements in order to continue accessing the pagination. I wrote a function to “refresh” the variables I am using to access these sections.

def refresh_pages(webdriver: Chrome):
    group_pages = webdriver.find_element_by_class_name('group__pages')
    return group_pages.find_elements_by_class_name('page-number')

...

pages = refresh_pages(driver)

Conclusion

Once you have got your web driver to be thorough, you are done! In my last pass, 115 decisions were downloaded in 34 seconds. The best part is that you can repeat this any time there are new decisions. Data acquisition made easy! At least until the PDPC breaks its website.

Postscript: Is this… Illegal?

I’m listening…

Web scraping has always been quite controversial, and the stakes can be quite high: copyright infringement, the Computer Misuse Act and trespass, to name a few. Funnily enough, manually downloading may be less illegal than using a computer. The PDPC’s own terms of use are not on point on this.

(Update 15 Mar 2021: OK, I think I was being fairly obtuse about this. There is a paragraph that states you can’t use robots or spiders to monitor their website. That might have made sense in the past when data transfers were expensive, but I don’t think this kind of activity at my scale can crash a server.)

Personally, I feel this particular activity is fairly harmless — this is considered “own personal, non-commercial use” to me. I would likely continue with this for as long as I would like my own collection of decisions. Or until they provide better programmatic access to their resources.

In 2021, the Copyright Act in Singapore was amended to support data analysis, like web scraping. I wrote a follow-up post, “Ready to mine free online legal materials in Singapore? Not so fast!”, where I survey government websites to find out how friendly they are to robots.

#PDPC-Decisions #Programming #Python #tutorial #Updated



How do I get on this legal technology wave? Where do I even start? A “contract management system” or “document management system” (“CMS”) is a good place. Business operations are not affected, but the legal department can get its hands dirty and show results for it.

If you would like a CMS, the next question is how actually to do it. If you have the budget and the resources, getting a neat and fancy tech solution is excellent. If you’re strapped for cash and need to be creative, a solution may be hiding in your computer.

For this little victory, I present to you the most powerful application in the Microsoft Office family: Microsoft Excel. It’s a spreadsheet program that does well with numbers and formulas, but since it added fonts and cell shading (apparently it was the pioneer), some people have used it for other purposes. This includes our CMS.

Microsoft Office PROTIP: Instead of using Word to lay out complicated information (a massive table with multiple rows and columns, or too much data to fit on one page), try using Excel instead. Put all the information in one worksheet and print it with “fit sheet on one page”. Done! (You might want to ask yourself why you are trying to present something so complicated, though.)

Hey, wait a second! Isn’t Microsoft Excel a spreadsheet program? If we are compiling a table of information, shouldn’t we be using a database program? Like Microsoft Access? Wrong tool for the job, right?!

Excel can be used for your Contract Management System

I have got nothing against database programs. Heck, my first programming project when I was a teenager was to create a database application detailing the lives of my hamsters. Reports, Forms, queries — I am quite okay with all that. However, there are several reasons why I would still use Excel.

  • Everyone has Excel: If you already work in an environment with Microsoft Office, everyone has Excel, and there is no need to install anything new. More people are likely to accept using Excel than a fancy-dandy web app (no guarantees about its user interface either) or even Microsoft Access.
  • Anyone can use Excel : Excel is a battle-hardened program that people of different skill levels have used. You will find that more people are able to access and use your CMS. This is important if you are not going to be the one inputting information into the system. You can actually tell your intern to get in there and just do it. Access (and probably other programs) do have a learning curve, and you will have to teach every new user.
  • Excel has underrated features which are very useful for a CMS : Excel is over 30 years old, but it has been improving all this time. There are two features I would highlight:
  1. Formatting as Table unlocks sorting, filtering by phrases and other dandy stuff. You can even filter and sort by colour. I use these features to filter, say, the contracts that are expiring in the current quarter. I can also filter information such as the place where the contract is formed or the contracting party.
  2. Pivot Tables also help to organise data in a way to gain new insights. For example, I can find out quickly which jurisdictions my counterparties are from.
  • Hyperlinks: Some organisations may store their soft-copy contracts on file servers, and it becomes easy to provide quick access to such soft copies through hyperlinks. For a listing of General Terms and Conditions which Business uses and Legal has reviewed, you can also embed the document in your Excel file together with Legal’s and Business’s comments. This way, everyone knows which GTC we have reviewed.
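If you later want to slice the same table outside Excel, the data lends itself to programmatic querying too. A rough sketch using pandas (the column names and rows here are invented for illustration, not a prescribed format):

```python
import pandas as pd

# A miniature stand-in for the CMS worksheet (column names assumed).
cms = pd.DataFrame({
    "Counterparty": ["Company X", "Company Y", "Company X"],
    "Jurisdiction": ["Singapore", "Germany", "Singapore"],
    "Expiry": pd.to_datetime(["2020-06-30", "2021-01-15", "2020-05-31"]),
})

# Like filtering the table: contracts expiring in Q2 2020.
q2 = cms[(cms["Expiry"] >= "2020-04-01") & (cms["Expiry"] <= "2020-06-30")]

# Like a Pivot Table: how many contracts per jurisdiction.
by_jurisdiction = cms.groupby("Jurisdiction").size()

print(len(q2))
print(by_jurisdiction)
```

The first filter mirrors the “expiring this quarter” view; the groupby answers the Pivot Table question of which jurisdictions my counterparties come from.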

You can adopt this Excel CMS Format

Here is a blank format of an Excel Contract Management System you can download. You can modify or adapt it in any way you deem fit.

CMS Format.xlsx (15 KB download)

Here are a few highlights of the form:

  • The format is divided into a few sections — Meta, Counterparty, Contract Term and Subject.
  • The Meta section can be adapted to suit your organisation’s needs and quirks. For example, we need every contract approved by a form, and we link the form here. There is also a link to a soft-copy, Word-editable version of the contract if it is available.
  • The Counterparty section holds information relating to your contract parties (not yourself, obviously). You can also have Yes-No (or unsure) columns to filter on.
  • The Contract Term and Subject sections contain important information that you would like to review quickly using the sorting and filtering functions.

Some Limitations in your Excel Contract Management System

The Excel CMS is a rough and ready format you can use to get your contract management system tooled up quickly. The filtering and sorting have immediate benefits even in contract review, since I can now look up similar or related contracts across the company to see what the standards are.

However, the system has many limitations:

  • The table is mighty wide and might not fit very well on one piece of paper. It makes data entry difficult, although I find that Excel’s data form does alleviate some of the problem.
  • Summarising data (for example, I want to know all the contracts with Company X, but I do not need to see who the person in charge was) is nearly impossible. You can hack it by freezing or hiding cells, but this is not a long-term solution.
  • Data input can be quite tedious. That’s a lot of columns which are prone to arbitrary data input or mistakes. Not to mention that it can be very time-consuming.

However, once you can demonstrate practical benefits and a workflow, stepping up to a real, purpose-built document or contract management system becomes an easier climb.

Here’s my follow-up to this post, written after two years of using this system: “Would I still use Excel for Contract Management? Many people would like to use Excel to manage their contract data. After two years of operating such a system, would I still recommend it?” (Free subscription required)

Conclusion

This little victory challenges the idea that you have to leap into a purpose-made system to get tech on your side. Using tools that your organisation already has and has paid for, this is a straightforward hack. For the win!

#tech #MicrosoftOffice #LegalTech #ContractManagementSystem #Updated
