Love.Law.Robots. by Ang Hou Fu


This is the story of my lockdown during a global pandemic. [Cue post-apocalyptic soundtrack 🎼]

Amid a global pandemic, schools and workplaces shut down, and families had to huddle together at home. There was no escape. Everyone was getting on each other's nerves. Running out of options, we played Monopoly Junior, which my daughter had recently learned to play. Game after game, we found the beginner winning every time. Was it beginner's luck? I seethed with rage. Intuitively, I knew the odds were against me. I wasn't terrible at Monopoly. I had to prove it with science.

How would I prove it with science? It sounds ridiculous at first, but you'd do it by playing 5 million Monopoly Junior games. Once you play enough games, your anecdotal observations suddenly become INSIGHT. (Senior lawyers might call it experience, but I am pretty sure they have never experienced 5 million cases.)

This is the story of how I tried different ways to play 5 million Monopoly JR games.

What is Monopoly Junior?

The cover of the Monopoly Junior board game.

Most people know the board game called Monopoly. You start with a pile of money, and the goal is to bankrupt everyone else. To do that, you go round the board purchasing properties, building hotels and taking everyone else's money through rent-seeking behaviour (the literal kind). It's a game for ages eight and up, mainly because you have to count in hundreds and thousands, and you need the stamina to last an entire night crushing your opposition.

If you are, like my daughter, five years old, there is Monopoly Junior. Many of Monopoly's complicated features (for example, counting past 30, auctions, and building houses and hotels) are removed so that young players can join the game. You also get a smaller board and less money. Unless you draw the right Chance card, you have no choices to make once you roll the dice: you buy the space you land on, or you pay rent on it.

As it's a pretty deterministic game, it's not much fun for adults. On the other hand, the game spares you the ignominy of a mano-a-mano battle with your kids by ending once anyone goes bankrupt. Count the cash, and the richest player wins. It ends much quicker.

Hasbro Monopoly Junior Game (A69843480), Amazon.sg. Instead of letting computers have all the fun, you can play the game yourself! (I earn a commission when you buy from this Amazon affiliate link.)

Determining the Approach, Realising the Scale

Ombre balloons. Photo by Amy Shamblen / Unsplash

There is a pretty straightforward way to write this program. Write the rules of the game in code and then play it five million times. No sweat!

However, to prove my hypothesis that players who go first are more likely to win, I would need to do more:

  • We need data. At a minimum, we want to know who won each game. This way, we can work out the odds of winning if you go first (like my daughter) or last (like me). (A sketch of this calculation follows this list.)
  • As I explored the data, more interesting questions popped up. For example, given any position in the game, what are the odds of winning? What kinds of events significantly alter the chances of winning?
  • The data needs to be stored in a form I can process and analyse efficiently. CSV, perhaps?
  • It'd be nice if the program finished as quickly as possible. I'm excited to do some analysis, and waiting for days is burdensome and expensive! (I'm running this on a DigitalOcean Droplet.)
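
A minimal sketch of that first calculation, assuming every finished game has been reduced to a flat record (the three records here are dummies, and the field names are illustrative):

import pandas as pd

# "winner" holds the seat number of the winner; seat 0 is the player who goes first.
records = [
    {"game_id": 0, "turns": 93, "winner": 0},
    {"game_id": 1, "turns": 118, "winner": 0},
    {"game_id": 2, "turns": 87, "winner": 1},
]

df = pd.DataFrame(records)
first_player_odds = (df["winner"] == 0).mean()   # fraction of games won by seat 0
print(f"The first player won {first_player_odds:.0%} of games")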

The last point becomes very troublesome once you realise the scale of the project. Here's an illustration: you can run many games (20,000 is perhaps a large enough sample) to get the odds of a player winning. Now imagine doing that after every turn of every game. If the average game has, say, 100 turns (a random but plausible number), that's 20,000 × 100 = 2 million additional games already! And that's before you decide you also want three-player and four-player games...

It looks like I have got my work cut out for me!

Route One: Always the best way?

I decided to go for the most obvious way to program the game. In hindsight, what seemed obvious isn't obvious at all. Having started programming with object-oriented languages (like Java), I decided to create classes for all my data.

An example of how I used a class to store my data. Not rocket science.
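
In sketch form, the class-based approach looked something like this (the names and starting values are illustrative, not my actual code):

from dataclasses import dataclass
from typing import Optional

@dataclass
class Space:
    """One space on the board: its price and, eventually, its owner."""
    name: str
    price: int
    owner: Optional["Player"] = None

@dataclass
class Player:
    """A player's state: cash in hand and position on the board."""
    name: str
    cash: int = 20       # placeholder starting cash, not the real amount
    position: int = 0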

The classes also came in handy when I wrote the code that moves the game along.

Object notation makes programming easy.
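
Something like this sketch, reusing the Player and Space classes above (the rent rule here is a simplification; the real game reads rents off the cards):

import random
from typing import List

def play_turn(player: Player, board: List[Space]) -> None:
    """Roll the die, move the player, then buy or pay rent on the landing space."""
    roll = random.randint(1, 6)
    player.position = (player.position + roll) % len(board)
    space = board[player.position]
    if space.owner is None:
        # Unowned: in Monopoly Junior, you must buy the space you land on.
        player.cash -= space.price
        space.owner = player
    elif space.owner is not player:
        # Owned by someone else: pay rent (simplified here to the purchase price).
        player.cash -= space.price
        space.owner.cash += space.price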

It was pretty fast, too: 20,000 two-player games took less than a minute, so 5 million two-player games would take about 4 hours. I used Python's standard multiprocessing module so that several CPU cores could each run games on their own.

Yes, my computers are named after cartoon characters.
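
The multiprocessing part is a plain worker pool. A minimal sketch, where play_one_game is an assumed stand-in for the actual game loop:

from multiprocessing import Pool

def run_batch(n_games: int) -> list:
    """Play n_games to the end and return one result record per game."""
    return [play_one_game(num_players=2) for _ in range(n_games)]  # assumed game loop

if __name__ == "__main__":
    with Pool() as pool:                            # one worker per CPU core by default
        batches = pool.map(run_batch, [5_000] * 4)  # four batches of 5,000 games each
    results = [record for batch in batches for record in batch]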

After working this out, I decided to experiment with being more “Pythonic”. Instead of using classes, I would use Python dictionaries to store the data. This had the side effect of flattening the data: no more objects within objects. And with a list of flat dictionaries, I could easily use pandas to save a CSV.

This snippet shows how the basic data for a whole game is created.

Instead of using object notation to find data about a player, the data is accessed by its key in a Python dictionary.

The same code to move a player, implemented for a dictionary.
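
Again in sketch form (the field names and numbers are placeholders), the flattened version looks like this:

import random
import pandas as pd

def new_game(num_players: int = 2) -> dict:
    """One flat dictionary holds the whole game; no object inside an object."""
    game = {"turns": 0, "winner": None}
    for i in range(num_players):
        game[f"player_{i}_cash"] = 20          # placeholder starting cash
        game[f"player_{i}_position"] = 0
    return game

def move_player(game: dict, i: int, board_size: int = 24) -> None:
    """The same move logic, but the data is reached by key rather than attribute."""
    roll = random.randint(1, 6)
    game[f"player_{i}_position"] = (game[f"player_{i}_position"] + roll) % board_size

# A list of flat dictionaries converts straight into a DataFrame, and then a CSV.
games = [new_game() for _ in range(3)]
pd.DataFrame(games).to_csv("games.csv", index=False)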

Honestly, I didn't think it would make a difference. But the speedup was remarkable: almost ten times! 20,000 two-player games now took 4 seconds. The difference between 4 seconds and a minute is a trip to the toilet, but over 5 million games it's the difference between 4 hours and 16 minutes. That might save me 20 cents in Droplet costs!

Colour me impressed.

Here are some lessons I learnt in the first part of my journey:

  • Your first idea might not always be the best one. It's best to iterate, learn new stuff and experiment!
  • Don't underestimate standard libraries and utilities. With less overhead, they might be able to do a job really fast.
  • Scale matters. A difference of 1 second multiplied by a million is a lot. (More than 11.5 days, FYI.)

A Common-Sense Guide to Data Structures and Algorithms, Second Edition, by Jay Wengrow. Learn new stuff: this book was useful in helping me dive deeper into the underlying work that programmers do.

Route Three: Don't play games, send messages

After my life-changing experiment, I got ambitious and wanted to try something even harder. Playing a game from start to finish, I realised, isn't the only way for a computer to play Monopoly Junior.


As the lockdown wore on, I explored the idea of microservices (more specifically, I read this book). Instead of following the rules of the game from start to finish, the program would send “messages” depicting what was going on in a game. The computer would pick up these messages, process them and hand off the next one. In other words, instead of a pool of workers each playing its own game, the workers would take their commands (or jobs) from a queue. It wouldn't matter which game a command came from; a worker just had to perform it.

A schematic of the messages or commands a worker might process to play a game of Monopoly Junior: when the NewGameCommand is executed, it writes some information to a database and then puts a PlayRoundCommand in the queue for the next worker.
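
Here is a minimal sketch of that idea using the standard library's queues. The NEW_GAME and PLAY_ROUND commands stand in for the NewGameCommand and PlayRoundCommand above; the database write is elided, and a game arbitrarily "ends" after 10 rounds for illustration:

import multiprocessing as mp

def worker(commands, results):
    """Take whatever command comes next; a worker never owns a whole game."""
    while True:
        command = commands.get()
        if command is None:                        # sentinel: shut down
            break
        if command["type"] == "NEW_GAME":
            # Set up the game, then put the first round back on the queue
            # for whichever worker is free next.
            commands.put({"type": "PLAY_ROUND", "game": command["game"], "round": 1})
        elif command["type"] == "PLAY_ROUND":
            # Play one round. The real end condition is bankruptcy; here a
            # game simply ends after 10 rounds.
            if command["round"] >= 10:
                results.put(command["game"])
            else:
                commands.put({"type": "PLAY_ROUND", "game": command["game"],
                              "round": command["round"] + 1})

if __name__ == "__main__":
    commands, results = mp.Queue(), mp.Queue()
    n_games = 200
    for game_id in range(n_games):
        commands.put({"type": "NEW_GAME", "game": game_id})
    workers = [mp.Process(target=worker, args=(commands, results)) for _ in range(4)]
    for w in workers:
        w.start()
    finished = [results.get() for _ in range(n_games)]   # block until every game ends
    for _ in workers:
        commands.put(None)                               # release the workers
    for w in workers:
        w.join()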

What I had done, essentially, was chop the game into several independent parts. This was in response to a drawback I had observed in the original code: some games took longer than others, and a long game held its worker back from moving on to the next one. By making the parts independent, I hoped the pool would finish more games sooner. And since each job was easy, the workers would surely handle them in rapid succession, right?

It turns out I was completely wrong.

It was so traumatically slow that I had to reduce the number of games from 20,000 to 200 just to take this screenshot.

Instead of completing hundreds or thousands of games per second, the program now took almost 1 second to complete a single game. At that rate, 5 million games would take about 5 million seconds, which is roughly 58 days. And how much would I have to pay DigitalOcean for their Droplets this time? (Maybe $24.)

What slowed the program?

Tortoise 🐢. Photo by Craig Pattenaude / Unsplash

I haven't been able to drill down to the cause, but here is my educated guess: each command might be simple and straightforward, but I had introduced a laborious overhead, namely operating the messaging system itself. Because the jobs finished so quickly, the workers spent most of their time asking the queue what to do next rather than doing actual work. If you are a manager, you know why this is terrible and inefficient.

Experimenting again, I found the program's times improved substantially once I allowed one command to cover several rounds rather than a single round. I pushed this so far that one command was completing one whole game. Even then, it never reached the heights of the original program with Python dictionaries. At scale, the overhead matters.
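
In sketch form, the fix was to let one command carry a batch of rounds (play_one_round is an assumed stand-in for the real round logic):

def handle_play_rounds(command: dict, commands_queue) -> None:
    """Amortise the messaging overhead: one command now covers many rounds."""
    state = command["state"]
    for _ in range(command["batch"]):                # batch size is the tuning knob
        state = play_one_round(state)                # assumed round-playing function
        if state["winner"] is not None:              # someone went bankrupt: game over
            return
    # The game isn't finished yet: queue up the next batch.
    commands_queue.put({"type": "PLAY_ROUNDS", "state": state,
                        "batch": command["batch"]})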

Best Practices — Dask documentation. Thinking about multiprocessing can be counterintuitive for a beginner; I found Dask's documentation helpful in starting out.

So, is sending messages a lousy solution? The experiment shows that on a single computer, its performance is markedly poor. However, there are circumstances where sending messages is better. If several clients and servers are involved, and you are unsure how reliable they are, a program made of several independent parts is likely to be more robust. I also think it makes the code a bit more readable, as it focuses on the process.

I now think a better solution is to improve my original program.

Conclusion: Final lessons

Writing code, it turns out, isn't just a matter of style. Different methods applied to the same problem have serious effects. Did it make the program faster? Is it easier to read? Is it more conducive to multiprocessing or asynchronous operations? There isn't a silver bullet; it depends on your situation and resources. I wouldn't claim to have become an expert after all this, but it has definitely given me a better appreciation of the issues from an engineering perspective.

Now I just have to rework my earlier code 😭. The fun's over, isn't it?

#Programming #blog #Python #DigitalOcean #Monopoly #DataScience

Love.Law.Robots. – A blog by Ang Hou Fu

I run a docassemble server at work, ostensibly introducing co-workers to a different way of using templates to generate their agreements. It's been pretty useful, so much so that I use it for my own work. However, due to the pandemic, it hasn't been easy to go out and sell it. Maybe I'll have better luck soon.

In the meantime, I decided to move the server from AWS to DigitalOcean.

Why move?

I liked the wide variety of features available on AWS, such as CodeCommit, Lambda and SES. DigitalOcean is not comparable in that regard. If I wanted to build a whole suite of services around my application, I would probably find what I needed somewhere on AWS's glorious page full of services.

However, with great functionality comes great complexity. I got a headache trying to exploit it all, and I was never going to make full use of the ecosystem. (I shall never scoff at AWS certification again.)

On the other hand, I was more familiar with DigitalOcean and liked their straightforward pricing. So if I was going to move my pet project somewhere, I wanted it to be in my own backyard.

Let's get moving!

Lesson 1: Respect the shutdown

The docassemble docs expressly ask you to shut down your docassemble server gracefully. This is not the usual docker stop <container> command, but one with a generous timeout flag (for example, docker stop -t 60 <container>). Forgetting the flag isn't fatal in many simple use cases, so you might never notice the omission.

However, there's another way to kill your server in the cloud: flipping the switch on your cloud instance from the management console. Clicking that red button doesn't feel like pulling the plug, but it has the same effect. The instance is sent straight to heaven, and there is nothing you can do about it.

The graceful shutdown matters because docassemble does quite a lot of work when it shuts down: it dumps its database records into your storage. If the storage lives in the cloud (like AWS's S3 or DigitalOcean's Spaces), there is some lag while all the files are sent there. If the shutdown is not respected, the server's state is not saved, and you may not be able to restore it when you next start the container.

So, with my AWS container gone in a cloud of dust, I found that the files in my S3 storage had not been updated. The last copy was from several months before, the last time I had shut down my container normally. Several months of work was gone! 😲

Lesson 2: Restore from backup

This post could have ended on that sad note. Luckily for CloudOps newbies like me, docassemble automatically stores backups of the server state. They live in the backup folder of your storage, arranged by date.

If you, like me, have borked your docassemble server and sent it back to August 2020, grab your latest backup and replace the files in the main directory (outside backup) with it. The process is described in the docassemble docs. Instead of a server with no users, as it was back in August 2020, I managed to retrieve all my users from the Postgres database stored in the backups. Phew!

Lesson 3: Check your config.yml file

After this exercise, I decided to go with a DigitalOcean Droplet and AWS S3. Given that I was already on S3, and that S3 costs are fairly negligible, this seemed a cost-effective combo: a DigitalOcean Space costs $5 a month regardless of size, whereas my S3 usage rarely comes to more than a dollar.

Before giving your new docassemble server a spin, check your config.yml file. You can specify environment variables when you start a container, but once the server is up and running, it uses the config.yml found in the storage. If that configuration was written specifically for AWS, your server might not run properly on DigitalOcean. You will have to download config.yml from the storage (I used the S3 web interface) and edit it by hand to fit the new server.

In my setup, the original configuration file was written for an AWS environment: my EC2 instance used security policies to access S3, which simplified the initial set-up. A Droplet cannot use those features, so generate an access key and secret key, put those details (and more) into your updated config.yml, and turn off ec2. A sketch of the relevant settings follows.
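
For reference, the relevant corner of config.yml would end up looking roughly like this (the keys follow the docassemble documentation; every value here is a placeholder):

s3:
  enable: True
  access key id: YOUR_ACCESS_KEY
  secret access key: YOUR_SECRET_KEY
  bucket: your-docassemble-bucket
  region: us-east-1
ec2: False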

If you are going to use Spaces instead, transfer the files from your old S3 bucket to Spaces (I used s4cmd) and fill in the Spaces details in the configuration file.

Conclusion

To be honest, the migration was essentially painless. The docassemble server is designed to be restored from a single source of truth: the storage you choose. Except for the problems that come from hand-editing your old config.yml (I had to retype my SecretKey a few times 😢), you probably won't need to enter the Docker container and read initialize error logs. Given this positive experience, I'll be well prepared to move back to AWS again! (Just kidding. For now.)

#tech #docassemble #AWS #DigitalOcean #docker #OpenSource #tutorial #CloudComputing

Love.Law.Robots. – A blog by Ang Hou Fu


Update 11 May 2020: A few days after I wrote this post, Pi-Hole released version 5.0. Some of the new features affect the content here. Since it's only been days, I have updated the content accordingly.

It was a long weekend, so it was time to play. Ubuntu 20.04 LTS had just come out, and that matters because of the “LTS” at the end of its name. I took the opportunity to upgrade “Ursula”, my home server. I stopped installing OSes the way I change clothes sometime after high school, but I had big plans for this one.

Ad Blocking on a Network Level

Securing your internet is tough. I have “fond” memories of the earlier days of the internet, when simply browsing could expose you to porn, or to Flash movies that installed software on your computer. It now seems quaint that anyone is surprised they can be tricked over the internet with phishing and social engineering.

I value my privacy, and I would like to control what information about me and my computers goes out. I don't like ads or tracking technologies, and more people seem to be coming around to my side: every browser now claims to block ads or trackers.

Browsers are important because they are the main window through which ads and trackers arrive. But other devices generate the same risks: phones, smart gadgets and every other internet-connected thing in the house.

If you access the internet outside your browser, the browser can't protect you. The more comprehensive solution is to protect yourself at the network level.

To protect yourself at the network level, you adjust your router settings and how your internet traffic is processed so that every request is caught. A blacklist of trackers and suspicious websites is maintained, and any DNS query that matches the blacklist is simply not resolved.
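
As a toy illustration of the idea in Python (this is not how Pi-Hole itself is implemented, and upstream_resolve is an assumed stand-in for a real DNS lookup):

from typing import Optional

BLACKLIST = {"tracker.example.com", "ads.example.net"}

def resolve(hostname: str) -> Optional[str]:
    """Refuse to answer queries for blacklisted hosts; pass the rest upstream."""
    if hostname in BLACKLIST:
        return None                      # blocked: the ad or tracker never loads
    return upstream_resolve(hostname)    # forwarded to the usual upstream server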

As you might expect, fiddling with your router settings, finding out what your ISP's upstream DNS servers are, or poking around config files is very daunting for most users.

Enter the Pi-Hole

I first learned about Pi-Hole through the DigitalOcean Marketplace. It was great that it supported containers from the start, because I wanted “Ursula” to serve everything from containers rather than deal with the complexity of Ubuntu Linux's oddities.

Pi-hole web page: “You can run Pi-hole in a container, or deploy it directly to a supported operating system via our automated installer.”

Previously, I implemented my internet blacklist using response policy zones on a bind9 server. I am no longer entirely sure how I did it, which would be a disaster if my server ever got wiped out.

The best thing about Docker is that you write the configuration in one file (a docker-compose.yml, in my case) and it's all there. Once you have reviewed the configuration, you just call docker-compose up and the program starts up for you.

Once you have the server running, you can ogle its work on Pi-Hole's gorgeous dashboard:

So many queries, so many blocked. (Update 11/5/20: screenshot updated to show the new version 5.0 interface. So many bars now!)

I could draw a few conclusions from my Pi-Hole server's work so far:

  • Several queries from my phone were blocked. Phones are a hotbed for ad trackers. Since most of us browse the web on our phones, internet advertising has not taken a hit even though more browsers feature some form of ad blocking.
  • The second chart (labelled “Clients (over time)”) roughly corresponds to which computers are used during the day. During this circuit breaker period, you can see work computers dialling “home” in the daytime; at night, more home computers send queries.

Installation Headaches

Using Pi-Hole as a local LAN DNS server

My previous LAN DNS server served DNS queries for my home network; my home server and Network Attached Storage device were its main customers. I also exposed some services (like my Plex) to the outside world. Without a LAN DNS server, I would have to remember many octets (read: IP addresses).

Update 11/5/2020: In the original post, I complained that setting local LAN hostnames was hidden away. Version 5.0 lets you set hostnames through the admin dashboard. This is one feature I will definitely use! It turned out to be quick and easy.

The dashboard used to add local DNS domains. New in version 5.0.

Installing Pi-Hole Behind a Traefik Server/Reverse Proxy

I didn’t wreck my Ubuntu 18.04 LTS server so that I could install Pi-Hole. I wanted to be able to serve several services through my Home Server without having to be limited by one set of 80 (HTTP) and 443 (HTTPS) ports. Pi-Hole uses both of those ports. I will not be able to have any more web servers.

A reverse proxy routes each request to the correct server. My forays with Nginx and Traffic Server had not been successful, but Traefik got me curious because it claimed to figure out configurations automatically. If I could get Traefik to work, it would sort out how to have several applications on one host!

Traefik, The Cloud Native Application Proxy (Traefik Labs): “Traefik is the leading open-source reverse proxy and load balancer for HTTP and TCP-based applications that is easy, dynamic and full-featured.”

Getting Traefik to work was a priority, but I really wanted to set up Pi-Hole first. Curiously, there are some resources on getting the two to work together correctly. Still, since this was my first time using both Traefik and Pi-Hole, I had to experiment a lot. In the end, I landed on this configuration in my docker-compose file:

version: '3'

services:
  reverse-proxy:
    # The official v2 Traefik docker image
    image: traefik:v2.2
    container_name: traefik
    # Enables the web UI and tells Traefik to listen to docker
    command: --api.insecure=true --providers.docker
    ports:
      # The HTTP/HTTPS ports
      - "80:80"
      - "443:443"
      # The Web UI (enabled by --api.insecure=true)
      - "8080:8080"
    volumes:
      # So that Traefik can listen to the Docker events
      - /var/run/docker.sock:/var/run/docker.sock
      - /home/houfu/traefik/:/etc/traefik/
    environment:
      DO_AUTH_TOKEN: [... Token provided by DigitalOcean for SSL certificate generation]
    restart: unless-stopped

  ### pi-hole
  pihole:
    container_name: pihole
    domainname: xxx.home
    hostname: pihole
    image: pihole/pihole:latest
    dns:
      - 127.0.0.1
      - 1.1.1.1
    ports:
      - '0.0.0.0:53:53/tcp'
      - '0.0.0.0:53:53/udp'
      # - '0.0.0.0:67:67/udp'
      - '0.0.0.0:8052:80/tcp'
      - '0.0.0.0:8443:443/tcp'
    volumes:
      - ./etc-pihole/:/etc/pihole/
      - ./etc-dnsmasqd/:/etc/dnsmasq.d/
      # run `touch ./pihole.log` first unless you like errors
      # - ./pihole.log:/var/log/pihole.log
    environment:
      ServerIP: 192.168.2.xxx
      PROXY_LOCATION: pihole
      VIRTUAL_HOST: pihole.xxx
      VIRTUAL_PORT: 80
      TZ: 'Asia/Singapore'
      WEBPASSWORD: PASSWORD
      DNS1: [VQ Server 1]
      DNS2: [VQ Server 2]
    restart: unless-stopped
    labels:
      # required when using --docker.exposedbydefault=false
      - "traefik.enable=true"
      # https://www.techjunktrunk.com/docker/2017/11/03/traefik-default-server-catch-all/
      - "traefik.frontend.rule=HostRegexp:pihole.xxx,{catchall:.*}"
      - "traefik.frontend.priority=1"
      - "traefik.backend=pihole"
      - "traefik.port=80"
      - "traefik.port=443"

(Some private information, like the names of my private servers and the IPs of my ISP's DNS servers, has been anonymised.)

Conclusion

I could not have done this without the copious time at home created by the circuit breaker. For now, I hope to run this and many more experiments on this server and report them on this blog. Is there something I should try next? Let me know in the comments!

#blog #tech #docker #DigitalOcean #Updated #OpenSource

Love.Law.Robots. – A blog by Ang Hou Fu