Van Gogh in images on Wikipedia

Van Gogh data visualization

I have loved the art of Vincent van Gogh since I was little. I like his color palette and how he interprets his surroundings. This project gave me the chance to learn more about his life and work.

Lionel Michel showed me his project Geolinguistic Contrasts in Wikipedia. I liked the idea of taking one Wikipedia article and comparing its different language versions. It is interesting to see their similarities and differences.

First exploration

I started the project by gathering interesting data from Wikipedia. Wikipedia has a great API that makes it easy to use its data. I downloaded all images from four different language versions of the article to get a better understanding of the data.
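A minimal sketch of how the image list of one article could be requested from the MediaWiki Action API, using only the Python standard library. The function names, the User-Agent string and the language codes in the usage note are assumptions for illustration; the post does not say which four languages or which tooling were used.

```python
import json
import urllib.parse
import urllib.request


def build_url(lang, title, cont=None):
    """Build an Action API URL that lists the images used on one article."""
    params = {
        "action": "query",
        "prop": "images",
        "titles": title,
        "imlimit": "max",
        "format": "json",
    }
    if cont:
        # Continuation values returned by the previous response
        params.update(cont)
    return f"https://{lang}.wikipedia.org/w/api.php?" + urllib.parse.urlencode(params)


def extract_image_titles(payload):
    """Collect the file titles from one API response."""
    titles = []
    for page in payload.get("query", {}).get("pages", {}).values():
        for image in page.get("images", []):
            titles.append(image["title"])
    return titles


def fetch_image_titles(lang, title):
    """Follow the API's continuation until all image titles are collected."""
    titles, cont = [], None
    while True:
        request = urllib.request.Request(
            build_url(lang, title, cont),
            headers={"User-Agent": "van-gogh-image-exploration (hypothetical)"},
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            payload = json.load(response)
        titles.extend(extract_image_titles(payload))
        cont = payload.get("continue")
        if not cont:
            return titles
```

Calling `fetch_image_titles("en", "Vincent van Gogh")` and repeating it for the other language editions would give one list of file titles per language. The article title can differ between editions, so it has to be looked up for each one.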

First exploration of Van Gogh's images

The first exploration is quite simple. It calls for some kind of sorting or clustering to reveal similarities between languages.

Refined first exploration

Refined first exploration of Van Gogh's images

The refined version makes it easier to spot the most used images, but it still misses the bigger picture.
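Spotting the most used images comes down to counting in how many language versions each file appears. A small sketch of that count, assuming the image titles are already collected per language and that the only difference between editions is the namespace prefix (for example "File:" versus "Datei:"); the names and sample titles are hypothetical.

```python
from collections import Counter


def normalize(title):
    """Drop the localized namespace prefix so titles match across languages."""
    return title.split(":", 1)[-1].replace("_", " ").strip()


def count_usage(images_by_lang):
    """Return how many language versions use each image."""
    counts = Counter()
    for titles in images_by_lang.values():
        # A set counts each image once per language, even if it is listed twice
        counts.update({normalize(t) for t in titles})
    return counts


# Hypothetical sample input, not the real data
sample = {
    "en": ["File:Starry Night.jpg", "File:Sunflowers.jpg"],
    "de": ["Datei:Starry Night.jpg"],
    "fr": ["Fichier:Starry Night.jpg", "Fichier:Sunflowers.jpg"],
}
usage = count_usage(sample)
```

Sorting with `usage.most_common()` then puts the images shared by the most languages first.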

Second exploration

I tried to find more data for the next exploration. I used the API again to scrape the age of most of the images. I also manually added the authorship and the category of each image to better understand how different languages select their images.

Second exploration of Van Gogh's images

These are the first prototypes showing what the collected data looked like. I tried different ways to represent the data. Here you can see it side by side as well as combined. I decided to go with the first approach to make it more readable. Both versions reminded me of musical notes. I thought about rotating them by 90 degrees to get even closer to sheet music, but I decided against it so that the visualization still follows the flow of the page.

Second exploration of Van Gogh's images

I refined the first prototype and made the whole visualization more compact. I also thought about connecting all images between different languages, but it made the visualization too noisy.

The last version

This is the latest version, but not the final one. I guess it would need more time and feedback to make the visualization more useful for others.

First exploration of Van Gogh's images

The visualization grew over time. It shows the year, the category and the number of occurrences of each image. By hovering over an image, the reader can see how it connects the different languages.


After a long break from visualizing data, I started the project out of curiosity for the topic and to get back into data viz. It is nice to create something for yourself, but the lack of a real target audience makes it less interesting for others.

From a development perspective: I tried to avoid any additional JavaScript framework to keep it simple in the beginning. I guess I have to invest more time in structuring the code in my next projects.


How are your favourite TV shows connected?

This is the next version of the TV show connection series. This time readers can select their favourite shows and see the people who connect them. Selecting individual shows makes it easier to show all of their connected people, but harder to get a proper overview.

Tv Shows


The diversity of career paths of Oscar winners

The IMDb dataset is rich in interesting data to play with. This time I had a look at the diversity of the career paths of actors and actresses.

Tv Shows

I learned about the Simpson index, which scientists use to measure diversity in data sets. It was quite interesting to compare the visual output with that single index. The index alone is quite handy for finding actors and actresses or sorting them by diversity. The visualisation, on the other hand, shows more about patterns and how people evolved through the years.
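A short sketch of how such an index can be computed. It uses the common "1 minus the sum of squared shares" form (often called the Gini-Simpson index); the post does not say which variant was used, and treating each credit's genre as the category is an assumption made only for this example.

```python
from collections import Counter


def simpson_diversity(labels):
    """Gini-Simpson index: 0 means a single category, values near 1 mean high diversity."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one label is required")
    # Sum of squared shares is the chance that two random picks share a category
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical careers, each described by the genre of every credit
one_genre = ["drama", "drama", "drama", "drama"]
mixed = ["drama", "comedy", "thriller", "western"]

low = simpson_diversity(one_genre)   # every credit in the same category
high = simpson_diversity(mixed)      # every credit in a different category
```

Because the result is a single number per person, it can be used directly as a sort key, which is what makes it handy for ranking next to the visual view.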


TV Shows and their connections by their writers and directors

After watching a TV show, I became interested in how the production of TV shows works. I downloaded a dataset from IMDb and started to explore the data. Here you can see two different views of the connections between different shows through their writers and directors.


Tv Shows


Tv Shows

It is interesting to see that both visualisations have their own purpose.


Historical buildings in Berlin and when to break up with a personal project


I found a promising database about historical buildings in Berlin. For me as an architecture lover, it was a treasure of more than 12,000 buildings. It contains information about the architects, the year of construction, historical facts and a lot more.

berlin map

The city of Berlin provides two different data sources: a text file that is loosely structured and therefore hard to parse, and an online platform that provides geolocations and structured data in HTML tables. The website is a great tool, but it has one downside: you can’t really explore the content. You have to know what you are looking for. Yet there is a lot of interesting content to explore.

berlin map


My idea was to solve that issue and create a visualisation tool that offers different ways to explore the dataset. A good example of how it could work is the visualisation for the German Digital Library by Christian Bernhardt, Gabriel Credico, Christopher Pietsch and Prof. Dr. Marian Dörk. It combines a time chart, text clouds and networks. I would add an always-visible map and the pictures of the buildings. It felt like a good thing to do for the next couple of weeks.


I started to gather the data with those goals in mind. I decided to work with the online platform, because it seemed easier to get the full dataset in a structured form. So I wrote a Python scraper that downloaded the complete website over a span of three days. I reduced the speed of the scraper to avoid overloading the web server. After that time, I had 12,000+ HTML files to parse and started to check the distribution of the data attributes to understand which attributes I could use for the visualisation.
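A minimal sketch of a throttled download loop of this kind, using only the standard library. The function names, the User-Agent string and the 20 second default delay are assumptions; the delay is simply what 12,000 pages spread evenly over three days would come to (about 21.6 seconds per page), not a figure from the original scraper.

```python
import time
import urllib.request
from pathlib import Path


def fetch(url):
    """Download one page and return its raw bytes."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "berlin-buildings-exploration (hypothetical)"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def download_all(urls, out_dir, delay_seconds=20.0, fetch=fetch, sleep=time.sleep):
    """Save every URL as a numbered HTML file, pausing after each request."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        target = out / f"{index:05d}.html"
        if target.exists():
            # Already downloaded in an earlier run, so the scraper can resume
            saved.append(target)
            continue
        target.write_bytes(fetch(url))
        saved.append(target)
        # Wait before the next request to keep the load on the server low
        sleep(delay_seconds)
    return saved
```

Skipping files that already exist matters for a run that takes days: if the script is interrupted, restarting it continues where it stopped instead of requesting everything again.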


One big issue appeared during the analysis of the attributes. The data from the website had the same structural problems as the text file. The attributes differed from building to building and varied in their level of detail. After some time I realised that it would be quite hard to generalise the data without changing too much of the original information. So I decided to stop working on the project, although it was a hard decision after one week of work.

So when should you break up with a personal project?

I only have an hour per day to work on personal projects, so it takes me quite long to finish them. From the experience with this last project I took away two important rules for working on my projects: the projects should be small in general, and the intermediate steps should be reachable in a short amount of time. I guess I can work on a project over a longer span if I’m able to create a flow of working and reaching small goals every now and then.


The wealth of the world #2

I followed up on the findings from the last experiment, which showed the differences in the distribution of wealth, and looked for other ways to visualize them. I guess the easiest way would be to show the wealth per adult of each country. Unfortunately, I think this solution doesn’t show the contrast between rich and poor in the right proportion.

The first graph shows the distribution of wealth across the whole world and the number of adults per country. It is interesting to see how the distribution changes when switching from wealth to adults. The drawback of this visualization is that the reader has to memorize the previous positions. Furthermore, the reader can’t see poorer countries because they are drawn so small.

The second graph tries to address those drawbacks. The reader can select a country. The graph shows the wealth of the selected country on the left side. On the right side, the reader sees how many other countries are needed to reach the same amount. And now it becomes interesting: underneath the wealth you can see the number of adults, with the selected country on the left and the added-up countries on the right. It is easier to see the differences now, but the areas of the treemap still make it too hard to compare the countries.
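The comparison behind the second graph can be sketched as a simple accumulation. This version assumes the other countries are added up starting with the poorest; the post does not state the order, and the country names and numbers below are invented placeholders rather than values from the Databook.

```python
def countries_to_match(selected, wealth_by_country):
    """Add up other countries, poorest first, until they reach the selected country's wealth."""
    target = wealth_by_country[selected]
    others = sorted(
        (wealth, name)
        for name, wealth in wealth_by_country.items()
        if name != selected
    )
    picked, total = [], 0.0
    for wealth, name in others:
        if total >= target:
            break
        picked.append(name)
        total += wealth
    return picked, total


# Invented placeholder values, only to show the mechanics
wealth = {"A": 100, "B": 10, "C": 30, "D": 70, "E": 500}
picked, total = countries_to_match("A", wealth)
```

Summing the adults of the selected country and of the `picked` countries in the same way gives the second pair of values shown underneath the wealth.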

The third graph shows the same information as the previous one. It is now easier to compare the different countries, but it lacks elegance. I liked the second approach, but it is still hard to read. I guess the purpose of the graph should decide which one to pick.


The wealth of the world

Oxfam published an article stating that the world’s 62 richest people own as much wealth as the 3.6 billion poorest people. Those two numbers made me curious. I downloaded Credit Suisse’s The Global Wealth Databook 2015 and scraped the Forbes Billionaires List. It was not easy to reproduce Oxfam’s results, so I decided to concentrate only on the wealth of the countries and the differences between them.

wealth of the world 1

The first chart shows the wealth of the countries and the number of adults per country. The outcome isn’t stunning. Most of the countries are in the left corner and overlap each other. The big outliers are China and India, but you can’t see anything new.

wealth of the world 2

The second chart shows the same information in a different way. It is maybe not the right way to show it, but it helps me to spot some findings. The chart adds up all countries and their adults. Each triangle represents a country, and the shape of the triangle shows the relation between wealth and adults. You can see India with a huge number of adults, but with less wealth than, for example, Switzerland.


Understanding the dataset a little bit too late

I finally understood the data set and its limitations while working on the next tree exploration.

My former explorations tried to show the evolution of tree planting. While trying to achieve that, I didn’t realize that it is not possible with this data set. The data shows only the existing trees and their age. All the trees that no longer exist are not in there. I guess a huge number of trees burned down during the war. All these missing trees distort the picture. No evolution can be shown with all those trees missing.


Evolution of planting trees in three German cities


I was inspired by the annual rings of a tree when making this visualization. I have to say the outcome is not that impressive. You can maybe see that the 80s were the time of the trees, but it is quite hard to read. I have to try something else.


Planted street trees per year between 1800 and 2015


The first visualization shows the number of street trees planted each year. I used a time range starting from 1800 to make the cities easier to compare. Berlin, for example, has data from before 1800, but the number of trees was too small.

You can see an increase in tree planting in all three cities until 1990. The planting was mainly done every 5 years.
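The yearly numbers behind such a chart can be sketched as a simple count over planting years. This assumes each tree record already carries a planting year (or one derived from its age); the function name and the sample years are hypothetical.

```python
from collections import Counter


def plantings_per_year(planting_years, start=1800, end=2015):
    """Count trees per planting year, keeping empty years so cities line up on one axis."""
    counts = Counter(year for year in planting_years if start <= year <= end)
    return [(year, counts.get(year, 0)) for year in range(start, end + 1)]


# Hypothetical planting years; values outside the range are ignored
series = plantings_per_year([1799, 1800, 1985, 1985, 2016])
```

Returning a value for every year in the range, including zeros, keeps the series for the three cities the same length, which makes them easier to draw on a shared time axis.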


Trees of Berlin

The tree subject started at the KIKK conference in Namur, Belgium. Sebastian Sadowski showed me his new side project. He got interested in the trees in front of his house and looked for a way to learn more about them. He found a dataset from the city of Berlin that holds all street trees planted over the years.

I got inspired by his interest and started my own journey through the data. I will create a bunch of small explorations to find something interesting.