I found a promising database about historical buildings in Berlin. For me as an architecture lover, it was a treasure trove of more than 12,000 buildings. It contains information about the architects, the year of construction, historical facts and a lot more.
The city of Berlin provides two different data sources. One is a text file, which is free-form and therefore hard to parse. The other is an online platform, which provides geolocations and structured data in HTML tables. The website is a great tool, but it has one downside: you can't really explore the content. You have to know what you are looking for. Yet there is a lot of interesting content to explore.
My idea was to solve that issue and create a visualisation tool that gives you different ways to explore the dataset. A good example of how it could work is the visualisation for the German Digital Library by Christian Bernhardt, Gabriel Credico, Christopher Pietsch and Prof. Dr. Marian Dörk. It combines a time chart, text clouds and networks. I would add an omnipresent map and the pictures of the buildings. It felt like a good thing to do for the next couple of weeks.
I started to gather the data with those goals in mind. I decided to work with the online platform, because it seemed easier to get the full dataset in a structured form. So I wrote a Python scraper that downloaded the complete website over the course of three days. I throttled the scraper to avoid overloading the web server. In the end I had 12,000+ HTML files to parse, and I started to check the distribution of the data attributes to understand which of them I could use for the visualisation.
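A minimal sketch of what those two steps could look like, using only the Python standard library. This is not the original scraper: the function names, the two-second delay and the assumption that each attribute name sits in the first cell of a table row are all hypothetical and would need adjusting to the real pages.

```python
import time
from collections import Counter
from html.parser import HTMLParser


class TableKeyParser(HTMLParser):
    """Collect the text of the first cell of every table row.

    Assumption: each row looks like <tr><td>Name</td><td>Value</td></tr>.
    """

    def __init__(self):
        super().__init__()
        self.keys = []
        self._cell_index = -1  # position of the current cell in its row
        self._in_cell = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cell_index = -1
        elif tag in ("td", "th"):
            self._cell_index += 1
            self._in_cell = True
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that belongs to the first cell of a row.
        if self._in_cell and self._cell_index == 0:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            if self._cell_index == 0:
                key = "".join(self._buffer).strip().rstrip(":")
                if key:
                    self.keys.append(key)
            self._in_cell = False


def attribute_keys(html):
    """Return the attribute names found in one HTML page."""
    parser = TableKeyParser()
    parser.feed(html)
    parser.close()
    return parser.keys


def attribute_coverage(pages):
    """Count in how many pages each attribute appears at least once."""
    counts = Counter()
    for html in pages:
        counts.update(set(attribute_keys(html)))
    return counts


def fetch_politely(urls, fetch, delay_seconds=2.0):
    """Yield fetch(url) for each URL, sleeping between requests.

    `fetch` is any callable that takes a URL and returns the page text;
    the delay value is a hypothetical choice to keep server load low.
    """
    for i, url in enumerate(urls):
        if i:
            time.sleep(delay_seconds)
        yield fetch(url)
```

Calling `attribute_coverage` on the downloaded pages gives a quick overview of which attributes appear on almost every building and which only on a handful, which is the kind of check that shows whether the data can be generalised.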
One big issue appeared during the analysis of the attributes. The data from the website had the same structural problems as the text file. The attributes differed from building to building and were inconsistent in their level of detail. After some time I realised that it would be quite hard to generalise the data without changing too much of the original information. So I decided to stop working on the project, although it was a hard decision after one week of work.
So when should you abandon a personal project?
I only have an hour per day to work on personal projects, so it takes me quite a long time to finish them. From this experience I took away two important rules for working on my projects: the projects should be small in general, and the intermediate steps should be reachable in a short amount of time. I guess I can work on a project over a longer span if I'm able to create a flow of working and reaching small goals every now and then.