I am looking for an apartment in Berlin for the third time in my life. For the first two times, I was looking for a place to hang my hat. So, I was not very picky about it. This time I am pickier because I have leverage as my current apartment is decent and I don’t need to move out anytime soon.
What I want at the end is an affordable apartment near my office. This means I am looking for an apartment with a cold rent below the Berlin average, in Kreuzberg or in the quarters nearby.
My questions are:
There are probably great reports on the real estate market in Germany that could answer these questions, but they all hide behind a very tall paywall. So, I decided to scrape data from ImmobilientsScout24, one of the biggest real estate websites in Germany.
I scraped the data from Immobilienscout24 using a script that I have found here.
You can find the all data cleaning and analysis code in Python on my GitHub page.
You can find this project’s Tableau Dashboard here.
First I load up the scraped data and start exploring it:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from matplotlib.pyplot import figure
df_orig = pd.read_csv('.../DE_rentals.csv',sep=';')
df = df_orig.copy()
df.head()
There are some unnecessary column. I’ll remove them:
df.drop(['url', 'firstname', 'lastname', 'phoneNumber','lat',
'lon', 'creation', 'ID', 'address'], axis=1, inplace=True)
In the dataset some values are marked with "no value" or "None". I should change that to standard null value:
df = df.replace({"no value":pd.NA})
df = df.replace({"None":pd.NA})
for col in df.columns:
pct_missing = (np.mean(df[col].isnull())*100)
print('{} - {}%'.format(col, pct_missing))
About 21 per cent of listings have their company column missing. These listings are probably from private sellers. I’ll drop rows where company column is missing and not offered by a private person. Then, I’ll fill the remaining null “Company rows with the string “Private Seller”: