Shengen-visa.jpg

In this post, I'll be discussing the intriguing relationship between the number of asylum applications and Schengen visa rejection rates. In the past decades, political unrest, humanitarian crises, or sheer poverty have forced many people to seek refuge in Schengen countries. This made me wonder if the increase in asylum applications could be affecting the Schengen visa rejection rates for applicants from these troubled countries.

My question is: Is there a correlation between the increasing number of asylum applications from a country's citizens and the subsequent rise in rejection rates for Schengen visa applications from that country's residents?

You find out about the answer in this post but you can also find it on my GitHub page.

Additionally, I created an interactive Tableau dashboard where you can discover new insights about Schengen visa statistics.

For this post, I used the Schengen visa statistics published by the European Commission and Eurostat asylum application statistics.

Data Cleaning

Here are the some of the columns in this dataframe and what they mean:

app_country Country where the applications was made
schengen_state Schengen state that applicant applied to
consulate Schengen state consulate
app Number of Schengen visa applications
rejected Number of Schengen visa rejections
asyl_app Asylum applications made by nationals of the "application country" in "Schengen" states

Creating the Dataframe

The data of each year is stored in different excel sheets, and a third of them are not formatted the same. However, this is easily solvable with Python.

I stored the similarly-formatted excel files in the same folder and loaded them onto Pandas. After merging them together, I used "re.findall()" to assign the proper year based on the file name which includes the year.

df = pd.DataFrame()
for f in glob.glob('.../*.xlsx'):
    data = pd.read_excel(f, sheet_name='Data for consulates')
    data['year'] = int(re.findall(r'.*([1-3][0-9]{3})', f)[0])
    df = df.append(data,ignore_index=True)

df = df.iloc[:, [0,1,2,8,9,10,12,13,-1]]

Then, I selected the relevant columns and renamed them:

column_names = ["schengen_state", "app_country", "consulate", "app","total_issued", "mev_issued", 
                "ltv", "rejected", 'year']
df.columns = column_names

Now, I need to reformat the ones that are formatted properly, starting with 2013. I add the year column manually. Then, I standardised the column names, I made sure that the original dataframe (df) and 2013 dataframe (df2013) are aligned, and replaced latter's column names with the former:

df2013 = pd.read_excel(".../synthese_2013_with_filters_en.xls", sheet_name='Complete data')
df2013['year'] = 2013
df2013 = df2013.iloc[:, [0,1,2,8,9,10,12,13,-1]]
df2013.columns = df.columns

I replicated the same process for the excel files for the years between 2009 and 2012. However, each required special attention, so I could not automate this process fully.

After I was done with this process, I put the "delinquent" data frames in a list and merged with the main data frame:

for g in df_list:
    df = df.append(g,ignore_index=True)