Urbanization and Economic Development

Yohan Jeong
12 min read · Apr 1, 2020

Balance between Concentration and Efficiency

Photo by ben o'bro on Unsplash

It is well known that countries become urbanized as their economies grow. Here, I will explore how the world has become urbanized over time, and how urbanization differs across income levels.

Reading the Data

import numpy as np 
import pandas as pd
import geopandas as gpd
import seaborn as sns
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
# read the file of 'indicators'
data = pd.read_csv(r'...\Indicators.csv')

The data source for this analysis is the ‘World Development Indicators’ data set from Kaggle Datasets. Among the several files under that title, ‘Indicators.csv’ was used.

data.head()

This file contains thousands of indicators for each country over several years. In this analysis, I am going to use Urban population (% of total), Population in the largest city (% of urban population), Population in urban agglomerations of more than 1 million (% of total population), and GDP per capita (constant 2005 US$).

data = data[data['IndicatorCode'].isin(['NY.GDP.PCAP.KD','SP.URB.TOTL.IN.ZS','EN.URB.LCTY.UR.ZS','EN.URB.MCTY.TL.ZS'])]

How did the urbanization of the world change?

First, I will show how the urbanization of the world changed by comparing 1960 with 2010. For this purpose, I will use world color maps showing the level of urbanization in each country.

The code below shows how I extract ‘Urban population (% of total)’ in the data set and make a data frame with columns of country, year, and urban_percent.

# 'SP.URB.TOTL.IN.ZS' is 'Urban population (% of total)'
df = data[(data['IndicatorCode']=='SP.URB.TOTL.IN.ZS')&(data['Year'].isin([1960,2010]))]
df = df.pivot_table(values='Value', index=['CountryName', 'Year'], columns='IndicatorName')
df.reset_index(inplace=True)
df['Year']=df['Year'].astype(int)
df.rename(columns={'CountryName':'country', 'Urban population (% of total)':'urban_percent','Year':'year'}, inplace=True)
df.head()
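
To make the reshaping step concrete, here is a minimal sketch on a toy table in the same long format as ‘Indicators.csv’. The countries "A" and "B" and all values are made up for illustration; only the column names follow the data set.

```python
import pandas as pd

# Toy rows in the long format of Indicators.csv (values are made up).
toy = pd.DataFrame({
    "CountryName": ["A", "A", "B", "B"],
    "IndicatorName": ["Urban population (% of total)"] * 4,
    "Year": [1960, 2010, 1960, 2010],
    "Value": [20.0, 45.0, 60.0, 80.0],
})

# One row per (country, year) and one column per indicator name.
wide = toy.pivot_table(values="Value", index=["CountryName", "Year"], columns="IndicatorName")
wide = wide.reset_index()
wide = wide.rename(columns={
    "CountryName": "country",
    "Year": "year",
    "Urban population (% of total)": "urban_percent",
})
print(wide)
```

The result has the same three columns (country, year, urban_percent) as df in the real analysis.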

The next step is to match the urban_percent data for each country to the world map provided by GeoPandas.

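# read the file for the world map using GeoPandas
# (note: gpd.datasets was removed in GeoPandas 1.0; on newer versions, download the Natural Earth file and pass its path to gpd.read_file)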
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))

The country names in df do not perfectly match the country names in world. So, I am going to check the difference in the country names between the two data sets.

country_data = list(df['country'].unique())
country_geo = list(world['name'])
country_diff = [country for country in country_data if country not in country_geo]
country_diff
['American Samoa','Andorra','Antigua and Barbuda','Arab World', 'Aruba','Bahamas, The','Bahrain','Barbados','Bermuda','Bosnia and Herzegovina','Brunei Darussalam','Cabo Verde','Caribbean small states','Cayman Islands','Central African Republic','Central Europe and the Baltics','Channel Islands','Comoros','Congo, Dem. Rep.',
..................'United States','Upper middle income','Venezuela, RB','Virgin Islands (U.S.)','West Bank and Gaza','World','Yemen, Rep.']

The list above shows the names in df that do not match any name in world. Some of them are aggregates rather than countries (for example ‘Arab World’ and ‘World’), and others are countries spelled differently in the two data sets. Therefore, it is necessary to rename the countries in df to match the names in world. However, since the two data sets do not cover exactly the same countries, some countries in df won’t be used in this analysis.
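
As a small sketch of the same check, set differences show the mismatch in both directions, which helps when deciding what to rename. The two name sets below are hypothetical stand-ins for df['country'] and world['name'], not the real lists.

```python
# Hypothetical name sets standing in for df['country'] and world['name'].
names_data = {"Bahamas, The", "Russian Federation", "World", "France"}
names_map = {"Bahamas", "Russia", "France", "Antarctica"}

# Names that appear in only one of the two sources.
only_in_data = sorted(names_data - names_map)
only_in_map = sorted(names_map - names_data)

print(only_in_data)
print(only_in_map)
```

Names that show up on both sides under different spellings are candidates for renaming; names that show up on one side only (aggregates such as "World" here) can simply be left out.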

The code below shows the steps of how the country’s names are changed in order to match two data sets, and the function for the color map is created.

# 'Mauritius' is left unmapped: it is a different country from 'Mauritania'
CountryChanged = df['country'].replace({
    'Bahamas, The':'Bahamas','Bosnia and Herzegovina':'Bosnia and Herz.','Brunei Darussalam':'Brunei','Central African Republic':'Central African Rep.',
    'Congo, Dem. Rep.':'Dem. Rep. Congo','Congo, Rep.':'Congo', "Cote d'Ivoire": "Côte d'Ivoire", 'Czech Republic': 'Czechia',
    'Dominican Republic': 'Dominican Rep.', 'Egypt, Arab Rep.': 'Egypt', 'Equatorial Guinea': 'Eq. Guinea', 'Gambia, The': 'Gambia',
    'Iran, Islamic Rep.': 'Iran', 'Korea, Dem. Rep.': 'North Korea', 'Korea, Rep.': 'South Korea', 'Kyrgyz Republic': 'Kyrgyzstan',
    'Lao PDR': 'Laos', 'Macedonia, FYR': 'Macedonia', 'Russian Federation': 'Russia', 'Slovak Republic': 'Slovakia',
    'Solomon Islands': 'Solomon Is.','South Sudan': 'S. Sudan', 'Syrian Arab Republic': 'Syria',
    'United States': 'United States of America', 'Venezuela, RB': 'Venezuela', 'Yemen, Rep.': 'Yemen'})
df['country'] = CountryChanged
# Separating the data by year
df1960 = df[df['year'] == 1960]
df2010 = df[df['year'] == 2010]
# combining two data sets
mapped1960 = world.set_index('name').join(df1960.set_index('country')).reset_index()
mapped2010 = world.set_index('name').join(df2010.set_index('country')).reset_index()
# creating the Worldcolormap function
def Worldcolormap(VariableName, JointDF, TitleName):
    variable = VariableName
    fig, ax = plt.subplots(1, figsize=(15,15))
    divider = make_axes_locatable(ax)
    cax = divider.append_axes("right", size="3%", pad=0.1)
    # drop countries with no data so they are not drawn
    JointDF.dropna(subset=[VariableName]).plot(column=variable, cmap='Blues', ax=ax, cax=cax, linewidth=0.8, edgecolors='0.8', legend=True)
    ax.set_title(TitleName, fontdict={'fontsize':20})
    ax.set_axis_off()
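
To see what the join does with unmatched names, here is a minimal sketch using plain pandas frames in place of world and df1960. The names world_toy and df_toy and all values are made up. The left join keeps every row of the map table and leaves NaN where no data matches, and those are the rows dropped before plotting.

```python
import pandas as pd

# Stand-ins for the map table and the urbanization table (values are made up).
world_toy = pd.DataFrame({"name": ["France", "Russia", "Antarctica"]})
df_toy = pd.DataFrame({"country": ["France", "Russia"], "urban_percent": [60.0, 50.0]})

# Left join on the country name: every map row is kept.
joined = world_toy.set_index("name").join(df_toy.set_index("country")).reset_index()

# Rows without data carry NaN and are removed before drawing.
plotted = joined.dropna(subset=["urban_percent"])
print(joined)
print(len(plotted))
```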

Urban Population (% of Total Population) in 1960

So, let’s look at the color map for 1960 indicating the urban population (% of total population) in the world.

Worldcolormap('urban_percent', mapped1960, 'Urban Population (% of Total Population) in 1960')

In 1960, there was a vast difference in the levels of urbanization across the continents. It appears that Australia, North America, some parts of South America, and European countries already had high levels of urbanization. Australia seems to have already reached 90% urbanization. The USA and Canada seem to have been around 70%. On the other hand, most of the African and Southeast Asian countries appear to have been under 30%.

One thing we can anticipate from the map is that the urbanization of a country may be associated with its income level. It is well known that firms and people need to be clustered in order to be efficient in their economic activities. Therefore, as the economy of a country grows, more firms and people gather in clusters, and this process encourages further urbanization.

Urban Population (% of Total Population) in 2010

Then, how had the urbanization of the world changed by 2010?

Worldcolormap('urban_percent', mapped2010, 'Urban Population (% of Total Population) in 2010')

In contrast to the situation in 1960, many countries in the world seem to have achieved high levels of urbanization by 2010. As mentioned earlier, this growth of urbanization may be related to the growth of the world economy.

In the next section, I will examine whether countries’ income levels are related to their urbanization.

Economic Levels and Urbanization

From the maps shown earlier, it appears that a country’s economic level and its urbanization are related. Instead of checking this relationship for every individual country, I will use groups of countries defined by income level. ‘Indicators.csv’ includes data for high income countries (HIC), middle income countries (MIC), and low income countries (LIC). Because some of the indicators necessary for this analysis are missing for high income countries (HIC), I will use high income OECD countries (OEC) instead to represent the high income group.
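
The group series used below are all selected the same way, by one group code and one indicator code. A minimal sketch of that selection as a small helper follows; select_series is a hypothetical name that is not part of the original code, and the toy values are made up.

```python
import pandas as pd

def select_series(data, country_code, indicator_code):
    """Return the rows for one country (or group) code and one indicator code, sorted by year."""
    mask = (data["CountryCode"] == country_code) & (data["IndicatorCode"] == indicator_code)
    return data.loc[mask].sort_values("Year")

# Toy rows with the same column names as Indicators.csv (values are made up).
toy = pd.DataFrame({
    "CountryCode": ["OEC", "OEC", "MIC", "LIC"],
    "IndicatorCode": ["NY.GDP.PCAP.KD", "SP.URB.TOTL.IN.ZS", "NY.GDP.PCAP.KD", "NY.GDP.PCAP.KD"],
    "Year": [1960, 1960, 1960, 1960],
    "Value": [1.0, 2.0, 3.0, 4.0],
})

oec_gdp = select_series(toy, "OEC", "NY.GDP.PCAP.KD")
print(oec_gdp)
```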

Income Level Difference

First, compare the income levels for those groups.

# GDP per capita for three groups of the countries
hic_gdpcapita =data[(data['CountryCode']=='OEC')&(data['IndicatorCode']=='NY.GDP.PCAP.KD')]
mic_gdpcapita =data[(data['CountryCode']=='MIC')&(data['IndicatorCode']=='NY.GDP.PCAP.KD')]
lic_gdpcapita =data[(data['CountryCode']=='LIC')&(data['IndicatorCode']=='NY.GDP.PCAP.KD')]
# Line Plot for GDP per capita over Time
plt.figure(figsize=(8,7))
# on Matplotlib 3.6 and later this style is named 'seaborn-v0_8-darkgrid'
plt.style.use('seaborn-darkgrid')
palette = plt.get_cmap('Set1')

plt.plot(hic_gdpcapita.Year, hic_gdpcapita.Value, marker='', color=palette(1), linewidth=1, alpha=0.9, label = 'High Income Countries')
plt.plot(mic_gdpcapita.Year, mic_gdpcapita.Value, marker='', color=palette(2), linewidth=1, alpha=0.9, label = 'Middle Income Countries')
plt.plot(lic_gdpcapita.Year, lic_gdpcapita.Value, marker='', color=palette(3), linewidth=1, alpha=0.9, label = 'Low Income Countries')

plt.legend(loc=2, ncol=1, fontsize=15)
plt.title('GDP per capita', loc='center', fontsize=15, fontweight=0)
plt.xlabel("Year", fontsize=15)
plt.ylabel("constant 2005 US$", fontsize=15)

The graph above clearly shows that there are huge differences in income levels among the groups. High income countries started at around $10,000 in 1960 and kept growing, reaching $35,000 in the mid-2000s. On the other hand, middle income countries stayed between $1,000 and $5,000 during the same period. The income levels of the low income countries seem to have been under $1,000 during the entire period.

Urbanization by Income Levels

Using the various indicators in ‘Indicators.csv’, urbanization can be measured in other ways in addition to Urban population (% of total population). Here, I will also use Population in urban agglomerations of more than 1 million (% of total population) and Population in the largest city (% of urban population).

Urban population (% of total population) tells us the percentage of the total population who live in urban areas of any size. Population in urban agglomerations of more than 1 million (% of total population), on the other hand, indicates the share of people living in large cities of more than 1 million people. Population in the largest city (% of urban population) is the share of the urban population living in the country’s largest city.

# urban population (% of total population) for each income level
hic_city =data[(data['CountryCode']=='OEC')&(data['IndicatorCode']=='SP.URB.TOTL.IN.ZS')]
mic_city =data[(data['CountryCode']=='MIC')&(data['IndicatorCode']=='SP.URB.TOTL.IN.ZS')]
lic_city =data[(data['CountryCode']=='LIC')&(data['IndicatorCode']=='SP.URB.TOTL.IN.ZS')]
# population in urban agglomerations of more than 1 million (% of total population) for each income level
hic_large =data[(data['CountryCode']=='OEC')&(data['IndicatorCode']=='EN.URB.MCTY.TL.ZS')]
mic_large =data[(data['CountryCode']=='MIC')&(data['IndicatorCode']=='EN.URB.MCTY.TL.ZS')]
lic_large =data[(data['CountryCode']=='LIC')&(data['IndicatorCode']=='EN.URB.MCTY.TL.ZS')]
# population in the largest city (% of urban population) for each income level
hic_largest =data[(data['CountryCode']=='OEC')&(data['IndicatorCode']=='EN.URB.LCTY.UR.ZS')]
mic_largest =data[(data['CountryCode']=='MIC')&(data['IndicatorCode']=='EN.URB.LCTY.UR.ZS')]
lic_largest =data[(data['CountryCode']=='LIC')&(data['IndicatorCode']=='EN.URB.LCTY.UR.ZS')]

Since the indicator for population in the largest city is expressed as a % of urban population, not of total population, it is necessary to convert it to a percentage of total population before comparing the indicators.

What we need for this calculation is population in the largest city (% of total population). If A = population in the largest city, B = urban population, and C = total population, then population in the largest city (% of total population) = (A/C) × 100. We have population in the largest city (% of urban population), which is (A/B) × 100, and urban population (% of total population), which is (B/C) × 100. Multiplying the two and dividing by 100 cancels B: ((A/B) × 100) × ((B/C) × 100) × (1/100) = (A/C) × 100. In other words, population in the largest city (% of total population) = population in the largest city (% of urban population) × urban population (% of total population) × (1/100).
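
The conversion can be checked on made-up numbers. In this sketch the function name largest_city_share_of_total and the population counts are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def largest_city_share_of_total(largest_pct_of_urban, urban_pct_of_total):
    """Convert 'largest city (% of urban)' to '% of total': (A/B*100) * (B/C*100) / 100."""
    return largest_pct_of_urban * urban_pct_of_total / 100

# Made-up counts: A in the largest city, B urban, C total.
A, B, C = 2_000_000, 8_000_000, 20_000_000

pct_of_urban = A / B * 100         # largest city as % of urban population
urban_pct = B / C * 100            # urban population as % of total
result = largest_city_share_of_total(pct_of_urban, urban_pct)

print(result)        # 10.0
print(A / C * 100)   # the direct calculation gives the same share
```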

# Creating 'population in the largest city (% of total population)' for high income countries
hic_largest = hic_largest.rename(columns={'Value':'LargestUrban'})
hic_largest = pd.merge(hic_largest, hic_city, how='inner', on=['CountryName','CountryCode','Year'])
hic_largest['LargestTotal'] = (hic_largest['LargestUrban']*hic_largest['Value'])/100
hic_largest = hic_largest.drop(columns=['IndicatorName_x','IndicatorCode_x','IndicatorName_y','IndicatorCode_y','LargestUrban','Value'])
hic_largest = hic_largest.rename(columns={'LargestTotal':'Value'})

# Creating 'population in the largest city (% of total population)' for middle income countries
mic_largest = mic_largest.rename(columns={'Value':'LargestUrban'})
mic_largest = pd.merge(mic_largest, mic_city, how='inner', on=['CountryName','CountryCode','Year'])
mic_largest['LargestTotal'] = (mic_largest['LargestUrban']*mic_largest['Value'])/100
mic_largest = mic_largest.drop(columns=['IndicatorName_x','IndicatorCode_x','IndicatorName_y','IndicatorCode_y','LargestUrban','Value'])
mic_largest = mic_largest.rename(columns={'LargestTotal':'Value'})
# Creating 'population in the largest city (% of total population)' for low income countries
lic_largest = lic_largest.rename(columns={'Value':'LargestUrban'})
lic_largest = pd.merge(lic_largest, lic_city, how='inner', on=['CountryName','CountryCode','Year'])
lic_largest['LargestTotal'] = (lic_largest['LargestUrban']*lic_largest['Value'])/100
lic_largest = lic_largest.drop(columns=['IndicatorName_x','IndicatorCode_x','IndicatorName_y','IndicatorCode_y','LargestUrban','Value'])
lic_largest = lic_largest.rename(columns={'LargestTotal':'Value'})
# define a function for drawing plots
def Urbantrends(TitleName, Highincome, Middleincome, Lowincome):
    plt.plot(Highincome.Year, Highincome.Value, marker='', color=palette(1), linewidth=1, alpha=0.9, label = 'High Income Countries')
    plt.plot(Middleincome.Year, Middleincome.Value, marker='', color=palette(2), linewidth=1, alpha=0.9, label = 'Middle Income Countries')
    plt.plot(Lowincome.Year, Lowincome.Value, marker='', color=palette(3), linewidth=1, alpha=0.9, label = 'Low Income Countries')
    plt.legend(loc=2, ncol=1, fontsize=10)
    plt.title(TitleName, loc='center', fontsize=15, fontweight=0)
    plt.xlabel("Year", fontsize=15)
    plt.ylabel("% of Total Population", fontsize=15)
    plt.ylim(0,100)
# Graphs for Urbanization
plt.figure(figsize=(18,6))
plt.style.use('seaborn-darkgrid')
palette = plt.get_cmap('Set1')

plt.subplot(131)
Urbantrends('Population in Any Cities', hic_city, mic_city, lic_city)

plt.subplot(132)
Urbantrends('Population in Large Cities', hic_large, mic_large, lic_large)

plt.subplot(133)
Urbantrends('Population in the Largest City', hic_largest, mic_largest, lic_largest)

The graphs above show how the population share for each category of cities has changed over time. In the left graph, the urbanization (% of total population) for high income countries was over 60% even in 1960. This shows that more than half of the total population lived in urban areas, regardless of the size of cities. This share continued to increase and reached 80%. The urbanization of the middle and low income countries is much lower than that of the high income countries.

In the middle graph, the high income countries started at around 30% and had almost reached 40% by the 2010s. On the other hand, the middle and low income countries were below 20% over most of the period.

One noticeable thing in the right graph is that the population share of the largest city for high income countries did not increase much compared to the other categories of cities. We might think that economic growth is related to the growth of the largest city in the country. For example, the economic growth of the US seems to be related to the growth of New York City. However, the data suggest that the growth of the largest city in a country is not the main factor for economic growth. This is supported by the data from low income countries, where the growth of the largest city is noticeable compared to the other income groups. To see this from a different angle, let’s look at the graphs by income levels.

# redefine Urbantrends to plot the three city categories for one income group
def Urbantrends(TitleName, Largest, Large, Any):
    plt.plot(Largest.Year, Largest.Value, marker='', color=palette(1), linewidth=1, alpha=0.9, label = 'Population in the Largest City')
    plt.plot(Large.Year, Large.Value, marker='', color=palette(2), linewidth=1, alpha=0.9, label = 'Population in Large Cities')
    plt.plot(Any.Year, Any.Value, marker='', color=palette(3), linewidth=1, alpha=0.9, label = 'Population in Any Cities')
    plt.legend(loc=2, ncol=1, fontsize=10)
    plt.title(TitleName, loc='center', fontsize=15, fontweight=0)
    plt.xlabel("Year", fontsize=15)
    plt.ylabel("% of Total Population", fontsize=15)
    plt.ylim(0,100)
plt.figure(figsize=(18,6))
plt.style.use('seaborn-darkgrid')
palette = plt.get_cmap('Set1')

plt.subplot(131)
Urbantrends('High Income Countries', hic_largest, hic_large, hic_city)

plt.subplot(132)
Urbantrends('Middle Income Countries', mic_largest, mic_large, mic_city)

plt.subplot(133)
Urbantrends('Low Income Countries', lic_largest, lic_large, lic_city)

Each graph above shows the change of urbanization for one income group. Comparing the share of the population in the largest city, the low income countries have roughly one third or more of their urban population in the largest city, while the other income groups have much lower proportions.

In these graphs, we should keep in mind that each category of urbanization includes the population in the higher city classifications. For example, if the population in large cities for high income countries is 30%, this number includes the population in the largest city as well. Considering this, the population in large cities other than the largest city seems to be very small for the low income countries compared to the other city sizes.

In the next graphs, this is more clearly revealed.
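
Before the full code, here is a minimal sketch of the subtraction it performs, on made-up percentages. The function name net_shares is hypothetical; the three keys match the column names used in the code that follows.

```python
def net_shares(pop_any, pop_large, pop_largest):
    """Split nested urban shares (% of total population) into non-overlapping categories."""
    return {
        "Largest": pop_largest,                 # the largest city only
        "Large": pop_large - pop_largest,       # cities over 1 million, excluding the largest
        "Medium or Small": pop_any - pop_large, # all remaining urban areas
    }

# Made-up nested shares: 80% urban, 40% in cities over 1 million, 10% in the largest city.
shares = net_shares(pop_any=80.0, pop_large=40.0, pop_largest=10.0)
print(shares)
```

The three net shares add back up to the total urban share, which is a quick way to check that nothing is counted twice.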

# high income countries
hic_city = hic_city.rename(columns={'Value':'pop_any'})
hic_large = hic_large.rename(columns={'Value':'pop_large'})
hic_city = pd.merge(hic_city, hic_large, how='inner', on=['CountryName','CountryCode','Year'])
hic_city['Medium or Small'] = hic_city['pop_any']-hic_city['pop_large']
hic_largest = hic_largest.rename(columns={'Value':'Largest'})
hic_large = pd.merge(hic_large, hic_largest, how='inner', on=['CountryName','CountryCode','Year'])
hic_large['Large'] = hic_large['pop_large']-hic_large['Largest']
hic_large = pd.merge(hic_large, hic_city, how='inner', on=['CountryName','CountryCode','Year'])
hic = hic_large[['CountryName', 'Year', 'Largest','Large', 'Medium or Small']]
hic = hic.loc[hic['Year'].isin([1960,1970,1980,1990,2000,2010])]
# middle income countries
mic_city = mic_city.rename(columns={'Value':'pop_any'})
mic_large = mic_large.rename(columns={'Value':'pop_large'})
mic_city = pd.merge(mic_city, mic_large, how='inner', on=['CountryName','CountryCode','Year'])
mic_city['Medium or Small'] = mic_city['pop_any']-mic_city['pop_large']
mic_largest = mic_largest.rename(columns={'Value':'Largest'})
mic_large = pd.merge(mic_large, mic_largest, how='inner', on=['CountryName','CountryCode','Year'])
mic_large['Large'] = mic_large['pop_large']-mic_large['Largest']
mic_large = pd.merge(mic_large, mic_city, how='inner', on=['CountryName','CountryCode','Year'])
mic = mic_large[['CountryName', 'Year', 'Largest','Large', 'Medium or Small']]
mic = mic.loc[mic['Year'].isin([1960,1970,1980,1990,2000,2010])]
# low income countries
lic_city = lic_city.rename(columns={'Value':'pop_any'})
lic_large = lic_large.rename(columns={'Value':'pop_large'})
lic_city = pd.merge(lic_city, lic_large, how='inner', on=['CountryName','CountryCode','Year'])
lic_city['Medium or Small'] = lic_city['pop_any']-lic_city['pop_large']
lic_largest = lic_largest.rename(columns={'Value':'Largest'})
lic_large = pd.merge(lic_large, lic_largest, how='inner', on=['CountryName','CountryCode','Year'])
lic_large['Large'] = lic_large['pop_large']-lic_large['Largest']
lic_large = pd.merge(lic_large, lic_city, how='inner', on=['CountryName','CountryCode','Year'])
lic = lic_large[['CountryName', 'Year', 'Largest','Large', 'Medium or Small']]
lic = lic.loc[lic['Year'].isin([1960,1970,1980,1990,2000,2010])]
fig, (ax1, ax2, ax3) = plt.subplots(1,3)
fig.set_size_inches(18,6)
hic.plot.bar(x='Year', ax=ax1).legend(loc='upper left')
ax1.set_ylim((0,50))
ax1.set_title("High Income Countries", fontsize = 15)
ax1.set_ylabel("% of Total Population", fontsize = 15)
mic.plot.bar(x='Year', ax=ax2).legend(loc='upper left')
ax2.set_ylim((0,50))
ax2.set_title("Middle Income Countries", fontsize = 15)
lic.plot.bar(x='Year', ax=ax3).legend(loc='upper left')
ax3.set_ylim((0,50))
ax3.set_title("Low Income Countries", fontsize = 15)

The graphs above show the net population percentage for each category. The percentage for each category does not include the percentage for the higher city classification. For high and middle income countries, the greatest share of the population is in medium or small cities, and the next largest share is in the large cities. The largest city holds the smallest share of the country’s population.

However, for the low income countries, the population in large cities is less than the population in the largest city. This means that only 15 to 30% of the entire population live in urban areas, and almost one third of them live in one city. With this city structure, it’s hard to imagine that many manufacturing industries can spread out over the entire country and work efficiently. Compared to the low income countries, high income countries have most of their population in cities, and that population is not concentrated in a few cities but spread out across the country. This suggests that not only concentration but also efficiency is important for the economic growth of a country.

Conclusion

Urbanization is clearly one of the main factors closely related to the economic growth of a country. The reason why I do not say that urbanization ‘affects’ economic growth is that we have not yet examined the causal relationship between them. However, we can still say that urbanization is a key factor that each country must consider for economic growth.

Yet, concentration should not be the only factor considered. Concentration of economic activities in only one city may overwhelm the function of the city and actually may lower the efficiency of economic activities. To achieve the efficiency of economic activities, spreading the population throughout the country is another thing to be considered as well.

Yohan Jeong

Business Analyst at Samsung Electronics America. PhD in Economics from University of California, Davis.