Beta-Convergence Analysis of Country Growth with Python

What is β-Convergence and Why is it Important?

β-Convergence is an economic concept that examines whether poorer economies tend to grow faster than richer ones, thus gradually "catching up" over time. The classical test regresses growth over a period on the initial level of (log) GDP per capita. This project uses a growth-on-growth variant instead: the average GDP per capita growth rate in a later period (e.g., 2019-2024) is regressed on the average growth rate in an earlier period (e.g., 2004-2008). A negative regression coefficient (β < 0) is read as convergence, since countries with lower earlier growth rates show higher subsequent growth. Conversely, a positive β suggests divergence, where faster-growing countries keep growing faster.
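
As a minimal sketch of the regression described above (later-period growth on earlier-period growth), the snippet below fits five hypothetical countries and reads off the sign of β. The numbers are synthetic, built so that the true slope is -0.5, and `numpy.polyfit` stands in for scikit-learn's `LinearRegression`; with a single regressor both return the same least-squares slope.

```python
import numpy as np

# Synthetic average growth rates (%) for five hypothetical countries,
# constructed so that later growth = 3.0 - 0.5 * earlier growth exactly
earlier_growth = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
later_growth = 3.0 - 0.5 * earlier_growth

# Least-squares fit of: later = alpha + beta * earlier (polyfit returns [slope, intercept])
beta, alpha = np.polyfit(earlier_growth, later_growth, deg=1)

# Sign convention used in this post: beta < 0 convergence, beta > 0 divergence
if beta < 0:
    verdict = "convergence"
elif beta > 0:
    verdict = "divergence"
else:
    verdict = "neither"
print(f"beta = {beta:.2f}, alpha = {alpha:.2f} -> {verdict}")
# prints: beta = -0.50, alpha = 3.00 -> convergence
```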

Why is it important? β-Convergence is a key concept in economic growth theory, which predicts that economies with lower initial income levels should grow faster due to diminishing returns to capital. Understanding convergence helps policymakers assess whether economic disparities between countries (e.g., emerging vs. developed economies) are narrowing over time. For instance:

For emerging economies, evidence of convergence could indicate successful development policies or access to global markets.
For developed economies, convergence may reflect economic stabilization or recovery after crises (e.g., the 2008 financial crisis or COVID-19).
Analyzing convergence across different time periods shows how global events, such as financial crises or pandemics, affect growth.

In this project, I investigate β-convergence by comparing emerging and developed countries over multiple periods (2004-2024). By calculating regression coefficients and R² values, I assess whether convergence occurs and how it differs between country groups. This analysis not only tests economic theory but also demonstrates the application of data science techniques to real-world macroeconomic data.
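
With a single regressor, both statistics have closed forms, which makes the scikit-learn output easy to sanity-check. The sketch below uses synthetic growth rates and illustrative variable names: it computes β as cov(x, y) / var(x) and R² as 1 - SS_res / SS_tot, and confirms that in simple regression R² equals the squared correlation between the two growth rates.

```python
import numpy as np

# Synthetic period averages (%) for six hypothetical countries
x = np.array([0.5, 1.2, 2.0, 2.8, 3.5, 4.1])  # earlier-period growth
y = np.array([1.1, 0.9, 1.8, 2.0, 2.9, 2.7])  # later-period growth

# OLS slope and intercept: beta = cov(x, y) / var(x)
beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha = y.mean() - beta * x.mean()

# R^2 = 1 - SS_res / SS_tot
y_hat = alpha + beta * x
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

# With one regressor, R^2 equals the squared Pearson correlation
r = np.corrcoef(x, y)[0, 1]
print(f"beta = {beta:.3f}, R^2 = {r2:.3f}, r^2 = {r**2:.3f}")
```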

Dataset Description

We'll be working with annual GDP per capita growth data (%) from the World Development Indicators (WDI) for 2004 to 2024. Each row in the raw dataset represents one country and holds one growth rate per year; the period averages, groups, and β coefficients are derived from it in the steps below. Here are the key variables used in the analysis:

Country Code: the unique identifier for each country (e.g., USA, LUX, JPN, IRL, KOR).
Group: classification of countries into Emerging or Developed groups.
Period: the time period over which growth rates are averaged (2004-2008, 2009-2013, 2014-2018, 2019-2024).
Growth_rate: the average annual GDP per capita growth rate (%) for the respective period.
Beta: the β coefficient from regressing later-period growth on earlier-period growth within a group, indicating convergence or divergence:
- If β < 0: convergence; the gap gets smaller and growth is more balanced.
- If β > 0: divergence; the gap gets larger and growth is more uneven.
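
The variables above mix two layouts: the code keeps one column per period (wide), while Period and Growth_rate describe one row per country and period (long). As a sketch, assuming a wide frame like the one built in the steps below, `DataFrame.melt` produces the long layout. The two-country frame here reuses four values from the results table and is only illustrative.

```python
import pandas as pd

# Wide layout: one row per country, one column per period (values from the results table)
wide = pd.DataFrame({
    "Country Code": ["USA", "CHL"],
    "Group": ["Developed", "Emerging"],
    "Pre_Crisis_Growth (2004-2008)": [1.49, 4.46],
    "Recent_Growth (2019-2024)": [0.98, -0.68],
})

# Long layout: one row per (country, period) pair
long_df = wide.melt(
    id_vars=["Country Code", "Group"],
    var_name="Period",
    value_name="Growth_rate",
)
print(long_df)
```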

Import libraries and data cleaning

We set up the environment by importing the necessary libraries, loading the WDI dataset (semicolon-separated, with commas as decimal marks), dropping rows with missing values, and cleaning the column names so that each year column is labeled by its year alone. The average growth rate for each of the four periods is computed in the next step.

# Import packages
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Load the data (adjust the path to your own copy of the WDI export)
df_wdi = pd.read_csv(r"C:\Users\Usuario\Downloads\gdp_perc_growth_2004_2024.csv", sep=";", decimal=',')
# Drop every row with at least one missing value
df_wdi = df_wdi.dropna()
# Keep only the part of each column name before " [" (the year, for the year columns)
df_wdi.columns = [col.split(' [')[0] for col in df_wdi.columns]
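
To see what the loading and clean-up steps do without the original file, the sketch below parses a tiny in-memory CSV in the same semicolon-separated, decimal-comma format. The `2004 [YR2004]` header style and the `..` missing-value marker are assumptions about the WDI export (the raw file is not shown in this post), and the two country rows are hypothetical; `na_values=".."` turns such cells into NaN so that `dropna()` can remove them.

```python
import io

import pandas as pd

# Hypothetical two-country extract in an assumed WDI-style layout:
# ';' separator, ',' decimal mark, "YYYY [YRYYYY]" headers, ".." for missing values
raw_csv = """Country Name;Country Code;2004 [YR2004];2005 [YR2005]
Country A;AAA;5,1;4,8
Country B;BBB;..;2,0
"""

df = pd.read_csv(io.StringIO(raw_csv), sep=";", decimal=",", na_values="..")

# Drop rows with any missing value (here: BBB, which has ".." for 2004)
df = df.dropna()

# Keep only the part of each header before " [", e.g. "2004 [YR2004]" -> "2004"
df.columns = [col.split(" [")[0] for col in df.columns]
print(df)
```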

Data partition

Next, we average the annual growth rates within each of the four periods and assign every country to the Emerging or Developed group. As a check, we display the period averages rounded to two decimals:

# Average the annual growth rates within each period (label slices include both end years)
df_wdi['Pre_Crisis_Growth (2004-2008)'] = df_wdi.loc[:, '2004':'2008'].mean(axis=1, skipna=True)
df_wdi['Recuperation_Growth (2009-2013)'] = df_wdi.loc[:, '2009':'2013'].mean(axis=1, skipna=True)
df_wdi['Stability_Growth (2014-2018)'] = df_wdi.loc[:, '2014':'2018'].mean(axis=1, skipna=True)
df_wdi['Recent_Growth (2019-2024)'] = df_wdi.loc[:, '2019':'2024'].mean(axis=1, skipna=True)

# Country code plus the four period averages, for display
results = df_wdi[['Country Code', 'Pre_Crisis_Growth (2004-2008)', 'Recuperation_Growth (2009-2013)',
                  'Stability_Growth (2014-2018)', 'Recent_Growth (2019-2024)']]

# Country lists for each group
emerging = ['CHL', 'POL', 'ETH', 'VNM', 'IND', 'EGY', 'IDN', 'PER', 'MAR', 'PHL', 'NGA', 'BGD', 'PAK', 'MOZ']
developed = ['USA', 'CHE', 'LUX', 'GBR', 'KOR', 'IRL', 'DEU', 'ESP', 'JPN', 'CAN', 'AUS', 'NOR', 'NLD', 'DNK', 'SWE', 'FIN']

# Label each country; codes in neither list are labeled 'Other'
df_wdi['Group'] = df_wdi['Country Code'].apply(lambda x: 'Emerging' if x in emerging else 'Developed' if x in developed else 'Other')

# Show the period averages rounded to two decimals
results.round(2)

This gives the average annual growth rate (%) of each country in each period:

Country Code  Pre_Crisis_Growth (2004-2008)  Recuperation_Growth (2009-2013)  Stability_Growth (2014-2018)  Recent_Growth (2019-2024)
USA                                    1.49                             0.40                          1.79                       0.98
LUX                                    2.48                            -0.85                          0.28                      -0.41
CHE                                    2.40                             0.02                          1.02                       0.27
CHL                                    4.46                             3.07                          0.87                      -0.68
POL                                    5.18                             2.69                          4.59                       2.78
ETH                                    6.61                             7.30                          6.38                       3.77
VNM                                    5.52                             3.63                          5.61                       2.61
IRL                                    0.01                            -0.78                          5.07                       1.63
GBR                                    1.27                            -0.33                          1.61                       1.88
KOR                                    4.22                             2.80                          2.08                       1.33
NGA                                    3.53                             3.54                         -0.41                       0.13
BGD                                    5.07                             3.86                          5.80                       4.18
PAK                                    2.07                             0.81                          3.59                       0.89
EGY                                    2.96                             0.58                          2.10                       2.51
PHL                                    3.66                             2.38                          3.25                       4.14
MAR                                    3.52                             2.09                          1.91                       0.07
IDN                                    2.90                             4.39                          3.91                       1.60
IND                                    5.26                             5.20                          5.02                       4.24
PER                                    4.90                             3.39                          1.45                       0.69
MOZ                                    5.28                             3.95                          2.14                      -0.46
DEU                                    1.33                             1.02                          1.43                       0.12
ESP                                    1.25                            -2.00                          2.68                       2.61
JPN                                    0.83                             0.46                          1.09                       0.48
CAN                                    1.39                             0.17                          0.62                      -0.12
AUS                                    1.41                             0.88                          0.99                       0.76
NOR                                    1.62                            -0.53                          0.77                       0.72
NLD                                    1.78                            -0.74                          1.73                       0.46
DNK                                    1.52                            -0.57                          1.64                       1.97
SWE                                    2.20                             0.21                          1.37                       0.91
FIN                                    2.99                            -1.46                          1.13                      -0.16
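
The period averages in the table rely on `.loc[:, '2004':'2008']` slicing by label, which works because the cleaned year columns are strings in chronological order, and label slices include both end years. A compact alternative, sketched below with hypothetical country codes and synthetic numbers, keeps the period boundaries in one dictionary and assigns groups with `Series.map`; codes missing from both lists become "Other", matching the `apply` call above.

```python
import pandas as pd

# Synthetic annual growth rates (%) for two hypothetical countries, 2004-2013
years = [str(year) for year in range(2004, 2014)]
values = [[1.0] * 5 + [3.0] * 5,   # AAA: 1% a year, then 3% a year
          [2.0] * 5 + [0.0] * 5]   # BBB: 2% a year, then 0% a year
df = pd.DataFrame(values, columns=years)
df.insert(0, "Country Code", ["AAA", "BBB"])

# Period column name -> (first year, last year); .loc label slices include both ends
periods = {
    "Pre_Crisis_Growth (2004-2008)": ("2004", "2008"),
    "Recuperation_Growth (2009-2013)": ("2009", "2013"),
}
for name, (start, end) in periods.items():
    df[name] = df.loc[:, start:end].mean(axis=1)

# Hypothetical group lists; codes found in neither list become "Other"
emerging = ["AAA"]
developed = ["CCC"]
group_of = {**{code: "Emerging" for code in emerging},
            **{code: "Developed" for code in developed}}
df["Group"] = df["Country Code"].map(group_of).fillna("Other")
print(df[["Country Code", "Group", *periods]])
```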

Function definition with linear regression, groups, R² and β-Coefficient

# Plot each group's regression line with its β coefficient and R²
def plot_beta_convergence(data, x_col, y_col, x_label, y_label, title, groups=('Emerging', 'Developed')):
    plt.figure(figsize=(10, 6))
    colors = {'Emerging': 'blue', 'Developed': 'orange'}
    for group in groups:
        # Countries in this group that have both growth rates
        subset = data[data['Group'] == group].dropna(subset=[x_col, y_col])
        X = subset[[x_col]]  # scikit-learn expects a 2-D feature matrix
        y = subset[y_col]
        # OLS fit of y_col on x_col; the slope coef_[0] is β
        model = LinearRegression().fit(X, y)
        y_pred = model.predict(X)
        r2 = r2_score(y, y_pred)
        plt.scatter(subset[x_col], y, label=f'{group} data', color=colors[group], alpha=0.5)
        plt.plot(subset[x_col], y_pred, linestyle='-', color=colors[group],
                 label=f'{group} (β = {model.coef_[0]:.2f}, R² = {r2:.2f})')

    plt.title(title, fontsize=14, pad=15)
    plt.xlabel(x_label, fontsize=12)
    plt.ylabel(y_label, fontsize=12)
    plt.legend()
    plt.grid(alpha=0.4)
    plt.tight_layout()
    plt.show()
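
The plotting function computes β and R² but only shows them in the legend. To get the numbers as a table, for example to quote them in the conclusions, a hypothetical helper `beta_table` can run the same per-group regression without plotting. This is a sketch on synthetic data; it uses scikit-learn's `LinearRegression` as the post does, and `model.score` returns the same R² as `r2_score(y, model.predict(X))`.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression


def beta_table(data, x_col, y_col, groups=("Emerging", "Developed")):
    """Hypothetical helper: per-group OLS of y_col on x_col, returned as a DataFrame."""
    rows = []
    for group in groups:
        subset = data[data["Group"] == group].dropna(subset=[x_col, y_col])
        X, y = subset[[x_col]], subset[y_col]
        model = LinearRegression().fit(X, y)
        rows.append({
            "Group": group,
            "n": len(subset),
            "beta": model.coef_[0],
            "intercept": model.intercept_,
            "R2": model.score(X, y),
        })
    return pd.DataFrame(rows).set_index("Group")


# Synthetic example: Emerging follows y = 1 + 0.5x, Developed follows y = 2 - 0.25x exactly
demo = pd.DataFrame({
    "Group": ["Emerging"] * 3 + ["Developed"] * 3,
    "x": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "y": [1.5, 2.0, 2.5, 1.75, 1.5, 1.25],
})
print(beta_table(demo, "x", "y").round(2))
```

Called on `df_wdi` with two of the period columns, it should report the same β and R² values that appear in the corresponding plot legend.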

Data visualization

To start, we apply the previously defined function to compare growth rates in the Pre-Crisis (2004–2008) and Recent (2019–2024) periods for Emerging and Developed economies. This first plot gives an initial view of whether growth converges or diverges within each group.

# First plot
plot_beta_convergence(
    df_wdi, 
    'Pre_Crisis_Growth (2004-2008)', 'Recent_Growth (2019-2024)',
    'Pre-Crisis Growth (%)', 'Recent Growth (%)',
    'β-Convergence: Pre-Crisis (2004-2008) vs. Recent (2019-2024)'
)
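
With only 14 emerging and 16 developed countries, each β is estimated from a small sample, so it is worth checking how precisely it is pinned down before reading much into its sign. A minimal sketch using the textbook OLS formulas on synthetic numbers: the standard error of the slope is sqrt(SS_res / (n − 2) / Σ(x − x̄)²), and β divided by it gives a t-statistic. For samples of this size, |t| below roughly 2.1 to 2.2 means the slope is not significant at the 5% level.

```python
import numpy as np


def slope_t_stat(x, y):
    """OLS slope, its standard error, and t-statistic for y = alpha + beta * x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    beta, alpha = np.polyfit(x, y, deg=1)
    residuals = y - (alpha + beta * x)
    sigma2 = np.sum(residuals ** 2) / (n - 2)  # residual variance
    se_beta = np.sqrt(sigma2 / np.sum((x - x.mean()) ** 2))
    return beta, se_beta, beta / se_beta


# Synthetic growth rates (%) for eight hypothetical countries
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [1.0, 0.6, 1.4, 1.1, 1.9, 1.3, 2.2, 1.7]
beta, se, t = slope_t_stat(x, y)
print(f"beta = {beta:.2f}, SE = {se:.2f}, t = {t:.2f}")
```

Applying `slope_t_stat` to each group's pair of period columns would show whether the β values in the conclusions are distinguishable from zero; the plots themselves report only β and R².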

Next, we apply the function to examine β-convergence by plotting a linear regression of growth rates between the Recuperation (2009–2013) and Stability (2014–2018) periods.

# Second plot
plot_beta_convergence(
    df_wdi, 
    'Recuperation_Growth (2009-2013)', 'Stability_Growth (2014-2018)',
    'Recuperation Growth (%)', 'Stability Growth (%)',
    'β-Convergence: Post-Crisis (2009-2013) vs. Stability (2014-2018)'
)

Finally, we create another plot with regression lines to evaluate β-convergence from the Stability (2014–2018) to the Recent (2019–2024) period. This comparison is the most interesting one: it covers the most recent periods in this project, and it is the only one in which the β coefficients of both groups are positive.

# Third plot
plot_beta_convergence(
    df_wdi, 
    'Stability_Growth (2014-2018)', 'Recent_Growth (2019-2024)',
    'Stability Growth (%)', 'Recent Growth (%)',
    'β-Convergence: Stability (2014-2018) vs. Recent (2019-2024)'
)
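
Since the three plots differ only in the pair of period columns, the comparisons can also be collected into a single table of β coefficients. The sketch below is hypothetical (synthetic data, made-up period names `P1` to `P3`, an illustrative `pairs` list) and uses `numpy.polyfit` for each group's slope, which for a single regressor matches `LinearRegression().coef_[0]`.

```python
import numpy as np
import pandas as pd

# Synthetic period averages (%) for six hypothetical countries
df = pd.DataFrame({
    "Group": ["Emerging"] * 3 + ["Developed"] * 3,
    "P1": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "P2": [2.0, 3.0, 4.0, 3.0, 2.0, 1.0],
    "P3": [4.0, 3.0, 2.0, 1.0, 1.5, 2.0],
})

# Hypothetical (earlier period, later period) comparisons, one per plot
pairs = [("P1", "P2"), ("P2", "P3"), ("P1", "P3")]


def group_slopes(data, x_col, y_col):
    """OLS slope of y_col on x_col within each group (Series indexed by group)."""
    return data.groupby("Group")[[x_col, y_col]].apply(
        lambda g: np.polyfit(g[x_col], g[y_col], deg=1)[0]
    )


# Rows: period comparisons; columns: groups
betas = pd.DataFrame({f"{x} -> {y}": group_slopes(df, x, y) for x, y in pairs}).T
print(betas.round(2))
```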

Conclusions

Emerging economies show clear and growing divergence: in all three comparisons, the β coefficient for emerging countries is positive (β = 0.47, 0.54, 0.69). This indicates there is no convergence; on the contrary, emerging economies that grew faster in one period tended to also grow faster in the next one.
Developed economies have changed from very weak convergence to moderate divergence: in the first two graphs (the pre-crisis and post-crisis comparisons), the β for developed countries (orange line) is slightly negative (β = -0.24 and -0.13). This points to slight convergence: those growing faster tended to slow down a bit.
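
To make these slopes concrete: along a fitted line, two countries of the same group whose earlier-period growth differs by Δ percentage points are predicted to differ by β × Δ points in the later period. The short calculation below uses the β values quoted above and an illustrative Δ of 2 percentage points, which is an assumption, not a figure from the data.

```python
# β values quoted in the conclusions above
emerging_betas = [0.47, 0.54, 0.69]
developed_betas = [-0.24, -0.13]  # first two comparisons only
# Illustrative assumption: earlier-period growth differs by 2 percentage points
gap_earlier = 2.0

predicted_gaps = []
for beta in emerging_betas + developed_betas:
    gap_later = beta * gap_earlier  # predicted difference in later-period growth (pp)
    predicted_gaps.append(gap_later)
    print(f"beta = {beta:+.2f}: predicted later gap = {gap_later:+.2f} pp")
```

A positive result means the country that grew faster earlier is also predicted to grow faster later; a negative one means the ordering is predicted to reverse.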
