## Microfinance Growth Across Regions

#### Motivation

Microfinance began as a service to lift people in rural areas of developing countries out of poverty, but in recent years it has attracted private investors for reasons beyond social benefit. First and foremost, microfinance investments target developing countries, many of which are among the fastest-growing economies, and may yield higher returns than traditional investments in developed countries. Moreover, microfinance investments are considered a way to diversify a portfolio and reduce risk because microfinance markets are less correlated with major stock and commodity markets. In this project, we use a linear mixed model to study the regional impact on microfinance growth since the 2008 financial crisis. Countries in different regions have different macroeconomic conditions and policies that affect microfinance; we wish to identify the region with the best environment for microfinance to thrive.

#### Exploratory Data Analysis

Our main interest is growth, and the best variable for measuring it is gross loan portfolio (in USD). Since the distribution of gross loan portfolio is highly right-skewed (like income) and gross loan portfolio grows multiplicatively (like compound interest), it is appropriate to use log gross loan portfolio. As we can see, log gross loan portfolio is approximately normally distributed and seems to follow a linear trend over time.
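To illustrate why the log transform turns compound growth into a straight line, here is a minimal sketch. The starting portfolio and the 15% growth rate are made-up values for illustration, not figures from the MIX data.

```python
import math

def log_growth_path(initial, annual_rate, years):
    """Return the log of a portfolio that compounds at a fixed annual rate."""
    return [math.log(initial * (1 + annual_rate) ** t) for t in range(years + 1)]

# Hypothetical MFI: USD 1 million portfolio growing 15% per year for 6 years.
log_path = log_growth_path(1_000_000, 0.15, 6)

# Year-over-year differences of the log series are constant and equal to
# log(1.15), so the log series is a straight line in time.
steps = [b - a for a, b in zip(log_path, log_path[1:])]
print([round(s, 4) for s in steps])
```

A constant step on the log scale is exactly what a linear time trend in the model describes.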

The variable of interest is region, which is categorized into: Africa, East Asia and the Pacific, Eastern Europe and Central Asia, Latin America and the Caribbean, Middle East and North Africa, and South Asia. The most represented countries in these regions are, respectively, Senegal, Cambodia, Azerbaijan, Ecuador, Morocco, and India. Annual box plots and the time series from 2008 to 2014 are shown below. We can observe subtle differences in growth patterns across regions.

The dataset comes from the Microfinance Information Exchange (MIX), a non-profit organization based in Washington, DC that has the most extensive data on microfinance institutions (MFIs). The dataset includes 2804 MFIs from 1997 to 2015 with 80 variables that measure their financial and operational structures. However, not all 2804 MFIs were present during the entire time period; some were founded after 1997 and some failed before 2015. To reduce missing data and extract the most relevant information, we restrict the time frame to 2008 through 2014 (data from 2015 are still being collected). In addition, we remove MFIs that are not audited to avoid unreliable data.

#### Mixed Model

In the usual regression setting, we assume the observations (data points) are independent. In this case, the independence assumption is clearly violated since we have longitudinal data (data collected from the same MFIs over time). If we don’t address the correlation between observations, our estimates will be inefficient and their standard errors misleading. Therefore, we introduce random effects (individual effects) in our model. In plain English, we want to get at the common effects of the variables after taking individual differences into account.

The linear mixed model, in which the response $$MFI_i(t)$$ is the log gross loan portfolio of MFI $$i$$ in year $$t$$, is given by

$$MFI_i(t)=b_{i0}+\beta_0+\beta_1\cdot I_{Profit}+\beta_2\cdot I_{Medium}+\beta_3\cdot I_{Small}+\beta_{4}\cdot OSS_{it}$$ $$+(b_{i1}+\beta_{5}\cdot I_{Africa}+...+\beta_{10}\cdot I_{South Asia}$$ $$+\beta_{11}\cdot I_{Profit}+\beta_{12}\cdot I_{Medium}+\beta_{13}\cdot I_{Small})\cdot t+\epsilon_{it}$$ where $$b_{i0}\sim N(0,\sigma_{b0}^2)$$ $$b_{i1}\sim N(0,\sigma_{b1}^2)$$ $$\epsilon_{it}\sim N(0,\sigma^2).$$
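To make the roles of the random intercept and random slope concrete, the sketch below simulates data from a reduced version of the model that keeps only a region-specific time slope. All parameter values, the region names, and the function names are illustrative assumptions, not fitted estimates or the code used for this project.

```python
import random

def simulate_mfi(region_slope, n_years=7, beta0=14.0,
                 sd_b0=1.0, sd_b1=0.05, sd_eps=0.1, rng=random):
    """Simulate log gross loan portfolio for one MFI under a reduced mixed model.

    log_glp(t) = (beta0 + b_i0) + (region_slope + b_i1) * t + eps_it
    All parameter values are illustrative assumptions.
    """
    b_i0 = rng.gauss(0.0, sd_b0)  # random intercept: MFI-specific starting level
    b_i1 = rng.gauss(0.0, sd_b1)  # random slope: MFI-specific deviation in growth
    return [(beta0 + b_i0) + (region_slope + b_i1) * t + rng.gauss(0.0, sd_eps)
            for t in range(n_years)]

rng = random.Random(0)
# Hypothetical region-level slopes on the log scale.
region_slopes = {"Region A": 0.18, "Region B": 0.10}
panel = {region: [simulate_mfi(slope, rng=rng) for _ in range(200)]
         for region, slope in region_slopes.items()}

def mean_growth(series_list):
    """Average per-year change in log portfolio across MFIs."""
    changes = [(s[-1] - s[0]) / (len(s) - 1) for s in series_list]
    return sum(changes) / len(changes)

for region, series_list in panel.items():
    print(region, round(mean_growth(series_list), 3))
```

Each simulated MFI has its own level and its own growth rate, but the average growth within a region stays close to that region's fixed slope, which is the quantity the fixed effects are meant to capture.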

We also implement an equivalent Bayesian hierarchical model with a log-normal distribution and proper non-informative priors for comparison.

The adjustment variables are profit status (for-profit vs. non-profit), scale (small, medium, large), and operational self-sufficiency. Rural banks and other non-banking financial institutions have funding sources and investment strategies distinct from those of non-profit organizations. Organizations of different scales have different channels to raise capital and distribute loans. Operational self-sufficiency (OSS), calculated as financial revenue / (financial expense + impairment loss + operating expense), is an overall measure of financial health.
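The OSS formula can be written as a small helper. This is a minimal sketch: the function name and the dollar figures are hypothetical, and only the ratio itself comes from the definition above.

```python
def operational_self_sufficiency(financial_revenue, financial_expense,
                                 impairment_loss, operating_expense):
    """OSS = financial revenue / (financial expense + impairment loss + operating expense)."""
    total_cost = financial_expense + impairment_loss + operating_expense
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return financial_revenue / total_cost

# Hypothetical figures in USD.
oss = operational_self_sufficiency(1_200_000, 300_000, 100_000, 600_000)
print(round(oss, 2))  # 1.2
```

A value above 1 means financial revenue covers the three cost components; a value below 1 means it does not.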

#### A Sidenote on Variable Selection

Given a dataset with 80 variables, one may be tempted to throw in all of them. However, this approach has a few serious problems.

First, many of the variables are taken from financial statements and have a direct accounting relationship with gross loan portfolio. To see why we shouldn’t include them, let’s do a thought experiment. Suppose you want to test whether obesity (measured by BMI) differs between New York and Los Angeles. You would build a regression model with indicator variables for New York and Los Angeles and maybe other variables such as height or race. However, once you include weight in your model alongside height, BMI is almost fully determined, so the other variables are almost guaranteed to show no effect.

Secondly, what we really want to adjust for are confounding variables. Mathematically speaking, in regression analysis, we model the conditional distribution of the response given the predictor of interest. Confounding variables are defined to be correlated with both the response and the predictor of interest. Many variables can affect gross loan portfolio, but do they also differ across regions? Probably not.

Lastly, many of the variables are correlated. Correlated predictors produce unstable estimates and inflate the variance, which means wider confidence intervals. In a nutshell, the calculation behind linear regression is inverting a matrix. When the variables are highly correlated, the matrix is close to singular, so the numerical computation is unstable. Also, the inverted matrix that makes up the estimated variance-covariance matrix will have large eigenvalues.
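The effect of correlation on the inverted matrix can be seen in the simplest case of two standardized predictors with correlation r, where the matrix to invert is proportional to [[1, r], [r, 1]]. The sketch below is a toy illustration under that assumption, not a diagnostic run on the MIX variables.

```python
def inverse_correlation_summary(r):
    """For two standardized predictors with correlation r, summarize the
    inverse of the 2x2 correlation matrix [[1, r], [r, 1]]."""
    det = 1.0 - r * r                    # determinant; near 0 means near-singular
    vif = 1.0 / det                      # diagonal of the inverse (variance inflation factor)
    largest_eig = 1.0 / (1.0 - abs(r))   # largest eigenvalue of the inverse
    return det, vif, largest_eig

for r in (0.0, 0.5, 0.9, 0.99):
    det, vif, eig = inverse_correlation_summary(r)
    print(f"r={r:<4}  det={det:.4f}  VIF={vif:.1f}  largest eigenvalue={eig:.1f}")
```

As r approaches 1, the determinant shrinks toward zero while the variance inflation factor and the largest eigenvalue of the inverse grow without bound.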

Though our model may seem simple, it fits the data sufficiently well without losing interpretability and generalizability.

#### Results

We examine the random effects with likelihood ratio tests and check model assumptions with residual diagnostic plots. The random effects are statistically significant, and the linearity and Gaussianity assumptions appear to be met. However, there appears to be strong residual autocorrelation that does not follow an AR(1) pattern, and imposing a more complicated covariance structure is not suitable for such short time series. For more accurate inference, we use the Huber-White robust variance estimator.
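The idea behind the Huber-White (sandwich) estimator can be sketched in a much simpler setting than our mixed model: an ordinary least squares regression of log portfolio on year, with residuals allowed to be correlated within each MFI. The function name and the data below are made up for illustration, and no small-sample correction is applied; this is not the code used for the reported results.

```python
def cluster_robust_slope_se(x, y, cluster):
    """OLS slope of y on x (with intercept) and its cluster-robust
    (Huber-White sandwich) standard error, without small-sample correction."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # "Meat" of the sandwich: sum the score contributions within each cluster
    # before squaring, so any within-cluster correlation pattern is allowed.
    score_by_cluster = {}
    for xi, ei, g in zip(x, resid, cluster):
        score_by_cluster[g] = score_by_cluster.get(g, 0.0) + (xi - x_bar) * ei
    meat = sum(s ** 2 for s in score_by_cluster.values())
    return slope, (meat / sxx ** 2) ** 0.5

# Hypothetical panel: 3 MFIs observed over 4 years (values are made up).
years = [0, 1, 2, 3] * 3
mfi_id = [1] * 4 + [2] * 4 + [3] * 4
log_glp = [13.0, 13.2, 13.3, 13.6, 14.1, 14.2, 14.5, 14.6, 12.5, 12.8, 12.9, 13.1]
slope, se = cluster_robust_slope_se(years, log_glp, mfi_id)
print(round(slope, 3), round(se, 4))
```

The key design choice is that residuals are summed within each MFI before squaring, so no specific autocorrelation structure such as AR(1) has to be assumed.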

Below are simultaneous 95% confidence intervals for the exponentiated coefficient estimates, which translate to median annual gross loan portfolio growth factors for large non-profit MFIs with median self-sufficiency in each region:

Africa (1.162, 1.173)

East Asia and the Pacific (1.185, 1.202)

Eastern Europe and Central Asia (1.129, 1.135)

Latin America and the Caribbean (1.065, 1.103)

Middle East and North Africa (1.048, 1.092)

South Asia (1.098, 1.102).
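Because the intervals are annual growth factors, they can be compounded to see what they imply over the whole study period. The sketch below copies the interval endpoints listed above and raises them to the sixth power (six annual steps from 2008 to 2014). Compounding the endpoints is a rough illustration under the assumption of a constant annual factor, not a separately estimated interval.

```python
import math

# 95% intervals for the annual growth factor, copied from the list above.
annual_ci = {
    "Africa": (1.162, 1.173),
    "East Asia and the Pacific": (1.185, 1.202),
    "Eastern Europe and Central Asia": (1.129, 1.135),
    "Latin America and the Caribbean": (1.065, 1.103),
    "Middle East and North Africa": (1.048, 1.092),
    "South Asia": (1.098, 1.102),
}

N_YEARS = 6  # 2008 to 2014 spans six annual growth steps

def compound(factor, years=N_YEARS):
    """Cumulative growth factor after compounding an annual factor."""
    return factor ** years

for region, (lo, hi) in annual_ci.items():
    # log(factor) recovers the corresponding time slope on the log scale.
    print(f"{region}: log-slope {math.log(lo):.3f} to {math.log(hi):.3f}, "
          f"{N_YEARS}-year factor {compound(lo):.2f} to {compound(hi):.2f}")
```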

Our results are both statistically and practically significant. The fastest-growing region, East Asia and the Pacific, saw a median annual growth rate of nearly 20%, which compounds over the six years from 2008 to 2014 to a portfolio nearly 300% of its 2008 size (roughly a tripling). In nearby South Asia, the median annual growth rate was almost 10 percentage points lower, and the portfolio reached merely 180% of its 2008 size over the same period. We believe a more in-depth study would be hugely beneficial for investors, policy makers, and, most importantly, the people in need of microfinance.