All measurements of poverty, inequality and so forth rely in some way on estimating a representation of the income (or consumption) distribution. The most commonly used representation is the PDF, so in this post I review common approaches to density estimation. The excellent handbook chapter of Cowell and Flachaire (2015) covers this topic in some detail (see also Lubrano 2013). My take here is instead brief, informal, and based on personal experience.

**1- Nonparametric Estimation**: Under this approach, the density is estimated using a nonparametric method, e.g. kernel smoothing.
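As a minimal illustration of the idea, here is a kernel density sketch in Python using scipy; the lognormal income data are simulated purely for illustration:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
incomes = rng.lognormal(mean=10.0, sigma=0.7, size=5_000)  # simulated incomes

# Gaussian kernel density estimate (bandwidth via Scott's rule by default)
kde = gaussian_kde(incomes)

grid = np.linspace(incomes.min(), incomes.max(), 500)
density = kde(grid)  # estimated PDF evaluated on the grid
```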

*Advantages:*

- No prior assumption about the functional form of the density is required
- The model is univariate and the number of observations is usually reasonably large, so there is no curse of dimensionality

*Disadvantages:*

- In poverty and inequality analysis we are often interested in the tails, both lower and upper, but these are rarely estimated precisely by the standard kernel method, which performs poorly near boundaries (in this case zero) and under heavy tails (often the case with income distributions).

There are ways to mitigate these problems, for example by combining a kernel estimate in the middle with Pareto distributions in the tails (see e.g. Cowell and Victoria-Feser 2008). MATLAB's “*paretotails*” function does exactly this: a kernel (or empirical CDF) estimate for the middle part and Pareto distributions for the tails. These more sophisticated methods are, however, rarely used in practice, and as a result some prefer even a histogram to a kernel estimate. Another thing I would like to know more about is how the kernel estimate compares with the empirical CDF.
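Scipy has no direct counterpart to MATLAB's *paretotails*, but a rough sketch of the same idea (kernel for the body, generalized Pareto for the upper tail) might look like the following; the threshold, the simulated data, and the choice to model only the upper tail are all my own arbitrary assumptions, and the two pieces are not carefully renormalized at the join:

```python
import numpy as np
from scipy.stats import gaussian_kde, genpareto

rng = np.random.default_rng(1)
incomes = rng.lognormal(10.0, 0.7, size=5_000)  # simulated incomes

u = np.quantile(incomes, 0.90)                  # upper-tail threshold (tuning choice)
body, tail = incomes[incomes <= u], incomes[incomes > u]

kde = gaussian_kde(body)                        # kernel estimate for the body
c, loc, scale = genpareto.fit(tail - u, floc=0.0)  # GPD fitted to exceedances over u

p_tail = (incomes > u).mean()                   # weight attached to the tail piece

def density(x):
    """Piecewise density: kernel below u, Pareto-type tail above u (rough sketch)."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= u,
                    (1 - p_tail) * kde(x),
                    p_tail * genpareto.pdf(x - u, c, loc=0.0, scale=scale))
```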

**2- Parametric Estimation**: Under this approach, a parametric functional form is specified, and its parameters are often estimated by maximum likelihood.

*Advantages:*

- The main advantage is parsimony, i.e. most features of a distribution can be represented with a few parameters.
- It can be used with a small number of observations or with grouped data

*Disadvantages:*

- Inflexibility: most of the commonly used parametric densities are unimodal, but there are important situations in which the density is clearly multimodal.

There is a variety of parametric functional forms for densities to choose from (see e.g. Kleiber and Kotz 2003 and references cited therein). Some of the more prominent ones are: *2-parameter* (Lognormal, Gamma); *3-parameter* (Generalised gamma, Singh-Maddala, Dagum, B2, Pareto-lognormal); and *4-parameter* (GB2, Double-Pareto, GB1). There is ample evidence that the 2-parameter densities are severely inadequate for modelling income densities. Among the 3-parameter ones, Singh-Maddala is probably the best known, but in my experience Dagum (see also Kleiber, and here) and Pareto-lognormal often provide better fits. Among the *4-parameter* densities, I have only had experience with GB2 and Double-Pareto. GB2 seems to perform slightly better, and we did not find much evidence for Pareto laws in the lower tail (but see Reed and Wu 2008 for opposite claims).
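For illustration, a sketch of how one might compare some of these families by maximum likelihood in Python: scipy's `burr` is Burr Type III (the Dagum distribution) and `burr12` is Burr Type XII (Singh-Maddala); GB2 and Pareto-lognormal are not in scipy and are omitted. The simulated data and the AIC comparison are assumptions for the sake of the example, and the numerical fits of the Burr families can be fragile:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
incomes = rng.lognormal(10.0, 0.7, size=5_000)  # simulated incomes

candidates = {
    "Lognormal (2p)": stats.lognorm,
    "Dagum / Burr III (3p)": stats.burr,
    "Singh-Maddala / Burr XII (3p)": stats.burr12,
}

for name, dist in candidates.items():
    params = dist.fit(incomes, floc=0.0)          # ML fit, location pinned at zero
    loglik = dist.logpdf(incomes, *params).sum()  # maximized log-likelihood
    k = len(params) - 1                           # free parameters (loc is fixed)
    print(f"{name:30s} loglik={loglik:12.1f}  AIC={2 * k - 2 * loglik:12.1f}")
```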

**3- Mixtures:** Another flexible way of modelling densities is to use mixtures, e.g. a mixture of Lognormals or a mixture of Gammas (see e.g. here, here, here or here)

*Advantages:*

- It is flexible while still tractable, with good tail properties.
- Using mixtures, it might be possible to identify subgroups in the population
- By increasing the number of components in the mixture, one can approximate any distribution arbitrarily well.

*Disadvantages:*

- The main issue in using mixtures is the difficult, non-standard inference that arises when the number of fitted components exceeds the true one, or when the weight of one component is very small.

Based on my personal experience with grouped data, the mixture of Lognormals performs well, and better than a mixture of Gammas.
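A mixture of Lognormals is just a Gaussian mixture on the log scale, so a quick sketch with scikit-learn's `GaussianMixture` could look like this (the two-group simulated data and the number of components are assumptions for illustration):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
# Simulated two-group income data (arbitrary parameters, illustration only)
incomes = np.concatenate([rng.lognormal(9.5, 0.5, 3_000),
                          rng.lognormal(11.0, 0.4, 2_000)])

log_y = np.log(incomes).reshape(-1, 1)

# A k-component mixture of Lognormals = k-component Gaussian mixture on log(y)
gmm = GaussianMixture(n_components=2, random_state=0).fit(log_y)

weights = gmm.weights_                        # mixture weights
mus = gmm.means_.ravel()                      # Lognormal mu parameters
sigmas = np.sqrt(gmm.covariances_.ravel())    # Lognormal sigma parameters

def density(x):
    """Implied density on the income scale (change of variables: divide by x)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    z = (np.log(x)[:, None] - mus) / sigmas
    normals = np.exp(-0.5 * z**2) / (sigmas * np.sqrt(2 * np.pi))
    return (normals * weights).sum(axis=1) / x
```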

So, which of these models should be preferred? I think there is no simple answer; it all depends on the situation.

Posted by Sriram on October 15, 2015 at 12:48 pm

Another approach would be to estimate the density using maximum entropy estimators. See, for example, Wu and Perloff: http://www.mitpressjournals.org/doi/abs/10.1162/003465305775098206

Posted by Reza on October 15, 2015 at 1:45 pm

I didn’t mean to list all the possible techniques, just the common ones. Regarding “maximum entropy”, I think there are two separate things: 1- Use the maximum entropy principle to propose a functional form for an income density and then estimate the density with a conventional method. I might be wrong, but I think this is what Wu and Perloff do in that paper. 2- Estimate the model itself using a maximum entropy approach. In the latter case, I think “empirical likelihood” is more practical with individual data. Empirical likelihood is nonparametric and worth studying further. I have seen a couple of works in income-distribution contexts but don’t know enough to comment further.

Posted by Sriram on October 16, 2015 at 6:02 am

Is there a regression-based method (like the one used to compute standard errors for the Gini) to calculate the standard error of the Theil index? I know that one could use bootstrap methods to calculate the standard errors. Alternatively, one can fit a parametric density and then compute the standard error for the Theil index.

Posted by Reza on October 17, 2015 at 4:07 am

You don’t have to use regression-based standard errors for either Gini or Theil. There are simpler formulas for both of them [with individual data]. See page 94 of Cowell and Flachaire (2015) linked above.
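For what it's worth, here is a small delta-method sketch of a plug-in standard error for the Theil T index, writing T = m2/m1 - ln(m1) with m1 = mean(y) and m2 = mean(y ln y). This is only my own illustration of the idea, not the exact expressions on that page:

```python
import numpy as np

def theil_and_se(y):
    """Theil T index and a delta-method standard error (illustrative sketch).

    Uses T = m2/m1 - ln(m1), with m1 = mean(y) and m2 = mean(y * ln y).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    m1, m2 = y.mean(), (y * np.log(y)).mean()
    T = m2 / m1 - np.log(m1)

    # Gradient of T w.r.t. (m1, m2), and the sample covariance of (y, y*ln y)
    grad = np.array([-m2 / m1**2 - 1.0 / m1, 1.0 / m1])
    cov = np.cov(np.vstack([y, y * np.log(y)]))
    se = np.sqrt(grad @ cov @ grad / n)
    return T, se
```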