Grouped data are not uncommon in income and wealth distribution studies. Inference about distributions with grouped data is one of my research areas (joint with several colleagues). However, I am occasionally asked, in seminars or privately, whether in the age of “big data” there is any point in working with grouped data. I think grouped data are still interesting and, at present, probably unavoidable in global income distribution studies, for the following reasons:

  • When we are dealing with large-scale poverty and inequality analyses (i.e. almost all countries over several time periods), the data sources available through the World Bank or the World Institute for Development Economics Research (WIDER) are all in grouped form. As far as I know, there is no similar source that provides individual data at such a scale.
  • There are still statistically interesting questions to be studied with grouped data, some of which we are going to discuss in this blog.
  • In my experience, inferential results based on grouped data, at least under a parametric framework, are not that different from inferential results based on individual data. This, of course, may not be true if the data have substantial irregularities and nonparametric inference is used.

  1. Posted by Sriram on September 22, 2015 at 11:49 pm

    – Other than inequality measurement, are there any other applications of grouped data in economics?

    – Are there any other applications in economics where we model proportions that add to 1?


  2. Posted by Reza on September 23, 2015 at 2:37 pm

    The methodology can be used to estimate any distribution from grouped data, and I guess interest in distributions is not confined to income. Right now I can’t cite specific interesting applications for which we only have grouped data. This is of interest to me, but I haven’t yet seriously looked for them. Distributions of health-related variables, wealth, firm size and the like could be potential areas to look at in economics. I have also seen studies in biology that try to estimate the distribution of a particular species in a region from some sort of grouped data. There might be applications in astronomy as well.


