1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132
| % GINI computes the Gini coefficient and the Lorentz curve.
%
% Usage:
% g = gini(pop,val)
% [g,l] = gini(pop,val)
% [g,l,a] = gini(pop,val)
% ... = gini(pop,val,makeplot)
%
% Input and Output:
% pop A vector of population sizes of the different classes.
% val A vector of the measurement variable (e.g. income per capita)
% in the diffrerent classes.
% g Gini coefficient.
% l Lorentz curve: This is a two-column array, with the left
% column representing cumulative population shares of the
% different classes, sorted according to val, and the right
% column representing the cumulative value share that belongs to
% the population up to the given class. The Lorentz curve is a
% scatter plot of the left vs the right column.
% a Same as l, except that the components are not normalized to
% range in the unit interval. Thus, the left column of a is the
% absolute cumulative population sizes of the classes, and the
% right colun is the absolute cumulative value of all classes up
% to the given one.
% makeplot is a boolean, indicating whether a figure of the Lorentz
% curve should be produced or not. Default is false.
%
% Example:
% x = rand(100,1);
% y = rand(100,1);
% gini(x,y,true); % random populations with random incomes
% figure;
% gini(x,ones(100,1),true); % perfect equality
%
% Explanation:
%
% The vectors pop and val must be equally long and must contain only
% positive values (zeros are also acceptable). A typical application
% would be that pop represents population sizes of some subgroups (e.g.
% different countries or states), and val represents the income per
% capita in this different subgroups. The Gini coefficient is a measure
% of how unequally income is distributed between these classes. A
% coefficient of zero means that all subgroups have exactly the same
% income per capital, so there is no dispesion of income; A very large
% coefficient would result if all the income accrues only to one subgroup
% and all the remaining groups have zero income. In the limit, when the
% total population size approaches infinity, but all the income accrues
% only to one individual, the Gini coefficient approaches unity.
%
% The Lorenz curve is a graphical representation of the distribution. If
% (x,y) is a point on the Lorenz curve, then the poorest x-share of the
% population has the y-share of total income. By definition, (0,0) and
% (1,1) are points on the Lorentz curve (the poorest 0% have 0% of total
% income, and the poorest 100% [ie, everyone] have 100% of total income).
% Equal distribution implies that the Lorentz curve is the 45 degree
% line. Any inequality manifests itself as deviation of the Lorentz curve
% from the 45 degree line. By construction, the Lorenz curve is weakly
% convex and increasing.
%
% The two concepts are related as follows: The Gini coefficient is twice
% the area between the 45 degree line and the Lorentz curve.
%
% Author : Yvan Lengwiler
% Release: $1.0$
% Date : $2010-06-27$
function [g,l,a] = gini(pop,val,makeplot)
% check arguments
assert(nargin >= 2, 'gini expects at least two arguments.')
if nargin < 3
makeplot = false;
end
assert(numel(pop) == numel(val), ...
'gini expects two equally long vectors (%d ~= %d).', ...
size(pop,1),size(val,1))
pop = [0;pop(:)]; val = [0;val(:)]; % pre-append a zero
isok = all(~isnan([pop,val]'))'; % filter out NaNs
if sum(isok) < 2
warning('gini:lacking_data','not enough data');
g = NaN; l = NaN(1,4);
return;
end
pop = pop(isok); val = val(isok);
assert(all(pop>=0) && all(val>=0), ...
'gini expects nonnegative vectors (neg elements in pop = %d, in val = %d).', ...
sum(pop<0),sum(val<0))
% process input
z = val .* pop;
[~,ord] = sort(val);
pop = pop(ord); z = z(ord);
pop = cumsum(pop); z = cumsum(z);
relpop = pop/pop(end); relz = z/z(end);
% Gini coefficient
% We compute the area below the Lorentz curve. We do this by
% computing the average of the left and right Riemann-like sums.
% (I say Riemann-'like' because we evaluate not on a uniform grid, but
% on the points given by the pop data).
%
% These are the two Rieman-like sums:
% leftsum = sum(relz(1:end-1) .* diff(relpop));
% rightsum = sum(relz(2:end) .* diff(relpop));
% The Gini coefficient is one minus twice the average of leftsum and
% rightsum. We can put all of this into one line.
g = 1 - sum((relz(1:end-1)+relz(2:end)) .* diff(relpop));
% Lorentz curve
l = [relpop,relz];
a = [pop,z];
if makeplot % ... plot it?
area(relpop,relz,'FaceColor',[0.5,0.5,1.0]); % the Lorentz curve
hold on
plot([0,1],[0,1],'--k'); % 45 degree line
axis tight % ranges of abscissa and ordinate are by definition exactly [0,1]
axis square % both axes should be equally long
set(gca,'XTick',get(gca,'YTick')) % ensure equal ticking
set(gca,'Layer','top'); % grid above the shaded area
grid on;
title(['\bfGini coefficient = ',num2str(g)]);
xlabel('share of population');
ylabel('share of value');
end
end |
Partager