Reference Points and forms of Average

There are several different ways of taking an average. When trying to understand suspicious or confusing statistics, or the points of view of those who differ from you politically, it’s useful to inventory the different meanings of “average” that get tossed around in conversation. An “average” of some sort is often used as a reference point in debate - either explicitly, or implicitly through terms like “most”, “usually”, or “from my experience”. Different sorts of “average” have different sampling errors.

The “Mean” is the formal name for the usual concept of an “average” as taught in mathematics classes: Add up all the values, then divide by the number of values. With continuous data, you integrate over the domain, then divide by the size of the domain. This form of average is in play when GDP per capita is used as a metric of national wealth. It ignores the shape of the distribution, and using it as a reference point implicitly assumes that everyone (or everything) measured is reasonably close to some central value.
The “Median” is calculated by sorting the values and picking the one in the middle (or, with an even number of values, the mean of the two in the middle). It’s useful when you want to talk about a typical person’s experience. Median income is usually lower than mean income because a large share of society’s wealth and income tends to be concentrated in relatively few hands. The median tends to be less affected by outliers than the mean.
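As a rough illustration of how far the mean and median can drift apart, here is a minimal Python sketch. The income figures are made up for the example and are not drawn from any real data set.

```python
from statistics import mean, median

# Hypothetical annual incomes in dollars (invented numbers, for illustration only).
incomes = [18_000, 24_000, 31_000, 38_000, 45_000, 52_000, 1_000_000]

mean_income = mean(incomes)      # add everything up, divide by the count
median_income = median(incomes)  # sort, then take the middle value

print(f"mean:   {mean_income:,.0f}")
print(f"median: {median_income:,.0f}")

# Drop the single very high earner and recompute both averages.
without_outlier = incomes[:-1]
print(f"mean without outlier:   {mean(without_outlier):,.0f}")
print(f"median without outlier: {median(without_outlier):,.0f}")
```

In this toy sample, removing one person moves the mean by a large amount while the median barely changes, which is the sense in which the median is less affected by outliers.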
The “Mode” is the most common value - even if many other values are almost as frequent. For continuous variables (like income or age), it’s tricky: the data must be binned before use, and depending on the binning, there may be no two individuals with the same value, or the “most common value” may not be shared by a significant portion of the population. With respect to U.S. race, Whites are more common than other racial groups, so talking about the “mode” of a population might be analogous to describing “typical for Whites” as if it were synonymous with “typical for US residents”. Similarly, men are born slightly more often than women, and men make up a slight majority of the global population, despite dying younger - so the mode of global gender is “male”, and describing the population by its mode neglects women despite their being nearly 50% of it. For individual income, a “mode” might well be zero dollars per year - since most children have no meaningful personal income. The mode doesn’t get much explicit reference in academic work - likely because statisticians recognize its tendency to ignore large but relevant portions of the data. If someone uses phrases like “the most common situation” or “most often”, they may be appealing to the mode and neglecting a wide variety of samples that collectively outnumber the single “most common” type, or favoring a very narrow majority at the expense of a significant minority.
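The binning problem can be sketched in a few lines of Python. The ages below are invented, and `binned_mode` is a hypothetical helper written for this example, not a standard library function.

```python
from collections import Counter

# Hypothetical ages in years (invented numbers, for illustration only).
ages = [18, 19, 21, 22, 23, 24, 25, 36, 37, 39, 39, 58]

def binned_mode(values, bin_width):
    """Return (bin_start, count) for the most populated bin of the given width."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return bins.most_common(1)[0]

# Mode of the raw values: the single most repeated exact age.
raw_value, raw_count = Counter(ages).most_common(1)[0]
print(f"raw mode: {raw_value} (shared by {raw_count} of {len(ages)})")

# Mode after grouping ages into decades.
decade_start, decade_count = binned_mode(ages, 10)
print(f"decade mode: {decade_start}-{decade_start + 9} "
      f"(shared by {decade_count} of {len(ages)})")
```

With these numbers the raw mode is an age held by only two people, while the decade-binned mode lands on a different age range entirely, and even that bin holds fewer than half of the sample.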
When thinking about potential loss or gain, or historical loss or gain, people often compare where they are now to where they personally have been before, where they expect to be, or where they imagine their parents or children would be. Mathematically, this is averaging over time rather than across a population, and it is appropriate when trying to forecast the future state of the individual. When an empowered majority decries the relative losses they’ve faced as society becomes more diverse, they’re comparing their current and past states - making averages and forecasts in time. This can be upsetting, even if they still have more wealth, privilege and power than those from different backgrounds: They are comparing their personal current state to their personal past state, and that shows a loss of relative power as power has been democratized across a broader portion of the population.
When you compare yourself to those you know personally, to those whom you follow on social media, or to specific politicians or news organizations, you’re comparing yourself to a Dunbar-windowed average: You’re tending to assume “everyone you know” is a useful proxy for “everyone”. This can be mutually reinforcing with racial segregation.
It is often useful to divide a surveyed population into subgroups, and to discuss the average characteristics of each subgroup. This “windowing” can introduce bias or errors because random outliers have a larger effect in a small sample. The details of the windowing or subdivision tend to be important too: Are people being assigned to demographic groups based on self-identification, by nationality, or by genetic tests? If the subgroups were based on survey data, how were the surveys conducted and how were the survey results extrapolated from the raw data to the population at large?
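To see why outliers matter more in a small subgroup, consider this minimal Python sketch. The function `shift_from_outlier` and its default dollar amounts are assumptions made up for the example: it builds a group of otherwise identical values and measures how far a single outlier moves the mean.

```python
from statistics import mean

def shift_from_outlier(group_size, typical=40_000, outlier=1_000_000):
    """How far one outlier moves the mean of a group of otherwise identical values."""
    base = [typical] * group_size
    # Replace one typical member with the outlier, keeping the group size fixed.
    with_outlier = base[:-1] + [outlier]
    return mean(with_outlier) - mean(base)

for n in (10, 100, 10_000):
    print(f"group of {n:>6}: mean shifts by {shift_from_outlier(n):,.0f}")
```

The shift shrinks in proportion to the group size, so the same unusual respondent who barely registers in a national average can dominate the average of a small subgroup.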

When debating policy, when consuming statistics, or when trying to understand others’ points of view, pay attention to their averages and to the resulting reference points. Which of these averages matters to a person varies by broad political persuasion, and you may be able to get more traction in a debate by emphasizing averages and reference points that are meaningful to your audience.

A blog on US politics, Math, and Physics… with occasional bits of gaming

Jul 15 Reference Points and forms of Average
