I have a dataset. Say $10$ observations and $3$ variables:

obs  A   B   C
1    0   0   1
2    0   1   0
3    1   0   1
4    1   1   0
5    1   0   1
6    1   0   0
7    1   1   0
8    0   0   1
9    0   1   1
10   0   1   1

Say these are $10$ customers who have bought (1) or not bought (0) in each of the categories A, B, and C. There are $16$ ones in the table, so these $10$ customers buy $1.6$ products on average.

If I look at only those who buy A, there are $5$ customers who have bought $9$ products, so that's $1.8$ on average.

B is $9/5$ again, or $1.8$.

C is $10/6 \approx 1.67$.

All of them are above $1.6$,

which seems strange. I understand it, but I need to explain this to marketing next week, so I need help!

What is this thing called?

I know it's not Simpson's paradox. To me it feels similar in logic to the Monty Hall problem and conditional probability.
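To make the example easy to poke at, here's a small Python sketch (pandas assumed; it just re-enters the table above) that reproduces these averages:

```python
import pandas as pd

# the 10 x 3 table from the question
df = pd.DataFrame(
    {"A": [0, 0, 1, 1, 1, 1, 1, 0, 0, 0],
     "B": [0, 1, 0, 1, 0, 0, 1, 0, 1, 1],
     "C": [1, 0, 1, 0, 1, 0, 0, 1, 1, 1]},
    index=range(1, 11))

df["total"] = df[["A", "B", "C"]].sum(axis=1)  # products bought per customer

print(df["total"].mean())                      # 1.6  (overall average)
for cat in ["A", "B", "C"]:
    # average products bought among customers who bought in this category
    print(cat, df.loc[df[cat] == 1, "total"].mean())  # A: 1.8, B: 1.8, C: 1.67
```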

Personally, I have no idea what you're talking about. Why not create a contingency table of the As, Bs and Cs to examine the cross-purchase patterns? – DJohnson 22 hours ago
We have reports that say "Customers who buy C are worth more than average: 1.67 vs 1.6." That is true, but A and B are worth more than average too, so the inevitable question will arise: "How can all customers be worth more than average?" – James Adams 22 hours ago
I think his puzzle is that it superficially looks like Lake Wobegon where everyone is above average :P Let $X$ be the number of categories/items a customer purchased. Let $A$, $B$, and $C$ be indicators for purchasing in category A, B, and C respectively. $\operatorname{E}[X\mid A] = 1.8$, $\operatorname{E}[X\mid B] = 1.8$, and $\operatorname{E}[X\mid C] = 1.67$ while $\operatorname{E}[X] = 1.6$. – Matthew Gunn 21 hours ago
You might want to think in terms of complementary sets and Venn diagrams. The sets "customers who buy A" and "customers who do not buy A" are non-overlapping. But the sets you list in your question overlap. You can compute the overall average as a (weighted) average of subset averages only if the subsets form a partition. – GeoMatt22 21 hours ago
Is this loosely similar to the majority-illusion paradox? In the same way that any individual is likely to be connected to a super networker, any purchase category is likely to contain a super purchaser? (I'm calling a super networker someone who connects with many people and a super purchaser someone who purchases many different items) – Matthew Gunn 21 hours ago

Simple example to gain intuition:

  • Let $A$ be an indicator whether an individual purchased an item in category A.
  • Let $B$ be an indicator whether an individual purchased an item in category B.
  • Let $X = A + B$ be the number of items purchased.

$$\begin{array}{ccc}
\text{Person} & A & B \\
i & 1 & 0 \\
ii & 0 & 1 \\
iii & 1 & 1
\end{array}$$

The set of individuals where $A$ is true overlaps the set of individuals where $B$ is true. They are NOT disjoint sets.

Then $\operatorname{E}[X] \approx 1.33$ while $\operatorname{E}[X \mid A] = 1.5$ and $\operatorname{E}[X \mid B] = 1.5$.

The statement that would be true is:

$$ P(A)\operatorname{E}[X\mid A] + P(B)\operatorname{E}[X\mid B] - P(AB)\operatorname{E}[X\mid AB] = \operatorname{E}[X]$$

$$ \frac{2}{3}\cdot 1.5 + \frac{2}{3}\cdot 1.5 - \frac{1}{3}\cdot 2 = \frac{4}{3} \approx 1.333$$

You can't simply compute $P(A)\operatorname{E}[X\mid A] + P(B)\operatorname{E}[X\mid B]$ because sets $A$ and $B$ overlap: that expression double-counts the person who purchases both items A and B!
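A few lines of plain Python check this identity on the three-person example (nothing here beyond the little table above):

```python
# persons i, ii, iii with their (A, B) purchases
people = [(1, 0), (0, 1), (1, 1)]
X = [a + b for a, b in people]      # items purchased by each person
n = len(people)

E_X = sum(X) / n                    # 4/3 ~ 1.333

def cond_mean(select):
    # mean of X over the people for whom `select` holds
    vals = [x for (a, b), x in zip(people, X) if select(a, b)]
    return sum(vals) / len(vals)

P_A, P_B, P_AB = 2 / n, 2 / n, 1 / n
lhs = (P_A * cond_mean(lambda a, b: a == 1)
       + P_B * cond_mean(lambda a, b: b == 1)
       - P_AB * cond_mean(lambda a, b: a == 1 and b == 1))

print(lhs, E_X)   # both 1.333..., so the corrected (inclusion-exclusion) sum matches E[X]
```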

Name for illusion/paradox?

I'd argue it's related to the majority illusion paradox in social networks.

You may have a single dude who networks with/friends everyone. That person may be one out of a million overall, but he'll be one of each person's $k$ friends.

Similarly, you have 1 out of 3 here purchasing both categories A and B. But within either category A or B, 1 out of the 2 purchasers is the super purchaser.

Extreme case:

Let's create $n$ sets of lotto tickets. Every set $S_i$ includes two tickets: a losing ticket $i$ and the jackpot winning ticket.

The average winnings in every set $S_i$ are then $\frac{J}{2}$, where $J$ is the jackpot. Every per-set average is WAY above the overall average winnings per ticket, $\frac{J}{n+1}$.

It's the same conceptual dynamic as the sales case. Every set $S_i$ includes the jackpot ticket in the same way that every category A, B, or C includes the heavy purchasers.
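A quick numeric version of the same point (the values of $n$ and the jackpot below are arbitrary, picked only for the sketch):

```python
n, J = 10, 1_000_000

tickets = [0] * n + [J]                        # n losing tickets plus the single jackpot ticket
overall_avg = sum(tickets) / len(tickets)      # J / (n + 1), about 90,909 here

# every set S_i pairs losing ticket i with the jackpot ticket
per_set_avg = [(0 + J) / 2 for _ in range(n)]  # each equals J / 2 = 500,000

print(overall_avg, per_set_avg[0])             # every per-set average dwarfs the overall average
```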

My bottom-line point would be that intuition based upon disjoint sets (a full partition of the sample space) does not carry over to a series of overlapping sets. If you condition on overlapping categories, every category can be above average.

If you partition the sample space and condition on disjoint sets, then categories have to average out to the overall mean, but that's not true for overlapping sets.

Thanks! I think the double counting is the key to explaining. I don't think this is necessarily the result of a few extreme values though. My example dataset above is fairly mundane and the "all groups above average" effect still happens. My guess is it will happen in most cases. Just wondered if it had a name or a previous example. – James Adams 20 hours ago
This explanation would not hold if the data @JamesAdams is analyzing is flawed. I am contending that it is. You can't have a mutually exclusive and complete set of A, B and C categories where the group averages are all higher than the average of all 3 taken together without there being a violation of some fundamental assumption of data analysis. In your case, it's most likely that the denominator for the overall average differs (e.g., contains more respondents) from the ones used for the estimation of the means for A, B and C. – DJohnson 18 hours ago
@DJohnson Of course you're right if sets A, B, and C partition the sample space. My reading of the question and the supplied "data" (whatever it is) is that A, B, and C are overlapping sets. If A, B, and C overlap, then the group averages can all be higher than the overall average (which is the point of my answer; the sets overlap on the biggest customers!). Nothing the OP has said is internally inconsistent. Your "we're getting passed BS data" detector might be better than mine though, and I agree it's always important to ask critical questions about the validity of the data/numbers. – Matthew Gunn 18 hours ago
Yes, they are overlapping sets. My dataset is millions of customers and 12 categories. When I saw my averages were all higher than the overall average I thought it looked odd but explainable. I put together the example set of 10 obs and 3 categories to see it. I just scattered 1s and 0s here and it came out the same. I suspect this happens with most datasets where this type of average is calculated. @DJohnson my example above uses 10 as the denominator for the overall average, 5 for the As, 5 for the Bs, and 6 for the Cs. Can you tell me what I am violating in this example? – James Adams 15 hours ago
What does '10' represent? The net of respondents across the 3 categories? What happens to the averages if you use the same denominator for all? It should return averages that fluctuate around the grand mean. – DJohnson 13 hours ago

I would call this the family-size paradox or something similar.

Suppose, for a simple example, everybody had one partner and a Poisson-distributed number of children with parameter $2$:

  • The average number of children per person would be $2$
  • The average number of children per person with children would be $\frac{2}{1-e^{-2}} \approx 2.313$
  • The average sibling group size for each individual (counting their brothers and sisters and themselves) would be $3$
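A simulation sketch of those three numbers (numpy assumed; the exact values will wobble a little around $2$, $2.313$, and $3$):

```python
import numpy as np

rng = np.random.default_rng(0)
kids = rng.poisson(2, size=1_000_000)   # children per family (per parent is the same, since couples share children)

print(kids.mean())                      # ~ 2      children per family
print(kids[kids > 0].mean())            # ~ 2.313  children per family that has children

# a randomly chosen child sits in a family of size k with probability proportional to k,
# so the child's-eye average sibling-group size is E[X^2] / E[X] ~ 3
print((kids * kids).sum() / kids.sum())
```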

Real demographic and survey data produce different numbers but similar patterns.

The apparent paradox is that the average size of individuals' sibling groups is larger than the average number of children per family; with stable population dynamics, people tend to have fewer children on average than their parents did.

The explanation lies in whether the average is being taken over parents and families or over siblings: different weightings are being applied to large families. In your example there is a difference between weighting by individuals or by purchases; your conditional averages are pushed up by the fact that you condition on a particular purchase being made.
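For the Poisson($2$) case, that re-weighting is exactly the standard size-biasing identity, which is where the $3$ in the list above comes from:

$$\operatorname{E}[\text{sibling group size}] = \frac{\operatorname{E}[X^2]}{\operatorname{E}[X]} = \frac{\operatorname{Var}(X) + \operatorname{E}[X]^2}{\operatorname{E}[X]} = \frac{2 + 2^2}{2} = 3.$$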


The other answers are overthinking what's going on. Suppose there is one product and two customers. One bought the product (once) and one didn't. The average number of products bought is 0.5, but if you look only at the customer who bought the product, the average rises to 1.

This doesn't seem like a paradox or counterintuitive to me; conditioning on buying a product will generally raise the average number of products bought.


Is this not merely the "average of averages" confusion (e.g. a previous StackExchange question) in disguise? You seem to expect that the subsample averages should average out to the population average, but this will rarely happen.

In the classical "average of averages", someone finds the average of N mutually exclusive subsets, and then is flabbergasted that these values do not average to the population average. The only way this average of averages works out is if your non-overlapping subsets have the same size. Otherwise, you need to take a weighted average.

Your problem is made more complex than this traditional average of averages confusion by having overlapping subsets, but it appears to me to just be this classic mistake with a twist. With overlapping subsets, it is even harder to end up with subsample averages that average to the population average.

In your example, users who appear in multiple subsamples (and therefore have bought many things) increase these averages. Basically you're counting each big spender multiple times, while the frugal people who only buy one item are encountered only once, so you're biased toward larger values. This is why your particular subsets have above-average values, but I think this is still just the "average of averages" problem.

You can also construct all kinds of other subsets from your data where the subsample averages take on different values. For example, let's take subsets somewhat similar to your subsets. If you take the subset of people who did not buy A, you get 7/5=1.4 items on average. With the subset that did not buy B, you also get 1.4 items on average. Those who did not buy C, bought 1.5 items on average. These are all below the population average of 1.6 items/customer. Given the right dataset and the right collection of subsets, you could end up with overlapping subsets whose averages average to the population average; however, this would be uncommon in normal applications.
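To see both effects side by side on your data, here is a short sketch (pandas assumed) that computes each buyer average, each non-buyer average, and the size-weighted average of each buyer/non-buyer pair, which does come back to 1.6 because each pair is a genuine partition:

```python
import pandas as pd

# the 10 x 3 table from the question
df = pd.DataFrame(
    {"A": [0, 0, 1, 1, 1, 1, 1, 0, 0, 0],
     "B": [0, 1, 0, 1, 0, 0, 1, 0, 1, 1],
     "C": [1, 0, 1, 0, 1, 0, 0, 1, 1, 1]})
df["total"] = df[["A", "B", "C"]].sum(axis=1)

print(df["total"].mean())                              # 1.6 overall

for cat in ["A", "B", "C"]:
    buyers = df.loc[df[cat] == 1, "total"].mean()      # 1.8, 1.8, 1.67
    non_buyers = df.loc[df[cat] == 0, "total"].mean()  # 1.4, 1.4, 1.5
    p = df[cat].mean()                                 # share of customers buying this category
    # buyers and non-buyers of one category DO partition the customers,
    # so their size-weighted average recovers the overall mean exactly
    print(cat, buyers, non_buyers, p * buyers + (1 - p) * non_buyers)  # last value is always 1.6
```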

Is it just me, or does the word average now seem weird after so many repetitions... Hope my answer was helpful, and sorry if I ruined the word average for you!

Thanks! The comment about non-overlapping same-size partitions clarified it in my mind. I was hoping when I come to present these figures I could say something like "All the category averages are higher than the overall average, but that's the Blahblah paradox". Like when you say "Simpson's paradox! Ivy League sexism!" and then run out of the room. (You all do that sometimes, don't you?) Would love to say to them "It's because these are overlapping subsets of different sizes" but I don't think that will land! – James Adams 4 hours ago
