Simple example to gain intuition:
- Let $A$ be an indicator of whether an individual purchased an item in category A.
- Let $B$ be an indicator of whether an individual purchased an item in category B.
- Let $X = A + B$ be the number of items purchased (one per category).
$$
\begin{array}{ccc}
\text{Person} & A & B \\
i & 1 & 0 \\
ii & 0 & 1 \\
iii & 1 & 1
\end{array}
$$
The set of individuals where $A$ is true overlaps the set of individuals where $B$ is true. They are NOT disjoint sets.
Then $\operatorname{E}[X] = \frac{4}{3} \approx 1.33$, while $\operatorname{E}[X \mid A] = 1.5$ and $\operatorname{E}[X \mid B] = 1.5$. Both conditional means are above the overall mean.
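A minimal Python sketch that recomputes these three means directly from the table, assuming each row is one equally likely person. The names `people`, `mean`, and `e_x_given_a` are just illustrative; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Purchase table: (A, B) indicators for persons i, ii, iii
people = {"i": (1, 0), "ii": (0, 1), "iii": (1, 1)}

def mean(values):
    """Exact arithmetic mean of a collection of integers."""
    values = list(values)
    return Fraction(sum(values), len(values))

x = {p: a + b for p, (a, b) in people.items()}  # X = A + B for each person

e_x = mean(x.values())                                               # E[X]
e_x_given_a = mean(x[p] for p, (a, _) in people.items() if a == 1)  # E[X | A]
e_x_given_b = mean(x[p] for p, (_, b) in people.items() if b == 1)  # E[X | B]

print(e_x, e_x_given_a, e_x_given_b)  # 4/3 3/2 3/2
```

Person iii shows up in both conditional averages, which is what pulls each of them above $\frac{4}{3}$.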
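Because every person is in $A$ or $B$ (so $A \cup B$ covers everyone), the statement that would be true is the inclusion-exclusion version:
$$ P(A)\operatorname{E}[X\mid A] + P(B)\operatorname{E}[X\mid B] - P(AB)\operatorname{E}[X\mid AB] = \operatorname{E}[X]$$
$$ \frac{2}{3} \cdot 1.5 + \frac{2}{3} \cdot 1.5 - \frac{1}{3} \cdot 2 = \frac{4}{3} \approx 1.33$$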
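One way to see why this identity holds, written with indicator functions: $P(C)\operatorname{E}[X\mid C] = \operatorname{E}[X 1_C]$ for any event $C$, and $1_A + 1_B - 1_{AB} = 1_{A\cup B}$ for every person. In this example everyone bought something, so $1_{A\cup B} = 1$:

$$
P(A)\operatorname{E}[X\mid A] + P(B)\operatorname{E}[X\mid B] - P(AB)\operatorname{E}[X\mid AB]
= \operatorname{E}\left[X\left(1_A + 1_B - 1_{AB}\right)\right]
= \operatorname{E}\left[X 1_{A\cup B}\right]
= \operatorname{E}[X]
$$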
You can't simply compute $P(A)\operatorname{E}[X\mid A] + P(B)\operatorname{E}[X\mid B]$ (which equals $2$ here) because sets $A$ and $B$ overlap: the expression double counts the person who purchased in both categories A and B!
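A small Python sketch contrasting the naive sum with the inclusion-exclusion correction, again assuming the three equally likely people from the table. The helper `weighted_term` is hypothetical; it uses $P(C)\operatorname{E}[X\mid C] = \operatorname{E}[X 1_C]$, i.e. the sum of $X$ over the people in $C$, divided by the total number of people.

```python
from fractions import Fraction

# Same three people: (A, B) indicators
people = [(1, 0), (0, 1), (1, 1)]
n = len(people)

def weighted_term(condition):
    """Return P(C) * E[X | C], computed as E[X * 1_C], for the event C."""
    return Fraction(sum(a + b for a, b in people if condition(a, b)), n)

term_a = weighted_term(lambda a, b: a == 1)
term_b = weighted_term(lambda a, b: b == 1)
term_ab = weighted_term(lambda a, b: a == 1 and b == 1)

naive = term_a + term_b        # person iii is counted in both terms
corrected = naive - term_ab    # inclusion-exclusion removes the double count
overall = Fraction(sum(a + b for a, b in people), n)

print(naive, corrected, overall)  # 2 4/3 4/3
```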
Name for illusion/paradox?
I'd argue it's related to the majority illusion paradox in social networks.
You may have a single dude who is friends with everyone. That person may be one out of a million overall, but he'll be one of every person's $k$ friends.
Similarly, 1 out of 3 people here purchases from both categories A and B. But within either category A or B, 1 out of the 2 purchasers is that super purchaser.
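To make the network analogy concrete, here is a toy Python sketch on a hypothetical star network (one hub friends with everyone, everyone else friends only with the hub). The network shape, the size `n = 1000`, and the helper `star_network` are assumptions for illustration, not data.

```python
from fractions import Fraction

def star_network(n):
    """Hypothetical star network: person 0 (the hub) is friends with
    everyone; persons 1..n-1 are friends only with the hub."""
    friends = {0: set(range(1, n))}
    for person in range(1, n):
        friends[person] = {0}
    return friends

n = 1000
friends = star_network(n)

# The hub is 1 person out of n ...
hub_share_of_population = Fraction(1, n)
# ... but appears in the friend list of every other person
hub_share_of_friend_lists = Fraction(sum(0 in f for f in friends.values()), n)

# Average number of friends per person
avg_degree = Fraction(sum(len(f) for f in friends.values()), n)
# Per person, the average friend count of their friends, then averaged over people
mean_friend_degree = sum(
    Fraction(sum(len(friends[g]) for g in f), len(f)) for f in friends.values()
) / n

print(hub_share_of_population, hub_share_of_friend_lists)
print(float(avg_degree), float(mean_friend_degree))
```

Viewed from inside almost any friend list, the hub dominates, just as person iii dominates both category A and category B.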
Extreme case:
Let's create $n$ sets of lotto tickets. Every set $S_i$ includes two tickets: a losing ticket $i$ and the jackpot winning ticket.
The average winnings in every set $S_i$ are then $\frac{J}{2}$, where $J$ is the jackpot. The average of each set is WAY above the overall average winnings per ticket, $\frac{J}{n+1}$ (there are $n$ losing tickets plus the one jackpot ticket).
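A quick Python check of the lotto arithmetic, assuming losing tickets pay $0$. The function `lotto_averages` and the values `n=100`, `jackpot=1_000_000` are arbitrary choices for illustration.

```python
from fractions import Fraction

def lotto_averages(n, jackpot):
    """n sets; set S_i = {losing ticket i, the shared jackpot ticket}."""
    tickets = {i: 0 for i in range(1, n + 1)}    # losing tickets pay 0
    tickets["jackpot"] = jackpot                  # one winning ticket, shared by all sets
    sets = [(i, "jackpot") for i in range(1, n + 1)]

    set_means = [Fraction(sum(tickets[t] for t in s), len(s)) for s in sets]
    overall_mean = Fraction(sum(tickets.values()), len(tickets))  # n + 1 tickets
    return set_means, overall_mean

set_means, overall = lotto_averages(n=100, jackpot=1_000_000)
print(set_means[0], overall)  # 500000 1000000/101
```

Every one of the 100 set averages equals $\frac{J}{2}$, while the per-ticket average is $\frac{J}{101}$.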
It's the same conceptual dynamic as the sales case. Every set $S_i$ includes the jackpot ticket in the same way that every category A, B, or C includes the heavy purchasers.
My bottom line is that intuition based on disjoint sets (a full partition of the sample space) does not carry over to overlapping sets. If you condition on overlapping categories, every category can be above average.
If you partition the sample space and condition on disjoint sets, then the category means, weighted by the category probabilities, must average out to the overall mean (the law of total expectation), so not every category can be above average. That's not true for overlapping sets.
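For contrast, a Python sketch that partitions the same three people into disjoint categories ("A only", "B only", "A and B"); this particular partition is an illustrative choice. The weighted category means recover $\operatorname{E}[X]$ exactly, and at least one category mean lands below it.

```python
from fractions import Fraction

people = [(1, 0), (0, 1), (1, 1)]  # (A, B) for persons i, ii, iii
n = len(people)
overall = Fraction(sum(a + b for a, b in people), n)  # E[X]

# Disjoint categories that together cover every person (a partition)
partition = {
    "A only": lambda a, b: a == 1 and b == 0,
    "B only": lambda a, b: a == 0 and b == 1,
    "A and B": lambda a, b: a == 1 and b == 1,
}

probs, means = {}, {}
for name, cond in partition.items():
    xs = [a + b for a, b in people if cond(a, b)]
    probs[name] = Fraction(len(xs), n)        # P(C)
    means[name] = Fraction(sum(xs), len(xs))  # E[X | C]

# Law of total expectation: probability-weighted category means give E[X]
total = sum(probs[c] * means[c] for c in partition)

print(total, overall)       # 4/3 4/3
print(min(means.values()))  # 1
```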