I'd like to expand a little bit on the idea of order dependence vs. independence.
In the problem of calculating the expected number of heads from flipping 8 coins, we're summing the values from 8 identical distributions, each of which is the Bernoulli distribution [; B(1, 0.5) ;] (in other words, a 50% chance of 0, a 50% chance of 1). The distribution of the sum is the binomial distribution [; B(8, 0.5) ;], which has the familiar hump shape with most of the probability centered around 4.
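If you want to see that shape concretely, here's a quick brute-force Python sketch (my own code, nothing fancy) that enumerates all [; 2^8 ;] equally likely outcomes and tallies the number of heads in each:

```python
# Enumerate all 2^8 equally likely coin-flip outcomes and tally
# how many heads each one contains.
from collections import Counter
from itertools import product

counts = Counter(sum(flips) for flips in product([0, 1], repeat=8))
total = 2 ** 8

for heads in range(9):
    print(f"P({heads} heads) = {counts[heads]}/{total} = {counts[heads] / total:.4f}")

# The mean works out to 4, with the probability humped around it.
print(sum(h * c for h, c in counts.items()) / total)  # 4.0
```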
In the problem of calculating the expected value of a byte made of 8 random bits, each bit contributes a different value to the byte, so we're summing the values from 8 different distributions. The first is [; B(1, 0.5) ;], the second is [; 2 B(1, 0.5) ;], the third is [; 4 B(1, 0.5) ;], and so on up to the eighth, which is [; 128 B(1, 0.5) ;]. The distribution of this sum is understandably quite different from the first one.
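Again, this is easy to check by brute force: enumerate every combination of 8 bits, weight bit [; i ;] by [; 2^i ;], and look at the distribution of the sums. Here's a sketch of mine, same idea as above:

```python
# Enumerate all 2^8 equally likely bit patterns, weighting bit i by 2^i,
# and tally the resulting byte values.
from collections import Counter
from itertools import product

counts = Counter(
    sum(bit << i for i, bit in enumerate(bits))
    for bits in product([0, 1], repeat=8)
)

print(len(counts))            # 256 distinct byte values
print(set(counts.values()))   # {1}: each value occurs exactly once, so uniform
print(sum(v * c for v, c in counts.items()) / 2 ** 8)  # mean is 127.5
```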
If you wanted to prove that this latter distribution is uniform, I think you could do it inductively: the distribution of the lowest bit is uniform on [; \{0, 1\} ;] by assumption, so you would want to show that if the distribution of the lowest [; n ;] bits is uniform on [; \{0, \dots, 2^n - 1\} ;], then adding the [; n+1 ;]st bit makes the distribution of the lowest [; n + 1 ;] bits uniform on [; \{0, \dots, 2^{n+1} - 1\} ;], which proves the claim for all positive [; n ;]. But the intuitive way is probably the exact opposite. If you start at the high bit and choose values one at a time down to the low bit, each bit divides the space of possible outcomes exactly in half, and each half is chosen with equal probability, so by the time you get to the bottom, each individual value must have had the same probability of being chosen.
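For what it's worth, here's how I'd write out that induction step, using [; S_n ;] (my notation) for the value of the lowest [; n ;] bits and [; b_n ;] for the next bit up:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Induction step: assume S_n is uniform on {0, ..., 2^n - 1} and b_n is
% an independent fair bit, so that S_{n+1} = S_n + 2^n b_n.
% For any k in {0, ..., 2^{n+1} - 1}:
\begin{align*}
P(S_{n+1} = k)
  &= \tfrac{1}{2}\,P(S_n = k) + \tfrac{1}{2}\,P\!\left(S_n = k - 2^n\right) \\
  &= \tfrac{1}{2} \cdot \tfrac{1}{2^n} = \tfrac{1}{2^{n+1}},
\end{align*}
% because exactly one of k and k - 2^n lands in {0, ..., 2^n - 1},
% so exactly one of the two terms is nonzero.
\end{document}
```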