I am generating 8 random bits (either a 0 or a 1) and concatenating them together to form an 8-bit number. A simple Python simulation yields a uniform distribution on the discrete set [0, 255].

I am trying to justify why this makes sense in my head. If I compare this to flipping 8 coins, wouldn't the expected value be somewhere around 4 heads/4 tails? So to me, it makes sense that my results should reflect a spike in the middle of the range. In other words, why does a sequence of 8 zeroes or 8 ones seem to be equally as likely as a sequence of 4 and 4, or 5 and 3, etc.? What am I missing here?

share|improve this question
11  
The expected value of the distribution of bits in a uniform random the range [0,255] is also somewhere around 4 1's/4 0's. – immibis yesterday
1  
Just because you give an equal weight to each number 0 through 255, doesn't mean the function result "difference between count of 1s and 0s" will also occur once and only once. I could give an equal weight to every person in my organization. Doesn't mean the their ages would be equally weighted. Some ages might be much more common than others. But one person is not more common than any other person. – Brad Thomas 22 hours ago
1  
Think of it this way... Your first random bit will determine the value of bit 7, a 1 is worth 128 and a 0 worth 0. Out of 256 numbers you have a 50% chance of the number being 0-127 if the bit is 0 and 128-255 if the bit is 1. Let's say it's 0, then the next bit determines if the result will be 0-63 or 64-127. All 8 bits are required to form one of 256 equally likely outcomes. You're thinking of adding totals like you would with dice. The odds of getting 4 1s and 4 0s is higher than getting 8 1s, but there are more ways they can be arranged to give you a different result. – Jason Goemaat 8 hours ago

10 Answers 10

up vote 42 down vote accepted

In the coin toss example, you're ignoring the order of the outcomes. HHHHTTTT is treated as the same as TTTTHHHH (both have 4 heads and 4 tails). But in bits, you care about the order (because you have to give "weights" to the bit positions in order to get 256 outcomes), so 11110000 is different from 00001111.

Note that you can directly map the coin flipping model to the outcome of the bytes if you record the order of the coin flips, so that TTTTHHHH is treated as distinct from HHHHTTTT.

share|improve this answer
4  
@Carl I think that you and I have different understandings of what the question is about. – Sycorax yesterday
15  
@Carl I really don't understand your complaint about Sycorax's answer. It seems spot on to me, dealing correctly with the essential misunderstanding of the OP. A fair coin flip (with faces labelled 0/1 in place of H/T) is a perfectly reasonable model for a random bit. The only remaining issue is how those eight 0's / 1's are to be interpreted (the OP implicitly jumped from the idea of flipping coins to adding the number of 1's as if they were all worth the same), but in interpreting the 8 bits as a byte - as an 8 digit binary number - we assign a different value to each one; based on position. – Glen_b yesterday
10  
@Carl Let me be more direct, so you don't mistake my meaning then. I think your answer doesn't really respond to the main issue, I think it's confusing (in places I barely have any idea what you're trying to say) and not nearly as directly relevant as this one (now I write it down, clearly I should have downvoted your answer). I think your response to whoever did downvote should have been to improve your answer, not to complain about what looks to me to be easily the best answer here. Try to understand why Sycorax's answer is a good one, rather than to maintain that what you wrote is better. – Glen_b yesterday
2  
@Carl while the stackoverflow FAQ's make it clear that a downvote is for fairly extreme cases - "Use your downvotes whenever you encounter an egregiously sloppy, no-effort-expended post, or an answer that is clearly and perhaps dangerously incorrect." - people that haven't carefully read every bit of help content on the site (i.e. most people) will downvote any answer that doesn't seem helpful, or doesn't seem to address the question being asked (such as yours). – Walt yesterday
3  
@Carl complaining via the comments on the answer of someone else about the downvote on your answer doesn't seem like a good idea. It comes across as petty and you're less likely to get constructive criticism where you want it. – iheanyi 17 hours ago

why does a sequence of 8 zeroes or 8 ones seem to be equally as likely as a sequence of 4 and 4, or 5 and 3, etc

The aparent paradox can be summarized in two propositions:

  1. The sequence $s_1: 00000000$ (eight zeroes) is as probable as sequence $s_2: 01010101$ (four zeroes, four ones). (In general: all sequences are equally probable, regardless of how many zeroes/ones they have.)

  2. The event "E1: the sequence had four zeroes" is more probable than the event "E2: the sequence had eight zeroes".

These propositions are not contradictory; indeed, they are both true. Because the event E1 includes more sequences.

share|improve this answer

Sycorax's answer is correct, but it seems like you're not entirely clear on why. When you flip 8 coins or generate 8 random bits taking order into account, your result will be one of 256 equally likely possibilities. In your case, each of these 256 possible outcomes uniquely map to an integer, so you get a uniform distribution as your result.

If you don't take order into account, such as considering just how many heads or tails you got, there are only 9 possible outcomes (0 Heads/8 Tails - 8 Heads/0 Tails), and they're no longer equally likely. The reason for this is because out of the 256 possible results, there are 1 combination of flips that gives you 8 Heads/0 Tails (HHHHHHHH) and 8 combinations that give 7 Heads/1 Tails (a Tails in each of the 8 positions in the order), but 8C4 = 70 ways to have 4 Heads and 4 Tails. In the coin flipping case each of those 70 combinations maps to 4 Heads/4 Tails, but in the binary number problem each of those 70 outcomes maps to a unique integer.

share|improve this answer

All of the $2^8$ sequences have the same probability 1/$2^8$=1/256. It is a wrong to think that the sequences that have closer to an equal number of 0s and 1s is more likely as the question is interpreted.. It should be clear that we arrive at 1/256 because we assume independence from trial to trial. That is why we multiply the probabilities and the outcome of one trial has no influence on the next.

share|improve this answer
2  
This would be an ok, if short, answer... if the question didn't include the word "why". As it is, you simply reiterate one of the givens in the question, with no explanation given. – Walt yesterday
1  
Actually... This answer is factually wrong, see leonbloy's answer for why. – Walt yesterday
2  
@Walt it's not incorrect. Subtlety of language. Any given sequence is not more likely because it has less imbalance between 0s and 1s. There are simply more such sequences. – hobbs yesterday
3  
Does anybody agree with me? If a 0 has probability 1/2 and a 1 has probability 1/2 and one term in the sequence is independent of the next the probability of a given sequence of length 8 has probability $1/2^8 = 1/256$. and so does any other sequence of 8. – Michael Chernick yesterday
1  
@Michael I fully agree and am pleased to see--at last!--an explicit appeal to the very heart of the matter: independence. I would be happy to upvote your answer if you would include that comment in it. – whuber 16 hours ago

I'd like to expand a little bit on the idea of order dependence vs. independence.

In the problem of calculating the expected number of heads from flipping 8 coins, we're summing the values from 8 identical distributions, each of which is the Bernoulli distribution [; B(1, 0.5) ;] (in other words, a 50% chance of 0, a 50% chance of 1). The distribution of the sum is the binomial distribution [; B(8, 0.5) ;], which has the familiar hump shape with most of the probability centered around 4.

In the problem of calculating the expected value of a byte made of 8 random bits, each bit has a different value that it contributes to the byte, so we're summing the values from 8 different distributions. The first is [; B(1, 0.5) ;], the second is [; 2 B(1, 0.5) ;], the third is [; 4 B(1, 0.5) ;] , so on up to the eighth which is [; 128 B(1, 0.5) ;]. The distribution of this sum is understandably quite different from the first one.

If you wanted to prove that this latter distribution is uniform, I think you could do it inductively — the distribution of the lowest bit is uniform with a range of 1 by assumption, so you would want to show that if the distribution of the lowest [; n ;] bits is uniform with a range of [; 2^n - 1} ;] then the addition of the [; n+1 ;]st bit makes the distribution of the lowest [; n + 1 ;] bits uniform with a range of [; 2^{n+1} - 1 ;], achieving a proof for all positive [; n ;]. But the intuitive way is probably the exact opposite. If you start at the high bit, and choose values one at a time down to the low bit, each bit divides the space of possible outcomes exactly in half, and each half is chosen with equal probability, so by the time you get to the bottom, each individual value must have had the same probability to be chosen.

share|improve this answer
    
It is not a continuous uniform. The bit is either 0 or 1 and nothing in between. – Michael Chernick yesterday
    
@MichaelChernick of course we're only dealing with discrete distributions here. – hobbs yesterday
    
The OP said that the bits are only 1 or 0 and nothing in between. – Michael Chernick 13 hours ago
    
@MichaelChernick correct. – hobbs 13 hours ago

Each bit you choose is independent from each other bit. If you consider for the first bit there is a

  • 50% probability it will be 1

and

  • 50% probability it will be 0.

This also applies to the second bit, third bit and so on so that you end up with so for each possible combination of bits to make your byte you have $(\frac{1}{2})^8$ = $\frac{1}{256}$ chance of that unique 8 bit integer occurring.

share|improve this answer
    
All of these statements are true, but this doesn't address why coin tosses, which are also fair and independent, have only 9 distinct outcomes when an outcome is defined as the number of heads and tails. – Sycorax 3 mins ago

If you do a binary search comparing each bit, then you need the same number of steps for each 8 bit number, from 0000 0000 to 1111 1111, they both have the length 8 bit. At each step in the binary search both sides have a 50/50 chance of occuring, so in the end, because every number has the same depth and the same probabilities, without any real choice, each number must have the same weight. Thus the distribution must be uniform, even when each individual bit is determined by coin flips.

However, the digitsum of the numbers isn't uniform and would be equal in distribution to tossing 8 coins.

share|improve this answer

There is only one sequence with eight zeros. There are seventy sequences with four zeros and four ones.

Therefore, while 0 has a probability of 0.39%, and 15 [00001111] also has a probability of 0.39%, and 23 [00010111] has a probability of 0.39%, etc., if you add up all seventy of the 0.39% probabilities you get 27.3%, which is the probability of having four ones. The probability of each individual four-and-four result does not have to be any higher than 0.39% for this to work.

share|improve this answer
    
This doesn't change the fact that all 256 sequences are equally probable. – Michael Chernick 21 hours ago
    
@MichaelChernick I didn't say it did, I explicitly said that they all have a probability of 0.39%, I'm addressing OP's assumptions. – Random832 20 hours ago
    
You are right. It is another way of saying what I said in my answer. Some of the other answers are wrong. – Michael Chernick 20 hours ago

Consider dice

Think about rolling a couple of dice, a common example of non-uniform distribution. For the sake of the math, imagine the dice are numbered from 0 to 5 instead of the traditional 1 to 6. The reason the distribution is not uniform is that you are looking at the sum of the dice rolls, where multiple combinations can yield the same total like {5, 0}, {0, 5}, {4, 1}, etc. all generating 5.

However, if you were to interpret the dice roll as a 2 digit random number in base 6, each possible combination of dice is unique. {5, 0} would be 50 (base 6) which would be 5*($6^1$) + 0*($6^0$) = 30 (base 10). {0, 5} would be 5 (base 6) which would be 5*($6^0$) = 5 (base 10). So you can see, there is a 1 to 1 mapping of possible dice rolls interpreted as numbers in base 6 versus a many to 1 mapping for the sum of the two dice each roll.

As both @Sycorax and @Blacksteel point out, this difference really boils down to the question of order.

share|improve this answer

The problem, restated, is: Why is the number of combinations of 8 binary digits taken in a range of 1 to 8 objects at a time different from the number of permutations of 8 binary digits.

The answer is: Those are two different encodings. To encode the numbers so that each sequence is unique we view that number as being a binary integer $\sum_{i=1}^82^{i-1}{X_i}$, where ${X_i}$ are the left to right $i^{th}$ digits in the binary sequence of random 0's and 1's. What that does is make each permutation unique, as each random digit is then positional encoded. And the total number of permutations is then $2^8=256$. Then, coincidentally one can translate those binary digits into the base 10 numbers 0 to 255 without loss of uniqueness, or for that matter one can rewrite that number using any other lossless encoding (e.g. lossless compressed data, Hex, Octal). The question itself, however, is a binary one. Each permutation is then equally probable because there is then only one way each unique encoding sequence can be created, and we have assumed that the appearance of a 1 or a 0 is equally likely anywhere within that string.

When the lossless encoding is abandoned (mentally, mind you), we then have a lossy encoding in which outcomes (and information) is combined. We are then viewing the number series as $\sum_{i=1}^82^{0}{X_i}$, which in turn reduces to $C(8,\sum_{i=1}^8{X_i})$, the number of combinations of 8 objects taken $\sum_{i=1}^8{X_i}$ at at time, and for that different problem, the probability of exactly 4 1's is 70 ($C(8,4)$) times greater than obtaining 8 1's, because there are 70, equally likely permutations that can produce 4 1's.

share|improve this answer
1  
Using the conventional way to name numbers--by omitting all reference to preceding zeros--potentially confuses this explanation. Don't you think the situation would become much clearer by writing $0$ as 00000000, $1$ (which you inadvertently omitted) as 00000001, and so on? – whuber yesterday
    
@whuber Yes, I did inadvertently omit the 1. Finger counting in binary is not limited by the number of "digits" in a register. integer counting in a register shows the state of all of the places in that register. I suppose that is confusing for people who do not know enough electronics to know how to assemble a register from spare parts. To get negative votes from a lack of knowledge is somehow a bit discouraging. There is a difference in counting numbers in binary which is not register limited, and the contents of a byte sized register. – Carl yesterday
    
@Glen_b What about now that I have (again) followed up on your advice? – Carl yesterday
13  
Frankly this is all correct as far as it goes but it doesn't address the question. You've done a fine job of showing how eight ordered bits can represent numbers in the range, but haven't explained why selecting those bits at random give a uniform distribution (something which is, admittedly, so simple that explaining it clearly takes some subtlety). – dmckee yesterday
6  
Wouldn't it be simpler to say that 8 (independently) random bits is uniformly distributed on [00000000, 11111111] for the same reason that 3 random digits is uniformly distributed on [000, 999]? The side rant about how/why computers use binary and the fractional bases is totally unnecessary and unrelated. I mean, the fact that binary uses only the symbols 0 and 1 is just an inherent property of base 2... no need to explain that. If you wanted to keep that kind of explanation in there, it would probably be more useful to explain how bases work in general, but it would still be beside the point. – Blackhawk 17 hours ago

protected by Michael Chernick 9 hours ago

Thank you for your interest in this question. Because it has attracted low-quality or spam answers that had to be removed, posting an answer now requires 10 reputation on this site (the association bonus does not count).

Would you like to answer one of these unanswered questions instead?

Not the answer you're looking for? Browse other questions tagged or ask your own question.