I am writing a socket client/server application in which the server needs to send one large buffer to the client. Each buffer should be processed separately, so I want to put the buffer length in the buffer; the client can then read that many bytes and process them.

To store the length I need to split the integer into individual bytes and store them in the buffer that is sent over the socket. I am able to split the integer into four parts, but when joining them I do not get the correct value back. To demonstrate my problem I have written a sample program that splits an int into four char variables and then joins them back into another integer. The goal is that after joining I get the same value.

Here is my small program.

#include <stdio.h>

int main ()
{
    int inVal = 0, outVal =0;
    char buf[5] = {0};

    inVal = 67502978;

    printf ("inVal : %d\n",inVal);

    buf[0] = inVal & 0xff;
    buf[1] = (inVal >> 8) & 0xff;
    buf[2] = (inVal >> 16) & 0xff;
    buf[3] = (inVal >> 24) & 0xff;

    outVal = buf[3];
    outVal = outVal << 8;
    outVal |= buf[2];
    outVal = outVal << 8;
    outVal |= buf[1];
    outVal = outVal << 8;
    outVal |= buf[0];

    printf ("outVal : %d\n",outVal);
    return 0;
}

Output:

inVal : 67502978
outVal : -126

Please let me know what I am doing wrong.

Thanks!

Possible duplicate of Integer overflow and undefined behavior – LPs 3 hours ago
    
Is it possible that your architecture has 64 bit ints? – Peter A. Schneider 3 hours ago

One problem is that you are using bit-wise operators on signed numbers. This is always a bad idea and almost always incorrect. Please note that char has implementation-defined signedness, unlike int which is always signed.

Therefore you should replace int with uint32_t and char with uint8_t (both declared in <stdint.h>). If you use bit shifts on a negative number, you might get bugs. Similarly, if you shift data into the sign bits of a signed number, you will get bugs.

And needless to say, the code will not work if integers are not 4 bytes large.
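As a rough sketch of the suggestion above, the split and join could be written with fixed-width unsigned types. The function names split_u32 and join_u32 are made up for illustration, and the least-significant-byte-first layout simply mirrors the question's code:

```c
#include <stdint.h>

/* Split a 32-bit value into four bytes, least significant byte first. */
void split_u32(uint32_t value, uint8_t out[4])
{
    out[0] = (uint8_t)(value & 0xffu);
    out[1] = (uint8_t)((value >> 8) & 0xffu);
    out[2] = (uint8_t)((value >> 16) & 0xffu);
    out[3] = (uint8_t)((value >> 24) & 0xffu);
}

/* Rebuild the value; each byte is widened to uint32_t before it is
   shifted, so no shift is ever applied to a signed operand. */
uint32_t join_u32(const uint8_t in[4])
{
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}
```

Calling split_u32 on 67502978 and passing the resulting array to join_u32 should give back the original value.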

    
@LPs My thinking too. The general algorithmic idea is, if I'm not mistaken, endianness-independent because the bit shift operator is endianness-ignorant (or -invariant, or whatever: left shift is always multiplying, right-shift dividing). This is the way to go in order to implement ntohl and friends without looking at the architecture's endianness. – Peter A. Schneider 2 hours ago
@LPs Hmm, yeah that was a bit confusing, I removed it. Pre-coffee post, sorry. – Lundin 2 hours ago
    
@PeterA.Schneider I was referring only to the "array notation is endianness-dependent". Cross-platform matters are my bread and butter ;) – LPs 2 hours ago

Your method has potential implementation defined behavior as well as undefined behavior:

  • storing values into the array of type char beyond the range of type char has implementation defined behavior: buf[0] = inVal & 0xff; and the next 3 statements.

  • left shifting negative values invokes undefined behavior: if any of the three bytes that are merged before a shift (buf[3], buf[2], buf[1]) becomes negative as the implementation defined result of storing a value larger than CHAR_MAX into it, outVal becomes negative and left shifting it is undefined.

In your specific example, your architecture uses 2s complement representation for negative values and the type char is signed. The value stored into buf[0] is 67502978 & 0xff = 130, which becomes -126 when converted to char. In the last statement, outVal |= buf[0];, that -126 is promoted to int, so the OR sets bits 7 through 31 of outVal and the result is -126.
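The conversion described above can be observed in isolation. This is only a small sketch; the two helper names are invented for the demonstration, and the result of the plain char version is implementation defined, as explained:

```c
/* Store the low byte of v in a plain char and read it back as an int.
   Where char is signed and 8 bits wide, values above 127 typically
   come back negative. */
int through_plain_char(int v)
{
    char c = (char)(v & 0xff);
    return c;
}

/* Same round trip through unsigned char: the value is always 0..255. */
int through_unsigned_char(int v)
{
    unsigned char c = (unsigned char)(v & 0xff);
    return c;
}
```

With the question's value 67502978, the unsigned version returns 130, while the plain char version returns -126 on a typical platform with a signed 8-bit char.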

You can avoid these issues by using an array of unsigned char and values of type unsigned int:

#include <stdio.h>

int main(void) {
    unsigned int inVal = 0, outVal = 0;
    unsigned char buf[4] = {0};

    inVal = 67502978;

    printf("inVal: %u\n", inVal);

    buf[0] = inVal & 0xff;
    buf[1] = (inVal >> 8) & 0xff;
    buf[2] = (inVal >> 16) & 0xff;
    buf[3] = (inVal >> 24) & 0xff;

    outVal = buf[3];
    outVal <<= 8;
    outVal |= buf[2];
    outVal <<= 8;
    outVal |= buf[1];
    outVal <<= 8;
    outVal |= buf[0];

    printf("outVal: %u\n", outVal);
    return 0;
}

Note that the above code still assumes 32-bit ints.

    
Actually, it doesn't assume 8-bit bytes. It assumes that unsigned char can hold at least 8 bits (which the standard guarantees). (For the calculation of outval it also assumes that buf only contains valid values in the range 0..255.) – Martin Bonner 2 hours ago
    
The program does not have undefined behavior. It also does not have implementation defined behavior since all values stored in the chars are masked by 0xff and thus representable in a char which is guaranteed to have at least 8 bits. – Peter A. Schneider 1 hour ago
    
@MartinBonner: good point. I removed the 8-bit char part. – chqrlie 1 hour ago
    
@PeterA.Schneider: it does have implementation defined behavior in C: 6.3.1.3 Signed and unsigned integers: When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged. 2 Otherwise, if the new type is unsigned, ... 3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised. The values masked by 0xff may exceed the range of type char, and 130 indeed does. – chqrlie 1 hour ago
    
@PeterA.Schneider: The program does not invoke undefined behavior for the specific value inVal = 67502978, but the method used does for many other values, such as inVal = 32768. – chqrlie 1 hour ago

Use unsigned char buf[5] = {0}; and it should work.

At cppreference.com on bitshift operators we can find the following:

For signed and positive a, the value of a << b is a * 2^b if it is representable in the return type, otherwise the behavior is undefined (until C++14); the value of a << b is a * 2^b if it is representable in the unsigned version of the return type (which is then converted to signed: this makes it legal to create INT_MIN as 1<<31), otherwise the behavior is undefined (since C++14).

For negative a, the behavior of a << b is undefined.

So the main issue is operation

outVal = buf[3];
outVal = outVal << 8;

because buf[3] being negative implies that outVal is negative, and behaviour of left shifting negative numbers is undefined.
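One way to sidestep the undefined shift, sketched here under the assumption that only the bit pattern matters, is to convert to an unsigned type before shifting. The helper name shift_left_8 is hypothetical:

```c
#include <stdint.h>

/* Shift the bit pattern of a possibly negative int left by 8 bits
   without applying << to a negative operand. */
uint32_t shift_left_8(int value)
{
    /* Converting to an unsigned type is well defined: the value is
       reduced modulo 2^32. */
    uint32_t bits = (uint32_t)value;

    /* The final conversion discards any bits above the low 32. */
    return (uint32_t)(bits << 8);
}
```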

    
Where do you see an overflowing integer operation? All shifts are done on (afaics non-overflowing) ints. – Peter A. Schneider 3 hours ago

This is not an integer overflow, it is a direct manipulation of the sign bits. It is undefined behavior per C11 6.5.7/4: "The result of E1 << E2 is E1 left-shifted E2 bit positions;" /--/ "If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined." – Lundin 2 hours ago
    
@Lundin: You are right, since there is a more specific specification of bit shift's behaviour. I actually thought it would also violate the more general overflow rule, but maybe I am wrong. – Stephan Lechner 2 hours ago
    
@Lundin All shifts are well defined, cf. my comment to Saurav's post. Stephan: The shifts are not the issue here. – Peter A. Schneider 2 hours ago
    
@PeterA.Schneider - ignore my comment. It was wrong. – Martin Bonner 2 hours ago

C++ standard N3936 quotes about shift operators:

The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.

So, to avoid undefined behaviour, it is recommended to use unsigned data types and to make sure the data type is wide enough to hold the value (at least 32 bits here).

Hm. I see the following values for outval before each shift: (32 bit little endian ints): 4; 1030; 263680. None of them are negative, and the results are all within range, so they are all well defined. The value after the last shift is 67502848. It's the last OR that creates the negative value. – Peter A. Schneider 2 hours ago

While bit shifts of signed values can be a problem, this is not the case here (all left hand values are positive, and all results are within the range of a 32 bit unsigned int).

The problematic expression with somewhat unintuitive semantics is the last bitwise OR:

outVal |= buf[0];

buf[0] is a (on your and my architecture) signed char with the value -126, simply because the most significant bit in the least significant byte of 67502978 is set. In C all operands in an arithmetic expression are subject to the arithmetic conversions. Specifically, they undergo integer promotion which states: "If an int can represent all values of the original type [...], the value is converted to an int". Accordingly, the signed character buf[0] is converted to a (signed) int, preserving its value of -126. A negative signed int has the sign bit set. ORing that with another signed int sets the result's sign bit as well, making that value negative. That is exactly what we are seeing.

Making the bytes unsigned chars fixes the issue because the value of the temporary integer to which the unsigned char is converted is then a simple 8 bit value of 130.
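The effect of the integer promotion on the final OR can be reproduced with two tiny helpers. The names are made up for this sketch, and the negative result assumes a two's complement machine:

```c
/* The signed char is promoted to int with its value preserved, so a
   negative byte brings all of its sign bits into the OR. */
int or_signed(int acc, signed char byte)
{
    return acc | byte;
}

/* The unsigned char is promoted to an int in the range 0..255, so only
   the low 8 bits of acc are affected. */
int or_unsigned(int acc, unsigned char byte)
{
    return acc | byte;
}
```

With acc = 67502848 (the value after the last shift), or_signed with -126 gives -126, while or_unsigned with 130 gives the expected 67502978.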


Because of endian differences between architectures, it is best practice to convert numeric values to network order, which is big-endian. On receipt, they can then be converted to the native host order. We can do this in a portable way by using htonl() (host to network "long" = uint32_t), and convert to host order on receipt with ntohl(). Example:

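#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>

int main(int argc, char **argv) {
  uint32_t inval = 67502978, outval, backinval;

  /* Convert to network (big-endian) byte order. */
  outval = htonl(inval);
  /* %d shows the uint32_t as a signed int, so the swapped value may print as negative. */
  printf("outval: %d\n", outval);
  /* Convert back to host byte order. */
  backinval = ntohl(outval);
  printf("backinval: %d\n", backinval);
  return 0;
}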

This gives the following result on my 64 bit x86 which is little endian:

$ gcc -Wall example.c
$ ./a.out
outval: -2113731068
backinval: 67502978
$
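Applied to the length prefix from the question, this could look roughly like the sketch below. The helper names put_length and get_length are invented here, a POSIX system providing <arpa/inet.h> is assumed, and the caller is assumed to supply a buffer of at least 4 bytes:

```c
#include <arpa/inet.h>  /* htonl, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Write a 4-byte length prefix in network byte order at the start of buf. */
void put_length(unsigned char *buf, uint32_t length)
{
    uint32_t wire = htonl(length);
    memcpy(buf, &wire, sizeof wire);  /* copy the bytes, no aliasing issues */
}

/* Read the 4-byte prefix back and convert it to host byte order. */
uint32_t get_length(const unsigned char *buf)
{
    uint32_t wire;
    memcpy(&wire, buf, sizeof wire);
    return ntohl(wire);
}
```

The sender would call put_length before copying the payload behind the prefix; the receiver reads 4 bytes, calls get_length, and then reads that many payload bytes.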
    
While your remark raises an important point for the binary exchange of values, it does not explain the OP's problem. Using a different byte order would not solve the problem either. – chqrlie 2 hours ago
While @chqrlie has a point, +1 for the mandatory pointer to htonl and friends. I suspect that many spontaneous resets of the various embedded systems in my household could be avoided if people stopped attempts at re-implementing htonl etc. Disseminating best practice is a service to mankind. Especially because the attempts are usually not a sign of competence and thus bound to fail. – Peter A. Schneider 2 hours ago
I really really don't like the interface of htonl et al. htonl should return an array of octets - not an int (I can imagine a platform where htonl of a valid integer ends up returning a trap value). – Martin Bonner 2 hours ago
A little endian sign and magnitude or ones-complement machine where negative zero is a trap. A 0x00000080 would end up as a big-endian 0x80000000 which is the trap value. – Martin Bonner 2 hours ago
@MartinBonner Although it is called htonl() it doesn't return an int, it returns a uint32_t. I would expect that 0x80000000 is a valid unsigned 32 bit integer on all architectures. – Jim D. 1 hour ago
