Hacker News

Does the following code fragment cause undefined behaviour?

    unsigned int x;
    x -= x;
 
There's a lengthy Stack Overflow thread where various C language lawyers disagree on what the spec has to say about trap values, and on when reading an uninitialised variable causes UB. I'd appreciate an authoritative answer. Thanks for dropping by on HN!

https://stackoverflow.com/q/11962457/



Yes, it's undefined. It involves a read of an uninitialized local variable. Except for the special case of unsigned char, any uninitialized read is undefined.
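For contrast, here is a minimal sketch of a well-defined version of the fragment, assuming the goal is just to end up with zero. Giving x a value first removes the uninitialized read. The function name self_subtract is hypothetical.

```c
/* Hypothetical helper: the question's fragment with the uninitialized
   read removed. Once x holds a value, unsigned subtraction is modular
   and always defined, so x - x is 0 for every input. */
unsigned int self_subtract(unsigned int seed)
{
    unsigned int x = seed;  /* initialized, so reading x below is defined */
    x -= x;
    return x;
}
```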


>Except for the special case of unsigned char, any uninitialized read is undefined.

Could you expand on this?


An object of any type, initialized or not, can be read by an lvalue of unsigned char (or any character type). That lets functions like memcpy (either the standard one or a hand-rolled loop) copy arbitrary chunks of memory.
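A hand-rolled memcpy of the kind described might look like this. It is a sketch, not the standard library's implementation, and the name byte_copy is hypothetical. Every access goes through an unsigned char lvalue, which is what makes reading an object of any type permissible.

```c
#include <stddef.h>

/* Hypothetical hand-rolled memcpy: each byte of the source object is read
   through an unsigned char lvalue, which may access an object of any type. */
void *byte_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;  /* copy one byte at a time */
    return dst;
}
```

Usage: `byte_copy(&b, &a, sizeof a)` copies the object representation of `a` into `b`, exactly as `memcpy` would for non-overlapping objects.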

There's some debate about the effects of reading an uninitialized local variable of unsigned char (like whether the same value must be read each time, or whether it's okay for each read to yield a different value).

This special exemption doesn't extend to any other types, regardless of whether or not they have padding bits or trap representations that could cause the read to trap. Few types do, yet the behavior of uninitialized reads in existing implementations is demonstrably undefined (inconsistent with, or contradictory to, invariants expressed in the code of a test case), so any subtleties one might derive from the text of the standard must be viewed in that light.


Thanks for your answers. A related question: this article [0] appears to single out memcpy and memmove as being special regarding effective type. Is it accurate? It seems to be at odds with your suggestion that there's nothing stopping me writing my own memcpy provided I'm careful to use the right types.

[0] https://en.cppreference.com/w/c/language/object#Effective_ty...


I think that may be inaccurate -- IIRC, in C, you can do type punning via a union but not memcpy, and in C++ you can do type punning via memcpy but not a union and this incompatibility drives me nuts because it makes inline functions in a header file shared between C and C++ really messy. (Moral of the story: don't pun types.)
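The C half of the union claim is easy to illustrate. In C, reading a union member other than the one last written reinterprets the stored bytes (C11 6.5.2.3, footnote 95, in N1570). This sketch assumes float is a 32-bit IEEE 754 type sharing byte order with uint32_t, as on mainstream platforms; the name float_bits_union is hypothetical. The same code would not be valid C++.

```c
#include <stdint.h>

/* Assumption: float is 32-bit IEEE 754, the same size as uint32_t. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: type punning through a union, which C permits.
   Reading pun.u after writing pun.f reinterprets the bytes of f. */
uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}
```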


The C standard also allows memcpy to be used for type punning:

    If a value is copied into an object having no declared type using memcpy or memmove,
    or is copied as an array of character type, then the effective type of the modified
    object for that access and for subsequent accesses that do not modify the value is
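    the effective type of the object from which the value is copied, if it has one.

So simply memcpy into a declared variable (as opposed to dynamically allocated memory); the rule that copies the effective type only applies to objects with no declared type.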

https://port70.net/~nsz/c/c11/n1570.html#6.5p6
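A minimal sketch of that approach, assuming float is a 32-bit IEEE 754 type. Because `bits` is a declared uint32_t, its effective type stays uint32_t after the memcpy, so reading it afterwards is fine. The name float_bits_memcpy is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32-bit IEEE 754, the same size as uint32_t. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: memcpy into a declared variable. bits has a declared
   type, so 6.5p6's "copy the effective type" rule does not apply to it. */
uint32_t float_bits_memcpy(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* copy the object representation */
    return bits;
}
```

Compilers generally turn a fixed-size memcpy like this into a plain register move, so the approach usually costs nothing at run time.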


I must be remembering incorrectly then, thank you!


memcpy and memmove aren't special. The part that discusses the copying of allocated objects is 6.5, p6, quoted below:

The effective type of an object for an access to its stored value is the declared type of the object, if any. If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.


I see, so in short the article fails to reflect this part of the excerpt: "or is copied as an array of character type". Thanks again.
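To make the character-array clause concrete, here is a sketch under the reading of 6.5p6 quoted above: malloc'd storage has no declared type, so copying a double into it byte by byte gives it the effective type double, and reading it back through a double lvalue is then fine. The helper name clone_double_bytewise is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: copy a double into fresh malloc'd storage "as an
   array of character type". Per C11 6.5p6 the storage takes on the source's
   effective type (double). Returns NULL on allocation failure; the caller
   must free the result. */
double *clone_double_bytewise(const double *src)
{
    unsigned char *dst = malloc(sizeof *src);  /* suitably aligned for double */
    if (dst == NULL)
        return NULL;
    const unsigned char *s = (const unsigned char *)src;
    for (size_t i = 0; i < sizeof *src; i++)
        dst[i] = s[i];
    return (double *)dst;
}
```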


Has there ever been any consensus as to what that "...or is copied as an array of character type..." text is supposed to mean, or what sort of hoops must be jumped through for a strictly conforming program to generate an object whose bit pattern matches another without copying the effective type thereof?



I'm guessing you were asking about this part rather than UB in general:

> Except for the special case of unsigned char,

The SO article makes the bizarre claim that:

(1) because an unsigned char, per the standard, cannot have any padding bits, it cannot have a trap representation; and

(2) if it cannot have a trap representation, using its uninitialized value isn't undefined.

I'm willing to buy (1), but I don't remember the standard making the possibility of a trap representation a precondition for UB. I think (2) is the step that is harder to follow intuitively. Admittedly, I have not read that part of the standard closely in some time.
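Claim (1) can at least be sanity-checked on a given implementation: count the value bits in UCHAR_MAX and compare with CHAR_BIT. If they match, every bit of an unsigned char is a value bit, so there are no padding bits and every bit pattern is a valid value. This sketch says nothing about claim (2); the name uchar_value_bits is hypothetical.

```c
#include <limits.h>

/* Hypothetical helper: count the value bits of unsigned char by shifting
   UCHAR_MAX right until it reaches zero. unsigned long can represent every
   unsigned char value, so the conversion below is lossless. */
unsigned uchar_value_bits(void)
{
    unsigned bits = 0;
    for (unsigned long m = UCHAR_MAX; m != 0; m >>= 1)
        bits++;
    return bits;
}
```

If `uchar_value_bits() == CHAR_BIT`, the object representation of unsigned char consists entirely of value bits.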


This example is clearly UB.

You could argue that it suddenly becomes less UB if you take the address of x:

  unsigned int x;
  &x;
  x -= x;

I'm not sure if this will add anything to the discussion on SO, but if you allow programs to do this, then after applying modern optimizing C compilers, you may end up with multiplications by 2 that produce odd results, or uninitialized char variables that contain 500: http://blog.frama-c.com/index.php?post/2013/03/13/indetermin...

So the short answer is that, for all intents and purposes, you should consider use of uninitialized variables as UB, because C compilers already do. (There exists somewhere a document clarifying what C compilers can and cannot do with indeterminate values. A search for “wobbly values” might turn it up. Anyway, you do not want to have wobbly values in your C programs any more than you want them to have undefined behavior.)


Interesting link, thanks. So then:

* Under C90, reading an uninitialized local was explicitly listed as UB.

* Under C99, if you weren't using a character type, it was still essentially UB, by way of trap values. (I don't think the particulars of the target hardware platform are relevant.)

* C11 reintroduced UB even for some cases involving character types. We were already invoking UB under C99, so we know we're still invoking UB under C11.

> You could argue that it suddenly becomes less UB if you take the address of x

I don't think so. As we're not using a character type, I don't think taking its address would change anything. This aligns with what msebor said.

Lastly, from the article:

    > No, GCC is still acting as if j *= 2; was undefined.

I think GCC's behaviour is legal here. The target platform may have no trap values, but I don't see that GCC is prohibited from behaving as if there are. It would be legal (albeit bizarre) for it to generate code for a completely different ISA, and to bundle an emulator. If the spec says you've opened the door to UB, then unless your compiler documentation says otherwise, it's permitted to generate code that goes haywire, no?



