
There are "projects" underway to clean up the spec where it's viewed as either buggy, inconsistent, or underspecified. The atomics and threads sections are a couple of examples.

There are efforts to define the behavior in cases where implementations have converged or died out (e.g., two's complement, shifting into the sign bit).

There have been no proposals to add new array types and it doesn't seem likely at the core language level. C's charter is to standardize existing practice (as opposed to invent new features), and no such feature has emerged in practice. Same for modules. (C++ takes a very different approach.)



> no such feature has emerged in practice

Arrays with length constantly emerge among C users and libraries. They are just all incompatible because without standardization there is no convergence.


I think the problem is that C is simply ill-suited for these "high level" constructs. The best you're likely to get is an ad-hoc special library, like the one for wchar_t and wcslen and friends. Do we really want that?

I'd argue that a linked list might make a better candidate for inclusion, because I've seen the kernel's list.h or similar implementations in many projects, and that stuff is trickier to get right than stuffing a pointer and a size_t in a struct.


Sounds like a good use of standardization. If there is existing implementation practice, please go ahead and submit a proposal. I would be happy to champion such a proposal if you can't attend in person.


It was an observation, not a suggestion.

When the language standardization body has not managed to add arrays with length in 48 years, I don't think they should be added at this point. The culture is backward-looking and incompatible with modern needs, and the people involved are old and incompatible with the future (no offense, so am I).

C standardization effort should focus on finishing the language, not developing it to match the modern world. I have programmed in C for over 20 years, since I was a teenager. It has long been the system programming language I'm most familiar with. For the last 10 years I have never written an executable, just short functions callable from other languages: Python, Java, Common Lisp, Matlab, and, horror of horrors, C++.

I think Standard C can live the next 50 years in gradual decline as a portable assembler called from other languages and as a compilation target.

If I were to propose a new extension to the C language, I would instead propose a completely new language that can optionally be compiled to C and works side by side with old C code.


> If I were to propose a new extension to the C language, I would instead propose a completely new language that can optionally be compiled to C and works side by side with old C code.

There are a few somewhat popular languages that fit that description already, and none of them are suitable replacements for C (as far as I've seen). That's not to say there couldn't be a suitable replacement -- just that nobody in a position to do something about it wants the suitable replacement enough for it to have emerged, apparently.

I suspect the first really suitable complete replacement for C would be something like what Checked C [1] tried to be, but a little more ambitious and willing to include wholly new (but perhaps backward-compatible) features (like some of those you've proposed) implemented in an interestingly new enough way to warrant a whole new compile-to-C implementation. Something like that could greatly improve the use cases where a true C replacement would be most appreciated, and still fit "naturally" into environments where C is already the implementation language of choice via a piecemeal replacement strategy where the first step is just using the new language's compiler as the project compiler front end's drop-in replacement (without having to make any changes to the code at all for this first step).

1: https://www.microsoft.com/en-us/research/project/checked-c/


Sounds like you are describing Zig. https://ziglang.org


I haven't looked at Zig too closely yet (only started just a few minutes ago), but it immediately appears to me that this violates one of the requirements I suggested, as demonstrated by this use-case wish from my previous comment:

> > using the new language's compiler as the project compiler front end's drop-in replacement (without having to make any changes to the code at all for this first step)

I'll look into Zig more, though. Maybe I'll like it.

---

I stand corrected, given my phrasing. I should have specified that it needs to also support incrementally adding the new language's features while most of the code is still unaltered C, rather than (for instance) having to suddenly replace all the includes and function prototypes just because you want to add (in the case of Zig) an error "catch" clause.


You can use the Zig compiler to compile C with no modifications, and easily call C from Zig or Zig from C, so I'm not sure what more you're hoping for. A language that allows you to mix standard C and "improved C" in the same file sounds like a mess to me.


It depends on whether you're talking about an actual whole new, radically different language or something that is essentially C "with improvements". My point is not that C "with improvements" is the ideal approach, only that (at this time, for almost purely social reasons) I don't think C is really subject to replacement except by something that allows you to mix standard C and the "new language" because, apart from specific improvements, they are the same language.

This might come with huge drawbacks, but it still seems like the only socially acceptable way to fully replace C at this time; make it so you can replace it one line of code at a time in existing projects.


typedef struct {uint8_t *data; size_t len;} ByteBuf; is the first line of code I write in a C project.


Could you add some extra information on why this is so helpful or handy to have? I think it will benefit readers who are starting out with C.


In C, dynamically-sized vectors don’t carry around size information with them, often leading to bugs. This struct attempts to keep the two together.


The memory corruption in sudo's password-feedback code happened because the length and the pointer sat around as unrelated variables and had to be manipulated by two separate statements every time, like some kind of manually inlined function. By comparison, PuTTY's slice API handles a slice as a whole object in a single statement, keeping the length and the pointer consistent.


Another option is a struct with a FAM at the end.

  typedef struct {
      size_t len;
      uint8_t data[];
  } ByteBuf;
Then, allocation becomes

  ByteBuf *b = malloc(sizeof(*b) + sizeof(uint8_t) * array_size);
  b->len = array_size;
and data is no longer a pointer.


Well, your ByteBuf is still a pointer. You also now need to dereference it to get the length. It also can't be passed by value, since it's very big. You can also not have multiple ByteBufs pointing at subsections of the same region of memory.

Thing is, you rarely want to share just a buffer anyway. You probably have additional state, locks, etc. So what I do is embed my ByteBuf directly into another structure, which then owns it completely:

    typedef struct {
        ...
        ByteBuf mybuffer;
        ...
    } SomeThing;
So we end up with the same amount of pointers (1), but with some unique advantages.


Right, totally depends on what you're doing. My example is not a good fit for intrusive use cases.


sizeof(ByteBuf) == sizeof(size_t), and you can pass it by value; I just don't think you can do anything useful with it because it'll chop off the data.


This will cause an alignment problem on any platform with data types larger than size_t. You'd need an alignas(max_align_t) on the struct. At which point some people are going to be unhappy about the wasteful padding on a memory-constrained target.


Why not typedef struct {uint8_t *data, dataend} ?

Makes it easier to take subranges out of it


should be

  typedef struct {uint8_t *data, *dataend} 
if I'm not mistaken :)


What are the advantages of saving the end as a pointer? Genuinely curious. Seems like a length allows the end pointer to be quickly calculated (data + len), while being more useful for comparisons, etc.


You can remove the first k elements of a view with data += k.

With the length you would need to do data += k; length -= k

Especially if you want to use it as safe iterator, you can do data++ in a loop


> ...You can remove the first k elements of a view with data += k.

How would you safely free(data) afterwards? You'd need to keep an alloc'ed pointer somehow.


Got it. That is really neat, going to add to my bag of tricks...


Right. I always think the pointer declaration is part of the type. (that is why I do not use C. Is there really a good reason for this C syntax?)


That's a really bizarre layout for your struct. Why don't you put the length first?


Why would it matter? The bytes aren't inline, this is just a struct with two word-sized fields.

A possible tiny advantage for this layout is that a pointer to this struct can be used as a pointer to a pointer-to-bytes, without having to adjust it. Although I'm not sure that's not undefined behaviour.


I don't think that's undefined behavior. That's how C's limited form of polymorphism is utilized. For example, many data structures behind dynamic languages are implemented in this way. A concrete example would be Python's PyObject which share PyObject_HEAD.

https://github.com/python/cpython/blob/master/Include/object...


I'm not sure if it matters. It might be better for some technical reason, such as speeding up double dereferences, because you don't need to add anything to get to the pointer. But to be honest I just copied it out of existing code.


Most platforms have instructions for dereferencing with a displacement.


The "existing practice" qualification refers to existing compiler extensions, I'd guess. Then lobbying about the feature should be addressed to e.g. the LLVM and GCC developers.


> C's charter is to standardize existing practice (as opposed to invent new features)

Passing a pair of arguments (pointer and a length) is surely one of the more universal conventions among C programmers?


When they say "existing practice" they mean things already implemented in compilers -- not existing practice among developers.


This seems like a poor way to establish criteria for standardization. It essentially encourages non-standard practice and discourages portable code by saying that to improve the language standard we have to have mutually incompatible implementations.

It has been said that design patterns (not just in the GOF sense of the term) are language design smells, implying that when very common patterns emerge it is a de facto popular-uprising call for reform. That, to me, is a more ideal criterion for updating a language standard, but practiced conservatively to avoid too much movement too fast or too much language growth.

On the other hand, I think you might be close to what they meant by "existing practice". I'm just disappointed to find that seems like the probable case (though I think it might also include some convergent evolutionary library innovations by OS devs as well as language features by compiler devs).


One of the principles for the C language is that you should be able to use C on pretty much any platform out there. This is one of the reasons that other languages are often written in C.

In order to uphold that principle, it's important that the standard consider not just "is this useful" but "is this going to be reasonably straightforward for compiler authors to add". Seeing that people have already implemented a feature helps C to avoid landing in the "useful feature which nobody can use because it's not widely available" trap. (For example, C99 made the mistake of adding floating-point complex types in <complex.h> -- but these ended up not being widely implemented, so C11 backed that out and made them an optional feature.)


Different implementations are used for different purposes. If 20% of implementations are used for purposes where a feature would be useful, which of the following would be best:

1. Have 10% of implementations support the feature one way, and 10% support it in an incompatible fashion.

2. Require that all compiler writers invest the time and effort necessary to support the feature without regard for whether any of their customers would ever use it.

3. Specify that implementations may either support the feature or report that they don't do so, at their leisure, but that implementations which claim to support the feature must do so in the manner prescribed by the Standard.

When C89 was written, the Committee decided that rather than recognizing different categories of implementation that support different sets of features, it should treat the question of what "popular extensions" to support as a Quality of Implementation which could be better resolved by the marketplace than by the Committee.

IMHO, the Committee should recognize categories of Safely Conforming Implementation and Selectively Conforming Program such that if an SCI accepts an SCP, and the translation and execution environments satisfy all documented requirements of the SCI and SCP, the program will behave as described by the Standard, or report in Implementation-Defined fashion an inability to do so, period. Any other behavior would make an implementation non-conforming. No "translation limit" loopholes.


That's obviously true, but at the same time the specifics of how one chooses to set criteria for inclusion in the standard should probably keep in mind the social consequences. If the intended consequence (e.g. ensuring that implementation is easy enough and desired enough to end up broadly included for portability) and the likely consequence (e.g. reduced standardization of C capabilities in practice, with rampant reliance by developers on implementation-specific behavior to the point almost nobody writes portable code any longer) differ too much, it's time to revisit the mechanisms that get us there.


What is meant by "portable code"? Should it refer only to code that should theoretically be usable on all imaginable implementations, or should it be expanded to include code which may not be accepted by all implementations, but which would have an unambiguous meaning on all implementations that accept it?

Historically, if there was some action or construct that different implementations would process in different ways that were well suited to their target platforms and purposes, but were incompatible with each other, the Standard would simply regard such an action as invoking Undefined Behavior, so as to avoid requiring that any implementations change in a way that would break existing code. This worked fine in an era when people were used to looking to precedent to know how implementations intended for certain kinds of platforms and purposes should be expected to process certain constructs. Such an approach is becoming increasingly untenable, however.

If instead the Standard were to specify directives and say that if a program starts with directive X, implementations may either process integer overflow with precise wrapping semantics or refuse to process it altogether, if it starts with directive Y, implementations may either process it treating "long" as a 32-bit type or refuse to process it altogether, etc. this would make it much more practical to write portable programs. Not all programs would run on all implementations, but if many users of an implementation that targets a 64-bit platform need to use code that was designed around traditional microcomputer integer types, a directive demanding that "long" be 32 bits would provide a clear path for the implementation to meet its customers' needs.


> What is meant by "portable code"? Should it refer only to code that should theoretically be usable on all imaginable implementations, or should it be expanded to include code which may not be accepted by all implementations, but which would have an unambiguous meaning on all implementations that accept it?

That's a good question. I'm not sure I know. I could hazard a guess at what would be "best", but I'm not particularly confident in my thoughts on the matter at this time. As long as how that is handled is thoughtful, practical, consistent, and well-established, though, I think we're much more than halfway to the right answer.

> Historically, if there was some action or construct that different implementations would process in different ways that were well suited to their target platforms and purposes, but were incompatible with each other, the Standard would simply regard such an action as invoking Undefined Behavior, so as to avoid requiring that any implementations change in a way that would break existing code.

If I understand correctly, that would actually be "implementation-defined", not "undefined".

> a directive demanding that "long" be 32 bits would provide a clear path for the implementation to meet its customers' needs

There are size-specific integer types specified in the C99 standard (e.g. `uint32_t`). I use those, except in the most trivial cases (e.g. `int main()`), and limit myself to those size-specific integer types that are "guaranteed" by the standard.


> If I understand correctly, that would actually be "implementation-defined", not "undefined".

That is an extremely common myth. From the point of view of the Standard, the difference between Implementation Defined behavior and Undefined Behavior is that implementations are supposed to document some kind of behavioral guarantee with regard to the former, even in cases where it would be impractical for a particular implementation to guarantee anything at all, and nothing that implementation could guarantee in those cases would be useful.

The published Rationale makes explicit an intention that Undefined Behavior, among other things, "identifies areas of conforming language extension".

> There are size-specific integer types specified in the C99 standard (e.g. `uint32_t`). I use those, except in the most trivial cases (e.g. `int main()`), and limit myself to those size-specific integer types that are "guaranteed" by the standard.

A major problem with the fixed-sized types is that their semantics are required to vary among implementations. For example, given

    int test(uint16_t a, uint16_t b, uint16_t c) { return a-b > c; }
some implementations would be required to process test(1,2,3) so as to return 1, and some would be required to process it so as to return 0.

Further, if one has a piece of code which is written for a machine with particular integer types, and a compiler which targets a newer architecture but can be configured to support the old set of types, all one would need to do to port the code to the new platform would be to add a directive specifying the required integer types, with no need to rework the code to use the "fixed-sized" types whose semantics vary among implementations anyway.


What is your definition of "portable"? Are you using that term to mean "code I write for one platform can run without modification on other platforms" or "the language I use for one platform works on other platforms"?

I think when you get down to the level of C you're looking at the latter much more than the former. C is really more of a platform-agnostic assembler. It's not a design smell to have conventions within the group of language users that are de-facto language rules. For reference, see all the PEP rules about whitespace around different language constructs. These are not enforced.

The whole point of writing a C program is to be close to the addressable resources of the platform, so you'd probably want to expose those low-level constructs unless there's a compelling reason not to. Eliminating an argument from a function by hiding it in a data structure is not that compelling to me since I can just do that on my own. And then I can also pass other information such as the platform's mutex or semaphore representation in the same data structure if I need to.

By the way, that convenient length+pointer array requires new language constructs for looping that are effectively syntactic sugar around the for loop. Or you need a way to access the members of the structure. And syntactic sugar constrains how you can use the construct. So I'm not sure that it adds anything to the language that isn't already there. And the fact that length+pointer is such a common construct indicates that most people don't have any issues with it at all once they learn the language.


> And the fact that length+pointer is such a common construct indicates that most people don't have any issues with it at all once they learn the language.

Given the prevalence of buffer overflow bugs in computing, I'd say that there are quite a few programmers who have quite a few issues with this concept in practice.

The rest of your arguments are quite sound, but I have to disagree with that one.


> What is your definition of "portable"?

In that particular statement at the beginning of my preceding comment, I meant portability across compiler implementations.

> Eliminating an argument from a function by hiding it in a data structure is not that compelling to me since I can just do that on my own.

I meant to refer more to the idea that, when doing it on your own in a particular way, the compiler could support applying a (set of) constraint(s) to prevent overflows (as an example), such that any constraint couldn't be bypassed except by very obviously intentional means. Just automating the creation of the very, very simply constructed "plus a numeric field" struct seems obviously not worth including as a new feature of the standardized language.

> the fact that length+pointer is such a common construct indicates that most people don't have any issues with it

I think you're measuring the wrong kind of problem. Even C programmers with a high level of expertise may have problems with this approach, because it's when programmer error causes a problem not caught by code review or the compiler via buffer overflows (for instance) that we see a need for more.


>There have been no proposals to add new array types and it doesn't seem likely at the core language level.

One alternative to adding types is to allow enforcing consistency in some structs with the trailing array:

    struct my_obj {
      const size_t n;
      //other variables
      char text[n];
    };
where for simplicity you might only allow the first member to act as a length (and it must of course be constant). The point is that then the initializer:

    struct my_obj b = {.n = 5};
should produce an object of the right size. For heap allocation you could use something like:

    void * vmalloc(size_t base, size_t var, size_t cnt) {
      void *ret = malloc(base + var * cnt);
      if (!ret) return ret;
      * (size_t *) ret = cnt;
      return ret;
    }


What should happen if you reassign the object?


What do you mean "reassign"?

You can't reassign the length variable since it's marked `const`. You should see something like "warning: assignment discards `const` qualifier from pointer target type" if you pass it to `realloc`, which tells you that you're breaking consistency (I guess this might be UB). You could write `vrealloc` to allow resizing such structs, which would probably be called like:

    my_obj *tmp = vrealloc(obj, sizeof(obj), sizeof(obj->text), obj->n, newsize);


What would you do with the old text? Delete it?


Could you please be more specific about what you're trying to say? I have no idea what your actual objection is.


I would love this.


Actually there was no need to disenfranchise non-twos-complement architectures. Now that SIMH has a CDC-1700 emulation, I had planned on producing a C system for it as an example for students who have never seen such a model.


Rather than trying to decide whether to require that all implementations must use two's-complement math, or suggest that all programs should support unusual formats, the Standard should recognize some categories of implementations with various recommended traits, and programs that are portable among such implementations, but also recognize categories of "unusual" implementations.

Recognizing common behavioral characteristics would actually improve the usability of arcane hardware platforms if there were ways of explicitly requesting the commonplace semantics when required. For example, if the Standard defined an intrinsic which, given a pointer that is four-byte aligned, would store a 32-bit value in 8-bits-per-byte little-endian format, leaving any bits beyond the eighth (if any) in a state which would be compatible with using "fwrite" to an octet-based stream, an octet-based big-endian platform could easily process that intrinsic as a byte-swap instruction followed by a 32-bit store, while a compiler for a 36-bit system could use a combination of addition and masking operations to spread out the bits.


This sounds like something memcpy would do already for you?


A 36-bit system with (it sounds like) 9-bit bytes stores bit 8 of an int in bit 8 of a char, and bit 9 of the int in bit 0 of the next char; memcpy won't change that. They're asking for something like:

  unsigned int x = in[0] + 512*in[1] + 512*512*in[2] + 512*512*512*in[3];
  /* aka x = *(int*)in */
  
  out[0] = x & 255; x>>=8;
  out[1] = x & 255; x>>=8;
  out[2] = x & 255; x>>=8;
  out[3] = x & 255;
  /* *not* aka *(int*)out = x */


The amount of effort for a compiler to process optimally all 72 variations of "read/write a signed/unsigned 2/4/8-byte big/little-endian value from an address that is aligned on a 1/2/4/8-byte boundary" would be less than the amount of effort required to generate efficient machine code for all the ways that user code might attempt to perform such an operation in portable fashion. Such operations would have platform-independent meaning, and all implementations could implement them in conforming fashion by simply including a portable library, but on many platforms performance could be enormously improved by exploiting knowledge of the target architecture. Having such functions/intrinsics in the Standard would eliminate the need for programmers to choose between portability and performance, by making it easy for a compiler to process portable code efficiently.


I'm not disagreeing, just showing code to illustrate why memcpy doesn't work for this. Although I do disagree that writing a signed value is useful - you can eliminate 18 of those variations with a single intmax_t-to-twos-complement-uintmax_t function (if you drop undefined behaviour for (unsigned foo_t)some_signed_foo this becomes a no-op). A set of sext_uintN functions would also eliminate 18 read-signed versions. Any optimizing compiler can trivially fuse sext_uint32(read_uint32le2(buf)), and minimal implementations would have less boilerplate to chew through.


> Although I do disagree that writing a signed value is useful

Although the Standard defines the behavior of signed-to-unsigned conversion in a way that would yield the same bit pattern as a two's-complement signed number, some compilers will issue warnings if a signed value is implicitly coerced to unsigned. Adding the extra 18 forms would generally require nothing more than defining an extra 24 macros, which seems like a reasonable way to prevent such issues.


Fair point; even if the combinatorial nature of it is superficially alarming, that's probably not a productive area to worry about feature creep in.


72 static in-line functions. If a compiler does a good job of handling such things efficiently, most of them could be accommodated by chaining to another function once or twice (e.g. to read a 64-bit value that's known to be at least 16-bit aligned, on a platform that doesn't support unaligned reads, read and combine two 32-bit values that are known to be 16-bit likewise).

Far less bloat than would be needed for a compiler to recognize and optimize any meaningful fraction of the ways people might write code to work around the lack of portably-specified library functions.


Ah, I see.


>clean up the spec

Would this involve further specification of bitfields? I feel the implementation-defined nature of bitfields limits their potential.


What parts of bitfields are implementation defined?


Looking here https://en.cppreference.com/w/c/language/bit_field it seems quite a bit is. My main thought was how fields are laid out in memory. I know it would be a big change because of endianness, but I thought a standard layout might be useful...?


> C's charter is to standardize existing practice (as opposed to invent new features), and no such feature has emerged in practice. Same for modules. (C++ takes a very different approach.)

One thing that I'd really like to see would be some new categories of compliance. At present, the definition of "conforming C program" makes it possible to accomplish any task that could be done in any language with a "conforming C program", since the only thing necessary for something to be a conforming C program would be for there to exist some conforming implementation in the universe that accepts it. Unfortunately, the Standard says absolutely nothing useful about the effect of attempting to use an arbitrary conforming C program with an arbitrary conforming C implementation. It also fails to define a set of programs where it even attempts to say much of anything useful about the behavior of a freestanding implementation (since the only possible observable behavior of a strictly conforming program on a freestanding implementation would be `while(1);`).

I would propose defining the terms "Safely Conforming Implementation" and "Selectively Conforming Program" such that feeding any SCP to any SCI, in circumstances where the translation and execution environments satisfy all requirements documented for the program and implementation, would be required not to do anything other than behave as specified, or indicate in documented fashion a refusal to do so. An implementation that does anything else when given a Selectively-Conforming Program would not be Safely Conforming, and a program which a Safely Conforming Implementation could accept without its behavior being defined thereon would not be a Selectively Conforming Program.

While it might seem awkward to have many implementations support different sets of features, determining whether a Safely Conforming Implementation supports all the features needed for a Selectively Conforming Program would be trivially easy: feed the program to the implementation and see if it accepts it.

I think there's a lot of opposition to "optional" features because of a perception that features that are only narrowly supported are failures. I would argue the opposite. If 20% of compilers are used by people who would find a feature useful, having the feature supported by that 20% of compilers, while the maintainers of the other 80% direct their effort toward things other than support for the feature, should be seen as a superior outcome to mandating that compiler writers waste time on features that won't benefit their customers.

Realistically speaking, it would be impossible to define a non-trivial set of programs that all implementations must process in useful fashion. Instead of doing that, I'd say that the question of whether an implementation can usefully process any program is a Quality of Implementation issue, provided that implementations reject all programs that they can't otherwise process in any other conforming fashion.



