Buttons840
4 days ago
After many years of programming in other languages, I finally learned C, and came to realize that there aren't actually any compilers that implement all of the C spec. Even GCC and Clang have their grey areas and their bugs.
Before this, I had thought that C was a simple language, an idea propped up by articles like this one, as well as the oft-touted fact that nearly every embedded system has a C compiler: no matter what, you'll always have a C compiler.
This point was driven home by part of a blog post that simply states "you can't actually parse a C header"[0]. The blog makes a good supporting case for their claim. They link to a paper that says[1]:
> There exist many commercial and academic tools that can parse C.... Unfortunately, these parsers are often either designed for an older version of the language (such as C89) or plain incorrect. The C11 parsers found in popular compilers, such as GCC and Clang, are very likely correct, but their size is in the tens of thousands of lines.
And sure enough, in the OP linked blog post, they state they are only implementing a subset of the language. Of course, it still has value as a teaching tool; this is just a tangential fact about C I wanted to discuss.
[0]: https://faultlore.com/blah/c-isnt-a-language/#you-cant-actua...
flohofwoe
4 days ago
> I finally learned C, and came to realize that there aren't actually any compilers that implement all of the C spec.
I think the main reason for this is that the C spec was always just an attempt to somewhat harmonize the already existing features of different C compilers, i.e. implementations come first, and then, one or two decades later, the C committee tries to standardize the features that have survived. That doesn't mean that all C compiler vendors immediately hop on board.
But then there's of course MSVC, which after a few hopeful years of modernizing its C frontend now seems to have abandoned it again (I wonder, though, whether MSVC as a whole has been abandoned, looking at the abundance of red for MSVC in the C++23 and C++26 tables: https://en.cppreference.com/w/cpp/compiler_support.html).
While it looks like a weird approach at first, it has definitely worked better than the C++ committee's approach of accepting 'ideas' into the standard without even a proof-of-concept implementation, i.e. the one good thing about C++ is that it acts as a filter for stupid ideas before they can make it into C ;)
pjmlp
4 days ago
I think it is a side effect of the SFI (Secure Future Initiative) at Microsoft, and of the Azure and Windows development guidelines to use managed safe languages or Rust, leaving C and C++ for existing code bases.
Even though Microsoft employees tend to dismiss this in Reddit discussions, the lack of resources is quite visible.
flohofwoe
4 days ago
In that case they should really just deprecate MSVC and point C and C++ devs to Clang. Would make life a lot easier for library authors.
pjmlp
4 days ago
Clang has been included in the Visual Studio installer for ages.
It was the official answer back when they decided to do only C++ and leave C behind, until their change of heart with C11/C17 support.
Also note that Apple and Google are no longer the generous sponsors of Clang they used to be, so it has also lost steam, as other contributors aren't contributing at the same level.
4gotunameagain
4 days ago
The first link is full of youthful anger (and a bit of cringe), not recognising the immense technical debt C is carrying. It is a 50-year-old language written on a PDP.
For example, the author rages about things like integer sizes, while no serious C programmer uses ambiguously sized types.
Sure, C has issues. A lot. But it is the cornerstone of our technological marvel. Everything around you runs C, everything in space runs C as well.
Do we have much better options for some applications nowadays? Of course.
Will C go away? No.
dlcarrier
4 days ago
C has a lot of feature creep, and C++ is just C with extra feature creep.
The original C compiler ran on a PDP-11, which usually had just kilobytes of RAM. The syntax was written around compiling with such limited resources, hence the need for headers, primitives, semicolons, linkers, and so on.
It has changed a lot over time, but seems to be adding baggage, not removing it.
Joker_vD
4 days ago
The original C compiler had no need for headers or function prototypes/forward declarations. Of course, it also was not a single-pass compiler: it had two (and a half) passes and generated assembly that would then be assembled by a two-pass assembler.
fuzztester
4 days ago
Yes, function prototypes were introduced in the first ANSI version of C, IIRC, which came some years after the original C. The prototype feature was described in the second version of the classic K&R C book, The C Programming Language.
pjmlp
4 days ago
Function prototypes came to ANSI/ISO C via the ongoing work on ISO C++.
fuzztester
4 days ago
Doesn't matter where it came from - in the context of my previous comment.
I was talking w.r.t. the earlier version of C, and not in connection with C++.
pjmlp
3 days ago
It does for historical purposes: how the C standard came to be, and which decisions turned K&R C into C89.
pjmlp
4 days ago
That's only true for C89 versus C++98; in the current days of C2y versus C++23, they are two worlds apart.
1vuio0pswjnm7
4 days ago
"Before this, I had thought that C was a simple language."
It was a simple language. It can still be used that way
As a hobbyist I write simple programs that can be compiled with -std=c89
I use these programs every day. They are faster than their equivalents in Python, smaller than their equivalents in Go, and require fewer resources or dependencies to compile than their equivalents in Rust
It is easy to take something simple and make it complex
Software developers do this consistently; software/language "by committee" facilitates it
Generally developers commenting publicly do not like "simple", they prefer "easy"
C89 is still useful and there are lots of things that rely on it
wredcoll
4 days ago
C the language is simple until you actually have to do something useful with it; then you have to memorize the APIs of every library you import.
1vuio0pswjnm7
4 days ago
I have done something useful (to me) with "C the language"
Generally, I do not use any third party libraries besides the "standard" ones that come with UNIX-like OS
I do not _have_ to use other third party libraries, it's a choice; I choose not to use them for personal reasons that may or may not apply to anyone else
sylware
4 days ago
More real-life C compilers are always a good thing. But remember: to be able to build a Linux kernel, you will need zillions of GCC extensions... (I have a soft spot for the alignment attribute on stack variables in printk).
That said, "C" (C99 with the benign bits of C11 required for modern hardware programming) is only the "least bad" compromise for a computer language: its syntax is already way too rich (I'll let you think about the insane syntax of the computer languages out there, yep... even the currently hyped ones).
For instance: C should have only one loop keyword, loop{}; finally a real, hard compile-time constant definition; no integer promotion; no implicit casts (except for void* or some generic number literals); probably explicit static/dynamic casts (but certainly not with the C++ syntax); sized types should be primitives and not the other way around (s32, u32, f64...); no typedef/typeof/generic/etc.; but it should include inline support for modern hardware architecture programming, aka memory barriers/"atomics"/explicit unaligned memory access, etc.
The benchmark: a small team of average devs should be able to develop a real-life compiler in a reasonable amount of time.
Ofc, the more the merrier.
1718627440
4 days ago
> C should have only one loop keyword loop{}
How do you implement do while loops? Also, I find the niceties of for loops a good improvement. You are able to limit the scope of variables to the loop while still being able to use them in the condition, and it separates the transition from the real loop code. I think idiomatic C overuses for, so while is used really seldom.
What else do you think is really excessive in syntax alone?
lpribis
4 days ago
> How do you implement do while loops?
loop {
...body
if (condition)
break;
}
1718627440
4 days ago
What is the benefit over this:
loop: {
...body
if (!condition)
goto loop;
}
What separates loop syntax from goto is explicit syntax for the condition. When you give that up, why do you have loop at all?
rkomorn
4 days ago
Shouldn't your body be after the condition check?
Otherwise you get one iteration even if your condition was false to begin with?
1718627440
4 days ago
I specifically asked for a do while loop.
rkomorn
4 days ago
Oh wow. My brain entirely ignored the second do in that sentence.
pjmlp
4 days ago
That is a myth often spread by folks who think the K&R C book is everything there is to know, and who have never opened the ISO C draft PDF, learned the differences between POSIX and the standard library, tried to make their code portable beyond GCC (or nowadays Clang), or even checked the extensions chapter of their compiler's manual.
anta40
4 days ago
Who says that? I still hear K&R being recommended as introductory material.
If you want to write portable/production-grade C code, you'll definitely need to study other references.
Arch-TK
4 days ago
There are very few resources for learning C which aren't themselves full of terrible C.
If you want a short introduction, with the caveat that it only covers C89, only covers parts of it, and doesn't cover e.g. POSIX or anything outside of standard C, then K&R2 + errata is fine.
If you want a long book on C with a more modern approach, then there is K. N. King's C Programming: A Modern Approach.
Jens' book is at least vouched for by https://iso-9899.info/wiki/Books . So I have to assume it's also okay.
adamors
4 days ago
Exactly, I was looking into refreshing my C knowledge recently and K&R is still heavily recommended.
flohofwoe
4 days ago
The problem with K&R is that there never was a third edition that covered C99, which compared to C89 is almost a new language.
K&R is an interesting historical artifact about the basic design decisions of the C language and definitely recommended reading material, but you're not doing yourself a favour using it as a reference or for learning the language. For that it is vastly outdated; C is a much more enjoyable and powerful language since C99.
pjmlp
4 days ago
Go with such a book instead,
adamors
4 days ago
How does this compare to https://nostarch.com/effective-c-2nd-edition ?
pjmlp
4 days ago
I guess it is also a good one; both authors are still active in WG14, if I am not mistaken.
pjmlp
4 days ago
Because most people know no better, and recommend their UNIX heroes.
This is a more useful book for modern days,
https://www.manning.com/books/modern-c
flohofwoe already put it clearly in his comment.
1718627440
4 days ago
> [0]: https://faultlore.com/blah/c-isnt-a-language/#you-cant-actua...
This blog post is full of misconceptions.
It starts by asserting that C defines an ABI and then complains that everything is so complicated, because C actually doesn't define one. C is defined in terms of the behaviour of the abstract C machine. As far as C is concerned, there is no ABI. C only prescribes meaning to the language; it does not prescribe how you actually implement it. The compiler is free to do as it pleases, including choosing the ABI within some limits.
What defines the ABI is the platform, consisting of the machine and the OS. The OS here includes at least the kernel (or some bare-metal primitives) and the compiler. And the standard C library really IS part of the compiler. That's why GCC vs Clang or glibc vs musl always comes with incompatibilities: these ARE different OSs. They can choose to do things the same way, but only because of some formal (POSIX, the platform vendor) or informal (GCC and Clang) standards.
Yes, a lot of ABIs are defined with C syntax, but that is because C has terms for it and isn't too new. You can specify this in the language of your choice and it will describe the same ABI. Yes, int doesn't have a size independent of the platform. But if the specification didn't use C syntax, it would just say "this argument has the same size as described in 'Appendix X Definitions' under 'Platform's default integer size'". Writing "int" is just a shorter notation for exactly this.
> You Can’t Actually Parse A C Header
I don't know why the choice to use the compiler to implement parsing a C header is framed as a bad thing. Is relying on write(2) from the kernel a bad thing, compared to trying to bypass the kernel? The compiler is what defines the meaning of a header, so why not ask it for the result? If you don't feel like reimplementing the C preprocessor, you can also just parse preprocessed headers. These are self-contained, i.e. you don't need to know the include path. But of course this approach comes with the caveat that when the user updates the C compiler, your knowledge becomes outdated or wrong. I don't know why it is framed as weird that you need a C parser to parse C code. That is the definition of a C parser. You can't write code that parses C and is somehow not a C parser.
> 176 triples. I was originally going to include them all for the visual gag/impact but it’s literally too many for even that.
No, those are ONLY the 176 target triples (this is the LLVM term; other terms are "GNU tuple" or "GNU type") that your tool supports. It is also not a definitive list; it's a syntax for describing the major components of a platform. There are decades of vendors improving their platforms in incompatible ways, so of course the description of this is messy.
See for example: https://news.ycombinator.com/item?id=43698363
And this is the test data for the source of GNU types: https://cgit.git.savannah.gnu.org/cgit/config.git/tree/tests... It contains 1180 types, but of course that's not a definitive list either.
> pub type intmax_t = i64;
> A lot of code has completely given up on keeping C in the loop and has started hardcoding the definitions of core types. After all, they’re clearly just part of the platform’s ABI! What are they going to do, change the size of intmax_t!? That’s obviously an ABI-breaking change!
There is a reason it is called intMAX_t! It does not HAVE a definite size; it is the MAXimal integer size on that platform. Yes, there are problems nowadays due to ossification, but they are caused by people exactly like that blog author. When you want your program to have a stable ABI that doesn't change when your platform supports larger integer types, you just don't use intMAX_t!
> And even then you have the x64 int problem: it’s such a fundamental type, and has been that size for so long, that countless applications may have weird undetectable assumptions about it. This is why int is 32-bit on x64 even though it was “supposed” to be 64-bit: int was 32-bit for so long that it was completely hopeless to update software to the new size even though it was a whole new architecture and target triple!
That is called ossification. When you program in C you are not supposed to care about the sizes. When your program does, your program is broken/non-portable. Yes, this limits the compilers, because they don't want programs to break. But it is really the same as e.g. MS Windows catering to a specific program's bugs. This is not a design mistake of C:
> sometimes you make a mistake so bad that you just don’t get to undo it.
aw1621107
4 days ago
> I don't know why the choice to use the compiler to implement parsing a C header is framed as a bad thing.
Not sure I agree with this interpretation, though maybe I'm focusing on a different part of the article than you are. Where are you getting the negative sense from?
That being said, I don't think it's too hard to imagine why someone might be a bit hesitant to use a C/C++ compiler to parse C/C++ headers - for example, it can be a pretty big dependency to take on, may add friction for devs/users, and integration with your own tool may be awkward and/or an ongoing time sink, especially if you're crossing an FFI boundary or if the API you're using isn't stable (as I believe is the case for LLVM).
> There is a reason it is called intMAX_t! It does not HAVE a definite size, it is the MAXimal size of an integer on that platform.
I think this somewhat misses the point of the bit you quoted. In context, it's basically saying that grabbing "real" C type info for interop is so painful that people will hard-code "reasonable" assumptions instead.
> When you want your program to have a stable ABI, that doesn't change when your platform supports larger integer types, you just don't use intMAX_t!
My impression is that the problem is less intmax_t changing and more that intmax_t can change out of sync. Even if you assume every use of intmax_t in a public API corresponds to an intentional desire for the bit width to evolve over time, you can still run into nasty issues if you can't recompile everything at once (which is a pretty strong constraint on the C/C++ committees if history is any indication).