Lesser known tricks, quirks and features of C

93 points, posted 7 hours ago
by rramadass

26 Comments

fuhsnn

2 hours ago

My recent favorite is glibc's hack to implement _Static_assert under C99: https://codebrowser.dev/glibc/glibc/misc/sys/cdefs.h.html#56...

It uses the constant expression to create a bit-field of width -1 when the assertion fails, and leaves it to the compiler to reject that as the intended error. The actual statement is an extern declaration of a function returning a pointer to an array whose size is the sizeof of the aforementioned bit-field struct.
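A minimal sketch of the same idea, for illustration only. MY_STATIC_ASSERT and the names passed to it are hypothetical and this is not glibc's actual definition; it only assumes that a negative bit-field width makes the compiler reject the declaration.

```c
/* Hypothetical macro, not glibc's real one: the bit-field width is 1 when
   `cond` is true and -1 when it is false. A negative width is a constraint
   violation, so a false condition stops compilation. */
#define MY_STATIC_ASSERT(cond, name) \
    extern int (*my_static_assert_##name(void)) \
        [sizeof(struct { int check : (cond) ? 1 : -1; })]

/* Always true in C, so this declaration compiles quietly. */
MY_STATIC_ASSERT(sizeof(char) == 1, char_is_one_byte);

/* Uncommenting the next line should make compilation fail:
   MY_STATIC_ASSERT(sizeof(char) == 2, char_is_two_bytes); */

/* Returns 1; it exists only so there is something to call at run time. */
int static_assert_demo(void)
{
    return 1;
}
```

Because the declared function is never defined or called, the check should leave nothing behind in the object file.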

Another one I encountered in Toybox is (0 || "foo") being a constant expression that evaluates to 1. Apparently the string literal is guaranteed to exist in the data section, so its address can safely be assumed to be non-null.

wolfspaw

an hour ago

Really liked the trick of defining the struct in the return part of the function.

Array pointers: Array to pointer decay is extremely annoying, if it was implemented as Array to "slice" decay it would be great.

Static array indices in function parameter declarations: awesome, a shame that C++ (and Tiny C) do not support it >/

flexible array member: extremely useful, and now there are good compiler flags for ensuring correct flexible array member usage

X-Macro: nice, no-overhead enum to string name. Didn't know the trick

Combining default, named and positional arguments: Named-arguments/default-arg, C version xD. It would be cool if it was added to C language as a native feature, instead of having to do the struct hiding macro.

Comma operator: really useful, especially in macros

Digraphs, trigraphs and alternative tokens: digraphs and trigraphs are rarely useful, but the alternative spellings from iso646.h are awesome, love using and/or instead of &&/||

Designated initializer: super awesome, could not use if you wanted C++ portability. Now C++ supports some part of it.

Compound literals: fantastic, but in C++ (where they are only a compiler extension) it will explode, because the object is a temporary that is destroyed at the end of the same full expression. C++ should fix this and allow the C idiom >/

Bit fields: nice for more control of structs layout

constant string concat: "MultiLine" String, C version xD

Ad hoc struct declaration in the return type of a function: didn't know this trick, "multi value" return, C version xD

Cosmopolitan-libc: incredible project. Already knew of it, it's awesome to offer a single binary that runs on all OSes.

Evaluate sizeof at compile time by causing duplicate case error: ha, nice trick for debugging the size of anything.
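For readers who have not seen the X-macro trick mentioned in the list above, here is a minimal sketch. The colour list and every name in it are hypothetical, chosen only for illustration.

```c
/* Hypothetical colour list; each entry is expanded in two ways below. */
#define COLOR_LIST(X) \
    X(RED)            \
    X(GREEN)          \
    X(BLUE)

/* First expansion: the enum constants COLOR_RED, COLOR_GREEN, COLOR_BLUE. */
#define AS_ENUM(name) COLOR_##name,
enum color { COLOR_LIST(AS_ENUM) COLOR_COUNT };
#undef AS_ENUM

/* Second expansion: a matching table of names, in the same order. */
#define AS_STRING(name) #name,
static const char *const color_names[] = { COLOR_LIST(AS_STRING) };
#undef AS_STRING

/* Maps an enum value to its name, or "?" if it is out of range. */
const char *color_name(enum color c)
{
    return (unsigned)c < (unsigned)COLOR_COUNT ? color_names[c] : "?";
}
```

Adding a colour means touching only COLOR_LIST; the enum and the name table stay in sync automatically.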

fuhsnn

an hour ago

>Static array indices in function parameter declarations: awesome, a shame that C++ (and Tiny C) do not support it >/

The first (outermost) array dimension of a parameter is always adjusted to a pointer anyway, so supporting it in a compiler without analysis passes, like TCC, is just a matter of skipping the "static" token and the size.
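A small sketch of what that looks like in practice. The function name is hypothetical; the point is that the two declarations below are compatible, because the parameter is adjusted to a pointer either way.

```c
#include <stddef.h>

/* `static 4` is a promise by callers that `values` points to at least
   4 ints. A compiler may use it for warnings or optimisation, or ignore it. */
int sum_first_four(const int values[static 4])
{
    int total = 0;
    for (size_t i = 0; i < 4; ++i)
        total += values[i];
    return total;
}

/* Same function as far as the type system is concerned. */
int sum_first_four(const int *values);
```

Passing a shorter array or a null pointer is undefined behaviour, which is the information an analysing compiler can exploit and a simple one can skip.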

saagarjha

3 hours ago

Mentioning %n without explaining that it is overwhelmingly used for exploits is a little reckless IMO.

_kst_

an hour ago

Background: A %n format specifier in a printf call stores the number of characters written so far into a specified variable. For example:

    #include <stdio.h>
    int main(void) {
        int count;
        printf("%s%n\n", "hello, world", &count);
        printf("count = %d\n", count);
    }

The output is:

    hello, world
    count = 12

%n can be exploited to write data to an arbitrary memory location, but only if the format string is something other than a string literal.

%n can be exploited, but it's entirely possible to use it safely.

greiskul

2 hours ago

I'm curious about this, didn't know about %n before. What are the common pitfalls and exploits using this enables?

mananaysiempre

2 hours ago

You would expect a printf call with a user-controlled format string to be, at worst, an arbitrary read. Thanks to %n, it can be a write as well.

lights0123

2 hours ago

If the user can control the formatting string, they can write to pointers stored on the stack. It's important to use printf("%s", str) instead of printf(str).
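A sketch of the safe form, using snprintf so the result can be inspected. The helper name is hypothetical; the only assumption is that the untrusted text is a valid NUL-terminated string.

```c
#include <stdio.h>

/* Hypothetical helper: copies untrusted text into `out` via a literal
   format string. Conversion specifiers inside `untrusted` are treated as
   plain characters and are never interpreted. */
int format_untrusted(char *out, size_t out_size, const char *untrusted)
{
    return snprintf(out, out_size, "%s", untrusted);
    /* Unsafe alternative, do not use:
       snprintf(out, out_size, untrusted);
       Here "%n" or "%x" in the input would be acted upon. */
}
```

With the input "%n%x%s" the buffer ends up holding exactly those six characters, and nothing is read from or written through stray stack values.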

rep_lodsb

an hour ago

Useless use of printf; what's wrong with "puts(str)"?

shawn_w

an hour ago

puts() adds a newline at the end. gcc will happily turn printf("%s\n", str) into puts(str), though.

I've never tested to see if printf("%s", str) becomes the equivalent fputs(str, stdout)

coreyp_1

6 hours ago

That's a nice list!

I've been digging into cross-platform (Windows and Linux) C for a while, and it has been fascinating. On top of that, I've been writing a JIT-ted scripting (templating) language, and the ABI differences (not just fastcall vs stdcall vs cdecl) are often not easy to find documentation about.

I've decided that if I ever get to teach a university class on C again, I want to cover some of these things that I feel are often left out, and this list is a helpful reference! Thanks!

jonathrg

2 hours ago

Multi-character constants are one of the many things in C that would be nice to use if the language would just choose some well-defined behaviour for them. It doesn't really matter which.

mananaysiempre

2 hours ago

Mainstream compilers agree on multicharacter literals being big endian; that is, 'AB' is usually 'A' << CHAR_BIT | 'B'. The exception is MSVC, which also works like that as long as you don't use character escapes, but if you do it emits some sort of illogical, undocumented mess that looks like an ancient implementation bug fossilized into a compatibility constraint.
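A quick way to check the layout on a given compiler, as a sketch. The function name is hypothetical, and since the value of a multi-character constant is implementation-defined, the result is only expected to be 1 on compilers that behave as described above (GCC and Clang also warn about the constant by default).

```c
#include <limits.h>

/* Returns 1 if this compiler stores 'AB' with 'A' in the higher-order
   byte, 0 otherwise. Either answer is permitted by the standard. */
int multichar_is_big_endian(void)
{
    return 'AB' == (('A' << CHAR_BIT) | 'B');
}
```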

ranger_danger

3 hours ago

> quirks and features

Someone is a fan of Doug DeMuro.

golergka

2 hours ago

    switch (n % 2) {
        case 0:
            do {
                ++i;
        case 1:
                ++i;
            } while (--n > 0);

    }

Someone really ought to record a "WAT" video about C.

mananaysiempre

2 hours ago

The switch statement in C is not a very limited pattern match. The switch statement in C is a very ergonomic jump table. Do not think ML’s case-of with only integer literals for patterns; think FORTRAN’s computed GO TO with better syntax. And it will cease to be a WAT. (For a glimpse of the culture before pattern matching was in programmers’ collective consciousness, try the series on designing a CASE statement for Forth that ran for several issues of Forth Dimensions.)

russellbeattie

2 hours ago

I don't think there's any confusion about how it works; it's the deep horror of discovering that it's possible in the first place, and a morbid curiosity about the chaos it could cause if abused.

mananaysiempre

an hour ago

At least for me, the feelings you describe are characteristic of a footgun, not a WAT. A WAT is rather a desperate bewilderment as to who could ever design the thing that way and why, and for switch statements computed gotos are the answer to that question.

As for the footgun side, I mean, it could be one in theory, sure. But I don’t think I’ve ever seen it actually fired. And I can’t really appreciate the Javaesque “abuse” thinking—it is to some extent the job of the language designer to prevent the programmer from accidentally doing something bad, but I don’t see how it is their job to prevent a programmer from deliberately doing strange things, as long as the result looks appropriately strange as well.

(There are reasons to dislike C’s switch statement, I just don’t think the potential for “abuse” is one.)

tom_

an hour ago

This sort of thing is pretty handy sometimes. Don't forget you can have code (e.g., start of the loop) before any of the cases too!

PhilipRoman

2 hours ago

Just think of the "case" statements like any other label, despite the misleading indentation. Then it becomes perfectly natural to jump in the middle of a loop.
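To make that concrete, here is the snippet from earlier in the thread wrapped in a function so it can be run. The function name is hypothetical, and n is assumed to be non-negative.

```c
/* Counts how many times ++i runs. `case 1:` is an ordinary label that
   happens to sit inside the loop body, so an odd n enters the loop at its
   midpoint and skips the first increment on the first pass only. */
int count_increments(int n)
{
    int i = 0;
    switch (n % 2) {
    case 0:
        do {
            ++i;
    case 1:
            ++i;
        } while (--n > 0);
    }
    return i;
}
```

For n = 4 the loop body runs four full times and the result is 8; for n = 3 the first pass starts at case 1, so the result is 5.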

agumonkey

an hour ago

I wonder if there's any other instance (in programming or elsewhere) of intersecting grammar constructs being accepted.

o11c

an hour ago

Bah, those are all well-known.

What value does the following program return?

    int main()
    {
        int *p = 0;

    loop:
        if (p)
            return *p;

        int v = 1;
        p = &v;
        v = 2;
        goto loop;
        return 3;
    }

Also, rather than doing `sizeof` via one error at a time, it's better to just emit them to a char array {'0' + sz/10, '0' + sz%10, '\0'}. Generalizing this to signed numbers of arbitrary size is left as an exercise for the reader.
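A sketch of the char-array idea for a two-digit size. The struct, the macro and the "SZ=" prefix are all hypothetical; the prefix is an added assumption so the text is easy to find in the generated assembly or object file.

```c
/* Hypothetical type whose size we want to read without running anything. */
struct sample { int id; char tag[12]; };

#define SZ sizeof(struct sample)

/* The digits are computed at compile time, so the finished text can be
   read from the compiler output. Only the last two decimal digits are kept. */
const char sample_size_text[] = {
    'S', 'Z', '=',
    '0' + SZ / 10 % 10,
    '0' + SZ % 10,
    '\0'
};
```

Several such arrays can sit in one translation unit, so all the sizes come out of a single compile instead of one error per attempt.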

_kst_

17 minutes ago

It returns 2.

The only reason that might be surprising is that the "return *p;" statement refers to the value of an object at a point (textually) before its definition. But the lifetime of the object named "v" begins on entry to the innermost compound statement enclosing its definition -- in this case the body of "main".

Space for "v" is allocated on entry to "main". It's initialized to 1 when its definition is reached. The "return *p;" statement appears before the definition of "v" in the program source, but is executed after its definition was reached at run time.

It's important to remember that scope and lifetime are two different things. The scope of an identifier is the region of program text in which the identifier is visible; for "v" it extends from the definition to the closing "}". The lifetime of an object is the time span during execution in which it exists; for "v" it extends from the time when execution reaches the opening "{" to the time when execution reaches the closing "}". Formally, storage for "v" is allocated at the beginning of its lifetime and deallocated at the end of its lifetime. Compilers can and do optimize allocation and deallocation, as long as the visible behavior is consistent.

Aside: If "v" were a VLA (variable length array, introduced in C99, made optional in C11) its lifetime would begin when execution reaches its definition.
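The puzzle program, restated as a callable function so the answer can be checked. The function name is hypothetical; the body follows the program above, minus the unreachable final return.

```c
/* Returns 2. On the second pass through `loop`, p points at v, which is
   still alive (its lifetime is the whole function body) and still holds
   the last value stored in it, because its definition is not reached again. */
int jump_back_before_definition(void)
{
    int *p = 0;

loop:
    if (p)
        return *p;   /* v is out of scope here, but within its lifetime */

    int v = 1;       /* scope of the name v starts here */
    p = &v;
    v = 2;
    goto loop;
}
```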