fuhsnn
7 days ago
The recent #def #enddef proposal[1] would eliminate the need for backslashes to define readable macros, making this pattern much more pleasant. Fingers crossed for its inclusion in C2Y!
[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3531.txt
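For readers who have not run into it, the backslash style being discussed looks roughly like the sketch below. The macro and function names (`DEFINE_MAX`, `max_int`, `max_double`, `max_selfcheck`) are made up for illustration and are not taken from the article or from N3531. The proposed #def syntax is deliberately not shown here.

```c
#include <assert.h>

/* Hypothetical generator macro in today's style: every line of the
   body except the last has to end in a backslash. */
#define DEFINE_MAX(T, NAME)              \
    static T NAME(T a, T b)              \
    {                                    \
        return a > b ? a : b;            \
    }

/* Manual instantiation for two types. */
DEFINE_MAX(int, max_int)
DEFINE_MAX(double, max_double)

/* Small self-check exercising both instances. */
static void max_selfcheck(void)
{
    assert(max_int(2, 3) == 3);
    assert(max_double(1.5, -1.0) == 1.5);
}
```

After preprocessing, each instantiated body sits on one logical line, which is the debugger/profiler friction that comes up later in the thread.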
cb321
7 days ago
While long-defs might be nice, even back in ANSI C 89 you can get rid of the backslash pattern (or the need to cc -E and run the output through GNU indent/whatever) by "flipping the script" and defining whole files "parameterized" by their macro environment, like https://github.com/c-blake/bst or https://github.com/glouw/ctl/
Add a namespacing macro and you have a whole generics system, unlike that in TFA.
So, it might add more value to have the C std add an `#include "file.c" name1=val1 name2=val2` preprocessor syntax where name1, name2 would be on a "stack" and be popped after processing the file. This would let you do types/functions/whatever "generic modules" with manual instantiation which kind of fits with C (manual management of memory, bounds checking, etc.) but preprocessor-assisted "macro scoping" for nested generics. Perhaps an idea to play with in your slimcc fork?
fuhsnn
6 days ago
> `#include "file.c" name1=val1 name2=val2`
That's an interesting idea! I think D or Zig's C header importer had similar syntax, I'm definitely gonna do it.
cb321
6 days ago
Glad you like it! Not sure what separator you would pick between the name(args)=expansion stuff. I could imagine some generic files/modules might have enough or long enough params that people might want to backslash line continue. So, maybe '@' or '`' { depending on if you want many or few pixels ;-) } ?
#include "file.c" _(_x)=myNamePrefix ## _x `\
KEY=charPtr VAL=int `\
....
The idea being that inside any generic module your private / protected names are all spelled _(_add)(..).
By doing that kind of namespacing, you actually can write a generic module which allows client code manual instantiators a lot of control to select "verb noun" instead of "noun verb" kinds of schemes like "add_word" instead of "word_add" and potentially even change snake_case to camelCase with some _capwords.h file that does `#define _get Get` like moves, though of course name collisions can bite. That bst/ thing I linked to does not have full examples of all the optionality. E.g., to support my "stack popping" of macro defs, without that but just with ANSI C 89 you might do something like this instead to get "namespace nesting":
#ifndef CT_ENVIRON_H
#define CT_ENVIRON_H
/* This file establishes a macro environment suitable for instantiation of
any of the Map/Set/Seq/Pool or other sorts of generic collections. */
#ifndef _
/* set up a macro-chain using token pasting *inside* macro argument lists. */
#define _9(x) x /* an identity macro must terminate the chain. */
#define _8(x) _9(x)
#define _7(x) _8(x) /* This whole chain can really be as long as */
#define _6(x) _7(x) /* you want. At some extreme point (maybe */
#define _5(x) _6(x) /* dozens of levels) expanding _(_) will start */
#define _4(x) _5(x) /* to slow-down the Cpp phase. */
#define _3(x) _4(x) /* Also, definition order doesn't matter, but */
#define _2(x) _3(x) /* I like how top->bottom matches left->right */
#define _1(x) _2(x) /* in the prefixing-expansions. */
#define _0(x) _1(x)
#define _(x) _0(x) /* _(_) must start the expansion chain */
#endif
#ifndef CT_LNK
# define CT_LNK static
#endif
#endif /* CT_ENVIRON_H */
and then with a setup like that in place you can do:
#define _8(x) _9(i_ ## x) /* some external client decides "i_" */
_(_foo) /* #include "I" -> i_foo at nesting-level 8 */
#define _6(x) _7(e_ ## x) /* impl of i_ decides "e_" */
_(_foo) /* #include "E" -> i_e_foo at level 6 */
#define _3(x) _4(c_ ## x) /* impl of e_ decides "c_" */
_(_foo) /* #include "C" -> i_e_c_foo at level 3 */
#define _0(x) _1(l_ ## x) /* impl of c_ decides "l_" */
_(_t)
_(_foo) /* #include "L" -> i_e_c_l_foo at level 0 */
#define _0(x) _1(x) /* c impl uses _(l_foo) to define _(bars) */
_(_foo) /* i_e_c_foo at nesting level 3 again */
#define _3(x) _4(x) /* e impl uses _(c_foo) to define _(bars) */
_(_foo) /* i_e_foo at nesting level 6 again */
#define _6(x) _7(x) /* i impl now uses _(e_foo) to define _(bars) */
_(_foo) /* i_foo at nesting level 8 again */
Yes, yes. All pretty hopelessly manual (as is C in so many aspects!). But that smarter macro def semantics across parameterized includes I mentioned above could go a long way towards a quality of life improvement "for client code" with good "library code" file organization. I doubt it will ever be easy enough to displace C++ much, though.
Personally, I started doing this kind of thing in the mid-1990s as soon as I saw people shipping "code in headers" in C++ template libraries and open source taking off. These days I think of it as an example of how much you can achieve with very simple mechanisms and the trade-offs of automating instantiation at all. But people sure seem to like to "just refer" to instances of generic types.
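A compilable, single-file sketch of the `_()` prefix chain above, shortened to three levels, might look like this. Two things here are my assumptions rather than part of the listing. First, each redefinition is preceded by an `#undef`, since redefining a function-like macro with a different body otherwise draws a diagnostic. Second, the sketch passes `foo` rather than `_foo`, because as far as I can tell pasting `i_` onto `_foo` would give `i__foo` with a doubled underscore. The function bodies and `names_selfcheck` are purely illustrative.

```c
#include <assert.h>

/* Shortened chain: _2 terminates, _(x) starts the expansion. */
#define _2(x) x
#define _1(x) _2(x)
#define _0(x) _1(x)
#define _(x)  _0(x)

/* An outer client picks the "i_" prefix at the last level. */
#undef  _2
#define _2(x) i_ ## x
static int _(foo)(void) { return 1; }          /* defines i_foo */

/* A nested module picks "e_" at level 0. */
#undef  _0
#define _0(x) _1(e_ ## x)
static int _(foo)(void) { return 2; }          /* defines i_e_foo */

/* Restore level 0: names resolve in the outer namespace again. */
#undef  _0
#define _0(x) _1(x)
static int _(bar)(void) { return _(foo)() + i_e_foo(); }  /* i_bar */

/* Referring to the pasted names directly shows what was generated. */
static void names_selfcheck(void)
{
    assert(i_foo() == 1);
    assert(i_e_foo() == 2);
    assert(i_bar() == 3);
}
```

Each level only ever pastes onto the name it receives, so the outermost prefix ends up leftmost, matching the top-to-bottom / left-to-right remark in the header comments.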
glouwbug
7 days ago
I've been thinking of maybe doing CTL2 with this. Maybe if #def makes it in.
cb321
7 days ago
I think the #include extension could make vec_vec / vec_list / lst_str type nesting more natural/maybe more general, but maybe just my opinion. :-)
I guess ctags-type tools would need updating for the new possible definition location. Mostly someone needs to decide on a separation syntax for stuff like `name1(..)=expansion1 name2(..)=expansion2` for "in-line" cases. Compiler programs have had `cc -Dname(..)=expansion` or equivalents since the dawn of the language, but they actually get the OS/argv idea of separation from whatever CL args or Windows APIs or etc.
Anyway, it might make sense to first get experience with a slimcc/tinycc/gcc/clang cpp++ extension. ;-) Personally, these days I mostly just use Nim as a better C.
hyperbolablabla
7 days ago
I really don't think the backslashes are that annoying? Seems unnecessary to complicate the spec with stuff like this.
cb321
7 days ago
FWIW, https://www.cs.cornell.edu/andru/ Andrew Myers had some patch to gcc to do this back in the late 90s.
Anyway, as is so often the case, it's about the whole ecosystem not just of tooling but the ecosystem of assumptions about & around tooling.
As I mentioned in my other comment, if you want you can always cc -E and re-format the code somehow, although the main times you want to do that are for line-by-line stepping in debuggers or maybe for other cases of "lines as source coordinates" like line-by-line profilers.
Of course, a more elegant solution might be just having more "adjustable step size/source coordinates" in debuggers, like "single ';'-statement" or maybe "single sequence control point", rather than just "line orientation". This is, in fact, so natural an idea that it seems a virtual certainty some C debugger has an "expressional step/next", especially if written by a fan more of Lisp than assembly. Of course, at some point a library is just debugged/trusted, but if there are "user hooks" those can be buggy. If it's performance important, it may never be unwelcome to have better profile reports.
While addr2line has been a thing forever, I've never heard of an addr2expr - probably because "how would you label it?" So, pros & cons, but easy for debugger/profilers is one reason I think the parameterized file way is lower friction.
kreco
6 days ago
This Facebook repository also uses a new "extension" to do a similar thing:
https://github.com/facebookresearch/CParser#multiline-macros
core-explorer
6 days ago
Debugging information is more precise than line numbers; it usually conveys line and column in a source file.
Some debuggers make use of it when displaying the current program state, but the major debuggers do not allow you to step into a specific sub-call on a line (e.g. skip function arguments and go straight to the outermost function call). This is purely a UI issue; they have enough information. I believe the nnd debugger has implemented selecting the call to step into.
Addr2line could be amended. I am working on my own debugger and I keep re-implementing existing command line tools as part of my testing strategy. A finer-grained addr2line sounds like a good exercise.
cb321
6 days ago
Our exact context here is not just column numbers, but also about backslash line continuations joined by the C preprocessor. That makes the #line directives emitted refer to columns within a (large) "virtual line assembled by the tooling", not an "actual source" coordinate.
So, a column number would not be very meaningful to a programmer (relative to some ';' or '{}' expressional label leveraging internal language syntax/bracketing which would definitely still be a bit to muck about with). As per my Lisp mention, it is really a >1 dimensional idea, and there are various ways to flatten/marshal that parse tree. "next/over" and "step/into" are enough "incrementally/dynamically/interactively" to build up that 2d navigation, but also harder to work with "cumulatively" and with grammars more complex than Lisp's. Maybe most concretely, how "subexpression numbers" (in addr2x or other senses) are enumerated might still be a thing programmers need to "learn" from their "debugger UI".
Another option might be to "reverse preprocess it" or maintain forward-meta-data to go from the "virtual line column number" back to the "true source (line,column)".
I don't mean to discourage you, but just explain more what problem I meant to refer to by "how to label it" and highlight limitations of your new test. { But many are probably limited somehow! :-) }
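The "virtual line column back to true source (line,column)" mapping mentioned above can be sketched as a re-scan of the original text that counts splices. This is only an illustration of the idea, not how any particular compiler or debugger records positions. `struct src_pos` and `logical_to_physical` are hypothetical names.

```c
#include <assert.h>

struct src_pos { int line; int col; };   /* both 1-based; 0,0 = not found */

/* text points at the first character of a logical line that begins on
   physical line first_line. Map a 1-based column within the spliced
   logical line back to the physical line and column it came from. */
static struct src_pos logical_to_physical(const char *text, int first_line,
                                          int logical_col)
{
    struct src_pos pos;
    int seen = 0;                        /* logical characters passed */
    assert(text != 0 && logical_col >= 1);
    pos.line = first_line;
    pos.col = 1;
    while (*text != '\0') {
        if (text[0] == '\\' && text[1] == '\n') {
            text += 2;                   /* splice: next physical line */
            pos.line++;
            pos.col = 1;
            continue;
        }
        if (*text == '\n')
            break;                       /* end of the logical line */
        if (++seen == logical_col)
            return pos;
        text++;
        pos.col++;
    }
    pos.line = 0;                        /* column lies past the line */
    pos.col = 0;
    return pos;
}
```

For a three-line macro starting on line 10, logical column 16 lands on the `d` of `do` at physical (11, 3). Whether that is a good label for a programmer is the separate question raised above.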
kreco
6 days ago
The backslashes themselves make the preprocessor way more complicated for no real advantage (apart from when they are unavoidable, like in macros).
For every single symbol you need to actually check if there is a splice (backslash + newline) in it. For a single-pass compiler, this contributes to a very slow lexing phase, as the splice can appear anywhere in C/C++ code.
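A minimal sketch of what that check looks like when done naively on every character fetch. The helper names `next_logical_char` and `unsplice` are made up for illustration, and production lexers may well structure this differently (for example with a fast path for the common no-backslash case).

```c
#include <assert.h>
#include <stddef.h>

/* Return the next logical character from *p, first skipping any
   backslash-newline pairs in front of it; returns '\0' at end of input. */
static char next_logical_char(const char **p)
{
    while ((*p)[0] == '\\' && (*p)[1] == '\n')
        *p += 2;                 /* a splice: drop both characters */
    if (**p == '\0')
        return '\0';
    return *(*p)++;
}

/* Copy the logical characters of src into dst (capacity cap, always
   NUL-terminated); returns the number of characters written. */
static size_t unsplice(const char *src, char *dst, size_t cap)
{
    size_t n = 0;
    char c;
    assert(src != 0 && dst != 0 && cap > 0);
    while (n + 1 < cap && (c = next_logical_char(&src)) != '\0')
        dst[n++] = c;
    dst[n] = '\0';
    return n;
}
```

So the identifier `int` can legally arrive as `in`, backslash, newline, `t`, which is why the test sits on the per-character path rather than only at token boundaries.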
jcelerier
6 days ago
I don't think this is optimizing for the right thing. I've sat in front of hundreds of gcc & clang compile-time traces, and lexing is a minuscule percentage of the time spent in the compiler.
kreco
6 days ago
My point is that it would make things simpler for the lexer and for the human being.