Translate Fortran to C++ with AI and RAG

43 points, posted 10 months ago
by LosAlamosNerd

27 Comments

jedimastert

10 months ago

I'm trying to think of a reason this couldn't be done more directly with a pretty run-of-the-mill transpiler. Like I understand if this is a technical demo and there is a LOT of Fortran code, but...?

I actually had to do this with a couple of different Fortran projects when I was in college; I translated them to C for various reasons.

Maybe it's because it was specifically code written by scientists (i.e., somewhat brute force and very straightforward), but there really weren't very many features that I can recall that didn't have a direct C counterpart, other than column-major ordering and arrays starting at 1.
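To make those two differences concrete, here is a minimal C++ sketch of a wrapper that keeps Fortran's conventions (1-based indices, column-major storage) on top of a flat buffer. The class name `FortranMatrix` and the helper `make_example` are hypothetical, chosen only for illustration; a real transpiler may lay things out differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: a 2-D array view that keeps Fortran's conventions
// (1-based indices, column-major storage) on top of a flat C++ buffer.
class FortranMatrix {
public:
    FortranMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    // Fortran's a(i, j) maps to offset (i-1) + (j-1)*rows in memory.
    double& operator()(std::size_t i, std::size_t j) {
        assert(i >= 1 && i <= rows_ && j >= 1 && j <= cols_);
        return data_[(i - 1) + (j - 1) * rows_];
    }

    const double* data() const { return data_.data(); }
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};

// Fill a(i, j) = 10*i + j with the first index varying fastest,
// mirroring a Fortran loop nest with j outermost and i innermost.
FortranMatrix make_example() {
    FortranMatrix a(2, 3);
    for (std::size_t j = 1; j <= a.cols(); ++j)
        for (std::size_t i = 1; i <= a.rows(); ++i)
            a(i, j) = 10.0 * static_cast<double>(i) + static_cast<double>(j);
    return a;
}
```

With this layout, a(2, 1) sits directly after a(1, 1) in memory, which is the opposite of a plain C two-dimensional array, where a[0][1] would follow a[0][0].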

Was I just blissfully unaware?

AndrewGaspar

10 months ago

What you want isn't really "output C++ code that is pedantically equivalent to this Fortran code but with the array indexing fixed up", it's usually more like "translate this Fortran module into something that I can offload to a GPU using CUDA/ROCm/etc. with the same high-level semantics, but GPU-friendly low-level optimizations", and the exact composition of those low-level bits probably doesn't look exactly like a loop-by-loop translation.

IshKebab

10 months ago

Yeah I've used FORTRAN to C transpilers before too and they worked fine. There were some downsides though like it has to add and subtract 1 everywhere to deal with FORTRAN's 1-based indexing.

In theory AI could do a more idiomatic translation, but I think I would still prefer the janky but correct translation over the looks nice but probably subtly buggy AI one.

SanjayMehta

10 months ago

I don’t know what the transpiled code would look like vs. that rendered by an LLM, but maybe the hope is that the latter will be more readable?

nevi-me

10 months ago

Microsoft demoed a version of their GraphRAG that translated C code to (I believe) mostly idiomatic Rust, and it ran without errors.

I tried to find reference to how they did it, does anyone know?

It sounds like this approach of translating old code could help speed up teams that are looking at rewrites. I also have some old code that's in Kotlin that I'd like to move to something else. I had a bad NullPointerException take down my application after missing a breaking change in a Kotlin update.

reve893

10 months ago

In my experience such methods work really well for small/simple code snippets. Attempting large code snippets is not functional.

vrighter

10 months ago

I do not believe it ran without errors in all cases.

musicale

10 months ago

Good idea. I'd much rather write

   do concurrent (i = 1:n) 
     y(i) = y(i) + a*x(i)
   enddo
and then let a compiler translate it into

    std::transform(par, x, x+n, y, y,
      [=](float x, float y){ return y + a*x; }
    );
if C++ is required for some reason.
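For reference, a self-contained sketch of the C++ side of that comparison. The function name `saxpy` and the use of `std::vector` are assumptions made for illustration. It uses the sequential overload of `std::transform` so that it builds without extra libraries; the parallel form above would pass an execution policy as the first argument.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// y(i) = y(i) + a*x(i) for every i, written as one std::transform call.
// Assumption: x and y have the same length; the name saxpy is illustrative.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    // Sequential overload. With C++17's <execution> header, passing
    // std::execution::par as the first argument requests the parallel
    // overload, if the toolchain's standard library supports it.
    std::transform(x.begin(), x.end(), y.begin(), y.begin(),
                   [a](float xi, float yi) { return yi + a * xi; });
}
```

Calling `saxpy(2.0f, x, y)` with x = {1, 2, 3} and y = {10, 20, 30} leaves y as {12, 24, 36}.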

WalterBright

10 months ago

A member of our community accidentally discovered that the D compiler could translate C code to D code.

D has a module system, where you import a module and it pulls the global declarations out of it. To speed up this process, there's a compiler switch to output just the global declarations to another file, a .di file, that functions much like a .h file in C.

Then there came along ImportC, where a C lexer/parser was welded onto the D compiler logic.

aaaand it wasn't long before the switch was thrown to generate a .di file, and voila, the C code got translated to D!

This resulted in making it dramatically easier to use existing C headers and source files with D.

jll29

10 months ago

As a slight tangent, a rewrite in another language is also an opportunity for the human engineer to redesign parts of the software that were clunky before, or to reshape them so that the new target language's idioms can be used.

Using automatic tools - whether AI-based or transpilers - leaves that opportunity unused, and both approaches are likely to create some additional technical debt (errors in translation; odd, non-idiomatic ways of doing things introduced by the automation; etc.).

Surac

10 months ago

There is no place for AI or C++ in this game. Just use a Fortran-to-C transpiler. But I get it: anything AI sounds modern, and C++ because of reasons.

almostgotcaught

10 months ago

What is the point of this? Fortran is both faster than cpp and easier to write than cpp. It's also by no means a dead or dying or whatever language. Smells like literally "your scientists were so busy they forgot to ask why".

jandrewrogers

10 months ago

Seems pretty obvious to me, and I’ve written my fair share of both Fortran and C++. I think it is mostly that very few people know Fortran anymore and even fewer people want to maintain it. A vast number of people in 2025 will happily work in C++ and are skilled at it.

Fortran also hasn’t been faster than C++ for a very long time. This was demonstrable even back when I worked in HPC, and Fortran can be quite a bit worse for some useful types of performance engineering. The only reason we continued to use it was that a lot of legacy numerics code was written in it. Almost all new code was written in C++ because it was easier to maintain. I actually worked in Fortran before I worked in HPC; it was already dying in HPC by the time I got there. Nothing has changed in the interim. If anything, C++ is a much stronger language today than it was back then.

jabl

10 months ago

Some people at LANL seem to be on a holy crusade to replace Fortran with C++. They occasionally produce stuff like papers saying Fortran is dying and whatever. Perhaps it makes sense for their in-house applications and libraries, but one shouldn't read too much into it outside their own bubble.

jeffbee

10 months ago

I wonder if they feel that the toolchains are just rotting.

mkoubaa

10 months ago

This. If someone can't correctly articulate the advantages of Fortran they shouldn't be migrating away from it. This is not to say that migrations should never happen.

pankajdoharey

10 months ago

LLMs as translators of COBOL code to Java or Go should be attempted, to shut down the IBM mainframe rent-seeking business for good.

jabl

10 months ago

The soon-to-be-released GCC 15 will contain a COBOL frontend. Other non-mainframe compilers have also existed for a long time, both proprietary and FOSS.

Thus, availability of a compiler is but a small piece of the puzzle. The real problem is the spider web of dependencies on the mainframe environment, as the enterprise's business processes have been intertwined with the mainframe system over decades.

creatonez

10 months ago

No, not for the foreseeable future. In fact, this is the absolute hardest possible code translation task you can give an LLM.

COBOL varies greatly; the dialect depends on the mainframe. Chatbots will get quite confused about this. AI training data doesn't have much true COBOL; the internet is polluted with GnuCOBOL, which is a mishmash of a bunch of different dialects, minus all the things that make a mainframe a mainframe. So it will assume the COBOL code is more modern than it is. In terms of generating COBOL (e.g. for adding some debugging code to an existing system to analyze its behavior), it won't be able to stay within the 80-column limit due to tokenization, and the output will just be riddled with syntax errors.
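A column limit like that is at least cheap to check mechanically before generated code goes anywhere near a compiler. A minimal sketch, assuming a single-byte encoding so that one character is one column; the function name `lines_over_limit` and the default of 80 are illustrative, and a given dialect may enforce a tighter limit on the code area itself.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Returns the 1-based numbers of the lines that are longer than max_columns.
// Assumption: single-byte encoding, so string length equals column count.
std::vector<std::size_t> lines_over_limit(const std::string& source,
                                          std::size_t max_columns = 80) {
    std::vector<std::size_t> offenders;
    std::istringstream in(source);
    std::string line;
    std::size_t line_no = 0;
    while (std::getline(in, line)) {
        ++line_no;
        // Ignore a trailing carriage return from CRLF line endings.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.size() > max_columns) offenders.push_back(line_no);
    }
    return offenders;
}
```

This only catches overlong lines; it says nothing about whether the dialect or the program logic is right.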

Data matters, and mainframes have a rather specific way they store and retrieve data. Just operating the mainframe to get the data out of an old system and into a new database in a workable & well-architected format will be its own chore.

Finally, the reason these systems haven't been ported is that the requirements for how the system needs to work are tight. The COBOL holdouts are exclusively financial, government, and healthcare -- no one else is stuck on old mainframes for any other reason. The new system to replace it needs to exactly match the behavior of the old system. The developer has to know how to figure out the exact confines of the laws and regulations, or they are not qualified to do the task of porting it. All an LLM will do is hallucinate a new set of requirements and ignore the old ones. And aside from just knowing the requirements on paper, you'd need to spend a good chunk of time just checking what the existing system is even doing, because there will be plenty of surprises in such an old system.

pjmlp

10 months ago

There have been COBOL compilers that target the JVM and .NET for as long as these technologies have existed.

There are also modern compilers for IBM mainframes, including Go, C++, Java, PHP, ...

Also, outside the DevOps and CNCF application space very few people bother with Go, especially not the kind of customers that buy IBM mainframes.

KaiserPro

10 months ago

COBOL is only part of the reason for running on a mainframe. The other part is the orchestration and "resilience" of the mainframe platform.

You can run COBOL on x86; there are at least two compilers.

hulitu

10 months ago

> Translate Fortran to C++ with AI and RAG

f2c? But yeah, 1 level of abstraction sucks. We need around 10 to be satisfied.

cpgxiii

10 months ago

f2c produces pretty sketchy C code. It's very easy for reasonable thread-safe Fortran code to go through f2c and end up as C code with globals and other thread-unsafe constructs. You have to be prepared to completely rewrite the generated C code to make it usable, possibly more unpleasantly than just doing the port by hand.
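A minimal sketch of the kind of hazard described here, written as C++ rather than actual f2c output. The assumption, not something verified against a particular f2c run, is that the translated routine ends up with static locals; both function names are hypothetical.

```cpp
#include <cstddef>

// Shape of what a mechanical translation can produce: the locals are
// static, so every caller (and every thread) shares one copy of them.
double dot_static_locals(const double* x, const double* y, std::size_t n) {
    static double sum;       // shared state: a data race if called concurrently
    static std::size_t i;    // the loop counter is shared as well
    sum = 0.0;
    for (i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}

// Hand-ported equivalent: automatic locals, so each call has its own state
// and the function can safely be called from several threads at once.
double dot_reentrant(const double* x, const double* y, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```

Single-threaded, the two return the same value, which is why the problem is easy to miss until the code is called from multiple threads.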
