PaulHoule
3 days ago
"Scripts" in Python, Java and other conventional programming languages (e.g. whatever it is you already use)
Not Bash, not Excel, not any special-purpose tool, because the motto of those is "you can't get there from here". Maybe you can get 80% of the way there, which is really seductive, but that last 20% is like going to the moon. Specifically, real programming languages have the tools to format dates correctly with a few lines of code you can wrap into a function; fake programming languages don't. Mapping codes is straightforward, etc.
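To make the "few lines of code you can wrap into a function" point concrete, here is a minimal Python sketch. The list of input formats and the status code table are made up for illustration; a real feed would dictate its own.

```python
from datetime import datetime

# Hypothetical list of formats seen in the source data; adjust per feed.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")

# Hypothetical code table; the real one comes from the target system.
STATUS_CODES = {"A": "active", "I": "inactive", "P": "pending"}


def normalize_date(text):
    """Return the date as ISO 8601 (YYYY-MM-DD), or raise ValueError."""
    text = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognized date: {text!r}")


def map_status(code):
    """Map a source status code to the target value, failing loudly on unknowns."""
    key = code.strip().upper()
    if key not in STATUS_CODES:
        raise ValueError(f"unmapped status code: {code!r}")
    return STATUS_CODES[key]
```

Both functions raise on input they don't recognize instead of guessing, so bad rows get caught before they reach the target system.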
dataflowmapper
3 days ago
Yeah, programming definitely offers the most flexibility if you have that skillset. I'm particularly interested in your 'last 20% is like going to the moon' analogy for special-purpose tools or even Excel/Bash. Do you have any examples off the top of your head of the kinds of transformation or validation challenges that you find fall into that really difficult 20%, where only a 'real programming language' can effectively get the job done?
PaulHoule
3 days ago
For one thing, a lot of tools like Excel do unwanted data transformations: when importing from a CSV, they try to guess whether text is meant to be a string or a number, and sometimes guess wrong. You can spend a lot of time disabling this behavior, preventing it, or fixing it up afterwards, but frequently bad data just plain corrupts the analysis or target system.
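As a sketch of the alternative: Python's standard `csv` module hands every field back as a string (with its default settings), and nothing is converted unless you ask for it. The column names and sample rows below are invented for the example.

```python
import csv
import io

# Invented sample data: ZIP codes with leading zeros and a SKU that looks
# like scientific notation, both classic victims of type guessing.
SAMPLE = "zip,sku,qty\n02134,1E5,3\n00501,0012,10\n"


def read_rows(handle):
    """Read CSV rows as dicts of strings; no type inference happens here."""
    return list(csv.DictReader(handle))


def parse_qty(row):
    """Convert only the column we know to be numeric, and do it explicitly."""
    return int(row["qty"])


rows = read_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row["zip"], row["sku"], parse_qty(row))
```

The ZIP code stays "02134" and the SKU stays "1E5" because nothing ever tried to turn them into numbers.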
You can't really trust the time handling functions on any platform, which is part of the reason why languages like Python and Java might have two or three libraries for working with dates and times in the standard library: people realized the old one was unfixable. Plenty of JavaScript date handling libraries have told people "look, this is obsolete, it's time to move on", not just because that's the JavaScript way, but because the libraries really were error-prone.
In a real programming language it's straightforward to fix those problems; in a fake programming language it is difficult or impossible.
Many tools also fall down if you've got a strange index structure in the source or destination data. For instance, if you want to convert nested set trees
https://blog.uniauth.com/nested-set-model
to something more normal like an adjacency list (or vice versa), a lot of simple tools won't cope.
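For illustration, here is a minimal sketch of the nested set to adjacency list direction in Python. It assumes each node arrives as an `(id, lft, rgt)` tuple with consistent left/right bounds; the sample tree is made up.

```python
def nested_set_to_adjacency(nodes):
    """Convert (node_id, lft, rgt) tuples into a {node_id: parent_id} mapping.

    Root nodes map to None. Assumes the lft/rgt values form a valid nested set.
    """
    parents = {}
    open_ancestors = []  # stack of (node_id, rgt) for intervals still open
    for node_id, lft, rgt in sorted(nodes, key=lambda n: n[1]):
        # Close every interval that ends before this node starts.
        while open_ancestors and open_ancestors[-1][1] < lft:
            open_ancestors.pop()
        # Whatever is still open on top of the stack encloses this node.
        parents[node_id] = open_ancestors[-1][0] if open_ancestors else None
        open_ancestors.append((node_id, rgt))
    return parents


# Made-up sample tree: clothing > (mens > suits > (slacks, jackets)), womens
SAMPLE_TREE = [
    ("clothing", 1, 12),
    ("mens", 2, 9),
    ("suits", 3, 8),
    ("slacks", 4, 5),
    ("jackets", 6, 7),
    ("womens", 10, 11),
]

adjacency = nested_set_to_adjacency(SAMPLE_TREE)
```

Sorting by the left bound visits nodes in depth-first order, so a single stack of open intervals is enough to find each parent.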
dataflowmapper
3 days ago
Gotcha, totally agree on those points. I think everyone's dealt with the Excel typing crap. My team uses Workato for some integrations, and we use scripts any time we need math because of questionable precision, so I see your take on the unreliable functions part.
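A sketch of the sort of thing a script makes easy here, using Python's `decimal` module for exact decimal arithmetic. The function name, the tax calculation, and the rounding mode are illustrative assumptions, not a description of any particular integration.

```python
from decimal import Decimal, ROUND_HALF_UP


def line_total(unit_price, quantity, tax_rate):
    """Compute price * quantity * (1 + tax), rounded to cents.

    unit_price and tax_rate are passed as strings so they never go through
    binary floating point. ROUND_HALF_UP is an assumption; use whatever
    rounding rule the target system requires.
    """
    price = Decimal(unit_price)
    rate = Decimal(tax_rate)
    total = price * quantity * (1 + rate)
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(0.1 + 0.2 == 0.3)  # binary floats: False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # decimals: True
print(line_total("19.99", 3, "0.0825"))
```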
chaos_emergent
3 days ago
For the longest time I envisioned some sort of configuration specification that could retrieve URLs, transform and map data, handle complex conditional flows... and then I realized that I wanted a Normal Programming Language for Commerce and started asking o3 to write me Python scripts.
aaronbrethorst
3 days ago
Hell, for me, would be what you described, implemented in YAML.
dlachausse
3 days ago
It’s not sexy, but Perl is the purpose-built tool for this job.
PaulHoule
3 days ago
Got a lot of bustedness though. I was writing cgi-bin scripts in Perl and I remember the urldecode in Perl being fundamentally busted: it would rewrite '%20' -> SP but not '+' -> SP, so I had to write my own urldecode.
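For comparison, a sketch of the same distinction in Python's standard library: `urllib.parse.unquote` leaves '+' alone, while `unquote_plus` also turns '+' into a space, which is what form-encoded data needs. The hand-rolled `urldecode` below is only there to show how small the fix is.

```python
from urllib.parse import unquote, unquote_plus


def urldecode(raw):
    """Decode a form-encoded value: '+' and '%20' both become a space.

    The '+' replacement has to happen before percent-decoding, otherwise a
    literal plus sent as '%2B' would be turned into a space as well.
    """
    return unquote(raw.replace("+", " "))


raw = "a+b%20c%2Bd"
print(unquote(raw))       # '+' is left untouched
print(unquote_plus(raw))  # '+' becomes a space
print(urldecode(raw))     # same result as unquote_plus
```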
My impression is that people use Python to do most of the things that we used to do with Perl.
Circa 2001 I wrote a cookie-based authentication system for the web which had an "authentication module" of roughly 100 lines that I wound up rewriting in at least ten different languages such as PHP, Java, ColdFusion, Tango, etc. The OO Perl version was the shortest and my favorite.