
What about K?

206 points, posted 5 months ago

(xpqz.github.io)

142 Comments

airstrike

5 months ago

Side note, but some people have such an evident talent for writing that it makes reading about _any_ topic a worthwhile experience. This author, Stefan Kruger, seems to be one of them.

I almost wish this link was to a blog rather than to a book about K, for which I only have a perennial curiosity.

Here's to hoping they consider writing said blog. I notice they have one but it only has 3 posts, all of which are about past Advent of Code puzzles.

skruger

5 months ago

That’s nice of you to say so.

bear8642

5 months ago

The about section has links to other things he's written/presented.

jyscao

5 months ago

>for which I only have a perennial curiosity.

Guessing you meant to say "peripheral curiosity" here? Perennial would mean you have a long-lasting and/or continued interest/curiosity.

airstrike

5 months ago

I was wondering if anyone would find that unusual! It's long lasting but never rises to anything more than curiosity, so I wanted to juxtapose the two to convey the mixed feelings I have for it.

pie_flavor

5 months ago

> Readability is a property of the reader, not the language.

Uiua[0]'s stack model is much more annoying to work with, but I really appreciate its embrace of unicode glyphs. Every other derivative of APL throws those out at the first opportunity, but when you have a lot of glyphs, you stop being so tempted to make different arities cause the same glyph to mean wildly different things, when the arity is not actually written down explicitly and depends on whether the next thing to the left is a parameter or another function. Once you can See The Matrix, this is the chief thing that still does make K and friends objectively unreadable in a way they don't have to be.

[0]: https://uiua.org

xg15

5 months ago

I appreciate the idea (and Uiua's examples indeed look beautiful, almost like visual programming) but I'd at least like some obvious way to pronounce the code.

RodgerTheGreat

5 months ago

All of the symbols in Uiua have short english words as alternate names, and the online editor allows you to type them by alias.

K has "traditional names" for all the primitive operators which appear in reference cards and which are typically used when discussing code aloud with other K programmers. Q and Lil, which are both K descendants, outright replace some symbols with those named keywords. Named keywords can make the primitives superficially easier to remember, at the cost of making idiomatic patterns in the language less visually apparent.

xg15

5 months ago

Ah, that makes a lot more sense. Thanks!

gweinberg

5 months ago

Original quote is just mind-bogglingly stupid. If I can't read any programming language, that's a defect on my part. If I can read pretty much any language except k and brainfuck, that's a sign k is particularly difficult to read.

skruger

5 months ago

How’s your Korean?

rrgok

5 months ago

Korean is not a programming language

Larrikin

5 months ago

It's also a terrible comparison since Korea's writing system was created specifically to be easy to learn, write, and read.

BoiledCabbage

5 months ago

> Original quote is just mind-bogglingly stupid.

To be frank, your quote is mind-bogglingly stupid. How easy do you think Java is to read to a native Greek speaker with no English language knowledge? Would the Java standard library be as easy for you to read if it were written in Greek? Would java still be an easy to read language if all the keywords and library were in Greek?

If your only definition of a good programming language is one written in your native language and a PL becomes bad if written in a different language, then your criteria is terrible/useless. And right now that's your criteria.

russellbeattie

5 months ago

> "there is no single definitive k, but instead a sequence of slightly incompatible versions. If you decide to stick with k, you’ll see mentions of k4, k5 etc."

I don't know about the qualities of k itself, but I think the idea of having a common practice for experimental programming languages to be grouped under a single name like "E" with a number is quite attractive.

There are lots of students, hobbyists, researchers, professional devs and companies who are developing their own working programming language. There are a million of them, all with their own names. 99.9% of them are ignored, or criticized unfairly by others expecting fully fleshed out features.

I can imagine a GitHub repo where you can register a new language "En" (with n being a number) rather than it living in obscurity on a random website. Then others can jump in and experiment with the language and give it feedback, fork it, etc.

This isn't just for toy languages, but for big organizations like Google. Instead of naming a not-fully-baked C++ successor as "Carbon" and getting flak for it not being ready for real world code yet, they could simply call it "E321" and the status of the language would be self-explanatory.

Then if one of the E languages gains enough traction, it could "graduate" to its own named language.

I also like the cred that an "official" E language could get when a dev talks about it to others. Everyone would immediately know it was experimental and where to see the code.

pjmlp

5 months ago

What seems to be a pity about most array languages is that in theory, they would be ideal DSL languages for SIMD and MIMD code exploration, but as far as I understand from ArrayCast guests, most are still interpreters at heart focusing on plain CPU execution.

dzaima

5 months ago

The big problem with using array languages for lower-level SIMD stuff is that that generally requires some amount of typedness, but tacking on types on an array language without ending up with having types be the majority of the syntax and code (or taking up a ton of mental capacity if utilizing very heavy type inference) would be rather non-trivial. And the operations you want for lower-level ops are quite different from the higher-level general-purpose ones too. (and, of course, some interpreters do make good use of SIMD and/or multithreading)

That said, some form of array language more suited for stuff like that is a somewhat common question; maybe one day someone will figure it out.

Vanessa McHale is doing some interesting work on a typed compilable array language, Apple[0].

[0]: https://github.com/vmchale/apple/?tab=readme-ov-file#apple-a...

pjmlp

5 months ago

Thanks for the hint.

While we have CUDA being polyglot, it is still pretty much C and C++, or shader languages, hence why I keep thinking why not an array language that is also a kind of shader language DSL.

dzaima

5 months ago

Oh, another interesting thing is http://beyondloom.com/tools/specialk.html - a simple k DSL for shaders. Though it still follows the shader language paradigm of running the function for each pixel separately, rather than the array language paradigm of taking & returning a list of things to process.

oxavier

5 months ago

Not exactly the approach you're describing, but Futhark[0] offers an alternative to CPU execution: it compiles to CUDA or OpenCL to run on GPU (or multi-threaded CPU).

[0] https://futhark-lang.org/

pjmlp

5 months ago

That one I am aware of; as a language geek, I am always curious about GPU alternatives that aren't the same old story where only C and C++ get to party, which is actually an advantage of CUDA's polyglot approach.

Thanks for the heads up nonetheless.

Pompidou

5 months ago

Maybe codfns for APL will solve this? That's what I understood, but maybe I'm wrong.

fc417fc802

5 months ago

I am wondering about this as well. What is the gap between what you're envisioning and codfns?

kvdveer

5 months ago

The linked document only contains a warning about how versioning is weird, and a description of the syntax. No examples beyond trivial one-liners.

What problem is K trying to solve? What does a K program look like?

FjordWarden

5 months ago

I've only played around with k and APL in my spare time so I can't speak to real world problems. It is a ridiculously powerful query language: where in SQL you have only started writing `SELECT ...`, in k you are already done. But you need to have very good tacit knowledge of algorithms and the weird syntax to be productive, like oh I need to calculate an integral-image of this time-series, but that's just a pre-scan over addition, boom and you are done. The theory of array programming with a focus on combinators is also an interesting perspective on functional programming. IMHO not something you should write full programs in, but that hasn't stopped some from trying.
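
To make the "integral image is just a scan over addition" remark concrete, here is a minimal Python sketch of the same idea on a 1-D series. The function names and sample data are made up for illustration; in k the scan itself would be spelled `+\`, and nothing here is taken from a k implementation.

```python
from itertools import accumulate


def prefix_sums(series):
    """Running totals of a 1-D series (an inclusive scan over addition)."""
    return list(accumulate(series))


def range_sum(sums, lo, hi):
    """Sum of series[lo:hi] in O(1), using the precomputed running totals."""
    upper = sums[hi - 1] if hi > 0 else 0
    lower = sums[lo - 1] if lo > 0 else 0
    return upper - lower


ticks = [3, 1, 4, 1, 5, 9]       # hypothetical time-series values
sums = prefix_sums(ticks)
print(sums)                      # [3, 4, 8, 9, 14, 23]
print(range_sum(sums, 2, 5))     # 4 + 1 + 5 = 10
```

Once the running totals exist, any window sum is a single subtraction, which is the property an integral image exploits.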

bee_rider

5 months ago

This was a helpful comment. After the article, the question that popped into my head was… so ok should I try and compare this to like BLAS or something like Jax?

But, this sort of language is more about writing and reading from the disk efficiently, right? I guess SIMD type optimizations would be less of a thing.

FjordWarden

5 months ago

I think that array languages have historically used memory mapped files for IO, and treat them like a big data frame, but other versions also support streaming IO. It's up to the implementers of the runtime to use SIMD instructions if they deem this optimal, but it's not something you would use yourself.

rak1507

5 months ago

Personally I think the best comparison would be Python+Pandas/polars+... or R+tidyverse+..., the key thing being there's less need for the "..." in a language with good table manipulation etc built in.

Pet_Ant

5 months ago

I feel like measuring things in characters is not meaningful, but only in tokens. Replacing "SELECT" with "SEL" would not improve SQL in the slightest.

Thorrez

5 months ago

A one-liner in k tends to be equivalent to a much larger program in another language.

Here's a program in k. I'm not sure exactly what it does. I think it might be a json encoder/decoder:

https://github.com/KxSystems/kdb/blob/master/e/json.k

saghm

5 months ago

It says a lot that the name of the file is more informative about what the code does than the entirety of the file itself. "Readability is a property of the reader" indeed, but also the writer...

bregma

5 months ago

Dialup modems on a bad connection used to generate more readable code.

cubefox

5 months ago

It appears you accidentally linked to a log where someone fell on his keyboard.

andai

5 months ago

I think Whitney's greatest achievement isn't even any of his languages—though they are very impressive—but that he convinced banks to pay him millions of dollars to write IOCCC style code!

poulpy123

5 months ago

The problem solved by K is the long-term employment of people writing K. You can't be fired if you're the only one understanding more or less the codebase

dboreham

5 months ago

This is true about more software development than you realize.

bear8642

5 months ago

K is a fast vector language, used (primarily) for time series data analysis.

>What does a K program look like?

You might want to check out https://news.ycombinator.com/item?id=40335921

beagle3 and geocar both have various comments you might want to search for.

mananaysiempre

5 months ago

> a fast vector language

With an Oracle-style DeWitt clause[1] prohibiting public benchmarks.

[1] https://mlochbaum.github.io/BQN/implementation/kclaims.html

rustc

5 months ago

Shakti (the latest K implementation by the author of K) claims [1] to load a 50gb csv in 1.6 seconds which according to them takes 265 seconds with Polars. Has anyone independently verified these claims? Is Polars really leaving 2 orders of magnitude performance on the table?

[1]: https://shakti.com/ -> Compare -> h2o.k

orlp

5 months ago

Disclaimer: I work for Polars inc.

As a sanity check I just cloned https://github.com/h2oai/db-benchmark, ran the data generation script and ran on a 64 core AMD EPYC (AWS c7a.16xlarge):

    import polars as pl
    lf = pl.scan_csv("G1_1e9_1e2_0_0.csv")
    print(lf.select(pl.col.v1.sum()).collect())

The above script ran in 7.58 seconds.

If I change the collect() to collect(new_streaming=True) to use the new streaming engine I've been working on, it runs in 6.90 seconds.

I can't realistically time the full "read CSV to memory" with this 50 GB file on this machine as we start swapping (this machine has 128GiB memory) and/or evicting data from disk cache (this machine has a slow EC2 SSD attached to it), so we do have a blow-up of memory usage (which could be as simple as loading small integers into an 8-byte Uint64 column). I think it's likely that on K's machine the "read full CSV to memory" approach also started swapping, giving the large runtime. However, in Polars you'd typically write your query using LazyFrames, which means we don't actually have to load the full CSV into memory.

EDIT: running on a m7a.16xlarge with twice the memory (256GiB) once the CSV file is in disk cache Polars can parse the full CSV file into an in-memory dataframe in 7.68 seconds.

K's claim that it parses the full 50GB CSV in 1.6 seconds if true is very impressive regardless.

mananaysiempre

5 months ago

Honestly 7 seconds even just to parse the CSV is already pretty impressive, 7GB/s would be simdjson speeds if you did it on a single core. Do you have a single-threaded parser with really well-tuned SIMD, or a speculative parallel one, or ..?

orlp

5 months ago

We have a single-threaded chunker that scans serially over the file. This chunker exclusively finds unquoted newlines (using SIMD) to find clean parallelization boundaries, it doesn't do any further parsing. Those parallelization boundaries are then used to feed worker threads chunks of data to properly parse into our in-memory representation (which mostly follows Arrow).
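
The chunk-then-parse split described above can be sketched in plain Python. This is only an illustration of the idea, not Polars' implementation: it scans byte by byte instead of using SIMD, parses with the standard `csv` module instead of building Arrow-style columns, and every function name and the `target_chunk_size` parameter are assumptions made for the example.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def find_split_points(data: bytes, target_chunk_size: int) -> list[int]:
    """Serial pass: offsets just past unquoted newlines, roughly one per
    `target_chunk_size` bytes. No other parsing is done here."""
    splits = []
    in_quotes = False
    next_target = target_chunk_size
    for i, byte in enumerate(data):
        if byte == 0x22:                 # '"' toggles state ("" toggles twice)
            in_quotes = not in_quotes
        elif byte == 0x0A and not in_quotes and i + 1 >= next_target:
            splits.append(i + 1)
            next_target = i + 1 + target_chunk_size
    return splits


def split_chunks(data: bytes, target_chunk_size: int) -> list[bytes]:
    """Cut the input at the split points so every chunk holds whole rows."""
    bounds = [0] + find_split_points(data, target_chunk_size)
    if bounds[-1] != len(data):
        bounds.append(len(data))
    return [data[a:b] for a, b in zip(bounds, bounds[1:])]


def parse_chunk(chunk: bytes) -> list[list[str]]:
    """Worker: fully parse one chunk of complete rows."""
    return list(csv.reader(io.StringIO(chunk.decode("utf-8"), newline="")))


def parse_in_parallel(data: bytes, target_chunk_size: int, workers: int = 4):
    chunks = split_chunks(data, target_chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parsed = pool.map(parse_chunk, chunks)   # map preserves chunk order
        return [row for rows in parsed for row in rows]
```

Because every chunk starts and ends on a row boundary, the workers never need to know anything about each other's quoting state.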

LegionMammal978

5 months ago

Would you know how much of the total runtime is devoted to the initial chunking process? Amdahl's law would prefer an entirely speculative approach in the limit, but I could imagine that the 2x overhead might not be worth it for reasonable file sizes and core counts.

(But even then, 1.6 s would be quite a feat. It makes me wonder if the K implementation is partially lazy, as you say typical Polars usage is.)
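
For reference, Amdahl's law bounds the speedup when a fixed fraction of the work stays serial (here, the chunking pass). A minimal sketch follows; the function name is made up for illustration, and no serial fraction for Polars is implied.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Upper bound on speedup when `serial_fraction` of the runtime cannot
    be parallelized and the remainder scales perfectly across `workers`."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)
```

As `workers` grows, the bound approaches `1 / serial_fraction`, which is why removing the serial pass entirely wins in the limit even if the speculative version does extra work.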

orlp

5 months ago

It seems from a profile that on the eager engine the serial scanner is able to feed ~32 threads worth of decoding: https://share.firefox.dev/4hS1eJa.

It might be worth speculating, or at least optimizing the serial chunker more. You could theoretically start a second serial chunker from the end working backwards but that would not be wise with our ordered streams, as the decoded data would have to be buffered for a long time.

Similarly on the new streaming engine, each thread is active ~half of the time, except the thread running the chunking task: https://share.firefox.dev/3WQV9og.

Note that in a lot of realistic workloads on the streaming engine compute can happen in between decodes, completely hiding the bottleneck. Also all of the above is with the file being completely in file cache, if fed from a slow SSD it's not a bottleneck whatsoever.

mlochbaum

5 months ago

Seems easy enough to use a parallel scan if you're willing to accept a little work inefficiency, right? Assign each scanner thread a block, first each one counts/xors how many quotes are in its block, exclusive scan on those (last thread's result is unused), and you have the quoting state at the start of each block. And hopefully that block's still in the core's cache.

Or since newlines in strings should be rare, maybe it works to save the index of every newline and tag it with the parity of preceding quotes in the block. Then you get the true parity once each thread's finished its block and filter with that, which is faster than going back over the block unless there were tons of newlines.
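
Both ideas above fit in a short Python sketch: each block reports its quote parity and tags its newlines with the local parity, an exclusive XOR scan over the block parities gives the quoting state at the start of each block, and the tags are then filtered against that state. The names are hypothetical, the per-block pass runs sequentially here rather than on threads, and the input is assumed to use `"` for quoting with `""` as the escape (which toggles parity twice and so cancels out).

```python
from itertools import accumulate

QUOTE, NEWLINE = 0x22, 0x0A


def scan_block(block: bytes):
    """Per-block pass: the block's quote parity, plus every newline tagged
    with the parity of the quotes that precede it inside this block."""
    parity = 0
    tagged_newlines = []
    for offset, byte in enumerate(block):
        if byte == QUOTE:
            parity ^= 1
        elif byte == NEWLINE:
            tagged_newlines.append((offset, parity))
    return parity, tagged_newlines


def unquoted_newlines(data: bytes, block_size: int) -> list[int]:
    """Offsets of all newlines that are not inside a quoted field."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    results = [scan_block(b) for b in blocks]    # independent per block
    parities = [parity for parity, _ in results]
    # Exclusive XOR scan: quoting state at the start of each block.
    # The last block's own parity is not needed by anyone.
    start_states = [0] + list(accumulate(parities, lambda a, b: a ^ b))[:-1]
    found = []
    for index, (state, (_, tagged)) in enumerate(zip(start_states, results)):
        base = index * block_size
        # A newline is unquoted when its local parity matches the start state.
        found.extend(base + off for off, local in tagged if local == state)
    return found
```

The only cross-block dependency is the scan over one bit per block, so the serial part shrinks from bytes to blocks.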

orlp

5 months ago

Yes, I did already propose (at the office) a parity-agnostic chunker (we only need the number of lines + a splitpoint from the chunker) that can do parallel work and only needs a small moment of synchronization to find out which of the two parities it is to lock in a final answer. There would still be a global serial dependency, but on blocks rather than on bytes.

But we only have a finite amount of time and tons and tons of work, so no one has gotten around to it yet. At least now we know that it might be worthwhile for >= ~32 core machines. PRs welcome :)

mlochbaum

5 months ago

All right, just threw me off a little that you'd consider speculating or backwards decoding as I wouldn't expect them to be easier, or significantly faster (or maybe you consider parity-independence to be speculation? I can see it).

orlp

5 months ago

Yes, I meant parity-independence with speculation. Essentially you assume either you are or are not within a string at the start and do your computation based on that assumption, then throw away the result with the unsound assumption. Both assumptions can share most of their computation I believe, so I can understand one might see it from the other perspective where you'd start with calling it parity-independence rather than speculation with shared computation.

dzaima

5 months ago

There might also be the option of just optimistically assuming that, for points in a file with a sequence of like >4K bytes of proper newlines with proper comma counts in each, this point probably isn't in the middle of a multiline string, and parsing it as such (of course with proper fallback if this turns out false; but you'll at least know that this whole run is in the middle of a multiline string).

Also, if you encounter a double-quote character anywhere with a comma on one side and neither a newline, double-quote nor comma on the other, you immediately know 100% whether it starts or ends a string.
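
The neighbour check in the last paragraph can be written down directly. This is a sketch under the assumption of strict RFC 4180-style CSV (quotes only appear around whole fields or doubled as escapes); the function name and the choice to treat `\r` as a newline byte are assumptions for the example.

```python
SEPARATORS = b',\n\r"'    # comma, newline bytes, double quote
QUOTE, COMMA = 0x22, 0x2C


def classify_quote(data: bytes, i: int):
    """Return "open" or "close" when the quote at data[i] is unambiguous
    from its two neighbours alone, otherwise None."""
    if data[i] != QUOTE or i == 0 or i + 1 >= len(data):
        return None
    before, after = data[i - 1], data[i + 1]
    if before == COMMA and after not in SEPARATORS:
        return "open"     # ,"x : a closing quote could not be followed by x
    if after == COMMA and before not in SEPARATORS:
        return "close"    # x", : an opening quote could not be preceded by x
    return None
```

A scanner that starts at an arbitrary offset could use the first unambiguous quote it meets to fix its quoting state, and fall back to one of the parity schemes when it finds none.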

bear8642

5 months ago

> [1]: https://shakti.com/ -> Compare -> h2o.k

You can link to the subsections: https://shakti.com/compare/h2o.k

vessenes

5 months ago

Some snark in here, I'll try and give it a fair shake. Whitney's site mentions '300 spartans' as the rough number of people using k, although it's probably more than that.

Two reasons k folks like k: first, if you believe that programmer working memory, as in the number of chars or lines of code you personally can hold in your head is limited, then it might make sense to be as terse as possible -- this will significantly increase the range of things you can reason about.

Second, if such a language were to focus more on array and vector-level manipulation, then for certain sorts of math tasks, you might be pretty close to grad student nirvana -- programming looks like using a chalkboard to work out a strategy for some processing, and then straightforwardly translating this strategy without mucking around with all the 100s of lines of weird shit say python or java make you do to process something in bulk and in parallel.

On top of this, Whitney is a mad genius, and his k interpreters tend to be SCREAMING fast, and, like a couple of hundred kilobytes compiled. Over time the language has built connections to large-scale data processing jobs (as in, you run a microsecond-or-shorter-timeframe strategy based on realtime depth data from 500 different stocks, say), and it has benefitted from the path dependence you get there.

Anyway back to the top - it exists as both a rallying cry for and a great tool for a certain sort of engineer that wants to make millions of dollars and refer to him/herself as a "Spartan" of coders.

reedf1

5 months ago

K solves the problem of bank account for two groups of people, Kx Systems and quants.

sz4kerto

5 months ago

Absolutely not being sarcastic: one problem it solves is that it is very hard to read as a beginner, so it can be intimidating (although it becomes much easier to read a bit later). This, coupled with the general arrogance of k/q practitioners (again, not really saying this in a negative way) and that k, kdb, etc. deliberately don't give you guardrails, makes people who write k/q seem a bit 'mythical' and makes them feel very clever.

So I think k, q and kdb are fun to work with, but one of the major components of its success is that it allowed a community (in finance) to evolve that can earn 50-150% more than their peer groups who do the same work in Java or C++. 10 years ago a kx course cost $1500 per person per day.

pjmlp

5 months ago

To note that those are typical prices for enterprise level certifications, including some products that some Java or C++ devs might need to interact with, when working on those kind of environments.

cppandjava

5 months ago

Hmm. I work in finance writing C++ and Java and I doubt other people in finance make 50-150% more than me because they know `q`.

gitonthescene

5 months ago

I don't know. If you're writing Java you may not be working on the same types of problems.

user

5 months ago

[deleted]

BoiledCabbage

5 months ago

It's the first page of a 5 page post/book. Make sure to check out the other 4 pages linked at the footer.

swiftcoder

5 months ago

This is kind of the problem with every introductory text to an APL-family language.

I get the idea that one either already knows one needs an array programming language, or doesn't grok why anyone would need one

skruger

5 months ago

Yeah—true. I wrote it as “the missing manual” for ngn/k, enough to get someone over the initial hump. It’s not a “Mastering k” tome.

sebg

5 months ago

A companion guide that I always recommend if interested in K is: Q for mortals, found here - https://code.kx.com/q4m3/

Note, from wikipedia: Q serves as the query language for kdb+, a disk based and in-memory, column-based database. Kdb+ is based on the language k, a terse variant of the language APL. Q is a thin wrapper around k, providing a more readable, English-like interface.

mwexler

5 months ago

Pulled from above:

  Coding Style The q gods have no need for explanatory error messages or comments since their q code is perfect and self-documenting. Even experienced mortals spend hours poring over cryptic q error messages such as the ones above. Moreover, many mortals eschew comments in misanthropic coding macho. Don’t.

A more enjoyable read than the parent post.

nialv7

5 months ago

There's also Nial: https://github.com/danlm/qnial7 which is (pardon the oversimplification) APL but with words instead of symbols.

steveBK123

5 months ago

A set of links with good examples of common problems solved in Q

https://code.kx.com/phrases/wikipage/

https://code.kx.com/q/kb/programming-idioms/

https://code.kx.com/phrases/

poulpy123

5 months ago

I'm somewhat convinced that there is a middle ground between corporate java and languages like K

jamal-kumar

5 months ago

I always thought it sounded super cool but it just doesn't exist in the problem spaces I work in. Like kdb+ was specifically designed to be run on bare metal without a full OS in the way of things going fast, and in quant environments where you're trying to shave off nanoseconds on the computations because your company's gone and invested in a dedicated fibre line to trading servers.

eudhxhdhsb32

5 months ago

That's actually not true at all. No one who cares about nanoseconds is using kdb+ for a production trading system.

It's primarily used for trading research and surveillance, not live trading. And I've never heard of anyone running it without an OS.

bear8642

5 months ago

> And I've never heard of anyone running it without an OS.

kOS is in development though current status is unknown.

(https://gist.github.com/chrispsn/da00835bb122c42f429a084df83...)

7thaccount

5 months ago

I think that got abandoned ages ago.

jamal-kumar

5 months ago

"Arthur succeeded in getting it to run, but eventually got drowned in device drivers."

heh

WorkerBee28474

5 months ago

> No one who cares about nanoseconds is using kdb+ for a production trading system.

For those curious, what they're actually using is FPGAs and custom silicon.

jamal-kumar

5 months ago

Super interesting thanks for the tidbit

tempodox

5 months ago

IDK, I'd rather have a language that compiles to native code, isn't quite as write-only as that, and doesn't cost an arm and a leg, even when using a DB.

blablablerg

5 months ago

"K is a general-purpose programming language that excels as a tool for data wrangling, analytics and transformation."

How does it compare to R/tidyverse?

7thaccount

5 months ago

It's mainly for quants where you couple the array language with a time series database of all your stock quotes. Once you understand the language you can do a ton of analysis with extremely little code. Think of it as a mathematical SQL dialect I guess.

In my opinion, it's very cool, but Python's ecosystem (and R's) is just so much better with scientific libraries and charting and all that. Kdb+ (the database) and K the language are likely much faster than R for general analysis type stuff. R is also free and Kdb+ is not.

IshKebab

5 months ago

> The same baseless accusations of “unreadable”, “write-only” and “impossible to learn” are leveled at all Iversonian languages, k included.

I'd be really curious to know if they really are baseless. It's very very difficult to imagine that K developers can really read a mess like this as easily as one might read Go or whatever.

https://github.com/KxSystems/kdb/blob/master/e/json.k

Has anyone tested this? Take a K program and ask a K developer to explain it? Or maybe introduce a deliberate bug and see how long they take to fix it compared to other languages. You could normalise the results based on how long it takes them to write some other code.

Free research project for any compsci researchers out there... (though good luck finding skilled K programmers).

geocar

5 months ago

> It's very very difficult to imagine that K developers can really read a mess like this as easily as one might read Go or whatever.

水落石出。

> Has anyone tested this? Take a K program and ask a K developer to explain it?

I am not sure what you're asking. Do you want me to read it to you?

Here is me reading some other people's code:

https://news.ycombinator.com/item?id=8476633

https://news.ycombinator.com/item?id=22010223

Do you want me to read to you the JSON encoder (written twice) and the decoder in this way?

> Or maybe introduce a deliberate bug and see how long they take to fix it compared to other languages.

https://news.ycombinator.com/item?id=27209093#27223086

> You could normalise the results based on how long it takes them to write some other code.

https://news.ycombinator.com/item?id=22459661#22467866

https://news.ycombinator.com/item?id=31361023#31364262

IshKebab

5 months ago

Thanks that was very interesting.

It seems to me that it has a lot of the same properties as regex. Looks like gobbledygook at first glance, but after learning it I can write them, and read them with some effort (depending on the complexity). However nobody would describe regexes as "readable", and they're quite error-prone. I definitely wouldn't want to write a whole program in regex.

Regexes shine most when they're used interactively, e.g. in one-off greps, or editors. There readability doesn't matter at all, error-proneness doesn't really matter, and terseness is important. The problems start when people put those grep commands in scripts where the output isn't supervised by humans.

I wonder if the same is true for K - it started as a query language for one-off queries & investigations, and then people started saving those queries and making bigger programs?

geocar

5 months ago

> However nobody would describe regexes as "readable"

I would, and do, and I urge you to be less judgemental about things you do not know anything about because you will never learn anything new with that attitude.

> I wonder if the same is true for K - it started as a query language for one-off queries & investigations,

Why do you wonder this? I don't think it's true, but so what? Did you not read what I wrote? Seriously: Why do you put so much effort trying to talk yourself out of learning how to do something that is obviously amazing to you?

I am telling you I can read this. I like this. I am not nobody, just someone you did not think existed. And I am telling you it is possible for you too.

IshKebab

5 months ago

> I would, and do, and I urge you to be less judgemental about things you do not know anything about because you will never learn anything new with that attitude.

The thing is, I am extremely familiar with regexes (I've even written a regex engine), so I know exactly how readable they are - even after knowing them really well. So the fact that you think they are still readable suggests to me that your judgement of K's readability is also suspect.

> Why do you wonder this?

It would be a reasonable explanation of why K exists.

geocar

5 months ago

> The thing is, I am extremely familiar with regexes (I've even written a regex engine), so I know exactly how readable they are - even after knowing them really well.

Your experience writing a "regex engine" once upon a time led you to believe regular expressions are difficult to read.

My experience maintaining a few million lines of perl over a couple of decades has led me to believe that I can read regular expressions with no discomfort.

The Real™ thing is you can get better at anything with practice, even this, but listen I also think K is more useful than regular expressions and I would have used less perl had I learned K sooner.

> So the fact that you think they are still readable suggests to me that your judgement of K's readability is also suspect.

It should make you suspect whether or not you have any idea what an expert actually is. I mean, the inventor of regular expressions tinkered with them for decades, and new advancements are still happening sixty years later!

You don't know what you don't know, and there is very little you can do about that except pay attention to people who can do things you do not know how to do yet, and reserve your judgement about how they do it until you can do it better.

The only thing k has in common with regular expressions is your claim they are both difficult, a claim I disagree with.

> > Why do you wonder this?

> It would be a reasonable explanation of why K exists.

You misunderstand me, perhaps on purpose, but I hope you and others will think about this: Why do you care why it exists when I have shown you something so much more amazing than an opinionated history lesson?

I think k exists to make programs that make money. Forever. Because a little bit of money from a lot of programs over a long time is worth a lot, k is fast to write it. Because sometimes getting the answer faster makes more money, k runs fast too. Because people are trusting their money with it, k runs very predictably. Because sometimes your vendor just changes the input format on a Friday night, it's important that it is easy to read and make changes to k programs.

Arthur said it was the keys to the kingdom.

michaelg7x

5 months ago

It's entirely possible, have done it a few times. For example, the `fby` verb[?] annoyed me one too many times, so I pulled it apart to see what was going on. In contrast to json.k it's quite short. I usually split each separable idea into a new line and introduce a bunch of new variables to track state that would otherwise be passed from right to left. Lengthy end-of-line comments are my chosen way of understanding q or k when I come back to anything later.

rak1507

5 months ago

You have been talking about array languages on HN for at least 6 years, and still refuse to believe that anyone can read them!

What do you have to gain from this stance, and why don't you believe people who tell you otherwise?

IshKebab

5 months ago

It's not a question of can they read them, it's are they readable. I obviously don't believe things just because people say them. Do you?

rak1507

5 months ago

When it comes to subjective opinions, what other choice do you have? Your idea of doing studies is interesting but that's not even been done for most languages, yet claiming "python is unreadable" would clearly be laughable.

Either everyone who uses array languages does actually find them readable, or they're all persistently lying for... what reason? And forcing themselves to use something they don't find readable? Why would anyone do that! Especially considering a lot of array language users are hobbyists, who have chosen to use them, it's not like they're forced to.

jbritton

5 months ago

code_report does a bunch of videos on this https://youtu.be/pq1k5USZZ9A?si=hmzucWWdNnGrI0Os

jbritton

5 months ago

Link to the video before the one above that compares APL, BQN, J, R, Julia, Numpy on the same problem. https://youtu.be/8ynsN4nJxzU?si=EeAZEZzA2kd0mw1y

hcfman

5 months ago

I built a language called K for my master's thesis in 1984. Who was first?

mlochbaum

5 months ago

You win! Whitney was just out of graduate school at the time, and had worked some with APL at I.P. Sharp but was implementing "object-oriented languages, a lot of different LISPs, Prolog"[0]. Next was the more APL-like A around 1985 and K only in 1992.

[0] https://queue.acm.org/detail.cfm?id=1531242

sl0thentr0py

5 months ago

i've been doing the last 3 years advent of codes in q/kdb+, it's a lot of fun https://github.com/sl0thentr0py/aoc/blob/main/aoc2023/3/foo....

sega_sai

5 months ago

I had to learn it for the interview homework for a quant job. It is a fun language to solve small-ish problems. I somehow got the feeling it's a good language for showing off clever solutions, but at the same time I can't comprehend how one could use such a language for any serious, large codebase.

rak1507

5 months ago

I had a peek at your github profile, and noticed "sqlutilpy", "Python module to efficiently query SQL databases and return numpy arrays". K is like if a language just did that by default.

khazhoux

5 months ago

Developers act like they forgot about K

HexDecOctBin

5 months ago

What is the difference between APL and all the various APL-like languages like BQN, J and K? Which one should a beginner start with? Which has the best tooling for debugging, type checking, etc.?

skruger

5 months ago

Depends. APL is the OG. Try a few and see what you like. If you learn one Iverson language, it's pretty easy picking up the others.

Here's a gentle guide to APL by the same author (me):

https://xpqz.github.io/learnapl/

Dyalog APL is likely the best supported in terms of tooling, debugging etc. If you're looking for static typing, you're in the wrong place.

rscho

5 months ago

The main difference separating them is the array model. APL has the so-called 'nested array' model, meaning that everything is an array. J has a 'flat array model' meaning scalars are distinct from arrays. Both models introduce typing inconsistencies preventing efficiency. BQN tried to remedy this and use an efficient compiler. What sets K apart is that it does not have multidimensional arrays, but just lists of 1D arrays. This makes K ideal for financial work, while the others are more non-financial math-oriented.
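
A loose Python analogy for the "lists of 1D arrays" point, not K itself: a table is just a list whose items happen to be equal-length lists, so nothing in the data structure stops the rows from being ragged. The helper name `shape_if_rectangular` is hypothetical.

```python
# A "matrix" in the list-of-lists model is only a convention: rows of equal length.
table = [[1, 2, 3], [4, 5, 6]]

# The same structure can be ragged, which a true multidimensional array cannot be.
ragged = [[1, 2, 3], [4, 5]]


def shape_if_rectangular(rows):
    """Return (row_count, column_count) when all rows share a length, else None."""
    lengths = {len(row) for row in rows}
    if len(lengths) != 1:
        return None
    return (len(rows), lengths.pop())


# Column-wise access falls out of regrouping the rows, not from a second axis.
columns = [list(column) for column in zip(*table)]
```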

radiator

5 months ago

I think the best today cannot be APL, because it carries so much historical baggage and because commercial implementations dominate it. So start with BQN, it is free, has the tooling and it also has succeeded in building a community.

tomku

5 months ago

There's several ways to look at the differences.

The one that will jump out at most programmers who are familiar with mainstream languages is that J, k, q and Nial use ASCII characters while APL, BQN and Uiua prefer glyphs. q and Nial additionally favor words rather than shortened abbreviations, and Uiua has plain words that auto-format to its glyphs to aid in typing. The other glyph-based languages rely on custom (software) keyboard layouts or input methods to let you type the symbols they need. You do not need a special keyboard to program in any of these languages. ASCII-or-not is not a decision that any of the array languages have made lightly or for purely aesthetic reasons, it has deep consequences for how the languages feel that won't really make sense until you get some hands-on experience. As a beginner you'll probably gravitate towards one of the sides without understanding those deeper implications, and that's totally okay, but please keep an open mind.

If access to a high-quality open-source implementation is important for you, your options narrow a bit. J, BQN, Uiua and Nial all have a primary implementation that's open source. k has implementations that are open-source but the official versions of k that most people use "in anger" are commercial products with a limited free trial, and afaik there's no mature open-source versions of kdb+/q, which are kind of k's killer app. There are many implementations of APL but Dyalog is the clear leader and it's a closed-source commercial product with a personal/non-commercial free version. I wish this was less of a factor because it's so hard to get people interested in languages when the best versions aren't available to them, but it has gotten better in recent years.

Regarding tooling, you should go in with minimal expectations. Some of the tooling is quite good (particularly J and Dyalog APL, in my opinion) but it's heavily biased towards the specific type of iterative, interactive development that nearly all array programmers favor. Debuggers are sometimes present but usually not a primary tool. None of the major array languages have static typing. There are some array-adjacent languages like Futhark and Dex that do, but they're very different than the "Iversonian" array languages you asked about, and are also active research projects.

(Edit: Also worth mentioning that package managers and build systems are not common in the array world.)

There are many other differences that matter immensely to the array community but you won't have context for as a beginner, so I'm not going to go too deep into them, but if you're curious, https://github.com/codereport/array-language-comparisons has some comparison tables and example code written in a variety of languages. code_report/Conor's Youtube channel at https://www.youtube.com/@code_report/ is also an excellent place to get exposure to various array languages and concepts.

All that said, in my opinion the easiest languages to recommend to get started are BQN and J, depending on whether you want glyphs or not. If you're comfortable using a closed-source tool with restrictive licensing, Dyalog APL is also an excellent choice. Any of the three will show you both the joys and pains of array programming if you put time into learning it, and give you enough context to make an informed decision about going deeper or finding another array language more to your taste.

cess11

5 months ago

J has an Android interpreter, which for me as a non-professional dabbler is the killer app since it means I can study and play on my handheld devices when I'm on a break from work or family.

The documentation is pretty decent compared to the other members of the Iverson gang and the libraries one can install with the desktop version makes it somewhat batteries included, at least it's easy to suck in a file and start rendering plots.

Maybe BQN can compete on these things nowadays, I'm not sure.

dzaima

5 months ago

You can run BQN in termux on Android pretty well. A list of libraries is available at https://github.com/pellertson/awesome-bqn. https://mlochbaum.github.io/BQN/ has pretty good documentation.

dzaima

5 months ago

err, on the Android termux thing - building CBQN is annoyingly slightly non-trivial - have to `pkg install libandroid-spawn`, and then `make for-build lf=-landroid-spawn; make lf=-landroid-spawn`, because android doesn't come with posix_spawn.

cess11

5 months ago

Thanks, I'll try it out.

shric

5 months ago

My first thought was "weird, they made a language called k even though there is already a language called K". I then realized it's actually talking about K.

Throughout the article it's spelled k consistently except at the start of a sentence. This is weird. The language is K not k. Nobody spells the C language as c.

cess11

5 months ago

It's actually rather common to spell it as k, as well as K. I think q is more common than Q.

sedatk

5 months ago

"k" was used in lowercase throughout the article, including the title.

Nihilartikel

5 months ago

Implement all this jazz with s-expressions and I am on board!

liveranga

5 months ago

You're in luck: https://github.com/phantomics/april

James_K

5 months ago

> Strings are just vectors of characters

I hope not.

lytedev

5 months ago

Can you elaborate? Why not?

James_K

5 months ago

A character could be 1 byte long, in which case the language cannot properly handle unicode; it could be 4 bytes long, in which case there is a lot of wasted space storing text and it cannot properly handle extended grapheme clusters; or a character could be arbitrary length, at which point strings no longer have a flat representation in memory. None of these are good. The exact properties of a string can really only be encoded efficiently with a flat linear access data-type.
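
To put numbers on that trade-off, a small Python check of how many units the same text takes under each notion of "character". The sample strings are arbitrary and written with escapes so the exact code points are unambiguous.

```python
text = "na\u00efve \u65e5\u672c"  # "naive" with a precomposed i-diaeresis, a space, two CJK characters

code_points = len(text)                       # Python's str counts code points
utf8_bytes = len(text.encode("utf-8"))        # variable width: 1 to 4 bytes per code point
utf32_bytes = len(text.encode("utf-32-le"))   # fixed width: always 4 bytes per code point

# One perceived character can still be several code points:
accented = "e\u0301"                          # "e" followed by COMBINING ACUTE ACCENT
accented_points = len(accented)
accented_utf8 = len(accented.encode("utf-8"))

print(code_points, utf8_bytes, utf32_bytes)   # prints: 8 13 32
print(accented_points, accented_utf8)         # prints: 2 3
```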

dzaima

5 months ago

1-byte characters (i.e. what k's typically have) handle ASCII just fine, for which doing reversing/splitting/uppercase/lowercase/iteration/etc is actually meaningful (stock symbols, stringified dates, identifiers, etc).

And if you have to handle arbitrary language user input, there's basically no operations you can/should actually do anyway. Uppercasing/lowercasing? Doesn't make sense on CJK languages. Reversing? Completely meaningless. Trimming to the first N chars for some visual display/summary/preview? Even grapheme clusters won't help avoiding a character with ten thousand combining components, and you'll have to do language-specific logic to not cut in the middle of a word for languages where the display of a prefix of a word may change depending on later letters! And forget about spaces meaning anything.

Basically the only string ops I can think of that make sense for non-ASCII generally would be splitting/joining on newlines and escaping for JSON/HTML or whatever, which'll work completely fine on a byte list anyway.

There's perhaps some middle-ground of doing things for a specific set of languages, but even for such you won't care about the storage format anyways, as what matters for you is just whether operations you use (presumably using some library; and even if you write a manual uppercase for French specifically or whatever, you'd notice if you implemented it wrongly) do the thing they should.

So a list of byte chars is just fine for anything one would actually do, providing optimal access to ASCII, and not actually making things worse for non-ASCII.
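
A quick Python illustration of the argument above, using `bytes` as a stand-in for a list of byte chars (an analogy, not k): per-byte operations are meaningful on ASCII, splitting on newlines is safe on UTF-8, and reversing the bytes of non-ASCII text is not.

```python
# ASCII data such as a stock symbol: per-byte operations do what you expect.
ticker = b"msft"
upper = ticker.upper()
backwards = ticker[::-1]

# Non-ASCII text stored as UTF-8 bytes. Splitting on the newline byte is safe,
# because bytes below 0x80 never occur inside a multi-byte UTF-8 sequence.
data = "h\u00e9llo\nw\u00f6rld".encode("utf-8")
lines = [line.decode("utf-8") for line in data.split(b"\n")]

# Reversing the raw bytes scrambles the multi-byte sequences.
try:
    data[::-1].decode("utf-8")
    reversed_is_valid = True
except UnicodeDecodeError:
    reversed_is_valid = False
```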

James_K

5 months ago

Not true at all! Extended grapheme clusters are defined by Unicode for a reason and include relevant combining marks following a letter[1]. The point more generally is that a programming language shouldn't preferentially choose one character definition over another. The decision of whether to iterate by bytes, points, or clusters is a significant one which the language shouldn't force upon users. For many common operations, bytes are a sufficient representation, but then one must be precise about encoding. A list of UTF-8 bytes is very easy to deal with but the bytes of a UTF-16 string are highly problematic. Inserting a single byte character at the start of such a string would destroy its entire content. There is no situation where "give me the characters of this string" is a sufficiently precise statement, so it should not be made available by programming languages. Likewise, the idea of indexing a string is not well defined at all. The only consistent interface for accessing strings requires users to specify both encoding and separation, and this can only be done performantly in the general case with a linear scan.

[1] http://unicode.org/reports/tr29/
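
The UTF-16 claim is easy to reproduce. A minimal Python sketch, assuming little-endian UTF-16 without a byte order mark; the sample text is arbitrary.

```python
text = "hi"

# Prepending one ASCII byte to UTF-8 bytes just prepends one character.
utf8_result = (b"a" + text.encode("utf-8")).decode("utf-8")

# Prepending one byte to UTF-16 bytes shifts every 2-byte code unit by one byte,
# so each following unit is assembled from halves of two different characters.
shifted = b"a" + text.encode("utf-16-le")
utf16_result = shifted.decode("utf-16-le", errors="replace")

print(utf8_result)          # prints: ahi
print(ascii(utf16_result))  # escaped form of the garbled result
```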

mlochbaum

5 months ago

I think it's worth considering that application development and GUIs really aren't K's thing. For those, yes, you want to be pretty careful about the concept of a "character", but (as I understand it) in K you're more interested in analyzing numerical data, and string handling is just for slapping together a display or parsing user commands or something. So a method that lets the programmer use array operations that they already have to know instead of learning three different non-array ways to work with strings is practical. Remember potential users are already very concerned about the difficulty of learning the language!

dzaima

5 months ago

I meant the combining mark point as a thing you would want to cut off; a 50-char chopped-off "summary" of a thing should not include a character with ten thousand combining marks ever. Of course it'd be preferred to cut before and not in the middle, but certainly not after, which is what you'd get if taking the first 50 extended grapheme clusters, the 20000-byte glyph counting as one. Point being, you still just want to use a library that has properly thought out the question. And that applies to most (all?) sane fully-Unicode-aware operations.

Places where ASCII-only is a known expectation and there are meaningful per-char operations are plenty; that's what using a list of bytes provides. Indeed you'd probably want to use another abstraction if you have non-ASCII. And for such you could use something to do the form of iteration or operation you want just fine, even if the input/output is a list of byte-chars representing plain UTF-8.

James_K

5 months ago

Well in that case, the way you get a 50 char summary is by iterating grapheme clusters, then counting up to 50 points and discarding the broken cluster. It's quite trivial if the language exposes an interface for iterating both clusters and points, and without such an interface the problem is much harder to notice. Hence why the language shouldn't prefer clusters to points or points to clusters. It should expose all relevant representations without prejudice.

Even if ASCII is appropriate in some situation, this should be stated within the program. Requiring people to be explicit about the data they produce and consume is important and useful. A user might decide that UTF-16 best serves their need (or be working on the Windows platform) in which case code which works with strings as linear sequences will be able to operate on their strings without issue. Code which assumes a UTF-8 byte representation will require the entire string to be allocated, converted, then reallocated and converted back. Huge overhead and potential incompatibility for no reason.
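
The 50-char summary recipe above could be sketched in Python roughly like this. The cluster splitter is a deliberate simplification (a base code point plus any following combining marks), not the full extended grapheme cluster rules from UAX #29, and the names `clusters` and `summary` are hypothetical.

```python
import unicodedata


def clusters(text):
    """Approximate grapheme clusters: a base code point plus trailing combining marks."""
    current = ""
    for ch in text:
        # A non-combining code point starts a new cluster.
        if current and unicodedata.combining(ch) == 0:
            yield current
            current = ch
        else:
            current += ch
    if current:
        yield current


def summary(text, max_points=50):
    """Keep whole clusters while the running code point count fits the budget."""
    kept = []
    used = 0
    for cluster in clusters(text):
        if used + len(cluster) > max_points:
            break  # discard the cluster that would be cut in the middle
        kept.append(cluster)
        used += len(cluster)
    return "".join(kept)


# A letter carrying 60 combining marks never makes it into a 50-point summary.
heavy = "xy" + "e" + "\u0301" * 60 + "z"
short = summary(heavy)
```

Word boundaries and script-specific shaping are not handled at all here, which is the part the replies argue still needs a library.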

dzaima

5 months ago

> It's quite trivial if the language exposes an interface for iterating both clusters and points, and without such an interface the problem is much harder to notice

I assure you, 99% of people won't handle this correctly even if given a cluster-based interface (if they even bother using it). And this still doesn't handle the question of cutting words in the middle of some languages resulting in broken display of the non-cut part (or languages without space-based word boundaries to cut on). So the preferred thing is still to use a library.

I don't think anyone in k would use UTF-16 via a character list of 2 chars per code unit; an integer list would work much nicer for that (and most k interpreters should be capable of storing such with 16-bit ints; there's still some preference for using UTF-8 char lists, namely, such get pretty-printed as strings); and you'd have to convert on some I/O probably anyway. Never mind the world being basically all-in on UTF-8.

Even if you have a string type that's capable of being backed by either UTF-8 or UTF-16, you'll still need conversions between those at some points; you'd want the Windows API calls to have a "str.asNullTerminatedUTF16Bytes()" or whatnot (lest a UTF-8-encoded string makes its way here), which you can trivially have an equivalent of for a byte list. And I highly doubt that overhead of conversion would matter anywhere you need a UTF-16-only Windows API.

I doubt all of those fancy operations you'll be doing will have optimized impls for all formats internally either, so there's internal conversions too. If anything, I'd imagine that having a unified internal representation would end up better, forcing the user to push the conversions to the I/O boundaries and allowing focus on optimizing for a single type, instead of going back-and-forth internally or wasting time on multiple impls.
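
As a rough Python analogy for "UTF-16 as an integer list, converted at the I/O boundary" (not k, and the helper names are hypothetical): keep text as a list of 16-bit code units internally and only turn it into bytes where an API demands it.

```python
import struct


def to_utf16_units(text):
    """Encode text as a list of 16-bit UTF-16 code units (little-endian on the wire)."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack(f"<{len(raw) // 2}H", raw))


def from_utf16_units(units):
    """Inverse conversion, done only at the boundary where a string is needed."""
    raw = struct.pack(f"<{len(units)}H", *units)
    return raw.decode("utf-16-le")


plain = to_utf16_units("hi")           # one unit per character
emoji = to_utf16_units("\U0001F600")   # outside the BMP: a surrogate pair, two units
round_trip = from_utf16_units(emoji)
```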

fc417fc802

5 months ago

Python uses UTF-8. A Python string is iterable. It is generally reasonable to describe any iterable as a vector (at least in terms of the API). The result of such iteration might not be a character in any formal sense, but it's a reasonable description nonetheless.

I'm really not seeing the issue here.

ZeroCool2u

5 months ago

I cannot warn folks against using q/kdb+ enough. Use Polars or DuckDB, get the job done, and enjoy your life.

jerjerjer

5 months ago

Eh, no need. Author states in the first two paragraphs that there are 9 versions of k, each developed from scratch and incompatible with each other. Anyone who develops software for money should and would leave immediately. I do appreciate the honesty, though.

boothby

5 months ago

> and enjoy your life.

As somebody who hacks on, around and in esoteric languages for fun; I must object.

ZeroCool2u

5 months ago

And as someone that has written an interpreter from scratch in F#, and since there's a free trial version, I'd say go for it and have fun and live your best life! Just perhaps reconsider allowing your livelihood to be dependent on it :)

7thaccount

5 months ago

What has your experience been like? What are the drawbacks besides the cost and proprietary nature?

ZeroCool2u

5 months ago

I don't want to be too disparaging, so I will just say that the language is exotic. Otherwise, the licensing model is Oracle-esque based on host and core counts etc. The software is fast, that you cannot deny, though it does critically depend on the speed of the storage attached to the host. Also, it's written in C++ and it shows. Had to do multiple (paid) upgrades due to memory leaks.

I'm sure there was a time it was best in class and even now maybe it's the best for a few niche use cases, but unless you're absolutely certain you need it, I would flee from it and save your sanity.

7thaccount

5 months ago

I thought it was written in just plain C based off old Arthur Whitney stories.

Yeah...Oracle licensing sounds scary and having to pay to fix their own memory leaks sounds frustrating.

Thanks for the experience.

ZeroCool2u

5 months ago

You know what, I can't place where exactly I heard it was C++ rather than C and they just ship you a binary blob executable, so I may be wrong and it absolutely could be plain C.

nottorp

5 months ago

> As you’ve landed here, you’ve clearly somehow sought out k, and you likely have an idea what it’s about.

Author didn't expect to end up on HN then :)

z5h

5 months ago

> Readability is a property of the reader, not the language.

Similarly, the inability of a person to write machine code directly is a property of the person, not the hardware. Yet some of these people admit their limitations and use K.

pyrale

5 months ago

Silicon computers are a crutch for the people too flawed to run their calculations in their head.

Ygg2

5 months ago

"It is by will alone I set my mind in motion. It is by the juice of Sapho that the thoughts acquire speed, the lips acquire stains, the stains become a warning. It is by will alone I set my mind in motion..."
