
haolez

3 months ago

On a side note, what tools that leverage Datalog are in use by the HN crowd?

I know that Datomic[0] is very popular. I've also been playing with Clingo[1] lately.

blurbleblurble

3 months ago

Check out CozoDB, the embedded datalog-queried hybrid relational+vector+graph database written in Rust: https://www.cozodb.org/

I used it in a toy application and it was awesome.

This appears to be a dream database from the future.

huevosabio

3 months ago

It seems like the project has been abandoned? Last commit a year ago.

blurbleblurble

3 months ago

Fair point but what if it's just really solid already! :D

Idk, I'm not too worried about that, I'm eager to help out on a project like this if something came up.

anonzzzies

3 months ago

Yep, bit of a shame, many nice things in it and interesting to learn from but not active.

I have some local-first/client-side applications using datascript in ClojureScript. I've sometimes used datahike (a FOSS Datomic alternative) on the backend too, but I mostly use XTDB nowadays. It used to have a Datalog API, but I think they removed it in favor of a SQL-like interface, which is kind of a shame.

manoDev

3 months ago

I guess SQL is a requirement if they want to market their technology to normies.

zozbot234

3 months ago

SQL can express Datalog-like queries rather easily using recursive CTEs, and even more so via the recently added Property Graph Query syntax.
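
To make the recursive CTE point concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `edge` table, the `EDGES` data, and the `reachable_pairs` helper are made up for illustration; the query mirrors the two-rule Datalog program shown in the docstring.

```python
import sqlite3

# Hypothetical edge relation, for illustration only.
EDGES = [("a", "b"), ("b", "c"), ("c", "d")]


def reachable_pairs(edges):
    """Transitive closure via a recursive CTE.

    Roughly equivalent Datalog:
        path(X, Y) :- edge(X, Y).
        path(X, Z) :- path(X, Y), edge(Y, Z).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    rows = conn.execute(
        """
        WITH RECURSIVE path(src, dst) AS (
            SELECT src, dst FROM edge
            UNION  -- set semantics, so cyclic graphs still terminate
            SELECT path.src, edge.dst
            FROM path JOIN edge ON path.dst = edge.src
        )
        SELECT src, dst FROM path ORDER BY src, dst
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(reachable_pairs(EDGES))
```

Using `UNION` rather than `UNION ALL` gives the set semantics Datalog has by default, which is what keeps the recursion finite on graphs with cycles.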

jitl

3 months ago

recursive CTEs suck usability-wise compared to the usual Datalog horn clause syntax. I won't speak to usability of the "datomic" kind of datalog though, that thing I haven't been able to wrap my head around.
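
For readers who haven't seen the Horn clause style, the classic example is the two-line reachability program in the docstring below. The Python around it is only a rough sketch of how an engine might evaluate those rules (semi-naive evaluation); `transitive_closure` is a hypothetical helper, not any particular engine's API.

```python
def transitive_closure(edges):
    """Semi-naive evaluation of the Horn clauses:

        path(X, Y) :- edge(X, Y).
        path(X, Z) :- path(X, Y), edge(Y, Z).
    """
    edges = list(edges)

    # Index edge on its first column so the join is a lookup, not a scan.
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, set()).add(dst)

    path = set(edges)   # all facts derived so far
    delta = set(edges)  # facts that were new in the previous round
    while delta:
        # Only join the *new* path facts against edge.
        derived = {
            (x, z)
            for (x, y) in delta
            for z in successors.get(y, ())
        }
        delta = derived - path
        path |= delta
    return path
```

The loop stops once a round derives nothing new, which is the fixpoint the two rules describe declaratively.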

arxanas

3 months ago

Just gave a talk about this: https://blog.waleedkhan.name/what-if-sql-were-good/

- Recommend Ascent (Rust only, but supports targeting WASM)

- Soufflé: good, but too hard to integrate into existing systems; lots of ergonomic problems in comparison to Ascent (can elaborate)

- CozoDB: really cool but seems to be abandoned

- Logica: have not tried it yet

themk

3 months ago

Would like to hear about the ergonomic problems you have with souffle. We integrate it into our rust tools quite well, and generate typesafe rust bindings to our souffle programs, allowing us to insert facts and iterate over outputs.

arxanas

3 months ago

It's quite possible that I have different, smaller-scale problems than you have! So my feedback might not be as relevant.

I wrote detailed commentary here: https://github.com/s-arash/ascent/discussions/72

Re Rust bindings and your specific comment:

- Deploying Soufflé and doing FFI is much more difficult for me in practice, just in terms of the overhead to set up a working build. (I'm not going to be able to justify setting up a Soufflé ruleset for Bazel, and then adding Rust-Soufflé binding generation, etc. at my workplace.)

- User-defined functors, or integrating normal data structures/functions/libraries into your Soufflé program, seems painful. If you're doing integrations with random existing systems, then reducing the friction here is essential. (In slide 16 of the talk, you can see how I embedded a constructive `Trace` type and a `GlobSet` into an actual Ascent value+lattice.)

- On the other hand, you might need Soufflé's component system for structuring larger programs whereas I might not (see above GitHub discussion).

Non-specifically:

- Several features like generative clauses, user-defined aggregations, lattices, etc. seem convenient in practice.

- I had worse performance with Soufflé than Ascent for my program for some query-planning reason that I couldn't figure out. I don't really know why; see https://github.com/souffle-lang/souffle/discussions/2557
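
To illustrate what lattices buy you in the list above, here is a rough Python sketch of a shortest-path fixpoint where the distance column is joined with `min` instead of accumulating every derived tuple. It is not Ascent or Soufflé syntax; `shortest_paths` is a hypothetical helper, and non-negative weights are assumed so the fixpoint terminates on cyclic graphs.

```python
def shortest_paths(weighted_edges):
    """Fixpoint of the lattice-style rules (pseudo-Datalog):

        dist(X, Y, min W)       :- edge(X, Y, W).
        dist(X, Z, min W1 + W2) :- dist(X, Y, W1), edge(Y, Z, W2).

    Assumes non-negative weights.
    """
    weighted_edges = list(weighted_edges)
    inf = float("inf")
    dist = {}
    changed = set()

    # Base rule: every edge is a candidate distance.
    for x, y, w in weighted_edges:
        if w < dist.get((x, y), inf):
            dist[(x, y)] = w
            changed.add((x, y))

    # Recursive rule: extend only the pairs whose distance just improved.
    while changed:
        next_changed = set()
        for (x, y) in changed:
            for (y2, z, w) in weighted_edges:
                if y2 != y:
                    continue
                candidate = dist[(x, y)] + w
                if candidate < dist.get((x, z), inf):
                    dist[(x, z)] = candidate  # lattice join: keep the minimum
                    next_changed.add((x, z))
        changed = next_changed
    return dist
```

Without the lattice, plain Datalog would derive one `dist` tuple per distinct path length, which does not terminate on a cyclic graph.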

kmicinski

3 months ago

> - I had worse performance with Soufflé than Ascent for my program for some query-planning reason that I couldn't figure out. I don't really know why; see https://github.com/souffle-lang/souffle/discussions/2557

I think the basic issue is that ADTs are simply not indexed--so to the degree that you write a query that would necessitate an index on a subtree of an ADT, you will face asymptotic blowup, as the way ADTs work will force you to scan-then-test across all ADTs (associated with that top-level tag). The issue is discussed in Section 5.2 of this paper here: https://arxiv.org/pdf/2411.14330
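
A toy Python illustration of the scan-then-test cost, under the assumption that ADT values are modeled as nested tuples `(tag, field0, field1)`. The helper names are hypothetical and this is not how Soufflé stores ADTs internally; it only contrasts the two access patterns.

```python
from collections import defaultdict


def make_terms(n):
    """Build n hypothetical Call(Var(f_k), Lit(i)) terms as nested tuples."""
    return [("Call", ("Var", f"f{i % 10}"), ("Lit", i)) for i in range(n)]


def scan_then_test(terms, callee):
    """No index on the subterm: visit every Call term, then test its field."""
    return [t for t in terms if t[0] == "Call" and t[1] == callee]


def build_subterm_index(terms):
    """Index Call terms by their callee field (the queried subtree)."""
    index = defaultdict(list)
    for t in terms:
        if t[0] == "Call":
            index[t[1]].append(t)
    return index


def indexed_lookup(index, callee):
    """With the index, a query touches only the matching terms."""
    return index.get(callee, [])
```

Inside a join, `scan_then_test` runs once per outer tuple, so the cost grows with the product of the two relation sizes, whereas the index is built once and each probe is a hash lookup.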

arxanas

3 months ago

Ah, yes, but I think Ascent also doesn't index ADTs. In this case, based on some other information, it seems like Soufflé _can_ plan the queries better if it has profiling data. It seems like Ascent just happened to pick a better query plan in my case without the profiling data.

Thanks for the link to the paper!

kmicinski

3 months ago

It's true that Ascent does not index ADTs either, but there are some tricks that you can use when you control the container type to get similar performance by, e.g., storing a pre-computed hash. I believe Arash, the main author of Ascent, was exploiting this trick for Rc<...> members and seeing good performance gains. It is a bit nuanced, you're right that Ascent doesn't pervasively index ADTs out of the box for sure.
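
The pre-computed hash trick can be sketched roughly like this in Python. `HashedTerm` is a hypothetical wrapper for illustration, not Ascent's actual implementation (which is in Rust); the idea is just that the container pays for hashing the nested value once and reuses the result on every lookup.

```python
class HashedTerm:
    """Wraps an immutable nested value and caches its hash at construction."""

    __slots__ = ("value", "_hash")

    def __init__(self, value):
        self.value = value
        # Computed once here, instead of on every hash-table probe.
        self._hash = hash(value)

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        if not isinstance(other, HashedTerm):
            return NotImplemented
        # Cheap rejection on hash mismatch before the deep comparison.
        return self._hash == other._hash and self.value == other.value
```

Relations keyed on `HashedTerm` values can then be probed with an already-known hash, and most non-matching candidates are rejected by the integer comparison alone.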

jitl

3 months ago

For a while the Rust compiler's borrow checker "Polonius" was implemented with datalog using the `datafrog` engine. However, it appears to me that the in-tree version of polonius is moving away from datafrog (I'm not enough of a rustc expert to say for sure which version of the borrow checker engine is in use).

chc4

3 months ago

CodeQL compiles to the Souffle datalog engine and I use it for static analysis. I've also used ascent for a few random side projects in Rust which is very convenient.

touisteur

3 months ago

The work done/supervised by Kristopher Micinski on using HPC hardware (not only GPUs but clusters) for formal methods is really encouraging. I hope we reach a breakthrough of affinity between COTS compute hardware and all kinds of formal methods, as GPUs found theirs with deep learning and subsequent large models.

That would be one possible answer to 'what do we do with all the P100s, V100s, and A100s when they're decommissioned after their AI heyday?' (apart from 'small(er) models').

ux266478

3 months ago

Curious, why use cuda and hip? These frameworks are rather opinionated about kernel design, they seem suboptimal for implementing a language runtime when SPIR-V is right there, particularly in the case of datalog.

lmeyerov

3 months ago

(have been a big fan of this work for years now)

From the nearby perspective of building GFQL, an embeddable oss GPU graph dataframe query language somewhere between cypher and duckdb/pandas/spark, at an even higher-level on top of pandas, cudf, etc:

It's nice using higher-level languages with rich libraries underneath so we can focus on the foundational algorithm & data ecosystem problems while still achieving crazy numbers

cudf gives us optimized GPU joins, so jumping from cheap personal CPU or GPU boxes to 80GB server GPUs, with deep 2B-edge whole-graph queries running in a second without extra work, has been nice :) We want our focus to be on getting regular graph operations fully data-parallel in the way we want while staying easy for users, and on figuring out areas like bigger-than-memory and data lakes, so we want to defer lower-level efforts until the Rust etc. rewrite is more merited. I do see value in starting low when the target value and workload are obvious (e.g., vector indexes / DBs), but when you're breaking new ground at every point, there's value in going where you can roll & extend faster.

touisteur

3 months ago

From their publication history, they want to use all HPC niceties, to use most/any available HPC installations.

Nowadays that means mostly CUDA on NVIDIA and HIP on AMD on the device side. Curious how the spirv support is on NVIDIA GPUs, including nsight tooling and the maturity/performance of libraries available (if only the cub-stuff for collective operations).

embedding-shape

3 months ago

Why is cuda sub-optimal compared to SPIR-V? I don't think I know the internals enough to understand if it's supposed to be obvious why one is better than the other.

I'm currently sitting and learning cuda for ML purposes, so happy to get more educated :)

jb1991

3 months ago

It just depends on how the GPU manufacturer handles code written in different languages: what level of API access and abstraction is exposed, and how the source is compiled, i.e. how well it is optimized. For example, on an Apple GPU, benchmarks show that OpenCL and Metal performance can differ depending on the task.

embedding-shape

3 months ago

Right, but that'd depend a lot on the context, task, hardware and so on.

What the parent said seemed more absolute and less relative, almost positing that there is no point in using cuda (since it's "sub-optimal" and people should obviously use SPIR-V). I was curious about the specifics.

sigbottle

3 months ago

I mean, nvidia exposes some pretty low level primitives, and you can always fiddle with the PTX as deepseek did.

zozbot234

3 months ago

What kind of SPIR-V? The SPIR-V used for compute shaders (Vulkan Compute) is totally different to the one for compute kernels (OpenCL and SYCL)...

Optimizing Datalog for the GPU

26 Comments
