Sparrow: C++20 Idiomatic APIs for the Apache Arrow Columnar Format

32 points, posted 4 days ago
by tanelpoder

14 Comments

levzettelin

14 hours ago

  // You are responsible for releasing the structure in the end
  arrow_array.release(&arrow_array);
This doesn't look like RAII. How is this idiomatic for C++20? Why do you have to pass a pointer to "this" again as an explicit argument?

rfoo

14 hours ago

These are the extracted Arrow C data interface structures, as documented at https://arrow.apache.org/docs/format/CDataInterface.html

It's not how you interact with the data in your own C++ code, it's for passing this data to other in-process consumers (libraries etc). While in the example it calls the release function, this is usually just passed to a downstream consumer and it's their responsibility to call it.

I agree that having such an example as the first one is confusing. Still, given that a large part of the point of Apache Arrow is passing columnar data in memory between libraries in different languages, it makes some sense.
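
A minimal sketch of that hand-off, assuming the `ArrowArray` layout from the C data interface specification. The functions `export_iota`, `release_int32_array` and `consume_sum` are hypothetical names made up for illustration; they are not part of Arrow or Sparrow. It also shows why `release` takes the struct as an explicit argument: the interface is plain C, so the callback is an ordinary function pointer with no implicit `this`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Layout as given in the Arrow C data interface specification.
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  ArrowArray** children;
  ArrowArray* dictionary;
  void (*release)(ArrowArray*);
  void* private_data;
};

// Hypothetical producer-side callback: frees what the producer allocated,
// then marks the struct as released by setting `release` to null.
static void release_int32_array(ArrowArray* array) {
  std::free(const_cast<void*>(array->buffers[1]));  // data buffer
  std::free(array->buffers);                        // table of buffer pointers
  array->release = nullptr;
}

// Hypothetical producer: exports the int32 values 0..n-1 with no nulls.
void export_iota(int64_t n, ArrowArray* out) {
  const auto count = static_cast<std::size_t>(n);
  auto* values = static_cast<int32_t*>(std::malloc(sizeof(int32_t) * count));
  for (std::size_t i = 0; i < count; ++i) values[i] = static_cast<int32_t>(i);

  auto** buffers = static_cast<const void**>(std::malloc(sizeof(void*) * 2));
  buffers[0] = nullptr;  // validity bitmap omitted because null_count == 0
  buffers[1] = values;

  *out = ArrowArray{n, 0, 0, 2, 0, buffers, nullptr, nullptr,
                    release_int32_array, nullptr};
}

// Hypothetical consumer: reads the data, then releases it. The consumer
// never needs to know how the producer allocated the buffers.
int64_t consume_sum(ArrowArray* array) {
  const auto* values = static_cast<const int32_t*>(array->buffers[1]);
  int64_t sum = 0;
  for (int64_t i = 0; i < array->length; ++i) sum += values[array->offset + i];
  array->release(array);
  assert(array->release == nullptr);  // a released struct has a null callback
  return sum;
}
```

The producer and consumer could live in different libraries or language runtimes; the only contract between them is the struct layout and the rule that whoever ends up holding the struct calls `release` exactly once.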

CyberDildonics

9 hours ago

> It's not how you interact with the data in your own C++ code, it's for passing this data to other in-process consumers (libraries etc). While in the example it calls the release function, this is usually just passed to a downstream consumer and it's their responsibility to call it.

This seems like a strange rationalization when you don't need an explicit release to be able to pass it to something else.
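
A rough sketch of what that could look like, assuming the `ArrowArray` layout from the C data interface specification. `OwnedArrowArray`, `export_to`, `counting_release` and `make_empty_array` are hypothetical names for illustration and say nothing about how Sparrow actually does it. The destructor releases by default, and handing the struct to a C consumer is a move that nulls out the source's `release`, which is how the specification describes moving an array.

```cpp
#include <cstdint>

// Layout as given in the Arrow C data interface specification.
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  ArrowArray** children;
  ArrowArray* dictionary;
  void (*release)(ArrowArray*);
  void* private_data;
};

// Hypothetical RAII owner: releases in the destructor unless ownership
// was handed off first.
class OwnedArrowArray {
 public:
  explicit OwnedArrowArray(ArrowArray raw) noexcept : raw_(raw) {}
  OwnedArrowArray(const OwnedArrowArray&) = delete;
  OwnedArrowArray& operator=(const OwnedArrowArray&) = delete;
  OwnedArrowArray(OwnedArrowArray&& other) noexcept : raw_(other.raw_) {
    other.raw_.release = nullptr;  // source is now "released" without a call
  }
  OwnedArrowArray& operator=(OwnedArrowArray&& other) noexcept {
    if (this != &other) {
      reset();
      raw_ = other.raw_;
      other.raw_.release = nullptr;
    }
    return *this;
  }
  ~OwnedArrowArray() { reset(); }

  // Hand the struct to a C consumer, which becomes responsible for release.
  void export_to(ArrowArray* out) noexcept {
    *out = raw_;
    raw_.release = nullptr;
  }

  bool released() const noexcept { return raw_.release == nullptr; }

 private:
  void reset() noexcept {
    if (raw_.release != nullptr) raw_.release(&raw_);
  }
  ArrowArray raw_;
};

// Demo helpers: an empty array whose release callback only counts calls.
inline int g_release_calls = 0;
inline void counting_release(ArrowArray* array) {
  ++g_release_calls;
  array->release = nullptr;
}
inline ArrowArray make_empty_array() {
  return ArrowArray{0, 0, 0, 0, 0, nullptr, nullptr, nullptr,
                    counting_release, nullptr};
}
```

With a wrapper like this, the explicit `release` call only shows up on the consumer side of the C boundary; C++ code that keeps the array simply lets it go out of scope.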

pjmlp

10 hours ago

RAII predates C++98. I was already used to it in Turbo C++ for MS-DOS, and it is a pity we need to keep advocating for it as something extraordinary.

ender341341

4 hours ago

I think you're partly making the point for them: RAII has been idiomatic C++ since before C++ was standardized. Missing it wasn't idiomatic even in C++98, so missing it in a C++20 library definitely still isn't.

CyberDildonics

9 hours ago

This doesn't have anything to do with what they said; they didn't say RAII was new.

pjmlp

8 hours ago

It might be misunderstood that way by readers not skilled in C++ when reading:

> This doesn't look like RAII. How is this idiomatic for C++20?

CyberDildonics

7 hours ago

You can try to be insulting if you want but if you could explain the connection I think you would have already.

pjmlp

5 hours ago

I wasn't.

CyberDildonics

2 hours ago

You weren't what? Who are you saying "isn't skilled in C++" here and why would that matter?

mgaunard

15 hours ago

The official Arrow implementation is already in C++11; I'm not sure what the value proposition of this is.

rfoo

13 hours ago

<rant>The official Arrow C++ implementation is just ergonomic warts, full of `const std::shared_ptr<T>&` bs. Trying to use it to manipulate data always gives me a headache telling apart WTH is an Array, ArrayData, or Buffer, and the typed Array interfaces are barely usable. The original official Rust port inherited all the mis-designs too. On the Rust side someone created arrow2 [0] to fix it.</rant>

And I'm glad there's a good C++ impl too.

[0] https://github.com/jorgecarleitao/arrow2

mgaunard

3 hours ago

That's because a given Arrow column is actually several arrays of arrays.

Array, ArrayData and Buffer map to different layers of the abstraction.