jofer
4 months ago
One of my biggest gripes around typing in python actually revolves around things like numpy arrays and other scientific data structures. Typing in python is great if you're only using builtins or things that the typing system was designed for. But it wasn't designed with scientific data structures particularly in mind. Being able to denote dtype (e.g. uint8 array vs int array) is certainly helpful, but only one aspect.
There's not a good way to say "Expects a 3D array-like" (i.e. something convertible into an array with at least 3 dimensions). Similarly, things like "At least 2 dimensional" or similar just aren't expressible in the type system and potentially could be. You wind up relying on docstrings. Personally, I think typing in docstrings is great. At least for me, IDE (vim) hinting/autocompletion/etc all work already with standard docstrings and strictly typed interpreters are a completely moot point for most scientific computing. What happens in practice is that you have the real info in the docstring and a type "stub" for typing. However, at the point that all of the relevant information about the expected type is going to have to be the docstring, is the additional typing really adding anything?
In short, I'd love to see the ability to indicate expected dimensionality or dimensionality of operation in typing of numpy arrays.
But with that said, I worry that typing for these use cases adds relatively little functionality at the significant expense of readability.
hamasho
4 months ago
I also had a very hard time to annotate types in python few years ago. A lot of popular python libraries like pandas, SQLAlchemy, django, and requests, are so flexible it's almost impossible to infer types automatically without parsing the entire code base. I tried several libraries for typing, often created by other people and not maintained well, but after painful experience it was clear our development was much faster without them while the type safety was not improved much at all.
HPsquared
4 months ago
Have you looked at nptyping? Type hints for ndarray.
https://github.com/ramonhagenaars/nptyping/blob/master/USERD...
kccqzy
4 months ago
What is this project actually for?
Its FAQ states:
> Can MyPy do the instance checking? Because of the dynamic nature of numpy and pandas, this is currently not possible. The checking done by MyPy is limited to detecting whether or not a numpy or pandas type is provided when that is hinted. There are no static checks on shapes, structures or types.
So this is equivalent to not using this library and making all such types np.ndarray/np.dtype etc then.
So we expend effort to coming up with a type system for numpy, and tools cannot statically check types? What good are types if they aren't checked? Just a more concise documentation for humans?
jofer
4 months ago
That one's new to me. Thanks! (With that said, I worry that 3rd party libs are a bad place for types for numpy.)
nerdponx
4 months ago
Numpy ships built-in type hints as well as a type for hinting arrays in your own code (numpy.typing.NDArray).
The real challenge is denoting what you can accept as input. `NDArray[np.floating] | pd.Series[float] | float` is a start but doesn't cover everything especially if you are a library author trying to provide a good type-hinted API.
jofer
4 months ago
I'm aware of that. As explained in the original comment, not being able to denote dimensionality in those is a major limitation.
dwohnitmok
4 months ago
This isn't static, but jaxtyping gives you at least runtime checks and also a standardized form of documenting those types. https://github.com/patrick-kidger/jaxtyping
jofer
4 months ago
It actually doesn't, as far as I know :) It does get close, though. I should give it a deeper look than I have previously, though.
"array-like" has real meaning in the python world and lots of things operate in that world. A very common need in libraries is indicating that things expect something that's either a numpy array or a subclass of one or something that's _convertible_ into a numpy array. That last part is key. E.g. nested lists. Or something with the __array__ interface.
In addition to dimensionality that part doesn't translate well.
And regardless, if the type representation is not standardized across multiple libraries (i.e. in core numpy), there's little value to it.
dwohnitmok
4 months ago
Does it not?
E.g. `UInt8[ArrayLike, "... a b"]` means "an array-like of uint8s that has at least two dimensions". You are opting into jaxtyping's definition of an "array-like", but even though the general concept as you point out is wide spread, there isn't really a single agreed upon formal definition of array-like.
Alternatively, even more loosely as anything that is vaguely container-shaped, `UInt8[Any, "... a b"]`.
jofer
4 months ago
Ah, fair enough! I think I misread some things around the library initially awhile back and have been making incorrect assumptions about it for awhile!
nerdponx
4 months ago
I wonder if we should standardize on __array__ like how Iterable is standardized on the presence of __iter__, which can just return self if the Iterable is already an Iterator.
davnn
4 months ago
I am using runtime type and shape checking and wrote a tiny library to merge both into a single typecheck decorator [1]. It‘s not perfect, but I haven‘t found a better approach yet.
stared
4 months ago
Some time ago I created a demo of named dimensions for Pytorch, https://github.com/stared/pytorch-named-dims
In the same line, I would love to have more Pandas-Pydantic interoperability at the type level.
user
4 months ago
relatall
4 months ago
wisdom in the community. people who can excel at things don’t go to big tech companies because they won’t be appreciated , they will get pipped if they can’t wear any of those chains shackles well
efavdb
4 months ago
Would a custom decorator work for you?
jofer
4 months ago
Unless I'm missing something entirely, what would that add? You still can't express the core information you need in the type system.
efavdb
4 months ago
I meant only that you can insist a parameter has some quality when passed.
jofer
4 months ago
Good point, but I think we're talking past each other a bit.
Typing in python within the scientific world isn't ever used to check types. It's _strictly_ only documentation.
Yes, MyPy and whatnot exist, but not meaningfully. You literally can't use them for anything in this domain (they wont' run any of the code in question).
Types (in this subset of python) are 100% about documentation, 0% about enforcement.
We're setting up a _documentation_ system that can't express the core things it needs to. That worries me. Setting up type _checks_ is a completely different thing and not at all the goal.
efavdb
4 months ago
I see makes sense. Thanks.