Should the vectorized geo_to_h3 stay as unstable? #205

thomasaarholt · 2021-11-09T11:49:49Z

#147 introduced vectorized support for geo_to_h3, which sped things up by about 50% over a python for loop. It's currently under unstable, which raises a warning. It seems that geo_to_h3 gives accurate results, and that its API won't retroactively break in the future. @ajfriend do you agree?

Can I suggest that the functionality gets moved to api instead of unstable? To me, the warning from unstable implied that it may not give correct results, which does not seem to be the case.

The text was updated successfully, but these errors were encountered:

ajfriend · 2021-11-09T17:51:45Z

Maybe we should update the warning, as unstable was meant to imply that the API will probably change in the future. (Happy to accept a PR on this, btw!)

Also, we're planning on a 4.0 release when the core library also moves to 4.0, so many of the function names will change anyway and break backwards compatibility. For 4.0, I would like to settle on how exactly we'd like to implement the vectorized functions. For example, it'd be nice if we could have them work like numpy functions and dispatch correctly to both vector and scalar arguments. (Also happy to accept PRs to play around in the unstable namespace to try out different approaches for this.)

kylebarron · 2022-04-11T18:27:01Z

For 4.0, I would like to settle on how exactly we'd like to implement the vectorized functions.

I think 4.0 is coming up pretty soon 😄 . I'd love to get more of the H3 API to have a vectorized counterpart. My current thoughts on this are:

Add more .pxd files to prevent duplicating Cython wrapper code for vectorized functions. The existing approach for the few vectorized functions calls into h3lib directly.

h3-py/src/h3/_cy/unstable_vect.pyx

Lines 59 to 60 in 034a41e

c = deg2coord(lat[i], lng[i])

out[i] = h3lib.geoToH3(&c, res)

This isn't horrible for simple functions like geo_to_h3, but it's not ideal for other functions that require more Cython wrapper code around h3lib. Adding .pxd definition files (that define headers for our "exported" cpdef functions) for our .pyx implementation files will allow us to import those .pyx files from other Cython files. (This would also be relevant for Example using cimport h3 in external code for speed #152).

So instead of calling h3lib directly, we'd call our own internal wrapper functions:
```
 from .geo import geo_to_h3
 
 ...
 with nogil:
     for i in range(len(lat)):
         out[i] = geo_to_h3(lat[i], lng[i])
```
Manually defining vectorized functions is unavoidable. Unlike the API functions file in Python, the vectorized functions must be defined in Cython. I assume function reuse won't be possible and these functions will just have to be copied to the vectorized API.
Focus on supporting scalar and 1-D numpy data. Pure numpy ufuncs can be used on data of any dimension, but I think the extra complexity of trying to support more than 1D input might not be worth it. I.e. h3_to_geo with 2D input would require 3D output.
Add vectorized functions where easiest, first. Functions with uniform input and output dimension are easiest to support. A vectorized h3_to_geo_boundary would be harder because I assume the number of tuples returned per h3 cell is not constant.
Don't shy away from depending on Numpy headers if necessary. If the vectorized Cython code would be a lot simpler to implement by importing Numpy Cython code, we should embrace that. (I'm not currently sure whether depending on Numpy would make things easier). There's surely a way to modify CMakeLists.txt to compile the vectorized code only if Numpy headers are available, no? And Numpy will easily be available on CI

dfellis · 2022-04-11T18:33:46Z

* Don't shy away from depending on Numpy headers if necessary. If the vectorized Cython code would be a lot simpler to implement by importing Numpy Cython code, we should embrace that. (I'm not currently sure whether depending on Numpy would make things easier). There's surely a way to modify `CMakeLists.txt` to compile the vectorized code only if Numpy headers are available, no? And Numpy will easily be available on CI

This last point has the potential issue of requiring anyone installing h3-py the "normal" way to have numpy installed (unless there's an "optional dependency" concept in Python, too?) It feels like it might break things for users in memory-constrained environments or that have some requirement to continue using an older numpy, etc.

Not sure how big of a problem that is, but one way to avoid it would be to have this repo publish two packages, one that's "pure" h3-py and another that includes the vectorization/numpy/etc stuff?

kylebarron · 2022-04-11T18:50:09Z

This last point has the potential issue of requiring anyone installing h3-py the "normal" way to have numpy installed (unless there's an "optional dependency" concept in Python, too?)

I don't think so. Not sure if by "normal" you mean via wheels or from source.

From wheels: I'm pretty sure that a wheel compiled against Numpy headers would not require Numpy at runtime if that file isn't imported. It would be essentially the same as our current numpy_int API where Numpy is an optional dependency, and so Numpy is only required if you import h3.api.numpy_int.
From source: I'm less familiar with scikit-build, but in a "normal" Cython setup it would be possible to only compile the Numpy Cython files if Numpy headers can be found at compile time. I.e. in my previous experience, in the setup.py file, you would import numpy and find the path to numpy's headers, then pass a list of Cython modules to ext_modules. It wouldn't be hard to simply remove a file from what's passed to ext_modules if numpy couldn't be imported. I assume scikit-build has similar functionality.

kylebarron · 2022-04-11T18:57:02Z

I'm less familiar with scikit-build

From a cursory glance at the scikit-build docs it looks like it would be simple to update this CMakeLists.txt to conditionally add a Cython module only if Numpy was found at build time:

find_package(NumPy MODULE)

...

if (NumPy_FOUND)
    add_cython_file(unstable_vect)
endif()

dfellis · 2022-04-11T19:02:02Z

Got it, though Python being even less strict on optional dependencies than Node is surprising.

rfc-hv · 2022-11-29T23:31:56Z

it looks like the v4.0.0b2 pre release has removed unstable.vect.geo_to_h3. the scalar version of the function has been renamed from geo_to_h3 to latlng_to_cell. unclear if there are vectorised versions anywhere, or if this is outside v4.0 scope

kylebarron mentioned this issue Apr 11, 2022

Add .pxd defs to expose Cython functions #224

Closed

This was referenced Aug 11, 2022

[ENH] Auto-generation of ufuncs cython/cython#4758

Closed

Is it possible to make the Cython functions callable from Cython or Numba? Missing __pyx_capi__ attribute. #273

Open

benbovy mentioned this issue Nov 9, 2023

H3 backend xarray-contrib/xdggs#15

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should the vectorized geo_to_h3 stay as unstable? #205

Should the vectorized geo_to_h3 stay as unstable? #205

thomasaarholt commented Nov 9, 2021

ajfriend commented Nov 9, 2021

kylebarron commented Apr 11, 2022

dfellis commented Apr 11, 2022

kylebarron commented Apr 11, 2022

kylebarron commented Apr 11, 2022

dfellis commented Apr 11, 2022

rfc-hv commented Nov 29, 2022 •

edited

Should the vectorized geo_to_h3 stay as unstable? #205

Should the vectorized geo_to_h3 stay as unstable? #205

Comments

thomasaarholt commented Nov 9, 2021

ajfriend commented Nov 9, 2021

kylebarron commented Apr 11, 2022

dfellis commented Apr 11, 2022

kylebarron commented Apr 11, 2022

kylebarron commented Apr 11, 2022

dfellis commented Apr 11, 2022

rfc-hv commented Nov 29, 2022 • edited

rfc-hv commented Nov 29, 2022 •

edited