dtoolkit.geoaccessor.geoseries.geodistance_matrix

dtoolkit.geoaccessor.geoseries.geodistance_matrix(s: GeoSeries, /, other: GeoSeries | GeoDataFrame | None = None, radius: float = 6371008.771415059) -> DataFrame

Returns a DataFrame containing the matrix of great-circle distances between the points in s and other, computed with the haversine formula.

\[D(x, y) = 2 \arcsin [ \sqrt{ \sin^2 ((y_2 - y_1) / 2) + \cos(y_1) \cos(y_2) \sin^2 ((x_2 - x_1) / 2) } ]\]
Parameters:
other : GeoSeries or GeoDataFrame, default None

If None, uses other=s.

radius : float, default 6371008.7714150598

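Radius of the sphere, in meters. The great-circle distance uses a spherical model of the Earth. The default is the mean Earth radius as defined by the International Union of Geodesy and Geophysics, (2a + b)/3 = 6371008.7714150598 meters for WGS-84, where a and b are the semi-major and semi-minor axes of the ellipsoid.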
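
As a quick check of the arithmetic behind the default radius, the sketch below evaluates (2a + b)/3 from the WGS-84 axes. The constant and function names are illustrative only and are not part of dtoolkit; the semi-minor axis is a rounded value, so the result agrees with the default only to about a micrometer.

```python
# WGS-84 ellipsoid axes in meters (names are hypothetical, for illustration).
WGS84_SEMI_MAJOR = 6378137.0        # a, equatorial radius
WGS84_SEMI_MINOR = 6356752.314245   # b, polar radius (rounded)


def mean_earth_radius(a=WGS84_SEMI_MAJOR, b=WGS84_SEMI_MINOR):
    """Return the IUGG mean radius (2a + b) / 3 in meters."""
    return (2 * a + b) / 3


MEAN_RADIUS = mean_earth_radius()
print(MEAN_RADIUS)  # close to the default value of radius
```

Passing a different radius rescales every distance in the result proportionally, since the haversine formula itself only yields an angle.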

Returns:
DataFrame
  • The index is the index of s and the columns are the index of other.

  • The values are the great-circle distances, in meters.

Raises:
ModuleNotFoundError

If the module ‘sklearn’ is not installed.

ValueError

If the CRS is not EPSG:4326.

TypeError

If other is not a GeoSeries, a GeoDataFrame, or None.

Notes

  • Currently, only Point geometry is supported.

  • The great-circle distance is the angular distance between two points on the surface of a sphere. As the Earth is nearly spherical, the haversine formula provides a good approximation of the distance between two points on the Earth's surface, with an error of less than 1% on average.
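
To make the haversine formula concrete, here is a minimal pure-Python sketch that builds the same kind of distance matrix from (longitude, latitude) pairs in degrees. It is not dtoolkit's implementation; the helper names `haversine` and `distance_matrix` are hypothetical, and x is assumed to be longitude and y latitude.

```python
import math

# Default radius from the function signature, in meters.
MEAN_EARTH_RADIUS = 6371008.771415059


def haversine(lon1, lat1, lon2, lat2, radius=MEAN_EARTH_RADIUS):
    """Great-circle distance between two points given in degrees."""
    x1, y1, x2, y2 = map(math.radians, (lon1, lat1, lon2, lat2))
    h = (
        math.sin((y2 - y1) / 2) ** 2
        + math.cos(y1) * math.cos(y2) * math.sin((x2 - x1) / 2) ** 2
    )
    # Clamp to 1.0 to guard against rounding error near antipodal points.
    return 2 * radius * math.asin(math.sqrt(min(1.0, h)))


def distance_matrix(points, others=None, radius=MEAN_EARTH_RADIUS):
    """Rows follow `points`, columns follow `others` (defaults to `points`)."""
    others = points if others is None else others
    return [[haversine(*p, *q, radius=radius) for q in others] for p in points]


matrix = distance_matrix(
    [(120, 30), (122, 55), (100, 1)],
    [(120, 30), (110, 40)],
)
```

The result has one row per point in the first argument and one column per point in the second, mirroring the index and columns of the returned DataFrame. A nested loop like this is fine for a handful of points, but a vectorized implementation is preferable for large inputs.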

Examples

>>> import dtoolkit.geoaccessor
>>> import pandas as pd
>>> df = pd.DataFrame(
...     {
...         "x": [120, 122, 100],
...         "y": [30, 55, 1],
...     },
... ).from_xy("x", "y", crs=4326)
>>> df
     x   y                    geometry
0  120  30  POINT (120.00000 30.00000)
1  122  55  POINT (122.00000 55.00000)
2  100   1   POINT (100.00000 1.00000)
>>> other = pd.DataFrame(
...     {
...         "x": [120, 110],
...         "y": [30, 40],
...     },
... ).from_xy("x", "y", crs=4326)
>>> other
     x   y                    geometry
0  120  30  POINT (120.00000 30.00000)
1  110  40  POINT (110.00000 40.00000)
>>> df.geodistance_matrix(other)
              0             1
0  0.000000e+00  1.435335e+06
1  2.784435e+06  1.889892e+06
2  3.855604e+06  4.453100e+06