dtoolkit.transformer.GeoKMeans

class dtoolkit.transformer.GeoKMeans(n_clusters=8, *, init='k-means++', n_init='auto', max_iter=300, tol=0.0001, verbose=0, random_state=None, copy_x=True, algorithm='elkan')

Spatial K-Means clustering.

Distances between points are computed with the haversine formula. Parameters and attributes are the same as those of sklearn.cluster.KMeans.
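The haversine distance mentioned above can be sketched in plain Python as follows. This is only an illustration of the formula, not the code GeoKMeans runs internally. The function name `haversine_km`, the constant `EARTH_RADIUS_KM`, and the choice of kilometres as the unit are assumptions made for this sketch.

```python
import math

# Mean Earth radius in kilometres (an assumption of this sketch).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(p, q):
    """Great-circle distance in km between two (longitude, latitude) points given in degrees."""
    lon1, lat1 = map(math.radians, p)
    lon2, lat2 = map(math.radians, q)
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # Haversine of the central angle between the two points.
    a = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Unlike Euclidean distance on raw degrees, this accounts for the fact that one degree of longitude covers less ground at higher latitudes.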

Raises:
ValueError

If the input is not in the form of [(longitude, latitude)].
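The page does not spell out which checks trigger the ValueError. As a rough sketch, a caller could pre-validate data before fitting with a helper like the one below. The name `check_lonlat` and the specific range checks are hypothetical and are not part of dtoolkit; the library's own validation may differ.

```python
def check_lonlat(X):
    """Hypothetical pre-check: raise ValueError unless X holds (longitude, latitude) pairs in degrees."""
    for i, point in enumerate(X):
        if len(point) != 2:
            raise ValueError(
                f"row {i}: expected (longitude, latitude), got {len(point)} values"
            )
        lon, lat = point
        # Longitude spans [-180, 180] degrees, latitude spans [-90, 90] degrees.
        if not -180 <= lon <= 180:
            raise ValueError(f"row {i}: longitude {lon} is outside [-180, 180]")
        if not -90 <= lat <= 90:
            raise ValueError(f"row {i}: latitude {lat} is outside [-90, 90]")
    return X
```

A check like this catches a common mistake, passing (latitude, longitude) instead of (longitude, latitude), whenever the swapped value falls outside the valid latitude range.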

See also

sklearn.cluster.KMeans

Original implementation of K-Means clustering.

Notes

algorithm is fixed to "elkan" because only the Elkan algorithm can support a custom distance.

Examples

>>> from dtoolkit.transformer import GeoKMeans
>>> X = [
...     [113.615822, 37.844797],
...     [113.586288, 37.917018],
...     [113.630711, 37.865369],
...     [113.590684, 37.948056],
...     [113.631483, 37.862634],
...     [113.57413, 37.968669],
...     [113.663159, 37.848446],
...     [113.586941, 37.868116],
...     [113.679381, 37.875028],
...     [113.5706, 37.973542],
...     [113.585504, 37.879261],
...     [113.584412, 37.935521],
...     [113.575964, 37.906472],
...     [113.593658, 37.848911],
...     [113.633605, 37.869107],
...     [113.582298, 37.857025],
...     [113.629378, 37.805196],
...     [113.48768, 37.872603],
...     [113.477766, 37.868846],
... ]
>>> geokmeans = GeoKMeans(n_clusters=2, random_state=0).fit(X)
>>> geokmeans.labels_
array([0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0],
      dtype=int32)
>>> geokmeans.cluster_centers_
array([[113.59979892,  37.85887223],
       [113.58034633,  37.94154633]])