dtoolkit.transformer.GeoKMeans

class dtoolkit.transformer.GeoKMeans(n_clusters=8, *, init='k-means++', n_init='warn', max_iter=300, tol=0.0001, verbose=0, random_state=None, copy_x=True, algorithm='elkan')

Spatial K-Means clustering.

Distances between points are computed with the haversine formula. Parameters and attributes are the same as those of sklearn.cluster.KMeans.
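
For readers unfamiliar with the haversine formula, the following is a minimal standalone sketch of it, not GeoKMeans's internal implementation. The function name `haversine_km` and the Earth-radius constant are assumptions chosen for illustration. Inputs follow the (longitude, latitude) order in degrees that this class expects.

```python
import math

# Mean Earth radius in kilometres; an assumed constant for this sketch.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in degrees."""
    # Convert degrees to radians before applying the trigonometric functions.
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # Haversine of the central angle between the two points.
    a = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    # Central angle times radius gives the arc length.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Unlike Euclidean distance on raw degrees, this accounts for the fact that one degree of longitude covers less ground at higher latitudes.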

Raises:

ValueError: If the input is not in the form of [(longitude, latitude)].
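
A common cause of this error is passing (latitude, longitude) instead of (longitude, latitude). The helper below is a hypothetical pre-check you could run before calling `fit`. It is a sketch that assumes the usual degree ranges for longitude and latitude, and it may not match the exact validation GeoKMeans performs.

```python
def check_lon_lat(X):
    """Raise ValueError unless every row of X is a (longitude, latitude) pair.

    Hypothetical helper for illustration; the ranges checked here are the
    standard degree ranges, assumed rather than taken from the library.
    """
    for i, row in enumerate(X):
        if len(row) != 2:
            raise ValueError(
                f"row {i}: expected (longitude, latitude), got {len(row)} values"
            )
        lon, lat = row
        if not -180 <= lon <= 180:
            raise ValueError(f"row {i}: longitude {lon} is outside [-180, 180]")
        if not -90 <= lat <= 90:
            raise ValueError(f"row {i}: latitude {lat} is outside [-90, 90]")
    return X
```

A range check like this only catches swapped columns when the longitude lies outside [-90, 90], as it does for the sample coordinates in the Examples section.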

See also

sklearn.cluster.KMeans: Original implementation of K-Means clustering.

Notes

The algorithm parameter is fixed to "elkan" because only the Elkan algorithm supports a custom distance.

Examples

>>> from dtoolkit.transformer import GeoKMeans
>>> X = [
...     [113.615822, 37.844797],
...     [113.586288, 37.917018],
...     [113.630711, 37.865369],
...     [113.590684, 37.948056],
...     [113.631483, 37.862634],
...     [113.57413, 37.968669],
...     [113.663159, 37.848446],
...     [113.586941, 37.868116],
...     [113.679381, 37.875028],
...     [113.5706, 37.973542],
...     [113.585504, 37.879261],
...     [113.584412, 37.935521],
...     [113.575964, 37.906472],
...     [113.593658, 37.848911],
...     [113.633605, 37.869107],
...     [113.582298, 37.857025],
...     [113.629378, 37.805196],
...     [113.48768, 37.872603],
...     [113.477766, 37.868846],
... ]
>>> geokmeans = GeoKMeans(n_clusters=2, random_state=0, n_init="auto").fit(X)
>>> geokmeans.labels_
array([1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0],
      dtype=int32)
>>> geokmeans.cluster_centers_
array([[113.5559405 ,  37.92384087],
       [113.62108545,  37.85671727]])
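
To see how the labels relate to the centers, the sketch below assigns a point to whichever center is nearest by haversine distance. It is a pure-Python illustration that does not call GeoKMeans; the helper names `haversine` and `nearest_center` are assumptions, and the centers are copied from the output above.

```python
import math


def haversine(p, q):
    """Central angle in radians between two (longitude, latitude) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * math.asin(math.sqrt(a))


def nearest_center(point, centers):
    """Index of the center with the smallest haversine distance to point."""
    return min(range(len(centers)), key=lambda i: haversine(point, centers[i]))


# Cluster centers copied from the example output above.
centers = [
    [113.5559405, 37.92384087],
    [113.62108545, 37.85671727],
]

first = nearest_center([113.615822, 37.844797], centers)   # first sample in X
second = nearest_center([113.586288, 37.917018], centers)  # second sample in X
```

Here `first` is 1 and `second` is 0, matching the first two entries of `geokmeans.labels_`.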

Methods

`fit`(X[, y, sample_weight])	Compute k-means clustering.
`fit_predict`(X[, y, sample_weight])	Compute cluster centers and predict cluster index for each sample.
`fit_transform`(X[, y, sample_weight])	Compute clustering and transform X to cluster-distance space.
`get_feature_names_out`([input_features])	Get output feature names for transformation.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`predict`(X[, sample_weight])	Predict the closest cluster each sample in X belongs to.
`score`(X[, y, sample_weight])	Opposite of the value of X on the K-means objective.
`set_fit_request`(*[, sample_weight])	Request metadata passed to the `fit` method.
`set_output`(*[, transform])	Set output container.
`set_params`(**params)	Set the parameters of this estimator.
`set_predict_request`(*[, sample_weight])	Request metadata passed to the `predict` method.
`set_score_request`(*[, sample_weight])	Request metadata passed to the `score` method.
`transform`(X)	Transform X to a cluster-distance space.