dtoolkit.transformer.GeoKMeans

class dtoolkit.transformer.GeoKMeans(n_clusters=8, *, init='k-means++', n_init='warn', max_iter=300, tol=0.0001, verbose=0, random_state=None, copy_x=True, algorithm='elkan')

Spatial K-Means clustering.

Distances are computed with the haversine formula. Parameters and attributes are the same as those of sklearn.cluster.KMeans.
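
For readers unfamiliar with the haversine formula, the following is a minimal standalone sketch of it, not the code GeoKMeans runs internally. The function name haversine_km and the Earth radius constant are assumptions made for illustration; the (longitude, latitude) ordering follows the input format described on this page.

```python
import math

# Assumed mean Earth radius in kilometres; this value is not taken from dtoolkit.
EARTH_RADIUS_KM = 6371.0


def haversine_km(point_a, point_b):
    """Great-circle distance in km between two (longitude, latitude) pairs in degrees."""
    lon1, lat1 = map(math.radians, point_a)
    lon2, lat2 = map(math.radians, point_b)
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # Haversine of the central angle between the two points.
    h = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))
```

Unlike Euclidean distance on raw degrees, this accounts for meridians converging toward the poles, so one degree of longitude counts for less at high latitudes.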

Raises:
ValueError

If the input is not in the form of [(longitude, latitude)].
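
How GeoKMeans validates its input is not stated here. The sketch below shows one plausible check, assuming that a valid row is a pair whose longitude lies in [-180, 180] and whose latitude lies in [-90, 90]. The helper check_lon_lat is hypothetical and is not part of dtoolkit.

```python
def check_lon_lat(X):
    """Hypothetical validator: raise ValueError unless every row is (longitude, latitude)."""
    for row in X:
        if len(row) != 2:
            raise ValueError(f"expected a (longitude, latitude) pair, got {row!r}")
        lon, lat = row
        # Assumed valid ranges in degrees; the library's actual rule may differ.
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            raise ValueError(f"(longitude, latitude) out of range: {row!r}")
    return X
```

Under this assumed rule, a row given in (latitude, longitude) order, such as [37.844797, 113.615822], is rejected because 113.615822 is not a valid latitude.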

See also

sklearn.cluster.KMeans

Original implementation of K-Means clustering.

Notes

algorithm is fixed to "elkan" because only the Elkan algorithm can support a custom distance.

Examples

>>> from dtoolkit.transformer import GeoKMeans
>>> X = [
...     [113.615822, 37.844797],
...     [113.586288, 37.917018],
...     [113.630711, 37.865369],
...     [113.590684, 37.948056],
...     [113.631483, 37.862634],
...     [113.57413, 37.968669],
...     [113.663159, 37.848446],
...     [113.586941, 37.868116],
...     [113.679381, 37.875028],
...     [113.5706, 37.973542],
...     [113.585504, 37.879261],
...     [113.584412, 37.935521],
...     [113.575964, 37.906472],
...     [113.593658, 37.848911],
...     [113.633605, 37.869107],
...     [113.582298, 37.857025],
...     [113.629378, 37.805196],
...     [113.48768, 37.872603],
...     [113.477766, 37.868846],
... ]
>>> geokmeans = GeoKMeans(n_clusters=2, random_state=0, n_init="auto").fit(X)
>>> geokmeans.labels_
array([1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0],
      dtype=int32)
>>> geokmeans.cluster_centers_
array([[113.5559405 ,  37.92384087],
       [113.62108545,  37.85671727]])
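
One way to read labels_ is as the index of the nearest center under haversine distance. The standalone sketch below illustrates that reading with the centers printed above. It assumes nearest-center assignment, which is inferred from the class description rather than from the library source, and the helpers central_angle and nearest_center are hypothetical names.

```python
import math


def central_angle(a, b):
    """Haversine central angle in radians between two (longitude, latitude) pairs in degrees."""
    lon1, lat1 = map(math.radians, a)
    lon2, lat2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * math.asin(min(1.0, math.sqrt(h)))


def nearest_center(point, centers):
    """Index of the center with the smallest haversine central angle to point."""
    return min(range(len(centers)), key=lambda i: central_angle(point, centers[i]))


# Centers copied from the cluster_centers_ output above.
centers = [[113.5559405, 37.92384087], [113.62108545, 37.85671727]]

first_label = nearest_center([113.615822, 37.844797], centers)
tenth_label = nearest_center([113.5706, 37.973542], centers)
```

Here first_label and tenth_label come out as 1 and 0, matching the first and tenth entries of labels_ in the example.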

Methods

fit(X[, y, sample_weight])

Compute k-means clustering.

fit_predict(X[, y, sample_weight])

Compute cluster centers and predict cluster index for each sample.

fit_transform(X[, y, sample_weight])

Compute clustering and transform X to cluster-distance space.

get_feature_names_out([input_features])

Get output feature names for transformation.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

predict(X[, sample_weight])

Predict the closest cluster each sample in X belongs to.

score(X[, y, sample_weight])

Opposite of the value of X on the K-means objective.

set_fit_request(*[, sample_weight])

Request metadata passed to the fit method.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

set_predict_request(*[, sample_weight])

Request metadata passed to the predict method.

set_score_request(*[, sample_weight])

Request metadata passed to the score method.

transform(X)

Transform X to a cluster-distance space.