dtoolkit.accessor.series.textdistance

dtoolkit.accessor.series.textdistance(s: Series, /, other: str | Series, method: Callable = None, align: bool = True, **kwargs) → Series

Return a Series containing the text distance between each element of s and other (aligned on the index by default).

Parameters:
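other : None, str or Series

The value(s) to compare with. A str is compared with every element of s; a Series is compared element-wise with s.
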
align : bool, default True

If True, automatically align s and other on their indices before comparing them. If False, elements are compared by position and the index is ignored. Only relevant when other is a Series.

method : Callable, default None

The function used to calculate the distance. Each element of s and the corresponding value of other are passed to it as its first and second positional arguments. If None, rapidfuzz.fuzz.ratio() is used. Functions from rapidfuzz.fuzz and rapidfuzz.distance are recommended.

**kwargs

Additional keyword arguments passed to method.
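The calling convention for method and **kwargs can be sketched without rapidfuzz or dtoolkit. Both apply_method and prefix_distance below are hypothetical stand-ins, not part of either library; the sketch only shows that each pair of strings arrives as the first two positional arguments and that extra keyword arguments are forwarded unchanged.

```python
from typing import Callable


def prefix_distance(a: str, b: str, *, case_sensitive: bool = True) -> int:
    """Hypothetical metric: characters left over after the longest common prefix."""
    if not case_sensitive:
        a, b = a.lower(), b.lower()
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


def apply_method(values: list, other: list, method: Callable, **kwargs) -> list:
    """Hypothetical sketch: call method(left, right, **kwargs) for each pair.

    zip stops at the shorter input, so both lists are assumed to be equally long.
    """
    return [method(left, right, **kwargs) for left, right in zip(values, other)]


left = ["Hello", "world"]
right = ["hello", "word"]
print(apply_method(left, right, prefix_distance))
# The keyword argument below is forwarded untouched to prefix_distance.
print(apply_method(left, right, prefix_distance, case_sensitive=False))
```

Under the same convention, a keyword accepted by a rapidfuzz scorer, such as processor on rapidfuzz.fuzz.ratio(), would travel through **kwargs to method.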

Returns:
Series

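The value returned by method for each compared pair. With the default rapidfuzz.fuzz.ratio(), this is a similarity score from 0 to 100, where 100 means the two strings are identical.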
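With the default method the values are similarity scores, not distances. rapidfuzz documents fuzz.ratio as the normalized Indel similarity on a 0 to 100 scale, i.e. 100 * (1 - indel / (len(a) + len(b))), where indel is the number of insertions and deletions needed to turn one string into the other. The plain-Python reimplementation below is only an illustration of that formula, not rapidfuzz's code; the helper names lcs_length and indel_ratio are hypothetical. It reproduces the numbers in the examples (36.363636 and 18.181818 against "python").

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (classic dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    """Normalized Indel similarity on a 0 to 100 scale (what fuzz.ratio reports)."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0  # two empty strings are treated as identical
    # Every character outside the LCS must be inserted or deleted once.
    indel_distance = total - 2 * lcs_length(a, b)
    return 100.0 * (1 - indel_distance / total)


for word in ["hello", "world"]:
    print(word, round(indel_ratio(word, "python"), 6))
```

For "hello" and "python" the longest common subsequence is "ho", so indel = 11 - 4 = 7 and the score is 100 * 4/11.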

Raises:
ModuleNotFoundError

If the module rapidfuzz is not installed.

TypeError
  • If s is not of string dtype.

  • If other is not of string dtype.

ValueError

If other’s length is not equal to the length of s.
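The TypeError and ValueError conditions above can be expressed as a short pre-check. check_inputs is a hypothetical sketch, not dtoolkit's implementation; it assumes the dtype and length checks on other apply only when other is a Series, and it uses pandas.api.types.is_string_dtype as the string-dtype test. The ModuleNotFoundError case (the rapidfuzz import) is left out so the sketch runs without rapidfuzz installed.

```python
from __future__ import annotations

import pandas as pd
from pandas.api.types import is_string_dtype


def check_inputs(s: pd.Series, other: str | pd.Series) -> None:
    """Hypothetical pre-check mirroring the documented TypeError/ValueError cases."""
    if not is_string_dtype(s):
        raise TypeError(f"Expected s to have string dtype, got {s.dtype}.")
    if isinstance(other, pd.Series):
        if not is_string_dtype(other):
            raise TypeError(f"Expected other to have string dtype, got {other.dtype}.")
        if len(other) != len(s):
            raise ValueError(f"Length mismatch: len(s)={len(s)}, len(other)={len(other)}.")


# Valid inputs pass silently.
check_inputs(pd.Series(["hello", "world"]), "python")
check_inputs(pd.Series(["hello", "world"]), pd.Series(["hello", "python"]))
```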

See also

rapidfuzz.fuzz
rapidfuzz.distance
textdistance_matrix

Notes

The result of comparing with a None or NaN value depends on method.
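Because missing-value handling is left to method, one option is to wrap method before passing it in so the behavior is explicit. The propagate_missing wrapper and the toy exact_match method below are hypothetical; returning NaN for missing inputs is one possible design choice, not something this function prescribes. The check covers only None and float NaN (pandas.NA, for example, is not handled).

```python
import math
from functools import wraps
from typing import Callable


def propagate_missing(method: Callable) -> Callable:
    """Hypothetical wrapper: return NaN if either input is None or float NaN."""

    def is_missing(value) -> bool:
        return value is None or (isinstance(value, float) and math.isnan(value))

    @wraps(method)
    def wrapper(a, b, **kwargs):
        if is_missing(a) or is_missing(b):
            return float("nan")
        # Both inputs are present: defer to the wrapped method unchanged.
        return method(a, b, **kwargs)

    return wrapper


def exact_match(a: str, b: str) -> float:
    """Toy method: 100 for identical strings, 0 otherwise."""
    return 100.0 if a == b else 0.0


safe_match = propagate_missing(exact_match)
print(safe_match("hello", "hello"))
print(safe_match("hello", None))
```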

Examples

>>> import dtoolkit
>>> import pandas as pd
>>> s = pd.Series(["hello", "world"])
>>> s
0    hello
1    world
dtype: object
>>> s.textdistance("python")
0    36.363636
1    18.181818
dtype: float64
>>> s.textdistance(pd.Series(["hello", "python"]))
0    100.000000
1     18.181818
dtype: float64
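In the examples above both Series share the default index 0, 1, so align makes no difference. The sketch below needs only pandas and shows a case where it does: the labels of other are in a different order. It assumes the pairing follows pandas' Series.align when align=True and plain position when align=False; pair_up is a hypothetical helper that returns the pairs of strings that would be compared and computes no distance.

```python
from __future__ import annotations

import pandas as pd


def pair_up(s: pd.Series, other: pd.Series, align: bool = True) -> list[tuple]:
    """Hypothetical helper: return the (left, right) pairs that would be compared."""
    if align:
        # Match elements by index label, as pandas.Series.align does.
        left, right = s.align(other)
        return list(zip(left.tolist(), right.tolist()))
    # Ignore the index and pair elements by position.
    return list(zip(s.tolist(), other.tolist()))


s = pd.Series(["hello", "world"], index=[0, 1])
other = pd.Series(["world", "hello"], index=[1, 0])

print(pair_up(s, other, align=True))   # label 0 with label 0, label 1 with label 1
print(pair_up(s, other, align=False))  # first with first, second with second
```

With align=True each string meets its identical counterpart, so the default method would score both pairs 100; with align=False the mismatched pairs would score lower.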