dtoolkit.accessor.series.textdistance_matrix

dtoolkit.accessor.series.textdistance_matrix(s: Series, /, other: None | Series = None, method: Callable = None, **kwargs) → DataFrame

Return a DataFrame containing the matrix of text distances between the elements of s and other.

Parameters:
other : None or Series, default None

The Series to compare against. If None, s is compared against itself.

method : Callable, default None

The function used to calculate the distance. It is called with the two strings to compare as its first and second positional arguments. If None, rapidfuzz.fuzz.ratio() is used. The functions in rapidfuzz.fuzz and rapidfuzz.distance are recommended.

**kwargs

Additional keyword arguments passed to method.
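
The calling convention described for method and **kwargs can be sketched in plain Python. This is an illustration only, not dtoolkit's implementation: pairwise_matrix and shared_prefix are hypothetical names introduced here, and the sketch assumes the keyword arguments are forwarded unchanged on every call.

```python
from typing import Callable


def pairwise_matrix(left, right, method: Callable, **kwargs):
    """Hypothetical helper: score every item of `left` against every item of `right`."""
    # Row i, column j holds method(left[i], right[j], **kwargs).
    return [[method(a, b, **kwargs) for b in right] for a in left]


def shared_prefix(a: str, b: str, *, ignore_case: bool = False) -> int:
    """Illustrative method: length of the common prefix of two strings."""
    if ignore_case:
        a, b = a.lower(), b.lower()
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


# The two strings are passed positionally; ignore_case travels via **kwargs.
matrix = pairwise_matrix(
    ["Hello", "world"], ["help", "WORD"], shared_prefix, ignore_case=True
)
print(matrix)  # [[3, 0], [0, 3]]
```

Any callable with this shape (two positional strings, optional keyword arguments, a numeric result) should fit the method parameter under the assumption above.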

Returns:
DataFrame

Rows correspond to the elements of s and columns to the elements of other. Each value is the text distance between that pair.

Raises:
ModuleNotFoundError

If the ‘rapidfuzz’ module is not installed.

TypeError
  • If s is not string dtype.

  • If other is not string dtype.

See also

rapidfuzz.fuzz
rapidfuzz.distance
textdistance

Notes

The result of comparing against a None or NaN value depends on the method.
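
Because the handling of missing values is left to the method, one option is to make it explicit with a small wrapper. The sketch below is an assumption-laden illustration: with_missing, is_missing and same_length are hypothetical names, and it does not describe how rapidfuzz or dtoolkit treat missing values.

```python
import math
from typing import Any, Callable


def is_missing(value: Any) -> bool:
    """True for None and for float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def with_missing(method: Callable, fill: float = float("nan")) -> Callable:
    """Hypothetical wrapper: return `fill` instead of calling `method` on missing input."""

    def wrapped(a, b, **kwargs):
        if is_missing(a) or is_missing(b):
            return fill
        return method(a, b, **kwargs)

    return wrapped


def same_length(a: str, b: str) -> float:
    """Toy method used only for this sketch."""
    return 100.0 if len(a) == len(b) else 0.0


safe = with_missing(same_length, fill=0.0)
print(safe("hello", "world"))   # 100.0
print(safe("hello", None))      # 0.0
print(safe(float("nan"), "x"))  # 0.0
```

A wrapper like this keeps the result for missing values the same regardless of which underlying method is chosen.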

Examples

>>> import dtoolkit
>>> import pandas as pd
>>> s = pd.Series(["hello", "world"])
>>> s
0    hello
1    world
dtype: object
>>> s.textdistance_matrix(pd.Series(["hello", "python"]))
       0          1
0  100.0  36.363636
1   20.0  18.181818
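
The values in the output above can be checked by hand. rapidfuzz.fuzz.ratio() is documented as a normalized Indel similarity on a 0 to 100 scale; the pure-Python sketch below reimplements that idea through the longest common subsequence. It is an illustrative reimplementation, not rapidfuzz's code: lcs_length and indel_ratio are hypothetical names, and the score for two empty strings is a choice made for this sketch.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    previous = [0] * (len(b) + 1)
    for x in a:
        current = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def indel_ratio(a: str, b: str) -> float:
    """Normalized Indel similarity on a 0-100 scale (illustrative)."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0  # assumption of this sketch: two empty strings are identical
    # Indel distance is total - 2 * LCS, so similarity is 2 * LCS / total.
    return 100.0 * 2 * lcs_length(a, b) / total


left = ["hello", "world"]
right = ["hello", "python"]
# Rows follow `left`, columns follow `right`, as in the DataFrame above.
matrix = [[indel_ratio(a, b) for b in right] for a in left]
for row in matrix:
    print([round(value, 6) for value in row])
```

For example, "hello" and "python" share the subsequence "ho" (length 2) out of 11 characters in total, which gives 100 × 4 / 11 ≈ 36.363636.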