dtoolkit.accessor.series.textdistance_matrix
dtoolkit.accessor.series.textdistance_matrix(s: Series, /, other: None | Series = None, method: Callable = None, **kwargs) → DataFrame
Returns a DataFrame containing the text distance matrix between the elements of s and the elements of other.
- Parameters:
- other : None or Series, default None
If None, use s, i.e. compare s with itself.
- method : Callable, default None
The method used to calculate the distance. The two values being compared are passed as its first and second positional arguments. If None, rapidfuzz.fuzz.ratio() is used. Methods in rapidfuzz.fuzz and rapidfuzz.distance are recommended.
- **kwargs
Additional keyword arguments passed to method.
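As described above, the matrix is built by calling method once per pair of values, with **kwargs forwarded on every call. The following is a minimal pandas sketch of that idea, not dtoolkit's actual implementation: textdistance_matrix_sketch and the toy metric length_gap are hypothetical names, and the sketch assumes the default is resolved by lazily importing rapidfuzz, so a missing rapidfuzz only surfaces as ModuleNotFoundError when no method is given.

```python
from typing import Callable, Optional

import pandas as pd


def textdistance_matrix_sketch(
    s: pd.Series,
    other: Optional[pd.Series] = None,
    method: Optional[Callable] = None,
    **kwargs,
) -> pd.DataFrame:
    # Hypothetical re-implementation for illustration only.
    if other is None:
        other = s  # compare s with itself, as documented for other=None
    if method is None:
        # Lazy import: raises ModuleNotFoundError if rapidfuzz is not installed.
        from rapidfuzz import fuzz

        method = fuzz.ratio
    # Row i, column j holds method(i-th value of s, j-th value of other, **kwargs).
    data = [[method(a, b, **kwargs) for b in other] for a in s]
    return pd.DataFrame(data, index=s.index, columns=other.index)


def length_gap(a: str, b: str, *, scale: float = 1.0) -> float:
    # Hypothetical toy metric: absolute difference in string length, times scale.
    return abs(len(a) - len(b)) * scale


s = pd.Series(["hello", "world"])
# scale=0.5 is forwarded to length_gap through **kwargs on every call.
result = textdistance_matrix_sketch(
    s, pd.Series(["hi", "python"]), method=length_gap, scale=0.5
)
print(result)
```

Both rows of result come out as [1.5, 0.5], since "hello" and "world" have the same length. The nested comprehension calls method len(s) * len(other) times, so the cost grows with the product of the two lengths; for large inputs, rapidfuzz also offers rapidfuzz.process.cdist, which computes such a matrix in compiled code.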
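- Returns:
- DataFrame
The values are the text distances. Rows correspond to the elements of s and columns to the elements of other, so the cell in row i, column j holds the value computed for the i-th element of s and the j-th element of other. With the default rapidfuzz.fuzz.ratio(), the values are similarity scores between 0 and 100, where 100 means the two strings are identical.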
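To make the default values concrete: rapidfuzz documents fuzz.ratio() as a normalized Indel similarity scaled to 0-100, which works out to 100 * 2 * LCS(a, b) / (len(a) + len(b)), where LCS is the length of the longest common subsequence. The pure-Python sketch below reproduces the values shown in the Examples section; the helper names lcs_length and indel_ratio are hypothetical, and rapidfuzz options such as processor or score_cutoff are not modeled.

```python
def lcs_length(a: str, b: str) -> int:
    # Dynamic-programming longest common subsequence, O(len(a) * len(b)) time.
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    # Normalized Indel similarity scaled to 0-100 (assumed to match fuzz.ratio).
    total = len(a) + len(b)
    if total == 0:
        return 100.0  # this sketch treats two empty strings as identical
    return 100.0 * 2 * lcs_length(a, b) / total


rows = ["hello", "world"]   # values of s in the Examples section
cols = ["hello", "python"]  # values of other in the Examples section
matrix = [[round(indel_ratio(a, b), 6) for b in cols] for a in rows]
print(matrix)  # [[100.0, 36.363636], [20.0, 18.181818]]
```

For example, "hello" and "python" share the subsequence "ho", giving 100 * 2 * 2 / 11 ≈ 36.36, while "world" and "hello" share only one character, giving 100 * 2 * 1 / 10 = 20.0.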
- Raises:
- ModuleNotFoundError
If the module ‘rapidfuzz’ is not installed.
- TypeError
If s is not string dtype, or if other is not string dtype.
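One way the string-dtype check listed under Raises could be implemented is sketched below. This is an assumption, not dtoolkit's actual code: the helper check_string_series is hypothetical, and it treats a Series as string data when pandas.api.types.infer_dtype, skipping missing values, reports "string".

```python
import pandas as pd


def check_string_series(s: pd.Series, name: str = "s") -> None:
    # Hypothetical validation: every non-missing value must be a str,
    # in which case infer_dtype(..., skipna=True) reports "string".
    inferred = pd.api.types.infer_dtype(s, skipna=True)
    if inferred != "string":
        raise TypeError(
            f"{name!r} must be string dtype, got inferred type {inferred!r}."
        )


check_string_series(pd.Series(["hello", "world"]))  # passes silently
try:
    check_string_series(pd.Series([1, 2]), name="other")
except TypeError as err:
    print(err)
```

Note that under this sketch an empty Series infers as "empty" and would be rejected; whether dtoolkit treats empty input the same way is not stated here.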
See also
rapidfuzz.fuzz
rapidfuzz.distance
textdistance
Notes
The result of comparing with a None or NaN value depends on the method.
Examples
>>> import dtoolkit
>>> import pandas as pd
>>> s = pd.Series(["hello", "world"])
>>> s
0    hello
1    world
dtype: object
>>> s.textdistance_matrix(pd.Series(["hello", "python"]))
       0          1
0  100.0  36.363636
1   20.0  18.181818
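Since the Notes leave None/NaN handling to method, one option is to wrap method so that missing values short-circuit to NaN before the scorer ever sees them. The decorator skip_missing below is a hypothetical helper, not part of dtoolkit or rapidfuzz; it relies only on pandas.isna to detect None and NaN, and uses a toy length metric in place of a real scorer.

```python
import functools
import math
from typing import Callable

import pandas as pd


def skip_missing(method: Callable[..., float]) -> Callable[..., float]:
    # Hypothetical wrapper: return NaN when either value is None/NaN,
    # otherwise defer to the wrapped method, forwarding keyword arguments.
    @functools.wraps(method)
    def wrapper(a, b, **kwargs):
        if pd.isna(a) or pd.isna(b):
            return math.nan
        return method(a, b, **kwargs)

    return wrapper


@skip_missing
def safe_length_gap(a: str, b: str) -> float:
    # Toy metric standing in for a real scorer such as rapidfuzz.fuzz.ratio.
    return float(abs(len(a) - len(b)))


print(safe_length_gap("hello", "hi"))        # 3.0
print(safe_length_gap("hello", None))        # nan
print(safe_length_gap(float("nan"), "hi"))   # nan
```

With dtoolkit and rapidfuzz installed, such a wrapper would be passed as the method argument, e.g. s.textdistance_matrix(other, method=skip_missing(fuzz.ratio)), assuming the string-dtype check accepts Series that contain missing values.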