dtoolkit.accessor.dataframe.filter_in

dtoolkit.accessor.dataframe.filter_in(df: DataFrame, condition: Union[Iterable, Series, DataFrame, dict[Hashable, list[Hashable]]], /, how: Literal['any', 'all'] = 'all', complement: bool = False) -> DataFrame

Filter DataFrame contents.

Similar to isin(), but returns the matching rows themselves instead of a boolean mask.

Parameters
condition : Iterable, Series, DataFrame or dict

The values to match against. Rows are selected according to whether their values appear in condition.

  • If condition is a dict, its keys must be column names of the DataFrame, and how is applied only to the given keys.

  • If condition is a Series, values are matched by index label.

  • If condition is a DataFrame, both the index and the column labels must match.

how : {‘any’, ‘all’}, default ‘all’

Determine whether a row is selected when at least one of its values, or all of its values, are present in condition.

  • ‘any’ : Select the row if any of its values is present.

  • ‘all’ : Select the row only if all of its values are present.

complement : bool, default False

If True, invert the membership test, so that values absent from condition count as matches.
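
The way how and complement combine can be approximated in plain pandas. The following is a minimal sketch inferred from the examples on this page, not dtoolkit's actual implementation; the name `filter_in_sketch` is hypothetical, and the order of operations (invert the per-cell mask first, then reduce across each row) is an assumption chosen because it reproduces the documented outputs.

```python
import pandas as pd


def filter_in_sketch(df, condition, how="all", complement=False):
    """Hypothetical stand-in for filter_in, built on DataFrame.isin()."""
    # Per-cell membership test: True where the cell's value is in `condition`.
    mask = df.isin(condition)

    # Assumption: for a dict, only the columns named by its keys take part.
    if isinstance(condition, dict):
        mask = mask[list(condition)]

    # Assumption: `complement` flips the per-cell result before reduction.
    if complement:
        mask = ~mask

    # Reduce each row to a single boolean according to `how`.
    rows = mask.any(axis=1) if how == "any" else mask.all(axis=1)
    return df[rows]


df = pd.DataFrame(
    {"legs": [2, 4, 2], "wings": [2, 0, 0]},
    index=["falcon", "dog", "cat"],
)

print(filter_in_sketch(df, [0, 2]))
print(filter_in_sketch(df, [0, 2], how="any", complement=True))
```

Inverting the cells before reducing matters: inverting the row result afterwards would give an empty frame for the second call, because every row contains at least one 0 or 2.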

Returns
DataFrame

See also

pandas.DataFrame.isin

Whether each element in the DataFrame is contained in values.

pandas.DataFrame.filter

Subset the dataframe rows or columns according to the specified index labels.

dtoolkit.accessor.series.filter_in

Filter Series contents.

Examples

>>> import dtoolkit
>>> import pandas as pd
>>> df = pd.DataFrame(
...     {
...         'legs': [2, 4, 2],
...         'wings': [2, 0, 0],
...     },
...     index=['falcon', 'dog', 'cat'],
... )
>>> df
        legs  wings
falcon     2      2
dog        4      0
cat        2      0

When condition is a list, check whether every value in the DataFrame is present in the list (which animals have 0 or 2 legs or wings).

Select the rows in which every value is 0 or 2.

>>> df.filter_in([0, 2])
        legs  wings
falcon     2      2
cat        2      0
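
For comparison, the same selection can be written with isin() directly. This is a plain-pandas sketch that does not call dtoolkit; it shows the boolean mask that isin() returns and the extra step needed to turn it into rows.

```python
import pandas as pd

df = pd.DataFrame(
    {"legs": [2, 4, 2], "wings": [2, 0, 0]},
    index=["falcon", "dog", "cat"],
)

# isin() answers "is this cell in the list?" with one boolean per cell.
mask = df.isin([0, 2])
print(mask)

# Keeping the rows whose cells are all True yields the same rows
# as the filter_in call above.
rows = df[mask.all(axis=1)]
print(rows)
```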

Select the rows in which any value is neither 0 nor 2.

>>> df.filter_in([0, 2], how="any", complement=True)
        legs  wings
dog        4      0

When condition is a dict, we can pass values to check for each column separately.

>>> df.filter_in({'legs': [2], 'wings': [2]})
        legs  wings
falcon     2      2
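
The restriction of how to the dict's keys can be illustrated with plain pandas. This sketch assumes the behavior described under Parameters (columns that are not keys are ignored) and does not call dtoolkit itself; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"legs": [2, 4, 2], "wings": [2, 0, 0]},
    index=["falcon", "dog", "cat"],
)

condition = {"legs": [2]}

# With a dict, isin() marks every column that is not a key as False.
mask = df.isin(condition)

# Reducing over all columns would therefore select nothing.
unrestricted = df[mask.all(axis=1)]

# Restricting the mask to the given keys first keeps the rows with 2 legs.
restricted = df[mask[list(condition)].all(axis=1)]
print(restricted)
```

If filter_in applies how only to the given keys, as documented, `df.filter_in({'legs': [2]})` would be expected to behave like `restricted` rather than `unrestricted`.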

When condition is a Series or DataFrame, the index and column labels must match. Note that ‘spider’ is not in df at all, and ‘dog’ and ‘cat’ are not in other, so only ‘falcon’, whose values agree in both, is selected.

>>> other = pd.DataFrame(
...     {
...         'legs': [8, 2],
...         'wings': [0, 2],
...     },
...     index=['spider', 'falcon'],
... )
>>> other
        legs  wings
spider     8      0
falcon     2      2
>>> df.filter_in(other)
        legs  wings
falcon     2      2
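
A Series condition is matched by index label. The sketch below uses pandas' own isin() to show the alignment; whether filter_in treats a Series identically is an assumption based on the parameter description, and the Series `s` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"legs": [2, 4, 2], "wings": [2, 0, 0]},
    index=["falcon", "dog", "cat"],
)

# One value per index label; 'cat' is deliberately missing.
s = pd.Series([2, 4], index=["falcon", "dog"])

# Each row is compared with the Series value that shares its index label.
# Rows without a matching label ('cat') are False everywhere.
mask = df.isin(s)
print(mask)

# Only 'falcon' has every value equal to its Series entry.
rows = df[mask.all(axis=1)]
print(rows)
```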