dtoolkit.accessor.dataframe.filter_in

dtoolkit.accessor.dataframe.filter_in(df: DataFrame, condition: Iterable | Series | DataFrame | dict[Hashable, list[Hashable]], /, how: Literal['any', 'all'] = 'all', complement: bool = False) → DataFrame

Filter DataFrame contents.

Similar to isin(), but returns the filtered DataFrame instead of a boolean mask.
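To illustrate the difference, here is a plain pandas sketch: isin() returns a boolean mask with the same shape as the DataFrame, while reducing that mask per row and indexing with it yields the matching rows. It uses only pandas' own DataFrame.isin and DataFrame.all and is not dtoolkit's implementation.

```python
import pandas as pd

df = pd.DataFrame(
    {"legs": [2, 4, 2], "wings": [2, 0, 0]},
    index=["falcon", "dog", "cat"],
)

# isin() answers "is this cell in the values?" for every cell -> bool DataFrame.
mask = df.isin([0, 2])
print(mask)

# Reducing the mask per row and indexing with it returns rows instead of bools.
filtered = df[mask.all(axis=1)]
print(filtered)
```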

Parameters:
condition : Iterable, Series, DataFrame or dict

The values to match against; the filtered result is based on this condition.

  • If condition is a dict, the keys must be column names, which must match, and each value lists the allowed values for that column. how only applies to the columns given as keys.

  • If condition is a Series, it is aligned on the index.

  • If condition is a DataFrame, then both the index and column labels must match.
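A minimal pandas sketch of the dict case, assuming that "how only applies to the given keys" amounts to reducing the isin() mask over those columns only. The column selection shown here is an illustration, not dtoolkit's code.

```python
import pandas as pd

df = pd.DataFrame(
    {"legs": [2, 4, 2], "wings": [2, 0, 0]},
    index=["falcon", "dog", "cat"],
)
condition = {"legs": [2]}  # only 'legs' is constrained

# Full-frame isin(): columns missing from the dict come back all False.
full_mask = df.isin(condition)

# Reduce only over the dict's keys so unlisted columns do not veto a row.
key_mask = full_mask[list(condition)].all(axis=1)
result = df[key_mask]
print(result)
```

Reducing over every column instead would drop all rows here, because the unlisted wings column is False everywhere.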

how : {‘any’, ‘all’}, default ‘all’

Determine whether a row is kept when at least one of its values or all of its values are present in condition.

  • ‘any’ : Keep the row if any of its values are present.

  • ‘all’ : Keep the row if all of its values are present.
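The two how modes can be read as reducing the element-wise membership mask with all or any across each row. A plain pandas sketch of that reading (it agrees with the examples further down, but it is an assumption, not taken from dtoolkit's source):

```python
import pandas as pd

df = pd.DataFrame(
    {"legs": [2, 4, 2], "wings": [2, 0, 0]},
    index=["falcon", "dog", "cat"],
)
mask = df.isin([0, 4])

# how='all': every value in the row must be present.
all_rows = df[mask.all(axis=1)]
# how='any': at least one value in the row must be present.
any_rows = df[mask.any(axis=1)]
print(all_rows)
print(any_rows)
```

Only dog has both values in [0, 4], while cat qualifies under ‘any’ through its 0 wings.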

complement : bool, default False

If True, reverse the membership test, so a value counts as matching when it is not present in condition.
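The complement=True example below (only the dog row for condition [0, 2] with how='any') is consistent with inverting the element-wise mask before how reduces it, rather than inverting the final row selection. A plain pandas sketch of that reading, inferred from the example rather than from dtoolkit's source:

```python
import pandas as pd

df = pd.DataFrame(
    {"legs": [2, 4, 2], "wings": [2, 0, 0]},
    index=["falcon", "dog", "cat"],
)
mask = df.isin([0, 2])

# Invert each cell first, then reduce per row with how='any'.
inverted_first = df[(~mask).any(axis=1)]

# Inverting the row selection instead gives an empty result here,
# because every row has at least one value in [0, 2].
inverted_after = df[~mask.any(axis=1)]

print(inverted_first)
print(inverted_after)
```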

Returns:
DataFrame

See also

pandas.DataFrame.isin

Whether each element in the DataFrame is contained in values.

pandas.DataFrame.filter

Subset the dataframe rows or columns according to the specified index labels.

dtoolkit.accessor.series.filter_in

Filter Series contents.

Examples

>>> import dtoolkit
>>> import pandas as pd
>>> df = pd.DataFrame(
...     {
...         'legs': [2, 4, 2],
...         'wings': [2, 0, 0],
...     },
...     index=['falcon', 'dog', 'cat'],
... )
>>> df
        legs  wings
falcon     2      2
dog        4      0
cat        2      0

When condition is a list, check whether each value in the DataFrame is present in the list (which animals have 0 or 2 legs or wings).

Keep the rows whose values are all 0 or 2.

>>> df.filter_in([0, 2])
        legs  wings
falcon     2      2
cat        2      0

Keep the rows that contain any value other than 0 or 2.

>>> df.filter_in([0, 2], how="any", complement=True)
        legs  wings
dog        4      0

When condition is a dict, we can pass values to check for each column separately.

>>> df.filter_in({'legs': [2], 'wings': [2]})
        legs  wings
falcon     2      2

When condition is a Series or DataFrame, the index and column labels must match. Only ‘falcon’ appears in both df and other with the same values; ‘spider’ is not in df at all.

>>> other = pd.DataFrame(
...     {
...         'legs': [8, 2],
...         'wings': [0, 2],
...     },
...     index=['spider', 'falcon'],
... )
>>> other
        legs  wings
spider     8      0
falcon     2      2
>>> df.filter_in(other)
        legs  wings
falcon     2      2
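No Series example is shown above. Assuming filter_in passes a Series condition through to DataFrame.isin, the Series is aligned on the index and every column of a row is compared with the Series value for that label, and labels missing from the Series never match. A pandas sketch of the resulting mask, with a hypothetical Series s:

```python
import pandas as pd

df = pd.DataFrame(
    {"legs": [2, 4, 2], "wings": [2, 0, 0]},
    index=["falcon", "dog", "cat"],
)
# Hypothetical condition: one expected value per index label; 'cat' is absent.
s = pd.Series([2, 0], index=["falcon", "dog"])

mask = df.isin(s)  # aligned on the index, compared column by column
print(mask)
print(df[mask.all(axis=1)])  # how='all'
print(df[mask.any(axis=1)])  # how='any'
```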