dtoolkit.accessor.dataframe.top_n

dtoolkit.accessor.dataframe.top_n(df: DataFrame, /, n: int = 5, largest: bool = True, keep: Literal['first', 'last', 'all'] = 'first', prefix: str = 'top', delimiter: str = '_', element: Literal['index', 'value', 'both'] = 'index') → DataFrame

Return the top n elements of each row.

Parameters:

n : int, default 5

Number of top elements to return for each row.

largest : bool, default True

If True, the top elements are the largest values.
If False, the top elements are the smallest values.

keep : {“first”, “last”, “all”}, default “first”

How to handle duplicate values:

first : prioritize the first occurrence(s).
last : prioritize the last occurrence(s).
all : do not drop any duplicates, even if it means selecting more than n items.

prefix : str, default “top”

The prefix of the column names in the new DataFrame.

delimiter : str, default “_”

The delimiter between the prefix and the number.

element : {“index”, “value”, “both”}, default “index”

Controls the structure of each value in the returned DataFrame.

index: each value is only the {column index}.
value: each value is only the {value}.
both: each value is the tuple ({column index}, {value}).
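The tie-breaking rules of keep can be illustrated on a single row. The following is a minimal pure-Python sketch, not the library's implementation: the helper name top_labels and the dict-based row are assumptions made for illustration, and the order of tied labels under “last” may differ from what top_n produces.

```python
def top_labels(row, n=5, largest=True, keep="first"):
    """Return the labels of the top n values of one row.

    `row` is a plain dict mapping column label -> value, standing in for a
    DataFrame row. Hypothetical helper, not part of dtoolkit.
    """
    if keep not in ("first", "last", "all"):
        raise ValueError(f"keep must be 'first', 'last' or 'all', got {keep!r}")

    items = list(row.items())
    if keep == "last":
        # Reverse first so that later occurrences win ties.
        items.reverse()

    # sorted() is stable, so tied values keep their relative order.
    ranked = sorted(items, key=lambda item: item[1], reverse=largest)
    top = ranked[:n]

    if keep == "all" and top:
        # Also keep every remaining value tied with the last selected one.
        cutoff = top[-1][1]
        top += [item for item in ranked[n:] if item[1] == cutoff]

    return [label for label, _ in top]


row = {"a": 1, "b": 1, "c": 1}
print(top_labels(row, n=1, largest=False, keep="first"))  # ['a']
print(top_labels(row, n=1, largest=False, keep="last"))   # ['c']
print(top_labels(row, n=1, largest=False, keep="all"))    # ['a', 'b', 'c']
```

With keep="all" a row can yield more than n labels, which is why the result of top_n may have more than n columns.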

Returns:

DataFrame

Each column name has the form {prefix}{delimiter}{number}.
By default each value is a {column index}; this can be changed via element.
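The naming and value rules above can be sketched for a single row whose top pairs are already ranked. This is an illustrative pure-Python sketch: format_row is a hypothetical helper and not part of dtoolkit, and numbering from 1 is inferred from the example output (top_1, top_2).

```python
def format_row(ranked, prefix="top", delimiter="_", element="index"):
    """Map ranked (label, value) pairs to {column name: cell value}.

    Hypothetical helper for illustration only.
    """
    if element == "index":
        cells = [label for label, _ in ranked]
    elif element == "value":
        cells = [value for _, value in ranked]
    elif element == "both":
        cells = [(label, value) for label, value in ranked]
    else:
        raise ValueError(
            f"element must be 'both', 'index' or 'value', got {element!r}"
        )

    # Column names follow {prefix}{delimiter}{number}, counting from 1.
    return {
        f"{prefix}{delimiter}{number}": cell
        for number, cell in enumerate(cells, start=1)
    }


ranked = [("b", 3), ("c", 2)]
print(format_row(ranked))
# {'top_1': 'b', 'top_2': 'c'}
print(format_row(ranked, prefix="rank", delimiter="-", element="both"))
# {'rank-1': ('b', 3), 'rank-2': ('c', 2)}
```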

Raises:

ValueError: If element isn’t “both”, “index” or “value”.

See also

dtoolkit.accessor.dataframe.expand: Transform each element of a list-like to a column.

Notes

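This method can be called via df.top_n or df.topn.

Q & A

Q: How does this differ from nlargest() and nsmallest()?

A: nlargest() and nsmallest() rank the rows of a DataFrame by the given column(s) and return the top n rows with all of their columns. top_n ranks the values within each row and returns that row's top n.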
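The difference can be seen with plain pandas. The sketch below uses only DataFrame.nlargest and Series.nlargest; the row-wise loop illustrates the idea and is not how top_n is implemented.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 3, 2, 1], "b": [3, 2, 1, 1], "c": [2, 1, 3, 1]})

# DataFrame.nlargest ranks whole rows by column "a" and keeps every column.
by_column = df.nlargest(2, "a")
print(by_column.index.tolist())  # [1, 2]

# Row-wise ranking: for each row, the labels of its 2 largest values.
per_row = {idx: row.nlargest(2).index.tolist() for idx, row in df.iterrows()}
print(per_row[0])  # ['b', 'c']
```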

Examples

>>> import dtoolkit
>>> import pandas as pd
>>> df = pd.DataFrame(
...     {
...         "a": [1, 3, 2, 1],
...         "b": [3, 2, 1, 1],
...         "c": [2, 1, 3, 1],
...     },
... )
>>> df
   a  b  c
0  1  3  2
1  3  2  1
2  2  1  3
3  1  1  1

Get the 2 largest values of each row, sorted in descending order, and return their column labels.

>>> df.top_n(2)
    top_1   top_2
0       b       c
1       a       b
2       c       a
3       a       b

Get the 2 largest values of each row, sorted in descending order, and return the values.

>>> df.top_n(2, element="value")
   top_1  top_2
0      3      2
1      3      2
2      3      2
3      1      1

Get the 2 largest values of each row, sorted in descending order, and return both the column labels and the values.

>>> df.top_n(2, element="both")
    top_1   top_2
0  (b, 3)  (c, 2)
1  (a, 3)  (b, 2)
2  (c, 3)  (a, 2)
3  (a, 1)  (b, 1)

Get the smallest value of each row and keep all tied values. Rows with fewer ties are padded with NaN.

>>> df.top_n(1, largest=False, keep="all")
    top_1   top_2   top_3
0       a     NaN     NaN
1       c     NaN     NaN
2       b     NaN     NaN
3       a       b       c