dtoolkit.accessor.dataframe.top_n
- dtoolkit.accessor.dataframe.top_n(df: DataFrame, /, n: int = 5, largest: bool = True, keep: Literal['first', 'last', 'all'] = 'first', prefix: str = 'top', delimiter: str = '_', element: Literal['index', 'value', 'both'] = 'index') -> DataFrame
Return the top n elements of each row.
- Parameters:
- n : int, default 5
Number of top elements to return for each row.
- largest : bool, default True
If True, the top elements are the largest values.
If False, the top elements are the smallest values.
- keep : {“first”, “last”, “all”}, default “first”
How to handle duplicate values:
first : prioritize the first occurrence(s).
last : prioritize the last occurrence(s).
all : do not drop any duplicates, even if it means selecting more than n items.
- prefix : str, default “top”
The prefix name of the new DataFrame column.
- delimiter : str, default “_”
The delimiter between prefix and number.
- element : {“index”, “value”, “both”}, default “index”
Controls the structure of each value in the returned DataFrame.
index : each value is only {column index}.
value : each value is only {value}.
both : each value is ({column index}, {value}).
- Returns:
- DataFrame
Each column is named {prefix}{delimiter}{number}.
By default each value is {column index}; this can be controlled via element.
- Raises:
- ValueError
If element isn’t “both”, “index” or “value”.
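The parameters above can be illustrated with a pandas-only sketch. This is not dtoolkit's implementation: the helper name top_n_sketch is hypothetical, it covers only the keep="first" behaviour, and it assumes every row yields the same number of elements.

```python
import pandas as pd


def top_n_sketch(df, n=5, largest=True, prefix="top", delimiter="_", element="index"):
    """Hypothetical pandas-only approximation of a row-wise top n."""
    if element not in {"index", "value", "both"}:
        raise ValueError('element must be "both", "index" or "value"')

    def pick(row):
        # nlargest / nsmallest sort the row and keep its first n entries.
        top = row.nlargest(n) if largest else row.nsmallest(n)
        if element == "index":
            cells = list(top.index)
        elif element == "value":
            cells = list(top.to_numpy())
        else:
            cells = list(zip(top.index, top.to_numpy()))
        # Column names follow {prefix}{delimiter}{number}, counting from 1.
        names = [f"{prefix}{delimiter}{i}" for i in range(1, len(cells) + 1)]
        return pd.Series(cells, index=names)

    return df.apply(pick, axis=1)


df = pd.DataFrame({"a": [1, 3, 2, 1], "b": [3, 2, 1, 1], "c": [2, 1, 3, 1]})
result = top_n_sketch(df, 2)
values = top_n_sketch(df, 2, element="value")
```

Here result holds column labels under top_1 and top_2, and values holds the corresponding numbers; passing any other string as element raises ValueError, mirroring the Raises entry.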
See also
dtoolkit.accessor.dataframe.expand
Transform each element of a list-like to a column.
Notes
This method can be called via df.top_n or df.topn.
- Q & A
Q: How does this differ from nlargest() and nsmallest()?
A: nlargest() and nsmallest() order the rows by the given column(s) and return the top n rows of the DataFrame with all of their columns. top_n instead returns the top n of each row.
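The difference can be seen with plain pandas. In this sketch the row-wise part uses Series.nlargest as a stand-in for top_n, so it shows the idea rather than dtoolkit's own code; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 3, 2, 1], "b": [3, 2, 1, 1], "c": [2, 1, 3, 1]})

# DataFrame.nlargest ranks the rows by column "a" and returns whole rows.
by_column = df.nlargest(2, "a")

# A row-wise selection ranks the values inside each row instead,
# here collecting the labels of the two largest values per row.
per_row = df.apply(lambda row: list(row.nlargest(2).index), axis=1)
```

by_column keeps rows 1 and 2 with all three columns, while per_row has one entry for every row of df.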
Examples
>>> import dtoolkit
>>> import pandas as pd
>>> df = pd.DataFrame(
...     {
...         "a": [1, 3, 2, 1],
...         "b": [3, 2, 1, 1],
...         "c": [2, 1, 3, 1],
...     },
... )
>>> df
   a  b  c
0  1  3  2
1  3  2  1
2  2  1  3
3  1  1  1
Get the 2 largest values of each row, sorted, and return their column indexes.
>>> df.top_n(2)
  top_1 top_2
0     b     c
1     a     b
2     c     a
3     a     b
Get the 2 largest values of each row, sorted, and return the values.
>>> df.top_n(2, element="value")
   top_1  top_2
0      3      2
1      3      2
2      3      2
3      1      1
Get the 2 largest values of each row, sorted, and return both the column index and the value.
>>> df.top_n(2, element="both")
    top_1   top_2
0  (b, 3)  (c, 2)
1  (a, 3)  (b, 2)
2  (c, 3)  (a, 2)
3  (a, 1)  (b, 1)
Get the smallest value of each row and keep all duplicated values.
>>> df.top_n(1, largest=False, keep="all")
  top_1 top_2 top_3
0     a   NaN   NaN
1     c   NaN   NaN
2     b   NaN   NaN
3     a     b     c
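The NaN padding in the last example follows from rows yielding different numbers of elements. A rough pandas-only sketch of that effect, assuming Series.nsmallest with keep="all" as a stand-in for the dtoolkit internals:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 3, 2, 1], "b": [3, 2, 1, 1], "c": [2, 1, 3, 1]})

# keep="all" retains every value tied for the last selected place,
# so a row may yield more than n labels.
picked = {
    idx: pd.Series(row.nsmallest(1, keep="all").index.to_list())
    for idx, row in df.iterrows()
}

# Aligning rows of unequal length pads the shorter ones with NaN.
padded = pd.DataFrame(picked).T
padded.columns = [f"top_{i + 1}" for i in padded.columns]
```

Row 3 contains three equal values, so it fills three columns, and every other row is padded to the same width.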