dtoolkit.accessor.dataframe.weighted_mean
- dtoolkit.accessor.dataframe.weighted_mean(df: DataFrame, /, weights: list[int | float] | dict[Hashable, int | float | dict[Hashable, int | float]] | Series, validate: bool = False, top: int | float = 1, drop: bool = False) -> Series | DataFrame
Calculate the weighted score of selected columns in the DataFrame.
The weighted score is the row-wise sum of the selected column values multiplied by their weights, divided by the sum of the weights. It is syntactic sugar for:
(df * weights).sum(axis=1) / sum(weights)
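As a rough illustration of what that expression does, the sketch below uses plain pandas only and does not call dtoolkit. The weights Series and its values are assumptions chosen to match the examples further down, and restricting df to the weighted columns is a choice made for this sketch rather than a statement about how the library selects columns.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1], "b": [2, 2], "c": [4, 4]})

# Hypothetical weights, keyed by column name.
weights = pd.Series({"b": 5, "c": 5})

# Multiply each selected column by its weight (pandas aligns the Series
# index with the column labels), sum across columns for every row,
# then normalise by the total weight.
score = (df[weights.index] * weights).sum(axis=1) / weights.sum()
print(score)
```

Because the result is divided by the total weight, weights of 5 and 5 give the same score as weights of 0.5 and 0.5.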
- Parameters:
- weights : list, dict or Series
The weights of the columns in the DataFrame.
list : One weight per DataFrame column, in column order.
dict : Either {column: score}, or {new_column: {column: score}} to store the score in a new column.
Series : The index labels of the weights must be DataFrame column names.
- validate : bool, default False
If True, require the sum of the weights values to equal top (1 by default).
- top : int or float, default 1
The total that the weights values must sum to. Only checked if validate is True.
- drop : bool, default False
If True, drop the columns used in the calculation.
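The interaction between validate and top can be pictured with a small standalone check. This is only a sketch: check_weights_sum is a hypothetical helper, not part of dtoolkit, and the tolerant comparison via math.isclose is an assumption, since the library may compare the sum exactly.

```python
import math


def check_weights_sum(weights, top=1):
    """Hypothetical helper: raise ValueError unless the weights sum to ``top``."""
    total = sum(weights)
    # Tolerant float comparison is an assumption made for this sketch.
    if not math.isclose(total, top):
        raise ValueError(f"the sum of weights is {total}, expected {top}")
    return total


check_weights_sum([0.2, 0.3, 0.5])      # passes with the default top=1
check_weights_sum([0, 5, 5], top=10)    # passes because the weights sum to 10
```

Setting top to the intended total lets integer weights such as 0, 5, 5 be validated without rescaling them to fractions.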
- Returns:
- DataFrame or Series
- Raises:
- TypeError
If one of the weights values is not a number.
If weights is not a list, a dict or a Series.
If weights is a dict and one of its values is neither a number nor a dict.
- ValueError
If weights is a list and its length differs from the number of DataFrame columns.
If weights is a Series and its labels are not in the DataFrame columns.
If weights is a Series and its labels are duplicated.
If validate=True and the sum of the weights values is not equal to top (1 by default).
See also
pandas.DataFrame.mean
Calculate the mean of the DataFrame.
Examples
>>> import dtoolkit
>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1, 1], "b": [2, 2], "c": [4, 4]})
>>> df
   a  b  c
0  1  2  4
1  1  2  4
Select all columns to calculate the score.
>>> df.weighted_mean([0, 5, 5])
0    3.0
1    3.0
dtype: float64
Select some of the columns to calculate the score.
>>> df.weighted_mean({'b': 5, 'c': 5})
0    3.0
1    3.0
dtype: float64
Keep the original columns.
>>> df.weighted_mean({"bc": {'b': 5, 'c': 5}})
   a  b   bc  c
0  1  2  3.0  4
1  1  2  3.0  4
When weights is a dict whose values are also dicts, later entries can use newly generated columns to calculate further scores.
>>> df.weighted_mean(
...     {
...         "ab": {"a": 1, "b": 1},
...         "bc": {"b": 1, "c": 1},
...         "ab-bc": {"ab": 1, "bc": 1},  # 'ab' and 'bc' are newly generated columns
...     }
... )
   a   ab  ab-bc  b   bc  c
0  1  1.5   2.25  2  3.0  4
1  1  1.5   2.25  2  3.0  4
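The chained calculation can be approximated in plain pandas by processing the entries in order and adding each new column before the next entry is evaluated. This is a sketch of the idea under that assumption, not dtoolkit's implementation; spec and result are names introduced here for illustration, and the column order of result is not meant to match the output shown above.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1], "b": [2, 2], "c": [4, 4]})

# Hypothetical nested weights: {new_column: {column: score}}.
spec = {
    "ab": {"a": 1, "b": 1},
    "bc": {"b": 1, "c": 1},
    "ab-bc": {"ab": 1, "bc": 1},  # refers to the two columns created above
}

result = df.copy()
for new_column, column_weights in spec.items():
    w = pd.Series(column_weights)
    # Weighted mean of the referenced columns, which may include
    # columns added by earlier iterations of this loop.
    result[new_column] = (result[w.index] * w).sum(axis=1) / w.sum()

print(result)
```

The order of the entries matters in this sketch: "ab-bc" can only be computed after "ab" and "bc" exist.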