dtoolkit.accessor.dataframe.weighted_mean

dtoolkit.accessor.dataframe.weighted_mean(df: DataFrame, /, weights: list[int | float] | dict[Hashable, int | float | dict[Hashable, int | float]] | Series, validate: bool = False, top: int | float = 1, drop: bool = False) -> Series | DataFrame

Calculate the weighted score of selected columns in the DataFrame.

The weighted score is the row-wise sum of the selected column values multiplied by their weights, divided by the sum of the weights. It is syntactic sugar for:

(df * weights).sum(axis=1) / sum(weights)
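
A minimal sketch of that expression in plain pandas, assuming list weights given in column order. The frame matches the one in the Examples section, and dtoolkit itself is not used here:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1], "b": [2, 2], "c": [4, 4]})
weights = [0, 5, 5]  # one weight per column: a, b, c

# Scale each column by its weight, sum across each row, then normalise.
score = (df * weights).sum(axis=1) / sum(weights)
```

Each row works out to (0*1 + 5*2 + 5*4) / 10 = 3.0.
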
Parameters:
weights : list, dict or Series

The weights of each column in the DataFrame.

  • list : One weight per column, in column order. Its length must equal the number of DataFrame columns.

  • dict : Either {column: weight} or {new_column: {column: weight}}.

  • Series : The index labels must be unique and must be columns of the DataFrame.

validate : bool, default False

If True, require the sum of the weights to equal top (1 by default).

top : int or float, default 1

The value that the weights must sum to when validate is True.

drop : bool, default False

If True, drop the used columns.
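
The dict and Series forms of weights can be compared with the list form in plain pandas. The helper below, plain_weighted_mean, is hypothetical and is not part of dtoolkit. It assumes that a list maps to the columns by position, while dict and Series weights select columns by label:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1], "b": [2, 2], "c": [4, 4]})


def plain_weighted_mean(df, weights):
    """Hypothetical helper (not dtoolkit): weighted mean over labelled columns."""
    w = pd.Series(weights, dtype="float64")
    # Select only the weighted columns; multiplication aligns on column labels.
    return (df[w.index] * w).sum(axis=1) / w.sum()


# A list is paired with the columns by position before being passed on.
from_list = plain_weighted_mean(df, dict(zip(df.columns, [0, 5, 5])))
from_dict = plain_weighted_mean(df, {"b": 5, "c": 5})
from_series = plain_weighted_mean(df, pd.Series({"b": 5, "c": 5}))
```

Under these assumptions all three calls give the same score, because column a carries a weight of 0 in the list form and is simply left out of the other two.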

Returns:
DataFrame or Series
Raises:
TypeError
  • If one of the weights values is not a number.

  • If weights is not a list, a dict or a Series.

  • If weights is a dict and one of its values is neither a number nor a dict.

ValueError
  • If weights is a list and its length differs from the number of DataFrame columns.

  • If weights is a Series and its labels are not in the DataFrame columns.

  • If weights is a Series and its labels are duplicated.

  • If validate=True and the sum of the weights is not equal to top.
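
The checks listed above could be sketched roughly as follows. check_weights is a hypothetical function written for illustration, not dtoolkit's implementation. It covers only list and Series weights, and comparing the sum with math.isclose is an assumption made here to tolerate floating-point rounding:

```python
import math
from numbers import Real

import pandas as pd


def check_weights(df, weights, validate=False, top=1):
    """Hypothetical checker mirroring the documented errors (not dtoolkit's code)."""
    if isinstance(weights, list):
        if len(weights) != len(df.columns):
            raise ValueError("list weights must match the number of columns")
        values = weights
    elif isinstance(weights, pd.Series):
        if weights.index.has_duplicates:
            raise ValueError("Series labels are duplicated")
        if not weights.index.isin(df.columns).all():
            raise ValueError("Series labels are not in the DataFrame columns")
        values = weights.tolist()
    else:
        # dict weights are left out of this sketch for brevity.
        raise TypeError("this sketch only handles list or Series weights")

    if not all(isinstance(v, Real) for v in values):
        raise TypeError("weights values must be numbers")

    # Assumption: tolerate rounding error instead of requiring exact equality.
    if validate and not math.isclose(sum(values), top):
        raise ValueError(f"weights must sum to {top}")


df = pd.DataFrame({"a": [1, 1], "b": [2, 2], "c": [4, 4]})
check_weights(df, [0.2, 0.3, 0.5], validate=True)      # sums to 1, passes
check_weights(df, [0, 5, 5], validate=True, top=10)    # sums to top, passes
```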

See also

pandas.DataFrame.mean

Calculate the mean of the DataFrame.

Examples

>>> import dtoolkit
>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1, 1], "b": [2, 2], "c": [4, 4]})
>>> df
   a  b  c
0  1  2  4
1  1  2  4

Select all columns to calculate the score.

>>> df.weighted_mean([0, 5, 5])
0    3.0
1    3.0
dtype: float64

Select some of the columns to calculate the score.

>>> df.weighted_mean({'b': 5, 'c': 5})
0    3.0
1    3.0
dtype: float64

Keep the original columns and add the score as a new column.

>>> df.weighted_mean({"bc": {'b': 5, 'c': 5}})
   a  b   bc  c
0  1  2  3.0  4
1  1  2  3.0  4

When weights is a dict whose values are also dicts, later entries can use newly generated columns to calculate their scores.

>>> df.weighted_mean(
...     {
...         "ab": {"a": 1, "b": 1},
...         "bc": {"b": 1, "c": 1},
...         "ab-bc": {"ab": 1, "bc": 1},  # 'ab' and 'bc' are newly generated columns
...     }
... )
   a   ab  ab-bc  b   bc  c
0  1  1.5   2.25  2  3.0  4
1  1  1.5   2.25  2  3.0  4
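
The arithmetic behind the chained example can be reproduced step by step in plain pandas. This is only a sketch of the calculation, not of how dtoolkit performs it, and the columns here stay in insertion order, which differs from the output shown above:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1], "b": [2, 2], "c": [4, 4]})

out = df.copy()
# Each new column is a weighted mean with weights 1 and 1.
out["ab"] = (out["a"] * 1 + out["b"] * 1) / (1 + 1)
out["bc"] = (out["b"] * 1 + out["c"] * 1) / (1 + 1)
# The last step reads the two columns created just above.
out["ab-bc"] = (out["ab"] * 1 + out["bc"] * 1) / (1 + 1)
```

Here ab is (1 + 2) / 2 = 1.5, bc is (2 + 4) / 2 = 3.0, and ab-bc is (1.5 + 3.0) / 2 = 2.25, matching the values in the example.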