dtoolkit.accessor.dataframe.weighted_mean

Calculate the weighted score of selected columns in the DataFrame.

The weighted score of a row is the sum of its values multiplied by their weights, divided by the sum of the weights. This method is syntactic sugar for:

(df * weights).sum(axis=1) / sum(weights)
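As a rough illustration of the expression above, the following sketch uses plain pandas only, not dtoolkit itself. The helper name `weighted_mean_sketch` is hypothetical, and the list of weights is assumed to line up with the columns by position.

```python
import pandas as pd


def weighted_mean_sketch(df, weights):
    """Hypothetical helper: row-wise weighted mean using plain pandas."""
    # Assumption: the list of weights matches the columns by position.
    w = pd.Series(weights, index=df.columns)
    # Multiply each column by its weight, sum across each row,
    # then normalise by the total weight.
    return df.mul(w, axis=1).sum(axis=1) / w.sum()


df = pd.DataFrame({"a": [1, 1], "b": [2, 2], "c": [4, 4]})
result = weighted_mean_sketch(df, [0, 5, 5])
print(result.tolist())  # [3.0, 3.0]
```

Each row evaluates to (0*1 + 5*2 + 5*4) / 10 = 3.0.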

Parameters:

weights : list, dict or Series

The weights of each column in the DataFrame.

list : One weight per DataFrame column, in column order.
dict : Either {column: weight} or {new_column: {column: weight}}.
Series : The index labels must be DataFrame column names and the values are the weights.
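To make the Series form more concrete, here is a minimal plain-pandas sketch of weighting a subset of columns. It does not call dtoolkit, and it assumes that the Series index holds column labels while its values hold the weights.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1], "b": [2, 2], "c": [4, 4]})

# Assumption: the Series index holds column labels, its values the weights.
weights = pd.Series({"b": 5, "c": 5})

# Keep only the weighted columns, scale each one by its weight,
# then divide the row sums by the total weight.
selected = df[weights.index]
score = selected.mul(weights, axis=1).sum(axis=1) / weights.sum()
print(score.tolist())  # [3.0, 3.0]
```

Column "a" carries no weight here, so it does not contribute to the result.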

validate : bool, default False

If True, require the weights to sum to top (1 by default).

top : int, default 1

The value the weights must sum to when validate is True.
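The check described by validate and top could look roughly like the sketch below. `check_weights_sum` is a hypothetical helper, not part of dtoolkit, and the tolerance-based comparison is an assumption made here to cope with floating-point weights.

```python
import math


def check_weights_sum(weights, top=1):
    """Hypothetical check: raise ValueError unless the weights sum to `top`."""
    total = sum(weights)
    # Assumption: compare with a tolerance so that float weights such as
    # 0.1 + 0.2 + 0.7 are not rejected because of rounding error.
    if not math.isclose(total, top):
        raise ValueError(f"weights sum to {total}, expected {top}")
    return total


check_weights_sum([0.1, 0.2, 0.7])        # passes: sums to 1
check_weights_sum([20, 30, 50], top=100)  # passes: sums to 100
```

Under these assumptions, weights such as [5, 5] would be rejected with the default top=1 and accepted with top=10.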

drop : bool, default False

If True, drop the used columns.

Returns:

DataFrame or Series

Raises:

TypeError

If any of the weights values is not a number.
If weights is not a list, a dict or a Series.
If weights is a dict and one of its values is neither a number nor a dict.

ValueError

If weights is a list and its length differs from the number of DataFrame columns.
If weights is a Series and its labels are not in the DataFrame columns.
If weights is a Series and its labels are duplicated.
If validate=True and the sum of the weights is not equal to top.

See also

pandas.DataFrame.mean: Calculate the mean of the DataFrame.

Examples

>>> import dtoolkit
>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1, 1], "b": [2, 2], "c": [4, 4]})
>>> df
   a  b  c
0  1  2  4
1  1  2  4

Select all columns to calculate the score.

>>> df.weighted_mean([0, 5, 5])
0    3.0
1    3.0
dtype: float64

Select some of the columns to calculate the score.

>>> df.weighted_mean({'b': 5, 'c': 5})
0    3.0
1    3.0
dtype: float64

Keep the original columns by naming the new column in a nested dict.

>>> df.weighted_mean({"bc": {'b': 5, 'c': 5}})
   a  b   bc  c
0  1  2  3.0  4
1  1  2  3.0  4

When weights is a dict whose values are also dicts, later entries can use newly generated columns to compute their scores.

>>> df.weighted_mean(
...     {
...         "ab": {"a": 1, "b": 1},
...         "bc": {"b": 1, "c": 1},
...         "ab-bc": {"ab": 1, "bc": 1},  # 'ab' and 'bc' are new generated columns
...     }
... )
   a   ab  ab-bc  b   bc  c
0  1  1.5   2.25  2  3.0  4
1  1  1.5   2.25  2  3.0  4
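The chained computation in the last example can be approximated in plain pandas as follows. This is only a sketch of the idea, not dtoolkit's implementation: the `spec` variable is a hypothetical name, and the loop assumes entries are processed in insertion order so that later entries can read columns created by earlier ones.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1], "b": [2, 2], "c": [4, 4]})

# Hypothetical spec mirroring the nested-dict form {new_column: {column: weight}}.
spec = {
    "ab": {"a": 1, "b": 1},
    "bc": {"b": 1, "c": 1},
    "ab-bc": {"ab": 1, "bc": 1},  # depends on the two columns created above
}

out = df.copy()
# Process entries in insertion order so later entries can read earlier results.
for new_column, weights in spec.items():
    w = pd.Series(weights)
    out[new_column] = out[w.index].mul(w, axis=1).sum(axis=1) / w.sum()

print(out["ab-bc"].tolist())  # [2.25, 2.25]
```

The sketch appends new columns at the end, so the column order differs from the output shown above, but the computed values match: ab = 1.5, bc = 3.0 and ab-bc = 2.25.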