dtoolkit.accessor.dataframe.weighted_mean
- dtoolkit.accessor.dataframe.weighted_mean(df: DataFrame, /, weights: list[int | float] | dict[Hashable, int | float | dict[Hashable, int | float]] | Series, validate: bool = False, top: int | float = 1, drop: bool = False) -> Series | DataFrame
Calculate the weighted score of selected columns in the DataFrame.
The weighted score is the sum of each row's values multiplied by the column weights, divided by the sum of the weights. It is syntactic sugar for:
(df * weights).sum(axis=1) / sum(weights)
- Parameters:
- weights : list, dict or Series
  The weights of each column in the DataFrame.
  list : The weights of each column in the DataFrame.
  dict : Accepts {column: score} or {new_column: {column: score}}.
  Series : The weights must be a series with the same index as the DataFrame.
- validate : bool, default False
  If True, require the sum of the weights values to equal 1.
- top : int, default 1
  If validate is True, require the sum of the weights values to equal top.
- drop : bool, default False
  If True, drop the used columns.
- Returns:
- DataFrame or Series
- Raises:
- TypeError
  If one of the weights values is not a number.
  If weights is not a list, a dict or a Series.
  If weights is a dict and one of its values is neither a number nor a dict.
- ValueError
  If weights is a list and its length is not the same as the number of DataFrame columns.
  If weights is a Series and its labels are not in the DataFrame columns.
  If weights is a Series and its labels are duplicated.
  If validate=True and the sum of the weights values is not equal to 1.
See also
pandas.DataFrame.mean : Calculate the mean of the DataFrame.
Examples
>>> import dtoolkit
>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1, 1], "b": [2, 2], "c": [4, 4]})
>>> df
   a  b  c
0  1  2  4
1  1  2  4
Select all columns to calculate the score.
>>> df.weighted_mean([0, 5, 5])
0    3.0
1    3.0
dtype: float64
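As a cross-check, the same result can be reproduced with the plain pandas expression that this function wraps. This is a minimal sketch that does not use dtoolkit at all; the variable names `weights` and `score` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1], "b": [2, 2], "c": [4, 4]})
weights = [0, 5, 5]  # one weight per column, in column order

# Multiply each column by its weight, sum across each row,
# then normalise by the total weight.
score = (df * weights).sum(axis=1) / sum(weights)
print(score)
```

Each row evaluates to (1*0 + 2*5 + 4*5) / 10, which matches the 3.0 shown above.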
Select some of the columns and calculate the score.
>>> df.weighted_mean({'b': 5, 'c': 5})
0    3.0
1    3.0
dtype: float64
Keep the original columns.
>>> df.weighted_mean({"bc": {'b': 5, 'c': 5}})
    bc  a  b  c
0  3.0  1  2  4
1  3.0  1  2  4
When weights is a dict whose values are also dicts, newly generated columns can be used to compute further scores.
>>> df.weighted_mean(
...     {
...         "ab": {"a": 1, "b": 1},
...         "bc": {"b": 1, "c": 1},
...         "ab-bc": {"ab": 1, "bc": 1},  # 'ab' and 'bc' are newly generated columns
...     }
... )
    ab   bc  ab-bc  a  b  c
0  1.5  3.0   2.25  1  2  4
1  1.5  3.0   2.25  1  2  4
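The chained behaviour can be imitated in plain pandas by evaluating the nested dict one entry at a time, so that later entries can refer to columns created by earlier ones. The loop below is only an illustrative sketch, not dtoolkit's implementation; the names `nested`, `result` and `w` are assumptions made for this example, and the new columns are appended at the end rather than placed first as in the output above.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1], "b": [2, 2], "c": [4, 4]})
nested = {
    "ab": {"a": 1, "b": 1},
    "bc": {"b": 1, "c": 1},
    "ab-bc": {"ab": 1, "bc": 1},  # refers to columns created just above
}

result = df.copy()
for new_column, column_weights in nested.items():
    w = pd.Series(column_weights)
    # Select only the weighted columns; the Series index aligns with the
    # column labels, so each column is multiplied by its own weight.
    result[new_column] = (result[w.index] * w).sum(axis=1) / w.sum()

print(result)
```

Because dicts preserve insertion order, "ab" and "bc" exist by the time "ab-bc" is computed, giving (1.5 + 3.0) / 2 = 2.25.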