dtoolkit.accessor.dataframe.fillna_regression

dtoolkit.accessor.dataframe.fillna_regression(df: pd.DataFrame, /, method: RegressorMixin, columns: dict[Hashable, Hashable | list[Hashable] | pd.Index], how: Literal['na', 'all'] = 'na', **kwargs) → pd.DataFrame

Fill NA values using a regression algorithm.

Parameters:

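method (RegressorMixin): Regressor class used to fit and predict the target, for example LinearRegression.
columns (dict, {y: X}): Pairs of column names. Each key is a target (y) column name; each value is the feature (X) column name, or a list or Index of feature column names.
how ({'na', 'all'}, default 'na'): With 'na', only the missing values of the target column are filled; with 'all', the entire target column is replaced by the regression predictions.
**kwargs: Keyword arguments for method. See the documentation of method for complete details.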

Returns:

DataFrame: The DataFrame with its target columns filled (how='na') or replaced by predictions (how='all').

Raises:

ValueError: If how is not 'na' or 'all'.
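The parameters above can be illustrated with a minimal sketch. It assumes that method is instantiated with **kwargs and then used through the scikit-learn fit/predict interface; that is an assumption, not confirmed behaviour of dtoolkit. fillna_regression_sketch and LeastSquaresRegressor are hypothetical names: the latter is a small NumPy stand-in for a scikit-learn regressor so the sketch runs without scikit-learn. Working on a copy of df is also a choice of this sketch only.

```python
import numpy as np
import pandas as pd


class LeastSquaresRegressor:
    """Hypothetical NumPy stand-in for a scikit-learn regressor.

    Ordinary least squares with an optional intercept, exposing the
    fit/predict methods the sketch below relies on.
    """

    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept

    def _design(self, X):
        X = np.asarray(X, dtype=float)
        if self.fit_intercept:
            # Append a column of ones so the intercept is fitted as a coefficient.
            X = np.column_stack([X, np.ones(len(X))])
        return X

    def fit(self, X, y):
        self.coef_, *_ = np.linalg.lstsq(
            self._design(X), np.asarray(y, dtype=float), rcond=None
        )
        return self

    def predict(self, X):
        return self._design(X) @ self.coef_


def fillna_regression_sketch(df, method, columns, how="na", **kwargs):
    """Hypothetical re-implementation of the documented behaviour."""
    if how not in {"na", "all"}:
        raise ValueError(f"how must be 'na' or 'all', got {how!r}")

    df = df.copy()  # Sketch choice: leave the caller's frame untouched.
    for y, X in columns.items():
        # Accept one feature name, a list of names, or a pd.Index.
        X = list(X) if isinstance(X, (list, pd.Index)) else [X]
        known = df[y].notna()
        # Assumption: kwargs are passed to the regressor's constructor.
        model = method(**kwargs).fit(df.loc[known, X], df.loc[known, y])
        if how == "na":
            # Predict only the rows whose target is missing.
            missing = ~known
            if missing.any():
                df.loc[missing, y] = model.predict(df.loc[missing, X])
        else:
            # Replace the whole target column with predictions.
            df[y] = model.predict(df[X])
    return df


df = pd.DataFrame(
    {"x1": [1, 1, 2, 2, 3], "x2": [1, 2, 2, 3, 5], "y": [6, 8, 9, 11, None]}
)
filled = fillna_regression_sketch(df, LeastSquaresRegressor, {"y": ["x1", "x2"]})
print(filled)
```

The model is always fitted only on rows where the target is present; how then decides whether the predictions touch just the missing rows or the whole column.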


Examples

>>> import dtoolkit
>>> import pandas as pd
>>> from sklearn.linear_model import LinearRegression

The example data follow the linear relation

\[y = 1 \times x_1 + 2 \times x_2 + 3\]

>>> df = pd.DataFrame(
...     [
...         [1, 1, 6],
...         [1, 2, 8],
...         [2, 2, 9],
...         [2, 3, 11],
...         [3, 5, None],
...     ],
...     columns=['x1', 'x2', 'y'],
... )
>>> df
   x1  x2     y
0   1   1   6.0
1   1   2   8.0
2   2   2   9.0
3   2   3  11.0
4   3   5   NaN
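Row 4 is the only row with a missing y. The other four rows satisfy the linear relation above exactly, so substituting row 4's features (x1 = 3, x2 = 5) gives the value a linear fit should recover:

```latex
\begin{aligned}
y_4 &= 1 \times 3 + 2 \times 5 + 3 \\
    &= 3 + 10 + 3 \\
    &= 16
\end{aligned}
```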

Use columns 'x1' and 'x2' to fit column 'y', then fill its missing value with the prediction.

>>> df.fillna_regression(LinearRegression, {'y': ['x1', 'x2']})
   x1  x2     y
0   1   1   6.0
1   1   2   8.0
2   2   2   9.0
3   2   3  11.0
4   3   5  16.0
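The filled value can be checked without dtoolkit by solving the same ordinary least-squares problem directly. This sketch swaps NumPy's numpy.linalg.lstsq in for scikit-learn's LinearRegression, on the assumption that LinearRegression with its default intercept fits the same model; it fits the four complete rows and predicts row 4.

```python
import numpy as np

# The four complete rows of the example: features x1, x2 and target y.
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=float)
y = np.array([6, 8, 9, 11], dtype=float)

# Add a column of ones so the intercept is estimated alongside the slopes.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict the target for the incomplete row (x1=3, x2=5), with 1 for the intercept.
y_missing = np.array([3.0, 5.0, 1.0]) @ coef
print(np.round(coef, 6))
print(f"{y_missing:.1f}")  # 16.0
```

Because the complete rows lie exactly on the plane, the fitted coefficients are (1, 2) with intercept 3, matching the relation used to build the data.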