dtoolkit.accessor.dataframe.fillna_regression

dtoolkit.accessor.dataframe.fillna_regression(df: pd.DataFrame, /, method: RegressorMixin, columns: dict[Hashable, Hashable | list[Hashable] | pd.Index], how: Literal['na', 'all'] = 'na', **kwargs) -> pd.DataFrame

Fill NA values using a regression algorithm.

Parameters:
method : RegressorMixin

Regression transformer.

columns : dict, {y: X}

A mapping of column-name pairs. Each key is the y (target) column name, and each value is the X (feature) column name or names.

how : {‘na’, ‘all’}, default ‘na’

Whether to fill only the NA values (‘na’) or to apply the regression to the entire target column (‘all’).

**kwargs

See the documentation of method for complete details on the keyword arguments it accepts.

Returns:
DataFrame
Raises:
ValueError

If how isn’t “na” or “all”.
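
The parameters above can be illustrated with a minimal sketch written in plain pandas and scikit-learn. It is not dtoolkit's implementation. The helper name `fill_by_regression` is hypothetical, and two points are assumptions rather than documented behavior: that `method` is instantiated with `**kwargs`, and that the feature columns contain no missing values.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression


def fill_by_regression(df, regressor_cls, columns, how="na", **kwargs):
    """Hypothetical helper sketching the documented semantics."""
    if how not in ("na", "all"):
        raise ValueError(f"how must be 'na' or 'all', got {how!r}")

    result = df.copy()  # leave the caller's DataFrame untouched
    for target, features in columns.items():
        # Accept a single feature name as well as a list or pd.Index of names.
        if isinstance(features, (list, pd.Index)):
            features = list(features)
        else:
            features = [features]

        # Fit only on rows where the target is observed.
        known = result[target].notna()
        # Assumption: keyword arguments configure the regressor.
        model = regressor_cls(**kwargs).fit(
            result.loc[known, features], result.loc[known, target]
        )

        # 'na' predicts only the missing rows; 'all' predicts every row.
        rows = ~known if how == "na" else pd.Series(True, index=result.index)
        if rows.any():
            result.loc[rows, target] = model.predict(result.loc[rows, features])
    return result


df = pd.DataFrame(
    [[1, 1, 6], [1, 2, 8], [2, 2, 9], [2, 3, 11], [3, 5, None]],
    columns=["x1", "x2", "y"],
)
filled = fill_by_regression(df, LinearRegression, {"y": ["x1", "x2"]})
print(filled)
```

With `how="na"` the observed targets are left as they are and only the missing row is predicted. With `how="all"` every value in the target column is replaced by the model's prediction.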

Examples

>>> import dtoolkit
>>> import pandas as pd
>>> from sklearn.linear_model import LinearRegression

The sample data follow the linear relation

\[y = 1 \times x_1 + 2 \times x_2 + 3\]

>>> df = pd.DataFrame(
...     [
...         [1, 1, 6],
...         [1, 2, 8],
...         [2, 2, 9],
...         [2, 3, 11],
...         [3, 5, None],
...     ],
...     columns=['x1', 'x2', 'y'],
... )
>>> df
   x1  x2     y
0   1   1   6.0
1   1   2   8.0
2   2   2   9.0
3   2   3  11.0
4   3   5   NaN

Use the ‘x1’ and ‘x2’ columns to fit the ‘y’ column and fill its missing value.

>>> df.fillna_regression(LinearRegression, {'y': ['x1', 'x2']})
   x1  x2     y
0   1   1   6.0
1   1   2   8.0
2   2   2   9.0
3   2   3  11.0
4   3   5  16.0
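
The filled value of 16.0 can be checked independently of dtoolkit. The sketch below uses only NumPy: it fits ordinary least squares to the four complete rows and evaluates the fit at the incomplete row. The variable names are illustrative. Because the complete rows lie exactly on a plane, the fit should recover coefficients close to 1 and 2 with an intercept close to 3, giving a prediction close to 16.

```python
import numpy as np

# Complete rows from the example; the columns are x1, x2, y.
complete = np.array(
    [[1, 1, 6], [1, 2, 8], [2, 2, 9], [2, 3, 11]], dtype=float
)

# Design matrix [x1, x2, 1]; the trailing column of ones models the intercept.
A = np.column_stack([complete[:, :2], np.ones(len(complete))])
coef, *_ = np.linalg.lstsq(A, complete[:, 2], rcond=None)

# Evaluate the fitted plane at the incomplete row (x1=3, x2=5).
prediction = float(np.array([3.0, 5.0, 1.0]) @ coef)
print(np.round(coef, 6), round(prediction, 6))
```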