dtoolkit.transformer.OneHotEncoder

class dtoolkit.transformer.OneHotEncoder(*, sparse: bool = False, sparse_output: bool = False, categories_with_parent: bool = False, categories='auto', drop=None, dtype=<class 'numpy.float64'>, handle_unknown: Literal['error', 'ignore', 'infrequent_if_exist'] = 'error', min_frequency: int | float = None, max_categories: int = None)

Encode categorical features as a one-hot numeric array.

Parameters
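categories_with_parent : bool, default False

If True, each output column label is prefixed with its parent feature name (for example gender_Female). If False, the labels are the bare categories (for example Female).

sparse : bool, default False

If True, return a sparse matrix; otherwise return a dense array.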
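
The labelling rule for categories_with_parent can be sketched in plain Python. The helper output_labels below is hypothetical and not part of dtoolkit; it assumes the parent and the category are joined with an underscore, as in the gender_Female labels of the Examples section, and that categories are listed per feature in fitted order.

```python
# Hypothetical helper, NOT dtoolkit code: it only mimics the column labels
# shown in the Examples section.
def output_labels(categories, with_parent=False, sep="_"):
    """Build one output label per (feature, category) pair.

    categories: mapping of feature name -> ordered list of its categories.
    """
    labels = []
    for feature, cats in categories.items():
        for cat in cats:
            # With a parent, prefix the feature name; otherwise keep the bare category.
            labels.append(f"{feature}{sep}{cat}" if with_parent else cat)
    return labels


cats = {"gender": ["Female", "Male"], "number": [1, 2, 3]}
print(output_labels(cats))  # ['Female', 'Male', 1, 2, 3]
print(output_labels(cats, with_parent=True))
# ['gender_Female', 'gender_Male', 'number_1', 'number_2', 'number_3']
```

With bare categories, two features that share a value (two yes/no columns, say) would produce duplicate labels in this sketch; the parent prefix keeps them distinct.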

Other parameters

See sklearn.preprocessing.OneHotEncoder.

Notes

Unlike sklearn.preprocessing.OneHotEncoder, the result is returned as a DataFrame that uses the categories as column labels.

Examples

Given a dataset with two features, we let the encoder find the unique values per feature and transform the data to a binary one-hot encoding.

DataFrame in, DataFrame out with categories as columns.

>>> from dtoolkit.transformer import OneHotEncoder
>>> import pandas as pd
>>> X = [['Male', 1], ['Female', 3], ['Female', 2]]
>>> df = pd.DataFrame(X, columns=['gender', 'number'])
>>> df
    gender  number
0    Male       1
1  Female       3
2  Female       2
>>> enc = OneHotEncoder()
>>> enc.fit_transform(df)
   Female  Male    1    2    3
0     0.0   1.0  1.0  0.0  0.0
1     1.0   0.0  0.0  0.0  1.0
2     1.0   0.0  0.0  1.0  0.0
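
The fit and transform steps behind this output can be sketched with the standard library alone. The functions fit_categories and transform are hypothetical illustrations, not dtoolkit or scikit-learn code; they assume every feature's values are mutually sortable (sorting reproduces the category order shown above) and raise on an unseen value, roughly like handle_unknown='error'.

```python
# Hypothetical plain-Python sketch of one-hot fit/transform.
def fit_categories(rows):
    """Fit: the sorted unique values of each feature (column)."""
    n_features = len(rows[0])
    return [sorted({row[j] for row in rows}) for j in range(n_features)]


def transform(rows, categories):
    """Transform: one binary block per feature, concatenated per row."""
    encoded = []
    for row in rows:
        out = []
        for value, cats in zip(row, categories):
            if value not in cats:
                # Loosely mirrors handle_unknown='error'.
                raise ValueError(f"unknown category {value!r}")
            # Exactly one 1.0 per block, at the position of the value's category.
            out.extend(1.0 if cat == value else 0.0 for cat in cats)
        encoded.append(out)
    return encoded


X = [["Male", 1], ["Female", 3], ["Female", 2]]
categories = fit_categories(X)
print(categories)  # [['Female', 'Male'], [1, 2, 3]]
for row in transform(X, categories):
    print(row)
```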

Set categories_with_parent=True to prefix each column label with its parent feature name.

>>> enc = OneHotEncoder(categories_with_parent=True)
>>> enc.fit_transform(df)
   gender_Female  gender_Male  number_1  number_2  number_3
0            0.0          1.0       1.0       0.0       0.0
1            1.0          0.0       0.0       0.0       1.0
2            1.0          0.0       0.0       1.0       0.0

Attributes
infrequent_categories_

Infrequent categories for each feature.
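
This attribute comes from scikit-learn, where categories are marked infrequent through min_frequency (or max_categories). A rough sketch of the min_frequency rule follows scikit-learn's documented definition: an int is an absolute count threshold, a float is a fraction of n_samples. The helper name is hypothetical; the sketch ignores max_categories and returns an empty list when no category is infrequent.

```python
from collections import Counter


# Hypothetical helper, NOT scikit-learn or dtoolkit code.
def infrequent_categories(values, min_frequency):
    """Return the sorted categories seen fewer times than the threshold."""
    counts = Counter(values)
    if isinstance(min_frequency, float):
        # A float threshold is a fraction of the number of samples.
        threshold = min_frequency * len(values)
    else:
        # An int threshold is an absolute count.
        threshold = min_frequency
    return sorted(cat for cat, n in counts.items() if n < threshold)


gender = ["Male", "Female", "Female"]
print(infrequent_categories(gender, 2))    # ['Male']
print(infrequent_categories(gender, 0.5))  # ['Male'] (threshold 1.5)
```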

Methods

fit(X[, y])

Fit OneHotEncoder to X.

fit_transform(X[, y])

Fit to data, then transform it.

get_feature_names_out([input_features])

Get output feature names for transformation.

get_params([deep])

Get parameters for this estimator.

inverse_transform(X)

Convert the data back to the original representation.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Transform X using one-hot encoding.
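
inverse_transform undoes the encoding block by block. The sketch below is a hypothetical plain-Python version applied to the encoded rows from the first example. It assumes each feature's block holds exactly one 1.0 and raises ValueError on an all-zero block, which scikit-learn's own inverse_transform maps to None instead.

```python
# Hypothetical plain-Python sketch of inverse_transform.
def inverse_transform(encoded, categories):
    """Map each one-hot row back to one category per feature."""
    rows = []
    for vec in encoded:
        row, start = [], 0
        for cats in categories:
            block = vec[start:start + len(cats)]
            # The position of the 1.0 within the block selects the category;
            # list.index raises ValueError if the block is all zeros.
            row.append(cats[block.index(1.0)])
            start += len(cats)
        rows.append(row)
    return rows


categories = [["Female", "Male"], [1, 2, 3]]
encoded = [
    [0.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 1.0, 0.0],
]
print(inverse_transform(encoded, categories))
# [['Male', 1], ['Female', 3], ['Female', 2]]
```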