dtoolkit.transformer.OneHotEncoder

class dtoolkit.transformer.OneHotEncoder(*, sparse_output: bool = False, categories_with_parent: bool = False, categories='auto', drop=None, dtype=<class 'numpy.float64'>, handle_unknown: Literal['error', 'ignore', 'infrequent_if_exist'] = 'error', min_frequency: int | float = None, max_categories: int = None)

Encode categorical features as a one-hot numeric array.

Parameters:
categories_with_parent : bool, default False

If True, each output column name is prefixed with its parent (input column) label, e.g. gender_Female; otherwise the column is named by the category alone.

sparse_output : bool, default False

If True, return a sparse matrix; otherwise return a dense array.

Other parameters

See sklearn.preprocessing.OneHotEncoder.
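
Of the inherited parameters, handle_unknown decides what happens when transform meets a category that was not seen during fit. Per the scikit-learn documentation, 'error' raises, while 'ignore' encodes the unknown value as an all-zero indicator block for that feature. The pure-Python sketch below mimics only those two modes for a single feature; encode_value is a hypothetical helper, not part of dtoolkit or scikit-learn.

```python
# Hypothetical sketch of handle_unknown for a single feature.
# It mirrors only the documented 'error' and 'ignore' behaviour.
def encode_value(value, categories, handle_unknown="error"):
    """One-hot encode one value against the categories learned in fit."""
    if value not in categories:
        if handle_unknown == "error":
            raise ValueError(f"Found unknown category {value!r} during transform")
        # 'ignore': the unknown value becomes an all-zero indicator block.
        return [0.0] * len(categories)
    # Known value: 1.0 in its own column, 0.0 everywhere else.
    return [1.0 if value == cat else 0.0 for cat in categories]


categories = ["Female", "Male"]
print(encode_value("Male", categories))                            # [0.0, 1.0]
print(encode_value("Other", categories, handle_unknown="ignore"))  # [0.0, 0.0]
try:
    encode_value("Other", categories)
except ValueError as exc:
    print(exc)
```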

Notes

Unlike sklearn.preprocessing.OneHotEncoder, the result is returned as a DataFrame whose columns are named after the categories.
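
As background for sparse_output: a one-hot encoding is mostly zeros, because each row has exactly one 1.0 per feature, so a sparse matrix that stores only the nonzero entries can be much smaller. The sketch below builds the coordinate (COO) triplets by hand in pure Python to show what is kept; it is conceptual only (scikit-learn's sparse output is a SciPy sparse matrix), and to_coo is a hypothetical name.

```python
# Conceptual sketch: a sparse matrix keeps only the nonzero cells.
# The coordinate (COO) layout is built by hand here; this is not how
# dtoolkit or scikit-learn construct their sparse output.
def to_coo(dense):
    """Return (row, col, value) triplets for every nonzero entry."""
    return [
        (i, j, value)
        for i, row in enumerate(dense)
        for j, value in enumerate(row)
        if value != 0
    ]


# Dense one-hot rows as in the Examples section (columns: Female, Male, 1, 2, 3).
dense = [
    [0.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 1.0, 0.0],
]
coo = to_coo(dense)
# 15 dense cells, but only one nonzero per feature per row (2 features x 3 rows).
print(len(coo))  # 6
```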

Examples

Given a dataset with two features, we let the encoder find the unique values per feature and transform the data to a binary one-hot encoding.

DataFrame in, DataFrame out with categories as columns.

>>> from dtoolkit.transformer import OneHotEncoder
>>> import pandas as pd
>>> X = [['Male', 1], ['Female', 3], ['Female', 2]]
>>> df = pd.DataFrame(X, columns=['gender', 'number'])
>>> df
    gender  number
0    Male       1
1  Female       3
2  Female       2
>>> enc = OneHotEncoder()
>>> enc.fit_transform(df)
   Female  Male    1    2    3
0     0.0   1.0  1.0  0.0  0.0
1     1.0   0.0  0.0  0.0  1.0
2     1.0   0.0  0.0  1.0  0.0
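
The numbers in this output follow a simple rule: with categories='auto', fit learns the sorted unique values of each column, and transform concatenates one block of indicator columns per feature. The pure-Python sketch below reproduces the doctest values under that assumption (no drop, no infrequent grouping); fit_categories and transform are hypothetical helpers, not dtoolkit's implementation.

```python
# Minimal sketch of fit + transform, assuming categories='auto'
# (sorted unique values per feature) and no dropped columns.
def fit_categories(rows):
    """Learn the sorted unique values of each column."""
    n_features = len(rows[0])
    return [sorted({row[j] for row in rows}) for j in range(n_features)]


def transform(rows, categories):
    """Concatenate one indicator block per feature for each row."""
    encoded = []
    for row in rows:
        out = []
        for value, cats in zip(row, categories):
            out.extend(1.0 if value == cat else 0.0 for cat in cats)
        encoded.append(out)
    return encoded


X = [["Male", 1], ["Female", 3], ["Female", 2]]
cats = fit_categories(X)
print(cats)  # [['Female', 'Male'], [1, 2, 3]]
for row in transform(X, cats):
    print(row)
# [0.0, 1.0, 1.0, 0.0, 0.0]
# [1.0, 0.0, 0.0, 0.0, 1.0]
# [1.0, 0.0, 0.0, 1.0, 0.0]
```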

The encoded columns can also be prefixed with their parent labels.

>>> enc = OneHotEncoder(categories_with_parent=True)
>>> enc.fit_transform(df)
   gender_Female  gender_Male  number_1  number_2  number_3
0            0.0          1.0       1.0       0.0       0.0
1            1.0          0.0       0.0       0.0       1.0
2            1.0          0.0       0.0       1.0       0.0
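
With categories_with_parent=True the column names above follow a "<feature>_<category>" pattern. Prefixed names also stay unique when two features share a category value, whereas bare category names can repeat. The sketch below builds both naming schemes from a {feature: categories} mapping and detects such a repeat; one_hot_columns is a hypothetical helper that illustrates the naming only.

```python
# Hypothetical helper illustrating the two column-naming schemes;
# it is not part of dtoolkit's API.
def one_hot_columns(categories, with_parent=False):
    """Build output column names from {feature: [categories...]}."""
    columns = []
    for feature, cats in categories.items():
        for cat in cats:
            # With a parent label the name is "<feature>_<category>",
            # otherwise it is just the category itself.
            columns.append(f"{feature}_{cat}" if with_parent else cat)
    return columns


cats = {"gender": ["Female", "Male"], "number": [1, 2, 3]}
print(one_hot_columns(cats))  # ['Female', 'Male', 1, 2, 3]
print(one_hot_columns(cats, with_parent=True))
# ['gender_Female', 'gender_Male', 'number_1', 'number_2', 'number_3']

# Bare category names can collide when two features share a value.
shared = {"a": [0, 1], "b": [1, 2]}
names = one_hot_columns(shared)
print(len(names) != len(set(names)))  # True: 1 appears twice
```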

Attributes:
infrequent_categories_

Infrequent categories for each feature.
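
infrequent_categories_ is driven by min_frequency (and max_categories). The scikit-learn documentation states that an int marks categories with a smaller count as infrequent, while a float marks categories with a count smaller than min_frequency * n_samples, and that a feature without infrequent categories gets None. The pure-Python sketch below follows that rule for min_frequency only (max_categories is ignored); infrequent_categories is a hypothetical name.

```python
from collections import Counter


# Sketch of how min_frequency marks categories as infrequent, following the
# rule documented for sklearn.preprocessing.OneHotEncoder. max_categories is
# not handled here.
def infrequent_categories(rows, min_frequency):
    n_samples = len(rows)
    # A float is a fraction of the samples; an int is an absolute count.
    if isinstance(min_frequency, float):
        threshold = min_frequency * n_samples
    else:
        threshold = min_frequency
    result = []
    for j in range(len(rows[0])):
        counts = Counter(row[j] for row in rows)
        rare = sorted(cat for cat, n in counts.items() if n < threshold)
        # None mirrors "no infrequent categories for this feature".
        result.append(rare or None)
    return result


X = [["Male", 1], ["Female", 3], ["Female", 2]]
print(infrequent_categories(X, min_frequency=2))  # [['Male'], [1, 2, 3]]
print(infrequent_categories(X, min_frequency=1))  # [None, None]
```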

Methods

fit(X[, y])

Fit OneHotEncoder to X.

fit_transform(X[, y])

Fit to data, then transform it.

get_feature_names_out([input_features])

Get output feature names for transformation.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

inverse_transform(X)

Convert the data back to the original representation.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Transform X using one-hot encoding.
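
inverse_transform undoes the encoding: within each feature's block of indicator columns, the column holding the 1 identifies the original category. The scikit-learn documentation notes that an all-zero block (an unknown value under handle_unknown='ignore') is decoded as None. The pure-Python sketch below follows that idea using the categories from the Examples section; inverse_transform here is a hypothetical standalone function, not the estimator method.

```python
# Hypothetical sketch of inverse_transform on plain lists.
def inverse_transform(encoded, categories):
    decoded = []
    for row in encoded:
        values, start = [], 0
        for cats in categories:
            # Slice out this feature's indicator block.
            block = row[start:start + len(cats)]
            # An all-zero block has no category to recover, so use None.
            values.append(cats[block.index(1.0)] if 1.0 in block else None)
            start += len(cats)
        decoded.append(values)
    return decoded


cats = [["Female", "Male"], [1, 2, 3]]
encoded = [
    [0.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 1.0, 0.0],
]
print(inverse_transform(encoded, cats))
# [['Male', 1], ['Female', 3], ['Female', 2]]
```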