dtoolkit.transformer.OneHotEncoder
- class dtoolkit.transformer.OneHotEncoder(*, sparse_output: bool = False, categories_with_parent: bool = False, categories='auto', drop=None, dtype=numpy.float64, handle_unknown: typing.Literal['error', 'ignore', 'infrequent_if_exist'] = 'error', min_frequency: int | float = None, max_categories: int = None)
Encode categorical features as a one-hot numeric array.
- Parameters:
- categories_with_parent : bool, default False
If True, the returned column names are prefixed with the parent (input column) label; otherwise the columns are named by the categories alone.
- sparse_output : bool, default False
If True, return a sparse matrix; otherwise return an array.
- Other parameters
Notes
Unlike sklearn.preprocessing.OneHotEncoder, the result is returned as a DataFrame that uses the categories as columns.
Examples
Given a dataset with two features, we let the encoder find the unique values per feature and transform the data to a binary one-hot encoding.
DataFrame in, DataFrame out, with categories as columns.
>>> from dtoolkit.transformer import OneHotEncoder
>>> import pandas as pd
>>> X = [['Male', 1], ['Female', 3], ['Female', 2]]
>>> df = pd.DataFrame(X, columns=['gender', 'number'])
>>> df
   gender  number
0    Male       1
1  Female       3
2  Female       2
>>> enc = OneHotEncoder()
>>> enc.fit_transform(df)
   Female  Male    1    2    3
0     0.0   1.0  1.0  0.0  0.0
1     1.0   0.0  0.0  0.0  1.0
2     1.0   0.0  0.0  1.0  0.0
The encoded column names can also be prefixed with their parent labels.
>>> enc = OneHotEncoder(categories_with_parent=True)
>>> enc.fit_transform(df)
   gender_Female  gender_Male  number_1  number_2  number_3
0            0.0          1.0       1.0       0.0       0.0
1            1.0          0.0       0.0       0.0       1.0
2            1.0          0.0       0.0       1.0       0.0
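To make the column naming concrete, here is a minimal pure-Python sketch of the idea behind the two examples above. It is not dtoolkit's implementation: the helper `one_hot` and its `with_parent` flag are hypothetical names used only for illustration, and the sketch assumes categories are ordered by sorting the unique values of each input column, as the example output suggests.

```python
def one_hot(rows, columns, with_parent=False):
    """Hypothetical helper: return (header, encoded_rows) for equal-length rows."""
    # Sorted unique values per input column (assumed ordering, see lead-in).
    categories = [sorted({row[i] for row in rows}) for i in range(len(columns))]

    # One output column per (input column, category) pair.
    header = [
        f"{col}_{cat}" if with_parent else str(cat)
        for col, cats in zip(columns, categories)
        for cat in cats
    ]

    # 1.0 where the row's value matches the category, 0.0 elsewhere.
    encoded = [
        [
            1.0 if row[i] == cat else 0.0
            for i, cats in enumerate(categories)
            for cat in cats
        ]
        for row in rows
    ]
    return header, encoded


X = [["Male", 1], ["Female", 3], ["Female", 2]]
header, encoded = one_hot(X, ["gender", "number"], with_parent=True)
plain_header, _ = one_hot(X, ["gender", "number"])
```

With `with_parent=True` the header reads `gender_Female`, `gender_Male`, `number_1`, and so on; without it only the category values remain, which is why the first example shows bare `Female`, `Male`, `1`, `2`, `3` columns.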
- Attributes:
infrequent_categories_
Infrequent categories for each feature.
Methods
- fit(X[, y]): Fit OneHotEncoder to X.
- fit_transform(X[, y]): Fit to data, then transform it.
- get_feature_names_out([input_features]): Get output feature names for transformation.
- get_metadata_routing(): Get metadata routing of this object.
- get_params([deep]): Get parameters for this estimator.
- inverse_transform(X): Convert the data back to the original representation.
- set_output(*[, transform]): Set output container.
- set_params(**params): Set the parameters of this estimator.
- transform(X): Transform X using one-hot encoding.