dtoolkit.transformer.OneHotEncoder
- class dtoolkit.transformer.OneHotEncoder(*, sparse: bool = False, sparse_output: bool = False, categories_with_parent: bool = False, categories='auto', drop=None, dtype=<class 'numpy.float64'>, handle_unknown: ~typing.Literal['error', 'ignore', 'infrequent_if_exist'] = 'error', min_frequency: int | float = None, max_categories: int = None)
Encode categorical features as a one-hot numeric array.
- Parameters
- categories_with_parent : bool, default False
If True, each returned column label is prefixed with its parent column label (for example, gender_Female); otherwise the columns are labelled by the categories alone.
- sparse : bool, default False
If True, return a sparse matrix; otherwise return an array.
- Other parameters
Notes
Unlike sklearn.preprocessing.OneHotEncoder, the result is a DataFrame whose columns are the categories.
Examples
Given a dataset with two features, we let the encoder find the unique values per feature and transform the data to a binary one-hot encoding.
DataFrame in, DataFrame out, with categories as columns.
>>> from dtoolkit.transformer import OneHotEncoder
>>> import pandas as pd
>>> X = [['Male', 1], ['Female', 3], ['Female', 2]]
>>> df = pd.DataFrame(X, columns=['gender', 'number'])
>>> df
   gender  number
0    Male       1
1  Female       3
2  Female       2
>>> enc = OneHotEncoder()
>>> enc.fit_transform(df)
   Female  Male    1    2    3
0     0.0   1.0  1.0  0.0  0.0
1     1.0   0.0  0.0  0.0  1.0
2     1.0   0.0  0.0  1.0  0.0
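The fit and transform steps described above can be sketched in pure Python: fitting learns the sorted unique values of each column, and transforming replaces each value with a block of 0.0/1.0 indicators, one per learned category. This is an illustration under stated assumptions, not dtoolkit's implementation: input is a list of rows, the helper names fit_categories and one_hot are hypothetical, and only handle_unknown='error' and 'ignore' are modelled (the 'ignore' case emits an all-zero block, as scikit-learn documents).

```python
# Hypothetical helpers illustrating one-hot encoding; not dtoolkit's code.
def fit_categories(rows):
    """Learn the sorted unique values of each column."""
    n_cols = len(rows[0])
    return [sorted({row[j] for row in rows}) for j in range(n_cols)]


def one_hot(rows, categories, handle_unknown="error"):
    """Encode each row as a flat list of 0.0/1.0 indicators."""
    if handle_unknown not in ("error", "ignore"):
        raise ValueError("this sketch only models 'error' and 'ignore'")
    encoded = []
    for row in rows:
        out = []
        for value, cats in zip(row, categories):
            if value not in cats and handle_unknown == "error":
                raise ValueError(f"unknown category {value!r}")
            # With handle_unknown="ignore", an unseen value yields all zeros.
            out.extend(1.0 if value == cat else 0.0 for cat in cats)
        encoded.append(out)
    return encoded


X = [["Male", 1], ["Female", 3], ["Female", 2]]
cats = fit_categories(X)  # [['Female', 'Male'], [1, 2, 3]]
for row in one_hot(X, cats):
    print(row)
```

Each printed row matches the corresponding row of the encoded DataFrame in the example above.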
The encoded columns can also be prefixed with their parent column labels.
>>> enc = OneHotEncoder(categories_with_parent=True)
>>> enc.fit_transform(df)
   gender_Female  gender_Male  number_1  number_2  number_3
0            0.0          1.0       1.0       0.0       0.0
1            1.0          0.0       0.0       0.0       1.0
2            1.0          0.0       0.0       1.0       0.0
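Parent-label prefixing changes only how the output columns are named, not the encoded values. A minimal pure-Python sketch of the two labelling schemes follows; the helper output_labels is hypothetical, and the "_" separator is inferred from labels such as gender_Female in the example, not taken from dtoolkit's source.

```python
# Hypothetical helper, not part of dtoolkit: derives output column labels.
def output_labels(categories, with_parent=False):
    """categories maps each input column name to its ordered categories."""
    labels = []
    for column, cats in categories.items():
        for cat in cats:
            # Prefix with the parent column and "_" only when requested.
            labels.append(f"{column}_{cat}" if with_parent else cat)
    return labels


categories = {"gender": ["Female", "Male"], "number": [1, 2, 3]}
print(output_labels(categories))
print(output_labels(categories, with_parent=True))
```

In this sketch, two input columns sharing a category value (say, both containing "yes") would produce duplicate labels without the prefix, which the prefixed form avoids.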
- Attributes
infrequent_categories_ : Infrequent categories for each feature.
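The infrequent_categories_ attribute depends on the min_frequency parameter in the signature. Assuming dtoolkit follows scikit-learn's documented rule (an int min_frequency is an absolute count, a float is a fraction of the number of samples, and categories seen fewer times than the resulting threshold are infrequent), a pure-Python sketch for a single column looks like this. The helper name is hypothetical and max_categories is not modelled.

```python
from collections import Counter


def infrequent_categories(values, min_frequency):
    """Return the categories of one column that count as infrequent.

    Assumes scikit-learn's rule: an int min_frequency is an absolute
    count; a float is a fraction of the number of samples.
    """
    counts = Counter(values)
    if isinstance(min_frequency, float):
        threshold = min_frequency * len(values)
    else:
        threshold = min_frequency
    # Categories seen fewer than `threshold` times are infrequent.
    return sorted(cat for cat, n in counts.items() if n < threshold)


column = ["a"] * 5 + ["b"] * 3 + ["c"]
print(infrequent_categories(column, 2))    # ['c']
print(infrequent_categories(column, 0.4))  # ['b', 'c']
```

With nine samples, min_frequency=0.4 gives a threshold of 3.6, so "b" (three occurrences) becomes infrequent alongside "c".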
Methods
fit(X[, y]): Fit OneHotEncoder to X.
fit_transform(X[, y]): Fit to data, then transform it.
get_feature_names_out([input_features]): Get output feature names for transformation.
get_params([deep]): Get parameters for this estimator.
Convert the data back to the original representation.
set_output(*[, transform]): Set output container.
set_params(**params): Set the parameters of this estimator.
transform(X): Transform X using one-hot encoding.
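One of the listed methods converts encoded data back to the original representation. For a dense one-hot encoding this amounts to finding the active indicator within each column's block of indicators. The pure-Python sketch below uses a hypothetical decode_one_hot helper rather than the estimator's own method, and returns None for an all-zero block, which is how scikit-learn documents the inverse of an unknown category ignored during transform.

```python
# Hypothetical helper, not dtoolkit's method: decodes dense one-hot rows.
def decode_one_hot(encoded, categories):
    """Map one-hot rows back to the original category values.

    categories holds the learned categories per column, in the same
    order used to build the one-hot columns.
    """
    decoded = []
    for row in encoded:
        values, start = [], 0
        for cats in categories:
            block = row[start:start + len(cats)]
            # The active indicator marks the original category; an all-zero
            # block (an ignored unknown value) decodes to None.
            values.append(cats[block.index(1.0)] if 1.0 in block else None)
            start += len(cats)
        decoded.append(values)
    return decoded


categories = [["Female", "Male"], [1, 2, 3]]
encoded = [[0.0, 1.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0, 1.0]]
print(decode_one_hot(encoded, categories))  # [['Male', 1], ['Female', 3]]
```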