dtoolkit.accessor.dataframe.decompose

dtoolkit.accessor.dataframe.decompose(df: pd.DataFrame, /, method: TransformerMixin, columns: None | dict[Hashable | tuple[Hashable], Hashable | list[Hashable] | tuple[Hashable]] | list[Hashable] | pd.Index = None, drop: bool = False, **kwargs) → pd.DataFrame

Decompose a DataFrame’s columns with a matrix decomposition transformer.

Parameters:
method : TransformerMixin

Decomposition transformer class, such as sklearn.decomposition.PCA.

columns : dict, list, Index or None, default None

Choose columns to decompose.

  • None : Decompose all columns.

  • list or Index : Decompose the selected columns.

  • dict : Decompose groups of columns into new columns, {new column(s): old column(s)}. A tuple key produces one new column per name.

drop : bool, default False

If True, drop the columns used for decomposition. Only applies when columns is a dict.

**kwargs

See the documentation for method for complete details on the keyword arguments.
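The three accepted forms of columns, and the effect of drop, can be illustrated with a small stdlib-only sketch. The helpers normalize_columns and dict_output_columns are hypothetical and are not dtoolkit's implementation. The drop=False layout (new columns first, then every original column) matches the dict examples in this page; the drop=True layout is an assumption based on the parameter description.

```python
from collections.abc import Mapping


# Hypothetical helpers illustrating the documented `columns` and `drop`
# semantics; not dtoolkit's actual implementation.
def normalize_columns(all_columns, columns=None):
    """Return (new_names, source_columns) pairs, one per decomposed group."""
    if columns is None:
        # None: decompose every column; outputs keep the original names.
        cols = list(all_columns)
        return [(tuple(cols), cols)]
    if isinstance(columns, Mapping):
        groups = []
        for new, old in columns.items():
            # A tuple key names several new columns; any other key names one.
            new_names = new if isinstance(new, tuple) else (new,)
            # A list or tuple value lists several source columns; else one.
            old_cols = list(old) if isinstance(old, (list, tuple)) else [old]
            groups.append((new_names, old_cols))
        return groups
    # list or Index: decompose the selection; outputs keep the same names.
    cols = list(columns)
    return [(tuple(cols), cols)]


def dict_output_columns(all_columns, groups, drop=False):
    """Column order of the result when `columns` is a dict."""
    new_names = [name for names, _ in groups for name in names]
    if not drop:
        # New columns come first, followed by all original columns.
        return new_names + list(all_columns)
    # Assumed: drop=True removes every column used by any group.
    used = {col for _, cols in groups for col in cols}
    return new_names + [col for col in all_columns if col not in used]


cols = ["a", "b", "c", "d"]
groups = normalize_columns(cols, {"A": ["a", "b"], "B": ["b", "c", "d"]})
print(groups)  # [(('A',), ['a', 'b']), (('B',), ['b', 'c', 'd'])]
print(dict_output_columns(cols, groups))  # ['A', 'B', 'a', 'b', 'c', 'd']
print(dict_output_columns(cols, groups, drop=True))  # ['A', 'B']
```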

Returns:
DataFrame
Raises:
ValueError

If the number of rows is less than the number of columns.
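One plausible reason for this rule: the examples return one component per input column, and scikit-learn's PCA accepts at most min(n_samples, n_features) components, so a frame with fewer rows than columns cannot be decomposed that way. A minimal sketch of the check; check_decomposable and its message are hypothetical.

```python
def check_decomposable(n_rows: int, n_cols: int) -> None:
    """Hypothetical check mirroring the documented ValueError.

    One component per input column requires n_cols <= min(n_rows, n_cols),
    i.e. at least as many rows as columns.
    """
    if n_rows < n_cols:
        raise ValueError(
            f"expected at least as many rows as columns, "
            f"got {n_rows} rows and {n_cols} columns"
        )


check_decomposable(6, 4)  # shape of the example DataFrame: passes silently
try:
    check_decomposable(3, 4)
except ValueError as err:
    print(err)  # expected at least as many rows as columns, got 3 rows and 4 columns
```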

See also

sklearn.decomposition

Scikit-learn’s matrix decomposition transformers.

Examples

>>> import dtoolkit
>>> import pandas as pd
>>> from sklearn import decomposition
>>> df = pd.DataFrame(
...     [
...         [-1, -1, 1, 1],
...         [-2, -1, 2, 1],
...         [-3, -2, 3, 2],
...         [1, 1, -1, -1],
...         [2, 1, -2, -1],
...         [3, 2, -3, -2],
...     ],
...     columns=["a", "b", "c", "d"],
... )
>>> df
   a  b  c  d
0 -1 -1  1  1
1 -2 -1  2  1
2 -3 -2  3  2
3  1  1 -1 -1
4  2  1 -2 -1
5  3  2 -3 -2

Decompose all columns.

>>> df.decompose(decomposition.PCA)  
          a         b             c             d
0  1.956431  0.415183  9.009015e-17  8.100537e-18
1  3.142238 -0.355441  8.394617e-17  9.817066e-18
2  5.098670  0.059742 -8.445140e-17  1.640353e-19
3 -1.956431 -0.415183 -7.881266e-17  8.428608e-18
4 -3.142238  0.355441 -8.495664e-17  1.014514e-17
5 -5.098670 -0.059742  8.445140e-17 -1.640353e-19
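The last two output columns above are on the order of 1e-17, which is floating-point round-off rather than signal. The example data explains why: c is the negation of a and d is the negation of b, while a and b are not proportional, so the four columns have rank 2 and only two components carry variance. A stdlib check of that claim on the same values:

```python
# The example data, one list per row (columns a, b, c, d).
rows = [
    [-1, -1, 1, 1],
    [-2, -1, 2, 1],
    [-3, -2, 3, 2],
    [1, 1, -1, -1],
    [2, 1, -2, -1],
    [3, 2, -3, -2],
]

# c is exactly -a and d is exactly -b in every row ...
c_is_neg_a = all(r[2] == -r[0] for r in rows)
d_is_neg_b = all(r[3] == -r[1] for r in rows)

# ... while a and b are linearly independent: this 2x2 minor is nonzero.
minor = rows[0][0] * rows[1][1] - rows[1][0] * rows[0][1]

print(c_is_neg_a, d_is_neg_b, minor)  # True True -1
```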

Decompose the selected columns.

>>> df.decompose(decomposition.PCA, ["a", "b"])  
          a         b  c  d
0  1.383406  0.293579  1  1
1  2.221898 -0.251335  2  1
2  3.605304  0.042244  3  2
3 -1.383406 -0.293579 -1 -1
4 -2.221898  0.251335 -2 -1
5 -3.605304 -0.042244 -3 -2
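With two selected columns, the PCA scores can be reproduced by hand from the 2x2 scatter matrix of the centered data. This is a stdlib sketch assuming plain PCA (centering only, no scaling or whitening); the function name pca_2d_scores is hypothetical. The sign of a principal component is arbitrary, so only magnitudes are compared with the output above.

```python
import math


def pca_2d_scores(x, y):
    """Scores on both principal components of two columns (hypothetical helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    xc = [v - mx for v in x]
    yc = [v - my for v in y]
    # Symmetric scatter matrix [[sxx, sxy], [sxy, syy]] of the centered data.
    sxx = sum(v * v for v in xc)
    syy = sum(v * v for v in yc)
    sxy = sum(u * v for u, v in zip(xc, yc))
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    lam1 = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    if sxy != 0:
        # (sxy, lam1 - sxx) is an eigenvector for lam1.
        ux, uy = sxy, lam1 - sxx
    else:
        # Diagonal matrix: the leading axis is the column with more spread.
        ux, uy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(ux, uy)
    ux, uy = ux / norm, uy / norm
    # The second component is orthogonal to the first.
    vx, vy = -uy, ux
    first = [p * ux + q * uy for p, q in zip(xc, yc)]
    second = [p * vx + q * vy for p, q in zip(xc, yc)]
    return first, second


a = [-1, -2, -3, 1, 2, 3]
b = [-1, -1, -2, 1, 1, 2]
first, second = pca_2d_scores(a, b)
print(round(abs(first[0]), 4), round(abs(second[0]), 4))  # 1.3834 0.2936
```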

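Decompose groups of columns and map each group to new columns. Each key names the new column(s) and each value lists the source columns; since drop is False by default, the original columns are kept after the new ones.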

>>> df.decompose(
...     decomposition.PCA,
...     {"A": ["a", "b"], "B": ["b", "c", "d"]},
... )  
          A         B  a  b  c  d
0  1.383406  1.694316 -1 -1  1  1
1  2.221898  2.428593 -2 -1  2  1
2  3.605304  4.122909 -3 -2  3  2
3 -1.383406 -1.694316  1  1 -1 -1
4 -2.221898 -2.428593  2  1 -2 -1
5 -3.605304 -4.122909  3  2 -3 -2
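In this output, A is built from a and b alone and equals the first column of the selected-columns example, which suggests each dict entry is decomposed independently (b feeds both groups). The sketch below reproduces A and B, up to sign, with power iteration on each group's scatter matrix. Power iteration is swapped in here for readability; scikit-learn's PCA uses an SVD instead. first_component_scores is a hypothetical helper.

```python
import math


def first_component_scores(columns, iters=200):
    """Scores on the leading principal component (hypothetical helper).

    columns: list of equal-length numeric lists, one per source column.
    Uses power iteration on the scatter matrix instead of an SVD.
    """
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    k = len(centered)
    # Scatter matrix of the centered columns.
    s = [
        [sum(p * q for p, q in zip(centered[i], centered[j])) for j in range(k)]
        for i in range(k)
    ]
    # Power iteration: apply s repeatedly and renormalize; this converges to
    # the eigenvector with the largest eigenvalue unless the start vector
    # happens to be orthogonal to it.
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(s[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project every centered row onto the leading direction.
    return [sum(centered[i][r] * v[i] for i in range(k)) for r in range(n)]


data = {
    "a": [-1, -2, -3, 1, 2, 3],
    "b": [-1, -1, -2, 1, 1, 2],
    "c": [1, 2, 3, -1, -2, -3],
    "d": [1, 1, 2, -1, -1, -2],
}
mapping = {"A": ["a", "b"], "B": ["b", "c", "d"]}
# Each entry is decomposed on its own; column b feeds both groups.
result = {
    new: first_component_scores([data[name] for name in old])
    for new, old in mapping.items()
}
print(round(abs(result["A"][0]), 3), round(abs(result["B"][0]), 3))  # 1.383 1.694
```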
>>> df.decompose(
...     decomposition.PCA,
...     {("A", "B"): ["a", "b", "c"]}
... )  
          A         B  a  b  c  d
0  1.702037  0.321045 -1 -1  1  1
1  2.988071 -0.267273 -2 -1  2  1
2  4.690108  0.053773 -3 -2  3  2
3 -1.702037 -0.321045  1  1 -1 -1
4 -2.988071  0.267273  2  1 -2 -1
5 -4.690108 -0.053773  3  2 -3 -2
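With a tuple key, one group of source columns yields one component per new name, here two components from a, b and c. A stdlib sketch that finds them one at a time by power iteration followed by deflation (subtracting each found component from the scatter matrix). The technique is swapped in for illustration and is not how scikit-learn computes PCA; top_component_scores is hypothetical, and signs may differ from the output above.

```python
import math


def top_component_scores(columns, n_components, iters=500):
    """Scores on the leading principal components (hypothetical helper).

    Finds one eigenvector at a time by power iteration, then deflates the
    scatter matrix so the next pass finds the next-largest component.
    """
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    k = len(centered)
    s = [
        [sum(p * q for p, q in zip(centered[i], centered[j])) for j in range(k)]
        for i in range(k)
    ]
    directions = []
    for _ in range(n_components):
        # Uneven start vector, to avoid starting orthogonal to a component.
        v = [1.0 + i for i in range(k)]
        for _ in range(iters):
            w = [sum(s[i][j] * v[j] for j in range(k)) for i in range(k)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient v^T s v gives the eigenvalue for v.
        lam = sum(v[i] * s[i][j] * v[j] for i in range(k) for j in range(k))
        directions.append(v)
        # Deflation: subtract lam * v v^T so v no longer dominates.
        s = [[s[i][j] - lam * v[i] * v[j] for j in range(k)] for i in range(k)]
    return [
        [sum(centered[i][r] * d[i] for i in range(k)) for r in range(n)]
        for d in directions
    ]


a = [-1, -2, -3, 1, 2, 3]
b = [-1, -1, -2, 1, 1, 2]
c = [1, 2, 3, -1, -2, -3]
# {("A", "B"): ["a", "b", "c"]}: one group, two new columns.
A, B = top_component_scores([a, b, c], n_components=2)
print(round(abs(A[0]), 3), round(abs(B[0]), 3))  # 1.702 0.321
```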