dtoolkit.accessor.dataframe.decompose

dtoolkit.accessor.dataframe.decompose(df: pd.DataFrame, /, method: TransformerMixin, columns: None | dict[Hashable | tuple[Hashable], Hashable | list[Hashable] | tuple[Hashable]] | list[Hashable] | pd.Index = None, drop: bool = False, **kwargs) → pd.DataFrame

Decompose a DataFrame’s columns with a decomposition transformer.

Parameters
method : TransformerMixin

Decomposition transformer, such as sklearn.decomposition.PCA. The examples below pass the class itself.

columns : dict, list, Index or None, default None

Columns to decompose.

  • None : Decompose all columns.

  • list or Index : Decompose the selected columns.

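  • dict : Decompose groups of columns and remap them to fewer new columns, given as {new columns: old columns}. A single-label key yields one new column; a tuple key yields one new column per label. The original columns are kept unless drop is True.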
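The three accepted forms of columns can be read as one normalization step that yields (new labels, source columns) groups. The sketch below is a minimal, hypothetical illustration in plain Python: normalize_columns is not a dtoolkit function, and the rule that each label in a dict key yields one output column is inferred from the examples ("A" produces one column, ("A", "B") produces two).

```python
def normalize_columns(all_columns, columns=None):
    """Hypothetical helper (not part of dtoolkit).

    Turn the ``columns`` argument into a list of
    (new_labels, source_columns) groups.
    """
    if columns is None:
        # None: every column is decomposed and keeps its own label.
        cols = list(all_columns)
        return [(tuple(cols), cols)]
    if isinstance(columns, dict):
        groups = []
        for new, old in columns.items():
            # A tuple key asks for one new column per label;
            # any other hashable key asks for a single new column.
            labels = new if isinstance(new, tuple) else (new,)
            # Assumption: a single source label behaves like a one-item list.
            sources = list(old) if isinstance(old, (list, tuple)) else [old]
            groups.append((labels, sources))
        return groups
    # list or Index: the selected columns are decomposed in place,
    # keeping their labels.
    cols = list(columns)
    return [(tuple(cols), cols)]


print(normalize_columns(["a", "b", "c", "d"], {("A", "B"): ["a", "b", "c"]}))
# [(('A', 'B'), ['a', 'b', 'c'])]
```

Treating None and a list the same way (labels kept, values replaced) mirrors the first two examples, where the output keeps the column names a, b, c, d.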

drop : bool, default False

If True, drop the source columns used in the decomposition. Only takes effect when columns is a dict.
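The effect of drop can be sketched by computing the output column labels alone. This is a hypothetical plain-Python illustration, not dtoolkit code: the drop=False order (new columns first, then all original columns) matches the dict examples, while the drop=True behaviour, removing every column that appears in any dict value, is an assumption.

```python
def output_columns(all_columns, mapping, drop=False):
    """Hypothetical sketch of the labels returned for a dict ``columns``."""
    new_labels, used = [], set()
    for new, old in mapping.items():
        # One output column per label in the key (a tuple key gives several).
        new_labels.extend(new if isinstance(new, tuple) else (new,))
        used.update(old if isinstance(old, (list, tuple)) else [old])
    # Assumption: drop=True removes every source column, keeping the rest in order.
    kept = [c for c in all_columns if not (drop and c in used)]
    return new_labels + kept


cols = ["a", "b", "c", "d"]
print(output_columns(cols, {("A", "B"): ["a", "b", "c"]}))             # ['A', 'B', 'a', 'b', 'c', 'd']
print(output_columns(cols, {("A", "B"): ["a", "b", "c"]}, drop=True))  # ['A', 'B', 'd']
```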

**kwargs

Additional keyword arguments passed to method. See the documentation for method for complete details on the keyword arguments.

Returns
DataFrame

Raises
ValueError

If the number of rows is less than the number of columns.
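One plausible reading of this check: a decomposition such as PCA returns at most min(n_rows, n_columns) components, so with fewer rows than columns the result could not refill every decomposed column. The helper below only restates the documented condition; check_shape is hypothetical and its message text is not dtoolkit's.

```python
def check_shape(n_rows: int, n_cols: int) -> None:
    """Hypothetical restatement of the documented ValueError condition."""
    # PCA-style transformers yield at most min(n_rows, n_cols) components,
    # so fewer rows than columns cannot refill every decomposed column.
    if n_rows < n_cols:
        raise ValueError(
            f"need at least as many rows as columns, got {n_rows} rows "
            f"and {n_cols} columns"
        )


check_shape(6, 4)  # 6 rows, 4 columns: passes
try:
    check_shape(3, 4)
except ValueError as err:
    print(err)  # need at least as many rows as columns, got 3 rows and 4 columns
```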

See also

sklearn.decomposition

Scikit-learn’s matrix decomposition transformers.

Examples

>>> import dtoolkit
>>> import pandas as pd
>>> from sklearn import decomposition
>>> df = pd.DataFrame(
...     [
...         [-1, -1, 1, 1],
...         [-2, -1, 2, 1],
...         [-3, -2, 3, 2],
...         [1, 1, -1, -1],
...         [2, 1, -2, -1],
...         [3, 2, -3, -2],
...     ],
...     columns=["a", "b", "c", "d"],
... )
>>> df
   a  b  c  d
0 -1 -1  1  1
1 -2 -1  2  1
2 -3 -2  3  2
3  1  1 -1 -1
4  2  1 -2 -1
5  3  2 -3 -2

Decompose all columns.

>>> df.decompose(decomposition.PCA)  
          a         b             c             d
0  1.956431  0.415183  9.009015e-17  8.100537e-18
1  3.142238 -0.355441  8.394617e-17  9.817066e-18
2  5.098670  0.059742 -8.445140e-17  1.640353e-19
3 -1.956431 -0.415183 -7.881266e-17  8.428608e-18
4 -3.142238  0.355441 -8.495664e-17  1.014514e-17
5 -5.098670 -0.059742  8.445140e-17 -1.640353e-19
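Columns c and d come out at about 1e-17 because the example data has rank 2: c is -a and d is -b, so only two principal components carry variance and the other two are floating-point noise. The NumPy sketch below reproduces this with a hand-rolled PCA (center, then project onto the right singular vectors). It is an illustration, not dtoolkit's or scikit-learn's code; component signs are arbitrary under SVD, so compare absolute values with the output above.

```python
import numpy as np

# Same values as the example frame; column c is -a and column d is -b.
X = np.array(
    [
        [-1, -1, 1, 1],
        [-2, -1, 2, 1],
        [-3, -2, 3, 2],
        [1, 1, -1, -1],
        [2, 1, -2, -1],
        [3, 2, -3, -2],
    ],
    dtype=float,
)

# PCA by hand: center each column, then project onto the principal axes,
# which are the rows of vt from the SVD of the centered data.
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ vt.T

print(np.linalg.matrix_rank(Xc))  # 2
print(np.abs(scores[:, 2:]).max() < 1e-10)  # True: components 3 and 4 are noise
```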

Decompose the selected columns.

>>> df.decompose(decomposition.PCA, ["a", "b"])  
          a         b  c  d
0  1.383406  0.293579  1  1
1  2.221898 -0.251335  2  1
2  3.605304  0.042244  3  2
3 -1.383406 -0.293579 -1 -1
4 -2.221898  0.251335 -2 -1
5 -3.605304 -0.042244 -3 -2
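With a list, only the selected columns go through the transformer and the remaining columns are copied unchanged, which is why c and d above still hold their original integers. A NumPy sketch of that split, assuming a hand-rolled PCA stands in for the transformer (illustration only; signs may differ from scikit-learn's):

```python
import numpy as np

X = np.array(
    [
        [-1, -1, 1, 1],
        [-2, -1, 2, 1],
        [-3, -2, 3, 2],
        [1, 1, -1, -1],
        [2, 1, -2, -1],
        [3, 2, -3, -2],
    ],
    dtype=float,
)
selected = [0, 1]   # positions of "a" and "b", the columns to decompose
untouched = [2, 3]  # "c" and "d" pass through as-is

# Hand-rolled PCA on the selected block only.
sub = X[:, selected]
sub_c = sub - sub.mean(axis=0)
_, _, vt = np.linalg.svd(sub_c, full_matrices=False)

result = X.copy()
# Each selected column is replaced by one principal-component score column.
result[:, selected] = sub_c @ vt.T
```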

Decompose and remap columns to fewer new columns. A tuple key, such as ("A", "B"), yields one new column per label.

>>> df.decompose(
...     decomposition.PCA,
...     {"A": ["a", "b"], "B": ["b", "c", "d"]},
... )  
          A         B  a  b  c  d
0  1.383406  1.694316 -1 -1  1  1
1  2.221898  2.428593 -2 -1  2  1
2  3.605304  4.122909 -3 -2  3  2
3 -1.383406 -1.694316  1  1 -1 -1
4 -2.221898 -2.428593  2  1 -2 -1
5 -3.605304 -4.122909  3  2 -3 -2
>>> df.decompose(
...     decomposition.PCA,
...     {("A", "B"): ["a", "b", "c"]}
... )  
          A         B  a  b  c  d
0  1.702037  0.321045 -1 -1  1  1
1  2.988071 -0.267273 -2 -1  2  1
2  4.690108  0.053773 -3 -2  3  2
3 -1.702037 -0.321045  1  1 -1 -1
4 -2.988071  0.267273  2  1 -2 -1
5 -4.690108 -0.053773  3  2 -3 -2
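For a dict, each entry appears to get its own fit, with as many components as labels in the key: "A" maps to one component of a and b, while ("A", "B") maps to two components of a, b and c. The NumPy sketch below reproduces the magnitudes of both dict examples under that assumption. pca_scores and decompose_dict are hypothetical helpers, and only the drop=False layout (new columns first, originals appended) is shown.

```python
import numpy as np

X = np.array(
    [
        [-1, -1, 1, 1],
        [-2, -1, 2, 1],
        [-3, -2, 3, 2],
        [1, 1, -1, -1],
        [2, 1, -2, -1],
        [3, 2, -3, -2],
    ],
    dtype=float,
)
COLS = ["a", "b", "c", "d"]


def pca_scores(data, n_components):
    """Hand-rolled PCA scores (hypothetical helper): center, then project."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def decompose_dict(data, cols, mapping):
    """Hypothetical sketch of the dict case with drop=False."""
    blocks, labels = [], []
    for new, old in mapping.items():
        # Assumption: one component per label in the key.
        new_labels = new if isinstance(new, tuple) else (new,)
        idx = [cols.index(c) for c in old]
        blocks.append(pca_scores(data[:, idx], len(new_labels)))
        labels.extend(new_labels)
    # New columns first, then every original column unchanged.
    return np.hstack(blocks + [data]), labels + list(cols)


values, labels = decompose_dict(X, COLS, {("A", "B"): ["a", "b", "c"]})
print(labels)  # ['A', 'B', 'a', 'b', 'c', 'd']
```

Fitting each entry separately is what lets overlapping groups such as ["a", "b"] and ["b", "c", "d"] share column b.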