dtoolkit.accessor.dataframe.expand

dtoolkit.accessor.dataframe.expand(df: DataFrame, /, suffix: list[Hashable] = None, delimiter: str = '_', flatten: bool = False) → DataFrame

Transform each element of a list-like to a column.

Parameters:

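suffix (list of Hashable, optional): Suffixes used to name the new columns of the returned DataFrame. If omitted, the element positions (0, 1, 2, ...) are used, as in the examples below.
delimiter (str, default "_"): The delimiter placed between the column name and the suffix.
flatten (bool, default False): Whether to flatten nested list-like elements before expanding. Flattening takes more time.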
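
As an illustration of what flattening nested list-like elements means, here is a minimal pure-Python sketch. The helper name `flatten_element` is hypothetical and is not part of dtoolkit; the library's own implementation may differ.

```python
def flatten_element(element):
    """Hypothetical helper: yield the leaves of a nested list or tuple."""
    for item in element:
        if isinstance(item, (list, tuple)):
            # Recurse into nested list-likes; strings are left untouched.
            yield from flatten_element(item)
        else:
            yield item


print(list(flatten_element((3, (5, 6)))))  # [3, 5, 6]
print(list(flatten_element(("a", "b"))))   # ['a', 'b']
```

The extra recursion over every element is one plausible reason flattening costs more time than a plain expansion.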

Returns:

DataFrame: Each new column is named {column name}{delimiter}{suffix}.
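
The naming pattern can be sketched in plain Python. The helper `build_column_names` below is a hypothetical name used only for illustration; it assumes that positions are used when no suffix list is given and that surplus suffixes are ignored, which matches the outputs shown in the examples on this page.

```python
def build_column_names(name, size, suffix=None, delimiter="_"):
    """Hypothetical helper: names of the columns expanded from one column."""
    # Without a suffix list, fall back to the positions 0, 1, 2, ...
    labels = range(size) if suffix is None else suffix[:size]
    return [f"{name}{delimiter}{label}" for label in labels]


print(build_column_names("col2", 3))
# ['col2_0', 'col2_1', 'col2_2']
print(build_column_names("col2", 2, suffix=["index", "value"], delimiter="-"))
# ['col2-index', 'col2-value']
```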

Raises:

ValueError

If s.name is None.
If len(suffix) is less than the maximum size of s's elements.
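
The suffix-length condition can be pictured with the following sketch. `check_suffix` is a hypothetical helper, not dtoolkit's code, and it assumes that only lists and tuples count toward the maximum element size.

```python
def check_suffix(suffix, elements):
    """Hypothetical check: the suffix list must cover the longest element."""
    # Size of the longest list-like element; scalars and strings are skipped.
    max_size = max(
        (len(e) for e in elements if isinstance(e, (list, tuple))),
        default=0,
    )
    if suffix is not None and len(suffix) < max_size:
        raise ValueError(
            f"suffix has {len(suffix)} items but the longest element has {max_size}."
        )
    return max_size


check_suffix(["a", "b", "c", "d"], [(3, 4), (5, 6, 7)])  # fine: 4 >= 3
try:
    check_suffix(["a", "b"], [(3, 4), (5, 6, 7)])  # too short: 2 < 3
except ValueError as error:
    print(error)
```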

See also

pandas.Series.explode: Transform each element of a list-like to a row.
pandas.DataFrame.explode: Transform each element of a list-like to a row.
dtoolkit.accessor.series.expand: Transform each element of a list-like to a column.
dtoolkit.accessor.dataframe.expand: Transform each element of a list-like to a column.
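
To make the row versus column distinction concrete, the sketch below compares pandas.Series.explode with a column-wise expansion built from plain pandas calls. The column-wise part is only an approximation written for this comparison and is not dtoolkit's implementation.

```python
import pandas as pd

s = pd.Series([(3, 4), (5, 6, 7)], name="col2")

# explode: one row per element; the original index labels repeat.
rows = s.explode()

# Column-wise counterpart (an approximation, not dtoolkit's code):
# one column per position, padded with missing values for shorter elements.
wide = pd.DataFrame(s.tolist(), index=s.index).add_prefix(f"{s.name}_")

print(rows.index.tolist())    # [0, 0, 1, 1, 1]
print(wide.columns.tolist())  # ['col2_0', 'col2_1', 'col2_2']
```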

Examples

>>> import dtoolkit
>>> import pandas as pd
>>> import numpy as np

Expand the list-like element.

>>> df = pd.DataFrame(
...     {
...         'A': [[0, 1, 2], 'foo', [], [3, 4]],
...         'B': 1,
...         'C': [['a', 'b', 'c'], np.nan, [], ['d', 'e']],
...     },
... )
>>> df.expand()
    A_0  A_1  A_2  B   C_0   C_1   C_2
0     0  1.0  2.0  1     a     b     c
1   foo  NaN  NaN  1   NaN  None  None
2  None  NaN  NaN  1  None  None  None
3     3  4.0  NaN  1     d     e  None

Expand elements whose sub-elements are themselves list-like.

>>> df = pd.DataFrame({"col1": [1, 2], "col2": [("a", "b"), (3, (5, 6))]})
>>> df.expand(flatten=True)
   col1 col2_0 col2_1  col2_2
0     1      a      b     NaN
1     2      3      5     6.0

Set the names of the new columns.

>>> df = pd.DataFrame({"col1": [1, 2], "col2": [("a", 3), ("b", 4)]})
>>> df.expand(suffix=["index", "value"], delimiter="-")
   col1  col2-index  col2-value
0     1           a           3
1     2           b           4

Elements of different lengths, and a suffix list longer than the longest element, are also handled.

>>> df = pd.DataFrame({"col1": [1, 2], "col2": [(3, 4), (5, 6, 7)]})
>>> df.expand()
   col1  col2_0  col2_1  col2_2
0     1       3       4     NaN
1     2       5       6     7.0
>>> df.expand(suffix=["a", "b", "c", "d"])
   col1  col2_a  col2_b  col2_c
0     1       3       4     NaN
1     2       5       6     7.0