
This page was generated from guide/tips_about_accessor.ipynb.

Register a Method as a Built-in Attribute of Pandas Objects

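Method chaining on pandas objects makes for a pleasant coding style: each step flows into the next without breaking the pipeline apart.

In most cases, the built-in attributes of pandas objects are enough. For special cases, though, we can register our own functions so they can be called like built-in ones. There are two ways to do that: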

  • Pandas accessor registers: pandas.api.extensions.register_series_accessor and pandas.api.extensions.register_dataframe_accessor

  • DToolKit method registers: dtoolkit.accessor.register_series_method and dtoolkit.accessor.register_dataframe_method
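For comparison, an unregistered function can still join a chain through DataFrame.pipe, which calls func(df); the registers listed above let the same step read as df.method() instead. A small sketch, where column_names is a hypothetical helper used only for illustration:

```python
from __future__ import annotations

import pandas as pd


def column_names(df: pd.DataFrame) -> list[str]:
    # A plain helper function: it is not attached to DataFrame in any way.
    return df.columns.tolist()


df = pd.DataFrame({"longitude": [0.0, 1.0], "latitude": [0.0, 2.0]})

# DataFrame.pipe(func) calls func(df), so the helper still fits in a chain.
names = df.rename(columns=str.upper).pipe(column_names)
print(names)  # ['LONGITUDE', 'LATITUDE']
```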

Pandas Accessor Register

Pandas Register Class

This example shows how an accessor can group many attributes under one namespace, just as the Series.str accessor gives access to many string methods such as count, find, and index.

The code below is copied from the pandas accessor example.

[1]:
from __future__ import annotations

import pandas as pd
import numpy as np


@pd.api.extensions.register_dataframe_accessor("geo")
class GeoAccessor:
    def __init__(self, df: pd.DataFrame):
        self._obj = df

    @property
    def center(self):
        # return the geographic center point of this DataFrame
        lat = self._obj.latitude
        lon = self._obj.longitude
        return (float(lon.mean()), float(lat.mean()))

    def plot(self):
        # plot this DataFrame's data on a map, e.g., using Cartopy
        pass
[2]:
ds = pd.DataFrame(
    {
        "longitude": np.linspace(0, 10),
        "latitude": np.linspace(0, 20),
    }
)
ds.head()
[2]:
   longitude  latitude
0   0.000000  0.000000
1   0.204082  0.408163
2   0.408163  0.816327
3   0.612245  1.224490
4   0.816327  1.632653
[3]:
ds.geo.center
[3]:
(5.0, 10.0)
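The fuller version of the pandas accessor example also validates the object in __init__, so a DataFrame without the expected columns fails early. A sketch of that pattern, registered under the hypothetical name geo_checked so it does not override geo:

```python
from __future__ import annotations

import numpy as np
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("geo_checked")
class CheckedGeoAccessor:
    def __init__(self, df: pd.DataFrame):
        self._validate(df)
        self._obj = df

    @staticmethod
    def _validate(df: pd.DataFrame) -> None:
        # Fail early if the columns this accessor relies on are missing.
        if "latitude" not in df.columns or "longitude" not in df.columns:
            raise AttributeError("Must have 'latitude' and 'longitude'.")

    @property
    def center(self) -> tuple[float, float]:
        # Same computation as GeoAccessor.center above.
        return (float(self._obj.longitude.mean()), float(self._obj.latitude.mean()))


ok = pd.DataFrame({"longitude": np.linspace(0, 10), "latitude": np.linspace(0, 20)})
print(ok.geo_checked.center)  # (5.0, 10.0)

bad = pd.DataFrame({"x": [1, 2]})
print(hasattr(bad, "geo_checked"))  # False
```

Raising AttributeError (rather than, say, ValueError) makes the accessor look absent on unsuitable frames, which is why hasattr returns False.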

Pandas Register Method

What if I want to register only one method?

With the pandas accessor API, registering a single method means wrapping it in a class or a function.

Wrap Class via __call__

[4]:
@pd.api.extensions.register_dataframe_accessor("col")
@pd.api.extensions.register_series_accessor("col")
class Column:
    def __init__(self, pd_obj):
        self.pd_obj = pd_obj

    def __call__(self) -> str | list[str]:
        if isinstance(self.pd_obj, pd.Series):
            return self.pd_obj.name

        return self.pd_obj.columns.tolist()
[5]:
ds.col()
[5]:
['longitude', 'latitude']
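Because the accessor instance is called like a function, __call__ can also take arguments, and one class can serve both DataFrame and Series. A sketch registered under the hypothetical name col_demo, with a hypothetical prefix filter added for illustration:

```python
from __future__ import annotations

import pandas as pd


@pd.api.extensions.register_dataframe_accessor("col_demo")
@pd.api.extensions.register_series_accessor("col_demo")
class ColumnDemo:
    def __init__(self, pd_obj: pd.DataFrame | pd.Series):
        self.pd_obj = pd_obj

    def __call__(self, prefix: str = "") -> str | list[str]:
        # Series: the "column" is the Series name itself.
        if isinstance(self.pd_obj, pd.Series):
            return self.pd_obj.name
        # DataFrame: keep only column labels that start with `prefix`.
        return [c for c in self.pd_obj.columns if str(c).startswith(prefix)]


df = pd.DataFrame({"longitude": [0.0], "latitude": [0.0]})
print(df.col_demo())                # ['longitude', 'latitude']
print(df.col_demo("lat"))           # ['latitude']
print(df["longitude"].col_demo())   # longitude
```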

Wrap Function

[6]:
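@pd.api.extensions.register_dataframe_accessor("col")
@pd.api.extensions.register_series_accessor("col")
def column(pd_obj):
    # pandas calls column(pd_obj) when `.col` is accessed, so return the inner
    # function itself (not its result); `ds.col()` then calls it.
    def wrapper() -> str | list[str]:
        if isinstance(pd_obj, pd.Series):
            return pd_obj.name
        return pd_obj.columns.tolist()

    return wrapper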
/tmp/ipykernel_2969/3505089096.py:2: UserWarning: registration of accessor <function column at 0x7f6692393600> under name 'col' for type <class 'pandas.core.series.Series'> is overriding a preexisting attribute with the same name.
  @pd.api.extensions.register_series_accessor("col")
/tmp/ipykernel_2969/3505089096.py:1: UserWarning: registration of accessor <function column at 0x7f6692393600> under name 'col' for type <class 'pandas.core.frame.DataFrame'> is overriding a preexisting attribute with the same name.
  @pd.api.extensions.register_dataframe_accessor("col")
[7]:
ds.col()
[7]:
['longitude', 'latitude']
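Tried on a freshly created DataFrame, a function-style accessor only works if the outer function returns the inner function itself; returning wrapper() would hand back a plain value that cannot be called. A sketch under the hypothetical name n_cols, whose include argument is passed to DataFrame.select_dtypes:

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("n_cols")
def n_cols(df: pd.DataFrame):
    # pandas calls n_cols(df) when `df.n_cols` is accessed.
    def wrapper(include=None) -> int:
        # Count all columns, or only those whose dtype matches `include`.
        if include is None:
            return df.shape[1]
        return df.select_dtypes(include=include).shape[1]

    # Return the inner function, not wrapper(), so `df.n_cols()` is callable.
    return wrapper


fresh = pd.DataFrame({"a": [1], "b": [2.0], "c": ["x"]})
print(fresh.n_cols())          # 3
print(fresh.n_cols("number"))  # 2
```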

Pandas Accessor Register Conclusion

The pandas accessor registers (pd.api.extensions.register_*_accessor) are a great fit for a class that groups many attributes. For a single method, however, the extra class or closure feels a little awkward.

DToolKit Method Register

DToolKit's method registers make it easier to hook a single method onto pandas objects.

DToolKit Register Method

[8]:
from dtoolkit.accessor import register_dataframe_method
from dtoolkit.accessor import register_series_method


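@register_dataframe_method("col")  # DataFrame.col
@register_dataframe_method  # DataFrame.column, the function's own name
@register_series_method("col")  # Series.col
@register_series_method  # Series.column
def column(pd_obj) -> str | list[str]:
    # The pandas object is passed in as the first argument.
    if isinstance(pd_obj, pd.Series):
        return pd_obj.name
    return pd_obj.columns.tolist()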
/home/docs/checkouts/readthedocs.org/user_builds/my-data-toolkit/conda/latest/lib/python3.12/site-packages/dtoolkit/accessor/register.py:46: UserWarning: registration of accessor <function column at 0x7f66923931a0> under name 'col' for type <class 'pandas.core.series.Series'> is overriding a preexisting attribute with the same name.
  register_accessor(name)(method_accessor)
/home/docs/checkouts/readthedocs.org/user_builds/my-data-toolkit/conda/latest/lib/python3.12/site-packages/dtoolkit/accessor/register.py:46: UserWarning: registration of accessor <function column at 0x7f6692393060> under name 'col' for type <class 'pandas.core.frame.DataFrame'> is overriding a preexisting attribute with the same name.
  register_accessor(name)(method_accessor)

Call it through the custom name col:

[9]:
ds.col()
[9]:
['longitude', 'latitude']

Call it through the function's own name, column:

[10]:
ds.column()
[10]:
['longitude', 'latitude']

Extend to Pandas-like Objects

To give pandas-like objects the same quick method hooking, DToolKit provides another decorator, dtoolkit.accessor.register_method_factory. It turns an accessor register into a method register:

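from dtoolkit.accessor import register_method_factory


@register_method_factory
def object_accessor(name: str | None = None):
    # `pandas_like_object_accessor` is a placeholder for the accessor register
    # of a pandas-like library, e.g. pd.api.extensions.register_dataframe_accessor.
    return pandas_like_object_accessor(name)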
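The idea behind such a factory can be sketched with pandas alone: take an accessor register and return a decorator that registers a closure binding the object as the method's first argument, under either a given name or the function's own name. This is an illustration of the pattern, not DToolKit's source; method_factory_sketch, frame_method, series_method, and n_rows are hypothetical names.

```python
from __future__ import annotations

from functools import wraps
from typing import Callable

import pandas as pd


def method_factory_sketch(register_accessor: Callable) -> Callable:
    # Hypothetical stand-in: turn an accessor register (name -> decorator)
    # into a method register usable as @reg or @reg("name").
    def register_method(name_or_method: str | Callable | None = None):
        def decorate(method: Callable, name: str | None = None) -> Callable:
            def accessor(pd_obj):
                # Called with the object on attribute access; return a callable
                # that passes the object as the method's first argument.
                @wraps(method)
                def wrapper(*args, **kwargs):
                    return method(pd_obj, *args, **kwargs)

                return wrapper

            register_accessor(name or method.__name__)(accessor)
            return method  # return the function so decorators can be stacked

        if callable(name_or_method):  # used bare: @register_method
            return decorate(name_or_method)
        # used with a name: @register_method("name")
        return lambda method: decorate(method, name_or_method)

    return register_method


frame_method = method_factory_sketch(pd.api.extensions.register_dataframe_accessor)
series_method = method_factory_sketch(pd.api.extensions.register_series_accessor)


@frame_method("n_rows_alias")
@frame_method
@series_method
def n_rows(pd_obj) -> int:
    return len(pd_obj)


df = pd.DataFrame({"a": [1, 2, 3]})
print(df.n_rows(), df.n_rows_alias(), df["a"].n_rows())  # 3 3 3
```

Returning the original function from each decoration is what allows stacking, the same way the DToolKit decorators are stacked in the examples here.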

Transform Pandas Accessor Register

[11]:
from dtoolkit.accessor import register_method_factory


@register_method_factory
def my_dataframe_accessor(name: str | None = None):
    return pd.api.extensions.register_dataframe_accessor(name)
[12]:
@my_dataframe_accessor("my_cols")
@my_dataframe_accessor
def my_columns(pd_obj: pd.DataFrame):
    return pd_obj.columns
[13]:
ds.my_columns()
[13]:
Index(['longitude', 'latitude'], dtype='object')
[14]:
ds.my_cols()
[14]:
Index(['longitude', 'latitude'], dtype='object')

Transform GeoPandas Accessor Register

[15]:
from dtoolkit.accessor import register_method_factory
from dtoolkit.geoaccessor import register_geodataframe_accessor


@register_method_factory
def my_geodataframe_accessor(name: str | None = None):
    return register_geodataframe_accessor(name)
[16]:
import geopandas as gpd


@my_geodataframe_accessor("is_p")
@my_geodataframe_accessor
def is_point(df: gpd.GeoDataFrame):
    # Return a boolean Series denoting whether each geometry is a point.

    return df.geometry.geom_type == "Point"
[17]:
s = gpd.GeoSeries.from_wkt(["POINT (0 0)", "POINT (1 1)", None])
df = s.to_frame("geometry")
df
[17]:
                  geometry
0  POINT (0.00000 0.00000)
1  POINT (1.00000 1.00000)
2                     None
[18]:
df.is_point()
[18]:
0     True
1     True
2    False
dtype: bool
[19]:
df.is_p()
[19]:
0     True
1     True
2    False
dtype: bool