{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Register a Method as the Original Attribute of Pandas Object" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pandas object method chaining gives us a great coding feeling without any breaking.\n", "\n", "In many cases, it's ok via the original attributes of pandas object.\n", "But it's possible to use our own function to handle some special cases.\n", "\n", "\n", "- Pandas accessor register, `pandas.api.extensions.register_series_accessor` and `pandas.api.extensions.register_dataframe_accessor`\n", "- DToolKit method register, `dtoolkit.accessor.register_series_method` and `dtoolkit.accessor.register_dataframe_method`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pandas Accessor Register" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pandas Register Class" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This example shows an accessor how can combine many attributes.\n", "\n", "Just like `Series.str` accessor can access a lot of `string` attributes, `count`, `find`, and `index`, i.e." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Copy from [pandas accessor example](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.api.extensions.register_dataframe_accessor.html)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from __future__ import annotations\n", "\n", "import pandas as pd\n", "import numpy as np\n", "\n", "\n", "@pd.api.extensions.register_dataframe_accessor(\"geo\")\n", "class GeoAccessor:\n", " def __init__(self, df: pd.DataFrame):\n", " self._obj = df\n", "\n", " @property\n", " def center(self):\n", " # return the geographic center point of this DataFrame\n", " lat = self._obj.latitude\n", " lon = self._obj.longitude\n", " return (float(lon.mean()), float(lat.mean()))\n", "\n", " def plot(self):\n", " # plot this array's data on a map, e.g., using Cartopy\n", " pass" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds = pd.DataFrame(\n", " {\n", " \"longitude\": np.linspace(0, 10),\n", " \"latitude\": np.linspace(0, 20),\n", " }\n", ")\n", "ds.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds.geo.center" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pandas Register Method" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What if I want to register only one method?\n", "\n", "It need to wrap class or function." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Wrap Class via `__call__`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@pd.api.extensions.register_dataframe_accessor(\"col\")\n", "@pd.api.extensions.register_series_accessor(\"col\")\n", "class Column:\n", " def __init__(self, pd_obj):\n", " self.pd_obj = pd_obj\n", "\n", " def __call__(self) -> str | list[str]:\n", " if isinstance(self.pd_obj, pd.Series):\n", " return self.pd_obj.name\n", "\n", " return self.pd_obj.columns.tolist()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds.col()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Wrap function" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@pd.api.extensions.register_dataframe_accessor(\"col\")\n", "@pd.api.extensions.register_series_accessor(\"col\")\n", "def column(pd_obj) -> str | list[str]:\n", " def wrapper():\n", " if isinstance(pd_obj, pd.Series):\n", " return pd_obj.name\n", " return pd_obj.columns.tolist()\n", "\n", " return wrapper()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds.col()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pandas Accessor Register Conclusion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For class pandas accessor register (`pd.api.extensions.register_*_accessor`) would be great.\n", "But for single method it would be a little bit weird." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## DToolKit Method Register" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To hook single method easier." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### DToolKit Register Method" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from dtoolkit.accessor import register_dataframe_method\n", "from dtoolkit.accessor import register_series_method\n", "\n", "\n", "@register_dataframe_method(\"col\")\n", "@register_dataframe_method\n", "@register_series_method(\"col\")\n", "@register_series_method\n", "def column(pd_obj) -> str | list[str]:\n", " if isinstance(pd_obj, pd.Series):\n", " return pd_obj.name\n", " return pd_obj.columns.tolist()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use custom accessor name `col`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds.col()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use accessor name `column`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds.column()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extend to Pandas-like Object" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To extend quickly hook method as pandas-like object ability.\n", "\n", "There are a another decorator `dtoolkit.accessor.register_method_factory`.\n", "\n", "```python\n", "@register_method_factory\n", "def object_accessor(name: str | None = None):\n", " return pandas_like_object_accessor(name)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Transform Pandas Accessor Register" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from dtoolkit.accessor import register_method_factory\n", "\n", "\n", "@register_method_factory\n", "def my_dataframe_accessor(name: str | None = None):\n", " return pd.api.extensions.register_dataframe_accessor(name)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@my_dataframe_accessor(\"my_cols\")\n", "@my_dataframe_accessor\n", "def my_columns(pd_obj: pd.DataFrame):\n", " return pd_obj.columns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds.my_columns()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds.my_cols()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Transform GeoPandas Accessor Register" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from dtoolkit.accessor import register_method_factory\n", "from dtoolkit.geoaccessor import register_geodataframe_accessor\n", "\n", "\n", "@register_method_factory\n", "def my_geodataframe_accessor(name: str | None = None):\n", " return register_geodataframe_accessor(name)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import geopandas as gpd\n", "\n", "\n", "@my_geodataframe_accessor(\"is_p\")\n", "@my_geodataframe_accessor\n", "def is_point(df: gpd.GeoDataFrame):\n", " # Return a boolean Series denoting whether each geometry is a point.\n", "\n", " return df.geometry.geom_type == \"Point\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s = gpd.GeoSeries.from_wkt([\"POINT (0 0)\", \"POINT (1 1)\", None])\n", "df = s.to_frame(\"geometry\")\n", "df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.is_point()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.is_p()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" } }, "nbformat": 4, "nbformat_minor": 2 }