{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tips About Accessing Element Attributes of `Series`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A `Series` combines data.\n", "When a `Series` holds data of the same **type**, it becomes a container for accessing the attributes of that data.\n", "Such a `Series` can be called a `string` type of `Series`, a `Path` type of `Series`, a `CRS` type of `Series`, and so on." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How can we access the attributes of the elements in a `Series`?\n", "\n", "- Use the `apply` method to fetch attributes.\n", "- Use a pandas object accessor, registered via `pandas.api.extensions.register_series_accessor`.\n", "- Use `getattr`, `dtoolkit.accessor.series.getattr`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use `apply` Method\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `apply` Method's String Example" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "s = pd.Series([\"hello\", \"world\"])\n", "s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Count the occurrences of `'l'`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s.apply(lambda x: x.count(\"l\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Find the index of the first `'l'`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s.apply(lambda x: x.find(\"l\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Return the length of each element."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s.apply(lambda x: len(x))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `apply` Method Conclusion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From the examples above, we can see:\n", "\n", "- advantages\n", "    - no accessor needs to be written in advance\n", "    - can return arbitrary results via a `lambda` function\n", "- disadvantages\n", "    - verbose code style: every call needs boilerplate `lambda` code" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use Pandas Object Accessor" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For convenience, pandas provides the `.str` accessor to access `string` attributes of a `string` type of `Series`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `.str` Accessor's String Example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Count the occurrences of `'l'`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s.str.count(\"l\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Find the index of the first `'l'`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s.str.find(\"l\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Return the length of each element."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s.str.len()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Accessor Method Conclusion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From the examples above, we can see:\n", "\n", "- advantages\n", "    - keeps the same code style: `'a string'.count()` and `Series.str.count()`\n", "    - can expose additional attributes: `'a string'.__len__()` -> `Series.str.len()`\n", "- disadvantages\n", "    - the accessor has to be written in advance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use `getattr`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us take a quick look at a `getattr` example." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `getattr`'s String Example" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import dtoolkit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Count the occurrences of `'l'`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s.getattr(\"count\", \"l\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Find the index of the first `'l'`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s.getattr(\"find\", \"l\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Return the length of each element."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s.getattr(\"__len__\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `getattr` Method Conclusion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From the examples above, we can see:\n", "\n", "- advantages\n", "    - no accessor needs to be written in advance\n", "- disadvantages\n", "    - functional rather than OOP style, so switching between the two styles feels a little awkward\n", "    - cannot call attributes that the elements do not have" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Comparing these methods, we can rank them from different perspectives.\n", "\n", "- Flexibility / Scalability: accessor > `apply` > `getattr`\n", "- Ease of use\n", "    - With a pre-written accessor: accessor > `getattr` > `apply`\n", "    - Without a pre-written accessor: `getattr` > `apply`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The aim of `getattr` is to quickly access the attributes of `Series` elements without extra code.\n", "\n", "Based on that, it can quickly fetch attributes from the elements of a `Series`.\n", "\n", "So it is a minimal accessor for `Series`." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" } }, "nbformat": 4, "nbformat_minor": 2 }