Tip

This page was generated from guide/tips_about_getattr.ipynb.

Tips About Accessing Element Attributes of Series#

A Series holds a collection of data. When all of its elements share one type, the Series acts as a container through which the attributes of those elements can be accessed. Such a Series can be described by its element type: a string Series, a Path Series, a CRS Series, and so on.

How can the attributes of the elements in a Series be accessed?
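As a small illustration of a Series that is not made of strings, the sketch below builds a Path Series and reads two attributes from every element with apply. The file names are made up for this example.

```python
from pathlib import PurePosixPath

import pandas as pd

# A Series whose elements are all path objects (hypothetical file names).
paths = pd.Series([PurePosixPath("data/a.csv"), PurePosixPath("data/b.txt")])

# Every element exposes the same attributes, e.g. ``suffix`` and ``stem``.
suffixes = paths.apply(lambda p: p.suffix)
stems = paths.apply(lambda p: p.stem)

print(suffixes.tolist())  # ['.csv', '.txt']
print(stems.tolist())  # ['a', 'b']
```

Because every element has the same type, the same attribute name is valid for the whole Series.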

  • Use the apply method to fetch attributes.

  • Use a pandas object accessor (see pandas.api.extensions.register_series_accessor).

  • Use getattr, dtoolkit.accessor.series.getattr.

Use apply Method#

apply Method’s String Example#

[1]:
import pandas as pd

s = pd.Series(["hello", "world"])
s
[1]:
0    hello
1    world
dtype: object

Count the occurrences of 'l'.

[2]:
s.apply(lambda x: x.count("l"))
[2]:
0    2
1    1
dtype: int64

Find the index of the first 'l'.

[3]:
s.apply(lambda x: x.find("l"))
[3]:
0    2
1    3
dtype: int64

Return the length of each element.

[4]:
s.apply(lambda x: len(x))
[4]:
0    5
1    5
dtype: int64

apply Method Conclusion#

From the examples above, we can see:

  • advantages

    • no accessor has to be written in advance

    • a lambda function can return an arbitrary result

  • disadvantages

    • verbose code style: every call needs lambda boilerplate
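The following sketch illustrates the trade-off: a lambda can combine several operations into a result that has no ready-made accessor method, at the cost of extra boilerplate. The ratio computed here is only an illustrative choice.

```python
import pandas as pd

s = pd.Series(["hello", "world"])

# Share of characters in each element that are 'l':
# combines a method call with len() in a single lambda.
ratio = s.apply(lambda x: x.count("l") / len(x))
print(ratio.tolist())  # [0.4, 0.2]

# A builtin or named function can be passed directly, without a lambda wrapper.
lengths = s.apply(len)
print(lengths.tolist())  # [5, 5]
```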

Use Pandas Object Accessor#

For convenience, pandas provides the .str accessor to access string attributes of a string Series.

.str Accessor’s String Example#

Count the occurrences of 'l'.

[5]:
s.str.count("l")
[5]:
0    2
1    1
dtype: int64

Find the index of the first 'l'.

[6]:
s.str.find("l")
[6]:
0    2
1    3
dtype: int64

Return the length of each element.

[7]:
s.str.len()
[7]:
0    5
1    5
dtype: int64

Accessor Method Conclusion#

From the examples above, we can see:

  • advantages

    • keeps the same code style as the element: 'a string'.count() and Series.str.count()

    • can expose additional attributes: 'a string'.__len__() -> Series.str.len()

  • disadvantages

    • the accessor has to be written in advance
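As a rough illustration of what writing an accessor in advance involves, the sketch below registers a small custom accessor with pandas.api.extensions.register_series_accessor. The accessor name demo_text and the class DemoTextAccessor are hypothetical names invented for this example; they are not part of pandas or dtoolkit.

```python
import pandas as pd
from pandas.api.extensions import register_series_accessor


# "demo_text" is a hypothetical accessor name chosen for this sketch.
@register_series_accessor("demo_text")
class DemoTextAccessor:
    def __init__(self, series):
        self._series = series

    def count(self, sub):
        """Mirror ``str.count`` for every element."""
        return self._series.apply(lambda x: x.count(sub))

    def len(self):
        """Expose ``len(element)`` as a method, similar to ``Series.str.len``."""
        return self._series.apply(len)


s = pd.Series(["hello", "world"])
print(s.demo_text.count("l").tolist())  # [2, 1]
print(s.demo_text.len().tolist())  # [5, 5]
```

Each method has to be defined before it can be used, which is the up-front cost listed under disadvantages.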

Use getattr#

Let us take a quick look at a getattr example.

getattr’s String Example#

[8]:
import dtoolkit

Count the occurrences of 'l'.

[9]:
s.getattr("count", "l")
[9]:
0    2
1    1
dtype: int64

Find the index of the first 'l'.

[10]:
s.getattr("find", "l")
[10]:
0    2
1    3
dtype: int64

Return the length of each element.

[11]:
s.getattr("__len__")
[11]:
0    5
1    5
dtype: int64

getattr Method Conclusion#

From the examples above, we can see:

  • advantages

    • no accessor has to be written in advance

  • disadvantages

    • the attribute is named by a string, which is functional style rather than OOP style, so switching between the two styles feels a little awkward

    • only attributes that already exist on the elements can be called
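To make the last limitation concrete, here is a minimal sketch of an element-wise getattr built from apply and Python's builtin getattr. The helper series_getattr is a hypothetical stand-in written for this example; it is an assumption about the behaviour, not dtoolkit's actual implementation.

```python
import pandas as pd


def series_getattr(s, name, *args, **kwargs):
    """Hypothetical element-wise getattr; not dtoolkit's code."""

    def fetch(x):
        # Raises AttributeError if the element has no attribute called ``name``.
        attr = getattr(x, name)
        # Call methods with the given arguments; return plain attributes as they are.
        return attr(*args, **kwargs) if callable(attr) else attr

    return s.apply(fetch)


s = pd.Series(["hello", "world"])
print(series_getattr(s, "count", "l").tolist())  # [2, 1]
print(series_getattr(s, "__len__").tolist())  # [5, 5]

try:
    series_getattr(s, "len")  # str has no ``len`` attribute
except AttributeError as error:
    print(type(error).__name__)  # AttributeError
```

Under this assumption, an accessor can offer len() because someone wrote it, whereas a getattr-style helper can only reach what the element itself already defines, such as __len__.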

Summary#

Comparing these methods, we can rank them from different perspectives.

  • Flexibility / Scalability: accessor > apply > getattr

  • Ease of use

    • When an accessor has already been written: accessor > getattr > apply

    • When no accessor has been written: getattr > apply

The aim of getattr is to quickly access the attributes of Series elements without extra code.

In that sense, it is a minimal accessor for Series.