Dunder What?

The power behind Python's object model

  • Who's heard the term 'dunder' before?
  • OK, so maybe at least a few of you will learn a new word today.
  • This is essentially a tour of dunder methods and their uses.
  • Not exhaustive and not super detailed.
  • Designed to give you a bunch of things to explore and look up.

Luke Lee

  • Software Engineer at Blueback Reservoir
  • Data mining apps in Python
  • Embedded C Developer
  • Django enthusiast

Dunder Basics

  • Dunder is just slang for double underscore
  • Sometimes called 'magic methods'
  • 'Private' class methods
  • Just methods!
  • Poorly documented

Look up the arguments and when it's called, and you're good!

Use of 'magic' here is a poor choice since the Python community values not using magic. In fact, dunder methods are just plain old methods that get called at specific, well-defined times.
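
To make the 'just methods' point concrete, here is a minimal sketch using a made-up Playlist class (the class and its attribute names are assumptions for illustration). The built-in len() and the + operator simply end up calling the matching dunder, which can also be called directly like any other method.

```python
class Playlist(object):
    """Hypothetical class used only for illustration."""

    def __init__(self, songs):
        self.songs = list(songs)

    def __len__(self):
        # Called by len(playlist)
        return len(self.songs)

    def __add__(self, other):
        # Called by playlist + other
        return Playlist(self.songs + other.songs)


a = Playlist(['intro', 'verse'])
b = Playlist(['outro'])

# Same method, two spellings
assert len(a) == a.__len__() == 2
assert (a + b).songs == a.__add__(b).songs == ['intro', 'verse', 'outro']
```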

Why dunders?

  • You're already using them
  • Pythonic (common/idiomatic)
  • Operator overloading
  • Debugging tools
  • Avoid custom API/object semantics
  • Domain Specific Language (DSL)

Main goal is to make your object look and act like a standard Python type.

The person using your object already knows Python and how to use standard types so don't make them learn all new semantics.

Common dunders

    __init__(self, *args, **kwargs)
    __str__(self)
    __unicode__(self)
    __len__(self)
    __repr__(self)
  • __init__ and creation (__new__) are separate steps for a good reason.
  • __init__ initializes your instance; it does not create it.
  • It receives all the arguments passed when the object is created.
  • Kind of like a constructor (not exactly)

  • __str__ is called by str(), '%s', print, etc.

  • Python starts at your object's class and looks for __str__.
  • Goes through the MRO

  • Different for old-style classes, some magic involved

  • __unicode__ is similar in theory to __str__; called by unicode(), etc.
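
A small sketch of the common dunders working together, using a hypothetical Point class. Overriding __new__ here is only to show that creation and initialization are separate steps; most classes never need it.

```python
class Point(object):
    """Hypothetical 2-D point."""

    def __new__(cls, *args, **kwargs):
        # Creation: builds the bare instance (rarely needs overriding)
        return super(Point, cls).__new__(cls)

    def __init__(self, x, y):
        # Initialization: receives the arguments passed to Point(...)
        self.x = x
        self.y = y

    def __repr__(self):
        # Unambiguous form, used by repr() and the interactive prompt
        return 'Point(x=%r, y=%r)' % (self.x, self.y)

    def __str__(self):
        # Friendly form, used by str(), '%s' and print
        return '(%s, %s)' % (self.x, self.y)

    def __len__(self):
        # Number of coordinates; called by len()
        return 2


p = Point(1, 2)
assert str(p) == '(1, 2)'
assert '%s' % p == '(1, 2)'
assert repr(p) == 'Point(x=1, y=2)'
assert len(p) == 2
```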

Powered by Dunders

  • Context Managers
  • Collections module
  • Properties
  • Descriptors
  • Callables
  • Pickling
  • Operator overloading

Context Managers

    __enter__(self)
    __exit__(self, exc_type, exc_value, traceback)
  • Useful for 'book ended' actions
  • Simple exception handling
  • Built-in for File objects, etc. in 2.7

Context Manager usage

    >>> with ProgressBar(5) as p:
    ...   for ii in xrange(5):
    ...     p.update(ii)
    ...
    0.0 20.0 40.0 60.0 80.0 100.0

    >>> with ProgressBar(5) as p:
    ...   for ii in xrange(2):
    ...     p.update(ii)
    ...
    0.0 20.0 100.0

Context Manager example

    class ProgressBar(object):
        """Progress bar that normalizes progress to [0 - 100] scale"""

        def __init__(self, max_val):
            self._curr_val = 0.0
            self._done_val = 100.0
            self._max_val = float(max_val)

        def __enter__(self):
            """Start of context manager, 'with' statement"""
            self._curr_val = 0.0

            # Important!
            return self

        def __exit__(self, exc_type, exc_value, traceback):
            """End of context manager, leaving 'with' statement"""
            self.update(self._max_val)

            # Not handling any exceptions, so they'll be raised automatically
            # To ignore exceptions return True or inspect arguments to handle
            return False

        def update(self, value):
            if value >= self._max_val:
                self._curr_val = self._done_val
            else:
                self._curr_val = (value / self._max_val) * self._done_val

            print '%s' % (self._curr_val),
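
The comment in __exit__ above mentions returning True to ignore an exception. A minimal sketch of that idea, using a hypothetical Suppress class (the name and the caught attribute are made up for this example):

```python
class Suppress(object):
    """Hypothetical context manager that swallows one exception type."""

    def __init__(self, exc_class):
        self.exc_class = exc_class
        self.caught = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # All three arguments are None when the block exits cleanly
        if exc_type is not None and issubclass(exc_type, self.exc_class):
            self.caught = exc_value
            return True   # swallow the exception
        return False      # let anything else propagate


with Suppress(ZeroDivisionError) as s:
    1 / 0                 # raised here, swallowed by __exit__

assert isinstance(s.caught, ZeroDivisionError)
```

Any exception that is not a ZeroDivisionError would still propagate out of the 'with' block, since __exit__ returns False for it.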

OrderedDict

    __setitem__(self, key, value)
    __getitem__(self, key)
    __len__(self)
    __delitem__(self, key)
    __iter__(self)
    __reversed__(self)
    __contains__(self, item)
    __add__(self, other)  # sequence concatenation with +; there is no __concat__ hook
  • An immutable container only needs __getitem__ and __len__
  • A mutable one also needs __setitem__
  • Implementing __contains__ is usually not needed; the default behavior is to loop over your items and report whether the value was found.
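
A minimal sketch of the 'immutable only needs getitem, len' note, using a hypothetical Weekdays class. With just those two methods, iteration, 'in' and reversed() all work through Python's fallback sequence protocol.

```python
class Weekdays(object):
    """Hypothetical read-only sequence."""

    _names = ('mon', 'tue', 'wed', 'thu', 'fri')

    def __getitem__(self, index):
        # Raises IndexError past the end, which is what stops iteration
        return self._names[index]

    def __len__(self):
        return len(self._names)


days = Weekdays()
assert days[0] == 'mon'
assert len(days) == 5
assert 'wed' in days                      # falls back to looping via __getitem__
assert 'sun' not in days
assert list(days) == ['mon', 'tue', 'wed', 'thu', 'fri']
assert list(reversed(days))[0] == 'fri'   # reversed() uses __len__ and __getitem__
```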

OrderedDict usage

    >>> import collections
    >>> ordered_dict = collections.OrderedDict()
    >>>
    >>> ordered_dict['a'] = 1
    >>> ordered_dict['b'] = 2
    >>> ordered_dict['c'] = 3
    >>> for k, v in ordered_dict.iteritems(): print k,v
    ...
    a 1
    b 2
    c 3

    >>> unordered_dict = {}
    >>> unordered_dict['a'] = 1
    >>> unordered_dict['b'] = 2
    >>> unordered_dict['c'] = 3
    >>> for k, v in unordered_dict.iteritems(): print k,v
    ...
    a 1
    c 3
    b 2

Properties

  • Managed attributes
  • 'Encapsulation'
  • Add behavior to attribute actions
  • Simplify API

Properties

    class User(object):
        def __init__(self, name, email):
            self._name = name
            self._email = None

            # Calls descriptor, verify on change and init
            self.email = email

        def set_email(self, email):
            if '@' not in email:
                raise TypeError

            self._email = email

        def get_email(self):
            return self._email

        email = property(get_email, set_email)
  • Attribute access looks like a plain variable; you can add behavior (a method) later without changing the client code or API
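
The same idea is often spelled with the decorator form of property. A minimal sketch with a hypothetical Account class; raising ValueError instead of the slide's TypeError is a choice made for this example only.

```python
class Account(object):
    """Hypothetical class; same validation idea, decorator spelling."""

    def __init__(self, email):
        self.email = email          # goes through the setter below

    @property
    def email(self):
        return self._email

    @email.setter
    def email(self, value):
        if '@' not in value:
            raise ValueError('invalid email: %r' % (value,))
        self._email = value


acct = Account('noreply@example.com')
acct.email = 'other@example.com'    # plain assignment, still validated

try:
    acct.email = 'test'
except ValueError:
    rejected = True
else:
    rejected = False

assert acct.email == 'other@example.com' and rejected
```

Client code only ever sees acct.email, so the validation could have been added long after the attribute was first published.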

Property usage

    >>> u = User('luke', 'test')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 6, in __init__
      File "<stdin>", line 9, in set_email
    TypeError

    >>> u = User('luke', 'noreply@lukelee.net')
    >>> u.email
    'noreply@lukelee.net'

    >>> u.set_email('test')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 9, in set_email
    TypeError

    >>> u.set_email('test@test.com')
    >>> u.email
    'test@test.com'

    >>> u._email = 'test'
    >>> u.email
    'test'

Descriptor powered

    __get__(self, instance, owner)
    __set__(self, instance, value)
    __delete__(self, instance)
  • owner is the owner class (the type of instance), not an instance
  • Fancy wording for a class that overrides attribute lookup with dunders
  • __delete__ is called when the descriptor attribute is deleted
  • __del__ is different: called in other scenarios, right before an object is destroyed
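
A tiny sketch of what __get__ actually receives, using hypothetical Show and Widget classes. When the attribute is reached through the class, instance is None; owner is the class in both cases.

```python
class Show(object):
    """Hypothetical descriptor that reports how it was accessed."""

    def __get__(self, instance, owner):
        # instance is None when accessed on the class itself
        return (instance, owner)


class Widget(object):
    attr = Show()


w = Widget()
assert Widget.attr == (None, Widget)   # class access
assert w.attr == (w, Widget)           # instance access
```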

Pure Python property

    class Property(object):
        "Emulate PyProperty_Type() in Objects/descrobject.c"

        def __init__(self, fget=None, fset=None, fdel=None, doc=None):
            self.fget = fget
            self.fset = fset
            self.fdel = fdel
            self.__doc__ = doc

        def __get__(self, obj, objtype=None):
            if obj is None:
                return self
            if self.fget is None:
                raise AttributeError("unreadable attribute")
            return self.fget(obj)

        def __set__(self, obj, value):
            if self.fset is None:
                raise AttributeError("can't set attribute")
            self.fset(obj, value)

        def __delete__(self, obj):
            if self.fdel is None:
                raise AttributeError("can't delete attribute")
            self.fdel(obj)

Custom Descriptor

    >>> texas = Temperature()
    >>> texas.farenheit = 98
    >>> texas.celsius
    36.666666666666664
    >>> texas.farenheit
    98.0
    >>> texas.celsius = 38
    >>> texas.farenheit
    100.4

Custom Descriptor

    class Celsius(object):
        """Fundamental Temperature Descriptor."""
        def __init__(self, value=0.0):
            self.value = float(value)

        def __get__(self, instance, owner):
            return self.value

        def __set__(self, instance, value):
            self.value = float(value)

    class Farenheit(object):
        """Requires that the owner have a ``celsius`` attribute."""
        def __get__(self, instance, owner):
            return instance.celsius * 9 / 5 + 32

        def __set__(self, instance, value):
            instance.celsius = (float(value) - 32) * 5 / 9

    class Temperature(object):
        celsius = Celsius()
        farenheit = Farenheit()
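
One thing worth noticing in the Celsius descriptor above: the value is stored on the descriptor object itself, and since the descriptor is a class attribute, every Temperature instance shares it. A minimal sketch of a per-instance variant follows; the default parameter and the '_celsius' storage key are arbitrary choices for this example.

```python
class Celsius(object):
    """Data descriptor that stores the value on each instance."""

    def __init__(self, default=0.0):
        self.default = float(default)

    def __get__(self, instance, owner):
        if instance is None:
            return self
        # '_celsius' is an arbitrary storage key chosen for this sketch
        return instance.__dict__.get('_celsius', self.default)

    def __set__(self, instance, value):
        instance.__dict__['_celsius'] = float(value)


class Temperature(object):
    celsius = Celsius()


texas = Temperature()
alaska = Temperature()
texas.celsius = 38
alaska.celsius = -10

assert texas.celsius == 38.0     # not overwritten by alaska
assert alaska.celsius == -10.0
```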

Properties extended

    __getattr__(self, name)
    __setattr__(self, name, value)
    __delattr__(self, name)
    __getattribute__(self, name)
  • Real encapsulation
  • Not always a good idea
  • Power means responsibility
  • __getattr__ is called only for missing attributes
  • Useful for throwing a good error message or metaprogramming
  • __getattribute__ is called for every lookup, so be careful (it is easy to recurse forever)

Dunders in the wild

  • Parsers: (HTML, markdown, XML)
  • Debugging tools: (IPython, PyCharm)
  • ORM
  • __dir__, __repr__ for IPython

Peewee ORM

    __or__(self, rhs)
    __and__(self, rhs)
    __invert__(self)
    __add__(self, rhs)
    __sub__(self, rhs)
    __neg__(self)

    SomeModel.select().where(
        (Q(a='A') | Q(b='B')) &
        (Q(c='C') | Q(d='D'))
    )

    # generates something like:
    # SELECT * FROM some_obj
    # WHERE ((a = "A" OR b = "B") AND (c = "C" OR d = "D"))
  • Heavy use of all math/logic overrides
  • peewee
  • Relatively small, < 3000 lines
  • Maps the '|', '&', '~' operators to SQL 'OR', 'AND', 'NOT'
  • __neg__ is the unary negation operator (-x)
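
To show the operator-to-SQL mapping in miniature, here is a toy Q class. It is only a sketch of the technique and not peewee's actual implementation; the _sql parameter is made up, and it does no escaping, so it is not safe for real queries.

```python
class Q(object):
    """Toy query node; not peewee's actual implementation."""

    def __init__(self, _sql=None, **conditions):
        if _sql is None:
            # sorted() keeps the output deterministic
            _sql = ' AND '.join(
                "%s = '%s'" % item for item in sorted(conditions.items()))
        self.sql = _sql

    def __or__(self, rhs):       # q1 | q2
        return Q('(%s OR %s)' % (self.sql, rhs.sql))

    def __and__(self, rhs):      # q1 & q2
        return Q('(%s AND %s)' % (self.sql, rhs.sql))

    def __invert__(self):        # ~q, unary so there is no rhs
        return Q('NOT %s' % self.sql)


query = (Q(a='A') | Q(b='B')) & ~Q(c='C')
assert query.sql == "((a = 'A' OR b = 'B') AND NOT c = 'C')"
```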

Pandas

    >>> import pandas
    >>> # Series is 1-d numpy array with labels
    >>> s = pandas.Series({'a' : 0., 'b' : 1., 'c' : 2.})
    >>> s
    a    0
    b    1
    c    2
    >>> s[0]
    0.0
    >>> s[[2,1]]
    c    2
    b    1
    >>> s[[2,1]][0]
    2.0

    >>> s[s >= 1]
    b    1
    c    2

    >>> s * 3
    a    0
    b    3
    c    6
  • Supports tons of fancy indexing
  • See pandas/core/series.py
  • Show pandas_snippet.py if time

Pandas' secrets

    __ge__(self, rhs)
    __getitem__(self, key)

    >>> class Test(object):
    ...     def __getitem__(self, key):
    ...         print '%-15s  %s' % (type(key), key)
    ...
    >>> t = Test()
    >>> t[1]
    <type 'int'>     1

    >>> t['hello world']
    <type 'str'>     hello world

    >>> t[1, 'b', 3.0]
    <type 'tuple'>   (1, 'b', 3.0)

    >>> t[5:200:10]
    <type 'slice'>   slice(5, 200, 10)

    >>> t['a':'z':3]
    <type 'slice'>   slice('a', 'z', 3)

    >>> t[object()]
    <type 'object'>  <object object at 0x10aaf40d0>
  • The 'key' argument can be any object (int, str, tuple, slice, ...); your __getitem__ decides what each one means
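
A minimal sketch of a __getitem__ that gives meaning to several key types, using a hypothetical Evens class. It is far simpler than what pandas does and is meant only to show the dispatch on the type of key.

```python
class Evens(object):
    """Hypothetical virtual sequence of the first `size` even numbers."""

    def __init__(self, size):
        self.size = size

    def __getitem__(self, key):
        if isinstance(key, slice):
            # slice.indices() clamps start/stop/step to the length
            return [2 * i for i in range(*key.indices(self.size))]
        if isinstance(key, tuple):
            # e[a, b] arrives as the tuple (a, b)
            return [self[k] for k in key]
        if not 0 <= key < self.size:
            raise IndexError(key)
        return 2 * key


e = Evens(10)
assert e[3] == 6
assert e[1:4] == [2, 4, 6]
assert e[::5] == [0, 10]
assert e[0, 9] == [0, 18]
```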

My Dunder ideas/attempts

    >>> from frappy.services.github import Github
    >>> g = Github()
    >>> result = g.users('octocat')
    >>> result.response_json
    >>> # http://developer.github.com/v3/users/

    __getattr__(self, name)
    __call__(self, *args, **kwargs)
  • Extra documentation
  • Update for minor API changes
  • Web API wrapper frappy
  • Main idea was to have a Python wrapper for any REST API that didn't need its own documentation or constant updates.
  • Meta programming
  • Create object with base domain name to query
  • Add missing attributes onto url, add args at the end as query string
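
The last three notes can be sketched in a few lines. This toy Api class only builds the URL string; it is an illustration of the __getattr__ plus __call__ technique, not frappy's actual code, and the class name and behavior are assumptions.

```python
try:
    from urllib.parse import urlencode   # Python 3
except ImportError:
    from urllib import urlencode         # Python 2


class Api(object):
    """Toy URL builder; an illustration, not frappy's actual code."""

    def __init__(self, base_url):
        self._url = base_url

    def __getattr__(self, name):
        # Only called for attributes that don't exist: treat as a path part
        return Api('%s/%s' % (self._url, name))

    def __call__(self, *args, **kwargs):
        # Positional args extend the path, keyword args become the query
        url = '/'.join([self._url] + [str(a) for a in args])
        if kwargs:
            url += '?' + urlencode(sorted(kwargs.items()))
        return url   # a real wrapper would issue the HTTP request here


github = Api('https://api.github.com')
assert github.users('octocat') == 'https://api.github.com/users/octocat'
assert (github.repos('kadirpekel', 'hammock', page=2) ==
        'https://api.github.com/repos/kadirpekel/hammock?page=2')
```

Because nothing about the remote API is hard-coded, a new endpoint needs no change in the wrapper, which is the 'no constant updates' goal from the notes.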

Hammock

    >>> import hammock
    >>> from hammock import Hammock as Github

    >>> # Let's create the first chain of hammock using base API url
    >>> github = Github('https://api.github.com')

    >>> # OK, let the magic happen, ask github for hammock watchers
    >>> resp = github.repos('kadirpekel', 'hammock').watchers.GET()

    >>> api = hammock.Hammock('http://localhost:8000')
    >>> api('users')('foo', 'posts').GET('bar', 'comments')

    __getattr__(self, name)
    __call__(self, *args, **kwargs)
    __iter__(self)
    __repr__(self)
  • Chains 'hammock' objects
  • Allows arguments anywhere
  • hammock

Python Philosophy

  • When in doubt, rationalize from 'The Zen'
  • In the face of ambiguity, refuse the temptation to guess.
  • There should be one-- and preferably only one --obvious way to do it.
  • Explicit is better than implicit.
  • Readability counts.
  • If the implementation is hard to explain, it's a bad idea.
  • No ambiguity: always len(), never a guess between getlen(), getLength(), etc.
  • Much cleaner IMO than C++'s operator overloading syntax
  • Docs aren't great, but what happens is clearly defined
  • Use caution
