Bugs

Can't code without them, so code against them!

http://durden.github.io/defensive_coding

Luke Lee

  • Writing oil/gas software with geophysicists, mathematicians, and geologists
  • Embedded C Developer with SSDs
  • Explain the tools I've learned while becoming a productive Python developer in the energy business

Shipped it!!

Shipped it!?

Defensive Programming

A form of defensive design intended to ensure the continuing function of a piece of software in spite of unforeseeable usage of said software.

Defensive programming techniques are used especially when a piece of software could be misused mischievously or inadvertently to catastrophic effect.

Defensive Programming

Guidelines

  • Every line of code is a liability
  • Codify your assumptions
  • Executable documentation is preferable

Python tools

  • Asserts
  • Logging
  • Unit tests

Asserts

  • Not just for unit tests
  • Check an assumption; raise AssertionError when it fails (minimal sketch below)
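
A minimal sketch of the idea, using a hypothetical average() helper:

def average(values):
    result = sum(values) / len(values)

    # Codify the assumption: an average always lies within its inputs
    assert min(values) <= result <= max(values), (
        'average (%f) outside input range' % result)
    return result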

Find the assumptions

def normalize_ranges(colname):
    # Range of the 1-D numpy array we loaded the application with
    orig_range = get_base_range(colname)
    colspan = orig_range['datamax'] - orig_range['datamin']

    # User-filtered data from the GUI
    live_data = get_column_data(colname)
    live_min = numpy.min(live_data)
    live_max = numpy.max(live_data)

    ratio = {}
    ratio['min'] = (live_min - orig_range['datamin']) / colspan
    ratio['max'] = (live_max - orig_range['datamin']) / colspan
    return ratio
  • Normalize a data range to [0 - 1] for a GUI widget
  • Read data from the model and GUI, return a range in [0 - 1]
  • An incorrect calculation here could go unnoticed far away from this function

Assumptions

# Base/starting data
>>> age = numpy.array([10.0, 20.0, 30.0, 40.0, 50.0])
>>> height = numpy.array([60.0, 66.0, 72.0, 63.0, 66.0])
>>> normalize_ranges('age')
{'max': 1.0, 'min': 0.0}

# Change current range, simulate user filtering data
>>> age = numpy.array([20.0, 30.0, 40.0, 50.0])
>>> normalize_ranges('age')
{'max': 1.0, 'min': 0.25}

>>> age = numpy.array([-10.0, 20.0, 30.0, 40.0, 50.0])
>>> normalize_ranges('age')
{'max': 1.0, 'min': -0.5}
  • Negative numbers, or numbers bigger than 1, appear when live data falls outside the base range
  • Clue you into where data is wrong
  • Maybe it's correct data, just unexpected
  • Maybe it's a specific set of data

Lazy programmer

def normalize_ranges(colname):
    orig_range = get_base_range(colname)

    # Max will always be greater than the minimum
    colspan = orig_range['datamax'] - orig_range['datamin']

    # User-filtered data from the GUI
    live_data = get_column_data(colname)
    live_min = numpy.min(live_data)
    live_max = numpy.max(live_data)

    ratio = {}

    # Numbers should always be positive
    ratio['min'] = (live_min - orig_range['datamin']) / colspan
    ratio['max'] = (live_max - orig_range['datamin']) / colspan

    # Should be between [0 - 1]
    return ratio
  • These comments are a start, but they're not executable; if we disobey them, nothing happens
  • Too easy to go unnoticed

Asserts

assert 0.0 <= ratio['min'] <= 1.0, (
        '"%s" min (%f) not in [0-1] given (%f) colspan (%f)' % (
        colname, ratio['min'], orig_range['datamin'], colspan))

assert 0.0 <= ratio['max'] <= 1.0, (
        '"%s" max (%f) not in [0-1] given (%f) colspan (%f)' % (
        colname, ratio['max'], orig_range['datamax'], colspan))
  • Add asserts to verify we return the correct answer, not that the user passed in correct data

Assert benefits

  • Executable documentation
  • Close the gap
  • Debug info when it counts
  • Here's just a high-level view of the benefits we're going to dive into

Executable documentation

  • Runs alongside production code
  • Normal docs are stale
  • Higher degree of confidence
  • Better than regular docs b/c higher chances of being up to date

Close the gap

  • Debugging is hard
  • Complain early and often
  • Close gap between symptom and cause
  • Alert invalid conditions ASAP

Debug info when it counts

  • Good messages with parameters/local state
  • Invalid assumptions about environment
  • Avoid unreproducible bugs

More information leads to:

  • New use cases
  • Documentation oversights
  • Add in debug information you'll need BEFORE you need it!
  • Remember our assert messages? They showed input params and our state

Assert downsides

  • Only available in debug mode
  • Increased code noise
  • No dynamic control
  • Use -O to run in optimized mode, which turns off asserts for performance (see the sketch after this list)
  • Default is debug on
  • Docs say asserts are for debugging, not production
  • Can't control asserts dynamically; can't assign to __debug__
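
A minimal sketch of the -O behavior, using a hypothetical assert_demo.py:

# assert_demo.py
assert 1 == 2, 'Assumption violated!'
print('only reached when asserts are off')

Running python assert_demo.py raises AssertionError; python -O assert_demo.py prints the message instead because the assert is stripped.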

Good assert usage

  • Check return values, not inputs
  • Document usage in style guide
  • Don't ruin duck-typing
  • Checking return values can still catch bad inputs when they produce a wrong answer, so it's a two-for-one

Avoid this

def normalize_ranges(colname):
    assert isinstance(colname, str)
    orig_range = get_base_range(colname)
    assert orig_range['datamin'] >= 0
    assert orig_range['datamax'] >= 0
    assert orig_range['datamin'] <= orig_range['datamax']
    colspan = orig_range['datamax'] - orig_range['datamin']
    assert colspan >= 0, 'Colspan (%f) is negative' % (colspan)

    live_data = get_column_data(colname)
    assert len(live_data), 'Empty live data'
    live_min = numpy.min(live_data)
    live_max = numpy.max(live_data)

    ratio = {}
    ratio['min'] = (live_min - orig_range['datamin']) / colspan
    ratio['max'] = (live_max - orig_range['datamin']) / colspan

    assert 0.0 <= ratio['min'] <= 1.0
    assert 0.0 <= ratio['max'] <= 1.0
    return ratio

Logging

  • Multiple levels
  • Dynamic control
  • Tracebacks
  • Higher-level combination with asserts
  • Here's just a high-level view of some of the benefits of logging

Logging levels

  • Levels: critical, error, warning, info, debug (plus log.exception() for tracebacks)
  • Temporarily cut down on debug noise
  • Send different levels to multiple locations (see the sketch after this list)
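
A minimal sketch of routing levels to different locations, assuming a hypothetical 'example' logger and debug.log file:

import logging

log = logging.getLogger('example')
log.setLevel(logging.DEBUG)

# INFO and up goes to the console for day-to-day use...
console = logging.StreamHandler()
console.setLevel(logging.INFO)
log.addHandler(console)

# ...while full DEBUG detail is captured in a file
logfile = logging.FileHandler('debug.log')
logfile.setLevel(logging.DEBUG)
log.addHandler(logfile)

log.info('visible in both places')
log.debug('only lands in debug.log')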

Dynamic control

  • Change log level dynamically
  • Important with alternative distribution
    • Pyinstaller, py2exe
  • Read log level from a database/config file (sketched after this list)
  • Dynamic control is a big step over asserts
  • Important b/c restarting the interpreter could lose state/debug scenario
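
A minimal sketch of changing the level at runtime, where read_log_level_from_config() stands in for a hypothetical database/config lookup:

import logging

def read_log_level_from_config():
    # Hypothetical stand-in for a database or config file read
    return 'DEBUG'

log = logging.getLogger('example')
log.setLevel(getattr(logging, read_log_level_from_config(), logging.WARNING))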

Tracebacks

  • Built-in with log.exception()
  • Invaluable in error scenarios

Logging + asserts

  • Log assertions to a different file/database (see the sketch after this list)
  • Lots of data-mining capabilities
  • Build a low-fi analytics system by logging to database, network file, dropbox
  • Shell scripts to collect simple text files
  • Track what features people use
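
A minimal sketch, assuming a hypothetical assertions.log, of giving assertion failures their own logger for later data-mining:

import logging

assert_log = logging.getLogger('assertions')
assert_log.addHandler(logging.FileHandler('assertions.log'))

try:
    assert 0.0 <= 1.5 <= 1.0, 'ratio out of range'
except AssertionError:
    # Records the message and traceback in assertions.log
    assert_log.exception('Assumption violated:')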

Logging + asserts

import logging

def main():
    logging.basicConfig(level=logging.WARNING)
    log = logging.getLogger('example')
    try:
        throws()  # stand-in for application code that may raise
        return 0
    except AssertionError as err:
        log.debug('Error: %s' % (err))
        return 1
    except Exception:
        log.exception('Error from throws():')
        return 1

Logging downsides

  • Difficulty maintaining consistent levels
  • Multiple loggers can be complicated
    • Logging cookbook
    • http://docs.python.org/2/howto/logging-cookbook.html
  • Two CS problems, naming and caching
  • Levels == naming
  • Be careful and mindful: logging could be your best defense against a production bug

Unit tests

  • Not just for Test Driven Development
  • Test bug fixes
  • Re-factoring insurance
  • Hard to get manager buy-in b/c it usually takes longer up-front
  • Release early, find bugs, write tests, repeat

Unit test benefits

  • Improves documentation of bug fix
  • Answer 5 Ws
    • http://en.wikipedia.org/wiki/Five_Ws
  • Future-proof code against duplicate bugs
    • Old bugs have a tendency to come back (see the sketch after this list)
  • Think executable documentation!
  • Document how you tested it, what the bug scenario was
  • Good unit test is best doc for weird bug fix
    • test will include comments, data to re-create bug, steps for scenario
  • Refactoring has a tendency to allow old bugs to creep in
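
A minimal pytest sketch of future-proofing against the earlier bug, where set_base_data() and set_column_data() are hypothetical test helpers:

import numpy
import pytest

def test_normalize_ranges_rejects_data_below_base_range():
    # Re-create the bug scenario: user-filtered data fell below the
    # base range and a negative 'min' ratio reached the GUI unnoticed
    set_base_data('age', numpy.array([10.0, 20.0, 30.0, 40.0, 50.0]))
    set_column_data('age', numpy.array([-10.0, 20.0, 30.0, 40.0, 50.0]))

    # With the asserts in place, the bad ratio now fails loudly
    with pytest.raises(AssertionError):
        normalize_ranges('age')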

Unit test downsides

  • Hard to run
  • Not usually located close to real code
  • Some environments hard to duplicate
  • False positives
  • Doctests are nice b/c they are right there with code but can be ugly
    • Problems for documentation generators like Sphinx?

Unit test tools

  • Pytest
  • Nose
  • doctests (see the sketch after this list)
  • Travis CI
  • Tools to help mitigate the hard-to-run aspect of tests
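
Doctests keep the example right next to the code; a minimal sketch with a hypothetical span() helper, runnable via python -m doctest:

def span(datamin, datamax):
    """Return the span of a base range.

    >>> span(10.0, 50.0)
    40.0
    """
    return datamax - datamin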

Suit up, defense!

  • I consider this talk a blend of motivation and pessimism, since it says all your code has bugs.
  • So, I'd like to end this talk on a high note.
  • Hopefully I've given you some new tools and perspective so you're properly suited up for your next bug battle like this Thorgi.

Links

Questions?

</soapbox>

  • This talk is pretty pessimistic b/c sadly a lot of being a programmer means thinking of what can and will go wrong. However, the irony of this talk is that the ideas are presented in an optimistic manner.
  • For example, 'here is your problem. now here is the easy solution.'
  • This obviously isn't the real world:
    • Take these ideas and try it out
    • Use what works, throw away what doesn't
    • The important thing is to try something different: shift your mindset from preventing bugs to accepting that they will happen no matter what you do, then ask how you can fight back and make the maintenance phase of a project easier with the smallest amount of effort.