Bugs

Can't code without them, so code against them!

http://durden.github.io/defensive_coding

Luke Lee

  • Writing oil/gas software with geophysicists, mathematicians, and geologists
  • Embedded C Developer with SSDs
  • Explain the tools I've learned while becoming a productive Python developer in the energy business

Shipped it!!

Shipped it!?

Defensive Programming

A form of defensive design intended to ensure the continuing function of a piece of software in spite of unforeseeable usage of said software.

Defensive programming techniques are used especially when a piece of software could be misused mischievously or inadvertently to catastrophic effect.

Defensive Programming

Guidelines

  • Every line of code is a liability
  • Codify your assumptions
  • Executable documentation is preferable

Python tools

  • Asserts
  • Logging
  • Unit tests

Asserts

  • Not just for unit tests
  • Check an assumption; raise AssertionError when it fails (minimal sketch below)
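
A minimal sketch of the idea, using a hypothetical average() helper:

def average(values):
    result = sum(values) / len(values)

    # Codify the assumption: an average always lies within its inputs
    assert min(values) <= result <= max(values), (
        'average (%f) outside input range' % result)
    return result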

Find the assumptions

def normalize_ranges(colname):
    # Range of the 1-D numpy array we loaded the application with
    orig_range = get_base_range(colname)
    colspan = orig_range['datamax'] - orig_range['datamin']

    # User-filtered data from the GUI
    live_data = get_column_data(colname)
    live_min = numpy.min(live_data)
    live_max = numpy.max(live_data)

    ratio = {}
    ratio['min'] = (live_min - orig_range['datamin']) / colspan
    ratio['max'] = (live_max - orig_range['datamin']) / colspan
    return ratio
  • Normalize a data range to [0 - 1] for a GUI widget
  • Read data from the model and GUI, return a range in [0 - 1]
  • An incorrect calculation here could go unnoticed far away from this function

Assumptions

# Base/starting data
>>> age = numpy.array([10.0, 20.0, 30.0, 40.0, 50.0])
>>> height = numpy.array([60.0, 66.0, 72.0, 63.0, 66.0])
>>> normalize_ranges('age')
{'max': 1.0, 'min': 0.0}

# Change current range, simulate user filtering data
>>> age = numpy.array([20.0, 30.0, 40.0, 50.0])
>>> normalize_ranges('age')
{'max': 1.0, 'min': 0.25}

>>> age = numpy.array([-10.0, 20.0, 30.0, 40.0, 50.0])
>>> normalize_ranges('age')
{'max': 1.0, 'min': -0.5}
  • Negative numbers, or numbers bigger than 1, appear when live data falls outside the base range
  • Clue you into where data is wrong
  • Maybe it's correct data, just unexpected
  • Maybe it's a specific set of data

Lazy programmer

def normalize_ranges(colname):
    orig_range = get_base_range(colname)

    # Max will always be greater than the minimum
    colspan = orig_range['datamax'] - orig_range['datamin']

    # User-filtered data from the GUI
    live_data = get_column_data(colname)
    live_min = numpy.min(live_data)
    live_max = numpy.max(live_data)

    ratio = {}

    # Numbers should always be positive
    ratio['min'] = (live_min - orig_range['datamin']) / colspan
    ratio['max'] = (live_max - orig_range['datamin']) / colspan

    # Should be between [0 - 1]
    return ratio
  • These comments are a start, but they're not executable; if we disobey them, nothing happens
  • Too easy to go unnoticed

Asserts

assert 0.0 <= ratio['min'] <= 1.0, (
        '"%s" min (%f) not in [0-1] given (%f) colspan (%f)' % (
        colname, ratio['min'], orig_range['datamin'], colspan))

assert 0.0 <= ratio['max'] <= 1.0, (
        '"%s" max (%f) not in [0-1] given (%f) colspan (%f)' % (
        colname, ratio['max'], orig_range['datamax'], colspan))
  • Add asserts to verify we return the correct answer, not that the user passed in correct data

Assert benefits

  • Executable documentation
  • Close the gap
  • Debug info when it counts
  • Here's just a high-level view of the benefits we're going to dive into

Executable documentation

  • Runs alongside production code
  • Normal docs are stale
  • Higher degree of confidence
  • Better than regular docs b/c higher chances of being up to date

Close the gap

  • Debugging is hard
  • Complain early and often
  • Close gap between symptom and cause
  • Alert invalid conditions ASAP

Debug info when it counts

  • Good messages with parameters/local state
  • Invalid assumptions about environment
  • Avoid unreproducible bugs

More information leads to:

  • New use cases
  • Documentation oversights
  • Add in debug information you'll need BEFORE you need it!
  • Remember our assert messages? They showed input params and our state

Assert downsides

  • Only available in debug mode
  • Increased code noise
  • No dynamic control
  • Use -O to run in optimized mode, which turns off asserts for performance (see the sketch after this list)
  • Default is debug on
  • Docs say asserts are for debugging, not production
  • Can't control asserts dynamically; can't assign to __debug__
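
A minimal sketch of the -O behavior, using a hypothetical assert_demo.py:

# assert_demo.py
assert 1 == 2, 'Assumption violated!'
print('only reached when asserts are off')

Running python assert_demo.py raises AssertionError; python -O assert_demo.py prints the message instead because the assert is stripped.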

Good assert usage

  • Check return values, not inputs
  • Document usage in style guide
  • Don't ruin duck-typing
  • Checking return values can still catch bad inputs when they produce a wrong answer, so it's a two-for-one

Avoid this

def normalize_ranges(colname):
    assert isinstance(colname, str)
    orig_range = get_base_range(colname)
    assert orig_range['datamin'] >= 0
    assert orig_range['datamax'] >= 0
    assert orig_range['datamin'] <= orig_range['datamax']
    colspan = orig_range['datamax'] - orig_range['datamin']
    assert colspan >= 0, 'Colspan (%f) is negative' % (colspan)

    live_data = get_column_data(colname)
    assert len(live_data), 'Empty live data'
    live_min = numpy.min(live_data)
    live_max = numpy.max(live_data)

    ratio = {}
    ratio['min'] = (live_min - orig_range['datamin']) / colspan
    ratio['max'] = (live_max - orig_range['datamin']) / colspan

    assert 0.0 <= ratio['min'] <= 1.0
    assert 0.0 <= ratio['max'] <= 1.0
    return ratio

Logging

  • Multiple levels
  • Dynamic control
  • Tracebacks
  • Higher-level combination with asserts
  • Here's just a high-level view of some of the benefits of logging

Logging levels

  • Levels: critical, error, warning, info, debug (plus log.exception() for tracebacks)
  • Temporarily cut down on debug noise
  • Send different levels to multiple locations (see the sketch after this list)
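
A minimal sketch of routing levels to different locations, assuming a hypothetical 'example' logger and debug.log file:

import logging

log = logging.getLogger('example')
log.setLevel(logging.DEBUG)

# INFO and up goes to the console for day-to-day use...
console = logging.StreamHandler()
console.setLevel(logging.INFO)
log.addHandler(console)

# ...while full DEBUG detail is captured in a file
logfile = logging.FileHandler('debug.log')
logfile.setLevel(logging.DEBUG)
log.addHandler(logfile)

log.info('visible in both places')
log.debug('only lands in debug.log')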

Dynamic control

  • Change log level dynamically
  • Important with alternative distribution
    • Pyinstaller, py2exe
  • Read log level from a database/config file (sketched after this list)
  • Dynamic control is a big step over asserts
  • Important b/c restarting the interpreter could lose state/debug scenario
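
A minimal sketch of changing the level at runtime, where read_log_level_from_config() stands in for a hypothetical database/config lookup:

import logging

def read_log_level_from_config():
    # Hypothetical stand-in for a database or config file read
    return 'DEBUG'

log = logging.getLogger('example')
log.setLevel(getattr(logging, read_log_level_from_config(), logging.WARNING))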

Tracebacks

  • Built-in with log.exception()
  • Invaluable in error scenarios

Logging + asserts

  • Log assertions to a different file/database (see the sketch after this list)
  • Lots of data-mining capabilities
  • Build a low-fi analytics system by logging to database, network file, dropbox
  • Shell scripts to collect simple text files
  • Track what features people use
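
A minimal sketch, assuming a hypothetical assertions.log, of giving assertion failures their own logger for later data-mining:

import logging

assert_log = logging.getLogger('assertions')
assert_log.addHandler(logging.FileHandler('assertions.log'))

try:
    assert 0.0 <= 1.5 <= 1.0, 'ratio out of range'
except AssertionError:
    # Records the message and traceback in assertions.log
    assert_log.exception('Assumption violated:')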

Logging + asserts

import logging

def main():
    logging.basicConfig(level=logging.WARNING)
    log = logging.getLogger('example')
    try:
        throws()  # stand-in for application code that may raise
        return 0
    except AssertionError as err:
        log.debug('Error: %s' % (err))
        return 1
    except Exception:
        log.exception('Error from throws():')
        return 1

Logging downsides

  • Difficulty maintaining consistent levels
  • Multiple loggers can be complicated
    • Logging cookbook
    • http://docs.python.org/2/howto/logging-cookbook.html
  • Two CS problems, naming and caching
  • Levels == naming
  • Be careful and mindful: logging could be your best defense against a production bug

Unit tests

  • Not just for Test Driven Development
  • Test bug fixes
  • Re-factoring insurance
  • Hard to get manager buy-in b/c it usually takes longer up-front
  • Release early, find bugs, write tests, repeat

Unit test benefits

  • Improves documentation of bug fix
  • Answer 5 Ws
    • http://en.wikipedia.org/wiki/Five_Ws
  • Future-proof code against duplicate bugs
    • Old bugs have a tendency to come back (see the sketch after this list)
  • Think executable documentation!
  • Document how you tested it, what the bug scenario was
  • Good unit test is best doc for weird bug fix
    • test will include comments, data to re-create bug, steps for scenario
  • Refactoring has a tendency to allow old bugs to creep in
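
A minimal pytest sketch of future-proofing against the earlier bug, where set_base_data() and set_column_data() are hypothetical test helpers:

import numpy
import pytest

def test_normalize_ranges_rejects_data_below_base_range():
    # Re-create the bug scenario: user-filtered data fell below the
    # base range and a negative 'min' ratio reached the GUI unnoticed
    set_base_data('age', numpy.array([10.0, 20.0, 30.0, 40.0, 50.0]))
    set_column_data('age', numpy.array([-10.0, 20.0, 30.0, 40.0, 50.0]))

    # With the asserts in place, the bad ratio now fails loudly
    with pytest.raises(AssertionError):
        normalize_ranges('age')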

Unit test downsides

  • Hard to run
  • Not usually located close to real code
  • Some environments hard to duplicate
  • False positives
  • Doctests are nice b/c they are right there with code but can be ugly
    • Problems for documentation generators like Sphinx?

Unit test tools

  • Pytest
  • Nose
  • doctests (see the sketch after this list)
  • Travis CI
  • Tools to help mitigate the hard-to-run aspect of tests
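
Doctests keep the example right next to the code; a minimal sketch with a hypothetical span() helper, runnable via python -m doctest:

def span(datamin, datamax):
    """Return the span of a base range.

    >>> span(10.0, 50.0)
    40.0
    """
    return datamax - datamin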

Suit up, defense!

  • I consider this talk a blend of motivation and pessimism, since it says all your code has bugs.
  • So, I'd like to end this talk on a high note.
  • Hopefully I've given you some new tools and perspective so you're properly suited up for your next bug battle like this Thorgi.

Links

Questions?

</soapbox>

  • This talk is pretty pessimistic b/c sadly a lot of being a programmer means thinking of what can and will go wrong. However, the irony of this talk is that the ideas are presented in an optimistic manner.
  • For example, 'here is your problem. now here is the easy solution.'
  • This obviously isn't the real world:
    • Take these ideas and try it out
    • Use what works, throw away what doesn't
    • The important thing is to try something different: shift your mindset from preventing bugs to accepting that they will happen no matter what you do, then ask how you can fight back and make the maintenance phase of a project easier with the smallest amount of effort.