No. 1, Python Way Python Best Practices

This contributes to programmers’ carrying over their previous language’s ‘best-practices’ into Python. A lot of times, these practices end up slowing down your Python programs considerably.
In the last month’s issue, we looked at how we can speed up Python by writing extensions using Pyrex. However, writing extensions should be taken up only when you are very sure that you have gotten the best performance out of pure Python code.
In this article, we will look at some of the idioms and best practices which will make Python programs much more efficient.

Profile your code 
One can not optimise something which does not have a metric. Only by knowing the current performance of the system, we can hope to optimise it further. Profiling is used to determine the time taken by each part of your program in the execution cycle. Once the computationally heavy parts are identified, we can rewrite it to be faster.

import hotshot
import hotshot.stats
profile_data = 'pdata.txt'
def profCode(): 
global profile_data
print "Doing Profiling..."
profile = hotshot.Profile(profile_data)
profile.run("MyFunction()")
profile.close()

This function illustrates the use of ‘HotShot’ profiling module of Python. This module is built into the language libraries, from version 2.2 onwards. We are passing the function which has to be profiled to the profile.run method and dumping the results into a file pdata.txt.

def printProfileStats():
global profile_data
print "Dumping profiling data..."
stats = hotshot.stats.load(profile_data)
stats.sort_stats("cumulative")
stats.print_stats()

This dumps the data in this fashion:

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 123.258 123.258 myfile.py:1 (myfunc) 

This tabulation gives the information on the time consumed by each function. You can start attacking the functions that are eating up most of your cycles.

But how?
Optimising any function to perform better is a delicate balance of tuning the underlying algorithm specific to the problem domain and making use of best language features and idioms. Lets look at some of the Python practices, which, applied consistently, can speed up our apps.

Basics
Chained Comparisons are faster than using ‘and’ operator.
x<y<z
...is faster than...
x < y and y < z
Multiple assignments are slower than individual assignments.
a,b = x,y
...looks cool, but...
a=x ; b = y
...is faster. However, for swapping two variables, multiple assignments is faster.
a,b=b,a
...is faster than...
x=a;a=b;b=x
Also...
if x is not None
...is better than...
if x !=None
...because, checking for object’s identity (where only the address of the object is looked up), is faster than equality checks.

Efficient String Concatenations
The slow and 'usual' method:

>>> mystr = ''
>>> for n in range(300):
>>> mystr += `n`

Note: Backticks substitute the value of n. The other alternative of using str(n) will be even slower.
String concatenation is best done with ''.join(seq) which is an O(n) process. In contrast, using the '+' or '+=' operators can result in an O(n**2) process because new strings may be built for each intermediate step.

>>> seq = ['asia','africa','america','australia','europe']
>>> result = ''.join(seq)
>>>result
'asiaafricaamericaaustraliaeurope'

Function calls
Calling functions from inside of a tight inner loop is a bad idea. Function call overhead is large compared to other instructions. So, it’s better to inline code in such situations. For instance:

>>> sum = 0 
>>> def add1(x):
>>> global sum
>>> sum = sum + x
>>> list = range(10000)
>>> for i in list:
>>> add1(i)
>>> print sum

...is much slower than...

>>> sum = 0
>>> def add2(list):
>>> global sum
>>> for i in list:
>>> sum = sum + i
>>> return sum

Sorting
(as outlined by Guido Van Rossum) Sorting lists of basic Python objects is generally pretty efficient. The sort method for lists takes an optional comparison function as an argument that can be used to change the sorting behavior. This is quite convenient, though it can really slow down your sorts. An alternative way to speed up sorts is to construct a list of tuples whose first element is a sort key that will sort properly using the default comparison, and whose second element is the original list element. This is the so-called Schwartzian Transform. Suppose, for example, you have a list of tuples that you want to sort by the n-th field of each tuple. The following function will do that.

def sortby(somelist, n):
nlist = [(x[n], x) for x in somelist]
nlist.sort()
return [val for (key, val) in nlist]

Matching the behavior of the current list sort method (sorting in place) is easily achieved as well:

def sortby_inplace(somelist, n):
somelist[:] = [(x[n], x) for x in somelist]
somelist.sort()
somelist[:] = [val for (key, val) in somelist]
return

Here's an example use:

>>> somelist = [(1, 2, 'def'), (2, -4, 'ghi'), (3, 6, 'abc')]
>>> somelist.sort()
>>> somelist 
[(1, 2, 'def'), (2, -4, 'ghi'), (3, 6, 'abc')]
>>> nlist = sortby(somelist, 2)
>>> sortby_inplace(somelist, 2)
>>> nlist == somelist
True
>>> nlist = sortby(somelist, 1)
>>> sortby_inplace(somelist, 1)
>>> nlist == somelist
True

Use filter(),reduce(), map() and List comprehensions
If you want to apply a particular function to a list of items, use map() instead of iterating over the list. Don't do:

>>> taxes = []
>>> for person in persons:
>>> taxes.append(calc_tax(person))

Instead, do:

taxes = map(calc_tax, persons)

Written in List Comprehension style:

taxes = [calc_tax(person) for person in persons]

The FP methods are much faster because the looping entirely takes place in the C API.
Note: Refer to February 2005 issue of DeveloperIQ for a tutorial on Functional Programming in Python.

Use inbuilt iterators
Python is a Dynamically typed language. So, you should not care about the type of the object as long as it supports a particular interface. Many tools come with inbuilt iterators and list form. Use ‘in’ as much as possible. Most often than not, programmers new to Python would write:

for i in range(len(arr)):
print arr[i]

...where in Python you can as well write:

for i in arr:
print i

...as long as you don’t require the index. The same logic can be applied to accessing values in a dictionary:

Good: for k in mydict: foo(k) # works even if dict is changed to a list type
bad: for k in mydict.keys(): foo(k) #limits code to work only with objects with key() method

This can be added to your own class by overloading __contains__ function. However, you need to watch out, when the ‘in’ operator is used to scan sequences. Let’s say, you are trying to find the intersection of two lists, i.e., extract the common elements of two arrays; the usual of doing it is:

>>> common = []
>>> for x in shortlist:
>>> if x in longlist:
>>> common.append(x)

This works fine as long as ‘longlist’ is really short. The loop becomes slower and slower as the size of ‘longlist’ increases. The better of achieving the same result would be to replace ‘longlist’ with a dictionary.

>>> common = []
>>> dictionary = dict([(x, 1) for x in longlist])
>>> for x in shortlist:
>>> if x in dictionary:
>>> common.append(x)

If you are using Python 2.4, use ‘sets’ to do the intersection.

>>> set1 = set([1,2,3,4])
>>> set2 = set([1,2])
>>> print set1.insersection(set2)
>>> set([1,2])

 Pradeep Kishore Gowda is a Software Engineer based out of Bangalore. He is a GNU/Linux user since 1997 and is an active member of BangPypers, the Bangalore Python User Group. He blogs at http://btbytes.com. Email: pradeep@btbytes.com
 




Added on May 7, 2007 Comment

Comments

Post a comment