Performance Analysis In Python

Warning

A Note from the Author

This post is an artifact from my early days of learning Python, originally written back in 2014. While some of the information might be outdated, I’m preserving it here as a snapshot of my learning journey.

Back when I was first learning my way around Python, I set out to find out more about performance optimization. This post chronicles what I discovered during that initial exploration.

Performance Analysis

Let’s start by defining what performance is and how to analyze it.

  1. Performance is a measure of how fast the application (or each component of the application) runs and how much memory it uses.

  2. Performance analysis is finding where the speed bottlenecks are and where the memory is leaking.

Info
In this article we will restrict ourselves to measuring and analyzing performance. Improving it is a different topic altogether; hopefully I will have something to write about that someday. Depending on the need of the hour, you can choose any one of the tools and techniques explained here, or a combination of them.

Good old time

I use a Mac, but the time utility is a standard tool on any system with Unix ancestry.

$ time python slowprogram.py
python slowprogram.py  3.81s user 0.04s system 97% cpu 3.927 total

The sum of the user and system times tells you how much CPU time your program consumed; the total (wall-clock) figure also includes time spent waiting on I/O.

However, this doesn’t help much. While it tells you whether the application is taking more time than expected, it doesn’t point to where the bottleneck is.

Python Timer Class - Context Manager

Corey Goldberg, on his blog here, has published a tiny timer class that can be used to time any block of code:

from timeit import default_timer

class Timer(object):
    def __init__(self, verbose=False):
        self.verbose = verbose
        self.timer = default_timer  # picks the most precise clock for the platform

    def __enter__(self):
        self.start = self.timer()
        return self

    def __exit__(self, *args):
        end = self.timer()
        self.elapsed_secs = end - self.start
        self.elapsed = self.elapsed_secs * 1000  # milliseconds
        if self.verbose:
            print 'elapsed time: %f ms' % self.elapsed

With this class, you can time any block of code by enclosing it in a with block:

from timer import Timer
from sorting import Sort
import random

random_items = [random.randint(-50, 100) for c in range(5000)]
sort = Sort()

# note: if these sorts work in place, bubble_sort hands insertion_sort an
# already-sorted list; pass list(random_items) to each for a fair comparison
with Timer() as t:
    sort.bubble_sort(random_items)
print "=> elapsed bubble_sort: %s ms" % t.elapsed

with Timer() as t:
    sort.insertion_sort(random_items)
print "=> elapsed insertion_sort: %s ms" % t.elapsed

Here is a sample output:

$ python UsingTimer.py
=> elapsed bubble_sort: 4177.52599716 ms
=> elapsed insertion_sort: 1.23190879822 ms

The advantage of this approach is that you can enclose any suspicious piece of code inside the with block to see if it’s taking too long to complete. You can start with a large chunk of code and progressively narrow down the scope to pinpoint the problematic part.
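
For instance, here is a sketch of that narrowing-down process, with made-up load/parse/compute stages standing in for real application code:

from timer import Timer

# hypothetical stages standing in for the real application code
def load():
    return [str(i) for i in range(100000)]

def parse(data):
    return [int(s) for s in data]

def compute(numbers):
    return sum(x * x for x in numbers)

with Timer() as whole:
    with Timer() as t_load:
        data = load()
    with Timer() as t_parse:
        numbers = parse(data)
    with Timer() as t_compute:
        result = compute(numbers)

print "load:    %.2f ms" % t_load.elapsed
print "parse:   %.2f ms" % t_parse.elapsed
print "compute: %.2f ms" % t_compute.elapsed
print "total:   %.2f ms" % whole.elapsed

Whichever inner timer accounts for most of the total is the block worth splitting further.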

Profiling

A profile is a set of statistics that describes how often and for how long various parts of the program get executed. The time utility described above is also a profiler of sorts, albeit a simple one whose output is not very detailed.

The Python standard library provides three different implementations of the same profiling interface. The documentation is available here.

Simple approach

A simple way to profile a script is to invoke cProfile as a module when running it:

python -m cProfile UsingTimer.py

This prints the stats to standard out. A lot of information is dumped, but the most important columns are:

  • ncalls: The number of times the function was called.
  • tottime: The total time spent in the function, excluding time spent in sub-functions.
  • percall: The average time spent per call (tottime / ncalls).
  • cumtime: The cumulative time spent in the function, including time spent in sub-functions.
  • percall: The average cumulative time spent per call (cumtime / ncalls).
  • filename:lineno(function): The file, line number, and name of the function.
Here is the output for our UsingTimer.py script:

=> elapsed bubble_sort: 4173.29096794 ms
=> elapsed insertion_sort: 2.54702568054 ms
         25073 function calls in 4.231 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.006    0.006    4.231    4.231 UsingTimer.py:1(<module>)
    1    0.000    0.000    0.000    0.000 __future__.py:48(<module>)
    1    0.000    0.000    0.000    0.000 __future__.py:74(_Feature)
    7    0.000    0.000    0.000    0.000 __future__.py:75(__init__)
    1    0.017    0.017    0.017    0.017 hashlib.py:55(<module>)
    6    0.000    0.000    0.000    0.000 hashlib.py:94(__get_openssl_constructor)
    1    0.002    0.002    0.002    0.002 heapq.py:31(<module>)
    1    0.000    0.000    0.001    0.001 random.py:100(seed)
 5000    0.011    0.000    0.011    0.000 random.py:175(randrange)
 5000    0.003    0.000    0.014    0.000 random.py:238(randint)
    1    0.008    0.008    0.026    0.026 random.py:40(<module>)
    1    0.000    0.000    0.000    0.000 random.py:655(WichmannHill)
    1    0.000    0.000    0.000    0.000 random.py:72(Random)
    1    0.000    0.000    0.000    0.000 random.py:805(SystemRandom)
    1    0.000    0.000    0.001    0.001 random.py:91(__init__)
    1    0.001    0.001    0.003    0.003 sorting.py:1(<module>)
    1    0.002    0.002    0.003    0.003 sorting.py:11(insertion_sort)
    1    0.000    0.000    0.000    0.000 sorting.py:3(Sort)
    1    4.084    4.084    4.173    4.173 sorting.py:4(bubble_sort)
    1    0.000    0.000    0.000    0.000 timeit.py:105(Timer)
    1    0.005    0.005    0.005    0.005 timeit.py:53(<module>)
    1    0.001    0.001    0.006    0.006 timer.py:1(<module>)
    2    0.000    0.000    0.000    0.000 timer.py:12(__exit__)
    1    0.000    0.000    0.000    0.000 timer.py:3(Timer)
    2    0.000    0.000    0.000    0.000 timer.py:4(__init__)
    2    0.000    0.000    0.000    0.000 timer.py:8(__enter__)
    1    0.000    0.000    0.000    0.000 {_hashlib.openssl_md5}
    1    0.000    0.000    0.000    0.000 {_hashlib.openssl_sha1}
    1    0.000    0.000    0.000    0.000 {_hashlib.openssl_sha224}
    1    0.000    0.000    0.000    0.000 {_hashlib.openssl_sha256}
    1    0.000    0.000    0.000    0.000 {_hashlib.openssl_sha384}
    1    0.000    0.000    0.000    0.000 {_hashlib.openssl_sha512}
    1    0.000    0.000    0.000    0.000 {binascii.hexlify}
    1    0.001    0.001    0.001    0.001 {function seed at 0x109498d70}
    6    0.000    0.000    0.000    0.000 {getattr}
    6    0.000    0.000    0.000    0.000 {globals}
 5002    0.001    0.000    0.001    0.000 {len}
    1    0.000    0.000    0.000    0.000 {math.exp}
    2    0.000    0.000    0.000    0.000 {math.log}
    1    0.000    0.000    0.000    0.000 {math.sqrt}
    1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
 5000    0.001    0.000    0.001    0.000 {method 'random' of '_random.Random' objects}
    1    0.000    0.000    0.000    0.000 {posix.urandom}
 5003    0.089    0.000    0.089    0.000 {range}
    4    0.000    0.000    0.000    0.000 {time.time}

From this output, we can identify which modules and functions are heavily used. This allows us to either improve the function or refactor the code to reduce the number of calls to it.

In the above example, bubble sort is clearly taking too much time. See for yourself:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    4.084    4.084    4.173    4.173 sorting.py:4(bubble_sort)

Looking at the tottime and cumtime for bubble_sort, we can see that it’s the main consumer of time.

Profiling a single statement

Sometimes it is unnecessary, or simply not possible, to profile an entire script. In those scenarios you can profile an individual statement:

import cProfile

cProfile.run('sort.bubble_sort(random_items)')

The stats, as before, are printed to standard out.

$ python cprofile_simple.py
     10005 function calls in 3.732 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.000    0.000    3.732    3.732 <string>:1(<module>)
    1    3.653    3.653    3.732    3.732 sorting.py:4(bubble_sort)
 5001    0.001    0.000    0.001    0.000 {len}
    1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
 5001    0.079    0.000    0.079    0.000 {range}
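
If the raw dump is too noisy, cProfile.run can also write its stats to a file, and the standard library's pstats module can then sort and trim them. A small sketch, where 'profile.out' is an arbitrary filename:

import cProfile
import pstats

# dump the raw stats to a file instead of standard out
cProfile.run('sort.bubble_sort(random_items)', 'profile.out')

# load the dump, sort by cumulative time, and show the five worst offenders
stats = pstats.Stats('profile.out')
stats.sort_stats('cumulative').print_stats(5)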

Profiling Memory Usage

So far, we’ve focused on execution time. Now, let’s look at how to profile memory usage in a Python script.

I will show you a few examples of memory profiling using the memory_profiler package.

pip install -U memory_profiler

memory_profiler requires that you decorate your function of interest with an @profile decorator:
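
The original memprofile_sample.py is not shown here; a minimal reconstruction consistent with the output below might look like this (it assumes the same sorting.Sort class from the earlier examples, here with quick_sort and heap_sort methods as well):

from memory_profiler import profile
from sorting import Sort
import random

random_items = [random.randint(-50, 100) for c in range(5000)]
sort = Sort()

@profile
def test():
    sort.bubble_sort(random_items)
    sort.insertion_sort(random_items)
    sort.quick_sort(random_items)
    sort.heap_sort(random_items)

if __name__ == '__main__':
    test()

Running the script then prints a line-by-line memory report: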

$ python memprofile_sample.py
Filename: memprofile_sample.py

Line    Mem usage    Increment   Line Contents
10      8.7 MiB      0.0 MiB   @profile
11                             def test():
12      9.2 MiB      0.4 MiB       sort.bubble_sort(random_items)
13      9.2 MiB      0.0 MiB       sort.insertion_sort(random_items)
14      9.3 MiB      0.2 MiB       sort.quick_sort(random_items)
15      9.4 MiB      0.0 MiB       sort.heap_sort(random_items)

The output shows:

  • Line: The line number of the code that was profiled.
  • Mem usage: The memory usage of the Python interpreter after that line has been executed.
  • Increment: The difference in memory usage from the previous line.
  • Line Contents: The code that was profiled.

Memory Leak

Python uses reference counting to keep track of objects in use: every object the application creates carries a numeric count representing the number of references to it at any given time. The count is incremented every time a new reference to the object is created and decremented when a reference is deleted. When the ref-count of an object drops to zero, the Python interpreter knows that the object is no longer required and destroys it, reclaiming the memory it used.
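
You can watch the count move with sys.getrefcount. A tiny demonstration (the reported count is always one higher than the number of names you hold, because the argument passed to getrefcount is itself a temporary reference):

import sys

class Payload(object):
    pass

obj = Payload()
print sys.getrefcount(obj)   # 2: 'obj' plus getrefcount's own argument

alias = obj                  # a second name for the same object
print sys.getrefcount(obj)   # 3

del alias                    # the extra reference goes away
print sys.getrefcount(obj)   # back to 2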

If Python automatically deletes unwanted objects, how does memory leak at all?

As described above, the interpreter decides that an object is unwanted by looking at the number of references it has. If a developer isn’t careful about removing references, the count may never reach zero, preventing Python from reclaiming the memory. This is the most common cause of memory leaks in Python, though more complex scenarios also exist (for example, reference cycles between objects that define __del__ methods, which Python 2’s cycle collector refuses to break). For many applications, carefully managing references is the key to preventing memory issues.
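
Here is a sketch of how this happens in practice (a made-up example): a module-level cache quietly keeps a reference to every result, so nothing is ever reclaimed:

_cache = []   # module-level list: lives as long as the process does

def process(record):
    result = [value * 2 for value in record]
    _cache.append(result)   # this reference outlives the caller, so the
                            # result's ref-count never drops to zero
    return result

for i in range(200):
    process(range(1000))    # memory grows with every call and is never freed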

Tip

How to spot a memory leak

How do I know if my application is leaking memory? Well, intuitively, if the memory usage of the application grows over time while the amount of data it holds stays the same, there is a memory leak.
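
One way to watch for this is memory_profiler's memory_usage helper, which samples the process's memory at a fixed interval while a function runs. A sketch, reusing the leaky process function from above:

from memory_profiler import memory_usage

def workload():
    for i in range(200):
        process(range(1000))   # the leaky function sketched earlier

# run workload() and sample this process's memory every 0.1 s;
# the (callable, args, kwargs) tuple form is memory_profiler's documented API
samples = memory_usage((workload, (), {}), interval=0.1)
print samples   # a steadily climbing series of MiB readings suggests a leak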

All the code used in this article is available here.