Finding Optimizations in Python With Program Profiling
This blog post discusses profiling methods, specifically for the Python programming language.
Within the data science team, one of the things we are working to build is a processing model for large amounts of textual and review data using natural language processing.
Because we are processing data at such a large scale, it is important that our model is properly optimized to reduce any unnecessary overhead. As such, it is important to identify which areas in our code are taking up the most time. This is where profiling comes in.
Program profiling is a form of analysis that measures things such as the memory usage, time usage, the usage of particular instructions, or the frequency and duration of function calls. It is a way to understand where the largest amount of resources are being spent in order to target optimizations to these areas.
Our use case was to find optimizations in a series of Python files used in our model. In order to find which parts of the program were stalling execution, profiling was used. Python has many native and third party profiling tools that allow for a range of analysis for runtime, memory usage and visualization. Some of the tools we looked at were cProfiler, line_profiler, memory_profiler and QCachegrind. For the purpose of our use case, we are most interested in profiling methods that enable us to see which parts of the program were using up the most time, and if there are any blocking resources.
Profiling Using the Standard Python Library
Profiling Python can be done with the standard Python library, as well as third party modules and programs.
The standard Python library provides three different implementations of the same profiling interface: cProfile, Profile and Hotshot. The most popular of the three is cProfile.
cProfile can be run in terminal, as well as imported as a module in Python.
It shows profiling results by functions for time for ncalls tottime percall cumtime percall, (number of calls to that function, total time of that function excluding calls to other functions, time per call, cumulative time of the function and other function calls, time per cumulative call).
197 function calls (192 primitive calls) in 0.002 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.001 0.001 <string>:1(<module>)
1 0.000 0.000 0.001 0.001 re.py:212(compile)
1 0.000 0.000 0.001 0.001 re.py:268(_compile)
1 0.000 0.000 0.000 0.000 sre_compile.py:172(_compile_charset)
1 0.000 0.000 0.000 0.000 sre_compile.py:201(_optimize_charset)
4 0.000 0.000 0.000 0.000 sre_compile.py:25(_identityfunction)
3/1 0.000 0.000 0.000 0.000 sre_compile.py:33(_compile)
Although we are able to see timing information on a function basis, we aren’t able to see which lines specifically are taking up the most time.
Here is an example that runs in the terminal with cProfiler and reduces the output to the 33 top lines with highest cumulative time:
python -m cProfile -s ‘cumulative’ program.py > temp_file && head -n 33 temp_file && rm temp_file
Third Party Profiling Modules
Third party modules include line profiler and memory profiler for line by line profiling, and QCacheGrind for program visualization.
line_profiler – Line-by-line Timing
line_profiler is a third party module that does line-by-line profiling in a Python program. It shows the time spent on each individual line in the Python program. After installing line_profiler, you use it by decorating the functions that you want to profile using ‘@profile’. Then you create a kernprof script of your Python file that can be used with line_profiler.
pip install line_profiler
kernprof -l myProgram.py
python -m line_profiler myProgram.py.lprof
Timer unit: 1e-06 s
Function: Proc2 at line 149
Line # Hits Time Per Hit % Time Line Contents
150 def Proc2(IntParIO):
151 50000 82003 1.6 13.5 IntLoc = IntParIO + 10
152 50000 63162 1.3 10.4 while 1:
153 50000 69065 1.4 11.4 if Char1Glob == ‘A’:
154 50000 66354 1.3 10.9 IntLoc = IntLoc – 1
155 50000 67263 1.3 11.1 IntParIO = IntLoc – IntGlob
156 50000 65494 1.3 10.8 EnumLoc = Ident1
157 50000 68001 1.4 11.2 if EnumLoc == Ident1:
158 50000 63739 1.3 10.5 break
159 50000 61575 1.2 10.1 return IntParIO
In this output, we can see by line the amount of times a line is executed, the time per execution, the total execution time and percentage time usage. This helps you zero in on which lines are actually causing slowdowns in your program.
memory_profiler – Line-by-line memory usage
memory_profiler is another third party package that is similar to line profiler. It does line by line profiling of a Python program with memory as opposed to time.
pip install -U memory_profiler
python -m memory_profiler example.py
Line # Mem usage Increment Line Contents
4 5.97 MB 0.00 MB def my_func():
5 13.61 MB 7.64 MB a =  * (10 ** 6)
6 166.20 MB 152.59 MB b =  * (2 * 10 ** 7)
7 13.61 MB -152.59 MB del b
8 13.61 MB 0.00 MB return a
The output is similar to line profiler. In the output above, it is seen that memory usage increased when required computation power increased. It is helpful if a program is doing operations that require a lot of memory.
Visualization with QCacheGrind
QCacheGrind is a visual profiling tool, it can be used to view the call stack of a program and see the cumulative time usage of each function in the call stack. You can visually trace through the call stack, and even view the time usage line by line of the source file.
pip install pyprof2calltree
brew install graphviz
brew install qcachegrind –with-graphviz
python -m cProfile -o myscript.cprof myProgram.py
pyprof2calltree -k -i myscript.cprof
Result and Comparison
Profiling helped us zero in on an iteration loop that was taking a large percentage of time. It turns out repeated index references to a large DataFrame object were driving a large percentage of the time usage. This is because while the Pandas DataFrame is a powerful data structure to apply vector operations and aggregation across large amounts of data, it’s inherently a slower data structure when it comes to accessing indexed rows repeatedly or iterating through a number of rows compared to a simple dictionary. After identifying this via profiling, the program was optimized by converting the DataFrame to a list of dictionaries.
We found line_profiler to be the most useful in terms of finding which areas of code to optimize. Using line_profiler we can see the percentage time usage of each function we are interested in line by line. Tools such as cProfile and QCachegrind are able to give a broad perspective on which functions are taking the most time, but do not show which lines of the function are the trouble areas. memory_profiler is good for programs that use heavy amounts of memory, but for our use case memory was not limited.