1.  Porting simuPOP to Python 3. (Aug. 30, 2010)

Because Python 3 is not compatible with Python 2.x, porting simuPOP to Python 3 is non-trivial. Because of the amount of work that is required to port a module, many popular Python modules, such as rpy and wxPython, both used by simuPOP, have not been ported. This has in turn slowed down the adoption of Python 3. This chicken-and-egg problem can only be solved by time.

It is certainly not easy to port simuPOP to Python 3 because the simuPOP core is deeply entangled with Python. Other than routine 2to3 API changes, I had a great deal of trouble with secure usage of the STL (_SECURE_SCL), redefining the carray and defdict types in Python 3, debugging a SWIG bug (an interaction between the wrapper and my own __setattr__ and __getattr__ functions), and a namespace issue with Python list comprehensions. Anyway, after a week of intense work, all tests run successfully under Python 3. I am listing the major steps here in case they can help other Python module maintainers port their modules to Python 3.

  • Use 2to3 to translate Python code from Python 2 to 3. A Makefile or batch file is recommended to automate the process.
  • Use sys.version_info.major == 3 to test the version of Python and add Python 3-specific code to Python modules and build scripts. For example, you can add the option '-py3' to the swig command to generate Python 3-specific wrapper code.
  • Use python setup.py build to compile the module and fix the errors that come up. Add Python 3-specific code to the C/C++ source using something like
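#if PY_VERSION_HEX >= 0x03000000

/* Python 2 string functions map to their Python 3 (unicode) counterparts */
#  define PyString_Check PyUnicode_Check
#  define PyString_FromStringAndSize PyUnicode_FromStringAndSize
#  define PyString_FromString PyUnicode_FromString
/* PyUnicode_Concat returns a new object, so the in-place PyString_Concat
   family is mapped to PyUnicode_Append, which has the matching signature */
#  define PyString_Concat PyUnicode_Append
#  define PyString_ConcatAndDel PyUnicode_AppendAndDel

/* Python 3 has a single integer type, so PyInt_* maps to PyLong_* */
#  define PyInt_Check PyLong_Check
#  define PyInt_AsLong PyLong_AsLong
#  define PyInt_FromLong PyLong_FromLong
#  define PyNumber_Int PyNumber_Long
#  define PyInt_FromString PyLong_FromString
#  define PyInt_Type PyLong_Type
#  define PyString_Type PyUnicode_Type

#endif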
Some functions are more involved, such as the conversion between char* and unicode. Fortunately, the swig wrapper code provides some functions that you can borrow. :-)
  • Python 3.1 (and 2.7) has _SECURE_SCL defined by default. Under Windows, any insecure use of iterators will cause the module to crash (without even a warning or error message). Such errors are triggered by insecure usages of iterators such as *ptr + 1 == myvector.end() + 1, and I was able to locate them relatively easily later on.
  • The SWIG wrapper code works for both Python 2 and Python 3, but different implementations are used in some cases. My wrapper did not work well under Python 3 and I had to trace into the wrapper code. I noticed that SWIG uses fast-get-set to get/set a "this" pointer (accessing the object dictionary directly) for Python 2.x, but calls the regular __setattr__ and __getattr__ functions for Python 3. The problem was then clear because my code redefines __setattr__ and __getattr__ for a class. (See this email for details.) After handling this special case, the wrapper code works fine for Python 3.
  • I then encountered a problem where the expression [alleleFreq[x][0] for x in range(3)] stopped working. It turned out that I was evaluating such expressions using the population dictionary as locals and the module dictionary as globals. In Python 3.1, a list comprehension is evaluated in its own nested scope, so names used inside it are looked up in globals rather than in the locals passed to the evaluation, and such an expression requires alleleFreq to be available in globals. The only complete solution to this problem is to merge the local dictionary into the module dictionary. However, it is possible that the same variable exists in both module_dict and pop_dict, so copying variables from pop_dict to module_dict is dangerous. An alternative is to create a combined dictionary from deep copies of module_dict and pop_dict so that expression evaluations will not affect module_dict. I really did not want to do this because of performance considerations. In the end, I added __builtins__ to pop_dict and used it as both locals and globals. This effectively limits expressions to __builtins__ and population variables, but should be sufficient in almost all applications. (Please see this email for details.)
  • After some testing, things appeared to be working. I revised setup.py so that it works with both Python 2.x and Python 3.x. This is mostly easy to do because I only need to change a bunch of print statements, and their function form works well in Python 2.x because print(a) is interpreted there as a print statement applied to the parenthesized expression (a), which prints the same thing. (This only holds for a single argument: print(a, b) prints a tuple in Python 2.x.) I then added
try:
   from distutils.command.build_py import build_py_2to3 as build_py
except ImportError:
   from distutils.command.build_py import build_py

to my setup.py and
        cmdclass = {'build_py': build_py}
to its setup() function. In this way, setup.py automatically converts my .py files when it is executed under Python 3.
  • Keeping the Python 2-compatible changes made by 2to3 and using sys.version_info for the rest, I modified the test cases and examples so that they are compatible with both Python 2.x and 3.x. This is very helpful because I can provide a single version of the documentation for both Python versions.
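The namespace problem with the list comprehension described above can be reproduced in pure Python. The sketch below is only an illustration, not simuPOP's actual evaluation code: pop_dict, module_dict, and the two helper functions are hypothetical stand-ins, and the allele frequencies are made-up values. It evaluates the same expression first with separate globals and locals, and then with a single dictionary that carries __builtins__ and serves as both.

```python
import builtins

EXPR = "[alleleFreq[x][0] for x in range(3)]"


def eval_with_two_dicts(expr, module_dict, pop_dict):
    """Evaluate with module variables as globals and population variables as locals.

    On the Python 3 releases discussed here, the body of a list comprehension
    looks names up in globals, so a name that exists only in pop_dict raises
    NameError. Newer interpreters may behave differently.
    Returns the value, or None if the name lookup failed.
    """
    try:
        return eval(expr, module_dict, pop_dict)
    except NameError:
        return None


def eval_with_pop_dict_only(expr, pop_dict):
    """Workaround: one dictionary, with __builtins__ added, as globals and locals."""
    pop_dict.setdefault("__builtins__", builtins)
    return eval(expr, pop_dict, pop_dict)


# Made-up population variables: allele frequencies at three loci.
pop_vars = {"alleleFreq": {0: [0.2, 0.8], 1: [0.5, 0.5], 2: [0.9, 0.1]}}
module_vars = {"__builtins__": builtins}  # stands in for the module namespace

print(eval_with_two_dicts(EXPR, dict(module_vars), dict(pop_vars)))
print(eval_with_pop_dict_only(EXPR, dict(pop_vars)))
```

With the single-dictionary form, expressions can see only built-ins and population variables, which is the limitation mentioned above.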

In the end, simuPOP is now fully compatible with both versions, and I only need to provide one set of source code, tests, and documentation because they all work for both Python 2.x and Python 3.x. This is a huge relief because it would be troublesome to maintain two sets of code for Python 2.x and Python 3.x.
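As a small illustration of the sys.version_info test used in the steps above, a build script might select its SWIG options as follows. This is a sketch, not the actual simuPOP setup.py: the function name swig_options and the base option list are assumptions made for illustration, and only the '-py3' option comes from the steps above. It indexes sys.version_info instead of using .major because the named attributes are only available from Python 2.7 and 3.1 on.

```python
import sys


def swig_options(major=sys.version_info[0]):
    """Return a hypothetical SWIG option list for the given Python major version."""
    options = ["-python", "-c++"]  # assumed base options for a C++ extension
    if major >= 3:
        options.append("-py3")  # ask SWIG for Python 3-specific wrapper code
    return options


print(swig_options())
```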

A pending issue with simuPOP/Python 3 is graphics support. Although rpy has not been updated for a while, it can be compiled for Python 2.x and all R versions. This is not the case for Python 3. Because rpy2 recently added support for Python 3, I may have to switch to rpy2 although I like rpy much better than rpy2. I do not have time for this now so perhaps I will leave this to 1.0.5 or later. Considering all the trouble in maintaining rpy, it might have been a mistake to use rpy in simuPOP.

Anyway, I am glad that simuPOP now supports Python 2.7 and 3.1, and I did fix a few bugs and clean up the code along the way. Because 2.7 will be the last Python 2.x release, compatibility with Python will hopefully not be an issue for quite some time.

Finally, I would like to thank Georg Brandl and Benjamin Peterson for their prompt and helpful answers to my questions on the python-porting mailing list. If you have any questions about porting your module to Python 3, that mailing list would be a great resource to use.

2.  Why was the development of an MPI version of simuPOP discontinued? (Mar. 10, 2009)

A prototype of an MPI version of simuPOP was added to simuPOP 0.7.5 in Dec. 2006. The general idea worked and I was even able to run small scripts using it. However, because an MPI version could not achieve its initial design goals, and because a full implementation required major revisions to the simuPOP core, the MPI code was removed in simuPOP 0.8.3.

When the MPI version of simuPOP was first designed, 2G of RAM sounded huge and 64-bit operating systems were rare. An MPI version seemed to be the only way to break the 4G RAM barrier of 32-bit operating systems. I also hoped that an MPI version could significantly improve the performance of large simulations.

However, compared to other single-executable programs, an MPI implementation of simuPOP was extremely difficult to design. Because simuPOP is a programming environment, arbitrary user logic could be used. For example, a user could change the genotype of a random individual using the Python random module. Different individuals could be chosen if the script is executed separately on different nodes, leading to erroneous results. The only feasible MPI design would be a master-slave model where a master node interprets a script and sends very detailed instructions to the slave nodes. However, this model requires a large amount of communication between nodes, especially when the population changes. Consequently, the MPI modules may not provide any performance advantage over a regular module. This was more or less confirmed using my experimental implementation.
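The problem with user logic running independently on each node can be mimicked in a single process. In this sketch, which is only an illustration and not the simuPOP MPI prototype, each "node" is represented by its own random.Random generator; the function names, population size, and seeds are arbitrary assumptions. Independently seeded nodes will generally pick different individuals, whereas a master that makes the choice once and sends it to the other nodes keeps them consistent, at the cost of communication.

```python
import random

POP_SIZE = 1000  # arbitrary population size for illustration


def independent_choices(seeds, pop_size=POP_SIZE):
    """Each simulated node runs the user's script with its own random state."""
    choices = []
    for seed in seeds:
        choices.append(random.Random(seed).randrange(pop_size))
    return choices


def master_choice(master_seed, num_nodes, pop_size=POP_SIZE):
    """A master node picks once; every node receives the same index."""
    picked = random.Random(master_seed).randrange(pop_size)
    return [picked] * num_nodes  # stands in for sending a message to each node


print(independent_choices([1, 2, 3, 4]))  # nodes will usually disagree
print(master_choice(1, 4))                # all nodes agree
```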

And you know what happened next. RAM became cheaper and cheaper and even home computers got 4G or more RAM. Dual-core or quad-core machines became commonplace and 64-bit operating systems became mainstream. Because it became easy to simulate large populations on a regular workstation, there was less and less need for the MPI version of simuPOP.

Another reason for the removal of the MPI version is that I am looking into an OpenMP implementation. Using a shared-memory architecture, I might be able to simulate several replicates, or produce multiple offspring simultaneously, using different threads. The performance boost could be dramatic. In addition, this implementation requires little modification to the simuPOP codebase and it is possible that I can distribute simuPOP modules that can run on both single and multi-core machines...

If everything moves as planned, simuPOP 1.0.x will be bug fix releases of simuPOP 1.0, and the 1.1.x releases will have OpenMP support.

3.  Icc vs. Gcc: which one is faster for simuPOP simulations? (Feb. 21, 2009)

simuPOP uses Visual C++ 2003 (win32, Python 2.4 and 2.5), Visual C++ Express 2008 (win32, Python 2.6), and GCC/G++ for all other platforms (MacOS, Linux, Solaris). These compilers are chosen because they are the compilers used for the official Python distributions.

Intel icc is usually considered to generate (much) faster code than gcc. I tried icc before with a simuPOP version around 0.6.0, so I was interested to see whether or not simuPOP 0.9.2 can be compiled with icc.

Here are the steps:

  • Download icc from the Intel ICC website. The Linux non-commercial version is free, and is the version I use.
  • Install icc. I use a separate user account and install icc locally for that user so that it will not interfere with my current development environment. I use csh and set up a ~/.cshrc file as follows:
source /my/home/intel/Compiler/11.0/081/bin/iccvars.csh intel64
setenv PATH /my/home/Python26/bin:${PATH}
  • Download Python 2.6 source code, and compile as follows
> tar zxf Python-2.6.tgz
> cd Python-2.6
> setenv CC icc
> setenv CXX icc
> ./configure --prefix=/my/home/Python26
> make
> make install
Note that some modules could not be compiled by icc.
  • (Optional) Download and install scons.
  • Then, I check out a clean copy of simuPOP and compile it as usual:
> python setup.py install
or
> scons install
if scons is installed.

icc produces a lot more remarks (warnings) than gcc and many of them do not make much sense to me. Anyway, simuPOP is compiled successfully. All simuPOP tests and examples in the simuPOP user's guide run smoothly.

Does icc really help the performance of simuPOP? I do not have time for a thorough test so I ran a typical random mating test in test_21_performance.

> cd simuPOP/test
> python test_21_performance.py TestPerformance.TestRandomMating

for both icc-compiled and gcc-compiled simuPOP (both for Python 2.6). Here are the results (execution time in seconds; shorter is better):

                   N=10k                                  N=100k                                 N=1000k
Compiler   Module  plain  with selection  with migration  plain  with selection  with migration  plain  with selection  with migration
Gcc 4.1.2  binary  0.30   0.35            0.53            4.41   7.43            9.02            85.33  101.88          141.08
Gcc 4.1.2  short   0.28   0.32            0.48            3.89   7.04            8.49            94.58  115.20          150.69
Gcc 4.1.2  long    0.29   0.33            0.50            4.17   7.73            8.52            101.29 118.75          156.76
Icc 11.0   binary  0.34   0.41            0.62            4.83   8.17            10.02           89.03  107.23          151.62
Icc 11.0   short   0.32   0.38            0.57            4.21   7.83            9.07            97.84  121.65          160.67
Icc 11.0   long    0.29   0.35            0.55            4.40   8.18            8.97            104.4  121.97          165.00

It is quite obvious that gcc is as fast as or faster than icc in every case, which comes as a surprise to me. However, these numbers are from single runs, and I may not have given icc the right optimization flags. If you have any suggestions, please feel free to let me know.
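To put a number on the difference, the ratio of icc to gcc run times can be computed from the table above. The sketch below simply transcribes the 27 reported timings (binary, short, and long modules, in the column order of the table) and prints the smallest and largest ratio; the variable names are illustrative.

```python
# Run times in seconds, transcribed from the table above.
# Column order: N=10k, N=100k, N=1000k; each with plain, selection, migration.
gcc = {
    "binary": [0.30, 0.35, 0.53, 4.41, 7.43, 9.02, 85.33, 101.88, 141.08],
    "short": [0.28, 0.32, 0.48, 3.89, 7.04, 8.49, 94.58, 115.20, 150.69],
    "long": [0.29, 0.33, 0.50, 4.17, 7.73, 8.52, 101.29, 118.75, 156.76],
}
icc = {
    "binary": [0.34, 0.41, 0.62, 4.83, 8.17, 10.02, 89.03, 107.23, 151.62],
    "short": [0.32, 0.38, 0.57, 4.21, 7.83, 9.07, 97.84, 121.65, 160.67],
    "long": [0.29, 0.35, 0.55, 4.40, 8.18, 8.97, 104.4, 121.97, 165.00],
}

# A ratio above 1 means the icc build took longer than the gcc build.
ratios = []
for module in ("binary", "short", "long"):
    for gcc_time, icc_time in zip(gcc[module], icc[module]):
        ratios.append(icc_time / gcc_time)

print("min icc/gcc ratio: %.2f" % min(ratios))
print("max icc/gcc ratio: %.2f" % max(ratios))
```

Since these are single runs, small differences between the ratios should not be over-interpreted.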

The comparison was done on a 3-year-old DELL Precision 650 workstation with dual Xeon CPUs (3.73GHz) and 4G RAM, running RHEL5 x86-64. Just out of curiosity, I also ran the same tests on a new PowerMac with a quad-core CPU (2.6GHz) and 8G of RAM, and on a PC with a quad-core CPU (Q6600) and 3G RAM, running 32-bit Windows Vista. The benchmark from the Mac machine is impressive. It may be a good time to replace my Linux workstation with a Mac. :-)

                                  N=10k                                  N=100k                                 N=1000k
Platform and compiler     Module  plain  with selection  with migration  plain  with selection  with migration  plain  with selection  with migration
MacOS Gcc 4.1.2           binary  0.23   0.34            0.52            2.31   3.37            5.54            40.12  75.70           100.56
MacOS Gcc 4.1.2           short   0.24   0.34            0.49            2.51   3.60            5.53            40.06  75.19           100.26
MacOS Gcc 4.1.2           long    0.24   0.33            0.48            2.25   3.38            5.49            40.51  75.08           101.39
Windows Vista Visual C++  binary  0.53   0.72            0.93            5.53   8.57            10.02           85.38  123.79          141.51
Windows Vista Visual C++  short   0.43   0.60            0.81            4.55   7.04            8.66            81.89  118.15          134.67
Windows Vista Visual C++  long    0.43   0.61            0.82            4.78   7.59            8.90            88.21  121.93          138.19