1.  Is simuPOP fast enough for my applications?

A few newer simulators have claimed that simuPOP is too slow for certain applications, but have not provided concrete numbers to support the claim. Despite its Python interface, simuPOP is implemented in C/C++ and is highly optimized for CPU and memory usage. With its ability to run a simulation on multiple threads, simuPOP provides excellent performance for most applications. For example, a comparison of six simulators on a real-world simulation of the evolution of adaptive traits ranked simuPOP second in performance, about 50% slower than a simulator designed specifically for that application. Although a well-implemented special-purpose simulator can be faster than simuPOP through the use of specialized algorithms, the speedup usually does not justify the implementation effort or the loss of the reliability, flexibility, and extensibility of a simuPOP-based implementation.

2.  Is simuPOP the right tool for my application?

Quite a number of population genetics simulation programs have been created for various purposes. If one of them happens to fit your needs, it may be easier to use (at the very least, you do not need to write a script) or may perform better. The Genetic Simulation Resources website maintains a catalogue of a large number of genetic simulators that can help you locate the right tool.

If you cannot find a program that fits your needs, you might want to browse this website to get an idea of how simuPOP works. It is also a good idea to send an email to the simuPOP mailing list describing the kind of simulation you would like to perform. Other simuPOP users may have run similar simulations and may be able to advise you on how to implement yours with simuPOP.

3.  Can I generate populations with different inbreeding coefficients (or effective population sizes)?

simuPOP simulates random mating and well-defined non-random mating schemes. Because statistics such as effective population size and inbreeding coefficient can be calculated or estimated in different ways, they are treated as observations and cannot be used to direct a mating scheme. Moreover, because different mating schemes can lead to the same value of a statistic, it is not always clear which mating scheme is appropriate for simulating a population with a desired property.

In the case of inbreeding, any of the following mating schemes can introduce inbreeding, and can even be tuned to produce similar levels of inbreeding:

  1. a small population that mimics a large family, so that everyone is more or less related.
  2. a large population consisting of small subpopulations within which inbreeding occurs.
  3. a large population in which most individuals outbreed but some inbreed.
  4. a large population in which everyone has some probability of mating with a close relative.

Once you choose a particular mating scheme, it is possible to calculate the theoretical inbreeding coefficient from the simulation parameters, and vice versa. Perhaps the easiest scheme to define is one in which each individual mates with a cousin (or sibling?) with probability p and mates randomly with probability 1-p. If you can derive p from the inbreeding coefficient f in this case, a mating scheme can be defined easily (with some pitfalls that I can help you get around).
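As one illustration of deriving p from f, assume that the relative is a full sibling, that the population is very large, and that mating is otherwise random. These assumptions are not stated in the text above. Under them, the textbook recurrence for partial full-sib mating gives the equilibrium condition f = p(1 + 3f)/4, so f = p/(4 - 3p) and p = 4f/(1 + 3f). The function names below are hypothetical, and a cousin-mating scheme would need a different formula.

```python
# Sketch only: assumes partial FULL-SIB mating in a very large population
# at equilibrium. These helpers are illustrative, not part of simuPOP.

def f_from_p(p):
    """Equilibrium inbreeding coefficient for sib-mating probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return p / (4.0 - 3.0 * p)


def p_from_f(f):
    """Sib-mating probability that yields equilibrium inbreeding coefficient f."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    return 4.0 * f / (1.0 + 3.0 * f)


if __name__ == '__main__':
    for target_f in (0.05, 0.1, 0.25):
        print("f = %.2f -> p = %.4f" % (target_f, p_from_f(target_f)))
```

The two functions are inverses of each other, so a value of p obtained from a target f can be checked by feeding it back into `f_from_p`.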

4.  How scalable is simuPOP?

The maximum population size that simuPOP can simulate depends on the amount of memory available, and the speed of a simulation depends on CPU speed and the number of cores. The more memory and cores your computer has, the larger and faster the simulations simuPOP can run. Most of the memory allocated for a simuPOP population is used to store genotypes and individual information, so you can estimate memory usage from the number of individuals, the number of loci, and the allele type, which differs between the short (1 byte per locus), binary (1 bit) and long (4-8 bytes) modules. The following table lists the estimated maximum population size of a simuPOP population:

| Allele type | Maximum population size |
|---|---|
| short | (memory size (GB) * 1024 * 1024 * 1024) / (2 * loci * ploidy + 56) |
| long | (memory size (GB) * 1024 * 1024 * 1024) / (8 * loci * ploidy + 56) |
| binary | (memory size (GB) * 1024 * 1024 * 1024) / (loci * ploidy / 4 + 72) |

For example, with 8 GB of memory, allele type short, loci = 100, and ploidy = 2, the maximum population size is
(8 * 1024 * 1024 * 1024) / (2 * 100 * 2 + 56) = 18,837,575
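The formulas in the table can be evaluated without simuPOP installed. The following is a minimal sketch; the function name `max_population_size` and the `BYTES_PER_INDIVIDUAL` table are names made up for this illustration, and the constants are copied from the table above.

```python
# Estimated bytes per individual for each allele type, taken from the
# table above. The names here are illustrative, not part of simuPOP.
BYTES_PER_INDIVIDUAL = {
    'short': lambda loci, ploidy: 2 * loci * ploidy + 56,
    'long': lambda loci, ploidy: 8 * loci * ploidy + 56,
    'binary': lambda loci, ploidy: loci * ploidy / 4.0 + 72,
}


def max_population_size(mem_gb, loci, ploidy, allele_type='short'):
    """Return the estimated maximum population size for mem_gb GB of memory."""
    total_bytes = mem_gb * 1024.0 * 1024.0 * 1024.0
    per_individual = BYTES_PER_INDIVIDUAL[allele_type](loci, ploidy)
    # Round down: a partial individual does not fit in memory.
    return int(total_bytes / per_individual)


if __name__ == '__main__':
    for allele_type in ('short', 'long', 'binary'):
        print(allele_type, max_population_size(8, 100, 2, allele_type))
```

For the short allele type this reproduces the value 18,837,575 from the worked example.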
The following Python script estimates the maximum population size for a basic random mating scheme and times the simulation:

estimatedSize.py
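import timeit

# Allele type of the simuPOP module to load: 'short', 'long', or 'binary'
alleleType = 'short'
# Number of loci and ploidy of each individual
loci = 1000
ploidy = 2
# Number of generations to evolve
gen = 10
# Memory size (GB)
memsize = 8
# Number of threads (CPU cores) to use
numThreads = 1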


from simuOpt import setOptions
setOptions(quiet=True)
setOptions(alleleType = alleleType)
setOptions(numThreads = numThreads)
from simuPOP import *

# Estimate the largest population that fits in memsize GB of memory
if alleleType == 'short':
    size = (memsize * 1024.0 * 1024.0 * 1024.0) / ((loci*ploidy*1.0 + 24.0)*2.0 + 8)
elif alleleType == 'long':
    size = (memsize * 1024.0 * 1024.0 * 1024.0) / ((loci*ploidy*4.0 + 24.0)*2.0 + 8)
elif alleleType == 'binary':
    size = (memsize * 1024.0 * 1024.0 * 1024.0) / ((loci*ploidy/8.0 + 32.0)*2.0 + 8)
else:
    raise ValueError("Unknown allele type: %s" % alleleType)
# Time a single run of the random mating simulation
mating = timeit.Timer(
    setup = 'from __main__ import Population, InitSex, RandomMating,'
        'MendelianGenoTransmitter\n'
        "pop = Population(size=%d, loci=%d, ploidy = %d)" % (size, loci, ploidy),
    stmt = "pop.evolve(\n"
        "initOps=InitSex(),\n"
        "matingScheme=RandomMating(ops=MendelianGenoTransmitter()),\n"
        "gen=%d)" % gen)
# print() with a single formatted string works in both Python 2 and Python 3
print("Maximum population size: %d\nTime(sec):%f" % (size,
    mating.timeit(number=1)))

Next, we show the results of a simuPOP experiment with the basic random mating scheme. The hardware and software environment used in the experiment was as follows:

  • Model Name: Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz
  • Number of Processors: 8
  • Memory: 15.5 GB (16,650,493,952 bytes)
  • Operating system: Red Hat Enterprise Linux Workstation Release 6.1
  • Compiler: GNU compiler

Our system has 15.5 GB of memory. We cannot use all 15.5 GB to run simuPOP because some memory must be left for the operating system. If we leave no memory for the operating system, or allocate more than 15.5 GB, the operating system will page memory out to disk (virtual memory), causing simuPOP to run extremely slowly. In the experiment, we report the maximum population size and execution time for memory sizes of 4, 8, 12, and 14 GB:

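loci = 100, ploidy = 2, gens = 10

| Memory size | CPU cores | short: max population | short: time (sec) | long: max population | long: time (sec) | binary: max population | binary: time (sec) |
|---|---|---|---|---|---|---|---|
| 4 GB | 1 | 9,418,787 | 72.48 | 2,593,579 | 23.45 | 35,204,649 | 266.22 |
| | 2 | | 41.78 | | 13.70 | | 152.5 |
| | 4 | | 25.76 | | 9.16 | | 92.31 |
| 8 GB | 1 | 18,837,575 | 142.84 | 5,187,158 | 46.42 | 70,409,299 | 528.92 |
| | 2 | | 83.32 | | 27.88 | | 305.95 |
| | 4 | | 52.30 | | 18.48 | | 183.88 |
| 12 GB | 1 | 28,256,363 | 227.30 | 7,780,737 | 70.57 | 105,613,949 | 796.33 |
| | 2 | | 126.29 | | 41.86 | | 463.29 |
| | 4 | | 78.25 | | 27.67 | | 282.09 |
| 14 GB | 1 | 32,965,757 | 306.67 | 9,077,527 | 90.47 | 123,216,274 | 959.62 |
| | 2 | | 154.10 | | 52.11 | | 530.84 |
| | 4 | | 94.29 | | 33.46 | | 327.85 |

loci = 1000, ploidy = 2, gens = 10

| Memory size | CPU cores | short: max population | short: time (sec) | long: max population | long: time (sec) | binary: max population | binary: time (sec) |
|---|---|---|---|---|---|---|---|
| 4 GB | 1 | 1,058,916 | 12.58 | 267,499 | 7.00 | 7,508,684 | 61.48 |
| | 2 | | 8.26 | | 5.71 | | 35.36 |
| | 4 | | 6.68 | | 5.29 | | 22.18 |
| 8 GB | 1 | 2,117,833 | 25.72 | 534,998 | 13.95 | 15,017,368 | 126.74 |
| | 2 | | 16.57 | | 11.33 | | 71.69 |
| | 4 | | 13.28 | | 10.61 | | 44.5 |
| 12 GB | 1 | 3,176,750 | 38.77 | 802,497 | 20.83 | 22,526,052 | 185.14 |
| | 2 | | 24.22 | | 16.89 | | 107.90 |
| | 4 | | 19.91 | | 15.88 | | 67.08 |
| 14 GB | 1 | 3,706,209 | 45.02 | 936,247 | 22.83 | 26,280,394 | 218.45 |
| | 2 | | 27.99 | | 18.70 | | 124.66 |
| | 4 | | 22.98 | | 18.44 | | 77.40 |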
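
One way to read the timing columns is as parallel speedup (single-core time divided by multi-core time) and parallel efficiency (speedup divided by the number of cores). The short sketch below computes both for the 4 GB, short allele type, loci = 100 rows of the first table; the helper name `speedup` is made up for this example.

```python
def speedup(time_one_core, time_n_cores, cores):
    """Return (speedup, parallel efficiency) relative to the single-core time."""
    s = time_one_core / time_n_cores
    return s, s / cores


# Times (sec) from the 4 GB, short, loci = 100 rows of the table above,
# keyed by the number of CPU cores used.
times = {1: 72.48, 2: 41.78, 4: 25.76}

for cores in sorted(times):
    s, eff = speedup(times[1], times[cores], cores)
    print("%d core(s): speedup %.2f, efficiency %.2f" % (cores, s, eff))
```

For these rows, 4 cores give a speedup of about 2.8, that is, roughly 70% efficiency.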

In addition, we ran the experiment with the maximum number of loci for memory sizes of 4, 8, 12, and 14 GB:

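Population size = 1000, ploidy = 2, gens = 10

| Memory size | CPU cores | short: max loci | short: time (sec) | long: max loci | long: time (sec) | binary: max loci | binary: time (sec) |
|---|---|---|---|---|---|---|---|
| 4 GB | 1 | 1,073,727 | 6.11 | 268,431 | 5.72 | 8,589,790 | 5.96 |
| | 2 | | 4.97 | | 4.90 | | 4.94 |
| | 4 | | 4.63 | | 4.57 | | 4.60 |
| 8 GB | 1 | 2,147,469 | 12.11 | 536,867 | 11.45 | 17,179,725 | 12 |
| | 2 | | 10.01 | | 9.68 | | 9.91 |
| | 4 | | 9.24 | | 9.1 | | 9.2 |
| 12 GB | 1 | 3,221,211 | 17.25 | 805,302 | 16.35 | 25,769,659 | 17.09 |
| | 2 | | 14.92 | | 14.13 | | 14.75 |
| | 4 | | 14.01 | | 14.03 | | 13.84 |
| 14 GB | 1 | 3,758,082 | 18.12 | 939,520 | 17.69 | 30,064,627 | 21.10 |
| | 2 | | 16.10 | | 15.48 | | 21.35 |
| | 4 | | 15.79 | | 15.63 | | 19.36 |

Population size = 100000, ploidy = 2, gens = 10

| Memory size | CPU cores | short: max loci | short: time (sec) | long: max loci | long: time (sec) | binary: max loci | binary: time (sec) |
|---|---|---|---|---|---|---|---|
| 4 GB | 1 | 10,723 | 6.35 | 2,680 | 5.92 | 85,755 | 6.52 |
| | 2 | | 5.42 | | 5.14 | | 5.44 |
| | 4 | | 5.01 | | 4.89 | | 4.97 |
| 8 GB | 1 | 21,460 | 12.10 | 5,365 | 12.04 | 171,654 | 12.32 |
| | 2 | | 10.32 | | 10.14 | | 10.2 |
| | 4 | | 9.62 | | 9.58 | | 9.62 |
| 12 GB | 1 | 32,198 | 17.59 | 8,049 | 17.92 | 257,554 | 18.43 |
| | 2 | | 15.38 | | 15.07 | | 15.52 |
| | 4 | | 14.61 | | 14.27 | | 14.35 |
| 14 GB | 1 | 37,566 | 18.84 | 9,391 | 18.76 | 300,503 | 19.39 |
| | 2 | | 16.58 | | 16.57 | | 17.01 |
| | 4 | | 16.41 | | 16.28 | | 16.35 |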
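
The maximum numbers of loci in these tables appear to follow from inverting the population-size formulas for a fixed population size. The sketch below makes that assumption explicit; `max_loci` and `FORMULA_CONSTANTS` are illustrative names, not simuPOP functions.

```python
# Constants (a, b) of the population-size formulas, written as
#     size = memory_bytes / (a * loci * ploidy + b)
# for each allele type. Solving for loci gives
#     loci = (memory_bytes / size - b) / (a * ploidy)
FORMULA_CONSTANTS = {
    'short': (2.0, 56.0),
    'long': (8.0, 56.0),
    'binary': (0.25, 72.0),
}


def max_loci(mem_gb, pop_size, ploidy, allele_type='short'):
    """Return the estimated maximum number of loci for a fixed population size."""
    a, b = FORMULA_CONSTANTS[allele_type]
    bytes_per_individual = mem_gb * 1024.0 * 1024.0 * 1024.0 / pop_size
    # Round down: a partial locus does not fit in memory.
    return int((bytes_per_individual - b) / (a * ploidy))


if __name__ == '__main__':
    for allele_type in ('short', 'long', 'binary'):
        print(allele_type, max_loci(4, 1000, 2, allele_type))
```

With 4 GB of memory and a population of 1000 diploid individuals, this gives the same values as the first row of the table for all three allele types.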

Finally, we show the worst-case scenario, in which we run the experiment with 15 GB of memory, almost the full memory of our system:

AlleleType = short, loci = 1000, ploidy = 2, gens = 5

| Memory size | CPU cores | Max population | Time (sec) |
|---|---|---|---|
| 15 GB | 4 | 3,970,938 | 31,581.11 (about 8.8 hours) |

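The execution time is extremely long because the simulation uses nearly all of the memory, leaving the operating system too little to work with. This 5-generation run took more than a thousand times longer than the 14 GB run with the same allele type, number of loci, and 4 cores (22.98 seconds for 10 generations).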