Major features

simuPOP offers a long list of features, many of which are unique among forward-time population genetics simulation programs. Its most distinctive features include:

  1. simuPOP provides three types of modules that use 1, 8 or >=32 bits to store an allele. The binary module (1 bit) is suitable for simulating a large number of SNP markers, and the long module (>=32 bits) is suitable for simulating population genetics models such as the infinite allele mutation model. simuPOP supports different types of chromosomes, such as autosomes, sex chromosomes and mitochondrial chromosomes, each with an arbitrary number of markers.
  2. simuPOP supports autosome, chromosome X, chromosome Y, and mitochondrial chromosomes as a special case of a group of customized chromosome types.
  3. An arbitrary number of floating-point numbers, called information fields, can be attached to individuals of a population. For example, the information fields father_idx and mother_idx are used to track an individual’s parents, and pack_year can be used to simulate an environmental factor associated with smoking.
  4. simuPOP does not impose any limit on the number of homologous sets of chromosomes, the size of the genome, or the number of individuals in a population. During an evolutionary process, a population can hold several of the most recent generations. Pedigrees can be sampled from such multi-generation populations.
  5. An operator can be native (implemented in C++) or hybrid (Python assisted). A hybrid operator calls a user-provided Python function to implement arbitrary genetic effects. For example, a hybrid mutator passes to-be-mutated alleles to a user-provided function and mutates these alleles according to the returned values.
  6. simuPOP provides more than 70 operators that cover all important aspects of genetic studies. These include mutation (k-allele, stepwise, generalized stepwise and hybrid), migration (arbitrary, can create new subpopulations), recombination and gene conversion (uniform or nonuniform, sex-specific), quantitative traits (single-locus, multi-locus or hybrid), selection (single-locus, additive, multiplicative or hybrid multi-locus models), penetrance (single-locus, multi-locus or hybrid), ascertainment (case–control, affected sibpairs, random, nuclear and large pedigrees), statistics calculation (including but not limited to allele, genotype, haplotype and heterozygote numbers and frequencies; expected heterozygosity; and di-allelic and multi-allelic linkage disequilibrium measures), pedigree tracing, visualization (using R or other Python modules), and loading and saving in simuPOP’s native format and in many external formats such as Linkage.
  7. Mating schemes and many operators can work on virtual subpopulations of a subpopulation. For example, positive assortative mating can be implemented by mating individuals with similar properties such as ancestry. The number of offspring per mating event can be fixed, or can follow a statistical distribution. Arbitrary nonrandom mating schemes can be simulated, with assortative mating, age-structured population and overlapping-generations as special cases.
  8. simuPOP is well documented. Recent versions of simuPOP provide more than 150 examples in the user's guide (200 pages), as well as a complete reference manual. The simuPOP online cookbook provides many additional modules that allow simuPOP to work with other applications, along with functions and scripts for different applications.
  9. Perhaps most importantly, simuPOP is completely open to you, in the sense that:
    • You can observe any information of any individual at any generation of an evolving population. A simuPOP evolutionary process is completely transparent to you so that you can examine its properties very closely.
    • simuPOP is very well documented so that you can find the details of any function and operator you use. Mathematical formulas and even implementation details are provided for important genetic factors, especially when there are alternative implementations.
    • simuPOP is open source so you can always check what is going on under the hood.
Although such openness can be overwhelming, it gives serious users the peace of mind of knowing exactly how their simulations are running.
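
As a rough illustration of the hybrid-operator idea in item 5, the sketch below is plain Python and does not use the simuPOP API. The names apply_hybrid_mutator and flip_allele are hypothetical and exist only for this example: the loop stands in for the native part of an operator that selects the alleles to be mutated, and the user-provided function decides what each selected allele becomes.

```python
import random


def apply_hybrid_mutator(genotype, rate, func, rng):
    """Return a copy of `genotype` in which each allele is, with
    probability `rate`, replaced by the value returned from `func`.

    Hypothetical helper for illustration; not part of simuPOP.
    """
    mutated = list(genotype)
    for i, allele in enumerate(mutated):
        # The "native" part: decide which alleles are to be mutated.
        if rng.random() < rate:
            # The "Python-assisted" part: the user function picks the new value.
            mutated[i] = func(allele)
    return mutated


def flip_allele(allele):
    """User-provided effect for a di-allelic locus: 0 becomes 1, 1 becomes 0."""
    return 1 - allele


if __name__ == "__main__":
    rng = random.Random(42)
    print(apply_hybrid_mutator([0, 0, 1, 1, 0, 1], 0.5, flip_allele, rng))
```

Any other function of one allele, such as a stepwise change, could be passed in place of flip_allele without touching the selection loop, which is the separation of roles that the hybrid design relies on.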

Other forward-time simulation programs

A number of forward-time simulation programs are available. If we exclude early forward-time simulation applications developed primarily for teaching purposes, notable forward-time simulation programs include easyPOP, FPG, Nemo and quantiNemo, Bottleneck, genoSIM and genomeSIMLA, FreGene, GenomePop, ForwSim, ForSim, and SFS_CODE. These programs are designed with specific applications and specific evolutionary scenarios in mind, and excel at what they are designed for. For some applications, these programs may be easier to use than simuPOP. For example, using a special look-ahead algorithm, ForwSim is among the fastest programs for simulating a standard Wright-Fisher process, and should be used if such a simulation is needed. However, these programs are not flexible enough to be applied to problems outside their designed application area. For example, none of these programs can be used to study the evolution of a disease-predisposing mutant, a process that is of great importance in statistical genetics and genetic epidemiology.

Compared to such programs, simuPOP has the following advantages:

  • The scripting interface gives simuPOP the flexibility to create arbitrarily complex evolutionary scenarios. For example, it is easy to use simuPOP to explicitly introduce a disease-predisposing mutant into an evolving population, trace its allele frequency, and restart the simulation if the mutant is lost due to genetic drift.
  • The Python interface allows users to define customized genetic effects in Python. In contrast, other programs either do not allow customized effects or force users to modify code at a lower (e.g. C++) level.
  • simuPOP is the only application that embodies the concept of virtual subpopulations, which allows evolution to be modeled at a finer scale. This is required for realistic simulations of complex evolutionary scenarios. For example, you can calculate statistics for arbitrary subsets of the population (e.g. all affected individuals, heavy smokers), and apply different mutation, migration and selection schemes to individuals with different properties.
  • simuPOP allows users to examine an evolutionary process very closely because all simuPOP objects are Python objects that can be accessed using their member functions. For example, users can keep track of genotypes at particular loci during evolution. In contrast, other programs work more or less like a black box from which only limited types of statistics can be output.
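
The scenario in the first point above (introduce a mutant, trace its frequency, restart if it is lost) can be sketched in plain Python. This is not simuPOP code. It assumes a haploid, neutral Wright-Fisher population of fixed size, and the function names next_generation_count and introduce_and_trace are hypothetical names chosen for this example.

```python
import random


def next_generation_count(count, pop_size, rng):
    """One Wright-Fisher generation: each of the `pop_size` offspring
    carries the mutant with probability equal to its current frequency."""
    freq = count / pop_size
    return sum(1 for _ in range(pop_size) if rng.random() < freq)


def introduce_and_trace(pop_size, generations, seed=0, max_restarts=1000):
    """Introduce one mutant copy, record its frequency every generation,
    and restart from scratch whenever the mutant is lost to drift.

    Returns (attempt_number, trajectory) for the first attempt in which
    the mutant is still present after `generations` generations.
    """
    rng = random.Random(seed)
    for attempt in range(1, max_restarts + 1):
        count = 1  # explicitly introduce a single mutant copy
        trajectory = [count / pop_size]
        for _ in range(generations):
            count = next_generation_count(count, pop_size, rng)
            if count == 0:
                break  # mutant lost: abandon this attempt and restart
            trajectory.append(count / pop_size)
        if count > 0:
            return attempt, trajectory
    raise RuntimeError("mutant was lost in every attempt")


if __name__ == "__main__":
    attempt, trajectory = introduce_and_trace(pop_size=100, generations=30, seed=1)
    print("kept on attempt", attempt)
    print("final frequency", trajectory[-1])
```

A fixed command line or configuration file interface would need a dedicated option for each of these steps, whereas a script expresses the restart logic as an ordinary loop.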

Major drawback of simuPOP?

The scripting interface gives simuPOP the flexibility to simulate almost arbitrary evolutionary scenarios. It has also been frequently cited as the major drawback of simuPOP, because a scripting language interface has a steeper learning curve than the command line (or configuration file) interface of some other program, called program A here.

Although this claim is generally true (see, however, the simuPOP RoadMap for improvements in this area), the comparison itself hardly makes sense. A fairer comparison would be between program A and a simuPOP script that simulates a similar evolutionary scenario. Such scripts can be found in the simuPOP online cookbook. They are usually quite user-friendly because they can get user input from command line arguments, a configuration file or a graphical user interface (using the simuOpt module of simuPOP), and no knowledge of Python or simuPOP is required to run them.