Introduction

simuPOP is a general-purpose forward-time population genetics simulation environment. It is

  • A population genetics simulation program: simuPOP is an individual-based population genetics simulation program. It explicitly models individuals with genotypes and simulates the transmission of individual genotypes as a population evolves generation by generation. Although the basic evolutionary scenario follows a non-overlapping generation model, age-structured populations can be mimicked using special nonrandom mating schemes.
  • Forward-time: Unlike coalescent-based programs, simuPOP evolves populations forward in time, subject to an arbitrary number of genetic and environmental forces such as mutation, recombination, migration, and population/subpopulation size changes. Statistics of populations can be calculated and visualized dynamically, which makes simuPOP an ideal tool to demonstrate population genetics models, generate datasets under various evolutionary settings, and, more importantly, study complex evolutionary processes and evaluate gene mapping methods.
  • General-purpose: Many population genetics simulation programs are available. However, they are all designed for specific types of evolutionary scenarios and are limited in their ability to simulate complex evolutionary processes. simuPOP is the only general-purpose simulation program that is capable of simulating arbitrarily complex evolutionary scenarios. In fact, with its large number of functions, simuPOP can also be a powerful tool for manipulating and analyzing genetic data.
  • A simulation environment: simuPOP is provided as a number of Python modules, which provide a large number of Python objects and functions, including populations, mating schemes, operators (objects that manipulate populations), and simulators that coordinate the evolutionary processes. It is the user's responsibility to write a Python script that glues these pieces together to form a simulation. At a more user-friendly level, simuPOP provides an increasing number of bundled scripts that perform simulations ranging from implementations of basic population genetics models to the generation of datasets under complex evolutionary scenarios. No knowledge of Python or simuPOP is needed to run these simulations if they happen to fit your needs.

An overview of simuPOP concepts

A simuPOP population consists of individuals of the same genotype structure, which includes properties such as the number of homologous sets of chromosomes (ploidy), the number of chromosomes, and the names and locations of markers on each chromosome. Individuals can be divided into subpopulations, which can be further divided into virtual subpopulations according to individual properties such as sex, affection status, or arbitrary auxiliary information such as age.
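As a rough illustration of these ideas, the sketch below models a shared genotype structure and a virtual subpopulation in plain Python. It is not the simuPOP API: the names GenoStructure, Individual, and virtual_subpop are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GenoStructure:
    """Hypothetical stand-in for the structure shared by all individuals."""
    ploidy: int    # number of homologous sets of chromosomes
    loci: tuple    # number of loci on each chromosome

    @property
    def total_loci(self) -> int:
        return sum(self.loci)


@dataclass
class Individual:
    structure: GenoStructure
    sex: str                                     # 'M' or 'F'
    info: dict = field(default_factory=dict)     # auxiliary information, e.g. age
    genotype: list = field(default_factory=list)

    def __post_init__(self):
        # One allele slot per locus per homologous copy, initialized to 0.
        if not self.genotype:
            size = self.structure.ploidy * self.structure.total_loci
            self.genotype = [0] * size


def virtual_subpop(individuals, predicate):
    """Group individuals by a property without moving them anywhere."""
    return [ind for ind in individuals if predicate(ind)]


# A diploid structure with one chromosome carrying two loci.
structure = GenoStructure(ploidy=2, loci=(2,))
subpop = [Individual(structure, sex='MF'[i % 2], info={'age': i % 3})
          for i in range(6)]

females = virtual_subpop(subpop, lambda ind: ind.sex == 'F')
age_zero = virtual_subpop(subpop, lambda ind: ind.info['age'] == 0)
```

The point of the sketch is that a virtual subpopulation is only a view defined by a property: the same individual can belong to both females and age_zero without being copied or moved.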

Operators are Python objects that act on a population. They can be applied to a population before or after mating during a life cycle of an evolutionary process (Figure [fig:life-cycle]), or to one or two parents during the production of each offspring. Arbitrary numbers of operators can be applied to an evolving population.

A simuPOP mating scheme is responsible for choosing a parent or parents from a parental (virtual) subpopulation and for populating an offspring subpopulation. simuPOP provides a number of predefined mating schemes, such as random, consanguineous, monogamous, or polygamous mating, selfing, and haplodiploid mating in Hymenoptera. More complicated nonrandom mating schemes, such as mating in age-structured populations, can be constructed using heterogeneous mating schemes.

simuPOP evolves a population generation by generation, following the evolutionary cycle depicted in Figure [fig:life-cycle].

Briefly, a number of pre-mating operators, such as a mutator, are applied to a population before a mating scheme repeatedly chooses a parent or parents to produce offspring. During-mating operators, such as a recombinator, can be used to adjust how offspring genotypes are formed from parental genotypes. After the offspring population is populated, post-mating operators can be applied, for example, to calculate population statistics. The offspring population then becomes the parental population of the next evolutionary cycle.
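The cycle described above can be sketched in a few lines of plain Python. This is only a toy model under simplifying assumptions (two alleles coded 0 and 1, no sexes, parents drawn uniformly at random); the function names mutate, recombine, allele_freq, and evolve are hypothetical and are not simuPOP functions.

```python
import random


def mutate(pop, rng, rate=0.01):
    """Pre-mating operator: flip each allele (0 <-> 1) with probability `rate`."""
    for ind in pop:
        for hap in ind:
            for i in range(len(hap)):
                if rng.random() < rate:
                    hap[i] = 1 - hap[i]


def recombine(parent, rng, rate=0.01):
    """During-mating operator: form one gamete from a parent's two haplotypes."""
    strand = rng.randrange(2)          # start copying from a random haplotype
    gamete = []
    for i in range(len(parent[0])):
        if i > 0 and rng.random() < rate:
            strand = 1 - strand        # crossover between locus i-1 and locus i
        gamete.append(parent[strand][i])
    return gamete


def allele_freq(pop, locus=0):
    """Post-mating statistic: frequency of allele 1 at one locus."""
    alleles = [hap[locus] for ind in pop for hap in ind]
    return sum(alleles) / len(alleles)


def evolve(pop, gens, rng):
    history = []
    for _ in range(gens):
        mutate(pop, rng)                                 # pre-mating operators
        offspring = []
        for _ in range(len(pop)):                        # mating scheme
            mother, father = rng.choice(pop), rng.choice(pop)
            offspring.append([recombine(mother, rng),    # during-mating operator
                              recombine(father, rng)])
        history.append(allele_freq(offspring))           # post-mating operators
        pop = offspring                # offspring become the next parents
    return pop, history


rng = random.Random(1)
# 100 diploid individuals, each with haplotypes 01 and 10 at two loci.
pop = [[[0, 1], [1, 0]] for _ in range(100)]
final_pop, freqs = evolve(pop, gens=5, rng=rng)
```

Each generation calls the three kinds of operators in the order the text describes, and the final assignment in the loop is the step where the offspring population replaces the parental one.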

A simple example:

>>> from simuPOP import *
>>> simu = simulator(
...     population(size=1000, loci=[2]),
...     randomMating(ops=recombinator(rates=0.01)),
...     rep=3)
>>> simu.evolve(
...     initOps = [
...         initSex(),
...         initByValue([1, 2, 2, 1])],  
...     postOps = [
...         stat(LD=[0, 1]),
...         pyEval(r"'%.2f\t' % LD[0][1]", step=10),
...         pyOutput('\n', rep=-1, step=10)
...     ],
...     gen=100
... )
0.24	0.25	0.24	
0.21	0.23	0.22	
0.17	0.21	0.20	
0.13	0.17	0.18	
0.10	0.15	0.18	
0.11	0.14	0.16	
0.12	0.10	0.16	
0.11	0.11	0.15	
0.09	0.10	0.14	
0.07	0.10	0.11	
(100, 100, 100)
>>>

This example simulates a standard diploid Wright-Fisher model with recombination.

  • The first line imports the standard simuPOP module.
  • The second statement creates a simulator with three replicates of a diploid population of 1000 individuals, each having one chromosome with two loci. Random mating is used to generate offspring, but a recombinator (a during-mating operator) is used instead of the default Mendelian genotype transmitter.
  • The last statement uses the evolve() function to evolve the populations for 100 generations, subject to five operators.

The first two operators, initSex and initByValue, are applied to all populations before evolution. initSex assigns a sex to each individual, and initByValue initializes all individuals with the same genotype 12/21. The other operators are applied after mating. stat is applied at every generation and calculates linkage disequilibrium between the first and second loci. The results of this operator are stored in the local variable space of each population. The last two operators, pyEval and pyOutput, are applied at the end of every 10 generations. pyEval is applied to all replicates and outputs the calculated linkage disequilibrium value followed by a tab, and pyOutput outputs a newline after the last replicate.

The result of this example is a table of three columns, representing the decay of linkage disequilibrium in each replicate at 10-generation intervals. The return value of the evolve function, which is the number of evolved generations for each replicate, is also printed.
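For comparison, the deterministic expectation for this decay can be computed directly. In an infinitely large random mating population, linkage disequilibrium between two loci with recombination rate r decays as D_t = D_0 (1 - r)^t. The sketch below assumes that the reported statistic is the basic measure D and ignores genetic drift, so it is only a rough reference for the simulated table; the function name expected_ld is hypothetical.

```python
def expected_ld(d0, rate, generation):
    """Expected LD after `generation` generations, ignoring drift."""
    return d0 * (1.0 - rate) ** generation


# All individuals start as 12/21, so haplotypes 12 and 21 each have
# frequency 0.5 and every allele has frequency 0.5.
# |D| = P(haplotype) - p * q = 0.5 - 0.5 * 0.5
d0 = 0.5 - 0.5 * 0.5

for gen in range(0, 100, 10):
    print(f"{gen:3d}\t{expected_ld(d0, 0.01, gen):.2f}")
```

The simulated replicates scatter around this curve because each population has only 1000 individuals, which is why the three columns differ from one another.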

Is simuPOP the right tool for you?

Quite a number of population genetics simulation programs have been created for various purposes. If one of them happens to fit your needs, it may be easier to use (at the very least, you do not need to write a script) or may perform better. Here are a few links to such programs:

Please note that these links are not actively maintained so some of them might not work.

If you cannot find a program that fits your needs, you might want to browse this website and the simuPOP online cookbook to get an idea of how simuPOP works. It might also be a good idea to send an email to the simuPOP mailing list describing the kind of simulation you would like to perform. Other simuPOP users might have run similar simulations and may be able to provide useful information on how to implement your simulation with simuPOP.

How to cite simuPOP

If you use simuPOP for your research, please cite it using:

Bo Peng and Marek Kimmel (2005) simuPOP: a forward-time population genetics simulation environment. Bioinformatics, 21(18): 3686-3687. (Link)

and optionally (if nonrandom mating is used)

Bo Peng and Christopher Amos (2008) Forward-time simulations of nonrandom mating populations using simuPOP. Bioinformatics, 24(11): 1408-1409. (Link)