Introduction

simuPOP is an individual-based general-purpose forward-time population genetics simulation environment. More specifically, it is

  • An individual-based population genetics simulation program: simuPOP explicitly models individuals with genotypes and simulates the transmission of individual genotypes as a population evolves generation by generation. Although the basic evolutionary scenario follows a discrete, non-overlapping generation model, age-structured populations can be mimicked using special non-random mating schemes.
  • Forward-time: Unlike coalescent-based programs, simuPOP evolves populations forward in time, subject to an arbitrary number of genetic and environmental forces such as mutation, recombination, migration, and population/subpopulation size changes. Statistics of populations can be calculated and visualized dynamically, which makes simuPOP an ideal tool to demonstrate population genetics models, generate datasets under various evolutionary settings, and, more importantly, study complex evolutionary processes and evaluate gene mapping methods.
  • General-purpose: Many population genetics simulation programs are available. However, they are all designed for specific types of evolutionary scenarios and are limited in their ability to simulate complex evolutionary processes. simuPOP is the only general-purpose simulation program that is capable of simulating arbitrarily complex evolutionary scenarios. In fact, its large number of functions also makes simuPOP a powerful tool for manipulating and analyzing genetic data.
  • A simulation environment: simuPOP is provided as a number of Python modules, which provide a large number of Python objects and functions, including populations, mating schemes, operators (objects that manipulate populations), and simulators that coordinate the evolutionary processes. It is the user's responsibility to write a Python script that glues these pieces together to form a simulation. At a more user-friendly level, simuPOP provides an increasing number of bundled scripts that perform simulations ranging from implementations of basic population genetics models to the generation of datasets under complex evolutionary scenarios. No knowledge of Python or simuPOP is needed to run these simulations if they happen to fit your needs.


An overview of simuPOP concepts

A simuPOP population consists of individuals of the same genotype structure, which includes properties such as the number of homologous sets of chromosomes (ploidy), the number of chromosomes, and the names and locations of markers on each chromosome. Individuals can be divided into subpopulations, which can be further divided into virtual subpopulations according to individual properties such as sex, affection status, or arbitrary auxiliary information (called information fields) such as age, smoking status, and geographic location.
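The idea of virtual subpopulations can be illustrated with a few lines of standard-library Python. This is only a conceptual sketch: the `Individual` class and the `split_by` helper are hypothetical names invented for this illustration and are not part of the simuPOP API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Individual:
    """Hypothetical stand-in for an individual (not a simuPOP class)."""
    sex: str                                   # 'M' or 'F'
    affected: bool = False                     # affection status
    info: dict = field(default_factory=dict)   # information fields, e.g. {'age': 42}


def split_by(individuals, key):
    """Group individuals into virtual subpopulations keyed by key(ind)."""
    groups = defaultdict(list)
    for ind in individuals:
        groups[key(ind)].append(ind)
    return dict(groups)


# One subpopulation of four individuals.
subpop = [
    Individual('M', info={'age': 12}),
    Individual('F', info={'age': 35}),
    Individual('F', affected=True, info={'age': 61}),
    Individual('M', info={'age': 48}),
]

# The same individuals viewed through two different splits.
by_sex = split_by(subpop, lambda ind: ind.sex)
by_age = split_by(subpop, lambda ind: 'adult' if ind.info['age'] >= 18 else 'juvenile')
```

The individuals themselves are not moved or copied; each split is just a different view of the same subpopulation, which is what makes such groupings "virtual".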

Operators are Python objects that act on a population. They can be applied to a population before or after mating during a life cycle of an evolutionary process, or to one or two parents during the production of each offspring. Arbitrary numbers of operators can be applied to an evolving population.

A simuPOP mating scheme is responsible for choosing a parent or parents from a parental (virtual) subpopulation and for populating an offspring subpopulation. simuPOP provides a number of pre-defined mating schemes, such as random, consanguineous, monogamous, or polygamous mating, selfing, and haplodiploid mating in Hymenoptera. More complicated nonrandom mating schemes, such as mating in age-structured populations, can be constructed using heterogeneous mating schemes.

simuPOP evolves a population generation by generation, following the evolutionary cycle described below.

Briefly, a number of pre-mating operators, such as a mutator, are applied to a population before a mating scheme repeatedly chooses a parent or parents to produce offspring. During-mating operators, such as a recombinator, can be used to adjust how offspring genotypes are formed from parental genotypes. After an offspring population is populated, post-mating operators can be applied, for example, to calculate population statistics. The offspring population then becomes the parental population of the next evolutionary cycle.
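The cycle described above can be sketched in plain Python. This is a toy illustration under simplifying assumptions (two loci, alleles coded 1 and 2, no sexes, so a parent may be drawn twice), not simuPOP's implementation; the function names `mutate`, `recombine`, `allele_freq`, and `evolve` are hypothetical.

```python
import random


def mutate(pop, rng, rate=0.001):
    """Pre-mating operator: flip allele 1 <-> 2 at each site with a small probability."""
    for ind in pop:
        for hap in ind:
            for i in range(len(hap)):
                if rng.random() < rate:
                    hap[i] = 3 - hap[i]


def recombine(parent, rng, rate=0.01):
    """During-mating operator: form one two-locus gamete from a parent's two haplotypes."""
    a, b = parent if rng.random() < 0.5 else parent[::-1]
    if rng.random() < rate:
        return [a[0], b[1]]      # crossover between the two loci
    return list(a)               # no crossover: copy one haplotype


def allele_freq(pop, locus=0, allele=1):
    """Post-mating operator: frequency of an allele at a locus."""
    haps = [hap for ind in pop for hap in ind]
    return sum(hap[locus] == allele for hap in haps) / len(haps)


def evolve(pop, gens, rng):
    history = []
    for _ in range(gens):
        mutate(pop, rng)                                    # pre-mating
        offspring = []
        for _ in range(len(pop)):                           # mating: constant size
            mom, dad = rng.choice(pop), rng.choice(pop)
            offspring.append([recombine(mom, rng), recombine(dad, rng)])
        history.append(allele_freq(offspring))              # post-mating
        pop = offspring                                     # offspring become parents
    return pop, history


rng = random.Random(1)
pop = [[[1, 2], [2, 1]] for _ in range(200)]   # 200 diploid individuals, genotype 12/21
final_pop, history = evolve(pop, gens=20, rng=rng)
```

Each phase of the loop corresponds to one slot in the cycle: operators applied before mating, operators applied while each offspring is produced, and operators applied to the completed offspring population.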

A simple example:

>>> from simuPOP import *
>>> pop = Population(size=1000, loci=[2])
>>> pop.evolve(
...     initOps = [
...         InitSex(),
...         InitGenotype(genotype=[1, 2, 2, 1])],
...     matingScheme=RandomMating(ops=Recombinator(rates=0.01)),
...     postOps = [
...         Stat(LD=[0, 1]),
...         PyEval(r"'%.2f\n' % LD[0][1]", step=10),
...     ],
...     gen=100
... )
0.24
0.21
0.17
0.13
0.10
0.11
0.12
0.11
0.09
0.07
(100,)
>>>

This example simulates a standard diploid Wright-Fisher model with recombination.

  • The first line imports the standard simuPOP module.
  • The second line creates a diploid population with 1000 individuals, each having one chromosome with two loci.
  • The last statement uses the evolve() method of the population to evolve it for 100 generations, subject to four operators. It uses a random mating scheme. However, instead of a standard Mendelian genotype transmitter, this script uses a recombinator (a during-mating operator) to recombine parental chromosomes to form offspring genotypes.

The first two operators are applied to the population before evolution. Operator InitSex initializes the sex of individuals randomly, and operator InitGenotype initializes all individuals with the same genotype 12/21. The other operators can be applied at every generation. Operator Stat calculates linkage disequilibrium between the first and second loci. The results of this operator are stored in the local variable space of the population. The last operator, PyEval, is applied at the end of every 10 generations to output the calculated linkage disequilibrium value, followed by a newline.
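To see why the initial genotype 12/21 produces strong linkage disequilibrium, the classical two-locus coefficient D = P(AB) - P(A)P(B) can be computed by hand. The sketch below uses only plain Python; `ld_coefficient` is a hypothetical helper, and how simuPOP's Stat operator aggregates over alleles is not shown here.

```python
def ld_coefficient(haplotypes, allele_a, allele_b):
    """D = P(a at locus 0, b at locus 1) - P(a at locus 0) * P(b at locus 1)."""
    n = len(haplotypes)
    p_ab = sum(1 for h in haplotypes if h == (allele_a, allele_b)) / n
    p_a = sum(1 for h in haplotypes if h[0] == allele_a) / n
    p_b = sum(1 for h in haplotypes if h[1] == allele_b) / n
    return p_ab - p_a * p_b


# Every individual carries haplotypes 1-2 and 2-1, so each has frequency 0.5.
haplotypes = [(1, 2), (2, 1)] * 1000

d_12 = ld_coefficient(haplotypes, 1, 2)   # haplotype 1-2 is over-represented
d_11 = ld_coefficient(haplotypes, 1, 1)   # haplotype 1-1 is absent
```

With all allele frequencies at 0.5, `d_12` is 0.25 and `d_11` is -0.25, the largest magnitude D can take, which is why the simulated values start near 0.25.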

The result of this example is a list of linkage disequilibrium values, representing the decay of linkage disequilibrium in this population at 10-generation intervals. The return value of the evolve() method, which is the number of evolved generations for each replicate, is also printed.
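For comparison, the standard deterministic expectation for an infinitely large random-mating population is D_t = D_0 (1 - r)^t, where r is the recombination rate. The following sketch tabulates that curve for the parameters of the example (D_0 = 0.25, r = 0.01); `expected_ld` is a hypothetical helper, and a finite population of 1000 individuals will deviate from these values because of genetic drift.

```python
def expected_ld(d0, r, t):
    """Expected LD after t generations of recombination at rate r, ignoring drift."""
    return d0 * (1.0 - r) ** t


d0, r = 0.25, 0.01
generations = list(range(0, 100, 10))
expected = [expected_ld(d0, r, t) for t in generations]

for gen, d in zip(generations, expected):
    print(f"after {gen:2d} generations: {d:.3f}")
```

The simulated output follows this curve only approximately, and it can even rise between successive reports, because random sampling of parents adds noise on top of the decay caused by recombination.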

Contributing to simuPOP

simuPOP, being a general-purpose simulator, has been applied to a wide range of topics in a variety of research fields, including but not limited to population and evolutionary genetics, landscape genetics, conservation biology, and genetic epidemiology. However, compared to simulators designed for particular research topics, simuPOP usually provides fewer domain-specific built-in models and functions. If you have applied simuPOP to research topics in your field, implemented some commonly used models, and are interested in sharing these models with other simuPOP users, please feel free to contact me.

How to cite simuPOP

If you use simuPOP for your research, please cite it using:

Bo Peng and Marek Kimmel (2005) simuPOP: a forward-time population genetics simulation environment. Bioinformatics, 21(18): 3686-3687.

and, optionally (if nonrandom mating is used),

Bo Peng and Christopher Amos (2008) Forward-time simulations of nonrandom mating populations using simuPOP. Bioinformatics, 24(11): 1408-1409.

You can find a list of publications that have used simuPOP to simulate genetic data on the GSR simuPOP webpage.