Major features

simuPOP offers a long list of features, many of which are unique among forward-time population genetics simulation programs. The most distinctive features include:

  1. simuPOP provides five types of modules: three that use 1, 8 or >=32 bits to store an allele, one that uses compression to store sparse genotypes efficiently, and one that stores lineage information with alleles. The binary module (1 bit) is suitable for simulating a large number of SNP markers, and the long module (>=32 bits) is suitable for simulating population genetics models such as the infinite allele mutation model. simuPOP supports different types of chromosomes, such as autosomes, sex chromosomes and mitochondrial chromosomes, with an arbitrary number of markers.
  2. simuPOP supports autosome, chromosome X, chromosome Y, mitochondrial chromosomes, and a group of customized chromosome types.
  3. An arbitrary number of floating-point numbers, called information fields, can be attached to individuals of a population. For example, the information fields father_idx and mother_idx are used to track an individual's parents, and pack_year can be used to simulate an environmental factor associated with smoking.
  4. simuPOP does not impose any limit on the number of homologous sets of chromosomes, the size of the genome, or the number of individuals in a population. During an evolutionary process, a population can hold more than one of the most recent generations. Pedigrees can be sampled from such multi-generation populations.
  5. An operator can be native (implemented in C++) or hybrid (Python assisted). A hybrid operator calls a user-provided Python function to implement arbitrary genetic effects. For example, a hybrid mutator passes to-be-mutated alleles to a user-provided function and mutates these alleles according to the returned values.
  6. simuPOP is designed with performance and scalability in mind. It can make use of multiple cores of modern CPUs and can simulate millions of individuals with long chromosomes. Please refer to the simuPOP cookbook for details.
  7. simuPOP provides more than 70 operators that cover all important aspects of genetic studies. These include mutation (k-allele, stepwise, generalized stepwise and hybrid), migration (arbitrary, can create new subpopulations), recombination and gene conversion (uniform or nonuniform, sex-specific), quantitative trait (single, multilocus or hybrid), selection (single-locus, additive, multiplicative or hybrid multi-locus models), penetrance (single, multi-locus or hybrid), ascertainment (case-control, affected sibpairs, random, nuclear and large pedigree), statistics calculation (including but not limited to allele, genotype, haplotype, heterozygote number and frequency; expected heterozygosity; di-allelic and multi-allelic linkage disequilibrium measures), pedigree tracing, visualization (using R or other Python modules), and loading and saving in simuPOP's native format and many external formats such as Linkage.
  8. The stat operator of simuPOP provides a large number of population statistics that can be calculated and output during or at the end of evolution, ranging from allele, genotype and haplotype frequencies to association tests and estimates of effective population size.
  9. Mating schemes and many operators can work on virtual subpopulations of a subpopulation. For example, positive assortative mating can be implemented by mating individuals with similar properties such as ancestry. The number of offspring per mating event can be fixed or can follow a statistical distribution. Arbitrary nonrandom mating schemes can be simulated, with assortative mating, age-structured populations and overlapping generations as special cases.
  10. simuPOP is well documented. Recent versions of simuPOP provide more than 150 examples in the user's guide (230+ pages) and a complete reference manual. The simuPOP online cookbook provides many additional modules that allow simuPOP to work with other applications, as well as functions and scripts for different applications.
  11. Perhaps most importantly, simuPOP is completely open to you, in the sense that:
    • You can observe any information about any individual at any generation of an evolving population. A simuPOP evolutionary process is completely transparent to you so that you can examine its properties very closely. This makes simuPOP the best tool available to study complex evolutionary processes.
    • simuPOP is very well documented so that you can find the details of any function and operator you use. Mathematical formulas and even implementation details are provided for important genetic factors, especially when there are alternative implementations.
    • simuPOP is open source so you can always check what is going on under the hood.
Although such openness can be overwhelming, it gives serious users the peace of mind of knowing exactly how their simulations are running.
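
The hybrid operator idea from item 5 can be illustrated with a small sketch. The code below is plain Python and does not use simuPOP's API; the names hybrid_mutate and flip are hypothetical and chosen only for illustration. It shows the division of labor the text describes: the framework side decides which alleles are to be mutated, and a user-provided function decides what each of them becomes.

```python
import random


def hybrid_mutate(alleles, rate, user_func, rng):
    """Return a new allele list (hypothetical helper, not simuPOP API).

    Each allele is selected for mutation with probability `rate`; selected
    alleles are handed to `user_func`, and its return value is stored.
    """
    return [user_func(a) if rng.random() < rate else a for a in alleles]


def flip(allele):
    """Hypothetical user-provided rule for a diallelic locus: 0 <-> 1."""
    return 1 - allele


rng = random.Random(42)  # seeded so that runs are repeatable
genotype = [0, 0, 1, 0, 1, 1, 0, 0]
mutated = hybrid_mutate(genotype, 0.25, flip, rng)
print(genotype, "->", mutated)
```

Swapping flip for another function (for example, a stepwise rule that adds or subtracts one repeat unit) changes the mutation model without touching the selection logic, which is the flexibility a hybrid operator is meant to offer.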

Other forward-time simulation programs

A number of forward-time simulation programs are available. These programs are designed with specific applications and specific evolutionary scenarios in mind, and excel at what they are designed for. For some applications, these programs may be easier to use than simuPOP. For example, using a special look-ahead algorithm, ForwSim is among the fastest programs to simulate a standard Wright-Fisher process, and should be used if such a simulation is needed. However, these programs are not flexible enough to be applied to problems outside their designed application area. For example, none of these programs can be used to study the evolution of a disease-predisposing mutant, a process that is of great importance in statistical genetics and genetic epidemiology.

Compared to such programs, simuPOP has the following advantages:

  • The scripting interface gives simuPOP the flexibility to create arbitrarily complex evolutionary scenarios. For example, it is easy to use simuPOP to explicitly introduce a disease-predisposing mutant into an evolving population, trace its allele frequency, and restart the simulation if the mutant is lost due to genetic drift.
  • The Python interface allows users to define customized genetic effects in Python. In contrast, other programs either do not allow customized effects or force users to modify code at a lower (e.g. C++) level.
  • simuPOP is the only application that embodies the concept of virtual subpopulations, which allow evolution to be modeled at a finer scale. This is required for realistic simulations of complex evolutionary scenarios. For example, you can calculate statistics for arbitrary subsets of the population (e.g. all affected individuals, heavy smokers), and apply different mutation, migration, and selection schemes to individuals with different properties.
  • simuPOP allows users to examine an evolutionary process very closely because all simuPOP objects are Python objects that can be accessed using their member functions. For example, users can keep track of genotypes at particular loci during evolution. In contrast, other programs work more or less like a black box where only limited types of statistics can be output.
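
The scenario mentioned in the first bullet (introduce a mutant, trace its frequency, restart if it is lost) can be sketched in a few lines. The code below is a conceptual illustration in plain Python, not simuPOP code: it assumes a neutral mutant in a haploid Wright-Fisher population of fixed size, and the function names and parameters are hypothetical.

```python
import random


def next_generation(count, n_chrom, rng):
    """One Wright-Fisher generation: each of the n_chrom offspring copies
    carries the mutant with probability equal to its current frequency."""
    freq = count / n_chrom
    return sum(1 for _ in range(n_chrom) if rng.random() < freq)


def simulate_until_kept(n_chrom, generations, rng, max_restarts=10000):
    """Introduce one mutant copy, evolve, and restart whenever it is lost.

    Returns (restarts, trajectory) for the first attempt in which the
    mutant is still present after `generations` generations.
    """
    for restarts in range(max_restarts):
        count = 1  # a single mutant copy is introduced
        trajectory = [count / n_chrom]
        for _ in range(generations):
            count = next_generation(count, n_chrom, rng)
            trajectory.append(count / n_chrom)
            if count == 0:  # lost to genetic drift: abandon this attempt
                break
        else:
            return restarts, trajectory
    raise RuntimeError("mutant was lost in every attempt")


rng = random.Random(1)  # seeded so that runs are repeatable
restarts, trajectory = simulate_until_kept(n_chrom=200, generations=20, rng=rng)
print("restarts needed:", restarts)
print("final mutant frequency:", trajectory[-1])
```

Because a new neutral mutant is usually lost within a few generations, most attempts are discarded; conditioning on survival in this way is what the restart step in the text accomplishes.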

If you are interested in other options, the Genetic Simulation Resources provides links to, and the major features of, most simulation programs.