This module provides some commonly used operators and format conversion utilities.
A Trajectory object contains frequencies of one or more loci in one or more subpopulations over several generations. It is usually returned by member functions of class TrajectorySimulator or equivalent global functions simulateForwardTrajectory and simulateBackwardTrajectory.
The Trajectory object provides several member functions to facilitate the use of trajectory-simulation techniques. For example, Trajectory.func() returns a trajectory function that can be passed directly to a ControlledOffspringGenerator; Trajectory.mutators() provides a list of PointMutator operators that insert mutants at the right generations to initialize a trajectory.
For more information about trajectory simulation techniques and the related controlled random mating scheme, please refer to the simuPOP user’s guide and Peng et al. (PLoS Genetics 3(3), 2007).
Plot simulated Trajectory using R through a Python module rpy. The function will return silently if module plotter cannot be imported.
This function will use different colors to plot trajectories at different loci. The trajectories are plotted from generation 0 to endGen even if the trajectories are short. The y-axis ranges from 0 to 1 and is labeled Allele frequency. If a valid filename is given, the figure will be saved to filename in a format specified by file extension. Currently supported formats/extensions are eps, jpg, bmp, tif, png and pdf. The availability of formats may be limited by your version of R.
This function makes use of the derived keyword parameter feature of module plotter. Allowed prefixes are par, plot, lines and dev_print. Allowed repeating suffixes are loc and sp. For example, you could use parameter plot_ylim to reset the default value of ylim in R function plot.
A trajectory simulator takes basic demographic and genetic (natural selection) information about an evolutionary process of a diploid population and allows the simulation of trajectories of allele frequencies at one or more loci. Trajectories can be simulated in two ways: forward-time and backward-time. A forward-time simulation starts from a given allele frequency and simulates the frequency at the next generation using the given demographic and genetic information. The simulation continues until an ending generation is reached. A trajectory is successfully simulated if the allele frequency at the ending generation falls into a specified range. A backward-time simulation starts from the ending generation with a desired allele frequency and simulates the allele frequency at previous generations one by one until the allele gets lost (allele frequency equals zero).
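The forward-time procedure described above can be sketched in plain Python. This is only an illustration under simplifying assumptions: a single locus, one subpopulation of constant size, no selection (neutral binomial sampling), and hypothetical names (simulate_forward, end_range, max_attempts) that are not part of the simuPOP interface.

```python
import random


def simulate_forward(begin_freq, pop_size, generations, end_range,
                     max_attempts=1000, rng=None):
    """Return a list of allele frequencies, or None if every attempt fails.

    Hypothetical helper: neutral Wright-Fisher sampling only, no selection.
    """
    rng = rng or random.Random()
    low, high = end_range
    n_copies = 2 * pop_size  # diploid population: 2N allele copies
    for _attempt in range(max_attempts):
        freq = begin_freq
        trajectory = [freq]
        for _gen in range(generations):
            # Draw the next generation: each copy carries the allele
            # with probability equal to the current frequency.
            count = sum(1 for _copy in range(n_copies) if rng.random() < freq)
            freq = count / n_copies
            trajectory.append(freq)
        # Accept only if the final frequency falls into the requested range.
        if low <= trajectory[-1] <= high:
            return trajectory
    return None


traj = simulate_forward(0.2, pop_size=50, generations=10,
                        end_range=(0.0, 1.0), rng=random.Random(1))
print(len(traj))
```

The restart loop is the part that mirrors the description: a trajectory is discarded and re-simulated whenever its ending frequency misses the target range.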
The result of a trajectory simulation is a trajectory object, which can be used to direct the simulation of a special random mating process that controls the evolution of one or more disease alleles so that allele frequencies are consistent across replicate simulations. For more information about trajectory simulation techniques and the related controlled random mating scheme, please refer to the simuPOP user’s guide and Peng et al. (PLoS Genetics 3(3), 2007).
Create a trajectory simulator using the provided demographic and genetic (natural selection) parameters. Member functions simuForward and simuBackward can then be used to simulate trajectories within a certain range of generations. This class accepts the following parameters:
Simulate trajectories of multiple disease susceptibility loci using a backward-time approach. This function accepts allele frequencies of alleles of multiple unlinked loci (endFreq) at the end of generation endGen. Depending on the number of loci and subpopulations, parameter endFreq can be a number (same frequency for all loci in all subpopulations), a list of frequencies for each locus (same frequency in all subpopulations), or a list of frequencies for each locus in each subpopulation in the order of loc0_sp0, loc1_sp0, ..., loc0_sp1, loc1_sp1, ... and so on.
This simulator will simulate a trajectory generation by generation and restart if the disease allele gets fixed (instead of lost), or if the length of the simulated trajectory does not fall between minMutAge and maxMutAge (ignored if None is given). This simulator will return None if no valid trajectory is found after maxAttempts attempts.
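The three accepted forms of the frequency parameter (a number, one value per locus, or one value per locus per subpopulation) can be normalized into a single flat list. The helper below is a hypothetical sketch, not part of simuPOP; it assumes the flat ordering loc0_sp0, loc1_sp0, ..., loc0_sp1, loc1_sp1, ... stated above, so the entry for locus loc in subpopulation sp sits at index sp * n_loci + loc.

```python
def expand_freq(freq, n_loci, n_subpops):
    """Expand a frequency specification to one value per locus per subpopulation.

    Hypothetical helper. The result is ordered loc0_sp0, loc1_sp0, ...,
    loc0_sp1, loc1_sp1, ...
    """
    if isinstance(freq, (int, float)):
        # A single number applies to all loci in all subpopulations.
        return [float(freq)] * (n_loci * n_subpops)
    values = [float(f) for f in freq]
    if len(values) == n_loci:
        # One value per locus, repeated for every subpopulation.
        return values * n_subpops
    if len(values) == n_loci * n_subpops:
        # Already fully specified.
        return values
    raise ValueError("expected 1, n_loci or n_loci * n_subpops frequencies")


flat = expand_freq([0.1, 0.2], n_loci=2, n_subpops=2)
# Frequency of locus 1 in subpopulation 1:
print(flat[1 * 2 + 1])
```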
Simulate trajectories of multiple disease susceptibility loci using a forward-time approach. This function accepts allele frequencies of alleles of multiple unlinked loci (freq) at the beginning generation beginGen, and the expected range of allele frequencies of these alleles (endFreq) at the end of generation endGen. Depending on the number of loci and subpopulations, these parameters accept the following inputs:
This simulator will simulate a trajectory generation by generation and restart if the resulting frequencies do not fall into the specified range of frequencies. This simulator will return None if no valid trajectory is found after maxAttempts attempts.
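Return the migration rate matrix of an island model with n subpopulations and migration rate m:

\[
\begin{pmatrix}
x & \frac{m}{n-1} & \cdots & \frac{m}{n-1} \\
\frac{m}{n-1} & x & \cdots & \frac{m}{n-1} \\
\vdots & & \ddots & \vdots \\
\frac{m}{n-1} & \cdots & \frac{m}{n-1} & x
\end{pmatrix},
\qquad x = 1 - m
\]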
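As a rough illustration, the island-model matrix can be built in a few lines of plain Python. The function name island_rates is hypothetical, and returning [[1.0]] for a single subpopulation is an assumption made here to avoid dividing by zero.

```python
def island_rates(m, n):
    """Build an n x n island-model migration matrix (hypothetical helper).

    Diagonal entries are 1 - m; every off-diagonal entry is m / (n - 1).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        # Assumption: a single subpopulation has nowhere to migrate to.
        return [[1.0]]
    off_diagonal = m / (n - 1)
    return [[1.0 - m if i == j else off_diagonal for j in range(n)]
            for i in range(n)]


for row in island_rates(0.1, 3):
    print(row)
```

Each row sums to 1, because the total rate m is split evenly among the other n - 1 islands.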
Return the migration rate matrix for a hierarchical island model in which migration rates differ within and across groups of islands.
For an individual in an island, the probability that it remains in the same island is 1-r1-r2 (r1 and r2 might vary by island group), the probability that it migrates to another island in the same group is r1, and the probability that it migrates to an island outside of the group is r2. The migration rate to a specific island depends on the size of the group.
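One way to read this description is sketched below. It assumes scalar r1 and r2 shared by all groups, splits r1 evenly among the other islands of the same group and r2 evenly among all islands outside the group, and uses hypothetical names throughout; the actual function may distribute the rates differently.

```python
def hierarchical_island_rates(r1, r2, group_sizes):
    """Sketch of a hierarchical island migration matrix (hypothetical helper).

    group_sizes lists the number of islands in each group, e.g. [2, 3].
    """
    n = sum(group_sizes)
    # Group index of every island, e.g. [0, 0, 1, 1, 1] for sizes [2, 3].
    group_of = [g for g, size in enumerate(group_sizes) for _ in range(size)]
    rates = []
    for i in range(n):
        size = group_sizes[group_of[i]]
        row = [0.0] * n
        for j in range(n):
            if j == i:
                continue
            if group_of[j] == group_of[i]:
                row[j] = r1 / (size - 1)   # other islands in the same group
            else:
                row[j] = r2 / (n - size)   # islands outside of the group
        # Whatever does not migrate stays; this is 1 - r1 - r2 in the usual case.
        row[i] = 1.0 - sum(row)
        rates.append(row)
    return rates


for row in hierarchical_island_rates(0.1, 0.01, [2, 3]):
    print([round(x, 4) for x in row])
```

Computing the diagonal as the remainder keeps every row summing to 1 even for a group that contains a single island, where the within-group rate r1 has no destination.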
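Return the migration rate matrix of a stepping stone model, where X = 1-m. In the circular model, each subpopulation sends migrants at rate m/2 to each of its two neighbors, and the first and last subpopulations are neighbors:

\[
\begin{pmatrix}
X & m/2 & 0 & \cdots & 0 & m/2 \\
m/2 & X & m/2 & 0 & \cdots & 0 \\
0 & m/2 & X & m/2 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \ddots & \vdots \\
m/2 & 0 & \cdots & 0 & m/2 & X
\end{pmatrix}
\]

In the non-circular model, the first and last subpopulations have a single neighbor, which receives the full rate m so that each row still sums to one:

\[
\begin{pmatrix}
X & m & 0 & \cdots & 0 & 0 \\
m/2 & X & m/2 & 0 & \cdots & 0 \\
0 & m/2 & X & m/2 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & m & X
\end{pmatrix}
\]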
This function returns [[1]] when there is only one subpopulation.
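A minimal sketch of how such a matrix could be assembled is given below. The name stepping_stone_rates and its signature are hypothetical, and the sketch assumes that a boundary subpopulation in the non-circular model sends its whole rate m to its single neighbor, so that every row sums to 1.

```python
def stepping_stone_rates(m, n, circular=False):
    """Sketch of a stepping stone migration matrix (hypothetical helper)."""
    if n == 1:
        return [[1]]
    rates = [[0.0] * n for _ in range(n)]
    for i in range(n):
        rates[i][i] = 1.0 - m
        if circular:
            # Wrap around, so the first and last subpopulations are neighbors.
            neighbors = {(i - 1) % n, (i + 1) % n}
        else:
            # Boundary subpopulations have only one neighbor.
            neighbors = {j for j in (i - 1, i + 1) if 0 <= j < n}
        for j in neighbors:
            # Split the total rate m evenly among the available neighbors.
            rates[i][j] = m / len(neighbors)
    return rates


for row in stepping_stone_rates(0.2, 4, circular=True):
    print(row)
```

Using a set of neighbor indices also covers n = 2, where the left and right neighbors of a subpopulation are the same island.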
The ProgressBar class defines a progress bar. This class will use a text-based progress bar that outputs progressing dots (.) with intermediate numbers (e.g. 5 for 50%) under a non-GUI mode (gui=False). In the GUI mode, a Tkinter or wxPython progress dialog will be used (gui=Tkinter or gui=wxPython). The default mode is determined by the global gui mode of simuPOP (see also simuOpt.setOptions).
This class is usually used as follows:
    from simuPOP.utils import ProgressBar

    progress = ProgressBar("Start simulation", 500)
    for i in range(500):
        # i+1 can be omitted if the progress bar is updated by 1 step
        progress.update(i+1)
    # call done() if you would like to make sure the done message is displayed
    progress.done()
This function is deprecated. Please use export(format='csv') instead. Save a simuPOP population pop in csv format. Columns of this file are arranged in the order of information fields (infoFields), sex (if sexFormatter is not None), affection status (if affectionFormatter is not None), and genotype (if genoFormatter is not None). This function only outputs individuals in the present generation of population pop. This function accepts the following parameters:
Parameters genoCode, sexCode, and affectionCode from version 1.0.0 have been renamed to genoFormatter, sexFormatter and affectionFormatter but can still be used.
An operator to export the current population in specified format. Currently supported file formats include:
STRUCTURE (http://pritch.bsd.uchicago.edu/structure.html). This format accepts the following parameters:
Genotype information is always output. Alleles are coded the same way (0, 1, 2, etc.) as they are stored in simuPOP.
GENEPOP (http://genepop.curtin.edu.au/). The genepop format accepts the following parameters:
Because 0 is reserved as missing data in this format, allele A is output as A+adjust. simuPOP will use subpopulation names (if available) and the 1-based individual index to output individual labels (e.g. SubPop2-3). If parameter subPops is used to output selected individuals, each subpop will be output as a separate subpopulation even if there are multiple virtual subpopulations from the same subpopulation. simuPOP currently only exports diploid populations to this format.
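The labeling and allele adjustment described above can be illustrated with a small formatter. This is a sketch only: the function name, the default adjust=1, the two-digit allele codes, and the "label, genotypes" layout are assumptions based on common GENEPOP files, not a description of what the exporter writes.

```python
def genepop_line(subpop_name, index, genotype, adjust=1, digits=2):
    """Format one diploid individual as a GENEPOP-style line (hypothetical helper).

    index is the 0-based position of the individual in its subpopulation;
    genotype is a list of (allele1, allele2) pairs, one per locus.
    """
    # Label: subpopulation name plus 1-based individual index, e.g. SubPop2-3.
    label = f"{subpop_name}-{index + 1}"
    # Shift every allele by adjust so that 0 stays reserved for missing data.
    codes = [f"{a + adjust:0{digits}d}{b + adjust:0{digits}d}"
             for a, b in genotype]
    return f"{label}, " + " ".join(codes)


print(genepop_line("SubPop2", 2, [(0, 1), (1, 1)]))
# prints: SubPop2-3, 0102 0202
```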
FSTAT (http://www2.unil.ch/popgen/softwares/fstat.htm). The fstat format accepts the following parameters:
MAP (marker information format) outputs information about each locus. Each line of the map file describes a single marker and contains the chromosome name, locus name, and position. Chromosome and locus names are taken from parameters chromNames and lociNames of the Population object; if these are not specified, the chromosome index + 1 and ‘.’ are used instead. Locus positions are written to the third column. If the unit assumed in your population does not match the intended unit of the MAP file (e.g. you would like to output positions in basepairs while the population uses Mbp), you can use parameter posMultiplier to adjust it. This format accepts the following parameters:
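The three-column layout and the fallback names can be sketched as follows. The function and its parameter names are hypothetical, and separating the columns with a single space is an assumption.

```python
def map_line(chrom_index, chrom_name, locus_name, position, pos_multiplier=1):
    """Format one marker as 'chromosome locus position' (hypothetical helper)."""
    # Fall back to chromosome index + 1 and '.' when names are not specified.
    chrom = chrom_name if chrom_name else str(chrom_index + 1)
    locus = locus_name if locus_name else "."
    # Rescale the position, e.g. from Mbp to basepairs.
    return f"{chrom} {locus} {position * pos_multiplier}"


print(map_line(0, "", "", 1.5, pos_multiplier=1000000))
# prints: 1 . 1500000.0
print(map_line(1, "chr2", "rs1", 3))
# prints: chr2 rs1 3
```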
PED (Linkage Pedigree pre MAKEPED format), with columns of family, individual, father, mother, gender, affection status and genotypes. The output should be accepted by HaploView or plink, whose documentation provides more details about this format. If a population does not have ind_id, father_id or mother_id, this format will output individuals in specified (virtual) subpopulations in the current generation (parental generations are ignored) as unrelated individuals with 0, 0 as parent IDs. An incremental family ID will be assigned to each individual. If a population has ind_id, father_id and mother_id, parents will be recursively traced to separate all individuals in a (multigenerational) population into families of related individuals. Father and mother IDs will be set to zero if one of them does not exist. This format uses 1 for MALE and 2 for FEMALE. If phenoField is None, individual affection status will be output with 1 for unaffected and 2 for affected. Otherwise, values of an information field will be output as phenotype. Because a value of 0 indicates a missing value, values of alleles will be adjusted by 1 by default, which should be avoided if you are using non-zero alleles to model ACTG alleles in simuPOP. This format will ignore subpopulation structure because parents might belong to different subpopulations. This format accepts the following parameters:
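The column order and codings above can be illustrated by a small formatter for a single individual. The function, its argument names, and the space separator are hypothetical choices made for this sketch; it covers only the case where affection status (not a phenotype field) is written.

```python
def ped_line(family_id, ind_id, father_id, mother_id, is_male, is_affected,
             genotype, adjust=1):
    """Format one individual as a PED-style line (hypothetical helper).

    genotype is a list of (allele1, allele2) pairs, one per locus.
    """
    sex = 1 if is_male else 2          # 1 for MALE, 2 for FEMALE
    status = 2 if is_affected else 1   # 1 for unaffected, 2 for affected
    fields = [family_id, ind_id, father_id, mother_id, sex, status]
    for a, b in genotype:
        # Shift alleles by adjust because 0 indicates a missing value.
        fields.extend([a + adjust, b + adjust])
    return " ".join(str(f) for f in fields)


# An unrelated, unaffected male with two loci:
print(ped_line(1, 1, 0, 0, True, False, [(0, 1), (1, 1)]))
# prints: 1 1 0 0 1 1 1 2 2 2
```

Passing adjust=0 corresponds to the case mentioned above where alleles are already non-zero, for example when they model ACTG alleles.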
Phylip (Joseph Felsenstein’s Phylip format). Phylip is generally used for nucleotide sequences and protein sequences. This makes the format suitable for simulations of haploid populations (ploidy=1) with nucleotide or protein sequences (number of alleles = 4 or 24 with alleleNames as nucleotide or amino acid names). If your population does not satisfy these conditions, you can still export it, with homologous chromosomes in a diploid population as two sequences, and with specified allele names for allele 0, 1, 2, .... This function outputs the sequence name as SXXX, where XXX is the 1-based index of the individual, and as SXXX_Y (Y=1 or 2) for diploid individuals, unless names of sequences are provided by parameter seqNames. This format supports the following parameters:
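The naming rule for sequences can be sketched as below. The helper and its parameter names are hypothetical, and the sketch assumes that the individual index is written without zero padding.

```python
def phylip_seq_names(n_individuals, ploidy=1, seq_names=None):
    """Return one sequence name per exported sequence (hypothetical helper)."""
    if ploidy not in (1, 2):
        raise ValueError("only haploid and diploid populations are sketched")
    if seq_names is not None:
        # User-provided names take precedence over generated ones.
        if len(seq_names) != n_individuals * ploidy:
            raise ValueError("one name per sequence is required")
        return list(seq_names)
    if ploidy == 1:
        # S1, S2, ... with a 1-based individual index.
        return [f"S{i + 1}" for i in range(n_individuals)]
    # Diploid: two sequences per individual, S1_1, S1_2, S2_1, ...
    return [f"S{i + 1}_{y}" for i in range(n_individuals) for y in (1, 2)]


print(phylip_seq_names(2))
# prints: ['S1', 'S2']
print(phylip_seq_names(2, ploidy=2))
# prints: ['S1_1', 'S1_2', 'S2_1', 'S2_2']
```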
CSV (comma separated values). This is a general format that outputs genotypes in comma (or tab, etc.) separated formats. The function form of this operator, export(format='csv'), is similar to the now-deprecated saveCSV function, but its interface has been adjusted to match other formats supported by this operator. This format outputs a header (optional), and one line for each individual with values of specified information fields, sex, affection status, and genotypes. All fields except for genotypes are optional. The output format is controlled by the following parameters:
This operator supports the usual applicability parameters such as begin, end, step, at, reps, and subPops. If subPops are specified, only individuals from the specified (virtual) subPops are exported. Because this exporter will always overwrite any existing file, a leading ‘>’ in parameter output is ignored, although the ‘!expr’ format can still be used to generate a context-dependent output filename. Unless explicitly stated for a particular format, this operator exports individuals from the current generation if there are multiple ancestral generations in the population.
The Exporter class will make use of a progress bar to show the progress. The interface of the progress bar is by default determined by the global GUI status but you can also set it to, for example, gui=False to forcefully use a text-based progress bar.
This function imports and returns a population from a file filename in the specified format. Format-specific parameters can be used to define how the input should be interpreted and imported. This function supports the following file formats:
GENEPOP (http://genepop.curtin.edu.au/). For an input file of this format, this function ignores the first title line, loads the second line as loci names, and imports genotypes of different POP sections as different subpopulations. This format accepts the following parameters:
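The reading steps above (skip the title, read loci names, start a new subpopulation at each POP line) can be sketched with a small parser. This is not the importer's implementation: the function name is hypothetical, and the sketch assumes comma-separated loci names on the second line and two-digit allele codes.

```python
def parse_genepop(text, digits=2):
    """Parse GENEPOP-style text into (loci_names, subpops) (hypothetical helper).

    subpops is a list of subpopulations; each is a list of individuals;
    each individual is a list of (allele1, allele2) pairs, one per locus.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # lines[0] is the title line and is ignored.
    loci = [name.strip() for name in lines[1].split(",")]
    subpops = []
    for line in lines[2:]:
        if line.upper() == "POP":
            subpops.append([])  # each POP section starts a new subpopulation
            continue
        _label, _comma, data = line.partition(",")
        genotype = [(int(code[:digits]), int(code[digits:]))
                    for code in data.split()]
        subpops[-1].append(genotype)
    return loci, subpops


sample = """Title line
locA, locB
POP
ind1, 0102 0202
ind2, 0101 0201
Pop
ind3, 0202 0101
"""
loci, subpops = parse_genepop(sample)
print(loci, [len(sp) for sp in subpops])
# prints: ['locA', 'locB'] [2, 1]
```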
FSTAT (http://www2.unil.ch/popgen/softwares/fstat.htm). This format accepts the following parameters:
Phylip (Joseph Felsenstein’s Phylip format). This function ignores sequence names and imports sequences into a haploid (default) or diploid population (if there is an even number of sequences). A list of allele names is required to translate symbols to allele names. This format accepts the following parameters: