The simuPOP.plotter module defines a few utility functions and Python operators that help you plot variables and information fields during evolution.
These operators are derived from class PyOperator and call R plot functions when they are applied to a population. For example, operator plotter.VarPlotter collects expression values and uses functions plot and lines to plot the data, with help from other functions such as par (device property), dev.print (save figure to files) and legend (add legend). Some functions are called multiple times for different replicates, subpopulations or information fields.
One of the most interesting features of this module is its use of derived keyword parameters to send arbitrary parameters to the underlying R functions, which usually accept a large number of parameters to customize every aspect of a figure. A derived keyword argument is an argument that is prefixed with a function name and/or suffixed by an iterator name. The former specifies the underlying R function to which the parameter will be passed; the latter allows users to specify a list of values that will be passed, for example, to lines representing different replicates. For example, parameter par_mar=[1]*4 will pass mar=[1]*4 to R function par, and lty_rep=[1, 2, 3] will pass lty=1, lty=2 and lty=3 to the lines of three different replicates. A class usually has one or two default functions (such as plot and lines) to which keyword arguments without a function prefix will be sent.
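The naming rule for derived keyword arguments can be illustrated with a small parser. This is only a sketch: the helper split_derived_keyword and the two lookup tuples are hypothetical and are not part of simuPOP, whose actual implementation may differ.

```python
# Hypothetical helper for illustration only; not simuPOP's implementation.
KNOWN_FUNCTIONS = ('par', 'plot', 'lines', 'legend')   # assumed set of prefixes
KNOWN_SUFFIXES = ('repdim', 'rep', 'dim')              # longest suffix is tested first


def split_derived_keyword(name):
    """Split a keyword name into (function, parameter, suffix).

    function and suffix are None when the name carries no prefix or suffix.
    """
    func = None
    suffix = None
    for f in KNOWN_FUNCTIONS:
        if name.startswith(f + '_'):
            func, name = f, name[len(f) + 1:]
            break
    for s in KNOWN_SUFFIXES:
        if name.endswith('_' + s):
            suffix, name = s, name[:-(len(s) + 1)]
            break
    return func, name, suffix


print(split_derived_keyword('par_mar'))   # function prefix only
print(split_derived_keyword('lty_rep'))   # iterator suffix only

# A list given with suffix _rep supplies one value per replicate.
func, param, suffix = split_derived_keyword('lty_rep')
per_replicate = [{param: value} for value in [1, 2, 3]]
print(per_replicate)
```

A name without prefix or suffix, such as main, comes back unchanged and would go to the class's default functions.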
In addition, the values of these keyword arguments can vary during evolution. More specifically, if the value is a string with a leading exclamation mark (!), the remaining string is treated as an expression. This expression is evaluated against the current population during evolution and its return value becomes the value of the parameter at that generation. For example, keyword parameter main="!'Allele frequency at generation %d' % gen" will become main='Allele frequency at generation 10' at generation 10.
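The evaluation rule for values that start with an exclamation mark can be sketched in a few lines. The function resolve_value is a hypothetical stand-in, and a plain dictionary is assumed to play the role of the population's namespace.

```python
def resolve_value(value, namespace):
    """Evaluate strings that start with '!' as expressions; return other values as is.

    namespace is a dictionary standing in for the population's variables
    (an assumption made for this sketch).
    """
    if isinstance(value, str) and value.startswith('!'):
        return eval(value[1:], {}, namespace)
    return value


title = resolve_value("!'Allele frequency at generation %d' % gen", {'gen': 10})
print(title)

# Values without a leading '!' are passed through unchanged.
print(resolve_value('Allele frequency', {'gen': 10}))
```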
Class plotter.VarPlotter plots the current and historical values of a Python expression (expr), which are evaluated (against each population's local namespace) and saved during evolution. The return value of the expression can be a number or a sequence, but should have the same type and length across all replicates and generations. The history of each value (or each item in the returned sequence) of each replicate forms a line, with generation numbers on the x-axis. The number of lines is the number of replicates multiplied by the dimension of the expression. Although complete histories are usually saved, you can use parameter win to save histories only within the last win generations.
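The bookkeeping described above can be sketched as follows. The function trim_history is hypothetical, the stored values are placeholders rather than simulation results, and the exact boundary that simuPOP uses for win is an assumption here.

```python
def trim_history(history, current_gen, win=0):
    """Keep (generation, value) pairs that fall within the last `win` generations.

    win=0 is assumed to mean "keep the complete history".
    """
    if win <= 0:
        return list(history)
    return [(g, v) for g, v in history if g > current_gen - win]


# Placeholder values recorded every 5 generations (not real simulation output).
history = [(g, 1.0 / (g + 1)) for g in range(0, 30, 5)]
recent = trim_history(history, current_gen=25, win=10)
print([g for g, _ in recent])

# One line per replicate and per item of the expression.
n_rep, n_dim = 3, 4
n_lines = n_rep * n_dim
print(n_lines)
```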
simuPOP versions 1.1.6 and earlier support both rpy and matplotlib as the underlying plotting library. However, because of bugs in rpy2 and difficulties in supporting rpy, rpy2 and matplotlib at the same time, rpy/rpy2 support was removed in simuPOP 1.1.7. Please use simuPOP 1.1.6 if you are interested in using rpy2.
Except for the first generation, where no line can be drawn, a figure is drawn after this operator is applied to the last specified replicate (parameter reps can be used to specify a subset of replicates). For example, although linkage disequilibrium values between the first two loci are evaluated and saved at the end of generations 0, 5, 10, ... (step=5), figures are only drawn at generations 40 and 80 (update=40) in Example varplotter. This example also demonstrates the use of parameters saveAs and legend. Given a filename rpy.png for parameter saveAs, this operator saves figures (named rpy_40.png and rpy_80.png) after they are drawn.
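The schedule and file naming described in this paragraph can be reproduced with plain Python. The three helper functions are hypothetical and only mirror the behavior stated in the text (values saved every step generations, figures drawn every update generations except at generation 0, generation number inserted before the file extension).

```python
import os


def evaluated_generations(gen, step):
    """Generations at which the expression is evaluated and saved."""
    return [g for g in range(gen) if g % step == 0]


def drawn_generations(gen, update):
    """Generations at which a figure is drawn; generation 0 is skipped
    because a single point cannot form a line."""
    return [g for g in range(1, gen) if g % update == 0]


def figure_name(save_as, g):
    """Insert the generation number before the file extension."""
    root, ext = os.path.splitext(save_as)
    return '%s_%d%s' % (root, g, ext)


print(evaluated_generations(100, step=5)[:4])
drawn = drawn_generations(100, update=40)
print(drawn)
print([figure_name('rpy.png', g) for g in drawn])
```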
Example: Use rpy or matplotlib to plot an expression
>>> import simuPOP as sim
>>> from simuPOP.plotter import VarPlotter
>>> pop = sim.Population(size=1000, loci=2)
>>> simu = sim.Simulator(pop, rep=3)
>>> simu.evolve(
...     initOps=[
...         sim.InitSex(),
...         sim.InitGenotype(genotype=[1, 2, 2, 1])
...     ],
...     matingScheme=sim.RandomMating(ops=sim.Recombinator(rates=0.01)),
...     postOps=[
...         sim.Stat(LD=[0, 1]),
...         VarPlotter('LD[0][1]', step=5, update=40, saveAs='log/varplot.png',
...             legend=['Replicate %d' % x for x in range(3)],
...             set_ylabel_ylabel='LD between marker 1 and 2',
...             set_title_label='LD decay',
...             set_ylim_bottom=0, set_ylim_top=0.25,
...             plot_linestyle_rep=['-', ':', '-.'],
...         ),
...     ],
...     gen=100
... )
(100L, 100L, 100L)
Parameters after legend (xlab, ylab, ylim, main, ...) deserve more attention here. These parameters are derived keyword arguments because they are not defined by VarPlotter. Parameters without a prefix are passed directly to the R functions plot and lines. They can be used to customize line type (lty), color (col), title (main), limits of x and y axes (xlim and ylim) and many other graphical features (see the R manual for details). If multiple lines are drawn, a list of values can be applied to these lines if you add _rep (for each replicate) or _dim (for each item of a sequence) after the name of the parameter. For example, lty_rep=[1, 2, 3] is used in Example varplotter to pass parameters lty=1, lty=2 and lty=3 to lines for three replicates; the matplotlib-based listing above does the same with plot_linestyle_rep=['-', ':', '-.']. Suffix _repdim can also be used to specify values for every replicate and dimension. Figure fig_rpy displays rpy_80.png, which is saved at generation 80 for this example.
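How a list of values might be distributed over lines can be sketched as below. The function pick_value is hypothetical, and the replicate-major ordering assumed for suffix _repdim is a guess made for illustration; the source does not state the ordering simuPOP uses.

```python
def pick_value(values, suffix, rep, dim, n_dim):
    """Return the value that applies to the line of replicate `rep`, item `dim`."""
    if suffix == 'rep':
        return values[rep]
    if suffix == 'dim':
        return values[dim]
    if suffix == 'repdim':
        # Assumed ordering: all items of replicate 0, then replicate 1, ...
        return values[rep * n_dim + dim]
    # No suffix: the same value is used for every line.
    return values


# lty_rep=[1, 2, 3]: every line of replicate 1 gets lty=2, whatever its dimension.
print(pick_value([1, 2, 3], 'rep', rep=1, dim=0, n_dim=4))
print(pick_value([1, 2, 3], 'rep', rep=1, dim=3, n_dim=4))
```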
Figure: rpy_80.png saved at generation 80 for Example
If the expression is multidimensional, the number of lines can be large and it is often desirable to separate these lines into subplots. This can be done with parameters byRep or byDim. The former plots lines replicate by replicate and the latter does it dimension by dimension. For example, Examples varPlotByRep and varPlotByDim both have three replicates, and the expression returns allele frequencies for four loci. The total number of lines is therefore 12. In Example varPlotByRep, these lines are separated into three subplots, replicate by replicate, with different titles (parameter main_rep). In each subplot, allele frequency trajectories (histories) for different loci are plotted in different colors (parameter col_dim). The last saved figure (rpy_byRep_90.png) is displayed in Figure fig_rpyByRep. In Example varPlotByDim, these lines are separated into four subplots, locus by locus, with different titles (parameter main_dim). In each subplot, allele frequency trajectories (histories) for different replicates are plotted in different colors (parameter col_rep) and line types (parameter lty_rep). The last saved figure (rpy_byDim_90.png) is displayed in Figure fig_rpyByDim.
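The way the 12 lines are distributed over subplots can be sketched with a small grouping function. The function group_lines is hypothetical; only the parameter names byRep and byDim come from the text.

```python
def group_lines(n_rep, n_dim, byRep=False, byDim=False):
    """Group (replicate, dimension) lines into subplots.

    Returns a dictionary that maps a subplot index to the lines drawn in it.
    """
    subplots = {}
    for rep in range(n_rep):
        for dim in range(n_dim):
            if byRep:
                key = rep      # one subplot per replicate
            elif byDim:
                key = dim      # one subplot per item of the expression
            else:
                key = 0        # everything in a single plot
            subplots.setdefault(key, []).append((rep, dim))
    return subplots


by_rep = group_lines(3, 4, byRep=True)
by_dim = group_lines(3, 4, byDim=True)
print(len(by_rep), [len(lines) for lines in by_rep.values()])
print(len(by_dim), [len(lines) for lines in by_dim.values()])
```

With three replicates and four loci, byRep gives three subplots of four lines each and byDim gives four subplots of three lines each.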
Example: Separate figures by replicate
>>> import simuPOP as sim
>>> from simuPOP.plotter import VarPlotter
>>> pop = sim.Population(size=1000, loci=1*4)
>>> simu = sim.Simulator(pop, rep=3)
>>> simu.evolve(
...     initOps=[sim.InitSex()] +
...         [sim.InitGenotype(freq=[0.1*(x+1), 1-0.1*(x+1)], loci=x) for x in range(4)],
...     matingScheme=sim.RandomMating(),
...     postOps=[
...         sim.Stat(alleleFreq=range(4)),
...         VarPlotter('[alleleFreq[x][0] for x in range(4)]', byRep=True,
...             update=10, saveAs='log/varplot_byRep.png',
...             figure_figsize=(10, 8),
...             legend=['Locus %d' % x for x in range(4)],
...             set_ylabel_ylabel='Allele frequency',
...             set_ylim_bottom=0, set_ylim_top=1,
...             set_title_label_rep=['Genetic drift, replicate %d' % x for x in range(3)],
...         ),
...     ],
...     gen=100
... )
(100L, 100L, 100L)
Figure: Allele frequency trajectories separated by replicates
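Example varPlotByDim also demonstrates some advanced features of this plotter that allow further customization of the figures. More specifically, it passes a function (mat_drawFrame) to parameter plotHook, which adds a grid and a text label (A, B, C or D, chosen by the dimension index dim) to each subplot.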
Example: Separate figures by Dimension
>>> import simuPOP as sim
>>> from simuPOP.plotter import VarPlotter
>>> pop = sim.Population(size=1000, loci=1*4)
>>> simu = sim.Simulator(pop, rep=3)
>>> def rpy_drawFrame(r, dim=None, **kwargs):
...     '''Draw a frame around subplot dim. Parameter r is defined in the rpy
...     module and is used for calling R functions. Parameter dim is the dimension
...     index. Other parameters are ignored.
...     '''
...     r.axis(1)
...     r.axis(2)
...     r.grid()
...     r.mtext({0:'A', 1:'B', 2:'C', 3:'D'}[dim], adj=1)
...
>>> def mat_drawFrame(ax, dim=None, **kwargs):
...     '''Draw a grid and a label in subplot dim. Parameter ax is the axes
...     object of the subplot, whose grid and text methods are called below.
...     Parameter dim is the dimension index. Other parameters are ignored.
...     '''
...     ax.grid()
...     ax.text(0.5, 0.8, {0:'A', 1:'B', 2:'C', 3:'D'}[dim])
...
>>> simu.evolve(
...     initOps=[sim.InitSex()] +
...         [sim.InitGenotype(freq=[0.1*(x+1), 1-0.1*(x+1)], loci=x) for x in range(4)],
...     matingScheme=sim.RandomMating(),
...     postOps=[
...         sim.Stat(alleleFreq=range(4)),
...         VarPlotter('[alleleFreq[x][0] for x in range(4)]', byDim=True,
...             update=10, saveAs='log/varplot_byDim.png',
...             legend=['Replicate %d' % x for x in range(3)],
...             set_ylabel_ylabel='Allele frequency',
...             set_ylim_bottom=0, set_ylim_top=1,
...             set_title_label_dim=['Genetic drift, freq=%.1f' % ((x+1)*0.10) for x in range(4)],
...             plot_c_rep=['red', 'blue', 'black'],
...             plot_linestyle_rep=['-', '-.', ':'],
...             figure_figsize=(10, 8),
...             plotHook=mat_drawFrame,
...         ),
...     ],
...     gen=100
... )
/Users/bpeng1/bin/anaconda/lib/python2.7/site-packages/matplotlib/axes/_subplots.py:69: MatplotlibDeprecationWarning: The use of 0 (which ends up being the _last_ sub-plot) is deprecated in 1.4 and will raise an error in 1.5
mplDeprecation)
(100L, 100L, 100L)
Figure: Allele frequency trajectories separated by loci
Operator plotter.ScatterPlotter plots individuals in all or selected (virtual) subpopulations in a 2-D plot, using values at two information fields as their x- and y-coordinates. In its simplest form,
ScatterPlotter(infoFields=['x', 'y'])
will plot all individuals according to their values of information fields x and y. Additional parameters such as pch, col, and cex can be used to control the shape, color and size of the points.
What makes this operator useful is its ability to differentiate points (individuals) by (virtual) subpopulations (VSPs). If a list of VSPs is given, points representing individuals from these VSPs will be plotted with different colors and shapes. Because simulations that keep track of multiple information fields are usually complicated, let us simulate something interesting and examine Example ScatterPlotter in detail.
At the beginning of this example, all individuals are scattered randomly with x and y being their physical locations. We use anc to record individual ancestry and assign 0 and 1 each to half of the population. During evolution,
Offspring ancestry values are the average of their parents.
Offspring with higher ancestry values tend to move to the right. More specifically, the location of an offspring will be
$$(x, y) = \left(\frac{x_1 + x_2}{2} + N\left(\frac{a_1 + a_2}{2} - 0.5,\ 0.1\right),\ \frac{y_1 + y_2}{2} + N(0,\ 0.1)\right)$$
where $(x_1, y_1)$ and $(x_2, y_2)$ are locations of parents, $a_1$ and $a_2$ are ancestry values of the parents, and $N(\mu, \sigma)$ is a random number drawn from a normal distribution with mean $\mu$ and standard deviation $\sigma$.
A ScatterPlotter is used to plot the physical locations of all individuals. Individual ancestries are divided into five regions (with boundaries 0, 0.2, 0.4, 0.6, 0.8 and 1), indicated by points of increasing size. Male and female individuals are plotted with different symbols. This script uses the following techniques:
VSPs 0 and 4 appear at the beginning of generation 0, VSP 2 appears at the end of generation 0, and VSPs 1 and 3 appear at the end of generation 1. Figure fig_ScatterPlotter displays a figure at the beginning of generation 2.
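The order in which the VSPs appear follows from the averaging rule and the cutoff values. The sketch below checks it without simuPOP: vsp_index and next_generation are hypothetical helpers, and the half-open intervals (for example 0.2 <= anc < 0.4) are taken from the legend of the example.

```python
from bisect import bisect_right

CUTOFF = [0.2, 0.4, 0.6, 0.8]   # same cutoff values as in the example


def vsp_index(anc):
    """Index of the interval that contains anc (lower bounds are inclusive)."""
    return bisect_right(CUTOFF, anc)


def next_generation(values):
    """All ancestry values that can arise by averaging two parental values."""
    return sorted({(a + b) / 2 for a in values for b in values})


gen0_start = [0.0, 1.0]                  # anc is initialized to 0 or 1
gen0_end = next_generation(gen0_start)   # after the first round of mating
gen1_end = next_generation(gen0_end)     # after the second round of mating

for label, values in [('start of generation 0', gen0_start),
                      ('end of generation 0', gen0_end),
                      ('end of generation 1', gen1_end)]:
    print(label, sorted({vsp_index(a) for a in values}))
```

Later generations produce further intermediate values, but these fall into the same five VSPs.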
Example: Use ScatterPlotter to plot ancestry of individuals with geographic information.
>>> import simuPOP as sim
>>> from simuPOP.plotter import ScatterPlotter
>>> import random
>>> pop = sim.Population([500], infoFields=['x', 'y', 'anc'])
>>> # Defines VSP 0, 1, 2, 3, 4 by anc.
>>> pop.setVirtualSplitter(sim.InfoSplitter('anc', cutoff=[0.2, 0.4, 0.6, 0.8]))
>>> #
>>> def passInfo(x, y, anc):
...     'Parental fields will be passed as tuples'
...     off_anc = (anc[0] + anc[1])/2.
...     off_x = (x[0] + x[1])/2 + random.normalvariate(off_anc - 0.5, 0.1)
...     off_y = (y[0] + y[1])/2 + random.normalvariate(0, 0.1)
...     return off_x, off_y, off_anc
...
>>> pop.evolve(
...     initOps=[
...         sim.InitSex(),
...         # random geographic location
...         sim.InitInfo(random.random, infoFields=['x', 'y']),
...         # anc is 0 or 1
...         sim.InitInfo(lambda : random.randint(0, 1), infoFields='anc')
...     ],
...     matingScheme=sim.RandomMating(ops=[
...         sim.MendelianGenoTransmitter(),
...         sim.PyTagger(passInfo)]),
...     postOps=[
...         ScatterPlotter(['x', 'y'],
...             saveAs='log/ScatterPlotter.png',
...             subPops=[(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)],
...             set_ylim_bottom=0, set_ylim_top=1.2,
...             set_title_label="!'Ancestry distribution of individuals at generation %d' % gen",
...             legend=['anc < 0.2', '0.2 <= anc < 0.4', '0.4 <= anc < 0.6',
...                 '0.6 <= anc < 0.8', '0.8 <= anc'],
...         ),
...     ],
...     gen=5,
... )
5L
Figure: Plot of individuals with ancestry marked by different colors