Module simuPOP.plotter

The simuPOP.plotter module defines a few utility functions and Python operators that help you plot variables and information fields during evolution. The following operators are defined:

  • Operator plotter.VarPlotter: Plot a dynamically evaluated expression with its history. Each expression and its history form a line in the plot. Multiple lines will be plotted for multiple replicates and/or for each element of the expression (if the evaluated value of the expression is a sequence), with options to separate lines to different subplots.
  • Operator plotter.ScatterPlotter: Plot individuals in specified (virtual) subpopulations using values at two information fields as x and y axes. Individuals belonging to different (virtual) subpopulations will be plotted with different colors and shapes.
  • Operator plotter.InfoPlotter: Use an R function such as hist or qqnorm to plot one or more information fields of individuals in one or more (virtual) subpopulations. Two specialized operators, plotter.HistPlotter and plotter.QQPlotter, are provided to plot histograms and QQ plots. Other functions can also be used, and it is even possible to draw a figure entirely on your own (with stratified data provided to you by this operator).
  • Operator plotter.BoxPlotter: This operator uses the R function boxplot to draw boxplots of one or more information fields of individuals in one or more (virtual) subpopulations. The whiskers can be grouped by information field or by subpopulation.

These operators are derived from class PyOperator and call R plot functions when they are applied to a population. For example, operator plotter.VarPlotter collects expression values and uses functions plot and lines to plot the data, with help from other functions such as par (device property), dev.print (save figure to files) and legend (add legend). Some functions are called multiple times, once for each replicate, subpopulation or information field.

Derived keyword arguments *

One of the most interesting features of this module is its use of derived keyword parameters to send arbitrary parameters to the underlying R functions, which usually accept a large number of parameters to customize every aspect of a figure. A derived keyword argument is an argument that is prefixed with a function name and/or suffixed by an iterator name. The former specifies the underlying R function to which this parameter will be passed; the latter allows users to specify a list of values that will be passed, for example, to lines representing different replicates. For example, parameter par_mar=[1]*4 will pass mar=[1]*4 to R function par, and lty_rep=[1, 2, 3] will pass lty=1, lty=2 and lty=3 to different replicates. A class usually has one or two default functions (such as plot and lines) to which keyword arguments without a function prefix will be sent.
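The naming convention can be illustrated with a small, self-contained sketch. The helper split_derived_kwarg and the two lists of known prefixes and suffixes below are hypothetical and are not part of simuPOP; the sketch only assumes that the function prefix and the iterator suffix are joined to the parameter name with underscores.

```python
# Hypothetical helper, not simuPOP's implementation: split a derived keyword
# name into (function prefix, parameter name, iterator suffix).
KNOWN_FUNCS = ('par', 'plot', 'lines', 'legend')   # assumed set of prefixes
KNOWN_SUFFIXES = ('rep', 'dim', 'repdim', 'sp')    # assumed set of suffixes


def split_derived_kwarg(name):
    """Return (func, param, suffix); func and suffix are None if absent."""
    func = suffix = None
    for f in KNOWN_FUNCS:
        if name.startswith(f + '_'):
            func, name = f, name[len(f) + 1:]
            break
    for s in KNOWN_SUFFIXES:
        if name.endswith('_' + s):
            suffix, name = s, name[:-(len(s) + 1)]
            break
    return func, name, suffix


def value_for(value, suffix, index):
    """Pick the value for one line: the index-th item if a suffix is given."""
    return value[index] if suffix else value


print(split_derived_kwarg('par_mar'))   # ('par', 'mar', None)
print(split_derived_kwarg('lty_rep'))   # (None, 'lty', 'rep')
print(value_for([1, 2, 3], 'rep', 1))   # 2
```

Under these assumptions, par_mar is routed to function par as mar, while lty_rep is routed to the default functions as lty, with one value per replicate.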

In addition, the values of these keyword arguments can vary during evolution. More specifically, if the value is a string with a leading exclamation mark (!), the rest of the string is treated as an expression. This expression is evaluated against the current population during evolution and its return value becomes the value of the parameter at that generation. For example, keyword parameter main="!'Allele frequency at generation %d' % gen" will become main='Allele frequency at generation 10' at generation 10.
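The evaluation rule can be sketched in a few lines of plain Python. The function resolve_value and the dictionary pop_vars are hypothetical stand-ins; in simuPOP the expression is evaluated against the population's own variables, which this sketch replaces with an ordinary dictionary.

```python
def resolve_value(value, namespace):
    """Evaluate '!'-prefixed strings against namespace; return others as-is."""
    if isinstance(value, str) and value.startswith('!'):
        return eval(value[1:], {}, namespace)
    return value


# Stand-in for a population's variables at generation 10 (assumed content).
pop_vars = {'gen': 10}

title = resolve_value("!'Allele frequency at generation %d' % gen", pop_vars)
print(title)                                   # Allele frequency at generation 10
print(resolve_value('Fixed title', pop_vars))  # Fixed title
```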

Plot of expressions and their histories (operator plotter.VarPlotter)

Class plotter.VarPlotter plots the current and historical values of a Python expression (expr), which are evaluated (against each population’s local namespace) and saved during evolution. The return value of the expression can be a number or a sequence, but should have the same type and length across all replicates and generations. The history of each value (or each item in the returned sequence) of each replicate forms a line, with generation numbers on the x-axis. The number of lines is the number of replicates multiplied by the dimension of the expression. Although complete histories are usually saved, you can use parameter win to save histories only within the last win generations.
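The bookkeeping described above can be sketched as follows. The class ExprHistory is hypothetical and is not how VarPlotter stores its data; in particular, the exact boundary of the win window (here, points less than win generations old are kept) is an assumption.

```python
from collections import deque


class ExprHistory:
    """Keep one (generation, value) series per (replicate, dimension) pair."""

    def __init__(self, win=None):
        self.win = win      # None keeps the complete history
        self.lines = {}     # (rep, dim) -> deque of (gen, value)

    def record(self, gen, rep, value):
        # A scalar is treated as a sequence of length one.
        items = value if isinstance(value, (list, tuple)) else [value]
        for dim, item in enumerate(items):
            line = self.lines.setdefault((rep, dim), deque())
            line.append((gen, item))
            # Drop points that fall outside the last `win` generations.
            while self.win is not None and gen - line[0][0] >= self.win:
                line.popleft()


hist = ExprHistory(win=20)
for gen in range(0, 50, 5):
    for rep in range(3):
        hist.record(gen, rep, [0.1 * rep, 0.2 * rep])

print(len(hist.lines))                      # 6 lines: 3 replicates x 2 items
print([g for g, _ in hist.lines[(0, 0)]])   # generations kept for one line
```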

simuPOP 1.1.6 and earlier versions support both rpy and matplotlib as the underlying plotting library. However, because of bugs in rpy2 and the difficulty of supporting rpy, rpy2 and matplotlib together, rpy/rpy2 support was removed in simuPOP 1.1.7. Please use simuPOP 1.1.6 if you are interested in using rpy2.

Except for the first generation, where no line can be drawn, a figure is drawn after this operator is applied to the last specified replicate (parameter reps can be used to specify a subset of replicates). For example, although linkage disequilibrium values between the first two loci are evaluated and saved at the end of generations 0, 5, 10, ... (step=5), figures are only drawn at generations 40 and 80 (update=40) in Example varplotter. This example also demonstrates the use of parameters saveAs and legend. Given a filename rpy.png for parameter saveAs, this operator saves figures (named rpy_40.png and rpy_80.png) after they are drawn.
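The schedule described above can be worked out with a short sketch. The functions plot_schedule and numbered_name are hypothetical; the rules they encode (evaluate when the generation is a multiple of step, draw when it is a positive multiple of update, insert the generation number before the file extension) are inferred from the description, not taken from the VarPlotter source.

```python
import os


def plot_schedule(gen, step, update):
    """Return (generations evaluated, generations at which a figure is drawn)."""
    evaluated = [g for g in range(gen) if g % step == 0]
    # No line can be drawn at the first generation, so generation 0 is skipped.
    drawn = [g for g in evaluated if g > 0 and g % update == 0]
    return evaluated, drawn


def numbered_name(save_as, gen):
    """Insert the generation number before the file extension."""
    root, ext = os.path.splitext(save_as)
    return '%s_%d%s' % (root, gen, ext)


evaluated, drawn = plot_schedule(gen=100, step=5, update=40)
print(evaluated[:4])                                  # [0, 5, 10, 15]
print(drawn)                                          # [40, 80]
print([numbered_name('rpy.png', g) for g in drawn])   # ['rpy_40.png', 'rpy_80.png']
```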

Example: Use rpy or matplotlib to plot an expression

>>> import simuPOP as sim
>>> from simuPOP.plotter import VarPlotter
>>> pop = sim.Population(size=1000, loci=2)
>>> simu = sim.Simulator(pop, rep=3)
>>> simu.evolve(
...     initOps=[
...         sim.InitSex(),
...         sim.InitGenotype(genotype=[1, 2, 2, 1])
...     ],
...     matingScheme=sim.RandomMating(ops=sim.Recombinator(rates=0.01)),
...     postOps=[
...         sim.Stat(LD=[0, 1]),
...         #
...         VarPlotter('LD[0][1]', step=5, update=40, saveAs='log/varplot.png',
...             legend=['Replicate %d' % x for x in range(3)],
...             set_ylabel_ylabel='LD between marker 1 and 2',
...             set_title_label='LD decay',
...             set_ylim_bottom=0, set_ylim_top=0.25,
...             plot_linestyle_rep=['-', ':', '-.'],
...         ),
...     ],
...     gen=100
... )
(100L, 100L, 100L)

Download VarPlotter.py

Parameters after legend (xlab, ylab, ylim, main, ...) deserve more attention here. These parameters are derived keyword arguments because they are not defined by VarPlotter. Parameters without a prefix are passed directly to the R functions plot and lines. They can be used to customize line type (lty), color (col), title (main), limits of the x and y axes (xlim and ylim) and many other graphical features (see the R manual for details). If multiple lines are drawn, a list of values can be applied to these lines if you add _rep (for each replicate) or _dim (for each item of a sequence) after the name of the parameter. For example, lty_rep=[1, 2, 3] is used in Example varplotter to pass parameters lty=1, lty=2 and lty=3 to lines for three replicates. Suffix _repdim can also be used to specify values for every replicate and dimension. Figure fig_rpy displays rpy_80.png, which is saved at generation 80 for this example.

Figure: rpy_80.png saved at generation 80 for Example varplotter

../_images/varplot_807.png

If the expression is multidimensional, the number of lines can be large and it is often desirable to separate these lines into subplots. This can be done with parameters byRep or byDim. The former plots lines replicate by replicate and the latter does it dimension by dimension. For example, Examples varPlotByRep and varPlotByDim both have three replicates and an expression that returns allele frequencies for four loci. The total number of lines is therefore 12. In Example varPlotByRep, these lines are separated into three subplots, replicate by replicate, with different titles (parameter main_rep). In each subplot, allele frequency trajectories (histories) for different loci are plotted in different colors (parameter col_dim). The last saved figure (rpy_byRep_90.png) is displayed in Figure fig_rpyByRep. In Example varPlotByDim, these lines are separated into four subplots, locus by locus, with different titles (parameter main_dim). In each subplot, allele frequency trajectories (histories) for different replicates are plotted in different colors (parameter col_rep) and line types (parameter lty_rep). The last saved figure (rpy_byDim_90.png) is displayed in Figure fig_rpyByDim.
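The effect of byRep and byDim on the layout can be sketched by grouping (replicate, dimension) pairs into subplots. The function group_lines is a hypothetical illustration of the grouping only; it draws nothing and does not reproduce VarPlotter's internal layout logic.

```python
def group_lines(n_rep, n_dim, by_rep=False, by_dim=False):
    """Return a dict mapping subplot index -> list of (rep, dim) lines."""
    subplots = {}
    for rep in range(n_rep):
        for dim in range(n_dim):
            if by_rep:
                key = rep       # one subplot per replicate
            elif by_dim:
                key = dim       # one subplot per item of the expression
            else:
                key = 0         # all lines share a single plot
            subplots.setdefault(key, []).append((rep, dim))
    return subplots


# Three replicates, allele frequencies at four loci: 12 lines in total.
by_rep = group_lines(3, 4, by_rep=True)
by_dim = group_lines(3, 4, by_dim=True)
print(len(by_rep), len(by_rep[0]))   # 3 4
print(len(by_dim), len(by_dim[0]))   # 4 3
```

With byRep the lines inside a subplot differ by locus, which is why a _dim suffix is the natural way to style them; with byDim they differ by replicate, so a _rep suffix applies.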

Example: Separate figures by replicate

>>> import simuPOP as sim
>>> from simuPOP.plotter import VarPlotter
>>> pop = sim.Population(size=1000, loci=1*4)
>>> simu = sim.Simulator(pop, rep=3)
>>> simu.evolve(
...     initOps=[sim.InitSex()] +
...         [sim.InitGenotype(freq=[0.1*(x+1), 1-0.1*(x+1)], loci=x) for x in range(4)],
...     matingScheme=sim.RandomMating(),
...     postOps=[
...         sim.Stat(alleleFreq=range(4)),
...         VarPlotter('[alleleFreq[x][0] for x in range(4)]', byRep=True,
...             update=10, saveAs='log/varplot_byRep.png',
...             figure_figsize=(10, 8),
...             legend=['Locus %d' % x for x in range(4)],
...             set_ylabel_ylabel='Allele frequency',
...             set_ylim_bottom=0, set_ylim_top=1,
...             set_title_label_rep=['Genetic drift, replicate %d' % x for x in range(3)],
...         ),
...     ],
...     gen=100
... )
(100L, 100L, 100L)

Download varPlotByRep.py

Figure: Allele frequency trajectories separated by replicates

../_images/varplot_byRep_908.png

Example varPlotByDim also demonstrates some advanced features of this plotter that allow further customization of the figures. More specifically,

  • Function-specific parameters can be passed to the underlying R function by prefixing function names to parameter names. For example, plot_axes=False is used to pass axes=False to the r.plot function (and not to function lines, which does not accept this parameter).
  • Several hook functions can be defined and passed to parameters preHook, postHook and plotHook, which will be called, respectively, before a figure is drawn, after a figure is drawn, and after each r.plot call. Example varPlotByDim uses a plotHook function to draw axes of the plots and call mtext to add text to the margins.
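The order in which the three hooks fire can be sketched with a stand-in drawing loop. Everything here is hypothetical: draw_figure is not a simuPOP function, and the list named canvas merely records calls in place of a real plotting device. Only the calling order follows the description above.

```python
def draw_figure(canvas, n_dim, preHook=None, plotHook=None, postHook=None):
    """Stand-in drawing loop that only records the order of calls."""
    if preHook is not None:
        preHook(canvas)                    # before the figure is drawn
    for dim in range(n_dim):
        canvas.append('plot %d' % dim)     # stands in for one plot call
        if plotHook is not None:
            plotHook(canvas, dim=dim)      # after each plot call
    if postHook is not None:
        postHook(canvas)                   # after the figure is drawn
    return canvas


def label_subplot(canvas, dim=None, **kwargs):
    """Example plotHook: label subplot dim; other parameters are ignored."""
    canvas.append('label %s' % 'ABCD'[dim])


log = draw_figure([], 2,
                  preHook=lambda c: c.append('pre'),
                  plotHook=label_subplot,
                  postHook=lambda c: c.append('post'))
print(log)
# ['pre', 'plot 0', 'label A', 'plot 1', 'label B', 'post']
```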

Example: Separate figures by dimension

>>> import simuPOP as sim
>>> from simuPOP.plotter import VarPlotter
>>> pop = sim.Population(size=1000, loci=1*4)
>>> simu = sim.Simulator(pop, rep=3)
>>> def rpy_drawFrame(r, dim=None, **kwargs):
...     '''Draw a frame around subplot dim. Parameter r is defined in the rpy
...     module and is used for calling R functions. Parameter dim is the dimension
...     index. Other parameters are ignored.
...     '''
...     r.axis(1)
...     r.axis(2)
...     r.grid()
...     r.mtext({0:'A', 1:'B', 2:'C', 3:'D'}[dim], adj=1)
...
>>> def mat_drawFrame(ax, dim=None, **kwargs):
...     '''Draw a frame around subplot dim. Parameter ax is the matplotlib axes
...     object of the subplot. Parameter dim is the dimension index. Other
...     parameters are ignored.
...     '''
...     ax.grid()
...     ax.text(0.5, 0.8, {0:'A', 1:'B', 2:'C', 3:'D'}[dim])
...
>>> simu.evolve(
...     initOps=[sim.InitSex()]+
...         [sim.InitGenotype(freq=[0.1*(x+1), 1-0.1*(x+1)], loci=x) for x in range(4)],
...     matingScheme=sim.RandomMating(),
...     postOps=[
...         sim.Stat(alleleFreq=range(4)),
...         VarPlotter('[alleleFreq[x][0] for x in range(4)]', byDim=True,
...             update=10, saveAs='log/varplot_byDim.png',
...             legend=['Replicate %d' % x for x in range(3)],
...             set_ylabel_ylabel='Allele frequency',
...             set_ylim_bottom=0, set_ylim_top=1,
...             set_title_label_dim=['Genetic drift, freq=%.1f' % ((x+1)*0.10) for x in range(4)],
...             plot_c_rep=['red', 'blue', 'black'],
...             plot_linestyle_rep=['-', '-.', ':'],
...             figure_figsize=(10,8),
...             plotHook = mat_drawFrame,
...         ),
...     ],
...     gen=100
... )
(100L, 100L, 100L)

Download varPlotByDim.py

Figure: Allele frequency trajectories separated by loci

../_images/varplot_byDim_908.png

Scatter plots (operator plotter.ScatterPlotter)

Operator plotter.ScatterPlotter plots individuals in all or selected (virtual) subpopulations in a 2-D plot, using values at two information fields as their x- and y-coordinates. In its simplest form,

ScatterPlotter(infoFields=['x', 'y'])

will plot all individuals according to their values of information fields x and y. Additional parameters such as pch, col, and cex can be used to control the shape, color and size of the points.

What makes this operator useful is its ability to differentiate points (individuals) by (virtual) subpopulations (VSPs). If a list of VSPs is given, points representing individuals from these VSPs will be plotted with different colors and shapes. Because simulations that keep track of multiple information fields are usually complicated, let us simulate something interesting and examine Example ScatterPlotter in detail.

At the beginning of this example, all individuals are scattered randomly, with x and y being their physical locations. We use anc to record individual ancestry and assign 0 and 1 each to half of the population. During evolution,

  • Offspring ancestry values are the average of their parents.

  • Offspring with higher ancestry values tend to move to the right. More specifically, the location of an offspring will be

    \frac{\left(x_{1}+x_{2}\right)}{2}+N\left(\frac{a_{1}+a_{2}}{2}-0.5,0.1\right),\frac{\left(y_{1}+y_{2}\right)}{2}+N\left(0,0.1\right)

    where \left(x_{1},y_{1}\right) and \left(x_{2},y_{2}\right) are the locations of the parents, a_{1} and a_{2} are the ancestry values of the parents, and N\left(a,b\right) is a random number drawn from a normal distribution with mean a and standard deviation b.

A ScatterPlotter is used to plot the physical locations of all individuals. Individual ancestries are divided into five regions (with boundaries 0, 0.2, 0.4, 0.6, 0.8, 1), indicated by points of increasing size. Male and female individuals are plotted with different symbols. This script uses the following techniques:

  • Set individual information fields randomly using InitInfo.
  • Define virtual subpopulations using an InfoSplitter.
  • Use PyTagger to calculate offspring information fields from parental fields.
  • Mark individuals in different VSPs using parameters col_sp and cex_sp.
  • Use plot_axes=False and par_mar=[0, 0, 2, 0] to pass parameters axes=False and mar=[0, 0, 2, 0] to functions plot and par respectively.

VSPs 0 and 4 appear at the beginning of generation 0, VSP 2 appears at the end of generation 0, and VSPs 1 and 3 appear at the end of generation 1. Figure fig_ScatterPlotter displays a figure at the beginning of generation 2.
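The order in which the VSPs appear follows from the averaging rule for ancestry, as the following sketch shows. The function vsp_index is hypothetical and does not call simuPOP; it assumes that cutoff values 0.2, 0.4, 0.6 and 0.8 define half-open intervals such as 0.2 <= anc < 0.4, which matches the legend used in the example.

```python
from bisect import bisect_right

CUTOFF = [0.2, 0.4, 0.6, 0.8]


def vsp_index(anc, cutoff=CUTOFF):
    """Map an ancestry value to a VSP index, assuming [lower, upper) intervals."""
    return bisect_right(cutoff, anc)


# Ancestry values present in the parental generation: initially 0 or 1.
anc_values = {0.0, 1.0}
print(sorted(vsp_index(a) for a in anc_values))   # [0, 4]

for gen in range(2):
    # Offspring ancestry is the average of the two parental values.
    offspring = {(a + b) / 2 for a in anc_values for b in anc_values}
    new_vsps = sorted({vsp_index(a) for a in offspring} -
                      {vsp_index(a) for a in anc_values})
    print(gen, new_vsps)   # generation 0 adds [2], generation 1 adds [1, 3]
    anc_values = offspring
```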

Example: Use ScatterPlotter to plot ancestry of individuals with geographic information.

>>> import simuPOP as sim
>>> from simuPOP.plotter import ScatterPlotter
>>> import random
>>> pop = sim.Population([500], infoFields=['x', 'y', 'anc'])
>>> # Defines VSP 0, 1, 2, 3, 4 by anc.
>>> pop.setVirtualSplitter(sim.InfoSplitter('anc', cutoff=[0.2, 0.4, 0.6, 0.8]))
>>> #
>>> def passInfo(x, y, anc):
...     'Parental fields will be passed as tuples'
...     off_anc = (anc[0] + anc[1])/2.
...     off_x = (x[0] + x[1])/2 + random.normalvariate(off_anc - 0.5, 0.1)
...     off_y = (y[0] + y[1])/2 + random.normalvariate(0, 0.1)
...     return off_x, off_y, off_anc
...
>>> pop.evolve(
...     initOps=[
...         sim.InitSex(),
...         # random geographic location
...         sim.InitInfo(random.random, infoFields=['x', 'y']),
...         # anc is 0 or 1
...         sim.InitInfo(lambda : random.randint(0, 1), infoFields='anc')
...     ],
...     matingScheme=sim.RandomMating(ops=[
...         sim.MendelianGenoTransmitter(),
...         sim.PyTagger(passInfo)]),
...     postOps=[
...         ScatterPlotter(['x', 'y'],
...             saveAs = 'log/ScatterPlotter.png',
...             subPops = [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)],
...             set_ylim_bottom = 0, set_ylim_top=1.2,
...             set_title_label = "!'Ancestry distribution of individuals at generation %d' % gen",
...             legend = ['anc < 0.2', '0.2 <= anc < 0.4', '0.4 <= anc < 0.6',
...                 '0.6 <= anc < 0.8', '0.8 <= anc'],
...         ),
...
...     ],
...     gen = 5,
... )
5L

Download ScatterPlotter.py

Figure: Plot of individuals with ancestry marked by different colors

../_images/ScatterPlotter_2_08.png