A number of during-mating operators are defined to transmit genotype from parent(s) to offspring. They are rarely used or even seen directly because they are used as genotype transmitters of mating schemes.
The generic genotype transmitters do not handle genetic recombination. A genotype transmitter Recombinator is provided for such purposes, and can be used with RandomMating and SelfMating (replace MendelianGenoTransmitter and SelfingGenoTransmitter used in these mating schemes).
Recombination rate is implemented between adjacent markers. There can be only one recombination event between adjacent markers no matter how far apart they are located on a chromosome. In practise, a Recombinator goes along chromosomes and determine, between each adjacent loci, whether or not a recombination happens.
Recombination rates could be specified in the following ways:
If a single recombination rate is specified through paramter rates, it will be the recombination rate between all adjacent loci, regardless of loci position.
If recombination happens only after certain loci, you can specify these loci using parameter loci. For example,
Recombinator(rates=0.1, loci=[2, 5])
recombines a chromosome only after loci 2 (between 2 and 3) and 5 (between 5 and 6).
If parameter loci is given with a list of loci, different recombination rate can be given to each of them. The two lists should have the same length. For example
Recombinator(rates=[0.1, 0.05], loci=[2, 5])
uses two different recombination rates after loci 2 and 5.
If parameter loci is not given (default to loci=ALL_AVAIL) but a list of recombination rates is assigned, the rates will be assigned to each locus. The length of prameter rates should equal to total number of loci but the recombiantion rates for the locus at the end of each chromosome will be ignored (assumed to be 0.5). For example
Recombinator(rates=[0.1]*5 + [0.2]*5)
uses two different recombination rates for two chromosomes with 5 loci.
If recombination rates vary across your chromosomes, a long list of rate and loci may be needed to specify recombination rates one by one. An alternative method is to specify a recombination intensity. Recombination rate between two adjacent loci is calculated as the product of this intensity and distance between them. For example, if you apply operator
Recombinator(intensity=0.1)
to a population
Population(size=100, loci=[4], lociPos=[0.1, 0.2, 0.4, 0.8])
The recombination rates between adjacent markers will be 0.1*0.1, 0.1*0.2 and 0.1*0.4 respectively.
Example: Genetic recombination at all and selected loci
>>> import simuPOP as sim
>>> simu = sim.Simulator(sim.Population(size=[1000], loci=[100]),
... rep=2)
>>> simu.evolve(
... initOps=[
... sim.InitSex(),
... sim.InitGenotype(genotype=[0]*100 + [1]*100)
... ],
... matingScheme=sim.RandomMating(ops = [
... sim.Recombinator(rates=0.01, reps=0),
... sim.Recombinator(rates=[0.01]*10, loci=range(50, 60), reps=1),
... ]),
... postOps=[
... sim.Stat(LD=[[40, 55], [60, 70]]),
... sim.PyEval(r'"%d:\t%.3f\t%.3f\t" % (rep, LD_prime[40][55], LD_prime[60][70])'),
... sim.PyOutput('\n', reps=-1)
... ],
... gen = 5
... )
0: 0.742 0.816 1: 0.898 1.000
0: 0.649 0.750 1: 0.848 1.000
0: 0.598 0.683 1: 0.816 1.000
0: 0.476 0.642 1: 0.794 1.000
0: 0.413 0.597 1: 0.782 1.000
(5L, 5L)
Example recRate demonstrates how to specify recombination rates for all loci or for specified loci. In this example, two replicates of a population are evolved, subject to two different Recombinators. The first Recombinator applies the same recombination rate between all adjacent loci, and the second Recombinator recombines only after loci 50 - 59. Because there is no recombination event between loci 60 and 70 for the second replicate, linkage disequilibrium values between these two loci does not decrease as what happens in the first replicate.
Example: Genetic recombination rates specified by intensity
>>> import simuPOP as sim
>>> pop = sim.Population(size=[1000], loci=3, lociPos=[0, 1, 1.1])
>>> pop.evolve(
... initOps=[
... sim.InitSex(),
... sim.InitGenotype(genotype=[0]*3 + [1]*3)
... ],
... matingScheme=sim.RandomMating(ops=sim.Recombinator(intensity=0.01)),
... postOps=[
... sim.Stat(LD=[[0, 1], [1, 2]]),
... sim.PyEval(r'"%.3f\t%.3f\n" % (LD_prime[0][1], LD_prime[1][2])', step=10)
... ],
... gen = 50
... )
0.988 0.998
0.912 0.996
0.836 0.991
0.896 0.982
0.814 0.991
50L
Example recIntensity demonstrates the use of the intensity parameter. In this example, the distances between the first two loci and the latter two loci are 1 and 0.1 respectively. This leads recombination rates 0.01 and 0.001 respectively with a recombination intensity 0.01. Consequently, LD between the first two loci decay much faster than the latter two.
If more advanced recombination model is desired, a customized genotype transmitter can be used. For example, Example sexSpecificRec uses two Recombinators to implement sex-specific recombination.
Note
Both loci positions and recombination intensity are unitless. You can assume different unit for loci position and recombination intensity as long as the resulting recombination rate makes sense.
simuPOP uses the Holliday junction model to simulate gene conversion. This model treats recombination and conversion as a unified process. The key features of this model is
In practise, gene conversion can be considered as a double recombination event. That is to say, when a recombination event happens, it has certain probability to trigger a second recombination event along the chromosome. The distance between the two locations where recombination events happen is the tract length of this conversion event.
The probability at which gene conversion happens, and how tract length is determined is specify using parameter convMode of a Recombinator. This parameter can be
Note that
Example conversion compares two Recombinators. The first Recombinator is a regular Recombinator that recombine between loci 50 and 51. The second Recombinator is a conversion operator because every recombination event will become a conversion event (prob=1). Because a second recombination event will surely happen between loci 60 and 61, there will be either no or double recombination events between loci 40, 70. LD between these two loci therefore does not decrease, although LD between locus 55 and these two loci will decay.
Example: Gene conversion
>>> import simuPOP as sim
>>> simu = sim.Simulator(sim.Population(size=[1000], loci=[100]),
... rep=2)
>>> simu.evolve(
... initOps=[
... sim.InitSex(),
... sim.InitGenotype(genotype=[0]*100 + [1]*100)
... ],
... matingScheme=sim.RandomMating(ops=[
... sim.Recombinator(rates=0.01, loci=50, reps=0),
... sim.Recombinator(rates=0.01, loci=50, reps=1, convMode=(sim.NUM_MARKERS, 1, 10)),
... ]),
... postOps=[
... sim.Stat(LD=[[40, 55], [40, 70]]),
... sim.PyEval(r'"%d:\t%.3f\t%.3f\t" % (rep, LD_prime[40][55], LD_prime[40][70])'),
... sim.PyOutput('\n', reps=-1)
... ],
... gen = 5
... )
0: 0.980 0.980 1: 0.978 1.000
0: 0.966 0.966 1: 0.976 1.000
0: 0.950 0.950 1: 0.968 1.000
0: 0.922 0.922 1: 0.970 1.000
0: 0.913 0.913 1: 0.969 1.000
(5L, 5L)
To understand the evolutionary history of a simulated population, it is sometimes needed to track down all ancestral recombination events. In order to do that, you will first need to give an unique ID to each individual so that you could make sense of the dumped recombination events. Although this is routinely done using operator IdTagger (see example IdTagger for details), it is a little tricky here because you need to place the during- mating IdTagger before a Recombinator in the ops parameter of a mating scheme so that offspring ID could be set and outputted correctly.
After setting the name of the ID field (usually ind_id) to the infoField parameter of a Recombinator, it can dump a list of recombinatin events (loci after which recombinatin events happened) for each set of homologous chromosomes of an offspring. Each line is in the format of
offspringID parentID startingPloidy rec1 rec2 ....
Example trackRec gives an example how the output looks like.
Example: Tracking all recombination events
>>> import simuPOP as sim
>>> pop = sim.Population(1000, loci=[1000, 2000], infoFields='ind_id')
>>> pop.evolve(
... initOps=[
... sim.InitSex(),
... sim.IdTagger(),
... ],
... matingScheme=sim.RandomMating(ops = [
... sim.IdTagger(),
... sim.Recombinator(rates=0.001, output='>>rec.log', infoFields='ind_id')]),
... gen = 5
... )
5L
>>> rec = open('rec.log')
>>> # print the first three lines of the log file
>>> print(''.join(rec.readlines()[:4]))
1001 642 0 381 999 1490
1001 250 1 908 999 1315 2134
1002 847 1 999
1002 91 0 975 999 1245 2546