QC for AraThal DFE #1344

chriscrsmith · 2022-08-29T21:22:15Z

PR for new model:
#1324

Original paper:
Huber et al., 2018

Parameter values:
I took the gamma parameters from Supplementary Table 4, the genome-wide, additive-only model for A. LYRATA- due to challenges with simulating with selfing for A. thaliana. Not certain this is what the group had in mind, also there might be updates on simulating selfing.

Potential issues:

I used the estimates from A. Lyrata.
I'm only 90% certain there were zero neutral mutations estimated as part of this DFE.

QC'er requests:
Kirk Lohmueller :)

petrelharp · 2023-05-16T16:38:27Z

I'm having a look at this. I am not seeing why we should use the A. lyrata-estimated DFE, when (a) the species listed is A. thaliana, and (b) we have a DFE for A. thaliana? It's true we do need to do something about simulating selfing, but isn't that orthogonal to this?

petrelharp · 2023-05-16T16:44:48Z

Hm, also - the DFE is estimated only from nonsynonymous changes. So, to complement this we need an estimate of what proportion of mutations are synonymous (and we'll put these into the DFE as neutral).

petrelharp · 2023-05-16T16:49:04Z

The A. thaliana data in the paper are from Durvasula et al; but they don't report numbers of synonymous/nonsynonymous mutations either; hmph.

petrelharp · 2023-05-16T16:52:35Z

Hm, furthermore, unless I'm wrong, I think we actually want the percent of new mutations that are synonymous, which will be lower than the percent of segregating mutations that are synonymous.

petrelharp · 2023-05-16T17:40:26Z

Proposal: just use the proportion of possible coding changes that are synonymous, which is 330/1728:

# from https://gist.github.com/juanfal/09d7fb53bd367742127e17284b9c47bf
codontab = {
    'TCA': 'S',    # Serina
    'TCC': 'S',    # Serina
    'TCG': 'S',    # Serina
    'TCT': 'S',    # Serina
    'TTC': 'F',    # Fenilalanina
    'TTT': 'F',    # Fenilalanina
    'TTA': 'L',    # Leucina
    'TTG': 'L',    # Leucina
    'TAC': 'Y',    # Tirosina
    'TAT': 'Y',    # Tirosina
    'TAA': '*',    # Stop
    'TAG': '*',    # Stop
    'TGC': 'C',    # Cisteina
    'TGT': 'C',    # Cisteina
    'TGA': '*',    # Stop
    'TGG': 'W',    # Triptofano
    'CTA': 'L',    # Leucina
    'CTC': 'L',    # Leucina
    'CTG': 'L',    # Leucina
    'CTT': 'L',    # Leucina
    'CCA': 'P',    # Prolina
    'CCC': 'P',    # Prolina
    'CCG': 'P',    # Prolina
    'CCT': 'P',    # Prolina
    'CAC': 'H',    # Histidina
    'CAT': 'H',    # Histidina
    'CAA': 'Q',    # Glutamina
    'CAG': 'Q',    # Glutamina
    'CGA': 'R',    # Arginina
    'CGC': 'R',    # Arginina
    'CGG': 'R',    # Arginina
    'CGT': 'R',    # Arginina
    'ATA': 'I',    # Isoleucina
    'ATC': 'I',    # Isoleucina
    'ATT': 'I',    # Isoleucina
    'ATG': 'M',    # Methionina
    'ACA': 'T',    # Treonina
    'ACC': 'T',    # Treonina
    'ACG': 'T',    # Treonina
    'ACT': 'T',    # Treonina
    'AAC': 'N',    # Asparagina
    'AAT': 'N',    # Asparagina
    'AAA': 'K',    # Lisina
    'AAG': 'K',    # Lisina
    'AGC': 'S',    # Serina
    'AGT': 'S',    # Serina
    'AGA': 'R',    # Arginina
    'AGG': 'R',    # Arginina
    'GTA': 'V',    # Valina
    'GTC': 'V',    # Valina
    'GTG': 'V',    # Valina
    'GTT': 'V',    # Valina
    'GCA': 'A',    # Alanina
    'GCC': 'A',    # Alanina
    'GCG': 'A',    # Alanina
    'GCT': 'A',    # Alanina
    'GAC': 'D',    # Acido Aspartico
    'GAT': 'D',    # Acido Aspartico
    'GAA': 'E',    # Acido Glutamico
    'GAG': 'E',    # Acido Glutamico
    'GGA': 'G',    # Glicina
    'GGC': 'G',    # Glicina
    'GGG': 'G',    # Glicina
    'GGT': 'G'     # Glicina
}

syn = {}
for x in codontab:
  syn[x] = 0
  for k in range(3):
   for nuc in n:
      y = list(x)
      y[k] = nuc
      other = codontab["".join(y)]
      if other == codontab[x]:
          syn[x] += 1

# {'TCA': 6, 'TCC': 6, 'TCG': 6, 'TCT': 6, 'TTC': 4, 'TTT': 4, 'TTA': 5, 'TTG': 5, 'TAC': 4, 'TAT': 4, 'TAA': 5, 'TAG': 4, 'TGC': 4, 'TGT': 4, 'TGA': 4, 'TGG': 3, 'CTA': 7, 'CTC': 6, 'CTG': 7, 'CTT': 6, 'CCA': 6, 'CCC': 6, 'CCG': 6, 'CCT': 6, 'CAC': 4, 'CAT': 4, 'CAA': 4, 'CAG': 4, 'CGA': 7, 'CGC': 6, 'CGG': 7, 'CGT': 6, 'ATA': 5, 'ATC': 5, 'ATT': 5, 'ATG': 3, 'ACA': 6, 'ACC': 6, 'ACG': 6, 'ACT': 6, 'AAC': 4, 'AAT': 4, 'AAA': 4, 'AAG': 4, 'AGC': 4, 'AGT': 4, 'AGA': 5, 'AGG': 5, 'GTA': 6, 'GTC': 6, 'GTG': 6, 'GTT': 6, 'GCA': 6, 'GCC': 6, 'GCG': 6, 'GCT': 6, 'GAC': 4, 'GAT': 4, 'GAA': 4, 'GAG': 4, 'GGA': 6, 'GGC': 6, 'GGG': 6, 'GGT': 6}

prop_syn = sum(syn.values()) / ((3 ** 3) * (4 ** 3))
sum(syn.values()), ((3 ** 3) * (4 ** 3)), prop_syn
# (330, 1728, 0.1909722222222222)

chriscrsmith · 2023-05-17T22:33:04Z

This is sounding good to me

petrelharp · 2023-05-23T03:37:09Z

On slack, @ckyriazis says:

In the Huber et al 2018 paper, I believe they actually assume the same DFE for A. thaliana and A. lyrata. So it shouldnt make a difference

In that paper, they also assume a ratio of nonsynonymous to synonymous sites of 2.31:1, as estimated by Huber et al 2017

petrelharp · 2023-05-23T15:54:18Z

Well, I guess we should have read the paper more carefully. Indeed:

petrelharp · 2023-05-23T16:44:22Z

Conclusion from the meeting now: implement the "genome-wide both species h-s relationship", following how @ckyriazis discretized this (@klohmueller to provide a relference).

chriscrsmith added the Model QC Quality control process for model addition label Aug 29, 2022

chriscrsmith mentioned this issue Aug 29, 2022

adding DFE for Arabidopsis #1324

Merged

chriscrsmith mentioned this issue Sep 6, 2022

AraTha DFE #1321

Closed

nspope added this to the Version 0.2.0 milestone Sep 20, 2022

petrelharp added a commit to petrelharp/stdpopsim that referenced this issue May 16, 2023

AraTha DFE QC, closes popsim-consortium#1344

7390656

petrelharp mentioned this issue May 16, 2023

AraTha DFE QC #1479

Draft

petrelharp mentioned this issue Jul 2, 2024

Release 0.2.1 checklist #1565

Open

11 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

QC for AraThal DFE #1344

QC for AraThal DFE #1344

chriscrsmith commented Aug 29, 2022

petrelharp commented May 16, 2023

petrelharp commented May 16, 2023

petrelharp commented May 16, 2023

petrelharp commented May 16, 2023

petrelharp commented May 16, 2023 •

edited

Loading

chriscrsmith commented May 17, 2023

petrelharp commented May 23, 2023

petrelharp commented May 23, 2023

petrelharp commented May 23, 2023

QC for AraThal DFE #1344

QC for AraThal DFE #1344

Comments

chriscrsmith commented Aug 29, 2022

petrelharp commented May 16, 2023

petrelharp commented May 16, 2023

petrelharp commented May 16, 2023

petrelharp commented May 16, 2023

petrelharp commented May 16, 2023 • edited Loading

chriscrsmith commented May 17, 2023

petrelharp commented May 23, 2023

petrelharp commented May 23, 2023

petrelharp commented May 23, 2023

petrelharp commented May 16, 2023 •

edited

Loading