Last updated: 2022-03-30

Checks: 7 0

Knit directory: R_gene_analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20200917) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 2f81d73. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/

Untracked files:
    Untracked:  FigS5.png
    Untracked:  data/Brec_R1.txt
    Untracked:  data/Brec_R2.txt
    Untracked:  data/CR15_R1.txt
    Untracked:  data/CR15_R2.txt
    Untracked:  data/CR_14_R1.txt
    Untracked:  data/CR_14_R2.txt
    Untracked:  data/KS_R1.txt
    Untracked:  data/KS_R2.txt
    Untracked:  data/NBS_PAV.txt.gz
    Untracked:  data/NLR_PAV_GD.txt
    Untracked:  data/NLR_PAV_GM.txt
    Untracked:  data/PAVs_newick.txt
    Untracked:  data/PPR1.txt
    Untracked:  data/PPR2.txt
    Untracked:  data/SNPs_lee.id.biallic_maf_0.05_geno_0.1.vcf.gz
    Untracked:  data/SNPs_newick.txt
    Untracked:  data/bac.txt
    Untracked:  data/brown.txt
    Untracked:  data/cy3.txt
    Untracked:  data/cy5.txt
    Untracked:  data/early.txt
    Untracked:  data/flowerings.txt
    Untracked:  data/foregeye.txt
    Untracked:  data/height.txt
    Untracked:  data/inds.txt
    Untracked:  data/late.txt
    Untracked:  data/mature.txt
    Untracked:  data/motting.txt
    Untracked:  data/mvp.kin.bin
    Untracked:  data/mvp.kin.desc
    Untracked:  data/oil.txt
    Untracked:  data/paperinds.txt
    Untracked:  data/pdh.txt
    Untracked:  data/protein.txt
    Untracked:  data/rust_tan.txt
    Untracked:  data/salt.txt
    Untracked:  data/seedq.txt
    Untracked:  data/seedweight.txt
    Untracked:  data/snp.gds
    Untracked:  data/stem_termination.txt
    Untracked:  data/sudden.txt
    Untracked:  data/virus.txt
    Untracked:  output/FigS5.png
    Untracked:  output/NLR_boxplot_saved.Rds
    Untracked:  output/merged_fig1.png

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/total_numbers.Rmd) and HTML (docs/total_numbers.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 2f81d73 Philipp Bayer 2022-03-30 New Fig 1
html dd05d73 Philipp Bayer 2021-03-29 Build site.
Rmd 22ac14c Philipp Bayer 2021-03-29 wflow_publish(“analysis/total_numbers.Rmd”)
html 7bd9ed5 Philipp Bayer 2021-03-29 Build site.
Rmd 68ed88d Philipp Bayer 2021-03-29 wflow_publish(“analysis/total_numbers.Rmd”)
html 2844fd3 Philipp Bayer 2021-02-09 Add changes, add missing yield
html 6e1fa53 Philipp Bayer 2020-12-11 Build site.
Rmd 4b21bc2 Philipp Bayer 2020-12-11 wflow_publish(“analysis/total_numbers.Rmd”)
html e519122 Philipp Bayer 2020-12-11 Build site.
html 83dc49a Philipp Bayer 2020-12-11 Trace deleted files
html e059725 Philipp Bayer 2020-11-18 Build site.
Rmd a9702e9 Philipp Bayer 2020-11-18 wflow_publish(“analysis/total_numbers.Rmd”)
Rmd dae157b Philipp Bayer 2020-09-24 Update of analysis
html dae157b Philipp Bayer 2020-09-24 Update of analysis
Rmd 1b346e9 Philipp Bayer 2020-09-22 More data!
html 1b346e9 Philipp Bayer 2020-09-22 More data!
html 0777e1d Philipp Bayer 2020-09-22 update
Rmd 3adcc5a Philipp Bayer 2020-09-22 wflow_git_commit(all = TRUE)
html 3adcc5a Philipp Bayer 2020-09-22 wflow_git_commit(all = TRUE)
Rmd 50f995b Philipp Bayer 2020-09-21 Add more analysis
html 50f995b Philipp Bayer 2020-09-21 Add more analysis
html e6e9b9b Philipp Bayer 2020-09-21 Build site.
Rmd c1fbbf9 Philipp Bayer 2020-09-21 wflow_publish("analysis/*")
Rmd c71005a Philipp Bayer 2020-09-18 lots changes
html c71005a Philipp Bayer 2020-09-18 lots changes
html 7d33bac Philipp Bayer 2020-09-18 Build site.
Rmd 695db1e Philipp Bayer 2020-09-18 wflow_publish(c(“analysis/eda.Rmd”, “analysis/first-analysis.Rmd”,

This is the same analysis as first-analysis, but with total numbers, not percentages genes lost

knitr::opts_chunk$set(warning = FALSE, message = FALSE) 
library(tidyverse)
library(patchwork)
library(ggsci)
library(dabestr)
library(dabestr)
library(cowplot)
library(ggsignif)
library(ggforce)

theme_set(theme_cowplot())

Introduction

npg_col = pal_npg("nrc")(9)
col_list <- c(`Wild`=npg_col[8],
   Landrace = npg_col[3],
  `Old cultivar`=npg_col[2],
  `Modern cultivar`=npg_col[4])

pav_table <- read_tsv('./data/soybean_pan_pav.matrix_gene.txt.gz')

NBS part

Let’s pull the NBS genes from the table

nbs <- read_tsv('./data/Lee.NBS.candidates.lst', col_names = c('Name', 'Class'))
nbs
# A tibble: 486 x 2
   Name                   Class
   <chr>                  <chr>
 1 UWASoyPan00953.t1      CN   
 2 GlymaLee.13G222900.1.p CN   
 3 GlymaLee.18G227000.1.p CN   
 4 GlymaLee.18G080600.1.p CN   
 5 GlymaLee.20G036200.1.p CN   
 6 UWASoyPan01876.t1      CN   
 7 UWASoyPan04211.t1      CN   
 8 GlymaLee.19G105400.1.p CN   
 9 GlymaLee.18G085100.1.p CN   
10 GlymaLee.11G142600.1.p CN   
# ... with 476 more rows
# have to remove the .t1s 
nbs$Name <- gsub('.t1','', nbs$Name)
nbs_pav_table <- pav_table %>% filter(Individual %in% nbs$Name)
write_delim(nbs_pav_table, 'data/NBS_PAV.txt.gz', delim='\t')

Are NLR genes more likely to be lost?

not_nbs_pav_table <- pav_table %>% filter(! Individual %in% nbs$Name)
not_nbs_pav_table
# A tibble: 50,928 x 1,111
   Individual    `AB-01` `AB-02` `BR-01` `BR-02` `BR-03` `BR-04` `BR-05` `BR-06`
   <chr>           <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1 GlymaLee.01G~       1       1       1       1       1       1       1       1
 2 GlymaLee.01G~       1       1       1       1       1       1       1       1
 3 GlymaLee.01G~       1       1       1       1       1       1       1       1
 4 GlymaLee.01G~       1       1       1       1       1       1       1       1
 5 GlymaLee.01G~       1       1       1       1       1       1       1       1
 6 GlymaLee.01G~       1       1       1       1       1       1       1       1
 7 GlymaLee.01G~       1       1       1       1       1       1       1       1
 8 GlymaLee.01G~       1       1       1       1       1       1       1       1
 9 GlymaLee.01G~       1       1       1       1       1       1       1       1
10 GlymaLee.01G~       1       1       1       1       1       1       1       1
# ... with 50,918 more rows, and 1,102 more variables: BR-07 <dbl>,
#   BR-08 <dbl>, BR-09 <dbl>, BR-10 <dbl>, BR-11 <dbl>, BR-12 <dbl>,
#   BR-13 <dbl>, BR-14 <dbl>, BR-15 <dbl>, BR-16 <dbl>, BR-17 <dbl>,
#   BR-18 <dbl>, BR-20 <dbl>, BR-23 <dbl>, BR-24 <dbl>, BR-29 <dbl>,
#   BR-30 <dbl>, BR-32 <dbl>, DT2000 <dbl>, ESS <dbl>, For <dbl>, HN001 <dbl>,
#   HN002 <dbl>, HN003 <dbl>, HN004 <dbl>, HN005 <dbl>, HN006 <dbl>,
#   HN007 <dbl>, HN008 <dbl>, HN009 <dbl>, HN010 <dbl>, HN011 <dbl>, ...

Chisquare test - I need the number of non-NBS genes core + variable, and the number of NBS genes core + variable

not_nbs_pav_table$Individual <- NULL
var <- 0
core <- 0
for (row in 1:nrow(not_nbs_pav_table)) {
  if (0 %in% not_nbs_pav_table[row,]) {
    var <- var + 1
  } else {
    core <- core + 1
  }

}

There are var variable genes in the non-NBS genes, and core core genes in the non-NBS genes.

backup <- nbs_pav_table
nbs_pav_table$Individual <- NULL
nbsvar <- 0
nbscore <- 0
for (row in 1:nrow(nbs_pav_table)) {
  if (0 %in% nbs_pav_table[row,]) {
    nbsvar <- nbsvar + 1
  } else {
    nbscore <- nbscore + 1
  }

}
nbs_pav_table <- backup

There are 166 variable genes in the NBS genes, and 320 core genes in the NBS genes.

test_df <- data.frame(nonnbs=c(var, core), nbs=c(nbsvar, nbscore))
test_df
  nonnbs nbs
1   6594 166
2  44334 320
chisq.test(test_df)

    Pearson's Chi-squared test with Yates' continuity correction

data:  test_df
X-squared = 187.77, df = 1, p-value < 2.2e-16

OK, so there’s a statistically significant association between variable genes and NBS genes.

test_df$nbs <- test_df$nbs/sum(test_df$nbs)
test_df$nonnbs <- test_df$nonnbs/sum(test_df$nonnbs)
test_df
     nonnbs       nbs
1 0.1294769 0.3415638
2 0.8705231 0.6584362

The same with RLPs

rlp <- read_tsv('./data/Lee.RLP.candidates.lst', col_names = c('Name', 'Class', 'Subtype'))
# have to remove the .t1s 
rlp$Name <- gsub('.t1','', rlp$Name)
rlp_pav_table <- pav_table %>% filter(Individual %in% rlp$Name)
backup <- rlp_pav_table
rlp_pav_table$Individual <- NULL
rlpvar <- 0
rlpcore <- 0
for (row in 1:nrow(rlp_pav_table)) {
  if (0 %in% rlp_pav_table[row,]) {
    rlpvar <- rlpvar + 1
  } else {
    rlpcore <- rlpcore + 1
  }

}
rlp_pav_table <- backup

There are 55 variable genes in the RLP genes, and 125 core genes in the RLP genes.

test_df <- data.frame(nonrlp=c(var, core), rlp=c(rlpvar, rlpcore))
test_df
  nonrlp rlp
1   6594  55
2  44334 125
chisq.test(test_df)

    Pearson's Chi-squared test with Yates' continuity correction

data:  test_df
X-squared = 47.594, df = 1, p-value = 5.242e-12
test_df$rlp <- test_df$rlp/sum(test_df$rlp)
test_df$nonrlp <- test_df$nonrlp/sum(test_df$nonrlp)
test_df
     nonrlp       rlp
1 0.1294769 0.3055556
2 0.8705231 0.6944444

The same with RLKs

rlk <- read_tsv('./data/Lee.RLK.candidates.lst', col_names = c('Name', 'Class', 'Subtype'))
rlk
# A tibble: 1,173 x 3
   Name                   Class Subtype       
   <chr>                  <chr> <chr>         
 1 GlymaLee.01G001800.1.p RLK   lrr           
 2 GlymaLee.01G004900.1.p RLK   lrr           
 3 GlymaLee.01G007300.1.p RLK   lrr           
 4 GlymaLee.01G007400.1.p RLK   lrr           
 5 GlymaLee.01G012800.1.p RLK   other_receptor
 6 GlymaLee.01G018800.1.p RLK   lrr           
 7 GlymaLee.01G021100.1.p RLK   other_receptor
 8 GlymaLee.01G025500.1.p RLK   lysm          
 9 GlymaLee.01G026500.1.p RLK   other_receptor
10 GlymaLee.01G027000.1.p RLK   lrr           
# ... with 1,163 more rows
# have to remove the .t1s 
rlk$Name <- gsub('.t1','', rlk$Name)
rlk_pav_table <- pav_table %>% filter(Individual %in% rlk$Name)
backup <- rlk_pav_table
rlk_pav_table$Individual <- NULL
rlkvar <- 0
rlkcore <- 0
for (row in 1:nrow(rlk_pav_table)) {
  if (0 %in% rlk_pav_table[row,]) {
    rlkvar <- rlkvar + 1
  } else {
    rlkcore <- rlkcore + 1
  }

}
rlk_pav_table <- backup

There are 98 variable genes in the RLK genes, and 1075 core genes in the RLK genes.

test_df <- data.frame(nonrlk=c(var, core), rlk=c(rlkvar, rlkcore))
chisq.test(test_df)

    Pearson's Chi-squared test with Yates' continuity correction

data:  test_df
X-squared = 21.199, df = 1, p-value = 4.14e-06
test_df$rlk <- test_df$rlk/sum(test_df$rlk)
test_df$nonrlk <- test_df$nonrlk/sum(test_df$nonrlk)
test_df
     nonrlk        rlk
1 0.1294769 0.08354646
2 0.8705231 0.91645354

The correct test - non-RGA vs NLR, RLP, RLK

The problem in the above calculations is that I still include the counts of NLR in the non-RPK genes, for example. Do the chisquare-tests come out differently if I remove the NLR/RLK/RLP counts from the non-RGA counts?

truecore <-  core - (rlkcore + rlpcore + nbscore)
truevar <- var - (rlkvar + rlpvar + nbsvar)
rlk_test_df <- data.frame(var = c(rlkvar, truevar), core = c(rlkcore, truecore))
chisq.test(rlk_test_df)

    Pearson's Chi-squared test with Yates' continuity correction

data:  rlk_test_df
X-squared = 19.892, df = 1, p-value = 8.193e-06
rlp_test_df <- data.frame(var = c(rlpvar, truevar), core = c(rlpcore, truecore))
chisq.test(rlp_test_df)

    Pearson's Chi-squared test with Yates' continuity correction

data:  rlp_test_df
X-squared = 49.017, df = 1, p-value = 2.538e-12
nlr_test_df <- data.frame(var = c(nbsvar, truevar), core = c(nbscore, truecore))
chisq.test(nlr_test_df)

    Pearson's Chi-squared test with Yates' continuity correction

data:  nlr_test_df
X-squared = 192.59, df = 1, p-value < 2.2e-16

OK - they’re still p < 0.05

A quick check since we ran three tests:

p.adjust(c(8.193e-06,  2.538e-12,  2.2e-16), method = 'bonferroni')
[1] 2.4579e-05 7.6140e-12 6.6000e-16

Still p < 0.05 in all cases

Modern vs Old gene loss

groups <- read_csv('./data/Table_of_cultivar_groups.csv')
groups
# A tibble: 1,069 x 3
   `Data-storage-ID` `PI-ID`   `Group in violin table`
   <chr>             <chr>     <chr>                  
 1 SRR1533284        PI416890  landrace               
 2 SRR1533282        PI323576  landrace               
 3 SRR1533292        PI157421  landrace               
 4 SRR1533216        PI594615  landrace               
 5 SRR1533239        PI603336  landrace               
 6 USB-108           PI165675  landrace               
 7 HNEX-13           PI253665D landrace               
 8 USB-382           PI603549  landrace               
 9 SRR1533236        PI587552  landrace               
10 SRR1533332        PI567293  landrace               
# ... with 1,059 more rows

Which genes are present more or less in old / modern cultivars?

big_norm_count <- tibble(
  name = character(),
  landrace = numeric(),
  Modern_cultivar = numeric(),
  Old_cultivar = numeric(),
  `Wild` = numeric()
)

groups_list <- split(groups$`Group in violin table`, groups$`Data-storage-ID`)

for( i in 1:nrow(nbs_pav_table) ) {
  this_gene <- nbs_pav_table[i,]
  groups_count <- list()
  total_groups_count <- list()
  for (x in seq_along(nbs_pav_table)){
    if ( x == 1) next
    thisind <- colnames(nbs_pav_table)[x]
    thisind_group <- groups_list[[thisind]]
    if( is.null(thisind_group) ) next # no group for this individual
    pavs <- this_gene[[x]] # either 1 or 0
    
    if ( thisind_group %in% names(groups_count)) {
      # count the number of present genes
      groups_count[[thisind_group]] <- groups_count[[thisind_group]] + pavs
      # count the total number of individuals for this group
      total_groups_count[[thisind_group]] <- total_groups_count[[thisind_group]] + 1
    } else {
      groups_count[[thisind_group]] <-  pavs
      total_groups_count[[thisind_group]] <- 1
    }
  }
  
  norm_group_count <- list()
  for (m in seq_along(groups_count)) {
    thisname <- names(groups_count)[m]
    norm_group_count[[thisname]] <- groups_count[[thisname]] / total_groups_count[[thisname]] * 100
  }
  norm_group_count$Individual <- this_gene$Individual
  big_norm_count <- rbind(big_norm_count, as_tibble(norm_group_count))

}

# wow, I DO write R like Python

Let’s pull out the genes that are variable in any group

var_norm_count <- big_norm_count %>% 
  filter(landrace != 100 & 
           Modern_cultivar != 100 & 
           Old_cultivar != 100 & 
           `Wild-type` != 100)

var_norm_count <- left_join(var_norm_count, nbs, by=c('Individual'='Name'))
var_norm_count$Mod_minus_Old <- var_norm_count$Modern_cultivar - var_norm_count$Old_cultivar

The top 20 genes reduced the most in modern cultivars compared with old cultivars:

var_norm_count %>% 
  arrange(Mod_minus_Old) %>% 
  head(20) %>% 
  select(Individual, `Wild-type`, landrace, Old_cultivar, Modern_cultivar, Mod_minus_Old, Class) %>% 
  knitr::kable()
Individual Wild-type landrace Old_cultivar Modern_cultivar Mod_minus_Old Class
UWASoyPan03261 74.52229 62.102351 60.86957 26.573427 -34.296139 TX
UWASoyPan00953 83.43949 33.056708 43.47826 11.188811 -32.289450 CN
UWASoyPan00725 89.17197 92.392808 82.60870 51.748252 -30.860444 TX
UWASoyPan00316 91.08280 90.594744 80.43478 50.349650 -30.085132 NBS
UWASoyPan01530 80.89172 45.089903 47.82609 20.979021 -26.847066 NL
UWASoyPan00975 42.67516 19.778700 32.60870 7.692308 -24.916388 TX
UWASoyPan00155 85.35032 77.316736 63.04348 42.657343 -20.386136 NBS
UWASoyPan00772 54.77707 18.948824 30.43478 11.888112 -18.546671 NBS
UWASoyPan03402 62.42038 42.738589 36.95652 18.881119 -18.075403 NBS
UWASoyPan01320 45.85987 30.152144 23.91304 6.993007 -16.920036 NBS
UWASoyPan02799 65.60510 30.843707 19.56522 2.797203 -16.768015 NBS
GlymaLee.03G045500.1.p 82.16561 58.921162 67.39130 51.748252 -15.643053 OTHER
UWASoyPan01253 73.24841 24.481328 17.39130 3.496504 -13.894801 NBS
UWASoyPan03340 18.47134 26.279391 19.56522 6.293706 -13.271511 TX
GlymaLee.06G230600.1.p 78.34395 57.399723 56.52174 44.055944 -12.465795 TX
GlymaLee.06G228900.1.p 96.17834 74.688797 91.30435 80.419580 -10.884767 TX
UWASoyPan00251 60.50955 20.470263 21.73913 11.188811 -10.550319 NL
GlymaLee.03G045700.1.p 75.79618 67.496542 65.21739 55.244755 -9.972636 OTHER
GlymaLee.06G228600.1.p 90.44586 79.391425 82.60870 72.727273 -9.881423 TX
UWASoyPan00670 30.57325 6.915629 10.86957 2.097902 -8.771663 TX

So these are the NLR genes selected against during soybean breeding.

Let’s look at those without the TX ones:

var_norm_count %>% 
  arrange(Mod_minus_Old) %>% 
  select(Individual, `Wild-type`, landrace, Old_cultivar, Modern_cultivar, Mod_minus_Old, Class) %>% 
  filter(Class != 'TX') %>% 
  head(20) %>% 
  knitr::kable()
Individual Wild-type landrace Old_cultivar Modern_cultivar Mod_minus_Old Class
UWASoyPan00953 83.439490 33.0567082 43.478261 11.188811 -32.289450 CN
UWASoyPan00316 91.082802 90.5947441 80.434783 50.349650 -30.085132 NBS
UWASoyPan01530 80.891720 45.0899032 47.826087 20.979021 -26.847066 NL
UWASoyPan00155 85.350319 77.3167358 63.043478 42.657343 -20.386136 NBS
UWASoyPan00772 54.777070 18.9488243 30.434783 11.888112 -18.546671 NBS
UWASoyPan03402 62.420382 42.7385892 36.956522 18.881119 -18.075403 NBS
UWASoyPan01320 45.859873 30.1521438 23.913044 6.993007 -16.920036 NBS
UWASoyPan02799 65.605096 30.8437068 19.565217 2.797203 -16.768015 NBS
GlymaLee.03G045500.1.p 82.165605 58.9211618 67.391304 51.748252 -15.643053 OTHER
UWASoyPan01253 73.248408 24.4813278 17.391304 3.496504 -13.894801 NBS
UWASoyPan00251 60.509554 20.4702628 21.739130 11.188811 -10.550319 NL
GlymaLee.03G045700.1.p 75.796178 67.4965422 65.217391 55.244755 -9.972636 OTHER
UWASoyPan03194 45.222930 10.7883817 10.869565 2.097902 -8.771663 NBS
GlymaLee.03G045900.1.p 74.522293 65.5601660 63.043478 54.545454 -8.498024 OTHER
UWASoyPan00326 40.764331 19.0871369 17.391304 9.790210 -7.601095 CN
UWASoyPan02496 26.114650 14.6611342 13.043478 6.993007 -6.050471 CN
UWASoyPan01217 57.961783 14.3845090 10.869565 5.594406 -5.275160 NBS
GlymaLee.06G229300.1.p 86.624204 50.3457815 65.217391 60.839161 -4.378230 TN
UWASoyPan04757 6.369427 0.9681881 4.347826 0.000000 -4.347826 NBS
GlymaLee.03G042000.1.p 99.363057 88.3817427 91.304348 88.111888 -3.192460 CNL

Let’s plot:

var_norm_count %>% 
  arrange(Mod_minus_Old) %>% 
  head(20) %>% 
  select(Individual, `Wild-type`, landrace, Old_cultivar, Modern_cultivar, Class) %>% 
  pivot_longer(!c(Individual, Class)) %>% 
  mutate(name = str_replace_all(name, 'landrace', 'Landrace')) %>%
  mutate(name = str_replace_all(name, 'Wild-type', 'Wild')) %>% 
  mutate(name = str_replace_all(name, 'Old_cultivar', 'Old cultivar')) %>% 
  mutate(name = str_replace_all(name, 'Modern_cultivar', 'Modern cultivar')) %>%  
  ggplot(aes(x=factor(name, levels=c('Wild', 'Landrace', 'Old cultivar', 'Modern cultivar')), y=value, group=Individual, color=Class)) + 
  geom_line(size=1.5) +
  xlab('Group') +
  ylab('Percentage presence of gene in group') +
  scale_color_brewer(palette = 'Dark2')

The top 20 genes increased the most in modern cultivars:

var_norm_count %>% 
  arrange(desc(Mod_minus_Old)) %>% 
  head(20) %>% 
  select(Individual, `Wild-type`, landrace, Old_cultivar, Modern_cultivar, Mod_minus_Old, Class) %>% 
  knitr::kable()
Individual Wild-type landrace Old_cultivar Modern_cultivar Mod_minus_Old Class
GlymaLee.01G030900.1.p 81.52866 50.345782 34.782609 71.328671 36.546063 NL
GlymaLee.15G199500.1.p 85.98726 73.582296 69.565217 86.713287 17.148069 CN
GlymaLee.15G199200.1.p 92.99363 74.827109 73.913044 90.909091 16.996047 CNL
GlymaLee.06G232800.1.p 40.12739 47.579530 56.521739 73.426573 16.904834 NBS
GlymaLee.01G088400.1.p 49.68153 89.488243 82.608696 97.202797 14.594102 TNL
UWASoyPan05312 30.57325 8.575380 10.869565 23.776224 12.906659 NBS
UWASoyPan00005 36.94268 13.831259 8.695652 20.279720 11.584068 NBS
UWASoyPan01876 43.31210 15.629322 8.695652 20.279720 11.584068 CN
GlymaLee.10G034600.1.p 91.08280 91.839557 84.782609 95.104895 10.322286 NL
UWASoyPan01330 29.29936 25.172891 15.217391 23.776224 8.558832 NBS
UWASoyPan00202 50.31847 35.408022 26.086956 32.167832 6.080876 NBS
UWASoyPan00427 97.45223 84.232365 73.913044 79.720280 5.807236 NBS
GlymaLee.15G199300.1.p 96.17834 88.243430 93.478261 98.601399 5.123138 NL
GlymaLee.03G070700.1.p 65.60510 89.903181 89.130435 93.706294 4.575859 TNL
GlymaLee.06G229100.1.p 85.98726 38.174274 50.000000 53.146853 3.146853 TX
GlymaLee.07G070200.1.p 78.98089 91.286307 91.304348 94.405594 3.101247 NBS
UWASoyPan01418 68.78981 66.251729 71.739130 74.825175 3.086044 TX
GlymaLee.03G070600.1.p 66.87898 90.179806 91.304348 93.706294 2.401946 TNL
GlymaLee.16G175200.1.p 95.54140 96.127248 95.652174 97.902098 2.249924 TNL
UWASoyPan04967 35.03185 6.224066 2.173913 4.195804 2.021891 TX

As these genes have relatively high percentages in WT they must have been re-introduced by using WT in the breeding process.

Let’s plot those too:

var_norm_count %>% 
  arrange(desc(Mod_minus_Old)) %>% 
  head(20) %>% 
  select(Individual, `Wild-type`, landrace, Old_cultivar, Modern_cultivar, Class) %>% 
  pivot_longer(!c(Individual, Class)) %>% 
  mutate(name = str_replace_all(name, 'landrace', 'Landrace')) %>% 
  mutate(name = str_replace_all(name, 'Old_cultivar', 'Old cultivar')) %>% 
  mutate(name = str_replace_all(name, 'Modern_cultivar', 'Modern cultivar')) %>%  
  mutate(name = str_replace_all(name, 'Wild-type', 'Wild')) %>% 
  ggplot(aes(x=factor(name, levels=c('Wild', 'Landrace', 'Old cultivar', 'Modern cultivar')), y=value, group=Individual, color=Class)) + 
  geom_line(size=1.5) +
  xlab('Group') +
  ylab('Percentage presence of gene in group') +
  scale_color_brewer(palette = 'Dark2')

Presence plotting per individual

names <- c()
presences <- c()

for (i in seq_along(nbs_pav_table)){
  if ( i == 1) next
  thisind <- colnames(nbs_pav_table)[i]
  pavs <- nbs_pav_table[[i]]
  presents <- sum(pavs)
  names <- c(names, thisind)
  presences <- c(presences, presents)
}
nbs_res_tibb <- new_tibble(list(names = names, presences = presences))

OK what do these presence percentages look like?

ggplot(data=nbs_res_tibb, aes(x=presences)) + geom_histogram(bins=25) 

On average, 446.0027027 of NBS genes are present in each individual.

Now let’s join the table of presences to the four different types so we can group these numbers.

nbs_joined_groups <- left_join(nbs_res_tibb, groups, by = c('names'='Data-storage-ID'))
nbs_joined_groups$`Group in violin table` <- gsub('landrace', 'Landrace', nbs_joined_groups$`Group in violin table`)
nbs_joined_groups$`Group in violin table` <- gsub('Modern_cultivar', 'Modern cultivar', nbs_joined_groups$`Group in violin table`)
nbs_joined_groups$`Group in violin table` <- gsub('Old_cultivar', 'Old cultivar', nbs_joined_groups$`Group in violin table`)
nbs_joined_groups$`Group in violin table` <- gsub('Wild-type', 'Wild', nbs_joined_groups$`Group in violin table`)

nbs_joined_groups$`Group in violin table` <- factor(nbs_joined_groups$`Group in violin table`, levels=c(NA, 'Wild', 'Landrace', 'Old cultivar', 'Modern cultivar'))
nbs_vio <- nbs_joined_groups %>% filter(!is.na(`Group in violin table`)) %>% 
  ggplot(aes(y=presences, x=`Group in violin table`, fill=`Group in violin table`)) + 
  geom_violin(draw_quantiles = c(0.5)) +
  geom_sina(alpha=0.5) +
  geom_smooth(aes(group=1), method='glm') +
  scale_fill_manual(values=col_list) +
  guides(fill = FALSE)

nbs_vio

nbs_joined_groups %>% filter(`Group in violin table` != 'NA') %>% 
  ggplot(aes(y=presences, x=`Group in violin table`, fill=`Group in violin table`)) + 
  geom_smooth(aes(group=1), method='lm', se = FALSE) +
  geom_jitter() +
  scale_fill_manual(values=col_list)+
  guides(fill = FALSE)

nbs_joined_groups %>% filter(!is.na(`PI-ID`)) %>% 
  group_by(`Group in violin table`) %>% 
  summarise(min_present = min(presences),
            max_present = max(presences),
            mean_present = mean(presences),
            median_present = median(presences),
            std_present = sd(presences)) %>% 
  knitr::kable()
Group in violin table min_present max_present mean_present median_present std_present
Wild 435 473 452.9490 453 7.170806
Landrace 429 465 444.8907 445 5.011672
Old cultivar 433 456 444.8696 445 5.200892
Modern cultivar 431 455 442.3147 442 4.047986

RLK part

Let’s do the same plot with RLKs

names <- c()
presences <- c()

for (i in seq_along(rlk_pav_table)){
  if ( i == 1) next
  thisind <- colnames(rlk_pav_table)[i]
  pavs <- rlk_pav_table[[i]]
  presents <- sum(pavs)
  names <- c(names, thisind)
  presences <- c(presences, presents)
}
rlk_res_tibb <- new_tibble(list(names = names, presences = presences))
rlk_res_tibb
# A tibble: 1,110 x 2
   names presences
   <chr>     <dbl>
 1 AB-01      1167
 2 AB-02      1162
 3 BR-01      1166
 4 BR-02      1165
 5 BR-03      1166
 6 BR-04      1167
 7 BR-05      1164
 8 BR-06      1167
 9 BR-07      1165
10 BR-08      1167
# ... with 1,100 more rows

OK what do these presence percentages look like?

ggplot(data=rlk_res_tibb, aes(x=presences)) + geom_histogram(bins=25) 

On average, 1163.5036036% of NBS genes are present in each individual.

Now let’s join the table of presences to the four different types so we can group these numbers.

rlk_joined_groups <- left_join(rlk_res_tibb, groups, by = c('names'='Data-storage-ID'))
rlk_joined_groups$`Group in violin table` <- gsub('landrace', 'Landrace', rlk_joined_groups$`Group in violin table`)
rlk_joined_groups$`Group in violin table` <- gsub('Modern_cultivar', 'Modern cultivar', rlk_joined_groups$`Group in violin table`)
rlk_joined_groups$`Group in violin table` <- gsub('Old_cultivar', 'Old cultivar', rlk_joined_groups$`Group in violin table`)
rlk_joined_groups$`Group in violin table` <- gsub('Wild-type', 'Wild', rlk_joined_groups$`Group in violin table`)


rlk_joined_groups$`Group in violin table` <- factor(rlk_joined_groups$`Group in violin table`, levels=c(NA, 'Wild', 'Landrace', 'Old cultivar', 'Modern cultivar'))
rlk_vio <- rlk_joined_groups %>% filter(`Group in violin table` != 'NA') %>% 
  ggplot(aes(y=presences, x=`Group in violin table`, fill=`Group in violin table`)) + 
  geom_violin(draw_quantiles = c(0.5)) +
  geom_sina(alpha=0.5) +
  geom_smooth(aes(group=1), method='lm', se = FALSE) +
  scale_fill_manual(values=col_list)+
  guides(fill = FALSE)
rlk_vio

rlk_joined_groups %>% filter(!is.na(`PI-ID`)) %>% 
  group_by(`Group in violin table`) %>% 
  summarise(min_present = min(presences),
            max_present = max(presences),
            mean_present = mean(presences),
            median_present = median(presences),
            std_present = sd(presences)) %>% 
  knitr::kable()
Group in violin table min_present max_present mean_present median_present std_present
Wild 1154 1170 1164.357 1165 2.554565
Landrace 1157 1168 1163.217 1163 1.499264
Old cultivar 1161 1166 1163.587 1164 1.407537
Modern cultivar 1159 1168 1163.490 1163 1.472122

RLP part

And now with RLPs

names <- c()
presences <- c()

for (i in seq_along(rlp_pav_table)){
  if ( i == 1) next
  thisind <- colnames(rlp_pav_table)[i]
  pavs <- rlp_pav_table[[i]]
  presents <- sum(pavs)
  names <- c(names, thisind)
  presences <- c(presences, presents)
}
rlp_res_tibb <- new_tibble(list(names = names, presences = presences))

OK what do these presence percentages look like?

ggplot(data=rlp_res_tibb, aes(x=presences)) + geom_histogram(bins=25) 

On average, 172.1693694% of NBS genes are present in each individual.

Now let’s join the table of presences to the four different types so we can group these numbers.

rlp_joined_groups <- left_join(rlp_res_tibb, groups, by = c('names'='Data-storage-ID'))
rlp_joined_groups$`Group in violin table` <- gsub('landrace', 'Landrace', rlp_joined_groups$`Group in violin table`)
rlp_joined_groups$`Group in violin table` <- gsub('Modern_cultivar', 'Modern cultivar', rlp_joined_groups$`Group in violin table`)
rlp_joined_groups$`Group in violin table` <- gsub('Old_cultivar', 'Old cultivar', rlp_joined_groups$`Group in violin table`)
rlp_joined_groups$`Group in violin table` <- gsub('Wild-type', 'Wild', rlp_joined_groups$`Group in violin table`)

rlp_joined_groups$`Group in violin table` <- factor(rlp_joined_groups$`Group in violin table`, levels=c(NA, 'Wild', 'Landrace', 'Old cultivar', 'Modern cultivar'))
rlp_vio <- rlp_joined_groups %>% filter(`Group in violin table` != 'NA') %>% 
  ggplot(aes(y=presences, x=`Group in violin table`, fill=`Group in violin table`)) + 
  geom_violin(draw_quantiles = c(0.5)) +
  geom_sina(alpha=0.5) +
    geom_smooth(aes(group=1), method='lm', se = FALSE) +
  scale_fill_manual(values=col_list)+
  guides(fill = FALSE)
rlp_vio

rlp_joined_groups %>% filter(`Group in violin table` != 'NA') %>% 
  ggplot(aes(y=presences, x=`Group in violin table`, fill=`Group in violin table`)) + 
  geom_jitter() +
  #geom_sina(alpha=0.5) +
  scale_fill_manual(values=col_list)+
  guides(fill = FALSE) +
  ylim(c(87, 100))

rlp_joined_groups %>% filter(!is.na(`PI-ID`)) %>% 
  group_by(`Group in violin table`) %>% 
  summarise(min_present = min(presences),
            max_present = max(presences),
            mean_present = mean(presences),
            median_present = median(presences),
            std_present = sd(presences)) %>% 
  knitr::kable()
Group in violin table min_present max_present mean_present median_present std_present
Wild 168 177 173.4140 173 1.617392
Landrace 162 177 171.9668 172 1.661526
Old cultivar 169 176 171.8261 172 1.623499
Modern cultivar 169 175 171.8042 172 1.290587

Plotting together

nbs_vio + rlk_vio + rlp_vio

Stats - Dabayes

I want to know whether the groups are statistically significantly different. First let’s use dabestr

NBS

Let’s run dabestr first:

nbs_multi.two.group.unpaired <- 
  nbs_joined_groups %>% filter(!is.na(`PI-ID`)) %>% 
  dabest(`Group in violin table`, presences, 
         idx = list(c("Wild", "Landrace"),
                    c('Old cultivar', 'Modern cultivar')),
         paired = FALSE)
nbs_multi.two.group.unpaired
dabestr (Data Analysis with Bootstrap Estimation in R) v0.3.0
=============================================================

Good evening!
The current time is 18:10 PM on Wednesday March 30, 2022.

Dataset    :  .
The first five rows are:
# A tibble: 5 x 4
  names  presences `PI-ID`  `Group in violin table`
  <chr>      <dbl> <chr>    <fct>                  
1 AB-01        445 PI458020 Landrace               
2 AB-02        454 PI603713 Landrace               
3 DT2000       447 PI635999 Modern cultivar        
4 For          448 PI548645 Modern cultivar        
5 HN001        448 PI518664 Modern cultivar        

X Variable :  Group in violin table
Y Variable :  presences

Effect sizes(s) will be computed for:
  1. Landrace minus Wild
  2. Modern cultivar minus Old cultivar
nbs_multi.two.group.unpaired.meandiff <- mean_diff(nbs_multi.two.group.unpaired)
nbs_multi.two.group.unpaired.meandiff
dabestr (Data Analysis with Bootstrap Estimation in R) v0.3.0
=============================================================

Good evening!
The current time is 18:10 PM on Wednesday March 30, 2022.

Dataset    :  .
X Variable :  Group in violin table
Y Variable :  presences

Unpaired mean difference of Landrace (n = 723) minus Wild (n = 157)
 -8.06 [95CI  -9.24; -6.86]

Unpaired mean difference of Modern cultivar (n = 143) minus Old cultivar (n = 46)
 -2.55 [95CI  -4.25; -0.97]


5000 bootstrap resamples.
All confidence intervals are bias-corrected and accelerated.
plot(nbs_multi.two.group.unpaired.meandiff, color.column=`Group in violin table`,
     rawplot.ylabel = 'Presence (%)', show.legend=FALSE)

RLK

rlk_multi.two.group.unpaired <- 
  rlk_joined_groups %>% filter(!is.na(`PI-ID`)) %>% 
  dabest(`Group in violin table`, presences, 
         idx = list(c("Wild", "Landrace"),
                    c('Old cultivar', 'Modern cultivar')),
         paired = FALSE)
rlk_multi.two.group.unpaired
dabestr (Data Analysis with Bootstrap Estimation in R) v0.3.0
=============================================================

Good evening!
The current time is 18:10 PM on Wednesday March 30, 2022.

Dataset    :  .
The first five rows are:
# A tibble: 5 x 4
  names  presences `PI-ID`  `Group in violin table`
  <chr>      <dbl> <chr>    <fct>                  
1 AB-01       1167 PI458020 Landrace               
2 AB-02       1162 PI603713 Landrace               
3 DT2000      1165 PI635999 Modern cultivar        
4 For         1163 PI548645 Modern cultivar        
5 HN001       1163 PI518664 Modern cultivar        

X Variable :  Group in violin table
Y Variable :  presences

Effect sizes(s) will be computed for:
  1. Landrace minus Wild
  2. Modern cultivar minus Old cultivar
rlk_multi.two.group.unpaired.meandiff <- mean_diff(rlk_multi.two.group.unpaired)
rlk_multi.two.group.unpaired.meandiff
dabestr (Data Analysis with Bootstrap Estimation in R) v0.3.0
=============================================================

Good evening!
The current time is 18:10 PM on Wednesday March 30, 2022.

Dataset    :  .
X Variable :  Group in violin table
Y Variable :  presences

Unpaired mean difference of Landrace (n = 723) minus Wild (n = 157)
 -1.14 [95CI  -1.55; -0.717]

Unpaired mean difference of Modern cultivar (n = 143) minus Old cultivar (n = 46)
 -0.0974 [95CI  -0.562; 0.362]


5000 bootstrap resamples.
All confidence intervals are bias-corrected and accelerated.
plot(rlk_multi.two.group.unpaired.meandiff, color.column=`Group in violin table`,
     rawplot.ylabel = 'Presence (%)', show.legend=FALSE)

No difference between old and modern cultivars!

RLP

rlp_multi.two.group.unpaired <- 
  rlp_joined_groups %>% filter(!is.na(`PI-ID`)) %>% 
  dabest(`Group in violin table`, presences, 
         idx = list(c("Wild", "Landrace"),
                    c('Old cultivar', 'Modern cultivar')),
         paired = FALSE)
rlp_multi.two.group.unpaired
dabestr (Data Analysis with Bootstrap Estimation in R) v0.3.0
=============================================================

Good evening!
The current time is 18:10 PM on Wednesday March 30, 2022.

Dataset    :  .
The first five rows are:
# A tibble: 5 x 4
  names  presences `PI-ID`  `Group in violin table`
  <chr>      <dbl> <chr>    <fct>                  
1 AB-01        171 PI458020 Landrace               
2 AB-02        172 PI603713 Landrace               
3 DT2000       171 PI635999 Modern cultivar        
4 For          171 PI548645 Modern cultivar        
5 HN001        172 PI518664 Modern cultivar        

X Variable :  Group in violin table
Y Variable :  presences

Effect sizes(s) will be computed for:
  1. Landrace minus Wild
  2. Modern cultivar minus Old cultivar
rlp_multi.two.group.unpaired.meandiff <- mean_diff(rlp_multi.two.group.unpaired)
rlp_multi.two.group.unpaired.meandiff
dabestr (Data Analysis with Bootstrap Estimation in R) v0.3.0
=============================================================

Good evening!
The current time is 18:10 PM on Wednesday March 30, 2022.

Dataset    :  .
X Variable :  Group in violin table
Y Variable :  presences

Unpaired mean difference of Landrace (n = 723) minus Wild (n = 157)
 -1.45 [95CI  -1.74; -1.17]

Unpaired mean difference of Modern cultivar (n = 143) minus Old cultivar (n = 46)
 -0.0219 [95CI  -0.53; 0.477]


5000 bootstrap resamples.
All confidence intervals are bias-corrected and accelerated.
plot(rlp_multi.two.group.unpaired.meandiff, color.column=`Group in violin table`,
     rawplot.ylabel = 'Presence (%)', show.legend=FALSE)

Again, no difference between old and modern cultivars!

Stats - classic t-test

NBS

nlr_plot <- nbs_joined_groups %>% 
  filter( !is.na(`PI-ID`) ) %>%
    ggplot(aes(x=`Group in violin table`, y = presences,
               fill = `Group in violin table`)) + 
  geom_boxplot() +
  scale_fill_manual(values = col_list) + 
  theme_minimal_hgrid() +
  theme(axis.text.x = element_text(size=12),
        axis.text.y = element_text(size=12)) +
  geom_signif(comparisons = list(c('Wild', 'Landrace'),
                                 c('Old cultivar', 'Modern cultivar')), 
              map_signif_level = T) +
  guides(fill=FALSE) +
  ylab('Number of NLR genes') +
  xlab('Accession group')
nlr_plot

save(nlr_plot, file = 'output/NLR_boxplot_saved.Rds')

RLP

rlp_joined_groups %>% 
  filter( !is.na(`PI-ID`) ) %>%
    ggplot(aes(x=`Group in violin table`, y = presences,
               fill = `Group in violin table`)) + 
  geom_boxplot() +
  scale_fill_manual(values = col_list) + 
  theme_minimal_hgrid() +
  theme(axis.text.x = element_text(size=12),
        axis.text.y = element_text(size=12)) +
  geom_signif(comparisons = list(c('Wild', 'Landrace'),
                                 c('Old cultivar', 'Modern cultivar')), 
              map_signif_level = T) +
  guides(fill=FALSE) +
  ylab('Number of RLP genes') +
  xlab('Accession group')

RLK

rlk_joined_groups %>% 
  filter( !is.na(`PI-ID`) ) %>%
    ggplot(aes(x=`Group in violin table`, y = presences,
               fill = `Group in violin table`)) + 
  geom_boxplot() +
  scale_fill_manual(values = col_list) + 
  theme_minimal_hgrid() +
  theme(axis.text.x = element_text(size=12),
        axis.text.y = element_text(size=12)) +
  geom_signif(comparisons = list(c('Wild', 'Landrace'),
                                 c('Old cultivar', 'Modern cultivar')), 
              map_signif_level = T) +
  guides(fill=FALSE) +
  ylab('Number of RLK genes') +
  xlab('Accession group')

Ratio of NLR/RLK plot

rlk_nbs_joined_groups <- rlk_joined_groups %>% inner_join(nbs_joined_groups, by=c('names'))
rlk_nbs_joined_groups$ratio <- rlk_nbs_joined_groups$presences.x / rlk_nbs_joined_groups$presences.y # RLK/NLR
rlk_nbs_joined_groups %>% 
  filter( !is.na(`PI-ID.x`) ) %>%
    ggplot(aes(x=`Group in violin table.x`, y = ratio,
               fill = `Group in violin table.x`)) + 
  geom_boxplot() +
  scale_fill_manual(values = col_list) + 
  theme_minimal_hgrid() +
  theme(axis.text.x = element_text(size=12),
        axis.text.y = element_text(size=12)) +
  geom_signif(comparisons = list(c('Wild', 'Landrace'),
                                 c('Old cultivar', 'Modern cultivar')), 
              map_signif_level = T) +
  guides(fill=FALSE) +
  ylab('Number of RLK divided by NLR') +
  xlab('Accession group')

Ratio of NLR/RLP plot

rlp_nbs_joined_groups <- rlp_joined_groups %>% inner_join(nbs_joined_groups, by=c('names'))
rlp_nbs_joined_groups$ratio <- rlp_nbs_joined_groups$presences.x / rlp_nbs_joined_groups$presences.y # RLP/NLR
rlp_nbs_joined_groups %>% 
  filter( !is.na(`PI-ID.x`) ) %>%
    ggplot(aes(x=`Group in violin table.x`, y = ratio,
               fill = `Group in violin table.x`)) + 
  geom_boxplot() +
  scale_fill_manual(values = col_list) + 
  theme_minimal_hgrid() +
  theme(axis.text.x = element_text(size=12),
        axis.text.y = element_text(size=12)) +
  geom_signif(comparisons = list(c('Wild', 'Landrace'),
                                 c('Old cultivar', 'Modern cultivar')), 
              map_signif_level = T) +
  guides(fill=FALSE) +
  ylab('Number of RLP divided by NLR') +
  xlab('Accession group')


sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252   
[3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                      
[5] LC_TIME=English_Australia.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggforce_0.3.3   ggsignif_0.6.3  cowplot_1.1.1   dabestr_0.3.0  
 [5] magrittr_2.0.1  ggsci_2.9       patchwork_1.1.1 forcats_0.5.1  
 [9] stringr_1.4.0   dplyr_1.0.7     purrr_0.3.4     readr_2.1.2    
[13] tidyr_1.1.4     tibble_3.1.5    ggplot2_3.3.5   tidyverse_1.3.1
[17] workflowr_1.6.2

loaded via a namespace (and not attached):
 [1] nlme_3.1-152       fs_1.5.0           lubridate_1.8.0    bit64_4.0.5       
 [5] RColorBrewer_1.1-2 httr_1.4.2         rprojroot_2.0.2    tools_4.1.0       
 [9] backports_1.2.1    bslib_0.3.1        utf8_1.2.2         R6_2.5.1          
[13] vipor_0.4.5        mgcv_1.8-35        DBI_1.1.1          colorspace_2.0-2  
[17] withr_2.5.0        tidyselect_1.1.1   bit_4.0.4          compiler_4.1.0    
[21] git2r_0.28.0       cli_3.2.0          rvest_1.0.2        xml2_1.3.2        
[25] labeling_0.4.2     sass_0.4.0         scales_1.1.1       digest_0.6.28     
[29] rmarkdown_2.11     pkgconfig_2.0.3    htmltools_0.5.2    dbplyr_2.1.1      
[33] fastmap_1.1.0      highr_0.9          rlang_0.4.12       readxl_1.3.1      
[37] rstudioapi_0.13    jquerylib_0.1.4    farver_2.1.0       generics_0.1.1    
[41] jsonlite_1.7.2     vroom_1.5.7        Matrix_1.3-3       ggbeeswarm_0.6.0  
[45] Rcpp_1.0.7         munsell_0.5.0      fansi_0.5.0        lifecycle_1.0.1   
[49] stringi_1.7.5      whisker_0.4        yaml_2.2.1         MASS_7.3-54       
[53] plyr_1.8.6         grid_4.1.0         parallel_4.1.0     promises_1.2.0.1  
[57] crayon_1.4.1       lattice_0.20-44    splines_4.1.0      haven_2.4.3       
[61] hms_1.1.1          knitr_1.36         pillar_1.6.4       boot_1.3-28       
[65] reprex_2.0.1       glue_1.6.2         evaluate_0.14      modelr_0.1.8      
[69] vctrs_0.3.8        tzdb_0.1.2         tweenr_1.0.2       httpuv_1.6.3      
[73] cellranger_1.1.0   gtable_0.3.0       polyclip_1.10-0    assertthat_0.2.1  
[77] xfun_0.27          broom_0.7.9        later_1.3.0        beeswarm_0.4.0    
[81] ellipsis_0.3.2