This study investigates the impact of DNA extraction method on the microbial composition of urine samples evaluated by 16S rRNA amplicon sequencing. Details of the experimental design and methods can be found in the manuscript Benchmarking DNA isolation kits used in analyses of the urinary microbiome. Briefly, eleven voided urine samples were collected from women. The urine was pelleted, resuspended, and divided into five aliquots. Each aliquot was subject to DNA extraction by a different commercially available DNA extraction kit prior to being prepared for 16S rRNA amplicon sequencing of the V4 region.
| Kit code | Kit name |
|---|---|
| BandT | DNeasy Blood and Tissue |
| Biostic | BiOstic Bacteremia |
| PromegaSoil | DNeasy PowerSoil |
| PromegaMaxwell | Promega Maxwell RSC |
| UltraClean | DNeasy UltraClean |
This R Markdown file documents the analyses presented in the manuscript.
library(knitr)
library(phyloseq)
library(speedyseq)
library(ggplot2)
library(dplyr)
library(vegan)
library(here)
library(wesanderson)
library(plotly)
library(purrr)
library(tidyr)
library(tibble)
library(patchwork)
library(kableExtra)
load(here("dna_kits_urine.RData"))
This RData file has several objects in the workspace:
We perform a few minor manipulations for the downstream analysis - subsetting the meta variable to remove the negative control and normalizing the phyloseq objects to relative abundance for visualization.
# Create a subset without sample 12 (blank control samples):
meta_samples <- meta %>% filter(sample_number != 12)
# Create phyloseq objects normalized to 100
ps_norm <- transform_sample_counts(ps, function(x) 100* x/sum(x))
ps_norm_genus <- transform_sample_counts(ps_genus, function(x) 100* x/sum(x))
First, we evaluate if the DNA concentration varies by kit.
# **Figures: of DNA Concentration** Figure 1A and Supplemental Figure 1
fig1_A <- meta %>% filter(sample_number != "PBS") %>%
ggplot(., aes(x= kit, y = Qubit_Conc)) +
theme_classic() +
geom_boxplot(aes(fill = kit)) +
labs(y = 'DNA Concentration \n (ng/uL)', x = 'Kit', fill = "Kit") +
ggtitle('A. DNA yield') +
scale_fill_manual(values = wes_palette("IsleofDogs1")) +
theme(
axis.text.x = element_text(angle = 75, hjust = 1),
text = element_text(size=8),
plot.title = element_text(size = 8),
plot.title.position = "plot",
axis.title=element_text(size=8),
legend.text = element_text(size = 8),
legend.key.size = unit(1,"line"))
# Plot all data
fig_S1A <- ggplot(meta, aes(x = kit, y = Qubit_Conc, fill = kit)) +
geom_bar(stat = 'identity') +
theme_classic() +
facet_grid(~sample_number) +
ggtitle('DNA concentration by DNA extraction kit per sample') +
labs(y = 'DNA concentration (ng/uL)', x = 'Extraction kit') +
theme(
axis.text.x = element_text(angle = 75, hjust = 1),
text = element_text(size=8), plot.title = element_text( size = 8 ),
legend.text = element_text(size = 6),
legend.key.size = unit(.6,"line")) +
scale_fill_manual(values = wes_palette("IsleofDogs1"))
fig1_A + fig_S1A
# Run kruskal-wallis test to determine if DNA concentration varies by kit
kruskal.test(Qubit_Conc ~ kit, data = meta_samples)
##
## Kruskal-Wallis rank sum test
##
## data: Qubit_Conc by kit
## Kruskal-Wallis chi-squared = 18.992, df = 4, p-value = 0.0007887
# Significant, run post-hoc tests
pairwise.wilcox.test(meta_samples$Qubit_Conc, meta_samples$kit, p.adj = "none")
##
## Pairwise comparisons using Wilcoxon rank sum exact test
##
## data: meta_samples$Qubit_Conc and meta_samples$kit
##
## BiOstic Blood&Tissue Promega PowerSoil
## Blood&Tissue 0.15997 - - -
## Promega 0.00316 0.00067 - -
## PowerSoil 0.34736 0.03872 0.01341 -
## UltraClean 0.14865 0.00829 0.05022 0.47758
##
## P value adjustment method: none
There is a significant difference between the amount of DNA extracted from each sample and the DNA kit used. Specifically, there was a significant difference in the amount of DNA extracted from the Blood and Tissue Kit and the Promega Maxwell kit, Promega Soil, and UltraClean kit.
Note - sample 12 was removed from the statistical analysis since it is the negative extraction blank and not an actual sample.
Next, we evaluate if the sequencing depth (number of sequencing reads) vary by kit used.
# **Figures:** Figure 1B and Supplemental Figure 2
fig1_B <- meta %>% filter(sample_number != "PBS") %>%
ggplot(., aes(x=kit, y = TotalReads)) +
theme_classic() +
geom_boxplot(aes(fill = kit)) +
labs(y = 'Total sequences', x = 'Kit', fill = "Kit") +
ggtitle('B. Sequencing Depth') +
scale_fill_manual(values = wes_palette("IsleofDogs1")) +
theme(
axis.text.x = element_text(angle = 75, hjust = 1),
text = element_text(size=8),
plot.title = element_text(size = 8),
plot.title.position = "plot",
axis.title=element_text(size=8),
legend.text = element_text(size = 8),
legend.key.size = unit(1,"line"))
# plot Total Reads by kit
fig_S1B <- meta %>%
ggplot(., aes(x = kit, y = TotalReads, fill = kit)) +
geom_bar(stat = 'identity') + facet_grid(~sample_number) +
ggtitle('Number of sequences by DNA extraction kit per sample)') +
labs(y = 'Total sequences', x = 'Extraction kit') +
scale_fill_manual(values = wes_palette("IsleofDogs1")) +
theme_classic() +
theme(
axis.text.x = element_text(angle = 75, hjust = 1),
text = element_text(size=8), plot.title = element_text( size = 8 ),
legend.text = element_text(size = 6),
legend.key.size = unit(.6,"line"))
fig1_B + fig_S1B
# Run kruskal-wallis test
kruskal.test(TotalReads ~ kit, data = meta_samples)
##
## Kruskal-Wallis rank sum test
##
## data: TotalReads by kit
## Kruskal-Wallis chi-squared = 1.6148, df = 4, p-value = 0.8061
# not significant
There was no significant difference between the number of reads per sample and DNA extraction kit used, though the Promega Maxwell, Promega Soil, and Ultraclean did have less than 50% of the total number of reads compared to the Biostic and DNEasy Blood and Tissue kits for a subset of samples (3, 2, and 3 respectively).
Next, we evaluate alpha diversity metrics - summary measures commonly used in microbiome science to compare the numbers and distribution of bacteria in a given sample. We will use the number of observed genera, Shannon index, and Inverse Simpson index and identify if there are significant differences depending on which kit was used for DNA extraction.
# **Figures** - Alpha diversity
fig2 <- plot_richness(ps_genus, x = "kit", measures = c("Observed", "Shannon", "InvSimpson")) +
scale_fill_manual(values = wes_palette("IsleofDogs1")) +
geom_boxplot(aes(fill = kit)) +
theme_classic() +
theme(
axis.text.x = element_text(angle = 75, hjust = 1),
text = element_text(size=8), plot.title = element_text( size = 8 ),
legend.text = element_text(size = 6),
legend.key.size = unit(.6,"line")) +
labs(fill = "Kit")
fig2
alpha_measures <- estimate_richness(ps_genus, measures = c("Observed", "Shannon", "InvSimpson"))
test_frame <- data.frame(observed=alpha_measures$Observed, shannon = alpha_measures$Shannon, inv_simpson = alpha_measures$InvSimpson, kit = factor(sample_data(ps_genus)$kit))
# test for normality
shapiro.test(test_frame$observed)
##
## Shapiro-Wilk normality test
##
## data: test_frame$observed
## W = 0.96335, p-value = 0.0687
shapiro.test(test_frame$shannon)
##
## Shapiro-Wilk normality test
##
## data: test_frame$shannon
## W = 0.96742, p-value = 0.1089
shapiro.test(test_frame$inv_simpson)
##
## Shapiro-Wilk normality test
##
## data: test_frame$inv_simpson
## W = 0.75634, p-value = 1.282e-08
kruskal.test(observed ~ kit, data = test_frame)
##
## Kruskal-Wallis rank sum test
##
## data: observed by kit
## Kruskal-Wallis chi-squared = 4.9591, df = 4, p-value = 0.2915
kruskal.test(shannon ~ kit, data = test_frame)
##
## Kruskal-Wallis rank sum test
##
## data: shannon by kit
## Kruskal-Wallis chi-squared = 4.3279, df = 4, p-value = 0.3634
kruskal.test(inv_simpson ~ kit, data = test_frame)
##
## Kruskal-Wallis rank sum test
##
## data: inv_simpson by kit
## Kruskal-Wallis chi-squared = 3.7814, df = 4, p-value = 0.4364
There are no significant differences in alpha diversity measured by the number of observed genera, the Shannon index, or the inverse Simpson index per sample across kits.
To get an idea of the types and distribution of bacteria in each sample, we will visualize the data as stacked bar plots (normalized to relative abundance) and summarize a beta diversity analysis.
For the stacked bar plots, we will use a coloring scheme that uses the hierarchical structure of microbiome data and color each Phylum by a “parent” color, and the top 4 genera within each Phylum as a different shade of the parent color. Finally, all other genera that are grouped in an “other” category within each Phylum. To visually see the differences/similarities between kits, we facet (group) the stacked bar plots by subject (each subject in theory should have a very similar microbiome profile regardless of kit).
ps_norm_genus_top <- create_group(ps_norm_genus, color_map)
fig3_B <- plot_bar(ps_norm_genus_top, x = "kit", fill = 'group') +
scale_fill_manual( values = color_map$hex, breaks = color_map$group) +
geom_bar(stat = "identity") +
facet_wrap(~sample_number, ncol = 6) +
theme_classic() +
theme(
axis.text.x = element_text(angle = 75, hjust = 1),
text = element_text(size=8),
plot.title = element_text( size = 8 ),
plot.title.position = "plot",
legend.text = element_text(size = 6),
legend.key.size = unit(.5,"line"),
strip.text.x = element_text(size = 6, margin = margin(.05, 0, .05, 0, "cm" ))) +
labs(fill = "Phylum-Genus") +
xlab('Kit') +
guides(fill = guide_legend(ncol = 1)) +
ggtitle('B. Relative Abundance')
fig3_B
To further evaluate and summarize how different each microbiome sample is from each other, we will use the Bray-Curtis dissimilarity metric and MDS. We will test for significant effects using adonis.
out.wuf <- ordinate(ps, method = "MDS", distance = "bray")
asv_table <- as.data.frame(ps@otu_table)
meta <- as.data.frame(as.matrix(sample_data(ps)))
# **Figures** Beta Diversity
# define colors
color_palette <- c(`1` ="#FF7F00", `2` ="#1F78B4" , `3` = "#B2DF8A", `4` = "#8dd3c7", `5` = "#FB9A99" ,`6` = "#FEE391", `7`= "#6A3D9A", `8` ="#A6CEE3" , `9` = "#c2a5cf", `10`= "#FDBF6F", `11` = "#E31A1C", `PBS` = "#BABABA")
fig3_A <- plot_ordination(ps, out.wuf, color = "sample_number") +
ggtitle('A. Beta Diversity') +
geom_point(size = 1) +
scale_color_manual(values = color_palette, name = 'Sample Number') +
theme_light() +
theme(
text = element_text(size=8),
plot.title = element_text( size = 8 ),
plot.title.position = "plot",
legend.text = element_text(size = 6),
legend.position="right") +
guides(color = guide_legend(ncol = 2))
fig3_A
adonis (formula = asv_table ~ sample_number, data = meta)
##
## Call:
## adonis(formula = asv_table ~ sample_number, data = meta)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## sample_number 11 18.3253 1.66593 10.273 0.70187 0.001 ***
## Residuals 48 7.7838 0.16216 0.29813
## Total 59 26.1091 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis (formula = asv_table ~ kit, data = meta)
##
## Call:
## adonis(formula = asv_table ~ kit, data = meta)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## kit 4 1.4224 0.35561 0.79228 0.05448 0.87
## Residuals 55 24.6866 0.44885 0.94552
## Total 59 26.1091 1.00000
Beta diversity did not significantly differ by kit (p=0.870), but did by subject (p = 0.001), which is expected since individuals typically have distinct microbiomes from each other. Further, in the MDS plots, it is obvious that, for most subjects, the samples clustered together regardless of kit used for DNA extraction. The exception to this was sample 7, which had a great deal of variation across the DNA kit used.
The initial overview of the phyloseq object shows that some samples have approximately equal read depths across all kits (samples 1, 3, 5, 6, 8, 10), but others have quite varying depths (2, 4, 7, 9, 11). One sample has a low depth across all kits (12, PBS negative control).
We are interested in whether there is a difference in gram positive or gram negative bacteria represented from different kits. Note, for this analysis we annotated the tax_table with a column indicating if each genera was gram positive/gram negative/ gram variable/ or unknown.
## Remove blanks from analysis
ps_norm <- subset_samples(ps_norm, sample_number != "PBS")
# Agglomerate tax_table at the gram level
ps_norm_gram <- tax_glom(ps_norm, 'gram')
# create phyloseq objects for each subset
ps_gram_pos <- subset_taxa(ps_norm_gram, gram == 'positive')
ps_gram_neg <- subset_taxa(ps_norm_gram, gram == 'negative')
ps_gram_other <- subset_taxa(ps_norm_gram, gram != 'positive')
ps_gram_other <- subset_taxa(ps_gram_other, gram != 'negative')
# summarize into a datatable
gram_summary <- cbind(positive = sample_sums(ps_gram_pos),negative = sample_sums(ps_gram_neg), other = sample_sums(ps_gram_other))
ps_gram_summary <- phyloseq(otu_table(gram_summary, taxa_are_rows = FALSE), sample_data(meta))
ps_gram_melt <- psmelt(ps_gram_summary)
gram_positive <- ps_gram_melt %>% filter(OTU == 'positive')
print('Test for differences in gram positive bacteria by kit')
## [1] "Test for differences in gram positive bacteria by kit"
kruskal.test(Abundance ~ kit, gram_positive)
##
## Kruskal-Wallis rank sum test
##
## data: Abundance by kit
## Kruskal-Wallis chi-squared = 6.0283, df = 4, p-value = 0.197
pairwise.wilcox.test(gram_positive$Abundance, gram_positive$kit, p.adj = "none")
##
## Pairwise comparisons using Wilcoxon rank sum exact test
##
## data: gram_positive$Abundance and gram_positive$kit
##
## BiOstic Blood&Tissue PowerSoil Promega
## Blood&Tissue 0.332 - - -
## PowerSoil 0.332 0.797 - -
## Promega 0.013 0.171 0.300 -
## UltraClean 0.606 0.748 0.748 0.101
##
## P value adjustment method: none
pairwise.wilcox.test(gram_positive$Abundance, gram_positive$kit, p.adj = "fdr")
##
## Pairwise comparisons using Wilcoxon rank sum exact test
##
## data: gram_positive$Abundance and gram_positive$kit
##
## BiOstic Blood&Tissue PowerSoil Promega
## Blood&Tissue 0.55 - - -
## PowerSoil 0.55 0.80 - -
## Promega 0.13 0.55 0.55 -
## UltraClean 0.80 0.80 0.80 0.51
##
## P value adjustment method: fdr
fig4 <- ps_gram_melt %>%
filter(OTU == 'positive') %>%
ggplot( aes(x= kit,
y=Abundance,
fill = kit)) +
geom_boxplot() +
ggtitle('Abundance of gram positive bacteria') +
scale_fill_manual(values = wes_palette("IsleofDogs1"))+
theme_classic() +
theme(
axis.text.x = element_text(angle = 75, hjust = 1),
text = element_text(size=8), plot.title = element_text( size = 8 ),
legend.text = element_text(size = 6),
legend.key.size = unit(.6,"line"))
fig4
Based on the above results, it does appear that some DNA kits have increased variability and the the Promega kit recovered fewer Gram positive bacteria than the other kits, though this was not significantly different (Kruskal–Wallis, p = 0.197).
We are also interested in if there are differences between bacteria that are known to be uropathogens or are potential commensals.
ps_Lactobacillus <- subset_taxa(ps_norm, Genus == "Lactobacillus")
ps_Prevotella <- subset_taxa(ps_norm, Genus == "Prevotella")
ps_Gardnerella <- subset_taxa(ps_norm, Genus == "Gardnerella")
ps_Escherichia <- subset_taxa(ps_norm, Genus == "Escherichia/Shigella")
ps_Enterococcus <- subset_taxa(ps_norm, Genus == "Enterococcus")
ps_Corynebacterium <- subset_taxa(ps_norm, Genus == "Corynebacterium")
ps_Klebsiella <- subset_taxa(ps_norm, Genus == "Klebsiella")
ps_Staphylococcus <- subset_taxa(ps_norm, Genus == "Staphylococcus")
select_summary <- cbind(Lactobacillus = sample_sums(ps_Lactobacillus),Prevotella = sample_sums(ps_Prevotella), Gardnerella = sample_sums(ps_Gardnerella), Escherichia = sample_sums(ps_Escherichia), Staphylococcus = sample_sums(ps_Staphylococcus), Klebsiella = sample_sums(ps_Klebsiella), Corynebacterium = sample_sums(ps_Corynebacterium), Enterococcus = sample_sums(ps_Enterococcus) )
ps_select_taxa <- phyloseq(otu_table(select_summary, taxa_are_rows = FALSE), sample_data(meta))
ps_taxa_melt <- psmelt(ps_select_taxa)
taxa_summary <- ps_taxa_melt %>%
group_by(OTU, kit) %>%
summarise(
sd = sd(Abundance, na.rm = TRUE),
mean_abd = mean(Abundance),
med_abd = median(Abundance))
fig5 <- ggplot(taxa_summary, aes(kit,mean_abd , fill = kit)) +
geom_bar(stat = "identity", color = "black") +
geom_errorbar(aes(ymin = mean_abd, ymax = mean_abd+sd), width = 0.2) +
scale_fill_manual(values = wes_palette("IsleofDogs1")) +
facet_wrap(~OTU, scales = 'free_y', ncol= 4) +
ggtitle('Average relative abundance of select urobiome genera by kit ') +
labs(y = 'Relative Abundance', x = 'Kit', fill = "Kit") +
theme_classic() +
theme(
axis.text.x = element_text(angle = 75, hjust = 1, size = 10),
plot.title = element_text( size = 10 ),
legend.key.size = unit(.7,"line"),
strip.text.x = element_text(size = 8, margin = margin(.1, 0, .1, 0, "cm" ), face = "italic"))
fig5
Calculate stats of differences of individual bacteria per kit
ind_bact_test <- ps_taxa_melt %>%
select(OTU, Abundance, kit) %>%
nest(-OTU) %>%
mutate(model = map(data, ~kruskal.test(Abundance ~ kit, .)),
tidy = map(model, broom::tidy)) %>%
select(OTU, tidy) %>%
unnest(.)
ind_bact_test %>%
select(OTU, p.value) %>%
kable(caption = 'Kruskal-Wallis test of bacterial abundance by kit') %>%
kable_styling()
| OTU | p.value |
|---|---|
| Lactobacillus | 0.9724062 |
| Escherichia | 0.3146390 |
| Gardnerella | 0.9914122 |
| Corynebacterium | 0.2200768 |
| Prevotella | 0.9840616 |
| Staphylococcus | 0.4374412 |
| Klebsiella | 0.4604939 |
| Enterococcus | 0.1021204 |
# explore select individual bacteria with pair-wise comparisons
ps_taxa_melt %>%
filter(OTU == "Enterococcus") %>%
with(., pairwise.wilcox.test(Abundance, kit, p.adjust.method = "none"))
##
## Pairwise comparisons using Wilcoxon rank sum test with continuity correction
##
## data: Abundance and kit
##
## BiOstic Blood&Tissue PowerSoil Promega
## Blood&Tissue 0.581 - - -
## PowerSoil 0.115 0.361 - -
## Promega 0.018 0.079 0.329 -
## UltraClean 0.105 0.302 0.788 0.509
##
## P value adjustment method: none
Overall, we found a few differences in recovery of specific bacteria, though none of these findongs were significant in our small sample. The Promega kit tended to recover fewer Gram positive bacteria (Corynebacteria, Enterococci, and Staphylococci) and the PowerSoil kit recovered more Escherichia and Klebsiella compared to other kits.