ITS amplicon computational workflow

Geoff Zahn

Nov 10, 2023

ITS amplicon computational workflow

DOI

dx.doi.org/10.17504/protocols.io.q26g7pw63gwz/v1

Geoff Zahn¹

¹Utah Valley University

Geoff Zahn

DOI: dx.doi.org/10.17504/protocols.io.q26g7pw63gwz/v1

Protocol Citation: Geoff Zahn 2023. ITS amplicon computational workflow . protocols.io https://dx.doi.org/10.17504/protocols.io.q26g7pw63gwz/v1

Manuscript citation:

Zahn, G. (2022). Marker Genes (16S and ITS) Protocol for Plant Microbiome Analyses. BIO-PROTOCOL, 12(8). https://doi.org/10.21769/BioProtoc.4395

  

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Protocol status: Working

We use this protocol and it's working

Created: November 10, 2023

Last Modified: November 10, 2023

Protocol Integer ID: 90779

Abstract

Many research questions in plant science depend on surveys of the plant microbiome. When the questions depend on answering "who is there" rather than "what are they doing," the most cost-effective method for interrogating microbiomes is often via targeted meta-amplicon surveys. There are numerous platforms for processing and analyzing meta-amplicon data sets, but here we will look at a flexible, reproducible, and fully customizable pipeline in the R environment. This approach has the benefit of being able to process and analyze your data in the same setting, without moving back and forth between standalone platforms, and further benefits from being completely flexible in terms of analyses and visualizations you can produce, without being limited to pre-selected tools available in point-and-click analysis engines, such as QIIME, Galaxy, or MG-RAST.

Attachments

Zahn - 2022 - Marker...

15.3MB

Guidelines

All scripts for this computational "recipe" can be found at: https://github.com/Bio-protocol/metaamplicon-recipe

Materials

Software 
cutadapt (Martin, M., 2011; version 2.10; https://cutadapt.readthedocs.io/en/stable/changes.html#v2-1-2019-03-15)
ITSxpress (Rivers et al., 2018; version 1.0; https://github.com/usda-ars-gbru/itsxpress)
R (R Core Team, 2017; version 3.6.3; https://www.R-project.org/)
tidyverse R package (Wickham et al., 2019; version 1.3.0; https://www.tidyverse.org/)
DADA2 R package (Callahan et al., 2016; version 1.14.1; https://benjjneb.github.io/dada2/)
decontam R package (Davis et al., 2018; version 1.6.0; https://benjjneb.github.io/decontam/)
phyloseq R package (McMurdie and Holmes, 2013; version 1.30.0; https://joey711.github.io/phyloseq/)
Biostrings R package (Pagès et al., 2021; version 2.54.0; http://www.bioconductor.org/packages/release/bioc/html/Biostrings.html)
phangorn R package (Schliep, 2011; version 2.2.5; https://CRAN.R-project.org/package=phangorn)
msa R package (Bodenhofer et al., 2015; version 1.18.0; https://bioconductor.org/packages/release/bioc/html/msa.html)
ShortRead R package (Morgan et al., 2009; version 1.44.3; https://bioconductor.org/packages/release/bioc/html/ShortRead.html)
corncob R package (Martin, B. D. et al., 2020; version 0.2.0; https://CRAN.R-project.org/package=corncob)
vegan R package (Okansen et al., 2016; version 2.6.0; https://CRAN.R-project.org/package=vegan)
patchwork R package (Pedersen, 2020; version 1.0.1; https://CRAN.R-project.org/package=patchwork)

Input data 
Input data are the demultimlexed fastq reads from sequenced bacterial 16S amplicons. There should be two files for each  experimental sample that  was sequenced; a file for forward reads, and a file for reverse reads.  This workflow is written for the commonly used Illumina sequencing platform, but IonTorrent sequences can be used with some slight modification (noted in the workflow).
Example data consists of custom subsets of host-associated bacterial 16S amplicons  taken from 46 seagrass samples (including PCR negatives). Files have been truncated and modified from their original form to allow for faster computational times, while still showing significant differences in community composition between sampling groups. They no longer represent any real study system, but are meant to be used as a teaching tool. Raw and intermediate data are provided in the GitHub repository associated with this publication.

Before start

All scripts for this computational "recipe" can be found at: https://github.com/Bio-protocol/metaamplicon-recipe

Remove primers

The first step is to remove any PCR primers that may still be present on your reads. Often, sequencing centers remove these by default, but it's a good idea to check. The R script "00_Remove_Primers.R" walks you through this process. In short, you provide the PCR primers
that we used for generating our amplicons and then use the cutadapt tool to search for their presence in all possible orientations. They are then removed and we can proceed with quality filtration.
This process first removes all reads with ambiguous (N) bases, and then removes primer sequences and adaptors from remaining reads. The newly modified reads are stored in a separate directory (in this case ./data/filtN/ ). When working with bacterial 16S amplicons, this step can generally be accomplished by simply trimming the first N bases from both forward and reverse reads. However, using cutadapt is a more generalized approach that also accounts for mis-oriented reads, does not require you to know ahead of time if your primers are present or not, and is thus recommended.

Extract ITS region

It’s important that we extract the ITS region of interest at this point. The problem is that while ITS1 or ITS2 are good at predicting fungal taxonomy, the primers used to amplify these regions sit inside highly conserved flanking regions (Gardes and Bruns, 1993). Since ITS lengths can vary so greatly (Bengtsson-Palme et al., 2013), keeping these conserved regions in our sequences can greatly bias downstream taxonomic assignment algorithms. A solution exists though, in the itsxpress software (Rivers et al., 2018), which can identify and trim these conserved regions from the ends of our ITS reads. This is typically run via the command line, but we can do it from within R. The supplementary file “00_Trim_ITS_Region.R” gives an thorough walkthrough example of how to extract the ITS1 region.

Quality filtration

Once you are sure your primers aren’t present in your reads (or flanking conserved regions if analyzing fungal ITS data), it’s time to do some quality control work to filter out ambiguous and low-quality reads, and to trim reads where quality begins to suffer. The R script “01_Process_Raw_16S_Reads.R” goes through this step by step, but it’s worth our time to explain what’s going on. The workhorse software package for this and subsequent steps is DADA2 (Callahan et al., 2016).

We first parse our files generated by cutadapt (and itsxpress, in the case of fungal amplicons). The only filtration done on these so far is to remove ambiguous reads and any remaining primer sequences. The file parsing needs to account for separate forward and reverse reads, and it does so by matching patterns in the file names. In our example data forward reads are denoted by “…pass_1” and reverse reads are denoted by “…pass_2” in their names. This can vary between sequencing centers, and your reads may have different patterns such as “R1” and “R2” or “F” and “R.” Be sure to modify the example code to match your forward and reverse reads accordingly where noted.

The plotQualityProfile() function from the DADA2 package takes a look inside these sequence files and allows you to make an informed judgement about how to filter and trim your raw reads. For example, in Fig. 2, we are looking at the separate quality profiles of the forward and reverse reads of two samples. The X-axis shows read length, and the Y-axis shows summary information of quality scores for every read in the file. The green line is the mean, the orange line is the median, and the dashed orange lines are the 25th and 75th quantiles. It’s up to you to determine where good cutoff points might be for the forward and reverse reads. Keep in mind that in order to merge the reads later on, they will need to have sufficient overlap (typically >50 bases).

If you are working with fungal ITS reads, it is typical that reverse reads aren't of sufficient quality to  merge with forward reads. Especially with variable ITS lengths, it is somewhat common practice to forego using reverse reads, and to just process the forward reads output by itsxpress (Darcy et al.,
2020; Tipton et al., 2019, 2021). 

The trimming and filtration step will discard some reads; how many depends on how stringently you set the parameters. If you notice that the majority of your reads are being discarded, you might try
relaxing the maxEE parameter a bit.

ASV inferrence

Now that we have a set of quality-controlled sequence files, we’re ready to perform a “denoising” step that can predict and correct likely sequencing errors. This is an important step when analyzing exact “amplicon sequence variants” (ASVs) as opposed to “operational taxonomic units” (OTUs). There is always a bit of “noise” in sequence data, representing sequence errors. OTUs were, in part, conceived of as a way to deal with this constant threat of sequencing noise. By clustering highly similar sequences together, it accounts for small amounts of variation. However, much of that variation is real and informative (Callahan et al., 2017; Tipton et al., 2021). Methods now exist to discover and correct sequence errors, allowing us to use ASVs. The advantages include finer taxonomic resolution, the potential to discover patterns that might be hidden by artificially clustering sequences together, and most importantly, the ability to easily combine data from multiple studies. Since each ASV is an actual sequence, rather than an undefined cloud of sequences (as with OTUs), they are directly comparable outside of a given study. This facilitates reproducibility and re-use such as in meta-analyses.

The DADA2 R package gives us a set of tools for inferring these ASVs. First, we build a mathematical profile of error rates along the length of our reads. In basic terms, we can think of this error profile as being a mathematical representation of our quality profile plots (Figure 3). The learnErrors() function is a machine-learning tool that builds an error model based on your data. Figure 4 shows this error model. What you’re looking for is a decent fit between observed and estimated errors, and a decline in error as quality scores increase; It doesn’t need to be a perfect fit. If your model looks reasonable (as ours does in Figure 4), you are ready to use that error model to correct sequence errors. A poorly fit error model (indicated by large discrepancies between observed points and the model fit line) may be a result of binned quality scores or a lack of ‘diversity’ of quality scores. In this case, be sure you are using the raw sequence read files for creating the error profile, not reads that have been pre-processed in any way.

The dada() function performs denoising based on the estimated error model. In essence, it is performing statistical tests that determine whether a given sequence was observed too many times to have been caused by sequencing errors in currently inferred sequences (Rosen et al., 2012). The dada() function is run separately on forward and reverse reads and the output is a set of sequence variants that were inferred from the “noisy” input sequences. From this point, we are ready to merge our forward and reverse reads and construct a sequence table. These tasks are accomplished with the mergePairs() and makeSequenceTable() functions, respectively.

Remove chimeras 

Chimeric sequences result from incomplete extension during PCR. Partial amplicons can serve as "false templates" for primers to bind to, resulting in reads that are composed of more than one true
sequence. Chimera detection involves evaluating distinct matches between the left-side and right-side of reads and potetial "parent" source reads. In practice, this detection and removal step can be
conducted with the removeBimeraDenovo() function as shown in the R script 01_Process_Raw_16S_Reads.R."

Remove contaminant sequences

While sadly not universally practiced, including and sequencing negative controls is a crucial step when performing meta-amplicon studies. Often, these control samples are blank DNA extractions that are then sequenced. This allows us to discover DNA sequences that were present in the lab environment that do not necessarily represent meaningful diversity in our real samples. The decontam package (Davis et al., 2018) has been implemented to deal with these negative controls, using them to detect likely contaminant sequences. The way we will use this package is by giving the isContaminant() function our newly created sequence table and a list of which samples are negative controls. Your sample metadata spreadsheet likely contains study-relevant information about each sequenced sample, but should also include a column denoting whether a sample is a negative control (Table 2 shows an example metadata sheet formatted correctly). Sequences with increased prevalence in control samples can then be identified and removed. Once potential contaminants are removed, we can then remove the negative control samples from further downstream steps since their job is done.

Assign taxonomy

Assigning taxonomy to DNA barcodes is a complicated subject. The success and accuracy of any assignments depend on the algorithms and databases that you use. In fact, with phylogeny-independent taxonomic assignments, the current limiting factors are the completeness and accuracy of the databases to which we match our sequences. With this in mind, it is critical that you carefully consider what methods to use. The RDP Naive Bayesian Classifier algorithm (Wang et al., 2007) is one such algorithm that is widely used and has several advantages. First, it does not rely on sequence alignments, which makes it suitable for fungal ITS studies. Second, it is incredibly fast compared to algorithms such as BLAST. Third, it provides bootstrapped confidence values to all assignments and allows matching to the “least common ancestor,” reducing the incidence of false positive matches. Its performance for species-level taxonomic assignments of bacteria is lacking, and new information has shown that exact 100% matching is the only justifiable method for assigning species to bacterial ASVs (Edgar, 2018). A selection of properly formatted reference files for popular taxonomic databases are available at http://benjjneb.github.io/dada2/training.html.  	Here, we use two complementary methods to assign taxonomy to our reads: 1) the RDP Classifier using the RDP 16S Training Set database, implemented via the assignTaxonomy() function, and 2) exact 16S matching for species-level assignments implemented via the addSpecies() function. The “Assign Taxonomy” section of “01_Process_Raw_16S_Reads.R” script covers these steps. These functions take our sequence table as input and return a data frame matching our ASVs to taxonomy at the deepest levels that could be assigned unambiguously.

If you are using fungal ITS reads, taxonomic assignment is essentially the same, but you will need to use a eukaryotic database such as the UNITE Eukaryotic Database (DOI: 10.15156/BIO/2938069) instead of a bacterial 16S database, and the addSpecies() function should be omitted.

The last step in this script is to construct a phyloseq object in which to store your ASV table along with your sample metadata and taxonomic assignments. The phyloseq package (McMurdie and Holmes, 2013) provides a data structure that can combine a sequence abundance table (like our ASV table), taxonomic information, sample metadata, and a phylogenetic tree into one object that makes working with these disparate tables much simpler.

Protocol references

	Abarenkov,
	K., Zirk, A., Piirmann, T., Pöhönen, R., Ivanov, F., Nilsson, R.
	H. and
	Kõljalg, U. (2020). UNITE
	general FASTA release for eukaryotes
	[Application/gzip]. UNITE Community.
	https://doi.org/10.15156/BIO/786370

	

	Afgan,
	E., Baker, D., Batut, B., van den Beek, M., Bouvier, D.,
	Čech, M., Chilton, J., Clements, D., Coraor, N., Grüning, B. A.,
	Guerler, A., Hillman-Jackson, J., Hiltemann, S., Jalili, V., Rasche,
	H., Soranzo, N., Goecks, J., Taylor, J., Nekrutenko, A.
	and Blankenberg, D. (2018).
	The
	Galaxy platform for accessible, reproducible and collaborative
	biomedical analyses: 2018 update. Nucleic
	Acids Res 46(W1): W537-W544. 
	

	

	Bengtsson-Palme,
	J., Ryberg, M., Hartmann, M., Branco, S., Wang, Z., Godhe, A., De
	Wit, P., Sánchez-García, M., Ebersberger, I., de Sousa, F., Amend,
	A. S., Jumpponen, A., Unterseher, M., Kristiansson, E., Abarenkov,
	K., Bertrand, Y. J. K., Sanli, K., Eriksson, K. M., Vik, U., Veldre,
	V. and Nilsson, R. H. (2013). Improved
	software detection and extraction of ITS1 and ITS2 from ribosomal
	ITS sequences of fungi and other eukaryotes for analysis of
	environmental sequencing data. Methods
	Ecol Evol 4(10): 914-919.

	

	Bodenhofer,
	U., Bonatesta, E., Horejš-Kainrath, C. and Hochreiter, S. (2015).
	msa:
	An R package for multiple sequence alignment. Bioinformatics
	31(24): 3997-3999.
	
	

	

	Bolyen,
	E., Rideout, J. R., Dillon, M. R., Bokulich, N. A., Abnet, C. C.,
	Al-Ghalith, G. A., Alexander, H., Alm, E. J., Arumugam, M., Asnicar,
	F., Bai, Y., Bisanz, J. E., Bittinger, K., Brejnrod, A., Brislawn,
	C. J., Brown, C. T., Callahan, B. J., Caraballo-Rodríguez, A. M.,
	Chase, J., Cope, E. K., Silva, R. D., Diener, C., Dorrestein, P. C.,
	Douglas, G. M., Durall, D. M., Duvallet, C., Edwardson, C. F.,
	Ernst, M., Estaki, M., Fouquier,
	J., Gauglitz, J. M., Gibbons, S. M., Gibson, D. L., Gonzalez, A.,
	Gorlick, K., Guo, J., Hillmann, B.,
	Holmes, S., Holste, H., Huttenhower, C., Huttley,
	G. A., Janssen, S., Jarmusch, A. K., Jiang, L., Kaehler, B. D.,
	Kang, K. B., Keefe, C. R., Keim, P., Kelley, S. T., Knights, D.,
	Koester, I., Kosciolek, T., Kreps, J., Langille, M. G. I., Lee, J.,
	Ley, R., Liu, Y. X., Loftfield, E., Lozupone, C., Maher, M.,
	Marotz, C., Martin, B. D., McDonald, D.,
	McIver, L. J., Melnik, A. V., Metcalf, J. L., Morgan, S. C., Morton,
	J. T., Naimey, A. T., Navas-Molina, J. A., Nothias, L. F.,
	Orchanian, S. B., Pearson, T., Peoples, S. L., Petras, D., Preuss,
	M. L., Pruesse, E., Rasmussen, L. B., Rivers, A., Robeson II, M. S.,
	Rosenthal, P., Segata, N., Shaffer, M., Shiffer, A., Sinha, R.,
	Song, S. J., Spear, J. R., Swafford, A. D., Thompson, L. R., Torres,
	P. J., Trinh, P., Tripathi, A., Turnbaugh, P. J., Ul-Hasan, S., van
	der Hooft, J. J. J., Vargas, F., Vázquez-Baeza, Y., Vogtmann, E.,
	von Hippel, M., Walters, W., Wan, Y., Wang, M., Warren, J.,
	Weber, K. C., Williamson, C. H. D.,
	Willis, A. D., Xu, Z. Z., Zaneveld, J. R., Zhang, Y., Zhu, Q.,
	Knight, R. and Caporaso,
	J. G. (2019). Reproducible,
	interactive, scalable and extensible microbiome data science using
	QIIME 2. Nat Biotechnol
	37(8): 852-857. 
	

	

	Bryan,
	J. (2017). Project-oriented
	workflow.
	https://www.tidyverse.org/blog/2017/12/workflow-vs-script/
	
	

	

	Busby,
	P. E., Peay, K. G., & Newcombe, G. (2016). Common foliar fungi
	of Populus trichocarpa modify Melampsora rust disease severity. New
	Phytologist, 209(4),
	1681–1692. https://doi.org/10.1111/nph.13742

	

	Callahan,
	B. J., McMurdie, P. J. and Holmes, S. P. (2017). Exact
	sequence variants should replace operational taxonomic units in
	marker-gene data analysis. ISME
	J 11: 2639-2643.

	

	Callahan,
	B. J., McMurdie, P. J., Rosen, M. J., Han, A. W., Johnson, A. J. A.
	and Holmes, S. P. (2016). DADA2:
	High-resolution sample inference from Illumina amplicon data.
	Nat Methods
	13(7): 581-583. 
	

	

	Darcy,
	J. L., Swift, S. O. I., Cobian, G. M., Zahn, G. L., Perry, B. A. and
	Amend, A. S. (2020). Fungal
	communities living within leaves of native Hawaiian dicots are
	structured by landscape-scale variables as well as by host plants.
	Mol Ecol
	29(16):
	3102-3115. 
	

	

	Davis,
	N. M., Proctor, D. M., Holmes, S. P., Relman, D. A. and Callahan, B.
	J. (2018). Simple
	statistical identification and removal of contaminant sequences in
	marker-gene and metagenomics data. Microbiome
	6(1): 226. 
	

	

	Edgar,
	R. C. (2004). MUSCLE:
	a multiple
	sequence alignment method with reduced time and space complexity.
	BMC Bioinformatics
	5(1): 113. 
	

	

	Fernandes,
	A. D., Macklaim, J. M., Linn, T. G., Reid, G. and Gloor, G. B.
	(2013). ANOVA-like
	differential expression (ALDEx) analysis for mixed population
	RNA-Seq.
	PLOS ONE
	8(7): e67019. 
	

	

	Gardes,
	M. and Bruns, T. D. (1993).
	ITS primers with enhanced specificity for basidiomycetes-application
	to the identification of mycorrhizae and rusts.
	Mol Ecol
	2(2):
	113-118.

	

	Gloor,
	G. B., Macklaim, J. M., Pawlowsky-Glahn, V. and Egozcue, J. J.
	(2017). Microbiome
	datasets
	are compositional: and this is not optional.
	Front Microbiol
	8: 1-6. 
	

	

	Love,
	M. I., Huber, W. and Anders, S. (2014). Moderated
	estimation of fold change and dispersion for RNA-seq data with
	DESeq2. Genome Biol 15(12):
	550.

	

	Lozupone,
	C. and Knight, R. (2005). UniFrac:
	a new phylogenetic method for comparing microbial communities.
	Appl Environ Microbiol
	71(12): 8228-8235. 
	

	

	Martin,
	B. D., Witten, D. and Willis, A. D. (2020). Modeling
	microbial abundances and dysbiosis with β-binomial
	regression. Ann
	Appl Stat 14(1): 94-115.
	
	

	

	Martin,
	M. (2011). Cutadapt
	removes adapter sequences from high-throughput sequencing reads.
	EMBnet J
	17(1): 10-12.
	
	

	

	McMurdie,
	P. J. and Holmes, S. (2013). phyloseq:
	an R package for reproducible interactive analysis and graphics of
	microbiome census data.
	PLOS ONE
	8(4): e61217. 
	

	

	McMurdie,
	P. J. and Holmes, S. (2014). Waste
	not, want not: why rarefying microbiome data is inadmissible.
	PLoS Comput Biol
	10(4):
	e1003531. 
	

	

	Meyer,
	F., Paarmann, D., D’Souza, M., Olson, R., Glass, E. M., Kubal, M.,
	Paczian, T., Rodriguez, A., Stevens, R., Wilke, A., Wilkening, J.
	and Edwards, R. A. (2008). The
	metagenomics RAST server – a public resource for the automatic
	phylogenetic and functional analysis of metagenomes. BMC
	Bioinformatics 9(1): 386.

	

	Morgan,
	M., Anders, S., Lawrence, M., Aboyoun, P., Pages, H. and Gentleman,
	R. (2009). ShortRead:
	a bioconductor package for input, quality assessment and exploration
	of high-throughput sequence data. Bioinformatics
	25(19):
	2607-2608.
	
	

	

	Okansen,
	J., Blanchet, F. G., Friendly, M., Kindt, R., Legendre, P., McGlinn,
	D., Minchin, P. R., O’Hara, R. B., Simpson, G. L., Solymos, P.,
	Stevens, M. H. H., Szoecs, E. and Wagner, H. (2016). vegan:
	Community   Ecology Package.
	(2.4-0) [Computer software]. 
	

	

	Pagès,
	H., Aboyoun, P., Gentleman, R. and DebRoy, S. (2021). Biostrings:
	Efficient manipulation of biological strings
	(2.58.0) [Computer software]. Bioconductor version: Release (3.12). 
	

	

	Pedersen,
	T. L. (2020). patchwork:
	The Composer of Plots
	(1.1.1) [Computer software]. 
	

	

	Team,
	R. C. (2017). R:
	A Language and Environment for Statistical Computing
	(3.6.3) [Computer software]. R Foundation for Statistical Computing.
	
	

	

	Rivers,
	A. R., Weber, K. C., Gardner, T. G., Liu, S. and Armstrong, S. D.
	(2018). ITSxpress:
	software
	to rapidly trim internally transcribed spacer sequences with quality
	scores for marker gene analysis.
	F1000Res
	7: 1418.

	

	Rosen,
	M. J., Callahan, B. J., Fisher, D. S. and Holmes, S. P. (2012).
	Denoising
	PCR-amplified metagenome data. BMC
	Bioinformatics 13(1):
	283. 
	

	

	Schliep,
	K. P. (2011). phangorn:
	phylogenetic analysis in R.
	Bioinformatics
	27(4):
	592-593.
	
	

	

	Tipton,
	L., Zahn, G., Datlof, E., Kivlin, S. N., Sheridan, P., Amend, A. S.
	and Hynson, N. A. (2019). Fungal
	aerobiota are not affected by time nor environment over a 13-y time
	series at the Mauna Loa Observatory. Proc
	Natl Acad Sci U S A 116(51):
	25728-25733.

	

	Tipton,
	L., Zahn, G. L., Darcy, J. L., Amend, A. S. and Hynson, N. A.
	(2021). Hawaiian
	Fungal Amplicon Sequence Variants Reveal Otherwise Hidden
	Biogeography. Microb Ecol

	

	Wainwright,
	B. J., Zahn, G. L., Arlyza, I. S. and Amend, A. S. (2018).
	Seagrass-associated
	fungal 	communities follow Wallace’s line, but host genotype does
	not structure fungal community. J
	Biogeogr 45(4):
	762-770.

	

	Wainwright,
	B. J., Afiq-Rosli, L., Zahn, G. L. and Huang, D. (2019).
	Characterisation
	of coral-associated bacterial communities in an urbanised marine
	environment shows strong divergence over small geographic scales.
	Coral Reefs
	38: 1097-1106. 
	

	

	Wang,
	Q., Garrity, G. M., Tiedje, J. M. and Cole, J. R. (2007). Naive
	Bayesian classifier for rapid assignment of rRNA sequences into the
	new bacterial taxonomy. Appl
	Environ Microbiol 73(16):
	5261-5267.

	

	Wickham,
	H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R.,
	Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen,
	T., Miller, E., Bache, S., Müller, K., Ooms, J., Robinson, D.,
	Seidel, D., Spinu, V., Takahashi, K., Vaughan, D., Wilke, C., Woo,
	K. and Yutani, H. (2019). Welcome
	to the Tidyverse. Journal of
	Open Source Software 4(43):
	1686. 
	

	

	Zahn,
	G., & Amend, A. S. (2017). Foliar microbiome transplants confer
	disease resistance in a critically-endangered plant. PeerJ,
	5,
	e4020. https://doi.org/10.7717/peerj.4020

Public workspaceITS amplicon computational workflow

ITS amplicon computational workflow