What defines a microbe?

9/25/2019

Recently, a lot of conversation has been circulating regarding what constitutes a microbe. How do we know we captured the diversity in a system and have a representative sample or sequence? Are abundant, relevant members of a community culturable? If not, how can we be sure that the computational output is representative? Especially with the increased use of culture-independent methods in microbiome research, we are relying heavily on the molecular and computational methods to assess this vast diversity of microorganisms.

For this post, I will focus less on ideas of microdiversity within 16S-defined OTUs/ESVs (although this will come up briefly later). Instead, I want to highlight some of the more interesting conversations happening in microbial ecology regarding 1) culturability and 2) metagenome-assembled genomes, or MAGs. This post largely spawned from the amazing dialogue on Twitter about these two issues, and I just wanted to give my opinion on the topics. I will refrain from "taking sides" and instead focus on the arguments and the bigger-picture implications for microbiome research.

High proportions of bacteria are culturable across major biomes

Let's start here because I think this is a SUPER important topic and obviously will relate to the second conversation below. For decades, microbiologists have leaned heavily on a major crutch:

paradigm that only 1% of microorganisms are culturable

A recent communication by Adam Martiny in ISME [1] challenged this paradigm in microbiology by arguing that, based on 16S rRNA similarity of bacteria, abundant members (35-52% of sequences and taxa, respectively) across major biomes have a representative member in a cultured isolate.

At first, I was delighted to see this communication published. Just weeks before, I had seen multiple seminars and job talks open their Introduction by highlighting this paradigm. Inevitably, the message that came across was "well, we can't culture bacteria, so why even try?". Subsequently, the talks would go over culture-independent approaches to navigate this "massive hurdle" in microbiology. But this is a major disservice to all the hard work done to culture bacteria. Everything, in theory, is culturable - it's a matter of figuring out how to do it. The question shouldn't be "is X bacterium culturable?"; it should be "how do we culture it?". Culturing yet-to-be-cultured microbes is definitely painstaking work, but it is possible (everyone please check out the incredible work of Yoichi Kamagata to characterize anaerobic bacteria and their syntrophic relationships - some culturing issues were resolved just by switching the preparation of the media! [2]).

Much to my surprise, there was a lot of controversy with the release of the paper by A Martiny. Twitter was a whirlwind and it eventually led to a formal response in ISME by some outstanding microbial scientists [3]. This response concentrated on three major issues:

1. 16S rRNA gene is biased towards certain taxa
2. Database and analyses bias results (Figure)
3. Inferring 16S to cultures

Best hit identities from environmental study in soil to 16S rRNA RDP database [3].

Both of these papers make great points and address topics that every microbiologist should be thinking about. As I have said before, many studies infer far too much from descriptive datasets of 16S rRNA community analyses. It is almost impossible to relate these data or the observed taxa to ecological function. I have spent a lot of time laying out some of these arguments, but it basically comes down to this: the 16S rRNA gene provides a coarse depiction of the microbial community. There are millions of years of divergence and adaptation masked by the use of conserved marker genes.

These are similar to some of the points the Steen et al. paper lays out to argue against the idea that most organisms have been cultured. However, I do feel that at this resolution, specifically with 16S rRNA studies, it is relevant to compare against databases. This is exactly what Martiny's response [4] details: the most common interpretation, using a defined taxonomic unit (OTU), implies that a lot of abundant members have cultured representatives in databases, certainly superseding the 1% "rule".

In any case, how you define whether a taxon is cultured, or what the real number is so far, really depends on interpretation. And based on this interpretation, researchers need to apply appropriate methods. If you are concerned with fine-scale diversity, do not use 16S rRNA surveys. If you want to target a specific group, design taxon-specific primers. If you want to infer function and avoid PCR and molecular biases, probably switch to metagenomes. And finally, if you wish to link genotype to phenotype, figure out a way to culture and target the abundant members of your system. Physiological data can provide a lot and was the basis for most of my work to target isolates.

Artificial construction of metagenome assembled genomes (MAGs)???

I think this is a nice segue into the next topic. Based on the vastly different ways to interpret culturability of a microbe, I think it is super interesting that MAGs aren't more of a topic of discussion. For those who are unfamiliar, MAGs are extracted from metagenomic data using computational approaches to assemble "community" data, bin assembled contigs into similar sequences based on genomic signatures (e.g., %GC, tetranucleotide frequencies), and assess contamination to create MAGs of otherwise yet-to-be-cultured ;) bacteria or archaea (schematic below).
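To make the "genomic signatures" part a bit more concrete, here is a minimal sketch in Python of the two signatures mentioned above: %GC and tetranucleotide frequencies. This is only an illustration, not how any particular binning tool works. The function names (`gc_content`, `tetranucleotide_freqs`, `signature_distance`) and the toy contigs are my own assumptions, and real binners typically also merge reverse-complement k-mers and use coverage across samples, which this sketch skips.

```python
from itertools import product
from typing import Dict

# All 256 possible 4-mers over A/C/G/T, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def gc_content(seq: str) -> float:
    """Fraction of G/C among the unambiguous A/C/G/T bases of a contig."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def tetranucleotide_freqs(seq: str) -> Dict[str, float]:
    """Relative frequency of each 4-mer, skipping windows with ambiguous bases."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # windows containing N etc. are ignored
            counts[kmer] += 1
            total += 1
    if total == 0:
        return {k: 0.0 for k in TETRAMERS}
    return {k: c / total for k, c in counts.items()}


def signature_distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Euclidean distance between two tetranucleotide profiles (hypothetical metric choice)."""
    return sum((a[k] - b[k]) ** 2 for k in TETRAMERS) ** 0.5


if __name__ == "__main__":
    # Toy contigs, made up for illustration only.
    contig_1 = "ATGCGCGCGGCCGCATGCGCGGCC"
    contig_2 = "ATATTTAAATATTAAATTTATATA"
    print(gc_content(contig_1), gc_content(contig_2))
    print(signature_distance(tetranucleotide_freqs(contig_1),
                             tetranucleotide_freqs(contig_2)))
```

The idea is simply that contigs with similar signatures (small distance, similar %GC) are candidates for the same bin. Notice that nothing here can tell two closely related strains apart, which matters for the discussion below.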

Based on these computational methods, HUGE advances have been made in characterizing microbial communities, including the discovery of novel lineages. However, a recent preprint very much challenged the validity of these methods, going so far as to declare that some of these novel lineages derived from MAGs are "unnatural constructs" that were stitched together from environmental DNA [5]. This was absolutely crazy! The paper decidedly claims that binning artifacts, and MAGs in general, pose a systematic problem for the way we assess microbial diversity.

As many members were quick to point out, cultured isolates exist for the novel lineages where high quality MAGs (i.e., complete genomes) match pretty nicely. As an ultra bonus, one of the "artificial" lineages detailed in the preprint recently was isolated and described in another preprint documenting a 12 year(!) isolation effort for Asgard Archaea - this is why science is so awesome! Further, a really talented grad student in the Banfield Lab, Alex Crits-Christoph, re-analyzed this data to address the main support against MAGs in the preprint:

The effect observed is due to differences in the phylogenetic distribution of genomes from metagenomes vs isolate genomes.

SPOILER: what he found was that the main phylogenetic anomaly described in the preprint can be corrected, and the effect disappears (side figure). Basically, the preprint relied on selecting MAGs and isolates and constructing phylogenetic trees for conserved, single-copy ribosomal proteins. They found that MAGs differ from isolates mainly in that the phylogenetic distance between trees was higher for MAGs than for isolates (top panel reproduced by ACC). ACC attempted to mitigate these effects by using a 1:1 ratio of representative isolates at the genus and order level; for each MAG there was a similar isolate genome to compare. This reduced or removed the signal altogether (bottom panel).
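For readers who want to picture what a 1:1 comparison might look like in practice, here is a small sketch in Python. To be clear, this is not ACC's actual code or procedure. It is a hypothetical illustration that assumes each genome comes with a single taxon label (say, genus), and the function name `match_one_to_one` and the genome IDs are made up.

```python
import random
from collections import defaultdict
from typing import List, Tuple

Genome = Tuple[str, str]  # (genome_id, taxon label such as genus); hypothetical format


def match_one_to_one(mags: List[Genome], isolates: List[Genome],
                     seed: int = 0) -> List[Tuple[str, str]]:
    """Pair each MAG with one randomly chosen isolate from the same taxon.

    Each isolate is used at most once. MAGs with no remaining isolate in
    their taxon are dropped, so both groups end up phylogenetically balanced.
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for isolate_id, taxon in isolates:
        pool[taxon].append(isolate_id)
    for ids in pool.values():
        rng.shuffle(ids)  # random choice of representative isolate

    pairs = []
    for mag_id, taxon in mags:
        if pool[taxon]:
            pairs.append((mag_id, pool[taxon].pop()))
    return pairs


if __name__ == "__main__":
    # Made-up example genomes.
    mags = [("MAG_1", "Curtobacterium"), ("MAG_2", "Pseudomonas")]
    isolates = [("ISO_1", "Curtobacterium"), ("ISO_2", "Curtobacterium"),
                ("ISO_3", "Bacillus")]
    print(match_one_to_one(mags, isolates))
```

The point of balancing like this is that any remaining difference between MAG trees and isolate trees can no longer be blamed on the two sets simply covering different parts of the tree of life.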

I personally think the authors vastly overstate their conclusions, but I do think they bring up a few good points, especially if we start to bring in the ideas from the previous section. MAGs are useful because we can essentially sequence the entire microbial community and "pull out" genomes without isolating strains. This was incredibly useful in simple microbial communities, where low genetic diversity allowed for relatively easy binning of MAGs. However, I do start to question how many MAGs, especially in complex, genetically diverse systems, really represent a mosaic of closely related strains and/or lineages.

From my personal experience in soil microbial communities (leaf litter to be exact), I tried to assemble and bin MAGs for the most abundant bacterium in the system (yes, Curtobacterium). This system harbors >2000 OTUs (defined at 97% similarity) and Curtobacterium comprises anywhere from 8% (from metagenomic data [6]) to 16% (from 16S rRNA data) of the microbial community. We thought, since Curto is so abundant, let's assemble, bin, and analyze the Curto MAG (top side figure). While we were able to extract two Curto bins, even after curation they showed some inconsistencies with our known isolates from the system using ribosomal genes (bottom side figure). In particular, they formed large outgroups or exhibited long branch lengths. To me, it seemed we were collapsing a lot of fine-scale genetic variation into a conglomerate Curtobacterium "genome", essentially creating a genus-specific mosaic genome. In the end, we decided not to publish the MAGs and to concentrate on the genomic and phenotypic variation of our cultured isolates [6].

For me, this is where the MAG idea falls short, or at least needs to be assessed the same way researchers are doing for 16S rRNA. Specifically, we should start to ask the same questions: how do we know we captured the diversity in a system and have a representative sample or sequence? The same way we break down 16S data, we need to start questioning whether MAGs (or at least low-quality MAGs) are actually representative members of the microbial community. Or are some of these MAG constructs providing the same resolution as 16S, collapsing relevant genetic variation into coarse representations of the true diversity? I don't know the answer to these questions, but I think researchers definitely need to start considering the quality of reported MAGs, and of genomes for that matter. Either way, these past few weeks have provided great conversations and stimulated a lot of relevant talking points for the field. I hope to see more in the future!

Coverage profiles of all coassembled contigs as a function of their GC content. Colors show top 10 most abundant bacterial genera identified by BLAST (listed in descending order) [6].

Multilocus phylogeny of four single-copy ribosomal proteins. All nodes on tree are from isolates except for two MAGs (denoted with * and bold branch lines)

Papers:

1. Martiny AC. (2019). High proportions of bacteria are culturable across major biomes. ISME 13: 2125-2128.

2. Kato S, Yamagishi A, Daimon S, Kawasaki K, Tamaki H, Kitagawa W, Abe A, Tanaka M, Sone T, Asano K, Kamagata Y. (2018). Isolation of Previously Uncultured Slow-Growing Bacteria by Using a Simple Modification in the Preparation of Agar Media. Applied and Environmental Microbiology 84: e00807-18.

3. Steen AD, Crits-Christoph A, Carini P, DeAngelis KM, Fierer N, Lloyd KG, Thrash JC. (2019). High proportions of bacteria and archaea across most biomes remain uncultured. ISME.

4. Martiny AC. (2019). The '1% culturability paradigm' needs to be carefully defined. ISME.

5. Garg SG, Kapust N, Lin W, Tria FDK, Nelson-Sathi S, Gould SB, Fan L, Zhu R, Zhang C, Martin WF. (2019). Anomalous phylogenetic behavior of ribosomal proteins in metagenome assembled genomes. bioRxiv.

6. Chase AB, Karaoz U, Brodie EL, Gomez-Lunar Z, Martiny AC, Martiny JBH. (2017). Microdiversity of an abundant terrestrial bacterium encompasses extensive variation in ecologically relevant traits. mBio 8: e01809-17.

1 Comment

Alex

9/27/2019 12:41:06 pm

I am posting a useful comment from Twitter regarding this post from Alex Crits-Christoph ( Twitter @acritschristoph ):

"""
Hi alex, a nice read. deserves a more long-form comment than twitter will allow...
Truth is in the genomes-from-metagenomes community a lot of answers to these questions exist in people's intuitions / heads / scattered across the literature but are not formalized or summarized.
but:

- Soil is the hardest. MAGs from CC 2018 Nature / Diamond 2019 Nat Microbio are qualitatively different than MAGs from acid mine drainage or HMP (and there is a continuum among soil mags).
- You might like this comparison between MAGs and SAGs:
https://t.co/q4gqh60gyt?amp=1
- see my recent preprint on genetic diversity of soil mags. These MAGs *do* imperfectly attempt to describe an average for populations, but so does every genome (e.g., human genome). Clonal isolate genomes are also a highly biased imperfect model of a true biological population

For true exogenous contamination, it is rare and I hand checked it for the 19 examples in my biorxiv (using the 30+ replicate genomes for each). It's on small contigs at CheckM freqs. Which is why we hand curate / check contigs when making biological conclusions from them

There's some ongoing work quantifying "how much each MAG is like an average of similar strains vs clonal genomes". Truth is, it differs. A lot of good papers on it are also already out there, wish I could summarize / link.

For your Curtobacterium it was a co-assembly right? Co-assemblies are more likely to mix strains together (and actually fragment assemblies), so MAGs from single samples are usually a bit better (we avoid co-assemblies, they scare me)

Here's a plot from simulated data mixing two genomes together and then reassembling them. When there are two ~98% strains in a sample, the assembly completely fails (2,000 contigs!)
So the blessing/curse is that in most "mixed strains" samples, you won't get a "good" mag at all

sorry for long thread. But if you believe that there are questions in understanding how accurately MAGs describe populations (and all of the caveats associated with that), I agree with that. Both some knowledge synthesizing (like a review) and some new analyses are needed here
"""