An RNA-centric view on gut Bacteroidetes



Intestinal Bacteroidetes thrive in a dynamic microenvironment
The human gut is broadly subdivided into the small and large intestine. Compared with the small intestine, the large bowel presents a relatively mild environment to colonizing microorganisms, due in part to its higher pH (pH 5.5-7), lower oxygen tension, and a less immunogenic milieu that favors balance over clearance. However, the nutritional environment, which is dominated by complex polysaccharides that cannot be readily absorbed in the small intestine, imposes strong metabolic pressure on colon-resident bacteria. Nevertheless, thanks to sophisticated enzymatic repertoires that evolved to catabolize these carbon sources, and to metabolic co-dependencies, the large intestine represents one of the densest microbial ecosystems in nature. Of the 10¹¹-10¹² bacteria per gram of fecal content that constitute the colon microbiota (Knight and Girling 2003), obligate anaerobic species comprise the largest fraction, with two dominant phyla: the Gram-positive Firmicutes and the Gram-negative Bacteroidetes (Huttenhower et al. 2012).
The Bacteroidetes are non-motile, non-spore-forming, rod-shaped bacteria. Although Bacteroidetes species also occur outside the gut, here we focus on intestinal members. Among them, Bacteroides spp. constitute the most abundant bacterial genus in the human gut, where they contribute to the release of energy from dietary fiber and represent a major source of short-chain fatty acids. Biogeographically, Bacteroides are enriched in the lumen and outer mucus layer along the colon (Donaldson et al. 2016) (Figure 1). Some mucin-degrading species, such as Bacteroides fragilis, can also colonize colonic crypts, where they modulate the host immune system (Lee et al. 2013; Round et al. 2011). Bacteroides spp. appear in neonates about four to six days after birth, with a relative abundance that depends on the mode of delivery, diet, and gestational age, and they stably persist in the gut for a lifetime. Previous studies linked Bacteroides abundance in the human intestine with a lower risk of developing obesity (Ley et al. 2006) or colorectal cancer (Lee et al. 2018), but also with inflammatory disorders (Bloom et al. 2011); often, however, cause and consequence are difficult to disentangle. The commensals Bacteroides thetaiotaomicron and B. fragilis (the latter also being an opportunistic pathogen outside the gut; Goldstein 1996) are gaining increasing attention as model organisms for functional microbiota research. This is due to their prevalence, their impact on host physiology and metabolism, and the relative ease with which they can be cultured and genetically manipulated (Bacic and Smith 2008). Their study has already revealed molecular processes that underlie successful colonization of the large intestine.
For instance, to stably colonize the colon, Bacteroides spp. evolved multiple arrays of paralogous gene clusters, known as polysaccharide utilization loci (PULs), that allow them to feed on complex diet- and host-derived polysaccharides. Generally, PULs encode cell envelope-spanning complexes consisting of glycolytic enzymes and outer membrane proteins such as SusCD homologs (for starch utilization system, historically the first described PUL system; Reeves et al. 1997), which are required for glycan binding (SusD) and import (SusC). PULs are regulated at the transcriptional level by several mechanisms. SusR-like regulators (D'Elia and Salyers 1996) and hybrid two-component systems (HTCSs) (Sonnenburg et al. 2006, 2010) both combine sugar-sensing and gene-regulatory functions in a single polypeptide and activate transcription of specific PUL operons. In parallel, PUL transcription is controlled by extracytoplasmic function sigma/anti-sigma factor pairs, whose dissociation is induced in the presence of appropriate glycans; the released sigma factor then activates transcription of its target operon (Martens et al. 2008).
Bacteroides genomes further contain multiple polysaccharide biosynthesis loci for capsule formation. Capsular polysaccharides (CPSs) are essential for gut colonization, as they contribute to cross-talk with the host epithelium and determine bacterial susceptibility to bacteriophage attack (Liu et al. 2008). CPS expression is controlled by invertible promoters whose inversion is mediated by site-specific recombinases (Coyne et al. 2003). Additionally, CPS expression is regulated co-transcriptionally by specialized NusG-like proteins of the UpxY family (where "x" designates the cognate CPS locus). Upon binding specifically to sequences within the nascent 5′ UTR of its CPS operon, UpxY interacts with the transcribing RNA polymerase to prevent premature termination, thus allowing complete transcription of these 11-23 kb loci (Chatzidaki-Livanis et al. 2009). In turn, UpxZ (produced from a gene downstream of upxY in a CPS operon) inhibits the antitermination activity of UpxY proteins of other CPS loci, thereby establishing a regulatory hierarchy that is independent of DNA inversions (Chatzidaki-Livanis et al. 2010).
Figure 1: Dynamics associated with the intestinal niches occupied by Bacteroidetes. The large intestine is characterized by fluctuations in nutrient availability (1) and oxygen concentration. Colon-colonizing bacteria need to adapt to these changes in order to compete efficiently with neighboring microbes. Additionally, they defend themselves against antimicrobial compounds released by co-colonizing bacteria (2) and against phage attack (3), and they engage in cross-talk with the host immune system (4). Gut-associated Bacteroidetes occur in the lumen of the large bowel, but some species may also attach to mucosal surfaces at the host epithelium. As glycan generalists, Bacteroides spp. can feed on both diet- and host-derived polysaccharides. They are bile-resistant and survive transient increases in oxygen levels. Individual entities are not drawn to scale.

RNA landscape of the Bacteroidetes
Complementing protein-mediated regulation of transcription, RNA-mediated control of gene expression is widespread in bacteria. Over the past two decades, an astonishing versatility of RNA-centric mechanisms has been uncovered, particularly in model gastrointestinal pathogens. The regulatory RNAs that bring about these control mechanisms are grouped into classes based on their genomic organization: regulatory elements within the 5′ untranslated regions (UTRs) of mRNAs, including riboswitches (Breaker 2012) and RNA thermometers (Kortmann and Narberhaus 2012); cis-encoded antisense RNAs (Wagner et al. 2002); and trans-encoded small RNAs (sRNAs) (Storz et al. 2011; Wagner and Romby 2015). While some riboregulators function autonomously, the activity of others, e.g., that of many sRNAs, depends on accessory proteins (Holmqvist and Vogel 2018).
Given the roles regulatory RNAs play in Proteobacteria, where they often function in stress adaptation and metabolism (Bobrovskyy and Vanderpool 2013; Holmqvist and Wagner 2017), one would expect these molecules to also help commensal gut bacteria adapt rapidly to their ever-changing microenvironment. Additionally, given the contribution of regulatory RNAs in bacterial pathogens to their interaction with host cells (Svensson and Sharma 2016; Westermann 2018), we postulate that riboregulation may also underlie the cross-talk of anaerobic gut commensals with their host and, potentially, with companion microbes. However, compared with our firm knowledge of regulatory RNAs in enteric pathogens, little is known about RNA biology in the beneficial bacteria colonizing our gastrointestinal tract, and in particular in obligate anaerobic Bacteroidetes species. Recent global transcriptomics studies (Cao et al. 2016; Ryan et al. 2020) now predict hundreds of noncoding RNAs and support the idea of a rich RNA world in Bacteroides spp., as reviewed in the following sections (summarized in Table 1).

"Housekeeping" RNAs
Among the most conserved bacterial noncoding transcripts are specialized housekeeping RNAs that maintain key cellular processes. Recent work from our laboratory verified the existence of such transcripts in B. thetaiotaomicron (Ryan et al. 2020). The transfer-messenger RNA (tmRNA, a.k.a. SsrA), for example, rescues stalled ribosomes (Withey and Friedman 2003). The two domains of tmRNA, the tRNA-like domain and the downstream short open reading frame, can be part of the same molecule or be cleaved into two base-pairing parts (Mao et al. 2009). Across the Bacteroidetes, tmRNAs show high sequence conservation, and in B. thetaiotaomicron, RNA-seq and Northern blot data revealed this RNA to be transcribed as a 507 nt precursor that is processed into the ∼400 nt mature form containing both domains (Ryan et al. 2020).
The signal recognition particle (SRP) directs target proteins to the membrane. In bacteria, the RNA component of this complex, the 4.5S RNA (∼110 nt long; encoded by the ffs gene), serves as a scaffold by forming a platform for the association of the proteinaceous SRP constituents (Peluso et al. 2000). Initially identified as candidate sRNA BTnc259 (Ryan et al. 2020), this transcript was suggested to be the putative 4.5S RNA of B. thetaiotaomicron by sequence comparison with known 4.5S homologs from other bacteria using Rfam (Kalvari et al. 2018) and BLAST (Prezza and Ryan et al. in prep).
The M1 RNA (encoded by rnpB) is the catalytic RNA component of the ribonuclease (RNase) P holoenzyme, which is involved in processing tRNA, 4.5S RNA, and tmRNA precursor molecules (Altman 2011). Although M1 RNA is active on its own under optimized in vitro conditions, it requires the accessory RnpA protein (BT_3227 in B. thetaiotaomicron) to function efficiently in the bacterial cell. Based on their secondary structures, bacterial M1 RNAs are grouped into two classes, type A and type B, with Bacteroidetes M1 RNAs falling into the type A category. We recently validated the M1 RNA of B. thetaiotaomicron as a primary transcript of ∼390 nt with evidence of processing at the 3′ end, resulting in a mature transcript of ∼360 nt (Ryan et al. 2020).

Regulatory elements within 5′ UTRs of mRNAs
The 5′ UTR of a bacterial mRNA may contain cis-regulatory features that control expression of the downstream coding sequence (CDS). Riboswitches, for example, are cis-regulatory elements commonly found in the 5′ UTRs of mRNAs encoding metabolic enzymes and transporters (McCown et al. 2017). Riboswitches consist of two domains: the aptamer region binds a metabolite with high specificity, and this ligand binding in turn induces a conformational change in the so-called expression platform, thereby influencing mRNA expression at the level of transcription elongation or translation initiation (Lotz and Suess 2018). RNA thermometers are also regulatory elements within 5′ UTRs, but in this case the conformational change is triggered by temperature shifts, which affect ribosome access to the downstream CDS (Kortmann and Narberhaus 2012). The length of an mRNA leader can thus hint at the presence of a cis-regulatory element. We recently determined the median and mean 5′ UTR lengths in B. thetaiotaomicron to be 32 nt and 52 nt, respectively, with 13.5% of mRNAs having unusually long (>100 nt) leader sequences (Ryan et al. 2020), suggesting that cis-regulatory elements may be prevalent in Bacteroides.
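The 5′ UTR statistics quoted above (median, mean, and the fraction of leaders longer than 100 nt) follow from simple arithmetic on TSS and start codon coordinates. The Python sketch below illustrates that calculation; the coordinate convention (1-based positions, start codon given by its first nucleotide on the coding strand) and the example records are hypothetical assumptions, not data from Ryan et al. (2020).

```python
from __future__ import annotations

from statistics import mean, median


def utr5_length(tss: int, start_codon: int, strand: str) -> int:
    """Return the 5' UTR length in nt.

    Assumed (hypothetical) convention: 1-based genomic coordinates; `tss` is the
    +1 position and `start_codon` is the first nucleotide of the start codon on
    the coding strand. The UTR runs from the TSS to the nucleotide before it.
    """
    if strand == "+":
        return start_codon - tss
    if strand == "-":
        # On the minus strand, transcription runs toward lower coordinates.
        return tss - start_codon
    raise ValueError(f"unknown strand: {strand!r}")


def summarize_utrs(lengths: list[int], long_cutoff: int = 100) -> dict[str, float]:
    """Median, mean, and fraction of leaders longer than `long_cutoff` nt."""
    if not lengths:
        raise ValueError("no UTR lengths given")
    n_long = sum(1 for n in lengths if n > long_cutoff)
    return {
        "median": median(lengths),
        "mean": mean(lengths),
        "fraction_long": n_long / len(lengths),
    }


# Hypothetical (TSS, start codon, strand) records, for illustration only.
records = [(100, 132, "+"), (2000, 2010, "+"), (5532, 5500, "-"), (9000, 9150, "+")]
lengths = [utr5_length(*r) for r in records]
summary = summarize_utrs(lengths)
print(lengths, summary)
```

Under this convention, a leaderless mRNA (TSS at the first nucleotide of the start codon) has a 5′ UTR length of 0.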
Indeed, several riboswitches have been inferred from homology searches and characterized in Bacteroidetes. The thiamine pyrophosphate (TPP)-sensing riboswitch is the only known class to occur in all three domains of life; it is involved in regulating the metabolism and transport of TPP, the active form of vitamin B1, which is required for a range of cellular processes (Sudarsan et al. 2003). Comparative genomics predicted TPP riboswitches across the Bacteroidetes, typically in front of operons of genes involved in the biosynthesis or import of thiamine (Costliow and Degnan 2017; Rodionov et al. 2002). TPP riboswitches illustrate how riboswitch aptamers and expression platforms can be mixed and matched, leading to different regulatory outcomes. The TPP riboswitches of Bacteroides vulgatus and Bacteroides uniformis, as well as a thiamine biosynthetic TPP riboswitch in B. thetaiotaomicron, are located >50 nt upstream of the CDS and act at the level of transcription (Costliow et al. 2019). In contrast, the two TPP riboswitches governing thiamine import operons in B. thetaiotaomicron are located immediately upstream of their cognate start codons and act at the level of translation initiation. This led to the hypothesis that the distance between a riboswitch and the downstream start codon hints at its mode of action (Costliow et al. 2019) (Figure 2a). While transcriptional control is considered tighter, translational riboswitches enable faster and reversible responses to metabolite sensing. Moreover, not only the mode of action but also the ligand threshold concentrations differ between the B. thetaiotaomicron biosynthetic (∼10 nM) and transport (∼100 nM) riboswitches, suggesting hierarchical control of thiamine synthesis and import.

[Table (not reproduced), footnote a: Underlined species are the ones in which the respective RNA element was validated/characterized.]
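The distance hypothesis of Costliow et al. (2019) can be phrased as a simple rule of thumb. The Python sketch below classifies a riboswitch by the spacing between its 3′ end and the downstream start codon, using the >50 nt figure from the text as a cutoff. The exact distance measure, treating the cutoff as a hard threshold, and the example names are assumptions for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RiboswitchSite:
    name: str             # hypothetical identifier
    gap_to_start_nt: int  # nt between riboswitch 3' end and start codon (assumed measure)


def predicted_mode(site: RiboswitchSite, cutoff_nt: int = 50) -> str:
    """Rule of thumb: distal riboswitches act on transcription,
    proximal ones on translation initiation."""
    if site.gap_to_start_nt < 0:
        raise ValueError("gap must be non-negative")
    if site.gap_to_start_nt > cutoff_nt:
        return "transcription"
    return "translation initiation"


examples = [RiboswitchSite("distal_example", 80), RiboswitchSite("proximal_example", 5)]
for site in examples:
    print(site.name, predicted_mode(site))
```

Such a rule can only prioritize candidates; the expression platform itself (for example, an intrinsic terminator versus a sequestered RBS) determines the actual mechanism and must be tested experimentally.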
Vitamin B12 is an essential cofactor for several enzymes. The majority of gut-associated bacteria contain importers for vitamin B12 (Degnan et al. 2014), and homologs of the TonB-dependent transporter BtuB are widely distributed across the Bacteroidetes. B. thetaiotaomicron contains three btuB gene copies (BT_1489, BT_1953, BT_2094), all of which are associated with a riboswitch (Degnan et al. 2014; Vitreschak et al. 2003). The predicted mechanism of regulation involves binding of adenosylcobalamin (AdoCbl), a biologically active form of vitamin B12, to the aptamer, which in turn refolds the region around the ribosome-binding site (RBS) to block translation initiation (Vitreschak et al. 2003) (Figure 2b). Reminiscent of TPP riboswitches, the AdoCbl riboswitches of individual transporter mRNAs respond to different ligand concentrations, again suggesting hierarchical control (Degnan et al. 2014). In our recent transcriptome study, we identified another putative AdoCbl riboswitch in the 5′ UTR of BT_1915, which encodes pyruvate carboxylase subunit A (Ryan et al. 2020).
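Why different ligand thresholds imply a regulatory hierarchy can be illustrated with a deliberately simple model. The Python sketch below assumes that each OFF-riboswitch represses expression according to one-site binding, with half-maximal repression at its threshold concentration, and plugs in the approximate TPP thresholds given earlier for B. thetaiotaomicron (∼10 nM biosynthetic, ∼100 nM transport). This is an equilibrium simplification chosen for illustration; transcriptional riboswitches in particular can be kinetically rather than thermodynamically controlled.

```python
def relative_expression(ligand_nM: float, threshold_nM: float) -> float:
    """Fraction of maximal expression for an OFF-riboswitch.

    Simplifying assumption: occupancy follows L / (L + K), with K equal to the
    threshold concentration, and ligand-bound switches are fully OFF.
    """
    if ligand_nM < 0 or threshold_nM <= 0:
        raise ValueError("concentrations must be non-negative, thresholds positive")
    occupancy = ligand_nM / (ligand_nM + threshold_nM)
    return 1.0 - occupancy


# Approximate TPP thresholds quoted in the text for B. thetaiotaomicron.
THRESHOLDS_NM = {"biosynthesis": 10.0, "import": 100.0}

for tpp in (1.0, 10.0, 30.0, 100.0, 1000.0):
    levels = {route: round(relative_expression(tpp, k), 2)
              for route, k in THRESHOLDS_NM.items()}
    print(f"{tpp:>7.1f} nM TPP -> {levels}")
```

In this toy model, at 30 nM TPP biosynthesis is reduced to 25% of its maximum while import remains at about 77%, i.e., thiamine synthesis would be shut down before import.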
Thermo-sensing RNAs have been described mostly in bacteria that experience temperature changes during their life cycles, such as pathogens that face a temperature increase upon entering their mammalian host. From this point of view, it was somewhat surprising that thermo-sensing RNA candidates were predicted, based on sequence similarity to known RNA thermometers (namely ROSE_3 [RF02523] and the PrfA thermoregulatory UTR [RF00038]), in B. thetaiotaomicron (Ryan et al. 2020), a bacterium that, despite many other variable parameters (Figure 1), populates a niche with a fairly constant temperature. The validity of these predictions and their biological relevance, however, require further investigation.
The Roc protein (for regulator of colonization; BT_3172) of B. thetaiotaomicron is an HTCS regulator that activates transcription of a defined PUL (BT_3173-3174) that metabolizes host glycans (Townsend et al. 2013). The 5′ UTR of roc mRNA is relatively long (54 nt), pointing to a cis-regulatory RNA element, albeit one too short to harbor a riboswitch. The roc leader sequence mediates repression of the associated CDS and of the downstream genes in that operon in the presence of glucose and fructose (Townsend et al. 2019). While the mechanism of Roc repression is elusive, several highly conserved residues within the leader sequence led to the hypothesis that regulation could occur through binding of a trans-acting sRNA or a regulatory protein (Townsend et al. 2019) (Figure 2c). From an evolutionary standpoint, this regulation may ensure that B. thetaiotaomicron finds its proper niche in the host gut: colonization factors should be expressed only in a microenvironment where this glycan generalist has a selective advantage over competing microbes that catabolize simple sugars but fail to process complex polysaccharides.

Cis-encoded antisense RNAs
Bacterial RNAs that partially overlap genes on the opposite strand were first identified on accessory genetic elements, where they fulfill specialized functions, including control of plasmid replication and conjugation or of lysis/lysogeny decisions in phages (Wagner et al. 2002). The advent of RNA-seq in bacterial transcriptomics led to the unexpected finding that antisense RNAs are also widespread in the core genome (Georg and Hess 2018). In B. thetaiotaomicron, for example, we detected ∼1100 antisense transcription start sites (TSSs), one fourth of all identified initiation sites (Ryan et al. 2020). Absolute numbers should be interpreted with caution, as they may be heavily influenced by technical variation between RNA-seq protocols and analysis pipelines; nevertheless, this value is of a similar magnitude to those determined for proteobacterial model organisms such as Salmonella enterica (13%; Kröger et al. 2012), Escherichia coli (37%; Thomason et al. 2015), and Helicobacter pylori (41%; Sharma et al. 2010). It is currently debated to what extent this plethora of antisense RNAs is functional or merely the result of spurious transcription (Lloréns-Rico et al. 2016). For a handful of antisense candidates in Proteobacteria and Firmicutes, however, functionality has been demonstrated (Wade and Grainger 2014). These RNAs affect expression of their cognate sense-overlapping target by a variety of mechanisms, including transcriptional interference and, post-transcriptionally, base-pairing that masks the RBS of the target mRNA or that shields or generates an RNase cleavage site within the CDS.
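Whether a TSS counts as "antisense" depends on an annotation rule, which is one reason why reported fractions vary between pipelines. The Python sketch below assumes one simple rule of this kind: a TSS is antisense if it lies within a gene on the opposite strand, optionally extended by a flanking window. The window size, the data structures, and the toy coordinates are hypothetical and do not reproduce the criteria of any of the cited studies.

```python
from __future__ import annotations

from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, start <= end on both strands
    end: int
    strand: str  # "+" or "-"


def is_antisense(pos: int, strand: str, genes: list[Gene], window: int = 0) -> bool:
    """True if the TSS lies inside (or within `window` nt of) a gene on the opposite strand."""
    return any(
        g.strand != strand and g.start - window <= pos <= g.end + window
        for g in genes
    )


def antisense_fraction(tsss: list[tuple[int, str]], genes: list[Gene], window: int = 0) -> float:
    """Fraction of TSSs classified as antisense under the rule above."""
    if not tsss:
        raise ValueError("no TSSs given")
    hits = sum(is_antisense(pos, strand, genes, window) for pos, strand in tsss)
    return hits / len(tsss)


# Toy annotation and TSS list (hypothetical).
genes = [Gene("geneA", 1000, 2000, "+"), Gene("geneB", 3000, 3500, "-")]
tsss = [(1500, "-"), (950, "+"), (3600, "+"), (2500, "-")]
print(antisense_fraction(tsss, genes), antisense_fraction(tsss, genes, window=100))
```

Widening the window from 0 to 100 nt doubles the antisense fraction in this toy set, illustrating how a single parameter choice can shift genome-wide counts.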
An exploratory RNA-seq study recently discovered a specialized class of cis-antisense transcripts in B. fragilis: the PUL-overlapping antisense RNAs (Cao et al. 2016). These 78-128 nt-long RNAs are enriched within PULs for host-derived glycan processing and are encoded divergently to the respective susC homolog. Functional characterization of the Don (degradation of N-glycans) PUL system revealed that the cognate antisense RNA, termed DonS, represses its PUL. The mode of action employed by DonS (and other PUL-associated antisense RNAs) is elusive, but either transcriptional interference or, since these RNAs overlap the translation initiation region of the cognate susC homologs, translational inhibition appears plausible (Figure 2d). In the latter case, post-transcriptional PUL repression by the antisense RNAs would complement transcriptional control through the corresponding anti-sigma factor, together ensuring tight PUL repression in the presence of a prioritized carbon source. Indeed, a ΔdonS B. fragilis mutant was unable to effectively shut down Don when glucose was added to the medium (Cao et al. 2016).
PUL-associated antisense RNAs were also predicted for related species, including B. thetaiotaomicron, B. vulgatus, and Bacteroides ovatus (Cao et al. 2016). Our laboratory further expanded this class of PUL-overlapping antisense RNAs with the identification of five additional candidates in B. thetaiotaomicron, three of which (BTnc055, BTnc136, BTnc252) are antisense to the translation initiation region, while the remaining two (BTnc011, BTnc111) are antisense to the CDS of the cognate susC homolog (Ryan et al. 2020).

Trans-encoded small RNAs
sRNAs are short RNA molecules, typically 50-300 nt in length, that are encoded by independent genes or arise from the UTRs of mRNAs. They can regulate target gene expression via two distinct, yet not mutually exclusive, mechanisms: directly, by forming imperfect base-pairing interactions with specific trans-encoded target mRNAs, or indirectly, by titrating regulatory RNA-binding proteins (RBPs). Among the latter are the sRNA antagonists of the translational regulator CsrA/RsmA (Romeo and Babitzke 2018), the Ro60-interacting Y RNAs with versatile cellular functions (Sim and Wolin 2018), and the 6S RNA (encoded by the ssrS gene), which sponges the RNA polymerase holoenzyme to globally tune transcriptional activity (Wassarman 2018). In Bacteroidetes, no CsrA system is known, and Y RNAs have not been identified either; however, a 6S RNA homolog was predicted in this phylum (Wehner et al. 2014) and recently validated in B. thetaiotaomicron as a ∼190 nt-long transcript (Ryan et al. 2020). Generally, 6S RNA adopts a characteristic long-hairpin structure with a central asymmetric bulge and sequesters RNA polymerase by mimicking the transcription bubble of genomic DNA. To reactivate transcription, RNA polymerase uses 6S RNA as a template, synthesizing 14-20 nt-long product RNAs (Wassarman and Saecker 2006), which were also detected in B. thetaiotaomicron (Ryan et al. 2020). This suggests that this type of transcriptional control is highly conserved across the bacterial kingdom (Barrick et al. 2005).
In contrast to protein antagonists, base-pairing sRNAs anneal through short "seed" sequences to partially complementary target sites in mRNAs and repress or activate their targets through a variety of mechanisms (Wagner and Romby 2015). In Gram-negative species, sRNAs often depend on RNA chaperones such as the Sm-like protein Hfq or FinO domain-containing proteins (Holmqvist and Vogel 2018). In the best-understood scenario of sRNA-mediated target control, from γ-Proteobacteria, Hfq binding stabilizes the sRNA in the cytosol and facilitates its annealing to the target sequence, classically within the 5′ region of an mRNA, overlapping the Shine-Dalgarno sequence and/or start codon to block translation initiation (Hör et al. 2020). For the Bacteroidetes, this raises several interesting questions. Are trans-acting sRNAs prevalent in this phylum? If so, and given that obvious homologs of Hfq and FinO are missing, do Bacteroidetes sRNAs work in a protein-independent manner, or are there other global RBPs that chaperone them? And since Bacteroidetes further lack the classical Shine-Dalgarno sequence of Proteobacteria (Nakagawa et al. 2010), do sRNAs still preferentially bind to the 5′ region of mRNAs and inhibit translation, or are other modes of target control more common in Bacteroidetes?
In our recent work, we addressed some of these questions. Application of differential RNA-seq (Sharma and Vogel 2014) to refine the transcriptome annotation of the B. thetaiotaomicron type strain VPI-5482 led to the discovery of 269 noncoding RNA elements, including 151 putative sRNAs (Ryan et al. 2020). This number is comparable to the sRNA complement of proteobacterial species (Dugar et al. 2013; Kröger et al. 2013, 2018; Sharma et al. 2010; Vogel et al. 2003). Of 14 selected sRNA candidates, nine were validated by Northern blot, including a 3′ UTR-derived sRNA. 3′ UTR-derived sRNAs had previously only been described in Proteobacteria (Miyakoshi et al. 2015); this finding from Bacteroides now implies that the 3′ ends of mRNAs may constitute a reservoir for regulatory RNAs throughout the bacterial phylogenetic tree. We further identified rarer cases of 5′ UTR-derived and intra-operonic sRNA candidates, suggesting an expanded sequence space for the origin of sRNAs (Adams and Storz 2020; Jose et al. 2019). Because sequence conservation is often a predictor of functional importance, we selected a core set of 49 intergenic sRNAs, each harboring both a cognate TSS and a predicted Rho-independent transcription terminator, for conservation analyses. Sequence alignment across the Bacteroidetes phylum revealed 22 of them to be conserved in two or more species, while the remaining 27 were specific to B. thetaiotaomicron (Prezza and Ryan et al. in prep; Ryan et al. 2020).
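The selection of the core set and the subsequent conservation split amount to two filters. The Python sketch below expresses them under stated assumptions: each candidate carries flags for a mapped TSS and a predicted Rho-independent terminator, plus the set of species in which a homolog was found (including B. thetaiotaomicron itself). The record structure and candidate names are hypothetical, and the homology search that would populate `species_with_hit` is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SrnaCandidate:
    name: str             # hypothetical identifier
    location: str         # e.g. "intergenic", "3'UTR", "5'UTR", "intra-operonic"
    has_tss: bool         # cognate TSS detected
    has_terminator: bool  # predicted Rho-independent terminator
    species_with_hit: set[str] = field(default_factory=set)  # includes the query species


def core_set(cands: list[SrnaCandidate]) -> list[SrnaCandidate]:
    """Intergenic candidates with both a TSS and a predicted terminator."""
    return [c for c in cands
            if c.location == "intergenic" and c.has_tss and c.has_terminator]


def split_by_conservation(cands: list[SrnaCandidate], min_species: int = 2):
    """Return (conserved, species_specific) lists of candidates."""
    conserved = [c for c in cands if len(c.species_with_hit) >= min_species]
    specific = [c for c in cands if len(c.species_with_hit) < min_species]
    return conserved, specific


query = "B. thetaiotaomicron"
cands = [
    SrnaCandidate("sRNA_1", "intergenic", True, True, {query, "B. fragilis"}),
    SrnaCandidate("sRNA_2", "intergenic", True, True, {query}),
    SrnaCandidate("sRNA_3", "3'UTR", True, True, {query}),
    SrnaCandidate("sRNA_4", "intergenic", True, False, {query, "B. ovatus"}),
]
core = core_set(cands)
conserved, specific = split_by_conservation(core)
print([c.name for c in conserved], [c.name for c in specific])
```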
Given that sRNAs are prevalent in the Bacteroidetes, how do they function in the absence of homologs of known global RBPs and classical Shine-Dalgarno sequences? To date, two trans-encoded B. thetaiotaomicron sRNAs have been functionally characterized. RteR (regulation of tetracycline resistance elements RNA) is a 90 nt sRNA encoded downstream of the exc gene (the rteR promoter overlaps the exc stop codon) within the excision region of the integrative conjugative transposon CTnDOT, and it is widely conserved within Bacteroides spp. (Jeters et al. 2009; Waters and Salyers 2012). In B. thetaiotaomicron, RteR promotes discoordinate expression of the tra operon, whose products are required to assemble the mating apparatus for the transfer of CTnDOT. That is, while the mRNA levels of traA (the first gene in the operon, encoding a conjugative transposon protein) are not affected by the sRNA, the downstream traB-Q genes are repressed (Waters and Salyers 2012). However, RteR has no obvious effect on the half-life of tra mRNA, which argues against a post-transcriptional effect. Instead, a co-transcriptional mechanism was proposed (Figure 2e): since a sequence resembling an intrinsic transcription terminator, as well as stretches with partial complementarity to RteR, were identified within the traB CDS, RteR may interact with the nascent tra transcript and induce a conformational change that results in premature termination of tra transcription. Experimental evidence for this mechanism is still required, and it is currently also unknown whether the prematurely terminated transcript still serves as a template for TraA synthesis. Regardless of the mechanism, RteR inhibits conjugative transfer of CTnDOT and thereby influences the spread of antibiotic resistance genes (Waters and Salyers 2012).
That Bacteroides sRNAs can also mediate post-transcriptional control of gene expression was recently supported by our findings on GibS (GlcNAc-induced Bacteroides sRNA) (Ryan et al. 2020). In B. thetaiotaomicron, GibS is transcribed from the intergenic region between a putative para-aminobenzoate synthase cluster (BT_0763-68) and a glycogen biosynthesis operon (BT_0769-71). GibS is associated with a Bacteroides promoter motif that is enriched upstream of stationary phase-induced genes, and, accordingly, GibS levels increase over growth in rich medium. GibS steady-state levels increase even further when the bacteria grow in minimal medium with N-acetyl-D-glucosamine (GlcNAc), a monosaccharide constituent of host-derived glycosaminoglycans, as the sole carbon source. Structural analysis by chemical and enzymatic probing revealed the conformation of this 145 nt-long sRNA: a single-stranded 5′ region (∼40 nt), followed by two metastable hairpins and a Rho-independent terminator. Genome-wide differential expression analysis guided the identification of GibS target operons, which are related to metabolic processes. In particular, GibS activates an operon comprising a galactosidase and a periplasmic glucosidase gene (BT_1871-BT_1872) and represses the BT_0769-BT_0771 operon, which harbors genes for glucan-branching enzymes, as well as BT_3893, which encodes a hypothetical protein. In silico prediction and in vitro validation experiments revealed the unstructured 5′ region of GibS, with two distinct seed regions therein, to be at the heart of this regulation. That is, depending on the target, GibS employs one or both of its seed regions to anneal with sequence stretches spanning the start codon of BT_0771 or BT_3893 (Figure 2f). The physiological role of GibS needs further investigation; a ΔgibS B. thetaiotaomicron mutant shows no strong growth phenotype in rich medium but grows slightly faster than the isogenic wild-type strain when feeding on GlcNAc (Ryan et al. 2020).
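The seed-pairing logic described for GibS can be illustrated with an ungapped complementarity search. The Python sketch below slides an sRNA seed antiparallel along a target sequence, counts Watson-Crick and G·U wobble pairs, and checks whether the best window spans the start codon. The seed and target sequences are invented for illustration (they are not the GibS, BT_0771, or BT_3893 sequences); dedicated target prediction tools additionally score duplex energy and site accessibility.

```python
from __future__ import annotations

# Watson-Crick pairs plus G-U wobble, written as (sRNA base, target base).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def to_rna(seq: str) -> str:
    """Uppercase and convert DNA letters to RNA (T -> U)."""
    return seq.upper().replace("T", "U")


def best_seed_match(seed: str, target: str) -> tuple[int, int]:
    """Return (start index in target, number of base pairs) of the best ungapped duplex.

    Both sequences are given 5'->3'. Pairing is antiparallel, so the reversed
    seed is compared position by position with each target window; the seed's
    3' end pairs with target[start].
    """
    seed, target = to_rna(seed), to_rna(target)
    k = len(seed)
    if k == 0 or k > len(target):
        raise ValueError("seed must be non-empty and no longer than the target")
    reversed_seed = seed[::-1]
    best_start, best_pairs = 0, -1
    for i in range(len(target) - k + 1):
        window = target[i:i + k]
        n_pairs = sum((s, t) in PAIRS for s, t in zip(reversed_seed, window))
        if n_pairs > best_pairs:  # keep the first (most 5') best window
            best_start, best_pairs = i, n_pairs
    return best_start, best_pairs


def spans_start_codon(start: int, length: int, start_codon_index: int) -> bool:
    """True if target[start:start + length] overlaps the 3-nt start codon."""
    return start < start_codon_index + 3 and start_codon_index < start + length


# Invented example: the target's AUG begins at index 10.
seed = "CAUGGAU"
target = "GGAGAAAUCCAUGGCAAAA"
pos, n = best_seed_match(seed, target)
print(pos, n, spans_start_codon(pos, len(seed), target.index("AUG")))
```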

RBP candidates and ribonucleases in Bacteroidetes
Global RNA-binding proteins such as Hfq and FinO-like proteins mediate sRNA-target interactions in many Gram-negative bacteria (Holmqvist and Vogel 2018; Woodson et al. 2018). The role of Hfq has been studied extensively in Proteobacteria, particularly in E. coli and S. enterica, where it forms a homohexameric ring with three RNA-interacting surfaces that stabilizes bound sRNAs and facilitates their annealing to target mRNAs (Kavita et al. 2018; Santiago-Frangos and Woodson 2018; Vogel and Luisi 2011). FinO-like proteins have only recently emerged as global RNA binders in α-, β-, and γ-Proteobacteria, and resolved mechanisms of sRNAs associated with these proteins remain sparse (Olejniczak and Storz 2017). Bacteroidetes species lack homologs of both Hfq and FinO. In contrast, proteins containing cold-shock or K homology domains, which can also bind RNA, are prevalent in the Bacteroidetes (Prezza and Ryan et al. in prep), and it remains to be investigated whether any of them functionally substitutes for Hfq or FinO-like proteins in this phylum.
Central to sRNA regulatory pathways in Gram-negative bacteria is the activity of RNases (Mohanty and Kushner 2018). In Proteobacteria, RNase E is a key player in sRNA-mediated regulation: it is involved in the processing of several RNA species, including 3′ UTR-derived sRNAs (Chao et al. 2017), and can be recruited by sRNA-Hfq complexes to cleave target mRNAs endonucleolytically, initiating their rapid decay (Bandyra and Luisi 2018). In the Bacteroidetes phylum, several putative RNases (Table 2) have been annotated through automated homology searches (https://biocyc.org/; Karp et al. 2019). These include a member of the RNase E/G family of endonucleases (BT_1500 in B. thetaiotaomicron); whether it is involved in sRNA-guided target degradation remains a subject for future studies. The general paucity of information on these vital cellular enzymes in Bacteroidetes offers exciting avenues for future research.

Bacteroidetes CRISPR systems
The large intestine harbors a vast consortium of bacteriophages that shape microbiota composition and impose strong selection pressure on gut-resident bacteria (Mirzaei and Maurice 2017) (Figure 1). CRISPR-Cas systems are present in many prokaryotic species and provide adaptive immunity against phage infections (Marraffini 2015). Computational predictions indicate that about half of the sequenced Bacteroidetes species possess at least one CRISPR locus (Makarova et al. 2020; Pourcel et al. 2020) (Figure 3). While the classical type II-C system is the most frequent in Bacteroidetes, almost all known occurrences of the type VI system are found in this phylum (Makarova et al. 2020). Type VI CRISPR systems involve Cas13, which, unlike most other Cas nucleases, targets RNA rather than DNA. Once activated through recognition of transcribed phage RNA, however, Cas13 also degrades nearby bacterial RNAs through its nonspecific RNase activity, thereby inducing bacterial dormancy upon phage infection (Meeske et al. 2019). These findings stem from Listeria; whether Cas13 plays a similar role in the persistence of Bacteroidetes populations exposed to the gut phageome is not yet known. From an evolutionary standpoint, we can currently only speculate as to why some intestinal Bacteroidetes species contain CRISPR systems while others lack them, despite occupying similar host niches. For example, the presumed cost of a type VI CRISPR system, whose activation can lead to indiscriminate RNA degradation, could outweigh its benefits relative to other anti-phage systems (Hampton et al. 2020). This would hold particularly true for species such as B. thetaiotaomicron that can switch between multiple surface CPSs, which already provides a certain degree of protection against phage adsorption (Porter et al. 2020).
Apart from global, species-level predictions, CRISPR systems remain mostly unexplored in specific Bacteroidetes members. A notable exception is B. fragilis. Detailed inspection of strains with available genome sequences revealed that most (100 of 109) of the analyzed B. fragilis genomes carry at least one CRISPR system of type III-B, II-C, or I-B (Tajkarimi and Wexler 2017). Seventy-one strains also harbor a CRISPR array that lacks any associated Cas protein and is consistently found directly upstream of the hipAB operon, which has a putative role in persister cell formation and antibiotic resistance. This genomic co-localization led the authors to hypothesize that the "orphan" CRISPR could affect hipAB expression (Tajkarimi and Wexler 2017), but this hypothesis still awaits experimental validation.

Commonalities and differences between regulatory RNA activities in Bacteroidetes and Proteobacteria
With just two trans-encoded sRNAs (RteR [Waters and Salyers 2012] and GibS [Ryan et al. 2020]) and one class of cis-antisense RNAs (DonS-like RNAs; Cao et al. 2016) functionally characterized, Bacteroidetes RNA research is still in its infancy. However, recent global transcriptomic approaches applied to B. fragilis (Cao et al. 2016), B. thetaiotaomicron (Ryan et al. 2020), and even an extra-intestinal Bacteroidetes member (Hirano et al. 2012; Høvik et al. 2012) have revealed a plethora of regulatory RNA candidates and suggest a bright future for this field. For the nomenclature of Bacteroidetes noncoding RNA candidates, we recently introduced a concept analogous to that used for Proteobacteria: RNA genes are named "BTncXXX", where "BT" designates the species (here: B. thetaiotaomicron), "nc" stands for noncoding, and the three-digit number reflects the ranked position on the chromosome. This operational identifier may be replaced by a trivial (four-letter) name only upon functional characterization. If adopted by the community and used consistently, this nomenclature should facilitate cross-comparison between independent studies. The major housekeeping RNAs (tmRNA, 4.5S RNA, M1 RNA) are present in Bacteroides spp. (Prezza and Ryan et al. in prep; Ryan et al. 2020). High-resolution annotation of the B. thetaiotaomicron transcriptome (Ryan et al. 2020) further indicates that the numbers of representatives of individual regulatory RNA classes (e.g., 78 cis-antisense RNAs and 124 intergenic sRNAs) are similar to those reported in well-characterized bacteria from other phyla. Conservation of these newly identified Bacteroides RNAs, however, barely extends beyond the genus level, except for specialized sRNAs such as 6S RNA. Limited conservation seems to be a general feature of noncoding RNAs and promises many new insights into RNA biology from the study of Bacteroidetes. Already now, careful inspection of the sRNA candidates identified in B. thetaiotaomicron allows some speculation about the commonalities and differences between Bacteroidetes sRNAs and their proteobacterial counterparts. For example, we found the average length and "structuredness" (i.e., the minimal free energy of in silico-predicted sRNA structures normalized by genomic GC content) of intergenic sRNAs in B. thetaiotaomicron to be similar to those of Proteobacteria (Prezza and Ryan et al. in prep).
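The BTncXXX naming rule can be expressed in a few lines of code. The Python sketch below is illustrative only: it assumes that "ranked position" means ascending chromosomal start coordinate and that numbering starts at 1, and the candidate labels and coordinates are hypothetical placeholders rather than actual Theta-Base entries.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A noncoding RNA candidate. `label` is a hypothetical placeholder, not a real annotation."""
    label: str
    start: int  # chromosomal start coordinate (assumed 1-based)


def assign_operational_ids(candidates, species_prefix="BT"):
    """Name candidates '<prefix>nc<NNN>' by their ranked chromosomal position.

    Assumptions: ranking uses ascending start coordinates only, and numbering starts at 1.
    """
    ranked = sorted(candidates, key=lambda c: c.start)
    if len(ranked) > 999:
        raise ValueError("a three-digit number supports at most 999 candidates")
    return {c.label: f"{species_prefix}nc{rank:03d}" for rank, c in enumerate(ranked, start=1)}


if __name__ == "__main__":
    # Hypothetical candidates listed in arbitrary order.
    demo = [Candidate("cand_C", 2_500_000), Candidate("cand_A", 15_000), Candidate("cand_B", 910_000)]
    for label, name in assign_operational_ids(demo).items():
        print(label, "->", name)
```

Running the demo prints cand_A -> BTnc001, cand_B -> BTnc002, and cand_C -> BTnc003, i.e., the identifiers follow chromosomal order rather than the order in which candidates were listed.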
Conversely, an obvious difference is the absence of Hfq homologs from Bacteroidetes. Whereas Gram-positive bacteria exemplify that sRNA regulation can occur without an assisting chaperone (Brantl and Brückner 2014), this is relatively uncommon for Gram-negatives. Rather, unrelated Bacteroidetes RBPs could functionally substitute for Hfq. A second difference relates to the absence of a classical Shine-Dalgarno sequence from Bacteroidetes translation initiation regions (Nakagawa et al. 2010). Instead, adenine residues are characteristically enriched in the 5′ UTR of mRNAs at positions −3, −6, and −11 to −15 (relative to the start codon), and these residues enhance translation efficiency (Baez et al. 2019). This region is targeted by GibS (in two out of three targets; Ryan et al. 2020) and is necessary and sufficient for repression of the roc mRNA, a mechanism that could be mediated by an unknown sRNA (Townsend et al. 2019). It is tempting to speculate that trans-acting sRNAs in the Bacteroidetes compensate for the absence of Hfq by evolving extended seed regions or by using multiple seeds to cover this entire region and mediate efficient regulation. If so, GibS-mediated repression of BT_3893, which involves two seed regions, would be a case in point, although how far these findings generalize is unclear. In fact, a single seed seems to suffice for repression of a second bona fide GibS target (BT_0771). Both of these GibS targets were repressed at the mRNA level; however, it remains to be seen whether mRNA decay is a secondary effect of interference with translation initiation or whether an RNase is actively recruited for target degradation.
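To illustrate how the A-rich positions could be checked in a given translation initiation region, the following Python sketch scores the fraction of positions −3, −6, and −11 to −15 that carry an adenine. This is a hypothetical, equal-weight heuristic for illustration, not the position-specific model of Baez et al. (2019); the convention that position −1 is the nucleotide immediately upstream of the start codon is an assumption of the sketch.

```python
# Positions relative to the first nucleotide of the start codon (assumed: -1 = directly upstream).
A_ENRICHED_POSITIONS = (-3, -6, -11, -12, -13, -14, -15)


def a_enrichment_score(seq, start_index):
    """Fraction of the A-enriched upstream positions that carry an adenine.

    seq: DNA or RNA sequence covering the 5' UTR and the start codon.
    start_index: 0-based index of the first nucleotide of the start codon in seq.
    Positions that fall before the beginning of seq count as non-A (assumption).
    Equal weighting of all positions is a simplification for illustration.
    """
    if not 0 <= start_index < len(seq):
        raise ValueError("start_index must point inside seq")
    seq = seq.upper()
    hits = 0
    for pos in A_ENRICHED_POSITIONS:
        idx = start_index + pos
        if idx >= 0 and seq[idx] == "A":
            hits += 1
    return hits / len(A_ENRICHED_POSITIONS)


if __name__ == "__main__":
    # Hypothetical 15-nt leaders followed by ATG: one with A at every enriched position, one without.
    print(a_enrichment_score("AAAAACCCCACCACC" + "ATG", 15))
    print(a_enrichment_score("C" * 15 + "ATG", 15))
```

The first hypothetical leader scores 1.0 and the all-C leader scores 0.0; such a score could, for instance, be used to rank candidate target regions before testing whether an sRNA seed pairs with them.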
In one of the densest microbial ecosystems, fitness depends on efficient competition with companion microbes for nutrients. From a physiological perspective, regulatory RNAs are often implicated in the regulation of metabolic processes in Proteobacteria (Bobrovskyy and Vanderpool 2013), and this is likely even more so the case in specialized glycan degraders. For example, Bacteroides spp. evolved the unusual ability to metabolize more than a dozen plant- and host-derived polysaccharides (McNeil 1984; Salyers et al. 1977). However, expressing the large protein machineries that bind, process, and import complex carbohydrates is energetically costly, so Bacteroides would pay a high price if they could not tightly control their metabolic capacities. Transcriptional control on its own can only induce or shut off de novo synthesis of the corresponding mRNAs; clearing the pool of pre-existing mRNAs from the cytosol upon sensing a preferred carbon source requires post-transcriptional mechanisms. In addition, RNA-mediated expression control allows fast adaptation, which should provide a further advantage in the face of rapid nutrient fluctuations and fierce competition. We therefore expect that many more regulatory RNAs, beyond the TPP and AdoCbl riboswitches, the DonS-like antisense RNAs, the roc leader, and the GibS sRNA, modulate metabolism in Bacteroidetes.

Open questions and how to address them
This review provides an overview of the status quo of our collective knowledge of RNA biology in gut-associated Bacteroidetes. However, the field is just beginning to develop, and many open questions remain to be addressed. For example, some of the best-studied trans-acting sRNAs in Proteobacteria regulate several dozen target mRNAs, generating post-transcriptional regulatory networks with a complexity comparable to that of regulons governed by transcription factors (Papenfort and Vogel 2009). Whether this also applies to Bacteroidetes sRNAs is currently unknown. The targets of the two functionally characterized Bacteroides sRNAs were identified either by educated guesses (effect of RteR on the tra operon) or by differential expression analysis (ΔgibS mutant vs. wild-type vs. overexpression strain). The former bears the risk that additional targets are missed, whereas differential expression cannot distinguish direct from secondary effects. In contrast, technologies now routinely used for sRNA target screens in Proteobacteria, such as sRNA pulse-expression (Massé et al. 2005; Papenfort et al. 2006) (Figure 4a) or sRNA affinity purification followed by RNA-seq (Lalaouna et al. 2017), could be transferred to Bacteroidetes to identify direct sRNA target candidates in an unbiased, genome-wide manner. To validate bona fide targets, reporter systems such as lacZ (Huntzinger et al. 2005) or fluorescent target fusions (Urban and Vogel 2007) could be harnessed (Figure 4b); fluorescent fusions, however, require either that samples are given enough time under normoxic conditions for protein maturation or that oxygen-independent alternatives are used (Chia et al. 2020). Alternatively, luciferase-based reporter constructs, as used to dissect the mechanisms of TPP riboswitches in Bacteroides spp. (Costliow et al. 2019), could be adopted for sRNA target verification.
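The core comparison behind a pulse-expression screen can be sketched as a fold-change calculation between control and sRNA-pulsed samples. The Python example below is a minimal, single-replicate sketch under stated assumptions: counts-per-million normalization, a pseudocount of 1, and a |log2 fold change| cutoff of 1 are arbitrary illustrative choices, and the gene names are hypothetical. Published screens rely on biological replicates and dedicated statistical frameworks instead.

```python
import math


def cpm(counts):
    """Counts per million: a simple library-size normalization."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def pulse_candidates(control, pulse, pseudocount=1.0, lfc_cutoff=1.0):
    """Genes whose log2 fold change (pulse vs. control) reaches the cutoff in either direction.

    Negative values would be candidate repressed targets, positive values candidate
    activated targets; a short pulse is meant to limit secondary effects.
    """
    c_norm, p_norm = cpm(control), cpm(pulse)
    hits = {}
    for gene in control.keys() & pulse.keys():
        lfc = math.log2((p_norm[gene] + pseudocount) / (c_norm[gene] + pseudocount))
        if abs(lfc) >= lfc_cutoff:
            hits[gene] = lfc
    return hits


if __name__ == "__main__":
    # Hypothetical read counts for three genes before and after an sRNA pulse.
    control = {"geneA": 1000, "geneB": 1000, "geneC": 1000}
    pulse = {"geneA": 250, "geneB": 1000, "geneC": 1000}
    print(pulse_candidates(control, pulse))
```

In this toy example only geneA passes the cutoff (log2 fold change of about −1.58). Because counts-per-million is a compositional normalization, the unchanged genes appear slightly induced (about +0.4), one reason why real analyses use more robust normalization.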
As of now, we do not know whether Bacteroidetes sRNAs depend on assisting chaperones for their function or whether regulatory RNAs work in a protein-independent manner in this phylum. Recent technological breakthroughs (Gerovac et al. 2020; Queiroz et al. 2019; Shchepachev et al. 2019; Smirnov et al. 2017; Urdaneta et al. 2019) led to the identification of novel RBPs even in species that have served as RNA research models for decades (Attaiech et al. 2016; Pagliuso et al. 2019; Smirnov et al. 2016) and should likewise foster RBP discovery in Bacteroidetes (Figure 4c).
Which RNAs do Bacteroidetes employ to efficiently colonize host niches? Dual RNA-seq (Westermann et al. 2017) of hypoxic cell culture models colonized with Bacteroides spp. (Figure 4d) and hybrid selection RNA-seq (Donaldson et al. 2020) of Bacteroides colonizing host tissues in vivo have great potential to discover regulatory RNAs induced during host interaction. Enhanced expression often indicates functional relevance under the given condition, which could subsequently be tested by fitness screening of the respective knockout mutants. Alternatively, genome-wide perturbation screens such as transposon insertion sequencing (Cain et al. 2020) have been applied to uncover genetic factors contributing to Bacteroides fitness in the host (Goodman et al. 2009; Wu et al. 2015). Random mutagenesis, however, is inherently biased toward the disruption of longer genes, which typically results in an underrepresentation of sRNA mutants in the library. Targeted approaches such as CRISPR interference, whose applicability has already been demonstrated for B. thetaiotaomicron (Mimee et al. 2015) and which could simultaneously knock down hundreds of sRNAs, appear to be promising alternatives (Figure 4e).
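Both the length bias of random transposon mutagenesis and the input-versus-output logic of a CRISPRi screen can be made concrete with a short Python sketch. Assumptions: insertions are treated as uniform and independent along the genome (real transposons have insertion-site preferences), guide counts are normalized by library size with a pseudocount of 0.5, and guides are aggregated per sRNA by their median log2 fold change. All names and numbers are hypothetical.

```python
import math
from statistics import median


def expected_insertions(feature_len, genome_len, n_insertions):
    """Mean number of insertions in a feature under uniform random insertion."""
    return n_insertions * feature_len / genome_len


def p_at_least_one_insertion(feature_len, genome_len, n_insertions):
    """Probability that a feature receives >= 1 insertion (uniform, independent insertions)."""
    return 1.0 - (1.0 - feature_len / genome_len) ** n_insertions


def guide_log2fc(input_counts, output_counts, pseudocount=0.5):
    """Per-guide log2 fold change of library-size-normalized counts (output vs. input)."""
    in_total = sum(input_counts.values())
    out_total = sum(output_counts.values())
    lfc = {}
    for guide, n_in in input_counts.items():
        n_out = output_counts.get(guide, 0)  # guides lost from the output pool count as 0
        frac_in = (n_in + pseudocount) / in_total
        frac_out = (n_out + pseudocount) / out_total
        lfc[guide] = math.log2(frac_out / frac_in)
    return lfc


def srna_fitness(lfc_by_guide, guide_to_srna):
    """Median guide log2 fold change per targeted sRNA; strongly negative = depleted."""
    grouped = {}
    for guide, value in lfc_by_guide.items():
        grouped.setdefault(guide_to_srna[guide], []).append(value)
    return {srna: median(values) for srna, values in grouped.items()}


if __name__ == "__main__":
    # Length bias: a 100-nt sRNA vs. a 1.5-kb gene in an assumed 6-Mb genome.
    for length in (100, 1500):
        print(length, expected_insertions(length, 6_000_000, 60_000),
              p_at_least_one_insertion(length, 6_000_000, 60_000))
    # CRISPRi: two hypothetical guides per hypothetical sRNA.
    lfc = guide_log2fc({"g1": 100, "g2": 100, "g3": 100, "g4": 100},
                       {"g1": 10, "g2": 20, "g3": 200, "g4": 180})
    print(srna_fitness(lfc, {"g1": "sRNA_1", "g2": "sRNA_1", "g3": "sRNA_2", "g4": "sRNA_2"}))
```

Under the uniform model, a 100-nt sRNA receives on average only one insertion from a library of 60,000 insertions in a 6-Mb genome, whereas a 1.5-kb gene receives fifteen. In the CRISPRi part, sRNAs whose guides are consistently depleted from the output pool (negative median log2 fold change, here sRNA_1) would be the candidate fitness factors.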
Bacteroides spp. release outer membrane vesicles (OMVs) to share polysaccharide processing machineries with neighboring bacteria (Rakoff-Nahoum et al. 2014) and to deliver anti-inflammatory compounds to mammalian host cells (Shen et al. 2012). As bacterial vesicles can also contain ribonucleoprotein complexes, extracellular RNA molecules of pathogenic species shuttled via OMVs have been suggested to mediate cross-talk with co-colonizing bacteria or mammalian host cells (Koeppen et al. 2016; Lécrivain and Beckmann 2020). RNA delivery may also occur in the opposite direction, i.e., from the host epithelium to members of the bacterial microbiota (Liu et al. 2016). Bacteroides spp., which sit at the interface of host-microbe and microbe-microbe encounters, are therefore promising model organisms for further exploring the concept of functional extracellular RNA.
The Bacteroides community has made a strong commitment to making its global datasets easily accessible and usable by peers. It is thus worth emphasizing that several open-source databases can be consulted for information on the Bacteroides genome and transcriptome, such as our Theta-Base (transcriptome annotation and gene expression data for B. thetaiotaomicron; www.helmholtz-hiri.de/en/datasets/bacteroides; Ryan et al. 2020), the Fitness Browser (Tn-seq data for B. thetaiotaomicron under a variety of conditions; http://fit.genomics.lbl.gov; Liu et al. 2019), and PULDB (an overview of predicted and published PUL systems across Bacteroidetes; www.cazy.org/PULDB_new/; Terrapon et al. 2018). These platforms provide excellent entry points for any functional study.
Bacteroidetes RNA research is beginning to prosper. With this review, we highlight recent progress made in this new field and hope to boost future studies. The case examples, emerging concepts, and open questions reported herein shall serve the community as a common starting ground for the years to come.

Figure 2: Working principles of characterized Bacteroides regulatory RNA elements. Proposed mechanisms of the thiamine pyrophosphate (TPP) (a) and adenosylcobalamin (AdoCbl) (b) riboswitches, of the roc mRNA leader (c), of the DonS antisense RNA (d), and of the RteR (e) and GibS (f) sRNAs. Coding genes are indicated as grey arrows, mRNAs are in dark blue, and regulatory RNA genes/elements are in red. See main text for details.

D. Ryan et al.: Bacteroides RNA biology

Figure 3: Prevalence of CRISPR-Cas systems across the Bacteroidetes. Phylogenetic tree of 485 Bacteroidetes genome assemblies generated with ETE 3 (Huerta-Cepas et al. 2016), color-coded by presence/absence of CRISPR systems according to Makarova et al. (2020), with "landmark" species highlighted. Background colors group tree branches that belong to the same phylogenetic class. While type VI CRISPR systems (purple squares) are quite common in Bacteroidetes, they have rarely been observed outside this phylum.

Figure 4: Technologies to foster functional studies of Bacteroidetes sRNAs. a, sRNA target identification via pulse-expression. An sRNA is induced for a short time period (∼10 min; symbolized by the timer) in the bacterial cell, followed by global RNA-seq to pinpoint altered mRNA levels in response to the sRNA pulse. b, Translational fusions of the predicted target region to a colorimetric, fluorescent, or luminescent reporter gene can be employed to validate sRNA target transcripts and, through the introduction of point mutations, target sites. c, RBP identification with gradient sequencing (Grad-seq). Sedimentation of cellular RNA-protein complexes by density gradient centrifugation, followed by detection of RNA and protein molecules in the resulting fractions from low (LMW) to high molecular weight (HMW) by RNA-seq and mass spectrometry (MS), can be used to screen for sRNA-binding proteins. d, Dual RNA-seq profiles bacterial and host gene expression during their interaction. It allows the identification of in vivo-induced sRNAs, and interspecies expression correlation may pinpoint host target processes of individual sRNAs. e, CRISPR interference (CRISPRi) screening for sRNA mutant fitness. Guide RNAs (gRNAs) designed against the sRNA complement of a bacterium and a catalytically inactive Cas9 nuclease (dCas9) are introduced into a bacterial population (input), which is subsequently grown under a defined selection pressure (e.g., host colonization); the remaining bacteria (output) are then compared with the input pool to identify sRNAs that are functionally important under the screened condition.

Table  :
Partially characterized noncoding RNA elements in Bacteroidetes.
DonS: Repression of donC (susC homolog of the Don PUL) via transcriptional interference and/or RNA-RNA interaction;  nt; Bacteroides fragilis; conserved across Bacteroides spp.; Cao et al. ()
Intergenic sRNA induced in stationary phase and in the presence of N-acetylglucosamine as the sole carbon source; represses BT_ and BT_ by direct binding to their translation initiation regions; activates BT_ (directly or indirectly);  nt; B. thetaiotaomicron; partially conserved across Bacteroides spp.
Genes in B. thetaiotaomicron that are annotated as putative ribonucleases and highly conserved across the Bacteroidetes phylum. The list is derived from https://biocyc.org/. The substrate classes were inferred from Georg and Hess () and Mohanty and Kushner ().