Structure and phase separation of the C-terminal domain of RNA polymerase II

The repetitive heptads in the C-terminal domain (CTD) of RPB1, the largest subunit of RNA Polymerase II (Pol II), play a critical role in the regulation of Pol II-based transcription. Recent findings on the structure of the CTD in the pre-initiation complex determined by cryo-EM and the novel phase separation properties of key transcription components offer an expanded mechanistic interpretation of the spatiotemporal distribution of Pol II during transcription. Current experimental evidence further suggests an exquisite balance between the CTD's local structure and an array of multivalent interactions that drive phase separation of Pol II and thus shape its transcriptional activity.


Introduction
RNA Polymerase II (Pol II) transcribes DNA into RNA in eukaryotic cells. The activity of Pol II during transcription is intimately linked to the post-translational modifications (PTMs) of the C-terminal domain (CTD) of its largest subunit, the RNA polymerase subunit B1 (RPB1) (Figure 1A) (Eick and Geyer 2013; Zaborowska et al. 2016). The length of the CTD varies between organisms, from 52 repeats in humans to 26 repeats in Saccharomyces cerevisiae. The N-terminal half of the human CTD closely resembles the sequence of the yeast CTD, while the C-terminal half is less conserved. Although the CTD is generally well conserved, its phase separation is affected by CTD length and composition (Lu et al. 2019), potentially due to changes in multivalent interactions (Boehning et al. 2018). Increasing evidence further suggests that the role of the CTD in transcription is determined by the well-established patterns of PTMs, which gave rise to the term "CTD code" (Eick and Geyer 2013), in combination with the ability of this intrinsically disordered region to undergo phase separation and drive the clustering of Pol II. Here we review the findings reported for the CTD during the last four years, focusing on the structural properties of the CTD inside the pre-initiation complex and the contribution of CTD phase separation to eukaryotic gene transcription.

CTD sequence and PTMs
The CTD is a low complexity domain comprising multiple repetitions of the consensus heptad Y1S2P3T4S5P6S7 (Figure 1B) (Eick and Geyer 2013; Zaborowska et al. 2016). The high proline content is linked to the presence of serine/proline motifs, which are a preferred substrate for cyclin-dependent kinases (CDKs) that phosphorylate serine or threonine residues. The CTD structure changes in response to phosphorylation or other PTMs that occur in the CTD (Figure 1C) (Zhang and Corden 1991). Pol II phosphorylated at Ser5 in the consensus CTD is more concentrated near the promoter, while phosphorylation at Ser2 is connected to elongation, and Ser7 in the consensus CTD repeat might be phosphorylated or glycosylated. Ser5 and Thr4 of the heptad repeat may also be glycosylated, while Lys7 or Arg7 in non-consensus repeats are sites for acetylation, methylation, and ubiquitination. These specific patterns of PTMs have led to the term "CTD code".
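The bookkeeping behind the heptad numbering can be made concrete with a minimal Python sketch. It assumes an idealized CTD built only from consensus repeats, which is a simplification because real CTDs contain non-consensus heptads. The function names `phospho_sites` and `sp_motifs` are hypothetical helpers introduced here for illustration.

```python
# Consensus heptad of the Pol II CTD, as given in the text.
CONSENSUS = "YSPTSPS"

# Three-letter names for the hydroxyl-bearing residues that can be phosphorylated.
PHOSPHO_ACCEPTORS = {"S": "Ser", "T": "Thr", "Y": "Tyr"}


def phospho_sites(heptad: str) -> list[str]:
    """Label each phosphorylatable position in a heptad, e.g. 'Ser2'."""
    return [
        f"{PHOSPHO_ACCEPTORS[aa]}{pos}"
        for pos, aa in enumerate(heptad, start=1)
        if aa in PHOSPHO_ACCEPTORS
    ]


def sp_motifs(repeats: int) -> int:
    """Count Ser-Pro and Thr-Pro dipeptides in an idealized all-consensus CTD."""
    seq = CONSENSUS * repeats  # assumption: every repeat matches the consensus
    return sum(
        1 for i in range(len(seq) - 1) if seq[i] in "ST" and seq[i + 1] == "P"
    )


print(phospho_sites(CONSENSUS))  # ['Tyr1', 'Ser2', 'Thr4', 'Ser5', 'Ser7']
print(sp_motifs(26))             # 52
```

Under this assumption each heptad contributes two serine/proline motifs (Ser2-Pro3 and Ser5-Pro6), so the number of potential CDK and prolyl isomerase sites scales linearly with repeat number.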
Mutagenesis studies on the CTD demonstrated the importance of its amino acid composition (West and Corden 1995). More recently, the roles of particular amino acids and positions in the heptad have been worked out in greater detail. For instance, Tyr1, Ser2 and Thr4 are required for efficient termination and for Pol II pausing at the 5′ end of genes (Collin et al. 2019). Tyr1 is important for substrate recognition by CDK7 (Ramani et al. 2020) and for phosphorylation of Ser2 by P-TEFb (Mayfield et al. 2019). Phosphorylation of Tyr1 by c-Abl also promotes synthesis of strand-specific, damage-responsive transcripts, which force the formation of double-stranded RNA and the recruitment of p53-binding protein 1 and Mediator of DNA damage checkpoint 1 to endogenous double strand breaks (Burger et al. 2019). The high density of serine/proline motifs in the CTD encodes yet another level of control by prolyl isomerases, such as Ess1 in S. cerevisiae and its human homolog Pin1. Isomerization is part of the "CTD code", which also regulates the recruitment of proteins required for transcription and RNA processing (Namitz et al. 2021), and may help to overcome energetic barriers to metabolic activities by concentrating enzymes and substrates via CTD phase separation (Palumbo et al. 2022).

CTD structure
The CTD lacks stable secondary and tertiary structure. However, this does not exclude the presence of local structure, i.e. structural bias, determined by the CTD amino acid sequence, in particular the consensus Y1S2P3T4S5P6S7 repeats (Jasnovidova and Stefl 2013). To gain insight into potential structural biases in the CTD, studies initially concentrated on small CTD peptides, which were sometimes circularized and frequently studied at low pH in solvents that promote turns to maintain structure (Cagas and Corden 1995; Kumaki et al. 2001). The structural features of non-repeating sections of the Drosophila melanogaster CTD have also been examined (Gibbs et al. 2017; Lu et al. 2019; Portz et al. 2017). In addition, the structures of short CTD fragments in association with CTD-binding partners were characterized (Noble et al. 2005), including the mRNA anti-terminator protein hSCAF4-CID binding to S2/S5-phosphorylated CTD (Zhou et al. 2022), the prolyl isomerase Ess1 in complex with a CTD peptide (Namitz et al. 2021), and a CTD peptide comprising five consensus repeats bound to CDK kinases (Ramani et al. 2020).
Investigations of the unmodified distal portion of the human CTD (the C-terminal 26 heptad repeats) showed that lysine residues at position 7, which depart from the canonical Y1S2P3T4S5P6S7 sequence, participate in electrostatic interactions with aspartic and glutamic acid residues of the three FET-family proteins fused in sarcoma (FUS), Ewing sarcoma (EWS) and TATA-box binding associated factor (TAF15), and mediate the interaction with TAF15 fibril assemblies (Burke et al. 2015; Janke et al. 2018; Murthy et al. 2021). These findings suggest a heterotypic mode of electrostatic interactions between charged proteins and the CTD. Recently, cryo-EM studies have resolved the structure of the human Mediator at near-atomic detail (Abdella et al. 2021; Aibara et al. 2021; Chen et al. 2021, 2022; Rengachari et al. 2021), showing that the pre-initiation complex/Mediator complex forms a "Head-Middle sandwich" that holds two CTD segments close to CDK7 for phosphorylation (Abdella et al. 2021; Chen et al. 2021, 2022) (Figure 2). The two U-shaped CTD segments are located between the Middle's Knob and the Head's HB1. Although the Head and the Middle do not interact directly, they are connected by two CTD repeats. Tyr1 from one CTD repeat makes contacts with residues MED8-Pro83, Knob-Phe96, Knob-Cys93, MED6-Arg118 and MED17-Val111. A second Tyr1 residue of the CTD inserts into the hydrophobic pocket formed by the two Mediator residues MED6-Tyr106 and MED8-Pro106. These CTD-Mediator interactions are further reinforced by other contacts (Chen et al. 2021).

Phase separation in transcription
Proteins and nucleic acids can be organized into cellular compartments through the formation of biomolecular condensates (Rawat et al. 2021; Shao et al. 2022). While the exact mechanisms by which transcription foci and other biomolecular condensates form are still a matter of debate, transient multivalent interactions between intrinsically disordered protein regions are believed to play a key role (Chong et al. 2018; Martin et al. 2020). Such multivalent interactions may cause liquid-liquid phase separation of proteins and nucleic acids into highly dynamic cellular compartments (Guo et al. 2019, 2020). For example, the disordered regions of transcription factors can phase separate together with the Mediator coactivator into liquid-like droplets in vitro (Boija et al. 2018). The CTD is also able to phase separate into liquid-like droplets in vitro with the aid of crowding agents (Boehning et al. 2018) (Figure 3A), in agreement with the formation of dynamic Pol II clusters in vivo (Cho et al. 2018; Lu et al. 2019). However, whether the CTD can form liquid-like droplets by itself, without crowding agents and under near-physiological conditions, requires further study.
The number of heptad repeats in the CTD of RPB1 affects the propensity for liquid-liquid phase separation (Boehning et al. 2018; Lu et al. 2019), and CTD length also correlates with gene density (Quintero-Cadena et al. 2020). Besides the number of CTD repeats, sequence variations in the heptad repeats are important. For instance, the distal part of the human CTD enhances CTD binding to the RNA-binding protein FUS, mediated by the lysine residues located in the seventh position of the distal heptads (Murthy et al. 2021). The lysine residues in the distal part of the CTD enhance the network of multivalent interactions and thus increase the CTD's capacity to phase separate and organize into biomolecular condensates (Burke et al. 2015; Janke et al. 2018; Murthy et al. 2021). It would therefore be valuable to examine the role of each heptad amino acid in forming the molecular CTD structure required for liquid-liquid phase separation, and to gather additional information on how this is influenced by PTMs in the CTD.
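As an illustration of how position-7 variants could be tallied, the following Python sketch splits a CTD-like sequence into heptads and reports repeats whose seventh residue departs from the consensus serine. The sequence `toy_ctd` is a hypothetical toy example, not the real human CTD, and the sketch assumes the input is already in register with the heptad frame; the helper names are likewise assumptions made for illustration.

```python
CONSENSUS = "YSPTSPS"


def split_heptads(seq: str) -> list[str]:
    """Cut a sequence into consecutive 7-residue repeats."""
    if len(seq) % 7 != 0:
        raise ValueError("sequence length must be a multiple of 7")
    return [seq[i:i + 7] for i in range(0, len(seq), 7)]


def position7_variants(seq: str) -> list[tuple[int, str]]:
    """Return (repeat number, residue) for heptads whose position 7 is not Ser."""
    return [
        (n, heptad[6])
        for n, heptad in enumerate(split_heptads(seq), start=1)
        if heptad[6] != CONSENSUS[6]
    ]


# Hypothetical toy sequence: two consensus repeats followed by two Lys7 repeats.
toy_ctd = "YSPTSPS" * 2 + "YSPTSPK" * 2
print(position7_variants(toy_ctd))  # [(3, 'K'), (4, 'K')]
```

Counting such lysine-bearing repeats gives only a crude proxy for the number of positively charged contact points a CTD can offer; it says nothing about interaction strength.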
Post-translational modifications play a crucial role in the catalytic activity of Pol II and modulate it dynamically. Phosphorylation of the CTD is linked to the different stages of transcription, although RNA synthesis can occur without the CTD (Zehring et al. 1988). The presence of droplets formed by the hyperphosphorylated CTD suggests that liquid-liquid phase separation may have a broad impact on various stages of transcription (Lu et al. 2018) (Figure 3B-D). For instance, phosphorylation of serine 5 by the human TFIIH subcomplex containing the CDK7 kinase dissolved CTD droplets in vitro (Boehning et al. 2018). In addition, phosphorylation of the CTD modifies the molecular interactions between the CTD, the Mediator complex and serine/arginine-rich domains, triggering a shift in CTD partitioning from transcription to splicing condensates (Guo et al. 2019). Supporting an important role of liquid-liquid phase separation in transcription, Pol II and the Mediator complex colocalize in nuclear condensates (Cho et al. 2018; Guo et al. 2019). Pol II also partitions into the phase-separated super elongation complex through heterotypic phase separation (Guo et al. 2020), and phase separates in the presence of the negative elongation factor (Rawat et al. 2021).
Besides transcription factors, the coactivator Mediator and RNA-binding proteins, increasing evidence shows that RNA can facilitate the formation of nuclear condensates containing Pol II and other components of the transcriptional machinery. With its high negative charge, RNA binds to positively charged disordered protein regions and changes the physicochemical properties of its surroundings, which can influence CTD phase separation (Shao et al. 2022). The CTD of Pol II also forms coacervates in the presence of intracisternal A-particle RNA (Asimi et al. 2022). RNA can thus play an important role in the condensation of Pol II, contributing to its regulation and function during gene transcription.

Concluding remarks
Our understanding of how transcription is regulated is constantly evolving with new evidence for transcriptional hubs in vitro and in cells. Liquid-liquid phase separation provides a mechanism for spatiotemporal organization during transcription, with the CTD playing a crucial role in regulating the activity of RNA Pol II through post-translational modifications and migration between coacervates. In addition, there is growing interest in exploring the underlying physical and chemical forces that determine the selectivity of Pol II localization to different condensates, and in uncovering how these forces can be targeted to modulate specific transcriptional hubs at different stages of transcription. This may open up new avenues for developing strategies to regulate transcription with greater precision and specificity.

Figure 1: The CTD of RNA polymerase II. (A) Schematic representation of RNA Pol II and its low complexity C-terminal domain comprising heptad repeats. (B) Stereochemical representation of a YSPTSPS repeat of the CTD of RNA Pol II. (C) Possible phosphorylation sites in the consensus heptad repeat.

Figure 2: CTD structure bound to the PIC. (A) Two structured segments of the CTD in complex with the pre-initiation complex (PDB ID: 7ENC) determined by cryo-EM. (B) RNA Pol II (blue) and Mediator (cyan) complex, extracted from the full pre-initiation complex determined by Chen et al. (2021) (PDB ID: 7ENC). Some molecules are omitted for clarity. (C) Two segments of CTD local structure bound to the pre-initiation complex resolved by Abdella et al. (2021) (PDB ID: 7LBM).

Figure 3: CTD phase separation. (A) Differential interference contrast (DIC) and fluorescence microscopy revealing concentration-dependent formation of liquid droplets of MBP-hCTD in the presence of 16% dextran. Reproduced from Boehning et al. (2018). (B) Transcription factors, Mediator and RNA Pol II can undergo LLPS, creating "transcriptional hubs". The role of "transcriptional hubs" in transcription initiation (blue droplets) is currently being studied. (C) Site-specific phosphorylation of the CTD causes the transfer of RNA Pol II from initiation condensates to elongation condensates, activating the synthesis of pre-mRNAs (green droplets). (D) The role of the CTD in RNA processing (splicing, capping, polyadenylation, etc.) is enigmatic. PTMs in the CTD are believed to regulate the transcriptional activity of RNA Pol II.