Structures and nucleic acid-binding preferences of the eukaryotic ARID domain

Abstract: The DNA-binding AT-rich interactive domain (ARID) exists in a wide range of proteins throughout eukaryotic kingdoms. ARID domain-containing proteins are involved in manifold biological processes, such as transcriptional regulation, cell cycle control and chromatin remodeling. Their individual domain composition allows for a sub-classification within higher mammals. ARID is categorized as a binder of double-stranded AT-rich DNA, while recent work has suggested that ARIDs are capable of binding other DNA motifs and also of recognizing RNA. Despite a broad variability on the primary sequence level, ARIDs show a highly conserved fold, which consists of six α-helices and two loop regions. Interestingly, this minimal core domain is often found extended by helices at the N- and/or C-terminus with potential roles in target specificity and, subsequently, function. While high-resolution structural information from various types of ARIDs has accumulated over two decades now, there is limited access to ARID-DNA complex structures. We thus find ourselves at the beginning of understanding ARID domain target specificities and the role of accompanying domains. Here, we systematically summarize ARID domain conservation and compare the various types with a focus on their structural differences and DNA-binding preferences, including the context of multiple other motifs within ARID domain-containing proteins.


Introduction
DNA-binding proteins (DBPs) control a plethora of biological processes, such as transcriptional regulation and DNA replication (Luscombe et al. 2000). Typically, DBPs comprise one or more DNA-binding domains (DBDs) that recognize either single-stranded (ss) or double-stranded (ds) DNA. The structured DBDs are often interspersed by intrinsically disordered regions (IDRs) that are involved in protein-protein interactions (PPIs) (Liu et al. 2006;Minezaki et al. 2006) or modulate the transition from non-specific to specific DNA-binding (Doucleff and Clore 2008). Target DNA specificities of DBPs cover a broad spectrum. Non-specific DNA polymerases or histones do not discriminate DNAs on a sequence level. In turn, highly specific transcription factors (TFs) precisely recognize short, conserved motifs within promoters of their target genes, where specificity is mainly driven by a combination of DNA sequence and shape, i.e. groove width and bending (Slattery et al. 2014).
One of the most common structured DNA-binding elements is the evolutionarily highly conserved helix-turn-helix (HTH) motif, present in a large number of repressor and activator proteins [e.g. summarized in (Aravind et al. 2005)]. With its minimalistic arrangement of only two α-helices connected by a turn, the HTH is a widespread component of DBPs (Gribskov and Burgess 1986; Littlefield et al. 1999; Ogata et al. 1996). While similar in its general composition, vastly differing structural contexts allow the HTH to act as a versatile element (Aravind et al. 2005).
Among the HTH-comprising DBDs, the AT-rich interactive domain (ARID) is one of the most ancient representatives and is found across eukaryotic kingdoms. ARIDs were first mentioned in the mid-1990s, when, in search of AT-rich DNA binders, two groups discovered the murine B cell-specific trans-activator of IgH transcription (Bright) and the Dead ringer protein (Dri) of Drosophila melanogaster, respectively (Gregory et al. 1996; Herrscher et al. 1995). Identification of the two responsible DBDs revealed a highly conserved domain, introduced as 'ARID', reflecting its suggested binding preference. Meanwhile, a large number of ARID-containing proteins (in higher mammals often termed Arids) has been described, involved in biological processes like cell cycle regulation or chromatin remodeling [summarized in (Patsialou et al. 2005; Wilsker et al. 2002)]. Accordingly, Arid proteins predominantly locate to the nucleus, and for some, increased cytoplasmic protein levels have been correlated with cancer (Animireddy et al. 2021) (Table 1).
Structural information of ARID domains in complex with target DNAs is rare. Hence, a detailed understanding of how Arid proteins, and more specifically the ARID domains, bind DNA is still missing, also based on a very limited knowledge about the preferred target sequences for individual ARIDs. While their name implies a clear preference for AT-rich sequences, several exceptions describe binding of either non-sequence specific dsDNA (Gong et al. 2021) or a preference for GC-rich sequences (Tu et al. 2008) (Table 2). Increasing evidence emphasizes the pivotal role of Arids in a large number of human diseases, especially cancer. While some members of the Arid family actively function as tumor suppressors (Bluemn et al. 2019; Chang et al. 2019; Luchini et al. 2015), overexpression of others provokes opposing effects in tumor growth [reviewed in (Lin et al. 2014)]. This underlines the high clinical relevance of Arid proteins and creates a need for a comprehensive and precise understanding of the versatile Arid protein functions, in particular DNA-binding. In this review we summarize available data on the ARID domains found within individual protein subfamilies. After comparing sequence and fold conservation, we provide a detailed overview of the current data on DNA sequence specificities for ARID-containing proteins, including the few complex structures with DNA. This overview mainly concentrates on the human Arid proteins, yet we include the available structural information of Arids from Saccharomyces cerevisiae (Swi1), D. melanogaster (Dri and Kdm5) and Arabidopsis thaliana (Arid4) for comparison (Table 1, Figure 1).

Table 1: Overview of the available structural information of ARID domains in the 15 human Arid proteins within seven sub-families and six additional eukaryotic, non-human Arids from S. cerevisiae (S.c.), A. thaliana (A.t.), Mus musculus (M.m.) and D. melanogaster (D.m.).

Classes of Arid proteins
In human cells, a total of 15 ARID-containing proteins have been described to date, divided into seven sub-families (Table 1). Most human Arid proteins were discovered as key players in epigenetic regulation, either by direct interaction with chromatin structures, e.g. represented by the Jarid1 sub-family, or as part of chromatin remodeling complexes as shown for Arid1a/b or Arid2 (Euskirchen et al. 2012). The underlying mechanisms are often incompletely understood and typically involve multiple co-factors.
The classification of Arid proteins is based on their domain context (with exception of Arid5, see below) and co-occurrence of specific domains is a significant feature of individual sub-families ( Figure 1). A clear correlation of ARID sequence conservation and the corresponding domain architecture suggests co-evolution of domains (Sandhya et al. 2018) (Figure 2). In many cases, ARIDs appear together with further DBDs, chromatin reader domains or motifs/domains for PPIs. This indicates combinatorial interactions with DNA or other proteins, e.g. subunits of chromatin remodeling complexes, as recently shown for ARID-PHD and for ARID-HMG cassettes, both found in plants (Hansen et al. 2008;Tan et al. 2020).
Arid1a and Arid1b are distinct subunits of the SWI/SNF chromatin remodeling complex (Euskirchen et al. 2012). Both members of the Arid1 sub-family are large multidomain proteins harboring at least two folded domains. Besides a centrally located ARID they comprise the C-terminal BAF250_C, including an Armadillo (ARM) repeat (Sandhya et al. 2018) suggested to be involved in PPIs within SWI/SNF (Zinzalla 2016). Interestingly, the ortholog protein Swi1 in S. cerevisiae suggests a similar domain setup with a still unexplored domain predicted at its C-terminus.
Arid2 is a specific subunit of the chromatin remodeling complex PBAF that otherwise shares many elements with the SWI/SNF complex (Wang et al. 2004;Yan et al. 2005). Arid2 harbors an N-terminally located ARID, an RFX-like DBD defined by a winged-helix type motif (Gajiwala et al. 2000) and a twin zinc finger (tZnF) located C-terminally (Hatayama and Aruga 2010). As part of the PBAF complex, Arid2 plays an important role in the regulation of cell differentiation (Xu et al. 2012).
The Arid3 sub-family counts three paralogs (Arid3a, b and c) that feature a characteristic combination of ARID and REKLES. The REKLES domain is required for self-association and also harbors a nuclear export signal (NES) allowing Arid3 members to shuttle between nucleus and cytoplasm (Kim et al. 2007; Kim and Tucker 2006) (Figure 1, Table 1). Murine Arid3a, also known as Bright, together with the D. melanogaster ortholog Dri are founding members of the large family of ARID-containing proteins (Gregory et al. 1996; Herrscher et al. 1995).
Arid4a and Arid4b are both part of the mSin3/HDAC1 complex involved in epigenetic regulation and gene repression. In addition, they function as co-regulators of transcriptional activation mediated by the androgen receptor (Lai et al. 2001; Wu et al. 2013). Three of the five domains in Arid4 belong to the 'Royal family domains', which are potentially able to bind to methylated histone tails: Tudor, PWWP, and chromo barrel. Yet in the case of Arid4a, only the latter is able to recognize these epigenetically relevant marks (Gong et al. 2012). While Tudor has been demonstrated to possess dsDNA-binding activity (Gong et al. 2014), PWWP domains in general are suggested to play a role in PPIs (Stec et al. 2000).
The Arid5 sub-family comprises two members (Arid5a and Arid5b) that share high sequence similarity exclusively within the ARID domain (Figures 2 and 3A) in an otherwise largely unstructured context. It is noteworthy that both proteins differ significantly in size, which is unique for Arids of one sub-family. Structure prediction within the N-terminus of the significantly longer Arid5b by AlphaFold (https://alphafold.ebi.ac.uk) (Jumper et al. 2021; Tunyasuvunakool et al. 2021) indicates the presence of an additional BAH domain, a motif that often occurs in chromatin-interacting proteins (Yang and Xu 2013). Arid5a was recently described as an RNA-binding protein (RBP) that can shuttle into the cytoplasm and stabilize mRNAs (Masuda et al. 2013; Nyati et al. 2019). Thus, for the Arid5 sub-family we currently face highly divergent functions of its two members (see last chapter of this article).
Members of the Jarid1 sub-family (Jarid1a-d) are histone demethylases and thus transcriptional repressors. They contain a Jumonji C and N (JmjC, JmjN) domain, required for removing H3K4 methyl marks. A comparable domain architecture is found in the D. melanogaster homolog Kdm5. The current literature is divided on whether the ARID domain in Jarids is essential for demethylase activity (Horton et al. 2016; Johansson et al. 2016; Tu et al. 2008; Xiang et al. 2007).
The Jarid2 protein is a further member of the large Jumonji histone demethylase family (Franci et al. 2014), yet it is catalytically inactive due to sequence alterations (Landeira and Fisher 2011). Still, it fulfills essential gene regulatory functions by interacting with the polycomb repressive complex 2 (PRC2) (Li et al. 2010; Peng et al. 2009). Li et al. (2010) claim that its ARID domain supports the C-terminal zinc finger in directing PRC2 to target sequences.

Phylogenetic correlation between Arid proteins
For Arid proteins, the highest homologies of full-length sequences are found between the members of one subfamily (Figure 2A). This correlates with a highly similar overall domain architecture (Figure 1) and is reflected e.g. in 52-86% identity between the four Jarid1 proteins and around 50% between individual members of the subfamilies Arid1 to Arid5, respectively. In contrast, homologies for proteins across sub-families are low, with identities of only 10-20% (e.g. 12% identity between Arid1a and Arid4a), which altogether suggests an early branching of Arid proteins during evolution. In line with that, sequence comparison of the respective ARID domains alone reveals identities of 60-89% between sub-family members (Figure 2B). Interestingly, the similarity of ARID domains across species is remarkably high for Arids with a similar overall domain organization, e.g. when comparing the ARID sequences of human Arid3 members (a to c) with the Drosophila ortholog Dri (Figures 2B and 3A). Their ARID sequences are more than 78% identical, and the co-occurring REKLES domain is also shared among human and fly Arid3 proteins (Kim et al. 2007).
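The identity percentages quoted above depend on how aligned positions are counted. As a minimal sketch, assuming two sequences that have already been aligned to equal length and that gap-containing columns are excluded from the comparison (one of several common conventions, and not necessarily the one used for Figure 2), pairwise identity could be computed as follows. The function name and the toy sequences are hypothetical and are not real ARID sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment and
    therefore have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy example (made-up peptides): 5 gap-free columns, 4 of them identical.
toy_identity = percent_identity("ACDE-F", "ACDQGF")
```

Other conventions (e.g. dividing by the length of the shorter sequence or by the full alignment length) yield different percentages for the same alignment, which is worth keeping in mind when comparing values across studies.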

Conservation and variation of ARID domains
The ARID domain is on average 100 amino acids long and shows a broad sequence variety. Yet, there are 13 highly conserved residues, five of which are identical among all domains (Figure 3A). To our knowledge, 23 ARID structures have been released in the Protein Data Bank (PDB, https://www.rcsb.org, as of November 1st, 2021) within the last two decades. Alignment of their secondary structure elements shows a conserved arrangement of six helices, H1-H6, which together form the minimal ARID core domain. The ARID core can be extended by either N- or C-terminal helices H0 and H7, respectively, referred to as eARID (Kortschak et al. 2000). Representatives of eARID-containing proteins are the Arid3 sub-family members (Figure 3B).
Interestingly, ARID sequence homology is not directly correlated with the degree of ARID fold conservation (Figure 4A and B). Z-scores derived from ARID structure comparison using the DALI server (Holm 2020) allow a pairwise examination of the 3D convergence of domains (Figure 4B). For example, the ARID of Arid1b shares 78% sequence identity with the Arid1a ARID, and with a z-score of 11.2 their structures are very similar (r.m.s.d. 3 Å). A similar structural homology is found between the ARIDs of Arid1b and Jarid1c (z-score 10.7; r.m.s.d. 3.05 Å), despite only 24% sequence homology. Furthermore, z-score comparison impressively shows the large deviation of the yeast Swi1 ARID from all other ARID structures (Figure 4B), while this difference is remarkably less pronounced on the sequence level (Figure 4A).
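The r.m.s.d. values quoted above measure the average distance between equivalent atoms of two structures after superposition. As a minimal illustration (this is not the DALI procedure and does not compute a z-score), assuming two coordinate lists that are already superimposed and matched atom by atom, the value could be computed as sketched below. The function name and the toy coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two pre-superimposed coordinate sets.

    Assumption: atom i in coords_a is equivalent to atom i in coords_b, and
    the optimal superposition has already been applied elsewhere.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total_sq = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total_sq += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total_sq / len(coords_a))


# Toy example: two atoms, one displaced by 2 units along z.
toy_rmsd = rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)])
```

In practice the superposition and the choice of equivalent atoms are the difficult parts, which is why dedicated structure-comparison tools are used for comparisons such as those in Figure 4B.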
Between helices H1 and H2, all ARIDs contain an extended loop L1 (Figure 3B). A second loop L2 is located between H3/H4 and H5, which together form the HTH motif responsible for interaction with dsDNA. In comparison to the canonical HTH motif, ARIDs comprise an atypical HTH with an unusually long "turn" L2 (Iwahara and Clubb 1999) (Figure 4C). The HTH motif is stabilized by helix H2, which is important for the compact globular fold (Huffman and Brennan 2002). In analogy to other HTHs, H5 represents the so-called DNA recognition helix, which mediates contacts with the DNA major groove (Brennan and Matthews 1989; Iwahara and Clubb 1999).

Figure legend fragment: [...] For the latter, their total pI is indicated by a color code explained below. Solid lines cover the sequence range found in a corresponding PDB entry (method given). Core motif helices H1 to H6 are depicted as yellow, extending helices H0 and H7 as light-yellow cylinders, respectively. Significant amino acid insertions or deletions relative to the conserved sequence (see panel A) are indicated by broken lines or dots, respectively.
Comparative mapping of ARID residues with the highest versus those with the lowest conservation reveals an evident feature: while non-conserved residues are predominantly located on the surface of the structure, highly conserved residues show a side chain orientation towards the hydrophobic core of the domain, formed between helices H2 and H5 (Figure 4D and E). Chandler et al. (2013) suggested the preservation of this hydrophobic pocket to be an important aspect of ARID functionality. In their work, a centrally located valine (V1067) within the ARID of Arid1a was found to be essential for DNA-binding. Recent work indicates that the hydrophobicity of valine at this position significantly supports thermal stability and that its mutation to glycine (V1067G) enhances unfolding of the ARID domain (Sandhya et al. 2018).
Interestingly, residues directly within the DNA-binding interface of the HTH motif are poorly conserved across Arids in general. Yet, they show high conservation among members of each sub-family (Figure 3A). Similarly, the least-conserved residues among ARIDs show high conservation within individual sub-families (Figure 3A). This is in accordance with the suggested co-evolution of domains in Arids. Sandhya et al. (2018) claim that certain surface-exposed ARID residues are conserved only within proteins of an Arid1-like domain architecture, arguing for a co-evolution with the BAF250_C domain. As observed early for other DBDs (D'Elia et al. 2001), evolution might have impacted the tertiary fold of the ARID domain, rather than its primary sequence.

Figure legend fragment: [...] Table 1 are shown as spheres and conservation plotted on the Arid1b ARID domain (2EH9). Conservation plots have been performed with the ConSurf server (Ashkenazy et al. 2016). (E) The HTH motif from panel C as embedded within the ARID structure. Sidechains of the most conserved residues within the hydrophobic core (see D) are shown as sticks.

Different flavors of ARID domains
On a structural level, the helices H1-H6 together with the embedded loops L1 and L2 (Figure 5A) form the minimal core domain in all ARIDs (Figure 5B), which can e.g. be found in Jarid1d (Figure 5C). Most ARID structures reveal deviations from this prototypic organization, which appear either in their N- and C-termini or within the loop regions and can occur alone or in combination (Figure 5D-H). In this context, a prevalent feature found in more than 60% of ARIDs is a β-hairpin in L1, which was first described for Dri (Iwahara and Clubb 1999). The two short antiparallel β-strands significantly reduce the flexibility of loop L1, which is involved in DNA contacts (Cai et al. 2007; Iwahara and Clubb 1999). The eARID includes additional helices at the N- (H0), the C- (H7), or both termini (Figure 5D-F). H7 allows additional contacts with the target DNA, thereby enlarging the binding surface (Iwahara and Clubb 1999).
H0 was described to fold back onto the core domain and, very recently for Arid4a, to auto-inhibit by mimicking DNA and to play a key role in specific chromatin interaction (Gong et al. 2021; Liu et al. 2010). Strikingly, the termini flanking the ARID core domains differ immensely in their net charge. For example, a highly acidic N-terminal stretch is common to all ARIDs with an additional N-terminal H0 (with the exception of Jarid1c) (Figure 3B). This hints at similarly important roles for these extending regions as described for Arid4a. Further ARID variations are manifested in significantly altered loop regions. L1 and L2 of the A. thaliana Arid4 ARID domain are prolonged, while L2 of S. cerevisiae Swi1 is shorter compared to the typical core domains.
Importantly, most of the currently available ARID structures are derived from isolated domains or found within tandems (like ARID-PHD or PWWP-ARID) (Table 1). Yet, high-resolution crystal structures of the amino-terminal half of Jarid1a impressively show the integration of ARID into a multidomain context (Labadie et al. 2016; Liang et al. 2017; Vinogradova et al. 2016). In 2016, Vinogradova and colleagues reported the isolated ARID structure to be well superimposable with the domain in its Jarid1a embedment, with only minor differences in loop conformations and an r.m.s.d. value of 1.6 Å (Vinogradova et al. 2016). This supports the autonomy and stability of the ARID fold (Labadie et al. 2016; Vinogradova et al. 2016). However, structural data for ARIDs within an extended protein or even a multi-protein complex are rare and, astonishingly, ARIDs are unresolved in available cryo-EM and crystal structures of yeast and human chromatin remodeling complexes, respectively (Grau et al. 2021; Han et al. 2020; He et al. 2020). In fact, Reyes, Marcum and He very recently suggested this observation to be based on a high conformational flexibility, even upon DNA-binding (Reyes et al. 2021).
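The differences in net charge of the flanking regions can be approximated directly from sequence. The following is a crude sketch, assuming that Lys and Arg each contribute +1 and Asp and Glu each contribute -1 at neutral pH, and ignoring histidine, the chain termini and any pKa shifts; it is not a pI calculation such as the one underlying the color code in Figure 3B. The function name and the example stretch are hypothetical.

```python
def approx_net_charge(sequence: str) -> int:
    """Rough net charge at neutral pH from a one-letter amino acid sequence.

    Assumption: K/R count as +1, D/E count as -1; all other residues,
    including His and the termini, are treated as neutral.
    """
    seq = sequence.upper()
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    return positive - negative


# Made-up acidic stretch (not taken from any Arid protein).
toy_charge = approx_net_charge("DDEEK")
```

A strongly negative value for an N-terminal extension would be consistent with the "highly acidic" stretches described above, although a proper comparison would require pH-dependent charge or pI calculations.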

Structures of ARID domains bound to DNA
The DBD character of ARIDs was identified in structural studies, which revealed a modified HTH motif ( Figure 4C) (Cai et al. 2007;Iwahara et al. 2002;Kim et al. 2004;Zhu et al. 2001). Only three experimental high-resolution structures of ARIDs in complex with DNA exist to date (Cai et al. 2007;Iwahara et al. 2002;Kim et al. 2004;Zhu et al. 2001), which comprise an ARID core, an eARID (both determined by NMR) and a crystal structure of a plant ARID domain in concert with a PHD motif ( Figure 6). In all three of them, we find very similar or identical interfaces of ARIDs with dsDNA as visible after a structural alignment of both protein and DNA components, and a renumbering of respective sequences for convenient comparison ( Figure 6A-D). While the three complexes reveal different extents of nucleotide-specific interactions, those are consistently formed between the DNA major groove AT base pairs and protein sidechains in L2 and H5, which are part of the HTH motif ( Figure 6A). Additional non-specific interactions are found between the ribose-phosphate backbone of the adjacent minor groove and ARID residues in L1 and (optionally) the C-terminal region (Iwahara et al. 2002;Zhu et al. 2001) (Figure 6B-D). Within Dri, two additional specific contacts are mediated by an arginine within helix H7, a unique feature of eARID domains. This extension of the DNA-binding interface is suggested to result in higher affinities for its target DNA compared to core ARID domains (Table 2) (Iwahara and Clubb 1999) and interestingly involves a GC-TA tandem base pair ( Figure 6C). Within the Arid4 ARID-PHD complex structure, one additional non-specific contact from the PHD domain contributes to DNA-binding ( Figure 6D), demonstrating possible combinatorial interactions of ARID along with co-occurring domains and likely not the single case.
In each of the three ARID-DNA complex structures, specific contacts are formed with at least one central AT base pair. This contact is centered around the crucial ARID residue Thr71 (numbering with respect to normalized sequences as shown in Figure 6A), which is located at the transition from L2 to H5 and forms a hydrogen bond with either Ade or Thy (Figure 6B-D). In 2007, Cai and colleagues solved the NMR structure of human Arid5b in complex with an AT-rich dsDNA supported by paramagnetic relaxation enhancement caused by spin-labeled DNA (Cai et al. 2007). This elegant approach revealed an ensemble of complexes where Thr71 can specifically interact with one of two neighboring Thy bases of an AT-tandem base pair (Figure 6E), which at the time underlined the preference of the ARID core for AT-DNAs.

AT or not AT: sequence preferences and affinities across Arids
Tu et al. (2008) emphasized the correlation of threonine at position 71 within ARIDs (numbering with respect to normalized sequences as shown in Figure 6A) with a preference for AT-rich sequences. ARID domains that share a conserved lysine at the equivalent position were found to target GC-rich sequences, whereas a serine is correlated with non-specific DNA-binding (Figure 6F). However, we have to note that those findings are to date not corroborated by experimental structural data, which alone would allow atom-resolved correlations of specificity between ARID sidechains and multiple non-AT base pairs to be derived.
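The correlation described above can be written down as a simple lookup. The sketch below only encodes the three residue-to-preference associations named in the text (Thr, Lys and Ser at normalized position 71); it is a mnemonic rather than a validated predictor, since, as noted, the correlation lacks structural corroboration. The dictionary and function names are hypothetical.

```python
# Reported correlation between the residue at normalized position 71
# and the suggested DNA preference (as summarized in the text above).
POSITION_71_CORRELATION = {
    "T": "AT-rich",
    "K": "GC-rich",
    "S": "non-specific",
}


def suggested_preference(residue_at_71: str) -> str:
    """Return the suggested DNA preference for a one-letter residue code.

    Residues without a reported correlation are returned as 'unknown'.
    """
    return POSITION_71_CORRELATION.get(residue_at_71.upper(), "unknown")


# Example: a threonine at position 71 maps to the AT-rich category.
thr_preference = suggested_preference("T")
```

Any real assignment would additionally require mapping each ARID sequence onto the normalized numbering of Figure 6A, which is not attempted here.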
In general, the available data on target sequences reveal our limited in-depth knowledge about preferences, specificities and determinants of ARID-DNA interactions. As shown in Table 2, a large fraction of available data refers to SELEX- or ChIP (chromatin immunoprecipitation)-based approaches. Here, the enrichment of DNAs specifically bound by Arid proteins has been carried out with full-length proteins or protein fragments. As a result, it is difficult to draw conclusions on DNA-recognition that are transferable to the isolated ARID domain. In 2005, Moran and colleagues performed a systematic analysis of DNA-binding across all human Arids and found no obvious sequence specificity in five out of the seven sub-families (Patsialou et al. 2005). However, their data derive from a merely qualitative approach using pulldown experiments and do not yield a quantitative measure for specificity. They also ignore DNA length, which has a significant influence on stoichiometry and thus on affinities (Maulik et al. 2019). In contrast, two groups have suggested a GC preference within the Jarid1 sub-family (Scibetta et al. 2007; Tu et al. 2008). In other studies, the highly divergent methodological setups have led to experimentally observed KD values spanning several orders of magnitude, i.e. low nanomolar to micromolar (Table 2).
The latter is best exemplified by the controversial findings for the ARID domain of Arid1a (Table 2), where a large experimental variance in both DNA lengths and respective ARID domain boundaries complicate data interpretation. The ARID domains of Arid1a/b and Arid2 have been described to bind DNA non-specifically and with inconsistent affinities (Kim et al. 2004;Patsialou et al. 2005;Wang et al. 2004). Their individual incorporation as subunits into SWI/SNF or PBAF complexes resulted in distinct roles in cell cycle control (Nagl et al. 2007). This hints towards an intrinsic target DNA-specificity of each Arid, emerging only in combination with further domains. Possibly, ARID domains are only able to exhibit their subtle preferences for certain DNAs when properly positioned with the help of additional SWI/SNF components. Important Arid-mediated PPIs are also suggested in the growing number of cryo-EM structures of nucleosomes (Han et al. 2020;He et al. 2020;Kasinath et al. 2021).
Of note, a number of studies have taken into account the potential role of molecular dynamics in DNA-recognition by ARIDs (Iwahara et al. 2002;Kim et al. 2004;Kusunoki et al. 2009;Maulik et al. 2019) using solution NMR spectroscopy. Overcoming failure to co-crystallize, the unique and valuable information of chemical shifts has allowed to model or experimentally determine complexes of ARIDs with their suggested DNA motifs (Cai et al. 2007;Maulik et al. 2019), and NMR has further helped in identifying the role of flexibility in loops within ARIDs (Kim et al. 2004).
Molecular dynamics (MD) can also be addressed with computational simulations. For Arid3a, Invernizzi et al. (2014) used MD to postulate specific "hub" residues in the ARID domain involved in DNA-binding. They showed that Tyr119 plays a key role in long-range communication to conserved loop residues in the DNA interface, which allowed speculation about a coupling of DNA-binding to intramolecular communication.

Why only DNA: RNA-binding by Arid5a
As a unique occurrence among Arids, in 2013 Kishimoto and colleagues described Arid5a as a novel RBD (Masuda et al. 2013). In this and a number of later studies the protein was found to stabilize mRNAs by specifically binding to short 3′-untranslated region stem-loops with no particular consensus sequence (Hanieh et al. 2018; Masuda et al. 2016; Zaman et al. 2016). Those publications suggested that Arid5a blocks binding of the immune-suppressive RBPs Regnase and Roquin, both known to tightly engage with identical stem-loops or overlapping regions in mRNAs (Janowski et al. 2016; Jeltsch et al. 2014; Mino et al. 2015; Schlundt et al. 2014). Thus, Arid5a was categorized as a proinflammatory RBP [e.g. reviewed in (Nyati et al. 2020)], with evidence for an ARID-mediated interaction with particular RNAs (Hanieh et al. 2018; Masuda et al. 2013, 2016). Of note, to our knowledge there is no study that provides binding data of the isolated Arid5a ARID domain to one of the designated RNA elements. As of now, it remains to clearly disentangle the (simultaneous) DBP from the RBP functions of Arid5a, especially in the upregulation of genes during immune responses, as e.g. for IL-6 (Ikeuchi et al. 2021; Masuda et al. 2013). However, we can only speculate about the general participation of Arids in dual functioning as DNA/RNA-binding proteins, which are increasingly found at the chromatin-RNA interface (Hudson and Ortlund 2014; Huo et al. 2020; Xiao et al. 2019).

Conclusion and future perspectives
As summarized in this and other overview articles, Arids fulfill cellular functions centered around gene regulation including potential roles in tumorigenesis (Kortschak et al. 2000;Lin et al. 2014;Patsialou et al. 2005;Wilsker et al. 2002). We need to revalidate ARIDs with respect to their original definition as pure AT-rich dsDNA binding domains, and rather conceive their core shape as bona fide dsDNA-binding domain (D'Elia et al. 2001;Gregory et al. 1996;Wilsker et al. 2005). In this, we also suggest to consider the role of DNA tertiary structure, e.g. its bending, looping, supercoiling in nucleosomes and temporary triplex structures (Neidle 2021;Wolberger 2021). Certain DNA stretches might experience an unexpected preference given by a pre-organized geometry that would support an increased on-rate in ARID binding.
A systematic biochemical and structural analysis of ARID-DNA complexes will improve our understanding of target preference and predictability. In perspective, it will allow the engineering of ARIDs against DNA sequences of choice, as shown for other NA-binding domains (Chen and Varani 2013; Inamoto et al. 2021; Sera and Uranga 2002; Zhou et al. 2021). A combination of SELEX and ChIP-seq will relate high-affinity motifs to natural sequence contexts (Kharchenko et al. 2008; Narlikar and Jothi 2012). This approach will also disentangle contributions of ARIDs to DNA targets in MD-DBPs.
Much light will be shed on the atomic basis of specific ARID-DNA interactions via additional high-resolution complex structures. In addition, the systematic analysis of permutated DNA motifs against an ARID, e.g. by NMR-based principal component analysis (Collins et al. 2015) will explain subtle differences in affinities between motifs. Additional sources of structure-providing methods are to be exploited, including homology-based modelling, docking and MD simulations (Koukos and Bonvin 2020;Trnka et al. 2019). E.g. solution-oriented integrated structural biology (Rout and Sali 2019) was recently applied to a bacterial promoter DNA-protein complex (Schlundt et al. 2017).
Consequently, future work will explore larger DNA-Arid complexes that to date do not provide an unambiguous picture of the ARID domain, e.g. in the recently determined cryo-EM structures of nucleosomes (Han et al. 2020;He et al. 2020;Kasinath et al. 2021). Also, for most Arid proteins we still lack the meaning of the extended unstructured, but highly-conserved regions (Arid5a). In particular for TFs, the recent years have revolutionized our view on IDRs as potent contributors to specific DNA interactions (Brodsky et al. 2020;Guo et al. 2012;Vuzman and Levy 2012). In Arids, such IDRs might as well act in concerted action with the ARID domain.
Finally, systematic research is needed to understand mammalian ARIDs in their full capacity, including potentially hidden capabilities of protein interaction domains as known from plants (Tan et al. 2020; Zhu et al. 2008). Altogether, despite their relatively early discovery, ARID domains have left us with many unknowns, and atom-resolving technology will remarkably help to fill the many gaps in understanding the functions and mechanisms of Arids mediated by their ARID domain.
Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: This work was funded by Johanna Quandt Young Academy at Goethe (2019/AS01) and Deutsche Forschungsgemeinschaft (SCHL2062/2-1, SFB902-3/B16).
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.