A Social Network Analysis of the Oceanographic Community: A Fragmented Digital Community of Practice

What does a digital Social Network Analysis reveal about online oceanographic communities on Twitter? We examine the structure of a digital community of practice of oceanographers and ocean-related stakeholders on Twitter using a Social Network Analysis (SNA) approach to understand digital aspects of information production and information flow in oceanography, mapping the social ties between members of a community of practice concerned with the study of the oceans. We carried out the SNA using Docteur Tweety TwExList for data collection and Gephi to visualize the scraped data, and found that although the oceanographic community on Twitter is an active, vibrant community, fragmentation between sub-communities exists. Further qualitative sampling revealed where these fragmentations occur between individual researchers, institutions, funding bodies, government agencies, and news outlets as a result of practice, time zones, and geography. The findings also revealed which groups are utilizing Twitter consistently, and which accounts have the potential to connect isolated groups. We recommend training to help ocean scientists understand the affordances of Twitter, so that they can utilize it for better collaboration, community integration, and more effective public outreach.


Introduction
What does a digital Social Network Analysis reveal about online oceanographic communities on Twitter? We examine the structure of a digital community of practice of oceanographers and ocean-related stakeholders on Twitter using a Social Network Analysis (SNA) approach to understand digital aspects of information production and information flow in oceanography, mapping the social ties between members of a community of practice concerned with the study of the oceans.
We carried out an SNA on 470 accounts and their followers using Docteur Tweety TwExList 1 for data collection, and Gephi 2 to visualize the scraped data and to calculate modularity class (mClass) and a variety of centrality measures, including Betweenness Centrality ranking (BCR), Closeness Centrality (CCR), Harmonic Closeness Centrality (HCCR), and in-degree and out-degree centrality (IDR; ODR). We found that although the oceanographic community on Twitter is an active, vibrant community, fragmentation between sub-communities exists. It is important to note that the relationships examined in this network are based on the technical affordances of the platform examined (Twitter) and the software used to interpret them; they do not fully capture the complexity of human relationships, nor do they provide extensive coverage of all possible social connections. Furthermore, qualitative sampling of the data revealed where these fragmentations occur between individual researchers, institutions, funding bodies, government agencies, and news outlets. The findings also revealed which groups are utilizing Twitter consistently, and which accounts have the potential to connect isolated groups.
We recommend training to help ocean scientists understand the affordances of Twitter, so that they can utilize it for better collaboration, community integration, and more effective public outreach.
Critical global changes on both the environmental and political fronts relating to oceanographic research and policy have made headline news in an alarming way in recent years: the depletion of the Great Barrier Reef (Climate Hot Map 2017; Knaus and Evershed 2017; Normile 2017), a trend of new highest temperature records each year since 2014 (Dean 2017; Hancock et al. 2017; National Centers For Environmental Information 2018; The Weather Company 2018), and the U.S. withdrawal from the Paris Accord in the aftermath of the 2016 presidential election (Bailey 2017; Zhang et al. 2017). At the same time, media, and social media in particular, have moved environmental and climate change debate beyond boardrooms, research institutions, and governmental offices (Kunelius et al. 2016). In some cases, social media has had positive effects in encouraging climate change knowledge and exchange (Anderson 2017), while at other times it has had negative impacts (William and Gregory 2014).
The use of Twitter in particular as a channel of communication has grown rapidly over the past few years, especially in the study of communities of practice 3 (Kane 2017; Komorowski, Huu, and Deligiannis 2018; Wenger 2008) and in academic use (Carrigan 2016; Mahrt, Weller, and Peters 2014; Van Noorden 2014; Veletsianos 2016; Veletsianos and Kimmons 2013). This research studies information practices in oceanography, as a set of communities of practice, by applying a digital SNA approach to examine ties between community members. SNA has proven useful for qualitative research of communities (McKenna, Myers, and Newman 2017; Whelan et al. 2016) and for overall studies of the information environment (Otte and Rousseau 2002).
Some studies have examined other communities of practice online using SNA (e.g. Abdelsadek et al. 2018; Grandjean 2016; Komorowski, Huu, and Deligiannis 2018). Previous studies have also investigated information aspects of oceanography; however, these have focused on content rather than ties (e.g. Otero et al. 2014). They include Sahu (2015), who investigated oceanography from an information service perspective by looking at "Scientometrics" indicators; Belter's (2013, 2014) oceanographic information research on data curation and bibliometrics; Lee, Van Dyke, and Cummins' (2018) study on whether or not NOAA 4 utilizes social media effectively; Obioha's (2005) research on the impact of Information and Communication Technologies (ICTs) for the Nigerian Institute for Oceanography and Marine Research; and Hesse et al.'s (1993) study of ICTs, which investigated email use for collaborative oceanography through network analysis in the form of surveys. As Lee, Van Dyke, and Cummins argue, "[s]cience communication research has consistently emphasized problems with traditional science communication models" (2018, 275).
We have utilized SNA to address a gap in research: the shortage of studies on information flow online, as opposed to content circulation, in oceanography. We do so by re-examining information practices via Twitter, a current major communication platform. Social Network Analysis (SNA) is a socio-centric approach that examines a large group of people in a digital network. The aim is to explore where power and resources are, or can be, concentrated and which information channels are, or can become, focal points within the community. We utilize SNA as "a broad strategy for investigating social structures" (Otte and Rousseau 2002, 441), using Chung, Hossain, and Davis' definition of a social network: "a set of actors and relations that hold these actors together" (2005, 1). Furthermore, actors "can be individuals or aggregate units such as departments, organizations. Actors form social networks by exchanging one or many resources with each other. Such resources can be information, goods, services, social support or financial support" (Chung et al. 2005, 1). This definition is important because it accounts for the various roles, beyond that of the individual scientist, that take part in shaping information production, curation, and sharing between diverse entities within the community.
There are manifest distinctions between mapping a community network using traditional methods in a physical environment (e.g. Hesse et al. 1993) and community mapping in an online digital space (Gulbrandsen and Just 2011; Rey and Jurgenson 2013). We treat the digital space and the natural world as extensions of one another rather than as separate realities. Therefore, ICTs, which include social media, are not treated as alternatives to traditional communication, but as supplemental resources that extend knowledge about a phenomenon given their similar affordances (Kane 2017). In fact, some studies (Meyer et al. 2011; Veletsianos 2016) have established that ICTs indeed extend the amount of information we have access to, rather than change how research is done.
3 The concept of Communities of Practice was developed by Wenger (2008) and predates digital Social Network Analysis but has been used within SNA in recent years. According to Nicolini (2012, 2) "[s]tarting with the 1970s, practice-oriented approaches have become increasingly influential and applied to the analysis of phenomena as different as science, policy making, language, culture, consumption, and learning."
Twitter is useful for our specific purpose because the connections between network nodes are more uniform, forming clear directional connections, or edges. All users have the same type of account (regardless of how an account is branded) and can follow, or be followed by, other users. Sometimes two accounts follow each other mutually, forming a reciprocal connection that is represented as an undirected relationship. It is worth pointing out that this representation as an undirected relationship is a limitation of the software, which does not fully capture the nature of overlapping directed relationships (i.e. following and being followed simultaneously), not to mention the various other types of connections 6 that users can have with various fragments of content. Articulated relations on Twitter can thus be "used to structure the flow of communication and to filter information" (Schmidt cited in Weller et al. 2014, 4).
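To illustrate the distinction between one-way and reciprocal follow relationships, the following minimal Python sketch splits a list of directed (follower, followed) edges into mutual pairs and one-way ties. The account names are hypothetical, and collapsing each mutual pair into a single unordered tuple is an assumption that mirrors, rather than reproduces, how visualization software may represent reciprocal ties as undirected relationships.

```python
def classify_ties(follow_edges):
    """Split directed (follower, followed) edges into reciprocal and one-way ties.

    Each reciprocal pair is returned once, as a sorted tuple, which mimics
    collapsing a mutual follow into a single undirected relationship.
    """
    edge_set = set(follow_edges)  # drop duplicate edges
    reciprocal = set()
    one_way = set()
    for follower, followed in edge_set:
        if follower == followed:
            continue  # ignore self-loops
        if (followed, follower) in edge_set:
            reciprocal.add(tuple(sorted((follower, followed))))
        else:
            one_way.add((follower, followed))
    return reciprocal, one_way


# Hypothetical accounts, for illustration only.
edges = [
    ("alice", "bodc"), ("bodc", "alice"),  # mutual follow
    ("bob", "bodc"),                       # bob follows bodc; not followed back
    ("bodc", "odip"), ("odip", "bodc"),    # mutual follow
]
reciprocal, one_way = classify_ties(edges)
print(sorted(reciprocal))  # [('alice', 'bodc'), ('bodc', 'odip')]
print(sorted(one_way))     # [('bob', 'bodc')]
```

Keeping the directed edge list alongside the collapsed pairs preserves the information that an undirected representation discards.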
Relationships on Twitter, unlike those on other platforms such as LinkedIn, do not require reciprocity. This is useful for studying oceanography as a community, because it helps distinguish between actual collaborators and popular key figures within the community. Once these ties are established, relationships between users can be examined against case studies that investigate information practices using traditional data collection methods such as interviews (e.g. with multi-organization project members and their partners). The results of such analysis can reveal where greater integration could enhance collaboration in multidisciplinary scientific studies.
For this research, Twitter is fitting because of its democratizing attributes for facilitating information sharing, as well as the accessibility of its data. Its affordances are similar to those of apps designed for smart device notifications and alerts that feed their users short, concise information (Morstatter et al. 2013; Weller et al. 2014). These factors, together with the large number of community members active on Twitter, make it a preferable choice. The data is also unique in that it provides straightforward direct (but also indirect) connections through user-follower information. Mapping a network through Twitter reveals the types of relationships that exist between nodes. In this SNA, this is done through visualization to identify weak and strong ties, important influencers, latent relationships, and unexpected or potential connections that would otherwise go undetected.
We have not been able to find any prior research which examines the oceanographic community on Twitter.
Considering the substantial usage of Twitter by a large number of oceanographic community members, and the global interest in many oceanographic subject areas including climate change, this research provides both a means of understanding multidisciplinary information practices online and an account of how this particular community is structured digitally using SNA. It offers a novel approach to the study of multidisciplinary information practices through an examination of the digital oceanographic community. Oceanography is indeed a community of multiple communities of practice (e.g. physical oceanography, chemistry, geoscience, arctic research, environmental, and climate change policymaking), and is central to current political and scientific discussions globally. Conducting an SNA thus provides a holistic view of the social environment surrounding formal scientific oceanography and reveals more about the social practices of scientists in the current information environment, as well as how they engage in public debate with a larger audience regarding their research methods, practices, and results.

Data Collection
Data was scraped from Twitter around the subject of oceanography using Docteur Tweety TwExList. 7 Data collection began with a purposive "snowball" sample (Bloor and Wood 2006, 154). The prerequisite for this SNA was to identify and aggregate a list of key influential oceanographic community members active on Twitter. Since this research is related to further research that conducts case studies of information practices at the British Oceanographic Data Centre 8 (BODC) and the Ocean Data Interoperability Platform 9 (ODIP) project (Dahlan 2018), the compilation of core nodes began by identifying BODC partners and ODIP members and partners, supplemented by search results for oceanographic organizations and institutions via Wikipedia and Google search. 10 Further users were identified through a keyword search using Twitter's Advanced Search 11 (Table 1).
The final list was examined and filtered for duplicates, inactive accounts, and private (inaccessible) accounts. A total of 470 public-facing accounts were retained as core nodes (see Table 2 for the full list). The process required cross-checking aggregated potential users against possible Twitter profiles, and then verifying authenticity by reviewing the biography section on each account's Twitter page, 12 as well as by finding any references to the account in designated formal resources such as an official website. The 470 accounts consist of BODC- and ODIP-related partners, complemented by organizations identified through Twitter, Google, and Wikipedia.
For non-verified Twitter accounts with no clear formal ties, personal branding was qualitatively analyzed through the biographical information on the user's page (e.g. self-identified researchers, scientists, centre directors, etc.). Users without a clear designation were identified as enthusiasts or amateur community members. Grandjean (2016) adopted a similar approach to identify members of the Digital Humanities community. By combining data-driven and researcher-intensive research, we compiled a list of relevant core nodes within this community.
Next, the followers and followings of the 470 core nodes were scraped via Docteur Tweety TwExList, organized using Microsoft Excel and Microsoft Access, and then visualized in an iterative process using Gephi. Twitter APIs 13 facilitate programmatic access for users and services such as TwExList. Because these datasets were publicly available, this research received ethical clearance from the UCL Department of Information Studies, 14 which approved our methods prior to implementation. Between January 20 and 27, 2017, these publicly available APIs were used to scrape Twitter follower and following data for the predefined list of accounts, resulting in 3,461,189 connections (edges; followers and followed users) between 2,184,989 accounts (nodes).
The data was further refined using Microsoft Excel and Access, keeping only users with at least five connections to core nodes, which reduced the dataset to 817,159 connections and 96,898 users. The list of core nodes and their followers and followings provides a snapshot of the community at a specific point in time, when climate change and other important events were headline news. It also allows us to establish where influential figures and organizations are located within the broader online oceanographic community.
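The refinement step can be sketched as follows. This is a hypothetical Python reconstruction, not the Excel/Access workflow actually used: it assumes that "at least five connections to core nodes" means links to at least five distinct core nodes, counted regardless of edge direction, and that core nodes themselves are always kept. The function name and the toy data are illustrative only.

```python
from collections import defaultdict


def filter_by_core_connections(edges, core_nodes, min_core=5):
    """Keep core nodes plus any account linked to at least `min_core` distinct core nodes.

    `edges` are directed (source, target) pairs; direction is ignored when counting
    links, so following a core node and being followed by it count as one link.
    Returns the retained node set and the edges whose endpoints are both retained.
    """
    core = set(core_nodes)
    linked_cores = defaultdict(set)  # account -> distinct core nodes it is linked to
    for source, target in edges:
        if source == target:
            continue  # ignore self-loops
        if target in core:
            linked_cores[source].add(target)
        if source in core:
            linked_cores[target].add(source)
    kept = core | {node for node, cores in linked_cores.items() if len(cores) >= min_core}
    kept_edges = [(s, t) for s, t in edges if s in kept and t in kept]
    return kept, kept_edges


# Hypothetical toy data: six core nodes, one well-connected user, one peripheral user.
core_nodes = [f"core{i}" for i in range(1, 7)]
edges = [("user_a", f"core{i}") for i in range(1, 6)]  # user_a follows five core nodes
edges += [("user_b", "core1"), ("user_b", "core2"), ("core1", "user_b")]
edges += [("core1", "core2")]

kept, kept_edges = filter_by_core_connections(edges, core_nodes)
print(sorted(kept - set(core_nodes)))  # ['user_a']
print(len(kept_edges))                 # 6
```

Counting distinct core nodes rather than raw edges is a design choice: under the alternative reading, a single mutual follow would count twice.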

Limitations and Delimitations
The purpose of the SNA is to find out which nodes connect different groups and to learn from their success in connecting various communities. One of the major obstacles confronting the implementation of this SNA was the set of choices and decisions that had to be made regarding what constituted a multidisciplinary community concerning oceans and climate change.

Table: Core node classification.
Commercial: Businesses and industrial-based accounts
Conference: Twitter accounts associated with an event, meeting, or conference
Data: Data-specific projects or institution accounts providing information about a related dataset
Education: University, college, school, or program accounts
Enthusiast: Accounts that resulted from the initial keyword search and do not belong to other classifications. These accounts are often individual users who are interested in ocean issues or news, or who are involved in leisure activities such as snorkelling or scuba diving
Government Account: Government officials, departments, and news outlet accounts
News Outlet: Accounts concerned with disseminating information about ocean or climate change related policy or science
Non-English Account: Users or accounts that are not described in English or that were aggregated based on keywords that also appear in non-English tweets
Organization: Accounts belonging to non-research institutions and organizations that deal with ocean-related areas. Although organizations can involve news or projects, concern public outreach and activism, or even produce non-English content and news, the key factor distinguishing these accounts from other accounts is that these organizations are often NGOs or transnational organizations
Professional: Non-scientific (and non-academic) professionals who work in related areas such as policy, awareness, law, business, or industry
Project: Accounts designated for specific projects to update stakeholders, the public, and/or members of a given project
Public Outreach, Activism, Awareness, and Campaigns: Accounts that self-brand as, or tweet content related to, activism, informing the public, or promoting campaigns involving the oceans
Research Centre: Accounts tied to archival, record-keeping, information management, or data collection centres concerned with processing ocean data
Researcher: Personal accounts of self-identified researchers in a related field
To remedy this, guidance from Latour (2004, 62-63) was considered. For the purposes of this study, members of a community can be classified into 13 categories, which were applied to the core nodes of the network. Together, they form a community of practice (Wenger 2008) that is interested in matters of the ocean. These categories are listed and defined in Table 3.
These categories are used for reference only, as a means to sift through users during filtering in Gephi.
The issue with such classifications, should they be utilized as units of analysis rather than of observation, is that they encourage a kind of tunnel vision based on preconceived notions and research biases, which interdisciplinarity seemingly aims to overcome (Naess 2010, 54). Visualizing the data using modularity algorithms, however, helps transcend these limitations (Blondel et al. 2008) by situating the groups without altering the research's focus on examining the entire network as a community of practice. These categorizations therefore do not influence or affect the network visualization in and of themselves. Instead, they were used as a comparative reference to understand how Gephi has grouped the data and whether or not these groupings match our classifications. This highlights another limitation brought about by software. Christakis and Fowler (2009, 13) suggest that visualization software is designed to place the most connected users, or nodes, in the centre of the graph and the less connected ones in the periphery. Adequate knowledge of the software is thus needed to avoid misinterpretation.
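For readers unfamiliar with modularity, the sketch below computes the standard Newman-Girvan modularity score Q for a given partition of an undirected, unweighted graph, Q = Σ_c [L_c/m − (d_c/2m)²], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of the nodes in c. It does not reproduce the Louvain optimization of Blondel et al. (2008) as implemented in Gephi; it only scores a partition that has already been found, such as one given by mClass labels. The toy graph (two triangles joined by a single edge) is hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community: dict node -> community label (e.g. an mClass integer).
    Q = sum over communities c of [L_c / m - (d_c / (2m))**2].
    """
    m = 0
    internal = defaultdict(int)    # L_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: summed degree of the nodes in community c
    for u, v in edges:
        m += 1
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    if m == 0:
        return 0.0
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Hypothetical graph: triangle a-b-c and triangle d-e-f joined by the edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = dict.fromkeys("abcdef", 0)

print(modularity(edges, two_clusters))  # approximately 0.357 (5/14)
print(modularity(edges, one_cluster))   # 0.0
```

Placing every node in one community always yields Q = 0, so positive values indicate that clusters are more densely connected internally than the degree distribution alone would predict.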
Because Twitter data can be explored in many ways, this research limits itself to the study of connections between various types of accounts of interest. It does not include edges that form between users based on, or surrounding, content (e.g. who tweets, produces, likes, or retweets content, and whether hashtags augment it). This is a consideration for further research, along with expanding the network scope to explore the connections between the core nodes' aggregated connections and their own connections.
Furthermore, the SNA is based on an egocentric (as opposed to closed sociocentric) network, since not all nodes are known (due to data refinement and selective sampling) (Chung, Hossain, and Davis 2005, 3). This is because the online oceanographic community is extensive and dynamic, and there are no set criteria for who can be part of it. Whelan et al. (2016, 3) argue that "[s]ocial network data, while invaluable for characterizing the ties between for example individuals, have little or nothing to say about how social networks are experienced or about how they are embedded within social, spatial, or temporal contexts."
That is, the data at hand captured a certain point in time and may not be characteristic of future or past states. The network is dynamic in real life but static in the way it is presented. As Christakis and Fowler (2009, xi) further explain, "the nodes in our networks are thinking human beings. They can make decisions, potentially changing their networks even while embedded in them and being affected by them." Although evidence suggests that the sample is representative of the oceanographic community, it may not be definitive given data refinement, which reduced the dataset to a practicable size and means that not all relationships were mapped and explored. To partly account for this, qualitative sampling was carried out on a small section of each cluster; extending this sampling is a consideration for future work.

Operationalization
This SNA visualizes and examines the topological characteristics of a network of ocean-related community members, including researchers, data scientists, organizations, NGOs, governmental bodies, events, and projects. To do so, the digital community has been examined using Actor-Network Theory (ANT) (Latour 2004), which requires identifying and describing varying actants, or nodes, interacting with one another in a certain setting as part of a network. The operationalization for this SNA, however, is concerned with mapping a community of users who are interested in marine science and use Twitter as a platform; to this end, a list of Twitter accounts, or core nodes (N), was compiled based on follow relationships.
For each core node, follower (X) and followed (Y) information was retrieved. Followed/following information (out-degree) refers to accounts that core nodes follow (N → Y), while follower information (in-degree) refers to accounts that follow these core nodes (X → N). The SNA aims to identify ties that exist between aggregated nodes. We then considered the mutual ties between these core nodes, and whether they have shared followers and/or shared accounts that they themselves follow.
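The core-node bookkeeping described above can be sketched in Python using the same notation: an edge X → N records a follower of core node N (in-degree), and an edge N → Y records an account that N follows (out-degree). The function name and toy accounts are hypothetical, and this is not how TwExList, Excel, or Gephi compute these values. Comparing every pair of the 470 core nodes involves roughly 110,000 pairs.

```python
from collections import defaultdict
from itertools import combinations


def core_node_profiles(edges, core_nodes):
    """Return (in-degree, out-degree) per core node and shared ties per core-node pair.

    edges: directed (follower, followed) pairs. An edge X -> N makes X a follower of
    core node N (in-degree); an edge N -> Y means N follows Y (out-degree).
    shared[(a, b)] = (number of shared followers, number of shared followed accounts).
    """
    core = set(core_nodes)
    followers = defaultdict(set)  # N -> set of X with X -> N
    following = defaultdict(set)  # N -> set of Y with N -> Y
    for src, dst in set(edges):   # set() drops duplicate edges
        if dst in core:
            followers[dst].add(src)
        if src in core:
            following[src].add(dst)
    degrees = {n: (len(followers[n]), len(following[n])) for n in core}
    shared = {
        (a, b): (len(followers[a] & followers[b]), len(following[a] & following[b]))
        for a, b in combinations(sorted(core), 2)
    }
    return degrees, shared


# Hypothetical core nodes N1, N2 with followers x1, x2 and followed accounts y1, y2.
edges = [("x1", "N1"), ("x1", "N2"), ("x2", "N1"),
         ("N1", "y1"), ("N2", "y1"), ("N2", "y2")]
degrees, shared = core_node_profiles(edges, ["N1", "N2"])
print(degrees["N1"], degrees["N2"])  # (2, 1) (1, 2)
print(shared[("N1", "N2")])          # (1, 1)
```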
The non-core nodes discovered include several influential accounts such as @blindspotting, which belongs to the CEO of Blindspot, a think tank that deals with climate change policy, and @oceanwire, an account spreading news, knowledge, and information about the oceans to promote positive action.
The SNA included an examination of space and communication channels (the affordances of technology and media). Since ANT does not attach importance to categorizing actants independently 15 (Cressman 2009; Latour 2004), it makes sense to disregard distinctions between the actants (as described in Table 3) during data collection and visualization, and to focus on the community as a network of varying actants (links) interacting with one another in a certain setting (chain).
The first operationalization of this concept is to regard the digital space as an extension of the material, physical, or natural space (Rey and Jurgenson 2013). Therefore, ICTs are treated as a means of extending knowledge about a phenomenon. In fact, studies (Gibbs et al. 2013; Leonardi 2017; Meyer et al. 2011) suggest that, arguably, 16 ICTs have the potential to extend the amount of information we have access to, rather than just changing how we access it. In this way, the SNA aims to shed light on several practical concerns, including what an oceanographic community looks like on social networks, by visualizing connections occurring across sub-communities, and which nodes are key, where they are located, and in what context.
The SNA also aimed to identify outliers in the network as well as how various nodes compare and differ. To visualize these connections, nodes and edges were defined based on Garton, Haythornthwaite, and Wellman (1997), who define a tie as connecting "a pair of actors by one or more relations" (para. 14-15). The tie is a type of SNA unit of analysis aimed at understanding computer-mediated communication (CMC). Ties can be simple (one relation; e.g. members of a research group) or complex (multiple relations; e.g. project partners or conference members). Ties can be strong or weak, depending on context (Christakis and Fowler 2009). According to Garton, Haythornthwaite, and Wellman (1997, para. 16), weak ties are "generally infrequently maintained, nonintimate connections, for example, between co-workers who share no joint tasks or friendship relations," while strong ties involve "combinations of intimacy, self-disclosure, provision of reciprocal services, frequent contact, and kinship, as between close friends or colleagues." Pairs that maintain strong ties are more likely to share or exchange resources (Garton, Haythornthwaite, and Wellman 1997). However, nodes that maintain weak ties are more likely to provide a more diverse range of resources due to their varying social networks (Garton, Haythornthwaite, and Wellman 1997). It is often in these "weak ties" that we see smaller clusters merging with larger social systems (Grandjean 2016).
15 For example, to only study a group based on their formal designation (researchers, journalists, teachers, clowns, etc.).
16 See Veletsianos (2016) and Carrigan (2016) for studies that explore alternative positions.

Visualization & Data Analysis Using Centrality Measures
The full graph was analyzed by applying several built-in Gephi statistical algorithms to identify several parameters. After applying Force Atlas 2, 17 the nodes were colour-coded using Gephi Modularity 18 classification (mClass) measures, as illustrated in Figure 1. This approach identifies distinct clusters in the graph based on the strength of node relationships (Cherven 2015, 189). The "[o]utput for this function is simply an integer value starting at 0" (Cherven 2015, 197). The value obtained for this graph is 1.4, resulting in six different clusters, or mClasses. These groups are composed of Twitter users whose connections form a dense sub-network within the graph, demonstrating the strength of their ties to one another compared to their weaker ties outside their designated cluster.
Figure 1. The six clusters within the graph, representing the closer bonds between users within each cluster. This is a representation of the expanded network, consisting of core nodes and their connections. Despite their strong bonds, it is unclear whether their ties are based on interest, digital research collaboration, in-person relationships, or common topic(s) they share. However, as further findings show and as the graph on the right illustrates, the homogeneity of clusters can be attributed to geography and time zones. Original visualization by Kinda Dahlan © 2017.
Next, various centrality measures were implemented and compared (see Figure 2). Overall, centrality measures determine "the role of an individual within a society, [and] its influence or the flows of information on which [s/]he can intervene" (Rochat 2009, 1). Betweenness Centrality ranking (BCR) indicates how much the network's information flow depends on a given node, by examining the shortest paths between other nodes that pass through it (Cherven 2015, 200; Estrada et al. 2009). For each node x, BCR sums, over all pairs of other nodes y and z, the fraction of "the number of shortest paths going from y to z" that pass through x (Boldi and Vigna 2014, 10).
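The BCR definition above can be made concrete with Brandes' algorithm, a standard way of computing betweenness centrality on unweighted graphs. The Python sketch below is illustrative and is not Gephi's code; it returns raw scores counted over ordered pairs (y, z), so for an undirected graph (each edge listed in both directions) the values are twice those counted over unordered pairs, and no normalization is applied. The three-node path is a hypothetical example in which the middle node acts as a bridge.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for unweighted graphs.

    adj: dict node -> iterable of neighbours (directed; list both directions
    for an undirected graph). Returns raw, unnormalized betweenness per node,
    counted over ordered source-target pairs.
    """
    nodes = set(adj) | {v for nbrs in adj.values() for v in nbrs}
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = dict.fromkeys(nodes, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of non-increasing distance from s.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Hypothetical undirected path a - b - c: b lies on every a-c shortest path.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = betweenness(adj)
print(scores["b"])  # 2.0 (ordered pairs (a, c) and (c, a))
```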
High BCR nodes act as bridges in the network. Eigenvector centrality (ECR) helps determine the quality and importance of connections, or which nodes are central in the community. 19
Figure 2. Four centrality measures available in Gephi. Clockwise from the top left, the first graph reveals that most nodes have an average CCR; the next graph shows that only a select few nodes (illustrated in red and sized larger than the rest) have high BCRs, and these are distributed equally and centrally within the clusters and the overall graph. The graph below it depicts high ECR nodes, which are located centrally in the graph (sized larger); their followers are ranked highly due to their connections (coloured in dark blue). The reason for this lies in their ties. The green nodes that span the centre horizontally are connected to low ECR nodes (those on the fringes), whereas the low ECR nodes are in dark blue, suggesting that they are connected to high ECR nodes. Finally, the graph in the lower left corner depicts the nodes with the highest numbers of degrees. Weighted degrees are assigned based on whether nodes are core or non-core nodes, hence the wider difference in rankings (core nodes connected to other core nodes are weighted more and depicted larger and redder).
17 Force Atlas 2 is a continuous algorithm available as a built-in function in the Gephi software. It "is a force directed layout: it simulates a physical system in order to spatialize a network" (Jacomy et al. 2014, 2).
18 Another algorithm built into Gephi. Modularity measures the quality of a community partition obtained by different measures (such as Force Atlas 2). "The modularity of a partition is a scalar value between −1 and 1 that measures the density of links inside communities as compared to links between communities" (Blondel et al. 2008, 2).
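As a rough illustration of the ECR idea described above, where connections to well-connected nodes count for more, the following Python sketch computes eigenvector centrality by power iteration on a small hypothetical star graph. It is a generic textbook method rather than Gephi's implementation, and it assumes an undirected graph given as a symmetric adjacency list.

```python
import math


def eigenvector_centrality(adj, max_iter=100, tol=1e-9):
    """Eigenvector centrality of an undirected graph by power iteration.

    adj: dict node -> iterable of neighbours, listed symmetrically.
    Each round sets a node's score to its own score plus the sum of its
    neighbours' scores (iterating with A + I, which shares A's leading
    eigenvector but avoids oscillation on bipartite graphs), then rescales
    the scores to unit Euclidean length.
    """
    nodes = sorted(set(adj) | {v for nbrs in adj.values() for v in nbrs})
    if not nodes:
        return {}
    score = dict.fromkeys(nodes, 1.0)
    for _ in range(max_iter):
        new = {v: score[v] + sum(score[u] for u in adj.get(v, ())) for v in nodes}
        norm = math.sqrt(sum(x * x for x in new.values()))
        new = {v: x / norm for v, x in new.items()}
        if max(abs(new[v] - score[v]) for v in nodes) < tol:
            return new
        score = new
    return score


# Hypothetical star: one hub followed by three otherwise unconnected accounts.
adj = {"hub": ["l1", "l2", "l3"], "l1": ["hub"], "l2": ["hub"], "l3": ["hub"]}
scores = eigenvector_centrality(adj)
print(round(scores["hub"], 4), round(scores["l1"], 4))  # 0.7071 0.4082
```

The leaves score well above zero despite having a single tie each, because that tie is to the highest-scoring node.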
Once these measures were calculated, it was evident that some nodes had several mutual degrees while others were not well connected in the graph. To account for this, a third measure was applied. Harmonic Closeness Centrality (HCCR) 20 is calculated to determine the relationship between unlinked nodes; HCCR "provides a sensible centrality notion for arbitrary directed graphs" (Boldi and Vigna 2014, 25). This measure is derived from its parent, Closeness Centrality (CCR), which identifies the nodes that are most likely to reach other nodes in the network the fastest. By contrast, BCR reveals the nodes that are most likely to enable information flow by connecting otherwise disconnected nodes to one another. As the upper right graph in Figure 2 shows, these nodes are not necessarily located centrally. For a list of discovered non-core nodes that ranked in the top 200 BCR, consult Table 4. CCR indicates how well a node can transmit information (how far it is connected), while HCCR, an extension of CCR, identifies influencers relative to their own community and compared to the graph. Finally, because some nodes are connected to several clusters, or mClasses as identified in Figure 1, Eigenvector Centrality helps determine the influencers in the network across these disconnected clusters (Aleskerov, Meshcheryakova, and Shvydun 2016).
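The difference between CCR and HCCR on a graph that is not fully connected can be shown with a small hypothetical example. The Python sketch below is an assumption-laden illustration, not Gephi's implementation: classic closeness is computed only over reachable nodes, while harmonic closeness averages 1/d over all n − 1 other nodes, with unreachable nodes contributing 0. The BFS follows outgoing edges; for a directed graph, reversing the edges would give distances towards each node, which is the convention in Boldi and Vigna's definition.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(adj, node):
    """Classic closeness: (number of reachable nodes) / (sum of distances to them)."""
    dist = bfs_distances(adj, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def harmonic_closeness(adj, node, n_nodes):
    """Harmonic closeness: mean of 1/d over all other nodes; unreachable nodes add 0."""
    dist = bfs_distances(adj, node)
    return sum(1 / d for v, d in dist.items() if v != node) / (n_nodes - 1)


# Hypothetical disconnected graph: a path a-b-c and a separate pair d-e.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": ["e"], "e": ["d"]}
for node in ("a", "b", "d"):
    print(node, closeness(adj, node), harmonic_closeness(adj, node, len(adj)))
# a 0.6666666666666666 0.375
# b 1.0 0.5
# d 1.0 0.25
```

Classic closeness gives d the same score as b even though d reaches only one other node; the harmonic variant separates them, which is why it suits fragmented networks.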

Findings
Our findings explicate the structure of the oceanographic online community and visualize user social patterns and network distribution amongst various groups. One substantial result is that there is no one-size-fits-all approach when it comes to how and why users use Twitter. Some users use it to remain connected with their peers, others to keep up with the broader community, while some use it for their personal interests or to disseminate their work. The following findings detail some emergent themes, and the analysis draws on several works (e.g. Christakis and Fowler 2009; Grandjean 2016).
19 The difference between BCR and ECR is that Eigenvector Centrality (ECR) measures the influence of a node in a network based on relative scores for all network nodes, such that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring ones (Aleskerov, Meshcheryakova, and Shvydun 2016). Betweenness Centrality (BCR), on the other hand, considers all the shortest paths between every pair of nodes in the network and counts how many times a node lies on the shortest path between two others (Estrada, Higham, and Hatano 2009).
20 Harmonic Closeness Centrality rank (HCCR) is based on an algorithm that calculates Closeness Centrality (CC), or the average path between a node and all the others in a connected graph (Boldi and Vigna 2014; Rochat 2009), in a non-connected graph. The difference between HCCR and its parent CCR is that the former measures the average path of one node in a non-connected cluster, whilst the latter does so in a highly connected one.

Graph Statistics
This section presents basic statistics about the dataset, including a statistical overview of core nodes (the 470 users compiled to build the dataset) and their followers and followings, that is, the users that the core nodes are followed by and follow, represented by in-degree and out-degree edges respectively. Although the data was analyzed more thoroughly after refinement, data prior to Gephi manipulation produced some interesting findings. These include the identification of top-ranking nodes based on various centrality scores, where the highest ranked nodes in each centrality measure compete for the Top 10 ranks (Table 5).
Nodes ranking first in all measures but out-degree (who they follow) belong to mClass 5 (Blue). The top-ranked node by out-degree is @Alex_Verbeek, an independent advisor on global issues related to climate, water, and energy, 21 located in mClass 2 (Red). A closer look at his verified Twitter account reveals that he is an active user who produces original content, engaging consistently with the public about environmental and climate challenges. Although ranked high in out-degrees at more than 65K after data refinement, @Alex_Verbeek also had a significantly high number of in-degrees (10K+ after data refinement) at the time of data collection. Aleskerov, Meshcheryakova, and Shvydun (2016, 5) suggest that accounts with "[h]igh values of in-degree centrality mean that a node is strongly affected by its neighbours. Alternatively, low values of in-degree centrality identify nodes that are not influenced by other nodes". Arguably, in the context of this research, high in-degrees, or followers, could suggest otherwise: a node may have an influential role within the community and hence be followed by many users. To ascertain this further, qualitative sampling was carried out. Quantitatively, however, Table 6 provides a comparison between the top 10 in- and out-degrees.
In contrast, nodes with high out-degrees (e.g. @Alex_Verbeek and @oceanleadership) tend to be influenced by their neighbours, and are perhaps more gregarious than those that follow fewer nodes. This raises the question: what does it mean if the in- to out-degree ratio is small, as in the case of @Alex_Verbeek? Grandjean (2016, 5) suggests that this could be due to a "social function" whereby a user is notified about a subscription from this user, discovers new content, and potentially follows back. As such, these users are not as significant to the research at hand as those deemed "stars" (Grandjean 2016, 4-5) in the community. Star users tend to have high numbers of followers (in-degrees) and are qualitatively determined to be part of the community.
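Following the convention used here (out-degree = accounts a node follows, in-degree = its followers), the in- to out-degree ratio can be read directly off a list of follow edges. A minimal sketch; the edge-list format and the handles are hypothetical.

```python
from collections import Counter


def degree_table(follows):
    """Map each account to (in_degree, out_degree, in/out ratio).

    `follows` holds (follower, followed) pairs, i.e. edges follower -> followed.
    """
    out_deg = Counter(src for src, _ in follows)  # accounts the node follows
    in_deg = Counter(dst for _, dst in follows)   # accounts following the node
    table = {}
    for node in set(out_deg) | set(in_deg):
        # A node that follows no one has an unbounded ratio.
        ratio = in_deg[node] / out_deg[node] if out_deg[node] else float("inf")
        table[node] = (in_deg[node], out_deg[node], ratio)
    return table


follows = [
    ("@a", "@star"), ("@b", "@star"), ("@gregarious", "@star"),
    ("@gregarious", "@a"), ("@gregarious", "@b"), ("@star", "@gregarious"),
]
table = degree_table(follows)
print(table["@star"])  # (3, 1, 3.0): followed far more than it follows
```

A high ratio marks a candidate "star" in Grandjean's sense; a small ratio flags the gregarious pattern discussed above. Qualitative checks are still needed to decide which accounts belong to the community.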
It goes without saying that not all users that have a large number of followers are influential. This SNA accounts for this by limiting the data to users who have at least two core node followers (in-degrees), and a total of five core node connections (out-degrees). Despite apparent limitations, this helps ensure a reliable foundation for future expansion of data analysis of the broader community in this dataset.
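The filtering rule can be read in more than one way. This sketch assumes that "two core node followers" means at least two core nodes follow the account, and that "five core node connections" means at least five edges, in either direction, between the account and core nodes; both thresholds are parameters so another reading can be substituted. Handles are hypothetical.

```python
from collections import Counter


def filter_non_core(core, follows, min_core_followers=2, min_core_links=5):
    """Return the non-core accounts that pass both core-node thresholds."""
    core = set(core)
    core_followers = Counter()  # core nodes following the account
    core_links = Counter()      # edges to or from core nodes
    for src, dst in follows:
        if src in core and dst not in core:
            core_followers[dst] += 1
            core_links[dst] += 1
        elif dst in core and src not in core:
            core_links[src] += 1
    return {
        node for node in core_links
        if core_followers[node] >= min_core_followers
        and core_links[node] >= min_core_links
    }


core = {"c1", "c2", "c3"}
follows = (
    [("c1", "x"), ("c2", "x"), ("x", "c1"), ("x", "c2"), ("x", "c3")]  # 2 followers, 5 links
    + [("c1", "y"), ("y", "c1"), ("y", "c2"), ("y", "c3")]             # only 1 core follower
    + [("c1", "z"), ("c2", "z"), ("c3", "z")]                          # only 3 links
)
print(filter_non_core(core, follows))  # {'x'}
```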
Lastly, depending on the various combinations of centrality scores, each of the core nodes may have different significance. While the total number of degrees and BCR are important measures of node influence, in- and out-degrees also say something about a node. More out-degrees indicate that a node is more likely a recipient of messages than a producer of information. More in-degrees indicate that the node has higher visibility and is possibly an influencer in its community, but not necessarily involved or central to one specific community (e.g. oceanographic), subject (e.g. ocean acidification), or event (e.g. Paris Accord). Core nodes are central to the community because they disseminate information about the oceans (e.g. @oceanexplorer), promote marine conservation action through education (e.g. @savingoceans), and/or engage with the public (e.g. @noaaocean). Data visualization revealed further interesting findings. For example, several NOAA accounts with exceedingly high in- and out-degrees were shown to cluster near the borders of mClasses 4 and 5. The data also shows that user distribution is varied, and that the dataset benefited from refinement since it covered different types of accounts, from pastors (@pastorsmalley and @rev_zoerb) to plumbers (e.g. @best_plumbers) and politicians (@Dreynders). Their reasons differ, but most accounts are directly or indirectly impacted by water-related science and information (e.g. hurricane forecasts, natural disaster aid programs). Table 7 details the relationship between core nodes: the majority follow at least one other core node, and more than half follow more than 10 core nodes. Almost 50% are followed by 10 or more core nodes, and at least 85% are followed by two or more core nodes. Of the total 470 core nodes, 37 are not followed by any other core node.
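Statistics of the kind summarised in Table 7 can be computed from the follow edges among core nodes alone. A minimal sketch with toy data; the cut-off k is a parameter (the text uses 10), and the four-node example is hypothetical, not the study's data.

```python
from collections import Counter


def core_node_stats(core, follows, k=10):
    """Percentages of core nodes by how many other core nodes they follow or are followed by."""
    core = set(core)
    # Keep only edges between two distinct core nodes.
    edges = {(s, d) for s, d in follows if s in core and d in core and s != d}
    out_deg = Counter(s for s, _ in edges)
    in_deg = Counter(d for _, d in edges)

    def pct(predicate):
        return 100 * sum(1 for n in core if predicate(n)) / len(core)

    return {
        "follow at least one core node": pct(lambda n: out_deg[n] >= 1),
        f"follow more than {k} core nodes": pct(lambda n: out_deg[n] > k),
        f"followed by {k} or more core nodes": pct(lambda n: in_deg[n] >= k),
        "followed by two or more core nodes": pct(lambda n: in_deg[n] >= 2),
        "not followed by any core node (count)": sum(1 for n in core if in_deg[n] == 0),
    }


core = ["a", "b", "c", "d"]
follows = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("d", "external")]
for label, value in core_node_stats(core, follows, k=2).items():
    print(label, value)
```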
Moreover, 61 core nodes make up a group categorized as the Top 10%. These users have more than 10K followers or followings, or both. As a result, not all of their edges were aggregated due to some technical limitations imposed by Twitter APIs and computing resources. The Top 15 from this category are listed in Table 8.
Ultimately, further analyses are needed to determine the full effect of these super nodes on the graph. An initial look into what occurs if these nodes are removed from the dataset also produces interesting patterns. Figure 3 and Table 9 compare layouts for the Top 10% of users, illustrating this by node size and comparing super users by rank, respectively. A comparison between Figures 1 and 3 also reveals how the expanded network compares to the core nodes.
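Removing the super nodes amounts to selecting accounts above a follower or following threshold and dropping their edges. A sketch under two assumptions: the counts come from account profile metadata rather than from the scraped edges (which are incomplete for these accounts), and the threshold is the 10K used for the Top 10%. Handles are hypothetical.

```python
def split_super_nodes(profiles, follows, threshold=10_000):
    """Separate super nodes and return the edge list without them.

    profiles: {handle: (followers_count, following_count)}
    follows:  (follower, followed) pairs
    """
    super_nodes = {
        handle for handle, (followers, following) in profiles.items()
        if followers > threshold or following > threshold
    }
    kept = [(s, d) for s, d in follows
            if s not in super_nodes and d not in super_nodes]
    return super_nodes, kept


profiles = {"@big": (50_000, 10), "@busy": (200, 12_000), "@small": (300, 400)}
follows = [("@small", "@big"), ("@busy", "@small"), ("@small", "@other")]
super_nodes, kept = split_super_nodes(profiles, follows)
print(sorted(super_nodes), kept)  # ['@big', '@busy'] [('@small', '@other')]
```

Recomputing the layout or the centrality scores on `kept` gives the with/without comparison.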
[Figure: The left graph shows the distribution of the data based on groupings, color coded to represent the different mClasses. The graph on the right visualizes core nodes by total number of degrees; core nodes with high numbers of total degrees are sized larger and color coded in green. As shown, BODC is located in mClass 5 (blue on the left graph), where it connects mClasses 2, 4, and 5, and has a relatively small total of connections compared to other core nodes (right graph). Original visualization by Kinda Dahlan © 2017.]

[Table: Core node statistics, percent of users following and followed by the core nodes. This table details core node statistics based on their connections with other core nodes.]

[Table: This table provides a quantitative comparison between the listed nodes' ranking based on who they follow (out-degrees) and who they are followed by (in-degrees).]

[Table: The Top 10% of nodes are core nodes with the highest follower counts, which can potentially skew the graph.]

Graph Topology
Approaching centrality scores with the goal of understanding what they mean, relative to the context of this research on oceanographic communities of practice, necessitates a visual analysis of the graph. Not surprisingly, the majority of core nodes are clustered in three areas: mClass 3 (Green) located in the upper right corner of the graph, 4 (Magenta) located upper right, and 5 (Blue) located centrally (see Figure 1). These clusters can be examined in more detail to determine their characteristics and grouping. mClasses 3 and 4 include the majority of scientist and researcher accounts. With some exceptions accounted for, mClass 3 consists mostly of European and UK-based accounts. Cluster 4 includes a large network of various geoscience research centres, scientists, and programs, and is also made up of mostly UK, US, and Australian accounts. By examining the data qualitatively in this way, patterns from the data visualization begin to emerge, i.e. why certain groups cluster together. Esteemed professor of History of Science and Affiliated Professor of Earth and Planetary Sciences Naomi Oreskes (@NaomiOreskes), a non-core node, is also located in the central periphery of mClass 4. This location overlaps with mClass 5 and mClass 2, where information flow between different nodes is most facilitated. Another prominent node, belonging to renowned oceanographer Sylvia Earle, 22 was also located in the graph. @SylviaEarle is a verified account. This user is also a non-core node and is part of mClass 5.

[Table: This table lists core nodes that represent mostly governmental agencies, organizations, and charities, although there are a number of individuals such as @Dreynders, the Belgian deputy prime minister and foreign affairs minister, and @Slebid, a scientist in nanotechnology and renewable energy in air and water. These accounts are named super nodes for having so many connections, not all of which were scraped due to computational limitations.]
@SylviaEarle ranks twelfth among the nodes most followed by core nodes. She also ranks in the top 500 for all centrality scores, a notable ranking considering there are more than 90K nodes in the graph. The more central a node is, the higher its Betweenness Centrality (BCR) score, and the more influence it has on the community.
Of the top 10 BCR, mClass 2 has five nodes (four enthusiasts and one professional, as per the classification in Table 3 and as will be discussed below), mClass 3 has two (both researchers) and one project, and mClass 5 has three (enthusiast, education, and research centre). The top 20 nodes for BCR contain only one non-core node, @oceanwire, ranked nineteenth. This node has a low in-degree rank at 156, and a low out-degree rank at 148. Of the top 200 BCR, there are 31 researchers, 23 organizations, 23 enthusiasts, followed by 22 educational institutions, 18 public outreach and activist users, and 18 research centres. There are also 23 non-core nodes. These nodes are, listed from highest to lowest rank: @oceanwire, @ocean_networks, @sailorsforsea, @hootsuite, @scinewsblog, @cechr_uod, @therightblue, @interior, @antarcticreport, @missionblue, @paul_rose, @esri, @earthisland, @seawildearth, @hakaimagazine, @congareenps, @jimharris, @zosterar, @seasaver, @cleanwaterwedå, @jasonlrobinson, @usfws, and @blindspotting. BCR thus helps determine the influence of a node in facilitating the flow of information within a network. As such, with a BCR of 345 and 151 core nodes following her, it is apparent that @SylviaEarle not only belongs to the oceanographic community, but also plays an influential key role in it. This can be established prior to qualitatively examining the account's profile on Twitter. Examining this user's profile, however, shows an active Twitter feed: the user consistently shares original or authored content.
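Profiles such as "of the top 200 BCR, there are 31 researchers" come from ranking nodes by score and tallying a category label per node. A sketch; the category labels stand in for the manual classification in Table 3, ties are broken alphabetically (an assumption), and all handles and scores are hypothetical.

```python
from collections import Counter


def profile_top_k(scores, category, core, k=200):
    """Tally categories among the top-k nodes and list the non-core ones in rank order."""
    ranked = sorted(scores, key=lambda node: (-scores[node], node))[:k]
    counts = Counter(category.get(node, "unclassified") for node in ranked)
    non_core = [node for node in ranked if node not in core]
    return counts, non_core


scores = {"@r1": 9.0, "@org": 7.5, "@fan": 7.5, "@x": 1.0}
category = {"@r1": "researcher", "@org": "organization", "@fan": "enthusiast"}
counts, non_core = profile_top_k(scores, category, core={"@r1", "@org"}, k=3)
print(counts, non_core)
```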
Contrastingly, of the core nodes, @EU_Mare, an account belonging to the European Union Commission's maritime affairs, follows the most core nodes (231). @NOAA is the most followed node with a following of 221 core nodes. All mClasses have a noticeable number of UK-based nodes. This is not surprising since the core nodes began with a list of this research's UK-based case studies (ODIP & BODC) and their partners.
Assessing the quality of core nodes in terms of the different centrality scores (in- and out-degree, BCR, HCCR, and ECR) reveals that a few do not connect with other core nodes (refer to Figure 4). The outlying core node belonging to mClass 0 is @northsea_energy. There are two in mClass 1, @cfldickson and @marinesafetywa. mClass 2 has @ili_zuyd and @global_env1. In mClasses 3 and 4 there are @institutrb and @marinemarlag, respectively. Core nodes in mClass 5 are @enea_steresa, @espmasonu, @mast_sandyhook, @thewhaling, and @unhmarine, all of which are concerned with research, science, and policy in an official governmental and/or educational capacity.
On the right side, modularity classes are illustrated using color codes for each cluster, whilst size is consistent for all nodes. Adding node sizing based on the total number of degrees results in the graph on the left. As illustrated, the nodes are redistributed based on gravitational pull, or how well these core nodes connect to each other. On the far left are core nodes that do not connect to other core nodes. The large circles situate core nodes associated with the two case studies on the British Oceanographic Data Centre and the Ocean Data Interoperability Platform within the graph, the majority of which belong to research and policy clusters (3 and 4). Original visualization by Kinda Dahlan © 2017.

Discussion
The guiding research question for this SNA was: what does a digital SNA reveal about online oceanographic communities on Twitter? The study at hand captures a fragment of an organic, living, and changing network whose boundaries are ever shifting. One of the most prominent themes to emerge from this research is that nodes within the graph are not fully integrated, revealing a fragmented community of sub-groups, often built around their own core nodes. This section details how to understand this, and what it means for positioning oceanographic organizations in the network. Another theme is that there is potential for further public engagement given the demographic associated with the community. The following sections discuss these issues as they relate to the findings from the analysis.

Community Fragmentation
Overall, three out of the six identified mClass clusters (0, 1, 2, 3, 4, and 5) appear to connect the various clusters to one another (see Figure 5). Interconnections, or connections between different mClasses, account for about 44% of the edges in the full graph. The dark blue cluster, mClass 5, is the largest and most central to the graph. It acts as a bridge between various clusters, although there are also direct connections between any two mClasses. For example, and perhaps due to the relatively small size of the cluster, nodes in mClass 1 (Orange) appear to be the most isolated group, with the fewest connections to other classes. However, mClass 1, which consists mostly of Canadian accounts, does have well-established connections with mClass 3 (relative to the number of its nodes and inner edges) in some disciplinary areas, such as polar and arctic research shared with Scotland, Denmark, and Iceland.
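The share of edges that run between mClasses (about 44% in the full graph) is a simple count once each node has its modularity class. A sketch on a toy graph of two triangles joined by one edge; edges are treated as unordered pairs, and the nodes and class labels are hypothetical.

```python
from collections import Counter


def inter_class_share(edges, mclass):
    """Fraction of edges whose endpoints lie in different modularity classes."""
    between = sum(1 for u, v in edges if mclass[u] != mclass[v])
    return between / len(edges)


def class_pair_counts(edges, mclass):
    """How many edges link each unordered pair of classes (a crude bridge map)."""
    return Counter(
        tuple(sorted((mclass[u], mclass[v])))
        for u, v in edges if mclass[u] != mclass[v]
    )


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
mclass = {0: 3, 1: 3, 2: 3, 3: 4, 4: 4, 5: 4}
print(inter_class_share(edges, mclass))  # 1 of 7 edges crosses classes
print(class_pair_counts(edges, mclass))  # Counter({(3, 4): 1})
```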
This graph depicts findings after visualization and qualitative sampling of the data. It reveals how mClasses are composed of node clusters based on geography as well as practice and interest. It illustrates, for example, how President Donald Trump's Twitter account is located in mClass 2, where most activists, enthusiasts, and single individuals exist. His account appeared in the data as a result of several core nodes following him (including @WMO, mClass 4; @NOAA_HurrHunter, mClass 5; and @Slebid8, mClass 2) despite missing reciprocity, revealing that some important nodes may not be part of the community of practice but are unquestionably influential on the overall community network. Original visualization by Kinda Dahlan © 2017.

Furthermore, both mClasses 3 (Green) and 4 (Magenta) can be considered independently homogeneous in terms of constituent nodes. Both, for example, have a high number of research and data centres, educational institutions, and researcher and scientist accounts. mClasses 3 and 5 have the bulk of BODC partners, though there are a number of them in mClass 4. This raises the question: why are they not further integrated?
To answer this question, another level of data aggregation is needed. This involves collecting non-core node followers and followings to understand whether or not they connect in other ways beyond the listed core nodes. Perhaps this indeed reflects the structure of scientific communities worldwide, whereby each cluster represents a group, and each group has a similar distribution but is isolated from other clusters in terms of research collaboration. Nonetheless, the answer also lies in the nodes that connect two clusters and in the border areas between mClasses in the graph. Looking at the accounts that fall in the middle between clusters 3 and 4, we can see that they are heavily tied to mClass 5, where most celebrity scientists and mClass communicators are located. Opposite this cluster, toward the bottom, the area between mClass 2 and mClass 5 seems to draw more international followers, consisting of various activists, NGOs, and actors involved with environmental causes. It is where most activist accounts emerge. Individual accounts are more vocal and specific about issues that involve the oceans, such as climate change (@EnviroIntel), or identify as niche professionals in related areas (@AlgorithmLab). In designing further research, therefore, aggregating the edges for non-core nodes in these locations could prove useful to further investigate the oceanographic community beyond western communities. Of course, other considerations must be taken into account, such as language (as seen with non-English-based accounts) and communities whose preferred ICT platforms serve as alternatives to Twitter in other settings.

Community Integration
The SNA produced six different sub-communities (mClasses 0-5). Each community was formed based on the ties between its respective nodes. The boundaries between these communities are formed by the strength of these ties. This means that edges between mClass 0 nodes, for example, have high transitivity locally, but that nodes from this mClass have low transitivity across the other modularity classes 1, 2, 3, 4, and 5. This suggests that although nodes in different mClasses may share some similarities in classification, the mClasses remain fully autonomous from one another. This further indicates that, regardless of tool or technology, information flow is limited to relatively homogeneous groups, shaped by geography and time zones. As qualitative analysis further reveals, these groups can also be based on identity (mClass 5), location (mClass 1), industry (mClass 0), practice (mClasses 3 and 4), or interest (mClass 2).
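The claim that ties are dense within mClasses and sparse between them is what the modularity score Q measures. Gephi produced the mClasses by optimising modularity; the sketch below does not reproduce that optimisation, it only scores a given partition with the standard formula Q = Σ_c [L_c / m - (d_c / 2m)^2], where m is the number of edges, L_c the edges inside class c, and d_c the summed degree of its nodes. Treating the follow graph as undirected is an assumption of the sketch, and the toy graph is hypothetical.

```python
from collections import Counter


def modularity(edges, mclass):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = Counter()    # L_c: edges with both ends in class c
    degree_sum = Counter()  # d_c: total degree of nodes in class c
    for u, v in edges:
        degree_sum[mclass[u]] += 1
        degree_sum[mclass[v]] += 1
        if mclass[u] == mclass[v]:
            internal[mclass[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_classes = {0: 3, 1: 3, 2: 3, 3: 4, 4: 4, 5: 4}
one_class = dict.fromkeys(range(6), 0)
print(modularity(edges, two_classes))  # about 0.357, i.e. 6/7 - 1/2
print(modularity(edges, one_class))    # 0.0: a single class carries no structure
```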
Researchers in mClass 1, for example, located at the far right of the visualized graph (Figure 5) and involved in arctic and polar research, may be weakly connected to other researchers in mClass 4 (located on the opposite side of the graph, visually). Why are they less connected? Is this a digital mirror of non-digital connections? Can they benefit from stronger connections? And how can these ties be improved? These questions frame further research opportunities that can address the apparent homogeneity of clusters within this graph through case studies of individual accounts. Even so, Otero et al. (2014), who examined content production and consumption on Facebook, Twitter, and the Google search engine, provide one possible explanation from their own study on oceanography and social network sites: "that requirements differed slightly among the various user groups" of oceanographic data (Otero et al. 2014, 139), and that "there is still a gap for users between their needs and their ability to obtain and manipulate the information" (Otero et al. 2014, 144). Their study, however, does not explore the actual connections that form between users to understand how information flows, but it suggests there is a need to do so to enhance targeted content and information discovery.
Surprisingly, Otero et al. (2014, 144-145) also reveal that users interested in "sea ice" data are mostly popular on Twitter. Comparing this to the findings of this research, analysed through a qualitative sampling of each mClass, it seems that this group, mostly clustered near mClass 1, is the most isolated on the graph. Further in-depth research into this mClass can reveal more about their connections and perhaps an unidentified community sitting just at the periphery of the current dataset. As suggested earlier, looking at core node connections' connections can help resolve this issue. By doing so current mClasses can be further integrated, allowing for a better understanding of user composition and enhanced information flow.
Additionally, further qualitative sampling, as will be detailed in future work, can benefit from the fragmentation of users-or, as Hine et al. (2017) call it, "segmentation." That is, understanding how these clusters are broken down into smaller homogeneous groups can help identify potential users that are strategically situated to integrate isolated clusters into the graph but can also lend organizations, policy makers, scientists, and other stakeholders alike the power to target intended users. One qualitative example emerging from this dataset shows that user @AdamLeadbetter can potentially further integrate Canadian users from mClass 1 (a seemingly isolated group) with UK and European based mClasses 3, 4, and 5 on matters of arctic and polar research. Members of each mClass can thus benefit from catering their output to target specific users within their groups. Such targeting can potentially isolate graph distribution further. For example, it may be more difficult for organizations to reach out to users belonging to mClass 1 and 2 (southern parts of the graph) to further impact the political climate on climate change than it is to disseminate relevant research within the confines of research and policy groups (northern parts of the graph). It is not surprising then, given the fragmentation of clusters, that there are indeed "significant challenges" for scientists "attempting to engage the public about climate change" (Hine et al. 2017, 1).
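Identifying accounts positioned to bridge isolated clusters, as in the @AdamLeadbetter example, can be approached with a measure this study did not use: the participation coefficient of Guimerà and Amaral, P_i = 1 - Σ_s (k_is / k_i)^2, where k_is counts node i's edges into class s and k_i is its total degree. It is 0 when all of a node's ties stay in one class and rises as ties spread across classes. A sketch treating edges as undirected; the nodes and classes are hypothetical.

```python
from collections import Counter, defaultdict


def participation(edges, mclass):
    """Participation coefficient per node on an undirected view of the edges."""
    per_class = defaultdict(Counter)  # node -> Counter of neighbour classes
    for u, v in edges:
        per_class[u][mclass[v]] += 1
        per_class[v][mclass[u]] += 1
    scores = {}
    for node, counts in per_class.items():
        k = sum(counts.values())
        scores[node] = 1 - sum((c / k) ** 2 for c in counts.values())
    return scores


# A hypothetical bridge account in mClass 5 tied once to mClass 1 and once to mClass 3.
edges = [("bridge", "polar_ca"), ("bridge", "polar_uk"), ("polar_ca", "ca_2")]
mclass = {"bridge": 5, "polar_ca": 1, "ca_2": 1, "polar_uk": 3}
scores = participation(edges, mclass)
print(scores["bridge"], scores["ca_2"])  # 0.5 0.0
```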

Public Outreach
As stated previously, this paper is part of a wider research project that aims to understand current information practices in oceanography (Dahlan 2018). The findings from this study were used in combination with data from two case studies using semi-structured interviews and observations to examine and inform on information practices and how best to utilize ICTs for oceanography. One affordance of ICTs, and social media in particular, is the ability to engage with a broad audience.
The benefits of public outreach and engagement are numerous. These include impacting policy and social change, maintaining practical and useful research that is economically and environmentally viable, and securing research grants and funding (Grand et al. 2015, 12). As it stands, public engagement is at the top of the agenda for various research funders (Grand et al. 2015). Ocean science, given its multi- and interdisciplinary nature, can and does benefit from improved public outreach as well as from research collaboration. The findings from this research reveal that groups within the identified Twitter community of ocean-relevant users are fragmented, highlighting another key challenge: to explore the types of communication and information exchanged between these groups.
However, the data, having been selectively analysed both quantitatively and qualitatively, shows that there are several ways to improve information flow for the existing community on Twitter. Reiterating from the findings section, while the total number of degrees and BCR are important measures of node influence, in- and out-degrees also say something about a node and its star status (Grandjean 2016). In assessing the data with public outreach in mind, it appears necessary for data and information producers and distributors to understand the affordances of a given platform in order to target end-users and meet their demands.
For example, according to Lee, Van Dyke, and Cummins (2018, 280), "NOAA is not interacting with publics to create a place for conversation" on Facebook. But given the star node status that multiple NOAA accounts, not just one, occupy on Twitter (see Tables 5 and 8), it may be more practicable for NOAA to address this "missed opportunity" (Lee, Van Dyke, and Cummins 2018) of engaging with the public by utilizing Twitter further, given its slightly differing design and affordances. Twitter, with its simplicity and relatively straightforward functions, could provide a more manageable platform to foster a "dialogic space" (Lee, Van Dyke, and Cummins 2018, 281) between NOAA and its followers. Of course, further qualitative analysis is required to better understand the quality of NOAA's Twitter use and user satisfaction. Similar case studies assessing BODC's performance have been conducted using the dataset described in this research. In so doing, new avenues for further research are created. The results from this analysis can potentially be compared to findings describing other networks and other metric indicators, such as citation counts and publication numbers.
We chose to limit the SNA to one platform, Twitter, for similar reasons, with the knowledge that future work may include data scraping from other social networks. However, in the wake of Cambridge Analytica, it may prove difficult to run similar analysis on Facebook data in the same way. With this in mind, other limitations that can affect how an SNA is carried out include API limitations, which often limit data scraping only to publicly available information, and which can depend on the kinds of questions asked. It can be time consuming, such that the current SNA began in November 2016 and was completed in June 2017. It can also be expensive, depending on available computational resources and on whether or not paid services are utilized.

Conclusion
In conclusion, this research has compiled a list of core nodes based on some of the world's most influential names in oceanography in order to map a network of oceanographers on Twitter. Results from this digital mapping revealed that despite varied Twitter use and the creative branding of users online, oceanographic Communities of Practice remain relatively fragmented. This indicates that SNA can be used in conjunction with traditional information practice approaches that combine qualitative and quantitative methods to examine user behaviour and plan for better information flow. The research addressed how information flow is produced and managed, and how organizations can utilize these channels to establish their identities online for scientific research in order to further improve audience targeting vis-à-vis such mapping. In doing so, it contributes to Information Studies and Oceanography by highlighting areas where further data curation, engagement, and outreach research can be done. It has shown that ICTs are not utilized to their full potential by the oceanographic community, whose online communities remain relatively fragmented, and that more funding, training, and resources are needed to optimize data curation and community integration. The findings will therefore be useful to a wider interdisciplinary audience undertaking multidisciplinary projects that look at enhancing community ties and data sharing.

Highlights
- Understanding ICT affordances helps oceanographic information producers enhance public outreach, collaboration, and communication efforts.
- Individual accounts are more vocal and specific about issues that involve the oceans, such as climate change.
- Coupling Social Network Analysis with traditional qualitative analysis, in the study of social networks, results in the emergence of new patterns for analysis.
- The oceanographic community online is a fragmented community as a result of both practice and geography.