BY 4.0 license Open Access Published by De Gruyter Open Access December 3, 2021

Self-mention and uncertain communication in the British Medical Journal (1840–2007): The decrease of subjectivity uncertainty markers

Ilaria Riccioni, Ramona Bongelli and Andrzej Zuczkowski
From the journal Open Linguistics

Abstract

The communication of a scientific finding as certain or uncertain largely determines whether that information will be translated into practice. In this study, a corpus of 80 articles published in the British Medical Journal for over 167 years (1840–2007) is analysed by focusing on three categories of uncertainty markers, which explicitly reveal a writer’s subjectivity: (1) I/we epistemic verbs; (2) I/we modal verbs; and (3) epistemic non-verbs conveying personal opinions. The quantitative analysis shows their progressive decrease over time, which can be due to several variables, including the evolution of medical knowledge and practice, changes in medical research and within the scientific community, and more stringent guidelines for the scientific writing (regarding types of articles, their structure and rhetorical style).

1 Introduction

Whether a scientific finding is communicated as certain or uncertain affects how the scientific community and national governments translate it into practice. Distinguishing certain from uncertain information is therefore crucial both in the scientific field in the strict sense and in the popular scientific domain.

Given the importance of this topic in determining practical decision-making, the study of hedging, uncertainty, mitigation, and the like in scientific writing has received increasing attention from scholars since the 1990s (e.g. Crompton 1997, 1998, Dudley-Evans 1994, Hyland 1994, 1998a, 1998b, 2004, 2014, Hyland and Milton 1997, Rozumko 2017, Salager-Meyer 1994, 1997, Skelton 1997, Vold 2006). Additionally, recently, researchers in the NLP community have focused their attention on the detection of certainty and uncertainty markers (UMs) and their linguistic scope (e.g. Agarwal and Yu 2010, Bongelli et al. 2012, Farkas et al. 2010, Kim et al. 2009, Omero et al. 2020, Özgür and Radev 2009, Szarvas et al. 2012, Vincze et al. 2008, Zhou et al. 2011, 2015, Zou et al. 2013).

Similarly, expressions of subjectivity and self-mention in academic writing have been extensively studied, mostly from cross-cultural and cross-disciplinary perspectives (e.g. Fløttum 2005, 2006, 2012, Gao 2017, Hyland and Jiang 2016, Khedri 2016, Mur-Dueñas and Šinkūnienė 2016, Rongen Breivega et al. 2002, Salager-Meyer 1999a). Among diachronic studies of the expression of authors’ subjectivity in biomedical articles, we can mention Atkinson (1992, 1996) and Salager-Meyer (1999b). Atkinson analysed, rhetorically and linguistically, a corpus of articles of different genres from the Edinburgh Medical Journal (1735–1985). The rhetorical analysis investigated how authors represent or “place” themselves in their texts, the text forms and discourse structures used to report research, and the nature of the “discourse communities” in which texts were situated. The linguistic analysis, conducted using Biber’s (1988) multidimensional approach to register analysis, considered five polar dimensions: involved vs informational productions, narrative vs non-narrative concerns, situation dependent vs explicit reference, overt persuasive vs non-persuasive expressions, and abstract vs non-abstract information. The results of Atkinson’s analysis showed an evolution of medical research writing in relation to significant changes regarding the advance of both medical knowledge and practice, the growth of a professional medical community, etc. As a general outcome, this evolution implies a shift from a narrative and author-centred perspective to a non-narrative and information-centred one. Salager-Meyer (1999b) reaches similar results, although her analysis (conducted on a corpus of medical articles published in 34 different British and American journals between 1810 and 1995) specifically concerns academic conflict in English medical discourse, considering both overt attacks and covert criticism. Her results outlined an evolution from a more direct, personal, polemical, and aggressive linguistic attitude to a more indirect, mitigated, and “fact/finding” centred one. This evolution shows an increase in the degree of deference towards the authors being criticised and a decrease in the writer’s commitment towards the uttered criticism, and “reflects the shift from an author-centred and privately-based medicine to a fact-invoking, professionalized and highly competitive scientific community” (Salager-Meyer 1999b, 371).

Nonetheless, to the best of our knowledge, studies that deal specifically with the hedging functions of markers of subjectivity are less numerous (Grabar et al. 2016, Hyland 2001, 2002, Hyland and Jiang 2018, Shehzad 2007, Walková 2019), and no work has specifically examined subjectivity UMs in medical corpora from a diachronic perspective.

In this study, we have qualitatively and quantitatively analysed 80 articles randomly selected from the British Medical Journal (BMJ), covering a time span of 167 years (1840–2007), with the main aims of identifying:

  (a) which and how many UMs of subjectivity are present, testing whether there are significant variations in their use over time;

  (b) the linguistic scope (i.e. the linguistic influence) of each subjectivity UM, calculating the total amount of uncertainty communicated by them.

Consistent with our hypotheses, and in line with the aforementioned studies asserting an evolution away from an author-centred perspective, the results of the analyses show a progressive decrease over time in the number of these markers and in the percentage of uncertainty communicated by them.

Without delving into questions concerning the wider field of the epistemology of science, we discuss these results in light of the parallel evolution of medical research (and the sciences in general), on the one hand, and of scientific writing, on the other, considering the linguistic and rhetorical means used by authors to communicate their findings (Atkinson 1992, 1996, Bazerman 1988).

This descriptive study aims to contribute to the research on diachronic variation in scientific writing, particularly regarding hedging and subjectivity strategies.

2 Our previous studies on UMs. Theoretical and methodological approach

The study presented in this article originates from the main results of a Research Project of National Interest, funded by the Italian Ministry of Education, University and Research, which aimed at analysing the communication of certainty and uncertainty in three corpora of medical articles (both scientific and popular) from a diachronic perspective. The three corpora were, respectively, composed of:

  1. 80 articles randomly selected from BMJ 1840–2007 (available at PubMed Central, http://www.ncbi.nlm.nih.gov/pmc/journals/3/, last access February 2012);

  2. 12 articles randomly selected from BMJ 2013 (http://www.bmj.com/archive of the BMJ, section “Research”);

  3. 12 popular articles randomly selected from Discover Magazine 2013 (http://discovermagazine.com, section Health and Medicine http://discovermagazine.com/topics/health-medicine).

In our previous studies (Bongelli et al. 2012, 2019, Omero et al. 2020, Zuczkowski et al. 2016), we adopted a mixed procedure of analysis, which combines a top-down and a bottom-up approach for identifying UMs, and an epistemic stance perspective (Zuczkowski et al. 2017) on certainty and uncertainty, which exclusively takes into consideration the point of view of the writer/s (=the author/s of the article) in the here and now of their communication. This procedure differs from that of authors interested in studying hedging in scientific writing, such as Holmes (1988), Hyland (1995), and Hyland and Milton (1997), who mainly adopted a top-down approach, extracting the UMs to be applied in their analyses from grammar books and dictionaries.

All the articles were first manually edited and evaluated in two phases by a group of analysts in order to detect all the UMs (Zuczkowski et al. 2016). They were then converted into plain text files (.txt) so that the manual analysis could be refined through WordSmith Tools (Scott 2012).

Such analysis led us to identify seven categories of UMs, namely:

  1. epistemic verbs in the simple present (e.g. I/we think, believe, suppose, imagine; to seem, etc.);

  2. modal verbs in the simple present, in their epistemic (non-deontic) use (e.g. I/we can, may, must, etc.);

  3. modal verbs in the conditional mood, in their epistemic use (e.g. I/we could, might, etc.);

  4. epistemic non-verbs (adjectives, adverbs, nouns, and expressions such as possible, perhaps, doubt, in my opinion, etc., which can sometimes occur together with a non-epistemic verb, as in we are in doubt);

  5. if clauses (by excluding the zero conditional, as, in this case, if can be paraphrased by a temporal conjunction that communicates certainty);

  6. uncertain questions (i.e. questions that communicate not something unknown to the author but something uncertain, that is, a hypothesis to be confirmed, as with polar, alternative, and tag questions; see Zuczkowski et al. 2016, 2021, Bongelli et al. 2018, 2019, Riccioni et al. 2018);

  7. epistemic future.

The identification of these categories of markers was partially consistent with those proposed by other authors, who recognised the hedging function of epistemic verbs and modal verbs (Crompton 1997, Hyland 1994, Myers 1989, Salager-Meyer 1994, Skelton 1988), probability adverbs (Crompton 1997, Hyland 1994, Myers 1989, Salager-Meyer 1994), probability adjectives (Crompton 1997, Hyland 1994, Salager-Meyer 1994), and if clauses (Crompton 1997, Hyland 1994). Our classification is distinguished by the choice of considering only the writer/s’ perspective and includes, in addition to the previous ones, uncertain questions and epistemic future.

After identifying and calculating the quantity of UMs, their linguistic scope was quantified. The scope of a UM is the stretch of language affected by it, i.e. “the part of the sentence that is modified by the cue” (Szarvas et al. 2012, 353), the constituents that fall “within the uncertain interpretation” (Farkas et al. 2010, 3). For example, in the following extract from the corpus of 80 articles (BMJ 1840–2007) the linguistic scope of the marker It is not quite clear to me includes 15 tokens (in bold):

It is not quite clear to me how the infection is conveyed in such cases. We have always assumed that the streptococci on the fauces will readily find their way into the saliva and be transferred by droplet spray on coughing, etc. (Colebrook 1933, 724)

Furthermore, all the .txt files were imported into the Knowtator text annotation tool [1] (integrated with the Protégé knowledge representation system)[2], and the manual annotations of the UMs (including their linguistic scope) were replicated in order to construct and train an algorithm for the automatic detection of such markers (Omero et al. 2020).

These procedures allowed us to determine the total amount of uncertainty (i.e. the UMs + their linguistic scope) present in each article and in the whole corpus and, by difference, also that of certainty. The amount of certainty was calculated as the difference between the total number of words of each article and the number of words of uncertainty.
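The calculation by difference described above can be illustrated with a minimal Python sketch. The class and function names, and the token counts in the example, are hypothetical; the sketch also assumes that marker tokens and scope tokens are counted separately and do not overlap, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedMarker:
    """One annotated uncertainty marker (hypothetical structure)."""
    marker_tokens: int  # tokens in the marker itself
    scope_tokens: int   # tokens in its linguistic scope (assumed not to overlap the marker)


def certainty_breakdown(total_tokens, markers):
    """Uncertainty = markers + their scope; certainty = everything else."""
    uncertain = sum(m.marker_tokens + m.scope_tokens for m in markers)
    if uncertain > total_tokens:
        raise ValueError("annotated tokens exceed the article length")
    certain = total_tokens - uncertain
    return {
        "uncertain_tokens": uncertain,
        "certain_tokens": certain,
        "uncertain_pct": 100 * uncertain / total_tokens,
        "certain_pct": 100 * certain / total_tokens,
    }


# Invented figures for a single 1,000-token article with two annotated markers.
example = certainty_breakdown(
    1000, [AnnotatedMarker(7, 8), AnnotatedMarker(1, 12)]
)
print(example)
```

Summing the per-article counts in the same way would give the corpus-level figures.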

According to our perspective, certainty and uncertainty are two epistemic contraries encoded within linguistic communication (see Zuczkowski et al. 2017, ch. 4, for an experimental demonstration). The theoretical basis of our model (Zuczkowski et al. 2017, 2021) concerns epistemic stance as a linguistic, communicative notion, not a mental one, conveyed by the linguistic expressions used by speakers/writers in a given context. The model identifies three epistemic positions: the Known/Certain, the Uncertain, and the Unknown (lack of information).

Information can indeed be communicated either as certain (known), when it is communicated as if it were true (regardless of whether it is actually true), or as uncertain (not known whether, or believed), when it is communicated as if it were possibly true or possibly false (regardless of whether it is actually true or false). In other words, certainty and uncertainty have to do with the speakers/writers’ epistemic commitment towards the truth of the information, which can therefore be communicated as certain, probable, possible, etc.

Uncertainty, which in our perspective includes both possibility (as expressed, for example, by the epistemic use of modal verbs and expressions such as it is possible/probable, etc.) and subjectivity (i.e. the communication of the writers’ points of view, through expressions such as in my opinion, according to my view, I think, etc.), is communicated through the aforementioned list of markers. Certainty, in turn, is communicated through epistemically unqualified declarative sentences (Aijmer 1980), rhetorical questions, as well as through markers such as evidential and epistemic verbs (e.g. I/we know, I/we see, I/we remember), verbal expressions (e.g. I/we have no doubts, I’m/we are convinced …), adverbs (e.g. certainly, surely), adjectives (e.g. it is sure, certain …), etc. In other words, certainty is communicated both by declarative sentences without any markers and by explicit markers of certainty, which function, contrary to those of uncertainty, as markers of reinforcing (or boosting, in Holmes’ (1984) terminology). For example, in our view, in the whole following excerpt the author communicates certainty, even though in the last sentence it is boosted by the use of the adverb “certainly”:

There is a fever in the tropics (for want of a better name, I call it tropical fever) which possesses certain characteristics of its own. Akin to malarial fever, and also to enteric, it cannot correctly be designated by either name. It is certainly sporadic, and usually attacks adults. (Sherman-Bigg 1882, 607)

Specifically, for one of the three corpora mentioned above (the one investigated in this study, composed of 80 articles randomly selected from the BMJ from 1840 to 2007 and stratified in four distinct time periods, namely 1840–1880, 1881–1920, 1921–1960 and 1961–2007; see Section 3.1.1), our analysis revealed that the percentages of certainty and uncertainty in the whole corpus are, respectively, 80% and 20%. Although these percentages vary across the four periods, from 77% to 84% for certainty and from 16% to 23% for uncertainty, the differences are not significant (Zuczkowski et al. 2016).

This means that certainty and uncertainty remain stable over the 167-year span, i.e. that the writers have been using uncertainty in similar quantities and always in a smaller percentage as compared to certainty.

3 The present study: Subjectivity UMs

In this study, we have focussed our attention on a specific type of UMs, namely subjectivity uncertainty markers (SUMs). By this term, we refer to those markers of uncertainty and/or subjectivity in which there is an explicit reference to the author/s through the use of personal pronouns (I, we, me, us), adjectives (our, my, mine, etc.), or adverbs (personally).

In particular, out of the seven categories of UMs presented in the previous section, we were interested in investigating the uncertainty communicated by: (1) I/we modal verbs (first person singular and plural); (2) I/we epistemic verbs (first person singular and plural); and (3) epistemic non-verbs conveying personal opinions.

The main research questions were as follows:

  (1) Which and how many SUMs are present in the corpus?

  (2) Are there any significant variations in their use over time?

  (3) What is their linguistic scope (i.e. their linguistic influence) and what is the total amount of uncertainty communicated by them?

3.1 Material and method

3.1.1 Corpus

As we were interested in studying the use of SUMs in biomedical scientific writing from a historical perspective, hypothesising possible variations over time, we chose to analyse the largest of our three corpora previously annotated for uncertainty, composed of 80 articles randomly selected from the BMJ and covering a time span of 167 years (1840–2007).

In this analysis, we maintained the division into four time periods (a conventional differentiation based on significant phases both in history and medical evolution):

  1. 20 articles published between 1840 and 1880. This is a fascinating period for medicine in that physiology and experimentation were beginning to emerge in Europe. Among the most important medical discoveries, we can mention: the first uses of anaesthesia (Wells/Morton 1844/46), the prevention of the transmission of puerperal fever (Semmelweis 1847), the development of the syringe (Pravaz and Wood 1853), the foundation of cellular pathology (Virchow 1858), the laws of inheritance (Mendel 1865), and the use of antiseptic surgical methods (Lister 1867).

  2. 20 articles published between 1881 and 1920. The period includes the turn of the century, with all the excitement about the apparent lack of boundaries regarding what science could do and the belief that it could predict everything. Among the most important medical findings, we can mention: the discovery of TB and cholera bacilli by Koch (1882, 1884), the first effective vaccine for rabies (Pasteur 1886), the discovery of antitoxins against tetanus and diphtheria (von Behring 1890), X rays (Röntgen 1895), radioactivity (Becquerel 1896), the synthesis and manufacture of aspirin (Bayer 1899), the system to classify blood (Landsteiner 1901), and the pioneering use of the ECG (Dudley 1913).

  3. 20 articles published between 1921 and 1960. These are prevalently the times of war all over the world, with the bad and the good they brought to science. Among the most important medical discoveries, we can mention: the first use of insulin to treat diabetes (1922), the first vaccines for diphtheria (1923), tuberculosis and tetanus (1927), penicillin (Fleming 1928), the first vaccine for typhus (1937), the effect of smoking on lung cancer (Doll 1950), the cardiac pacemaker (1950), DNA (1953), the first kidney transplant (Murray 1954), the first polio vaccine (Salk 1955), and the first commercialisation of the contraceptive pill in USA (1960).

  4. 20 articles published between 1961 and 2007. The extraordinary development of technologies allowed discoveries and much more sophisticated medical techniques (concerning diagnosis, surgery, pharmacology, etc.). We can mention among others: the first liver transplant (Starzl 1963), the first human heart transplant (Barnard 1967), the first test of MRI (1977), the birth of the first test-tube baby (1978), the eradication of smallpox (1980), the pharmaceutical use of insulin for humans (1982), the identification of HIV (i.e. the virus that causes AIDS), the invention of the artificial kidney dialysis machine (1985), Dolly the sheep, the first clone (1997), and the first vaccine for human papillomavirus (2006).

These 80 articles deal with different topics. Descriptions of diagnostic techniques, surgical and pharmacological treatments, and epidemiological studies all concerning prevalently physical diseases are the most frequent.

The random selection collected different types of articles, with a different distribution over the four periods, as shown in Table 1.

Table 1

Types of articles over the four periods

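| Types/periods | First (Freq.) | First (%) | Second (Freq.) | Second (%) | Third (Freq.) | Third (%) | Fourth (Freq.) | Fourth (%) | Total |
|---|---|---|---|---|---|---|---|---|---|
| Reviews | 4 | 20 | 4 | 20 | 6 | 30 | 9 | 45 | 23 |
| Clinical cases | 12 | 60 | 8 | 40 | 2 | 10 | 0 | 0 | 22 |
| Original research | 1 | 5 | 2 | 10 | 9 | 45 | 10 | 50 | 22 |
| Reports | 0 | 0 | 2 | 10 | 3 | 15 | 0 | 0 | 5 |
| Lectures | 1 | 5 | 4 | 20 | 0 | 0 | 0 | 0 | 5 |
| Letters | 2 | 10 | 0 | 0 | 0 | 0 | 1 | 5 | 3 |
| Total | 20 | 100 | 20 | 100 | 20 | 100 | 20 | 100 | 80 |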

While the 40 articles of the first and second period are predominantly Clinical cases and Reviews, those written in the third and fourth period are mostly Original research papers. This last type of article, unlike the previous ones, presents a more systematic structure.

As a matter of fact, in the third and fourth period 21 articles (11 for the third and 10 for the fourth) present the IMRaD (Introduction, Method, Results and Discussion) structure or something similar (Gross et al. 2002). These articles are Original research papers (19) or Clinical cases (2). The other 19 articles of these two periods are Reviews, Reports, and Letters and, obviously, do not have an IMRaD structure. In our analysis, we considered the IMRaD model also in its proto-versions, in which it is possible to observe a defined structure, even if not yet standardised. The first article presenting this proto-version of IMRaD structure belongs to the third period (E. Spriggs, The early recognition and treatment of cancer of the stomach, 1928), whereas the first article in our corpus with a well-defined IMRaD structure is dated 1958 (J. H. Burn and M. J. Rand, Action of nicotine on the heart); a standardised IMRaD structure seems established since 1968 (C. P. Lowther and R. W. D. Turner, Guanethidine in the treatment of hypertension, 1963).

The majority of the 80 articles have a single author who is a native speaker of British English (native-speaker status was assessed on the basis of the authors’ last names and institutional affiliation, as in Salager-Meyer 1999b). This applies to 19 articles of the first period, all 20 articles of the second, 14 articles of the third, but only 5 of the fourth, where groups of authors (in some cases not British) prevail.

3.1.2 Procedure and data analysis

Starting from Knowtator tags, we extracted from each article those UMs in which there was an explicit self-reference (by a personal pronoun, an adjective, or an adverb).

On the basis of the seven categories described above (see Section 2), we created an Excel spreadsheet in which we recorded, for each article, the SUMs, the string of text in which they appear, their scope, and other relevant information about the article itself.
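As an illustration only, the filtering step (keeping the UMs that contain an explicit self-reference) could be approximated with a regular expression over the annotated marker strings. This is a hypothetical sketch, not the procedure used in the study: the function name, the word list (which adds "ours" to the forms named in the text), and the sample markers are assumptions, and a pattern of this kind cannot replace manual checking (for instance, it would also match the Roman numeral "I").

```python
import re

# First-person forms named in the text, plus "ours" (an assumption).
# "I" is matched only in upper case so that the "i" in "i.e." is not caught.
SELF_REFERENCE = re.compile(
    r"\b(?:I|[Ww]e|[Mm]e|[Uu]s|[Mm]y|[Mm]ine|[Oo]urs?|[Pp]ersonally)\b"
)


def has_self_reference(marker_text):
    """Return True if the marker string contains an explicit self-mention."""
    return SELF_REFERENCE.search(marker_text) is not None


# Marker strings taken from the examples discussed in this article.
annotated_markers = [
    "I am inclined to think",
    "seem to me",
    "perhaps",
    "we may",
    "In my opinion",
    "it is possible",
]

candidate_sums = [m for m in annotated_markers if has_self_reference(m)]
print(candidate_sums)
```

Each candidate retained in this way would still need to be checked by an analyst for epistemic meaning before being counted as a SUM.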

Out of the seven categories of UMs previously listed, SUMs appear in the following three:

  1. I/we modal verbs in the first person singular and plural, in their epistemic (non-deontic) use, as in the following example, in which we can notice the modal verb may in the first person plural (even if the article has a single author, “we” seems to be used for generically referring to the scientific community; see Fløttum 2006, 2012):

    (1)

      We may have underestimated the effect of REM sleep in this study as the decline overnight may partially result from an accumulative effect of periods of REM sleep throughout the night. (Shapiro et al. 1986, 1163)

  2. I/we epistemic verbs in the first person singular and plural, as in the following example, where two verbal expressions in the first person singular are present. The author first presents a cautious hypothesis about the causes of the patient’s recovery (“I am inclined to think”), and then assumes a “not knowing whether” stance (Zuczkowski et al. 2017, 2021) by expressing his uncertainty about the role of the milk as a medium for bacterial cultures (“I […] do not even know whether”), due to a lack of expertise in the field of bacteriology:

    (2)

      I am inclined to think that she owes her recovery to the dilution of the toxins by, and the envelopment of the pathogenic bacteria in, the large quantity of milk which was free in the peritoneal cavity. I am not a bacteriologist, and do not even know whether milk is a good culture medium for the various bacteria which one finds in the stomach and the upper part of the incision closed by suture. (Roper 1908, 786)

      Within this category, we distinguished the sub-category represented by the verbs to seem and to appear. These verbs have an epistemic meaning with all the personal pronouns (I, you, he, she, it, we, they), since they always refer to the I/we of the author/s; however, we included in the SUM category only those occurrences with an explicit reference to the author(s) (i.e. seem/s or appear/s to me/to us), as in the following example. Here the author expresses his personal opinion in favour of the use of an antiseptic for preventing puerperal fever:

    (3)

      These results seem to me to promise a substantially greater margin of safety than heretofore, since the hands of the doctor and midwife (and also the vulva) treated with dettol or iodine will be provided with a lasting chemical barrier against infection. (Colebrook 1933, 726)

  3. Epistemic non-verbs conveying personal opinions. In the following excerpt, we can observe two occurrences of this SUM category (“In my opinion” and “I am farther of opinion”), both aimed at explicitly expressing a personal point of view about the fecundation in human females:

    (4)

      In my opinion, therefore, it is not the ovum of a past menstrual period but a younger and more readily responsive ovum which is invariably fertilized, and I am farther of opinion that the extrusion of such from the ovary is determined largely, if not entirely, by the mere presence of vigorous spermatozoa in the female genital tract. (Oliver 1907, 1568)

      In some cases, UMs in general and, specifically here, SUMs can occur in a cluster, as in the two following excerpts. In example (5), we see an epistemic verb (“I think”) immediately followed by an epistemic non-verb conveying a personal opinion (“from my own experience”): the author’s point of view is grounded on his personal clinical experience. Additionally, in the same excerpt, we can note the presence of other expressions of the author’s subjectivity (but without epistemic meanings), such as “I am justified in advocating it” and “according to our present obstetric rule.”

    (5)

      Notwithstanding the unfavourable aggregate results of the Caesarean section in Great Britain and Ireland, I think, from my own experience, shown in the above statements, I am justified in advocating it as an operation of election, not merely having recourse to it as one of necessity, according to our present obstetric rule, when no other means can suffice, but to give it a preference over the use of the crotchet, in cases when neither premature labour, the long forceps, these two operations combined, or turning, will meet the exigencies of the case. (Radford 1849, 460)

      In example (6), we have, at first, an epistemic verb (“I believe”) and then a modal verb in the first person plural (“we may”). Furthermore, we can highlight the presence of the UM perhaps. The author expresses his cautious opinion that a scientific judgement about the use of potato in bread, mixed with wheaten flour, was probably erroneous; he believes that most bakers use this mix. His opinion is connected to the results of chemical studies (by other authors: he is a physician) demonstrating that potato, since it contains a vegetable acid, can help to counteract the onset of scurvy:

    (6)

      I believe very few bakers omit to use it, and perhaps we may have acted unwisely in condemning this adulteration. (Barrett 1848, 177)

From these data, we performed a quantitative analysis by: (a) calculating frequencies and percentages of SUMs and their linguistic scope, both for the corpus taken as a whole and for each time period and (b) comparing, globally and for each time period, frequencies and percentages of UMs and their linguistic scope with frequencies and percentages of SUMs and their linguistic scope.

4 Results

As shown in the following sections, in line with our hypotheses, the quantitative analysis reveals a progressive decrease of SUMs over the four periods. In particular, it is possible to observe a gradual eclipse of the uncertainty communicated by the authors’ self-mentions between the first two (1840–1920) and the second two (1921–2007) time periods.

4.1 Quantitative results: SUMs and Non-SUMs in the whole corpus and in each period

Out of the 80 articles of our corpus, 52 (65%) present at least one SUM: 15 out of 20 in the first period (75%), 17 in the second period (85%), 13 in the third period (65%), and 7 in the fourth period (35%).

As shown in Table 2, out of the total number of UMs (2,808) present in the whole corpus, non-SUMs are 2,492 (88.75%) while SUMs are 316 (11.25%). Such difference is statistically significant (χ² (1, N = 2,808) = 1686.24, p < 0.0001) and extremely high: the ratio between non-SUMs and SUMs is indeed equal to 7.88, and this means that the presence of SUMs is, in our corpus, almost eight times less frequent than that of non-SUMs, suggesting that scientific writers prefer to communicate uncertainty without resorting to explicit self-mentions.

Table 2

Frequencies and percentages of SUMs and non-SUMs in the whole corpus

| UMs | Frequencies | % |
|---|---|---|
| Non-SUMs | 2,492 | 88.75 |
| SUMs | 316 | 11.25 |
| Total | 2,808 | 100 |
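The statistic reported for Table 2 can be approximately reproduced with a chi-square goodness-of-fit test against an equal split between the two classes. That this is the test the authors ran is an assumption, made because it yields a value close to the one reported; the function names are hypothetical. The sketch uses only the Python standard library; for one degree of freedom the p-value equals erfc(sqrt(χ²/2)).

```python
import math


def chi_square_equal_split(observed):
    """Goodness-of-fit statistic against equal expected frequencies (assumed null)."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


def p_value_df1(chi2):
    """Upper-tail p-value of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


non_sums, sums = 2492, 316  # frequencies from Table 2

chi2 = chi_square_equal_split([non_sums, sums])
p = p_value_df1(chi2)
ratio = non_sums / sums

print(f"chi2 = {chi2:.3f}, p = {p:.3g}, non-SUM to SUM ratio = {ratio:.3f}")
```

Under this assumption the statistic and the ratio agree with the reported 1686.24 and 7.88 to within rounding, and the p-value is far below 0.0001.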

Although the difference between SUMs and non-SUMs varies along the four time periods (Table 3), SUMs are always lower than non-SUMs and they progressively decrease over time, ranging from 16.41% of the total UMs to 2.91%.

Table 3

SUMs and non-SUMs along the four time periods

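| Periods | Total UMs (Freq.) | Non-SUMs (Freq.) | Non-SUMs (%) | Nf1* (per 100,000) | SUMs (Freq.) | SUMs (%) | Nf2* (per 100,000) | Non-SUM to SUM ratio (Nf1/Nf2) |
|---|---|---|---|---|---|---|---|---|
| First | 585 | 489 | 83.59 | 83589.74 | 96 | 16.41 | 16410.25 | 5.09 |
| Second | 820 | 707 | 86.22 | 86219.51 | 113 | 13.78 | 13780.48 | 6.25 |
| Third | 957 | 863 | 90.18 | 90177.63 | 94 | 9.82 | 9822.36 | 9.18 |
| Fourth | 446 | 433 | 97.09 | 97085.20 | 13 | 2.91 | 2914.79 | 33.36 |
| Total | 2,808 | 2,492 | 88.75 | 88746.43 | 316 | 11.25 | 11253.56 | 7.88 |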

*Nf = normalised frequencies.
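The normalised frequencies in Table 3 appear to be the share of each class among the total UMs of a period, scaled to a base of 100,000. The sketch below recomputes them from the raw frequencies on that assumption; the helper name and the dictionary layout are hypothetical, and recomputed values may differ slightly from the printed ones because of rounding.

```python
# Raw frequencies from Table 3: period -> (non-SUMs, SUMs).
frequencies = {
    "First": (489, 96),
    "Second": (707, 113),
    "Third": (863, 94),
    "Fourth": (433, 13),
}


def normalise(freq, total, base=100_000):
    """Frequency rescaled to a common base (assumed definition of Nf)."""
    return freq / total * base


table3 = {}
for period, (non_sum, sum_) in frequencies.items():
    total = non_sum + sum_
    nf1 = normalise(non_sum, total)  # non-SUMs per 100,000 UMs
    nf2 = normalise(sum_, total)     # SUMs per 100,000 UMs
    table3[period] = {"total": total, "nf1": nf1, "nf2": nf2, "ratio": nf1 / nf2}

for period, row in table3.items():
    print(f"{period:<7} total={row['total']:>4} "
          f"Nf1={row['nf1']:.2f} Nf2={row['nf2']:.2f} ratio={row['ratio']:.2f}")
```

Because both classes are normalised by the same total, the ratio of normalised frequencies is identical to the ratio of raw frequencies.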

For each period, the difference between SUMs and non-SUMs is always statistically significant. Specifically, first period: (χ² (1, N = 585) = 264.1, p < 0.0001); second period: (χ² (1, N = 820) = 430.28, p < 0.0001); third period: (χ² (1, N = 957) = 617.93, p < 0.0001); and fourth period: (χ² (1, N = 446) = 395.51, p < 0.0001).

The histogram in Figure 1 graphically shows the inverse relation between SUMs and non-SUMs along the four periods. Where the non-SUMs increase, the SUMs decrease, until they almost disappear from the biomedical articles, in the final period.

Figure 1: The inverse relation between SUMs and non-SUMs along the four time periods.

4.2 Quantitative results: SUMs in the whole corpus and in each period

As for SUMs specifically, the quantitative analysis reveals that in the whole corpus the most numerous category is that of I/we modal verbs, followed by that of I/we epistemic verbs (Table 4). Taken together, they represent 80% of the total SUMs in the whole corpus. Conversely, epistemic non-verbs conveying personal opinions and the subcategory of the verbs “to seem” and “to appear” represent only 20% of the total SUMs in the whole corpus. This indicates that when scientific writers communicate uncertainty by using explicit self-mentions, they seem to prefer modal and epistemic verbs to epistemic non-verbs conveying personal opinions and to verbs such as seem(s) to me/us and appear(s) to me/us.

Table 4

Frequencies and percentages of SUM categories in the whole corpus

| SUM categories | Frequencies | % |
|---|---|---|
| I/we modal verbs | 136 | 43.04 |
| I/we epistemic verbs | 117 | 37.03 |
| Epistemic non-verbs conveying personal opinions | 49 | 15.51 |
| Seem(s)/appear(s) | 14 | 4.43 |
| Total | 316 | 100 |

If we consider each SUM category separately, the differences among them are statistically significant (χ² (3, N = 316) = 278.3, p < 0.0001).

Taking into consideration the four time periods separately (Table 5), the analysis reveals the following:

  (a) SUMs are mainly present in the first three periods (respectively, 96 out of 316 = 30.38%; 113 out of 316 = 35.76%; and 94 out of 316 = 29.75%);

  (b) SUM categories are used differently over time:

    1. in the first period, the most used SUMs are the epistemic verbs (51 occurrences out of 96 SUMs, which represent 53.13% of the total);

    2. in the second and third periods, the most used SUMs are the modal verbs (54 out of 113 SUMs, i.e. 47.79% of the total, and 45 out of 94 SUMs, i.e. 47.87% of the total);

    3. in the fourth period, although the total amount is very low for each category, the most used SUMs are the epistemic non-verbs conveying personal opinions (5 occurrences out of 13 SUMs, which represent 38.46% of the total).

Table 5

Frequencies and percentages of SUM categories in each time period

| Period | I/we modal verbs, freq. (%) | I/we epistemic verbs, freq. (%) | Seem(s)/appear(s) to me/us, freq. (%) | Epistemic non-verbs, freq. (%) | Total, freq. (%) | % of all SUMs |
|---|---|---|---|---|---|---|
| First | 33 (34.38) | 51 (53.13) | 3 (3.13) | 9 (9.38) | 96 (100) | 30.38 |
| Second | 54 (47.79) | 36 (31.86) | 4 (3.54) | 19 (16.81) | 113 (100) | 35.76 |
| Third | 45 (47.87) | 26 (27.66) | 7 (7.45) | 16 (17.02) | 94 (100) | 29.75 |
| Fourth | 4 (30.77) | 4 (30.77) | 0 (0) | 5 (38.46) | 13 (100) | 4.11 |
| Total | 136 (43.04) | 117 (37.03) | 14 (4.43) | 49 (15.51) | 316 (100) | 100 |
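The percentages in Table 5 can be re-derived from the raw frequencies. The following is a minimal sketch, not the authors' own procedure: the short category keys, the helper names `pct` and `summarise`, and the half-up rounding rule are all assumptions introduced for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Frequencies copied from Table 5 (rows: periods; columns: SUM categories).
# The short category keys are illustrative labels, not the authors' own.
TABLE_5 = {
    "First": {"modal": 33, "epistemic": 51, "seem/appear": 3, "non-verbs": 9},
    "Second": {"modal": 54, "epistemic": 36, "seem/appear": 4, "non-verbs": 19},
    "Third": {"modal": 45, "epistemic": 26, "seem/appear": 7, "non-verbs": 16},
    "Fourth": {"modal": 4, "epistemic": 4, "seem/appear": 0, "non-verbs": 5},
}


def pct(part, whole):
    """Percentage rounded half-up to two decimals (assumed rounding rule)."""
    value = Decimal(part) * 100 / Decimal(whole)
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Total number of SUMs in the whole corpus.
grand_total = sum(sum(row.values()) for row in TABLE_5.values())


def summarise(table):
    """Per period: total SUMs, within-period percentages, share of all SUMs."""
    summary = {}
    for period, row in table.items():
        period_total = sum(row.values())
        summary[period] = {
            "total": period_total,
            "within_period": {cat: pct(n, period_total) for cat, n in row.items()},
            "share_of_all_sums": pct(period_total, grand_total),
        }
    return summary


if __name__ == "__main__":
    for period, info in summarise(TABLE_5).items():
        print(period, info["total"], info["share_of_all_sums"], info["within_period"])
```

Decimal arithmetic is used here only so that exact ties such as 51/96 = 53.125% round upwards, as they appear to do in the table.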

In the following sections, we present the results of the quantitative analysis concerning each identified SUM category, both regarding the whole corpus and each time sub-corpus.

I/we modal verbs in the whole corpus. Within the I/we modal verbs category (which represents 43.04% of the total SUMs), the most used subcategory is I/we may (34.56%), as shown in Table 6. Adding I/we can to I/we may (47 + 24 = 71 occurrences) brings the share to over 50% of the category.

Table 6

Frequencies and percentages of I/we modal verbs in the whole corpus

I/we modal verbs Frequencies % within I/we modal verbs (136)* % within SUM category (316)*
I/we may 47 34.56 14.87
I/we should 28 20.59 8.86
I/we can 24 17.65 7.59
I/we would (not) 20 14.71 6.33
I/we ought 7 5.15 2.22
I might 5 3.68 1.58
I/we could 4 2.94 1.27
I must 1 0.74 0.32
Total 136 100 43.04

*This table shows the frequencies and percentages of the I/we modal verbs in the whole corpus. Percentages have been calculated both within I/we modal verb category and within SUMs.

I/we modal verbs for each period. The use of I/we modal verbs displays (Table 7) a fluctuating trend over time: it increases from the first (24.26%) to the second period (39.71%) and decreases from the second to third (33.09%) until it almost disappears in the fourth (2.94%).

Table 7

Frequencies and percentages of I/we modal verbs in the four time periods

Periods I/we modal verbs
Frequencies %
First 33 24.26
Second 54 39.71
Third 45 33.09
Fourth 4 2.94
Total 136 100

The most frequent I/we epistemic verbs in the whole corpus. As shown in Table 8, out of the 117 occurrences of epistemic verbs in the first singular and plural person, the most frequently used are, in order, “I/we think” (and similar expressions, such as “I am inclined to think,” “we are bid to think,” “I venture to think,” and so on) and “I/we believe” (and similar expressions, such as “we have every reason to believe” and “we do not believe”). Taken together, they represent almost 70% of the total I/we epistemic verbs. Other verbs and verbal expressions, such as “I am not quite sure,” have been labelled as “Other verbs.”

Table 8

Frequencies and percentages of the most frequent I/we epistemic verbs in the whole corpus

I/we epistemic verbs Frequencies % within I/we epistemic verbs (117) % within SUM category (316)
I/we think 58 49.57 18.35
I/we believe 23 19.66 7.28
Other verbs 36 30.77 11.39
Total 117 100 37.03

I/we epistemic verbs for each period. As shown in Table 9, the I/we epistemic verbs decrease progressively over time, falling from 43.59% in the first period to 22.22% in the third. In the fourth period, they almost disappear (3.42%).

Table 9

Frequencies and percentages of I/we epistemic verbs in the four time periods

Periods I/we epistemic verbs
Frequencies %
First 51 43.59
Second 36 30.77
Third 26 22.22
Fourth 4 3.42
Total 117 100

The most frequent expressions of personal opinions in the whole corpus. As shown in Table 10, out of 49 occurrences of epistemic non-verbs conveying personal opinions, the most used are “in my opinion” (10 = 20.41%) and “in my/our experience” (7 = 14.29%), followed by “personally”[3] (5 = 10.20%) and “my (own) opinion” (4 = 8.16%). Other epistemic non-verbs, such as “to my mind,” “in our judgment,” etc., have been labelled as “Other expressions.”

Table 10

Frequencies and percentages of the most frequent epistemic non-verbs conveying personal opinions in the whole corpus

Epistemic non-verbs Frequencies % within epistemic non-verbs (49)* % within SUM category (316)
In my opinion 10 20.41 3.16
In my/our experience 7 14.29 2.22
Personally 5 10.20 1.58
My (own) opinion 4 8.16 1.27
Other expressions 23 46.94 7.28
Total 49 100 15.51

Expressions of personal opinions for each period. The use of epistemic non-verbs conveying personal opinions displays (Table 11) a fluctuating trend over time. It more than doubles from the first period to the second (from 18.37% to 38.78%), then diminishes from the second to the third (from 38.78% to 32.65%) and from the third to the fourth (from 32.65% to 10.20%).

Table 11

Frequencies and percentages of epistemic non-verbs conveying personal opinions in the four time periods

Periods Epistemic non-verbs
Frequencies %
First 9 18.37
Second 19 38.78
Third 16 32.65
Fourth 5 10.20
Total 49 100

Seem(s)/Appear(s) to me/to us verbs in the whole corpus. Finally, the verbs “seem(s) to me/us” and “appear(s) to me/us” are the least used category of SUMs in the whole corpus (Table 12).

Table 12

Frequencies and percentages of the verbs “seem(s) to me/us” and “appear(s) to me/us” in the whole corpus

Seem(s)/appear(s) to me/to us Frequencies % within seem(s)/appear(s) to me/us (14) % within SUM category (316)
Seem(s) to me/us 8 57.14 2.53
Appear(s) to me 5 35.71 1.58
Do not seem to us 1 7.14 0.32
Total 14 100 4.43

Seem(s)/Appear(s) to me/to us verbs for each period. The verbs “seem(s) to me/us” and “appear(s) to me/us” increase slightly during the first three periods, although their presence remains very low (three, four, and seven occurrences, respectively), and then disappear completely in the fourth period (Table 13).

Table 13

Frequencies and percentages of the verbs “seem(s) to me/us” and “appear(s) to me/us” in the four time periods

Periods Seem(s)/appear(s) to me/to us
Frequencies %
First 3 21.43
Second 4 28.57
Third 7 50
Fourth 0 0
Total 14 100

“I” vs “we”. Table 14 shows that, out of the 316 SUMs present in 52 articles of our corpus:

  1. 193 refer to the first singular person (e.g. “I think,” “seems to me,” etc.);

  2. 123 refer to the first plural person (e.g. “we believe,” “seem to us,” etc.).

Table 14

SUMs in the first singular and plural person in the whole corpus and in each time period

| Period | SUMs in the first singular person, freq. (%) | SUMs in the first plural person, freq. (%) | Total, freq. (%) | Single author | Two or more authors |
|---|---|---|---|---|---|
| First | 73 (76.04) | 23 (23.96) | 96 (100) | 15 | 0 |
| Second | 77 (68.14) | 36 (31.86) | 113 (100) | 17 | 0 |
| Third | 43 (45.74) | 51 (54.26) | 94 (100) | 11 | 2 |
| Fourth | 0 (0) | 13 (100) | 13 (100) | 0 | 7 |
| Total | 193 (61.08) | 123 (38.92) | 316 (100) | 43 | 9 |

Additionally, it is interesting to note that “we” is also used, as a rhetorical device, when the articles are written by a single author, as happens in the first three periods.

4.3 Scope

Analogous to what has been observed previously for the UMs, it is possible to notice that the amount of uncertainty (UMs + their linguistic scope) communicated by SUMs is always lower than the amount of uncertainty communicated by non-SUMs both in the whole corpus and in each time period.

Out of the total amount of uncertainty present in the whole corpus, the percentage of uncertainty communicated by SUMs is 15.97%, while that communicated by non-SUMs is 84.03% (Table 15).

Table 15

Uncertainty communicated by SUMs and by non-SUMs both in the whole corpus and in each time period

| Period | Total uncertainty (no. of words) | Uncertainty communicated by SUMs (no. of words) | % | Nf1 (per 100,000) | Uncertainty communicated by non-SUMs (no. of words) | % | Nf2 (per 100,000) | Nf2/Nf1 (non-SUM to SUM ratio) |
|---|---|---|---|---|---|---|---|---|
| First | 7,446 | 1,652 | 22.19 | 22186.40 | 5,794 | 77.81 | 77813.59 | 3.50 |
| Second | 11,018 | 2,181 | 19.79 | 19794.88 | 8,837 | 80.21 | 80205.11 | 4.05 |
| Third | 12,545 | 1,837 | 14.64 | 14643.28 | 10,708 | 85.36 | 85356.71 | 5.83 |
| Fourth | 5,671 | 188 | 3.32 | 3315.11 | 5,483 | 96.68 | 96684.88 | 29.16 |
| Total | 36,680 | 5,858 | 15.97 | 15970.55 | 30,822 | 84.03 | 84029.44 | 5.26 |

Although such difference changes over the four time periods, in each of them the amount of uncertainty communicated by SUMs is always lower than that communicated by non-SUMs.

Specifically, the uncertainty communicated by SUMs falls from 22.19% in the first period to 19.79% in the second, 14.64% in the third, and 3.32% in the final one.

Conversely, the uncertainty communicated by non-SUMs rises from 77.81% in the first period to 80.21% in the second, 85.36% in the third, and 96.68% in the final one.

As in the case of the UMs, the difference between the uncertainty communicated by SUMs and by non-SUMs is statistically significant both when each period is considered separately (first period: χ² (1, N = 7,446) = 2304, p < 0.0001; second period: χ² (1, N = 11,018) = 4020, p < 0.0001; third period: χ² (1, N = 12,545) = 6272.99, p < 0.0001; fourth period: χ² (1, N = 5,671) = 4943.92, p < 0.0001) and when all articles are considered together (χ² (1, N = 36,680) = 16,990, p < 0.0001).
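The text does not state which variant of the chi-square test was used. As a hedged illustration, the sketch below assumes a goodness-of-fit test with one degree of freedom that compares the SUM and non-SUM word counts of Table 15 against an equal split; under this assumption the statistic comes out close to the values reported for the first, second and fourth periods and for the whole corpus. The function names are hypothetical.

```python
import math

# Word counts (SUMs, non-SUMs) copied from Table 15.
UNCERTAINTY_WORDS = {
    "First": (1652, 5794),
    "Second": (2181, 8837),
    "Third": (1837, 10708),
    "Fourth": (188, 5483),
    "Whole corpus": (5858, 30822),
}


def chi_square_equal_split(a, b):
    """Goodness-of-fit chi-square for two counts against a 50/50 expectation.

    Assumption: the null hypothesis is that both categories are equally
    frequent, so each expected count is half of the total.
    """
    expected = (a + b) / 2
    return (a - expected) ** 2 / expected + (b - expected) ** 2 / expected


def p_value_df1(chi2):
    """Upper-tail p-value of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    for label, (sum_words, non_sum_words) in UNCERTAINTY_WORDS.items():
        chi2 = chi_square_equal_split(sum_words, non_sum_words)
        print(f"{label}: chi2 = {chi2:.2f}, p = {p_value_df1(chi2):.3g}")
```

With two categories the statistic reduces to (a − b)² / (a + b), which is why a large imbalance between SUMs and non-SUMs yields very large values even for moderate sample sizes.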

In other words, the uncertainty communicated in the first person progressively disappears, in favour of a progressive increase in the uncertainty communicated without resorting to authors’ self-mentions.

Specifically, the share of uncertainty communicated by SUMs in the first period is:

  1. 1.12 times higher than in the second (first period SUM to second period SUM ratio);

  2. 1.51 times higher than in the third (first period SUM to third period SUM ratio);

  3. 6.69 times higher than in the fourth (first period SUM to fourth period SUM ratio).
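The ratios above follow from the word counts in Table 15. A minimal sketch of the arithmetic, with hypothetical function names and the per-100,000 base taken from the Nf columns of the table:

```python
# (total uncertainty words, words communicated by SUMs), copied from Table 15.
PERIODS = {
    "First": (7446, 1652),
    "Second": (11018, 2181),
    "Third": (12545, 1837),
    "Fourth": (5671, 188),
}


def normalised_frequency(count, total, base=100_000):
    """Frequency of `count` per `base` words of total uncertainty."""
    return count / total * base


def non_sum_to_sum_ratio(total, sum_words):
    """How many words of non-SUM uncertainty there are per word of SUM uncertainty."""
    return (total - sum_words) / sum_words


def ratio_to_first(periods):
    """Share of SUM uncertainty in the first period divided by each later share."""
    first_total, first_sum = periods["First"]
    first_share = first_sum / first_total
    return {
        name: first_share / (sum_words / total)
        for name, (total, sum_words) in periods.items()
        if name != "First"
    }


if __name__ == "__main__":
    for name, (total, sum_words) in PERIODS.items():
        nf1 = normalised_frequency(sum_words, total)
        print(f"{name}: Nf1 = {nf1:.2f}, non-SUM/SUM = {non_sum_to_sum_ratio(total, sum_words):.2f}")
    for name, ratio in ratio_to_first(PERIODS).items():
        print(f"First/{name}: {ratio:.2f}")
```

Small discrepancies in the second decimal place between this sketch and the published figures can arise depending on whether values are rounded or truncated.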

Two possible variables that could be taken into account for explaining these results are the types of articles and a progressively more stable use of the IMRaD structure.

4.4 SUMs in different types of articles

As for the types of articles, as mentioned before, due to the random selection, our corpus appears rather diversified. If we take into consideration such a variable, we can observe interesting differences in the amount of uncertainty communicated by SUMs and by non-SUMs (Table 16). Lectures are indeed the type of article in which the percentage of uncertainty communicated by SUMs is the highest (28.86%); original research papers are, on the other hand, the type of article in which the percentage of uncertainty communicated by non-SUMs (i.e. without resort to self-mention) is the highest (89.28%).

Table 16

Uncertainty communicated by SUMs and non-SUMs according to types of articles

| Type of article [no. of articles] | Total uncertainty (no. of words) | Uncertainty communicated by SUMs (+ their scope) | % | Nf1 (per 100,000) | Uncertainty communicated by non-SUMs (+ their scope) | % | Nf2 (per 100,000) | Nf2/Nf1 (non-SUM to SUM ratio) |
|---|---|---|---|---|---|---|---|---|
| Clinical cases [22] | 6,631 | 1,493 | 22.52 | 22515.45 | 5,138 | 77.48 | 77484.54 | 3.44 |
| Lectures* [5] | 3,974 | 1,147 | 28.86 | 28862.60 | 2,827 | 71.14 | 71137.39 | 2.46 |
| Letters [3] | 371 | 53 | 14.29 | 14285.71 | 318 | 85.71 | 85714.28 | 6.00 |
| Original research papers* [22] | 10,898 | 1,168 | 10.72 | 10717.56 | 9,730 | 89.28 | 89282.43 | 8.33 |
| Reports [5] | 874 | 108 | 12.36 | 12356.97 | 766 | 87.64 | 87643.02 | 7.09 |
| Reviews [23] | 13,932 | 1,889 | 13.56 | 13558.71 | 12,043 | 86.44 | 86441.28 | 6.37 |
| Total | 36,680 | 5,858 | 15.97 | 15970.55 | 30,822 | 84.03 | 84029.44 | 5.26 |

If we compare the percentage of uncertainty communicated by SUMs in the Original research papers with that in Lectures and Clinical cases, we find that it is:

  1. 2.69 times lower than in the Lectures (Lecture SUM to Original research paper SUM ratio);

  2. 2.10 times lower than in the Clinical cases (Clinical case SUM to Original research paper SUM ratio).

Although, as noted above, there are interesting differences in the amount of uncertainty communicated by SUMs and by non-SUMs in different types of articles, the uncertainty communicated by non-SUMs is always greater than that communicated by SUMs. In this case too, the difference between them is statistically significant. Specifically:

Clinical cases: χ² (1, N = 6,631) = 2003.6, p < 0.0001;

Lectures: χ² (1, N = 3,974) = 710.21, p < 0.0001;

Letters: χ² (1, N = 371) = 189.28, p < 0.0001;

Original research papers: χ² (1, N = 10,898) = 6726.72, p < 0.0001;

Reports: χ² (1, N = 874) = 495.38, p < 0.0001;

Reviews: χ² (1, N = 13,932) = 7400.49, p < 0.0001.

4.5 SUMs in IMRaD articles (third and fourth periods)

For the second variable, i.e. the IMRaD vs non-IMRaD structure of the articles, as mentioned above, our corpus is also rather diversified: out of the 80 articles, 21 (11 in the third period and 10 in the fourth one) have an IMRaD structure (or similar), while 59 do not present such structure.

Taking into account such variable, it is possible to observe some differences in the amount of uncertainty communicated by SUMs and non-SUMs (Table 17).

Table 17

Uncertainty communicated by SUMs and non-SUMs in the whole corpus and in IMRaD and non-IMRaD articles

| Structure of the articles | Total uncertainty (no. of words) | Uncertainty communicated by SUMs (+ their scope) | % | Nf1 (per 100,000) | Uncertainty communicated by non-SUMs (+ their scope) | % | Nf2 (per 100,000) | Nf2/Nf1 (non-SUM to SUM ratio) |
|---|---|---|---|---|---|---|---|---|
| 59 non-IMRaD | 25,811 | 4,728 | 18.32 | 18317.77 | 21,083 | 81.68 | 81682.22 | 4.46 |
| 21 IMRaD | 10,869 | 1,130 | 10.39 | 10396.54 | 9,739 | 89.60 | 89603.45 | 8.62 |
| 80 IMRaD + non-IMRaD | 36,680 | 5,858 | 15.97 | 15970.55 | 30,822 | 84.03 | 84029.44 | 5.26 |

Furthermore, the following should be noted:

  1. the percentage of uncertainty communicated by SUMs in the whole corpus is 15.97% of the total uncertainty (i.e. the uncertainty communicated by SUMs and by non-SUMs);

  2. the uncertainty communicated by SUMs in the 59 articles without IMRaD structure is 18.32% of the total uncertainty present in this sub-corpus (i.e. more than two percentage points higher than in the whole corpus);

  3. the uncertainty communicated by SUMs in the 21 IMRaD articles (in the third and fourth periods) is 10.39% of the total uncertainty present in this sub-corpus (i.e. almost six percentage points lower than in the whole corpus).

This means that the share of uncertainty communicated by SUMs in IMRaD articles is 1.76 times lower than in non-IMRaD articles (non-IMRaD SUM to IMRaD SUM ratio).
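The comparison between the two sub-corpora can be checked from the word counts in Table 17. The sketch below is illustrative only; the function names are hypothetical, and the whole-corpus figures are obtained by summing the two sub-corpora.

```python
# (total uncertainty words, words communicated by SUMs), copied from Table 17.
STRUCTURES = {
    "non-IMRaD": (25811, 4728),
    "IMRaD": (10869, 1130),
}


def sum_share_pct(total, sum_words):
    """Percentage of the total uncertainty that is communicated by SUMs."""
    return sum_words / total * 100


# Whole-corpus totals obtained by adding the two sub-corpora.
whole_total = sum(total for total, _ in STRUCTURES.values())
whole_sum = sum(sum_words for _, sum_words in STRUCTURES.values())
whole_share = sum_share_pct(whole_total, whole_sum)


def gap_to_whole_corpus(name):
    """Difference, in percentage points, between a sub-corpus and the whole corpus."""
    total, sum_words = STRUCTURES[name]
    return sum_share_pct(total, sum_words) - whole_share


def share_ratio(numerator, denominator):
    """Ratio between the SUM shares of two sub-corpora."""
    return sum_share_pct(*STRUCTURES[numerator]) / sum_share_pct(*STRUCTURES[denominator])


if __name__ == "__main__":
    print(f"Whole corpus SUM share: {whole_share:.2f}%")
    for name in STRUCTURES:
        print(f"{name}: {gap_to_whole_corpus(name):+.2f} percentage points")
    print(f"non-IMRaD / IMRaD share ratio: {share_ratio('non-IMRaD', 'IMRaD'):.2f}")
```

Expressing the gaps in percentage points keeps them distinct from the relative ratio between the two shares, which answers a different question.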

Nonetheless, the uncertainty communicated by non-SUMs is greater than the uncertainty communicated by SUMs both in non-IMRaD articles and in IMRaD ones, and the difference is statistically significant (non-IMRaD articles: χ² (1, N = 25,811) = 10,363, p < 0.0001; IMRaD articles: χ² (1, N = 10,869) = 6,819, p < 0.0001).

5 Discussion and conclusion

This study is part of a wider research project that explores the communication of certainty and uncertainty in medical articles from a diachronic perspective. The main results of this project include both theoretical and practical contributions. The latter concern the construction and training of an algorithm for the automatic detection of UMs for the English language (Omero et al. 2020). The purpose of the current study is mainly to describe the variations over time in the use of subjectivity UMs in a diachronic corpus of medical articles, and thereby to contribute to the research on diachronic variation in scientific writing, particularly regarding hedging and subjectivity strategies.

Scientific writing is a symbolic and rhetorical practice, historically and socially constructed, which evolves over time (Bazerman 1988).

“There is no ‘faceless’ writing, and all stance choices are important rhetorical decisions that affect how the message is received and the ways readers react to a text” (Hyland and Jiang 2016, 258).

The results of our investigation, which focuses on the markers of subjectivity by which authors communicate uncertainty, are consistent with other diachronic studies on the expression of authors’ subjectivity in medical articles, which show a progressive decrease of self-mentions (e.g. Atkinson 1992, 1996). Several non-diachronic studies have also highlighted this trend, which seems to be related to an increase in the use of the passive voice (e.g. Amdur et al. 2010, Hyland 2001, Rundblad 2007, Segal 1993).

For that matter, as Hyland and Jiang (2016, 265) claim, in the sciences in general, it is common, consolidated practice

for writers to downplay their personal role to highlight the phenomena under study, the replicability of research activities, and the generality of the findings, subordinating their own voice to that of unmediated nature. Such a strategy subtly conveys an empiricist ideology that suggests research outcomes would be the same irrespective of the individual conducting it.

Since, to the best of our knowledge, no studies deal specifically with the hedging functions of markers of subjectivity in diachronic medical corpora, it is difficult to perform specific comparisons.

However, our results seem, on the one hand, to be consistent both with those by Hyland (1998b, 364), according to whom there is a “predominant view of science as an impersonal, inductive enterprise,” and, to some extent, with those by Hyland and Jiang (2016). Although analysing non-medical corpora, the latter highlight a progressive decrease (from 1965 to 2015) in the use of both hedging and authors’ self-mentions, which is, surprisingly, more evident in the soft fields, and particularly in applied linguistics, than in the hard sciences (such as electronic engineering and biology). They explain this trend with the increasing movement in these soft fields toward a more “author-evacuated” prose, which mimics hard science practices and goes together with the current orientation toward more “objective,” empirically grounded and quantitative approaches. Nonetheless, we can mention a recent work by Poole et al. (2019), based on the analysis of a diachronic corpus of articles (1972–2017) pertaining to a non-medical hard field (biochemical research), whose results diverge from those by Hyland and Jiang (2016). This study reveals a consistent decrease in the use of epistemic stance items indexing uncertainty and an increase in boosters, but the authors explain this trend by associating it with the peculiarity of the corpus analysed: not a generic collection of biochemical articles, but a specific selection based on a common topic (chemotaxis). In their view, high levels of confidence and certainty are connected to the specialised corpus, and they conclude that “author presence as reflected in epistemic stance features becomes less overt as a discipline adopts a shared understanding of a phenomenon” (Poole et al. 2019, 9).

On the other hand, our results seem to diverge from those by Millar et al. (2013), who noted fewer passive constructions in a corpus of randomised controlled trials taken from BMJ (2005) than in other medical journals, probably due to the BMJ guidelines (https://www.bmj.com/about-bmj/resources-authors/house-style, last accessed in July 2019), which recommend the use of the active voice and the first person where necessary. This does not seem to be valid for SUMs. Indeed, the main results of our study show that in the BMJ articles that we have analysed, SUMs decrease over time, until they almost disappear.

The overall results of our investigation reveal, in particular, that:

  (a) SUMs are less used than non-SUMs in the whole corpus and in the articles referring to each period;

  (b) SUMs diminish over time;

  (c) among SUMs, the most used are I/we modal verbs followed by I/we epistemic verbs, but they decrease in the third period and almost disappear in the fourth;

  (d) although SUMs in the first singular person are globally more numerous than SUMs in the first plural person, their proportion changes in the third and fourth periods (although in the third period almost all the articles are written by a single author);

  (e) the quantity of uncertainty communicated by SUMs in the IMRaD article subcorpus (referring to the third and fourth periods) is lower than that communicated by SUMs in the non-IMRaD article subcorpus.

Even though it is impossible to identify with certainty all the variables responsible for the observed progressive eclipse of SUMs, it is reasonable to suppose a role for the following ones.

The evolution in medical research and practice. During the timespan under consideration, the observations made by a single doctor who treated a single case (or a small number of cases) have been progressively replaced by laboratory-based experimental studies, epidemiological studies, cluster randomised trials, etc., performed by research teams (sometimes heterogeneous, regarding specific competencies, specialisations, nationality, etc.). At the same time, new and advanced methods and technologies have been developed (regarding research tools, diagnosis, treatments, etc.), and this allowed scholars to reach more and more objectivity in their results and to limit subjectivity.

The evolution in medical scientific writing. Changes in medical research and practice have clearly influenced medical written communication. Top-rated medical journals have gradually given attention to different types of articles. If until the latter half of the nineteenth century a great part of the articles were case reports, presented by the authors (mostly single authors) using a personal, narrative style, things gradually changed. Indeed, it is possible to observe the progressive diminishing of case studies by single authors in favour of articles reporting results of more and more complex and highly specialised studies, conducted by teams of medical researchers. In line with these changes in types of studies and articles, the rhetorical style used in their communication has also changed: non-narrative, but descriptive, non-subjective, but objective, and increasingly more structured.

We should consider that the American National Standards Institute formally defined the IMRaD structure only in 1979. As Skelton (1997) notes, the set of constraints on the presentation of research in medical journals was published for the first time in 1979 by four different medical journals (among them BMJ and The Lancet) under the title “International Steering Committee. Uniform requirements for manuscripts submitted to biomedical journals.” These recommendations, known as the Vancouver Style (after the place where the Committee first effectively came together in 1978), consist of a set of “uniform technical requirements” concerning the ethical procedures to be followed, how a bibliography should be presented, and the elements of the structure. The fourth version of these recommendations, published in the BMJ by the International Committee of Medical Journal Editors (1991), defined the division into sections with the headings Introduction, Methods, Results and Discussion for observational and experimental articles.

Clearly, scientific, epistemological, and social changes affect the styles of writing over time (Atkinson 1992, 1996). In this study, in order to assess the progressive decrease of SUMs, we focussed our attention particularly on two variables: the types of articles and the presence of the IMRaD structure.

Different types of articles resort differently to SUMs in order to communicate the writers’ commitment towards the propositional contents. Specifically, our analysis reveals that, as expected, SUMs account for a larger share of uncertainty in Lectures and Clinical cases than in Original research papers, where data and findings are conveyed in a more objective way, especially in the fourth period. In other words, when communicating their uncertainty, authors use markers of subjectivity above all in Lectures and Clinical cases (types of articles prevalently gathered in the first two periods).

Analogously, the IMRaD structure (present in our corpus only since the third period, 1921–1960) also seems to affect the use of SUMs: IMRaD articles have a lower number of SUMs. SUMs, as well as UMs, occur more frequently in the Introduction and Discussion, as might be expected. These results are consistent with those by Skelton (1997) and Skelton and Edwards (2000), and also with those emerging from a recent study by Keramati et al. (2019), who specifically analysed stance and engagement markers in a diachronic corpus of articles (1996–2016) characterised by the IMRaD structure, extracted from three leading journals in the field of applied linguistics. The authors observe a significant decrease in the use of self-mention in the Method section (which might be connected to a more hard-science orientation of researchers even in the applied linguistics domain), but also its massive rise in the Introduction, interpreting this result as evidence of the development of a promotional and consumer-oriented discourse.

Our previous analyses (Bongelli et al. 2019, Zuczkowski et al. 2016) reveal that uncertainty decreases over time although not in a significant way. Within this diminishing uncertainty, that communicated by self-mentions (i.e. by SUMs) similarly decreases.

Author contributions: Conceptualisation: IR, RB and AZ; methodology: IR and RB; data analysis: IR and RB; writing (original draft preparation): IR and RB; writing (review and editing): IR, RB, AZ. All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

Conflict of interest: Authors state no conflict of interest.

References

Aijmer, Karin. 1980. “Evidence and the declarative sentence.” Acta Universitatis Stockholmiensis. Stockholm Studies in English Stockholm 53, 3–150.

Agarwal, Shashank and Hong Yu. 2010. “Detecting hedge cues and their scope in biomedical text with conditional random fields.” Journal of biomedical informatics 43(6), 953–61.

American National Standards Institute. 1979. American national standard for the preparation of scientific papers for written or oral presentation. New York: The Institute.

Amdur, Robert J., Jessica Kirwan, and Christopher G. Morris. 2010. “Use of the passive voice in medical journal articles.” AMWA Journal: American Medical Writers Association Journal 25(3), 98–104.

Atkinson, Dwight. 1992. “The evolution of medical research writing from 1735 to 1985. The case of the Edinburgh medical journal.” Applied Linguistics 13(4), 337–74.

Atkinson, Dwight. 1996. “The philosophical transactions of the royal society of London, 1675–1975: A sociohistorical discourse analysis.” Language in Society 25(3), 333–71.

Bazerman, Charles. 1988. Shaping written knowledge. Madison: University of Wisconsin Press.

Biber, Douglas. 1988. Variation across speech and writing. Cambridge and New York: Cambridge University Press.

Bongelli, Ramona, Carla Canestrari, Ilaria Riccioni, Andrzej Zuczkowski, Cinzia Buldorini, Ricardo Pietrobon, Alberto Lavelli, and Bernardo Magnini. 2012. “A corpus of scientific biomedical texts spanning over 168 years annotated for uncertainty.” In: Proceedings of the Eighth International conference on language resources and evaluation (LREC'12), 23–25 May 2012, Istanbul 2012, Turkey, vol. 12, eds. Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis. European Language Resources Association (ELRA). 2009–2014. [Online] Available from: http://www.lrec-conf.org/proceedings/lrec2012/index.html [Accessed: 20th June 2012].

Bongelli, Ramona, Ilaria Riccioni, Laura Vincze, and Andrzej Zuczkowski. 2018. “Questions and epistemic stance: Some examples from Italian conversations.” Ampersand 5, 29–44. [Online] Available from: https://www.sciencedirect.com/science/article/pii/S2215039018300444 [Accessed: 5th September 2019].

Bongelli, Ramona, Ilaria Riccioni, Roberto Burro, and Andrzej Zuczkowski. 2019. “Writers’ uncertainty in scientific and popular biomedical articles. A comparative analysis of the British Medical Journal and Discover Magazine.” Plos One 14(9), e0221933. [Online] Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6728051/ [Accessed: 5th September 2019].

Caffi, Claudia. 2007. Mitigation, studies in pragmatics. Amsterdam: Elsevier.

Crompton, Peter. 1997. “Hedging in academic writing: Some theoretical problems.” English for Specific Purposes 16(4), 271–87.

Crompton, Peter. 1998. “Identifying hedges: Definition or divination?” English for Specific Purposes 17(3), 303–11.

Dudley-Evans, Tony. 1994. “Academic text: The importance of the use and comprehension of hedges.” ASp. la revue du GERAS 5–6, 131–9.

Farkas, Richárd, Veronika Vincze, György Móra, János Csirik, and György Szarvas. 2010. “The CoNLL-2010 shared task: learning to detect hedges and their scope in natural language text.” In: Proceedings of the Fourteenth Conference on Computational Natural Language Learning-Shared Task, 15–16 July 2010, Uppsala, Sweden, eds. Richárd Farkas, Veronika Vincze, György Szarvas, György Móra, and János Csirik, p. 1–12. Association for Computational Linguistics.

Fløttum, Kjersti. 2005. “The self and the others: polyphonic visibility in research articles.” International Journal of Applied Linguistics 15(1), 29–44.

Fløttum, Kjersti. 2006. “‘We now report on…’ versus ‘Let us now see how…’: Author roles and interaction with readers in research articles.” In: Academic discourse across disciplines, eds. Ken Hyland and Marina Bondi, vol. 42, p. 203–24. Bern: Peter Lang.

Fløttum, Kjersti. 2012. “Variation of stance and voice across cultures.” In: Stance and voice in written academic genres, eds. Ken Hyland and Carmen Sancho Guinda, p. 218–31. New York: Palgrave Macmillan.

Gao, Xia. 2017. “A cross-disciplinary corpus-based study on English and Chinese native speakers’ use of first-person pronouns in academic English writing.” Text and Talk 38(1), 93–113.

Grabar, Natalia, Pierre Chauveau-Thoumelin, and Loïc Dumonet. 2016. “Medical discourse and subjectivity.” In: Advances in knowledge discovery and management, eds. Fabrice Guillet, Djamel A. Zighed and Gilbert Ritschard, p. 33–54. Springer International Publishing.

Gross, Alan G. et al. 2002. Communicating science. The scientific paper from the 17th century to the present. New York: Oxford University Press.

Holmes, Janet. 1984. “Modifying illocutionary force.” Journal of Pragmatics 8(3), 345–65.

Holmes, Janet. 1988. “Doubt and certainty in ESL textbooks.” Applied linguistics 9(1), 21–44.

Hyland, Ken. 1994. “Hedging in academic writing and EAF textbooks.” English for specific purposes 13(3), 239–56.Search in Google Scholar

Hyland, Ken. 1995. “The author in the text: hedging scientific writing.” Hong Kong Papers in Linguistics and Language Teaching 18, 33–42.

Hyland, Ken. 1998a. Hedging in scientific research articles. Amsterdam/Philadelphia: John Benjamins Publishing.

Hyland, Ken. 1998b. “Boosting, hedging and the negotiation of academic knowledge.” Text-Interdisciplinary Journal for the Study of Discourse 18(3), 349–82.

Hyland, Ken. 2001. “Humble servants of the discipline? Self-mention in research articles.” English for specific purposes 20(3), 207–26.

Hyland, Ken. 2002. “Authority and invisibility: Authorial identity in academic writing.” Journal of pragmatics 34(8), 1091–112.

Hyland, Ken. 2004. Social interactions in academic writing. Ann Arbor: The University of Michigan Press.

Hyland, Ken. 2014. “English for academic purposes.” In: The Routledge companion to English studies, eds. Constant Leung and Brian V. Street, p. 392–404. Abingdon (UK): Routledge.

Hyland, Ken and Feng Jiang. 2016. “Change of attitude? A diachronic study of stance.” Written Communication 33(3), 251–74.

Hyland, Ken and Feng Jiang. 2018. “‘In this paper we suggest’: Changing patterns of disciplinary metadiscourse.” English for Specific Purposes 51, 18–30.

Hyland, Ken and John Milton. 1997. “Qualification and certainty in L1 and L2 students’ writing.” Journal of second language writing 6(2), 183–205.

International Committee of Medical Journal Editors. 1991. “Uniform requirements for manuscripts submitted to biomedical journals.” British Medical Journal 302, 338–41.

Keramati, Shirin Rezaei, Davud Kuhi, and Mahnaz Saeidi. 2019. “Cross-sectional diachronic corpus analysis of stance and engagement markers in three leading journals of applied linguistics.” Journal of Modern Research in English Language Studies 6(2), 1–25.

Khedri, Mohsen. 2016. “Are we visible? An interdisciplinary data-based study of self-mention in research articles.” Poznan Studies in Contemporary Linguistics 52(3), 403–30.

Kim, Jin-dong, Tomoko Ohta, Sampo Pyysalo, Yoshinobu Kano, and Jun’ichi Tsujii. 2009. “Overview of BioNLP’09 shared task on event extraction.” In: Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task. Association for Computational Linguistics, p. 1–9.

Millar, Neil, Brian Budgell, and Keith Fuller. 2013. “‘Use the active voice whenever possible’: The Impact of Style Guidelines in Medical Journals.” Applied Linguistics 34(4), 393–414.

Mur-Dueñas, María Pilar, and Jolanta Šinkūnienė. 2016. “Self-reference in research articles across Europe and Asia: a review of studies.” Brno Studies in English 42(1), 71–92.

Myers, Greg. 1989. “The pragmatics of politeness in scientific articles.” Applied Linguistics 10(1), 1–35.

Nuyts, Jan. 2015. “Subjectivity: Between discourse and conceptualization.” Journal of Pragmatics 86, 106–10.

Omero, Paolo, Massimiliano Valotto, Riccardo Bellana, Ramona Bongelli, Ilaria Riccioni, Andrzej Zuczkowski, and Carlo Tasso. 2020. “Writer’s uncertainty identification in scientific biomedical articles: a tool for automatic if-clause tagging.” Language Resources and Evaluation 54, 1161–1181.

Özgür, Arzucan, and Dragomir R. Radev. 2009. “Detecting speculations and their scopes in scientific text.” In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 6–7 August, Singapore, eds. Philipp Koehn and Rada Mihalcea, vol. 3, p. 1398–407. Association for Computational Linguistics.

Poole, Robert, Andrew Gnann, and Gus Hahn-Powell. 2019. “Epistemic stance and the construction of knowledge in science writing: A diachronic corpus study.” Journal of English for Academic Purposes 42, 100784.

Riccioni, Ilaria, Ramona Bongelli, Gill Philip, and Andrzej Zuczkowski. 2018. “Dubitative questions and epistemic stance.” Lingua 207, 71–95.

Rongen Breivega, Kjersti, Trine Dahl, and Kjersti Fløttum. 2002. “Traces of self and others in research articles. A comparative pilot study of English, French and Norwegian research articles in medicine, economics and linguistics.” International Journal of Applied Linguistics 12(2), 218–39.

Rozumko, Agata. 2017. “Adverbial markers of epistemic modality across disciplinary discourses: A contrastive study of research articles in six academic disciplines.” Studia Anglica Posnaniensia 52(1), 73–101.

Rundblad, Gabriella. 2007. “Impersonal, general, and social. The use of metonymy versus passive voice in medical discourse.” Written Communication 24(3), 250–77.

Salager-Meyer, Françoise. 1994. “Hedges and textual communicative function in medical English written discourse.” English for specific purposes 13(2), 149–70.

Salager-Meyer, Françoise. 1997. “I think that perhaps you should: A study of hedges in written scientific discourse.” In: Functional approaches to written text: classroom applications, ed. Tom Miller, p. 127–43. Washington DC: United States Information Agency.

Salager-Meyer, Françoise. 1999a. “Referential behavior in scientific writing: A diachronic study (1810–1995).” English for specific purposes 18(3), 279–305.

Salager-Meyer, Françoise. 1999b. “Contentiousness in written medical English discourse: A diachronic study (1810–1995).” Text-Interdisciplinary Journal for the Study of Discourse 19(3), 371–98.

Scott, Mike. 2012. WordSmith tools version 6. Stroud: Lexical Analysis Software.

Segal, Judy Z. 1993. “Strategies of influence in medical authorship.” Social Science and Medicine 37(4), 521–30.

Shehzad, Wasima. 2007. “Explicit author in scientific discourse: A corpus-based study of the author’s voice.” Malaysian Journal of ELT Research 3(1), 18.

Skelton, John. 1988. “The care and maintenance of hedges.” ELT journal 42(1), 37–43.

Skelton, John. 1997. “The representation of truth in academic medical writing.” Applied Linguistics 18(2), 121–40.

Skelton, John and Sarah Edwards. 2000. “The function of the discussion section in academic medical writing.” British Medical Journal 320(7244), 1269–70.

Szarvas, György, Veronika Vincze, Richárd Farkas, György Móra, and Iryna Gurevych. 2012. “Cross-genre and cross-domain detection of semantic uncertainty.” Computational Linguistics 38(2), 335–67.

Vincze, Veronika, György Szarvas, Richárd Farkas, György Móra, and János Csirik. 2008. “The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes.” BMC bioinformatics 9(11), S9.

Vold, Eva Thue. 2006. “Epistemic modality markers in research articles: a cross‐linguistic and cross‐disciplinary study.” International Journal of Applied Linguistics 16(1), 61–87.

Walková, Milada. 2019. “A three-dimensional model of personal self-mention in research papers.” English for Specific Purposes 53, 60–73.

Zhou, Huiwei, Degen Huang, Xiaoyan Li, and Yuansheng Yang. 2011. “Combining structured and flat features by a composite kernel to detect hedges scope in biological texts.” Chinese Journal of Electronics 20(3), 476–82.

Zhou, Huiwei, Huijie Deng, Degen Huang, and Minling Zhu. 2015. “Hedge scope detection in biomedical texts: An effective dependency-based method.” PloS One 10(7), e0133715. [Online] Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4517914/ [Accessed: 30 May 2016].

Zou, Bowei, Guodong Zhou, and Qiaoming Zhu. 2013. “Tree kernel-based negation and speculation scope detection with structured syntactic parse features.” In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, p. 968–76.

Zuczkowski, Andrzej, Ramona Bongelli, Ilaria Riccioni, Massimiliano Valotto, and Roberto Burro. 2016. “Writers’ uncertainty in a corpus of scientific biomedical articles with a diachronic perspective.” In: Yearbook of corpus linguistics and pragmatics 2016, ed. Jesús Romero-Trillo, p. 203–41. Cham (CH): Springer International Publishing.

Zuczkowski, Andrzej, Ramona Bongelli, and Ilaria Riccioni. 2017. Epistemic stance in dialogue: knowing, unknowing, believing. Amsterdam/Philadelphia: John Benjamins Publishing.

Zuczkowski, Andrzej, Ramona Bongelli, Ilaria Riccioni, and Gill Philip. 2021. Questions and epistemic stance in contemporary spoken British English. Newcastle upon Tyne (UK): Cambridge Scholars Publishing.

BMJ References

Barrett, John. 1848. “Observations on scurvy as it was developed in Bath and its neighbourhood, in the spring of 1847.” BMJ, 173–7.

Burn, J. H. and M. J. Rand. 1958. “Action of nicotine on the heart.” BMJ, 137–9.

Colebrook, Leonard. 1933. “Puerperal fever: its aetiology and prevention.” BMJ, 723–6.

Lowther, Clifton P., and Richard W. D. Turner. 1963. “Guanethidine in the treatment of hypertension.” BMJ, 776–81.

Oliver, James. 1907. “The determinants of abortion and how to combat them.” BMJ, 1567–70.

Radford, Thomas. 1849. “A successful case of caesarean section, with remarks.” BMJ, 456–60.

Roper, Arthur C. 1908. “Perforated gastric ulcer: operation 44 hours after perforation: recovery.” BMJ, 785–6.

Shapiro, C. M., J. R. Catterall, I. Montgomery, G. M. Raab, and N. J. Douglas. 1986. “Do asthmatics suffer bronchoconstriction during rapid eye movement sleep?” BMJ, 1161–4.

Sherman-Bigg, G. 1882. “A tropical fever.” BMJ, 607.

Spriggs, Edmund. 1928. “The early recognition and treatment of cancer of the stomach.” BMJ, 838–40.

Received: 2021-03-07
Revised: 2021-09-03
Accepted: 2021-09-06
Published Online: 2021-12-03

© 2021 Ilaria Riccioni et al., published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.