Methods Used in Economic Research: An Empirical Study of Trends and Levels

The methods used in economic research are analyzed on a sample of all 3,415 regular research papers published in 10 general interest journals every 5th year from 1997 to 2017. The papers are classified into three main groups by method: theory, experiments, and empirics. The theory and empirics groups are almost equally large. Most empirical papers use the classical method, which derives an operational model from theory and runs regressions. The number of papers published increases by 3.3% p.a. Two trends are highly significant: The fraction of theoretical papers has fallen by 26 pp (percentage points), while the fraction of papers using the classical method has increased by 15 pp. Economic theory predicts that such papers exaggerate, and the papers that have been analyzed by meta-analysis confirm the prediction. It is discussed whether other methods have smaller problems.


Introduction
This paper studies the pattern in the research methods of economics using a sample of 3,415 regular papers published in the years 1997, 2002, 2007, 2012, and 2017 in 10 journals. The analysis builds on the beliefs that truth exists, but is difficult to find, and that all the methods listed in the next paragraph have problems, as discussed in Sections 2 and 4. Hereby I do not imply that all, or even most, papers have these problems, but we rarely know how serious they are when we read a paper. A key aspect of the problem is that a "perfect" study is very demanding and requires far too much space to report, especially if the paper looks for usable results. Thus, each paper is just one look at an aspect of the problem analyzed. Only when many studies using different methods reach a joint finding can we trust that it is true.
Section 3 discusses the sample of journals chosen. The choice has been limited by the following main criteria: They should be good journals below the top ten A-journals, i.e., my article covers B-journals, which are the journals where most research economists publish. They should be general interest journals, and the journals should be so different that it is likely that patterns that generalize across these journals apply to more (most?) journals. The Appendix gives some crude counts of researchers, departments, and journals. It assesses that there are about 150 B-level journals, but less than half meet the criteria, so I have selected about 15% of the possible ones. This is the most problematic element in the study. If the reader accepts my choice, the paper tells an interesting story about economic research.
All B-level journals try hard to have a serious refereeing process. If our selection is representative, the 150 journals have increased the annual number of papers published from about 7,500 in 1997 to about 14,000 papers in 2017, giving about 200,000 papers for the period. Thus, the B-level dominates our science. Our sample is about 6% for the years covered, but less than 2% of all papers published in B-journals in the period. However, it is a larger fraction of the papers in general interest journals.
It is impossible for anyone to read more than a small fraction of this flood of papers. Consequently, researchers compete for space in journals and for attention from the readers, as measured in the form of citations. It should be uncontroversial that papers that hold a clear message are easier to publish and get more citations. Thus, an element of sales promotion may enter papers in the form of exaggeration, which is a joint problem for all eight methods. This is in accordance with economic theory, which predicts that rational researchers report exaggerated results; see Paldam (2016, 2018). For empirical papers, meta-methods exist to summarize the results from many papers, notably papers using regressions. Section 4.4 reports that meta-studies find that exaggeration is common.
The empirical literature surveying the use of research methods is quite small, as I have found only two articles: Hamermesh (2013) covers 748 articles in three A-journals from six years spaced a decade apart, using a slightly different classification of methods,¹ while my study covers B-journals. Angrist, Azoulay, Ellison, Hill, and Lu (2017) use a machine-learning classification of 134,000 papers in 80 journals to look at the three main methods. My study subdivides the three categories into eight. The machine-learning algorithm is only sketched, so the paper is difficult to replicate, but it is surely a major effort. A key result in both articles is the strong decrease of theory in economic publications. This finding is confirmed, and it is shown that the corresponding increase in empirical articles is concentrated on the classical method.
I have tried to explain what I have done, so that everything is easy to replicate, in full or for one journal or one year. The coding of each article is available at least for the next five years. I should add that I have been in economic research for half a century. Some of the assessments in the paper will reflect my observations/experience during this period (indicated as my assessments). This especially applies to the judgements expressed in Section 4.

The eight categories
Table 1 reports that the annual number of papers in the ten journals has increased 1.9 times, or by 3.3% per year. The Appendix gives the full counts per category, journal, and year. By looking at data over two decades, I study how economic research develops. The increase in the production of papers is caused by two factors: the increase in the number of researchers, and the increasing importance of publications for the careers of researchers.
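As a check on the growth arithmetic, a factor of 1.9 over the 20 years from 1997 to 2017 implies an annual growth rate g satisfying (1 + g)^20 = 1.9; a minimal sketch:

```python
# If the annual number of papers grew 1.9 times over the 20 years
# from 1997 to 2017, the implied annual growth rate g solves
# (1 + g)**20 = 1.9.
growth_factor = 1.9
years = 20
g = growth_factor ** (1 / years) - 1
print(f"annual growth: {g:.1%}")  # annual growth: 3.3%
```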

(M1.1) Economic theory
These are papers where the main content is the development of a theoretical model. The ideal theory paper presents a (simple) new model that recasts the way we look at something important. Such papers are rare and obtain large numbers of citations. Most theoretical papers present variants of known models and obtain few citations.
In a few papers, the analysis is verbal, but more than 95% rely on mathematics, though the technical level differs. Theory papers may start with a descriptive introduction giving the stylized fact the model explains, but the bulk of the paper is the formal analysis, building a model and deriving proofs of some propositions from the model. How the model works is often demonstrated by a set of simulations, including a calibration made to look realistic. However, calibrations differ greatly in the efforts made to reach realism. Often, the simulations are in lieu of an analytical solution, or just an illustration suggesting the magnitudes of the results reached.
Theoretical papers suffer from the problem known as T-hacking,² where an able author can, by a careful selection of assumptions, tailor the theory to give the results desired. Thus, the proofs made from the model may represent the ability and preferences of the researcher rather than the properties of the economy.

(M1.2) Statistical method
Papers reporting new estimators and tests are published in a handful of specialized journals in econometrics and mathematical statistics; such journals are not included. In our general interest journals, some papers compare estimators on actual data sets. If the demonstration of a methodological improvement is the main feature of the paper, it belongs to (M1.2), but if the economic interpretation is the main point of the paper, it belongs to (M3.2) or (M3.3).³ Some papers, including a special issue of Empirical Economics (vol. 53-1), deal with forecasting models. Such models normally have a weak relation to economic theory. They are sometimes justified precisely because of their eclectic nature. They are classified as either (M1.2) or (M3.1), depending upon the focus. It appears that different methods work better on different data sets, and perhaps a trade-off exists between the user-friendliness of the model and the improvement reached.

(M1.3) Surveys
When the literature in a certain field becomes substantial, it normally presents a motley picture with amazing variation, especially when different schools exist in the field. Thus, a survey is needed, and our sample contains 68 survey articles. They are of two types, where the second type is still rare. In the first type, the author reads the papers and assesses what the most reliable results are. Such assessments require judgement that is often quite difficult to distinguish from priors, even for the author of the survey.

(M1.3.2) Meta-studies
They are quantitative surveys of estimates of parameters claimed to be the same. Over the two decades from 1997 to 2017, about 500 meta-studies have been made in economics. Our sample includes five, which is 0.15%.⁴ Meta-analysis has two levels: The basic level collects and codes the estimates and studies their distribution. This is a rather objective exercise, where results seem to replicate rather well.⁵ The second level analyzes the variation between the results. This is less objective. The papers analyzed by meta-studies are empirical studies using method (M3.2), though a few use estimates from (M3.1) and (M3.3).

(M2) Experimental methods: subgroups (M2.1) and (M2.2)
Experiments are of three distinct types, where the last two are rare and take place in real life, so they are lumped together as (M2.2). (M2.1) Most experiments take place in a laboratory, where the subjects communicate with a computer, giving a controlled, but artificial, environment.⁶ A number of subjects are told a (more or less abstract) story and paid to react in one of a number of possible ways. A great deal of ingenuity has gone into the construction of such experiments and into the methods used to analyze the results. Lab experiments do allow studies of behavior that are hard to analyze in any other way, and they frequently show sides of human behavior that are difficult to rationalize by economic theory. It appears that such a demonstration is a strong argument for the publication of a study.
However, everything is artificial, even the payment. In some cases, the stories told are so elaborate and abstract that framing must be a substantial risk;⁷ see Levitt and List (2007) for a lucid summary, and Bergh and Wichardt (2018) for a striking example. In addition, experiments cost money, which limits the number of subjects. It is also worth pointing to the difference between expressive and real behavior. It is typically much cheaper for the subject to "express" nice behavior in a lab than to be nice in the real world.
(M2.2) Event studies are studies of real world experiments. They are of two types: (M2.2.1) Field experiments analyze cases where some people get a certain treatment and others do not. The "gold standard" for such experiments is double-blind random sampling, where everything (but the result!) is preannounced; see Christensen and Miguel (2018). Experiments with humans require permission from the relevant authorities, and the experiment takes time too. In the process, things may happen that compromise the strict rules of the standard.⁸ Controlled experiments are expensive, as they require a team of researchers. Our sample of papers contains no study that fulfills the gold standard requirements, but there are a few less stringent studies of real life experiments.
(M2.2.2) Natural experiments take advantage of a discontinuity in the environment, i.e., the period before and after an (unpredicted) change of a law, an earthquake, etc. Methods have been developed to find the effect of the discontinuity. Often, such studies look like (M3.2) classical studies with many controls that may or may not belong. Thus, the problems discussed under (M3.2) also apply.

(M3) Empirical methods: subgroups (M3.1) to (M3.3)
The remaining methods are studies making inference from "real" data, which are data samples where the researcher chooses the sample, but has no control over the data generating process. (Note: Section 3.4 tests if the pattern observed in Table 3 is statistically significant. The Appendix reports the full data.)

6 Some experiments are more informal: they use classrooms or, in rare cases, phone interviews. I have even seen a couple of studies where it was unclear how the experiment was done.
7 If the issue has low saliency for the subjects, the answers in polls, and presumably in experiments, depend upon the formulation of the story told. The word "framing" covers the deliberate use of this formulation-dependency to reach results desired by the researcher.
8 Justman (2018) studies the well-known STAR experiment in education and shows how such problems arise. New medical drugs have to go through a number of independent trials, and a meta-study of these trials. Big efforts are often made to reach the gold standard, but still the meta-study regularly shows biases.

(M3.1) Descriptive studies are inductive. The researcher describes the data, aiming at finding structures that tell a story, which can be interpreted. The findings may call for a formal test. If one clean test follows from the description,⁹ the paper is classified under (M3.1). If a more elaborate regression analysis is used, it is classified as (M3.2). Descriptive studies often contain a great deal of theory.
Some descriptive studies present a new data set developed by the author to analyze a debated issue.In these cases, it is often possible to make a clean test, so to the extent that biases sneak in, they are hidden in the details of the assessments made when the data are compiled.
(M3.2) Classical empirics has three steps: It starts from a theory, which is developed into an operational model. Then it presents the data set, and finally it runs regressions.
The significance levels of the t-ratios on the coefficients estimated assume that the regression is the first meeting of the estimation model and the data. We all know that this is rarely the case; see also point (m1) in Section 4.4. In practice, the classical method is often just a presentation technique. The great virtue of the method is that it can be applied to real problems outside academia. The relevance comes with a price: The method is quite flexible, as many choices have to be made, and they often give different results. Preferences and interests, as discussed in Sections 4.3 and 4.4 below, notably as point (m2), may affect these choices.
(M3.3) Newer empirics. Partly as a reaction to the problems of (M3.2), the last 3-4 decades have seen a whole set of newer empirical techniques.¹⁰ They include different types of VARs, Bayesian techniques, causality/co-integration tests, Kalman filters, hazard functions, etc. I have found 162 (or 4.7%) papers where these techniques are the main ones used. The fraction was highest in 1997. Since then it has varied, but with no trend.
I think that the main reason for the lack of success of the newer empirics is that it is quite bulky to report a careful set of co-integration tests or VARs, and they often show results that are far from useful, in the sense that they are unclear and difficult to interpret. With some introduction and discussion, there is not much space left in the article. Therefore, we are dealing with a cookbook that makes for rather dull dishes, which are difficult to sell in the market.
Note the contrast between (M3.2) and (M3.3): (M3.2) makes it possible to write papers that are too good, while (M3.3) often makes them too dull. This helps explain why (M3.2) is getting (even) more popular and why (M3.3) lacks success, but then, it is arguable that it is more dangerous to act on exaggerated results than on results that are weak.

The 10 journals
The 10 journals chosen are listed in Table 4. Section 3.1 discusses the choice of journals, while Section 3.2 considers how journals deal with the pressure for publication. Section 3.3 shows the marked difference in the publication profiles of the journals, and Section 3.4 tests if the trends in methods are significant.

The selection of journals
As mentioned in the introduction, I assess that there are about 150 B-level journals, which are good journals below the top ten, and thus the journals where most research economists publish. I have picked 10 of these. The choice has been somewhat reduced by three criteria: (i) They should be good journals below the top ten, i.e., B-journals. (ii) They should be general interest journals, judged from the content of the journal.¹¹ (iii) They should be sufficiently different so that it is likely that patterns, which apply to these journals, tell a believable story about economic research. Note that (i) and (iii) require some compromises, as is evident in the choice of (J2), (J6), (J7), and (J8) (Table 4).
Methodological journals are excluded, as they are not interesting to outsiders. However, new methods are developed to be used in general interest journals. From studies of citations, we know that useful methodological papers are highly cited. If they remain unused, we presume that it is because they are useless, though, of course, there may be a long lag.
The choice of journals may contain some subjectivity, but I think that they are sufficiently diverse so that patterns that generalize across these journals will also generalize across a broader range of good journals.
The papers included are the regular research articles.Consequently, I exclude short notes to other papers and book reviews,¹² except for a few article-long discussions of controversial books.

Creating space in journals
As mentioned in the introduction, the annual production of research papers in economics has now reached about 1,000 papers in top journals, and about 14,000 papers in the group of good journals.¹³ The production has grown by 3.3% per year, and thus it has doubled over the last twenty years. The hard-working researcher will read less than 100 papers a year. I know of no signs that this number is increasing. Thus, the upward trend in publication must be due to the large increase in the importance of publications for the careers of researchers, which has greatly increased the production of papers. There has also been a large increase in the number of researchers, but as citations are increasingly skewed toward the top journals (see Heckman & Moktan, 2018), this has not increased the demand for papers correspondingly. The pressures from the supply side have caused journals to look for ways to create space.
Book reviews have dropped to less than 1/3. Perhaps this also indicates that economists read fewer books than they used to. Journals have increasingly come to use smaller fonts and larger pages, allowing more words per page. The journals from North-Holland Elsevier have managed to cram almost two old pages into one new one.¹⁴ This makes it easier to publish papers, while they become harder to read. Many journals have changed their numbering system for the annual issues, making it less transparent how much they publish. Only three, the Canadian Economic Journal, Kyklos, and the Scandinavian Journal of Economics, have kept the schedule of publishing one volume of four issues per year. It gives about 40 papers per year. Public Choice has a (fairly) consistent system of four volumes of two double issues per year; this gives about 100 papers. The remaining journals have changed their numbering system and increased the number of papers published per year, often dramatically.

11 This means that open journals from small countries, such as the Canadian Journal of Economics, Kyklos, and the Scandinavian Journal of Economics, are included, while the good regional journals from the USA are excluded, and so are the main German and French journals.
12 Thus, from Vol. 41.3/5 of the European Economic Association, the three survey papers have been included, while the 53 short proceedings papers have not. In addition, I have excluded special issues on the history of a journal or on important researchers.
13 Card and DellaVigna (2013) show that submissions to top journals have doubled, while the number of articles has stayed fairly constant, so rejection rates have doubled. Thus, the pressure on the B-level journals has increased. Some of the journals analyzed have published the number of submissions and the rejection rate; both seem to have increased substantially. In addition, many more modest journals exist, and they seem to proliferate, notably because it is increasingly easy to publish on the net.
Thus, I assess that the wave of publications is caused by the increased supply of papers and not by the demand for reading material. Consequently, the study confirms and updates the observation by Temple (1918, p. 242): "… as the world gets older the more people are inclined to write but the less they are inclined to read."

How different are the journals?
The appendix reports the counts of the research methods for each year and journal. From these counts, a set of χ2-scores is calculated for the three main groups of methods; they are reported in Table 5. It gives the χ2-test comparing the profile of each journal to that of the other nine journals, taken as the theoretical distribution.
The test rejects that the distribution is the same as the average for any of the journals. The closest to the average are the EJPE and Public Choice. The two most deviating scores are for the most micro-oriented journal JEBO, which brings many experimental papers, and of course Empirical Economics, which brings many empirical papers.
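The comparison behind the χ2-scores can be sketched as follows; the counts below are hypothetical (the real ones are in the Appendix), and the three groups stand for theory, experiments, and empirics:

```python
# Chi-square comparison of one journal's method profile against the pooled
# profile of the other nine journals (all counts here are hypothetical).
journal = [30, 25, 45]        # one journal: theory, experiments, empirics
others = [1200, 300, 1500]    # pooled counts for the other nine journals

n = sum(journal)
total = sum(others)
expected = [n * c / total for c in others]   # the "theoretical" distribution
chi2 = sum((o - e) ** 2 / e for o, e in zip(journal, expected))
# With 3 - 1 = 2 degrees of freedom, the 5% critical value is about 5.99;
# here chi2 = 25.5, so equality of the profiles would be rejected.
```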

Trends in the use of the methods
Table 3 already gave an impression of the main trends in the methods preferred by economists. I now test if these impressions are statistically significant. The tests have to be tailored to disregard three differences between the journals: their methodological profiles, the number of papers they publish, and the trend in that number. Table 6 reports a set of distribution-free tests, which overcome these differences. The tests are done on the shares of each research method for each journal. As the data cover five years, this gives 10 pairs of years to compare.¹⁵ The three trend-scores in the []-brackets count how often the shares go up, down, or stay the same in the 10 cases. This is the count done for a Kendall rank correlation.

14 During the work on this article, I have had the opportunity to discuss with four editors, and in the period covered, I have been a coeditor myself. Thus, I have seen these processes at work.
The first set of trend-scores, for (M1.1) and (J1), is [1, 9, 0]. It means that 1 of the 10 share-pairs increases, while nine decrease and no ties are found. The two-sided binomial test gives 2%, so this is unlikely to happen by chance. Nine of the ten journals in the (M1.1)-column have a majority of falling shares. The important point is that the counts in one column can be added, as is done in the all-row; this gives a powerful trend test that disregards differences between journals and the number of papers published (Table A1). Four of the trend-tests are significant: the fall in theoretical papers, the rise in classical papers, and the rises in the shares of the statistical method and event studies. Notably, there is no trend in the number of experimental studies, but see Table A2 (in the Appendix).
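The binomial arithmetic behind a trend-score such as [1, 9, 0] can be sketched directly (a minimal illustration, not the paper's own code):

```python
from math import comb

# Two-sided binomial test for the trend-score [1, 9, 0]: of 10 share-pairs,
# 1 increases and 9 decrease, with no ties. Under the null of no trend,
# each pair is equally likely to go up or down (p = 1/2).
n, k = 10, 1
p_two_sided = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
print(f"p = {p_two_sided:.3f}")  # p = 0.021, i.e., about 2%
```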

An attempt to interpret the pattern found
The development in the methods pursued by researchers in economics is a reaction to the demand and supply forces on the market for economic papers.As already argued, it seems that a key factor is the increasing production of papers.
The shares add to 100, so the decline of one method means that the others rise. Section 4.1 looks at the biggest change, the reduction in theory papers. Section 4.2 discusses the rise of two new categories. Section 4.3 considers the large increase in the classical method, while Section 4.4 looks at what we know about that method from meta-analysis.

The decline of theory: economics suffers from theory fatigue¹⁶
The papers in economic theory have dropped from 59.5 to 33.6%; this is the largest change for any of the eight subgroups.¹⁷ It is highly significant in the trend test. I attribute this drop to theory fatigue. As mentioned in Section 2.1, the ideal theory paper presents a (simple) new model that recasts the way we look at something important. However, most theory papers are less exciting: They start from the standard model and argue that a well-known conclusion reached from the model hinges upon a debatable assumption; if it changes, so does the conclusion. Such papers are useful. From a literature on one main model, the profession learns its strengths and weaknesses. It appears that no generally accepted method exists to summarize this knowledge in a systematic way, though many thoughtful summaries have appeared. I think that there is a deeper problem explaining theory fatigue. It is that many theoretical papers are quite unconvincing. Granted that the calculations are done right, believability hinges on the realism of the assumptions at the start and of the results presented at the end.
In order for a model to convince, it should (at least) demonstrate the realism of either the assumptions or the outcome.¹⁸If both ends appear to hang in the air, it becomes a game giving little new knowledge about the world, however skillfully played.
The theory fatigue has caused a demand for simulations demonstrating that the models can mimic something in the world. Kydland and Prescott pioneered calibration methods (see Kydland & Prescott, 1991). Calibrations may be carefully done, but they often appear as a numerical solution of a model that is too complex to allow an analytical solution.

Two examples of waves: one that is still rising and another that is fizzling out
When a new method of gaining insights in the economy first appears, it is surrounded by doubts, but it also promises a high marginal productivity of knowledge.
Gradually the doubts subside, and many researchers enter the field. After some time, this will cause the marginal productivity of the method to fall, and it becomes less interesting. The eight methods include two newer ones: lab experiments and newer stats.¹⁹ It is not surprising that papers with lab experiments are increasing, though it did take a long time: The seminal paper presenting the technique was Smith (1962), but only a handful of papers are from the 1960s. Charles Plott organized the first experimental lab 10 years later; this created a new standard for experiments, but required an investment in a lab and some staff. Labs became more common in the 1990s as PCs got cheaper and software was developed to handle experiments, but only 1.9% of the papers in the 10 journals reported lab experiments in 1997. This has now increased to 9.7%, so the wave is still rising. The trend in experiments is concentrated in a few journals, so the trend test in Table 6 is insignificant, but it is significant in the Appendix Table A2, where it is done on the sum of articles irrespective of the journal.
In addition to the rising share of lab experiment papers in some journals, the journal Experimental Economics was started in 1998, where it published 281 pages in three issues.In 2017, it had reached 1,006 pages in four issues,²⁰ which is an annual increase of 6.5%.
Compared with the success of experimental economics, the motley category of newer empirics has had a more modest success, as the fractions of papers in the 5 years are 5.8, 5.2, 3.5, 5.4, and 4.2%, which has no trend. Newer stats also require investment, but mainly in human capital.²¹ Some of the papers using the classical methodology contain a table with Dickey-Fuller tests or some eigenvalues of the data matrix, but they are normally peripheral to the analysis. A couple of papers use Kalman filters, and a dozen papers use Bayesian VARs. However, it is clear that the newer empirics have made little headway into our sample of general interest journals.

The steady rise of the classical method: flexibility rewarded
The typical classical paper provides estimates of a key effect that decision-makers outside academia want to know. This makes the paper policy relevant right from the start, and in many cases, it is possible to write a one-page executive summary for the said decision-makers. The three-step convention (see Section 2.3) is often followed rather loosely. The estimation model is nearly always much simpler than the theory. Thus, while the model can be derived from the theory, the reverse does not apply. Sometimes, the model seems to follow straight from common sense, and if the link from the theory to the model is thin, it begs the question: Is the theory really necessary? In such cases, it is hard to be convinced that the tests "confirm" the theory, but then, of course, tests only say that the data do not reject the theory.

18 The methodological point goes back to the large discussion generated by Friedman (1953).
19 The key inventor of each method was awarded a Nobel Prize: Vernon Smith (in 2002) for experimental economics and Clive Granger (in 2003) for various newer methods. The seminal papers were written in the 1960s and 1970s.
The classical method is often only a presentation device. Think of a researcher who has reached a nice publishable result through a long and tortuous path, including some failed attempts to find such results. It is not possible to describe that path within the severely limited space of an article. In addition, such a presentation would be rather dull to read, and none of us likes to talk about wasted efforts that in hindsight seem a bit silly.
Here, the classical method becomes a convenient presentation device.
The biggest source of variation in the results is the choice of control/modifier variables.All datasets presumably contain some general and some special information, where the latter depends on the circumstances prevailing when the data were compiled.The regression should be controlled for these circumstances in order to reach the general result.Such ceteris paribus controls are not part of the theory, so many possible controls may be added.The ones chosen for publication often appear to be the ones delivering the "right" results by the priors of the researcher.The justification for their inclusion is often thin, and if two-stage regressions are used, the first stage instruments often have an even thinner justification.
Thus, the classical method is rather malleable to the preferences and interests of researchers and sponsors. This means that some papers using the classical technique are not what they pretend to be, as already pointed out by Leamer (1983); see also Paldam (2018) for new references and theory. The fact that data mining is tempting suggests that it is often possible to reach smashing results, making the paper nice to read. This may be precisely why it is cited.
Many papers using the classical method throw in some bits of exotic statistics technique to demonstrate the robustness of the result and the ability of the researcher.This presumably helps to generate credibility.

Knowledge about classical papers reached from meta-studies
A meta-study in economics analyzes a set of K (primary) papers with estimates that claim to be of the same effect.²² The estimates are normally from studies using the classical method. At least 20,000 papers have been coded and analyzed.²³ The meta-methods have now bloomed into meta-meta-studies covering thousands of papers; see Ioannidis, Stanley, and Doucouliagos (2017), covering 159 meta-studies of 6,700 primary papers, and Doucouliagos, Paldam, and Stanley (2018), covering 101 meta-studies of 4,300 primary papers. Three general results stand out: (m1) The range of the estimates is typically amazingly large, given the high t-ratios reported. This confirms that t-ratios are problematic, as claimed in Section 2.3. (m2) Publication biases (exaggerations) are common, i.e., meta-analyses routinely reject the null hypothesis of no publication bias. My own crude rule of thumb is that exaggeration is by a factor of two; the two meta-meta-studies cited give some support to this rule.²⁴ (m3) The meta-average estimated from all K studies normally converges, and for K > 30, the meta-average normally stabilizes to a well-defined value; see Doucouliagos et al. (2018).
Individual studies using the classical method often look better than they are, and thus they are more uncertain than they appear, but we may think of the value of convergence for large Ks as the truth. The exaggeration is largest in the beginning of a new literature, but gradually it becomes smaller. Thus, the classical method does generate truth when the effect searched for has been studied from many sides. The word research does mean that the search has to be repeated! It is highly risky to trust a few papers only.

23 Table 2 shows that 28.5% of the papers use the classical method. 200,000 papers have been published since 1997. Assume that 75% are in general interest journals. This gives 0.285 × 0.75 × 200,000 ≈ 40,000 classical papers. Most papers covered by meta-analysis are from this group. Thus, a crude estimate is that the share of papers coded is 100 × 7,000/40,000 ≈ 17%. The papers covered are from a wide range of journals (or working papers), and some are from before 1997, but still I assess that 10% of all papers that might have been subjected to meta-analysis actually have been. This is a substantial sample, but we do not know how representative it is.

24 The argument assumes that theory predicts that the effect studied is positive, such as one. Exaggeration therefore means that the published result is significantly larger than one, such as two. The (rare) case where theory predicts that the effect is zero is not discussed at present.
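The crude coverage arithmetic can be checked directly; all figures below are the ones quoted in the text:

```python
# Crude estimate of the share of classical papers coded by meta-analyses,
# using only the figures quoted in the text.
classical_share = 0.285    # share of papers using the classical method
general_share = 0.75       # assumed share in general interest journals
papers_total = 200_000     # papers published in B-journals since 1997

classical_papers = classical_share * general_share * papers_total
coverage = 100 * 7_000 / 40_000   # percent coded, using the rounded 40,000

print(round(classical_papers), coverage)  # 42750 17.5 (≈ 40,000 papers, ≈ 17%)
```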

Methods Used in Economic Research  37
Meta-analysis has found other results as well: Results in top journals do not stand out. It is necessary to look at many journals, as many papers on the same effect are needed. Little of the large variation between results is due to the choice of estimator.
A similar development should also occur in experimental economics. Experiments fall into families: A large number cover prisoner's dilemma games, but there are also many studies of dictator games, auction games, etc.
Surveys summarizing what we have learned about these games seem highly needed. Assessed summaries of old experiments are common, notably in introductions to papers reporting new ones. It should be possible to extract the knowledge reached by sets of related lab experiments in a quantitative way, by some sort of meta-technique, but this has barely started. The first pioneering meta-studies of lab experiments do find the usual wide variation of results from seemingly closely related experiments.²⁵ A recent large-scale replicability study by Camerer et al. (2018) finds that published experiments in the high-quality journals Nature and Science exaggerate by a factor of two, just like regression studies using the classical method.

Conclusion
The study presents evidence that over the last 20 years economic research has moved away from theory towards empirical work using the classical method.
From the eighties onward, there has been a steady stream of papers pointing out that the classical method suffers from excess flexibility. It does deliver relevant results, but they tend to be too good.²⁶ While we increasingly know the size of the problems of the classical method, systematic knowledge about the problems of the other methods is weaker. It is possible that their problems are smaller, but we do not know.
Therefore, it is clear that obtaining solid knowledge about the size of an important effect requires a great many papers analyzing different aspects of the effect, followed by a careful quantitative survey. It is a well-known principle in the harder sciences that results need repeated independent replication to be truly trustworthy. In economics, this is so far accepted only in principle.
The classical method of empirical research is gradually winning, and this is a fine development: It does give answers to important policy questions.These answers are highly variable and often exaggerated, but through the efforts of many competing researchers, solid knowledge will gradually emerge.
Appendix: Two tables and some assessments of the size of the profession

The text needs some numbers to assess the representativity of the results reached. These numbers just need to be orders of magnitude. I use the standard three-level classification in A, B, and C of researchers, departments, and journals. The connections between the three categories are dynamic and rely on complex sorting mechanisms. In an international setting, it matters that researchers have preferences for countries, notably their own. The relation between the three categories has a stochastic element.
The World of Learning organization reports on 36,000 universities, colleges, and other institutes of tertiary education and research. Many of these institutions are mainly engaged in undergraduate teaching, and some are quite modest. If half of these institutions have a program in economics with a staff of at least five, the total stock of academic economists is 100,000, of which most are at the C-level.
The A-level of about 500 tenured researchers working at the top ten universities (mainly) publishes in the top ten journals, which bring fewer than 1,000 papers per year;²⁷ see Heckman and Moktan (2020). They (mainly) cite each other, but they greatly influence other researchers.²⁸ The B-level consists of about 15,000-20,000 researchers who work at 400-500 research universities with graduate programs and ambitions to publish. They (mainly) publish in the next level of about 150 journals.²⁹ In addition, there are at least another 1,000 institutions that strive to move up in the hierarchy.
27 Roughly, they publish half of their papers in the top-ten journals, and this fills half of these journals.

28 They are surrounded by a large, ever-changing "rim" of PhD students, non-tenured staff, visitors, and research assistants, which gives a large scientific production and builds links to other institutions, notably at the B-level.

29 As the promotion systems at universities in many countries use journal classifications, many lists of A, B, and C journals exist, where the A-list has 5-10 journals. The lists are not the same, but they overlap. I hope that the number 150 corresponds to the number of general interest B-journals that the country of the reader recognizes.

Note: The trend-scores are calculated as in Table 6. Compared to the results in Table 6, the results are similar, but the power is less than before. However, note that the results in Column (M2.1) dealing with experiments are stronger in Table A2. This has to do with the way missing observations are treated in the test.

 22
Stanley and Doucouliagos (2012) is a fine textbook of meta-analytical method in economics, whilePaldam (2015) is a brief intro.23 Table Table 2 lists the groups and main numbers discussed in the rest of the paper.Section 2.1 discusses (M1) theory.Section 2.2 covers (M2) experimental methods, while Section 2.3 looks at (M3) empirical methods using statistical inference from data.

Table 1: The 3,415 papers

Card and DellaVigna (2013) analyze the pattern of author ages and coauthors; they further study the sociology of publications in the top journals.

2 The concept of T-hacking is used for the tailoring of theory to fit the priors or interests of the author. T-hacking is closely related to data mining and overfitting, discussed in Sections 4.3 and 4.4 on the classical method.

Table 2: The 3,415 papers, fractions in percent

Table 3: The change of the fractions from 1997 to 2017 in percentage points

(i) They should be general interest journals; methodological journals are excluded. By general interest, I mean that they bring papers where an executive summary may interest policymakers and people in general. (ii) They should be journals in English (the Canadian Journal includes one paper in French), which are open to researchers from all countries, so that the majority of the authors are from outside the country.

9 By a clean test, I mean a test that contains no control variables that are not part of the model.

10 Thus, we distinguish between regression techniques (including extensions) and techniques that claim to be better. Some of the newer techniques are from the 1970s and thus not really new. Sometimes these techniques can be presented as an extra column in a table or a paragraph, but they tend to dominate the paper.

Table 4: The 10 journals covered

Note: Growth is the average annual growth from 1997 to 2017 in the number of papers published.

Table 5: The methodological profile of the journals: χ²-scores for main groups

The χ²-scores are calculated relative to all other journals. The sign (+) or (−) indicates whether the journal has relatively too many or too few papers in the category. The P-values for the χ²(3)-test always reject that the journal has the same methodological profile as the other nine journals.

Table 6: Trend-scores and tests for the eight subgroups of methods across the 10 journals

The three trend-scores in each [I1, I2, I3]-bracket are a Kendall-count over all 10 combinations of years. I1 counts how often the share goes up, I2 counts how often the share goes down, and I3 counts the number of ties. Most ties occur when there are no observations in either year. Thus, I1 + I2 + I3 = 10. The tests are two-sided binomial tests disregarding the zeroes. The test results in bold are significant at the 5% level.
A contributing factor may be the shift in relative costs, making it relatively cheaper to do empirical papers.¹⁷

17 Hamermesh (2013) and Angrist et al. (2017) found the same fall in the share of theory papers, using different samples.

Table A2: Counts, shares, and changes for all ten journals for subgroups