The rules and regulations for attaining PhD degrees vary substantially across different countries and topics. For instance, differences in funding, tasks and duties, expectations towards publishing, and the overall duration of the PhD studies (and whether it is really “studying” or more of an employment) are commonplace.
In this blog post, I will briefly summarise how PhD studies in Software Engineering (SE) are organised in Sweden, followed by a summary of work we (Robert Feldt & myself) did at the end of 2018 to obtain an overview of publication statistics for Swedish SE PhD students. This work was inspired by an earlier study done some time before that (with similar findings). You can find a summary of that work here.
One of the main reasons for writing this blog post (and for doing the work summarised within) is that, oftentimes, the academic model used in the United States is assumed to be the norm, despite huge differences from other countries. This post will clarify some of the differences in Sweden (where the situation is, in turn, similar to that in other Nordic countries).
SE PhD in Sweden
Obtaining a PhD in SE in Sweden requires three years of full-time research work. This period is extended by another one to two years to obtain mandatory course credits and to do teaching duties, resulting in four to five years of PhD studies. In addition, it is common to publish and defend a so-called “Licentiate” thesis after approximately half the time, sort of a dry run for the final PhD degree. Positions are fully funded, with a salary that is somewhat lower than in industry. However, it is an actual salary, not a stipend or an “eat cheap pasta or stay hungry for five years” compensation. PhD salaries are standardised across Sweden. Differences might exist in the number of course credits that need to be obtained, and in teaching duties. For instance, many PhD positions are funded by industry, which often requires a certain amount of time to be spent at the industry partner in exchange for teaching duties.
As an example, my PhD studies (starting 2013) took five years, including one year reserved for teaching and one year for obtaining course credits. I was typically a TA in two courses per year (responsibilities ranging from more or less only grading to handling the entire course almost by myself, depending on my maturity and the responsible instructor). Regarding courses, I had to obtain 60 course credits. Included in those were a number of mandatory courses, such as in teaching and pedagogy.
In the Swedish academic system, PhD students are more often than not treated as employees rather than students, having substantial responsibilities (e.g., in teaching, student supervision) and independence (e.g., to plan their coursework, plan research studies). Studies are followed up regularly, and it is possible to change the supervisor/advisor in case of conflict or changes in staff.
Publication Statistics for SE PhDs
As SE is a field with a high frequency of publication, and as most SE PhD and Licentiate theses in Sweden are compilation-style theses (published papers, plus an introduction chapter explaining how they tie together), publications play an important role during the studies. There are no official figures on what constitutes a “good” thesis, or how many publications there should be in one. This is arguably a sensible approach, as it largely depends on the studied topic, on the different venues, and on the coherence of the different publications in a thesis (many excellent papers don’t automatically make a thesis). However, PhD students often express the desire to get some kind of reference on these different factors.
To address this issue and to get an overview of the trends, we decided in the fall of 2018 to revisit existing SE PhD and Licentiate theses in Sweden with the aim of obtaining such a reference picture. To do so, we searched for all compilation-style PhD and Licentiate theses produced by SE research groups in Sweden until the end of 2018 for which we could obtain either a PDF or a printed copy. From the resulting set of 71 PhD and 51 Licentiate theses, we extracted the following data:
- Publication year;
- Total number of published and submitted papers included in the thesis;
- Total number of published and submitted papers not included in the thesis, but additionally listed (“other papers, not included in the thesis”);
- For included and listed papers, an additional breakdown into the number of papers submitted to/published in ISI-listed journals, non-ISI-listed journals, conferences (everything listed in the CORE ranking with at least ranking C), workshops, and other venues.
To obtain the list of theses, we relied on our knowledge of Swedish SE research, on contacts in the different research groups, and on searches in the publication databases of the different universities. We included theses according to the following criteria:
- The thesis is a compilation-style PhD or Licentiate thesis from a known SE research group, and the thesis itself is listed as contributing to a PhD degree in SE, or
- the thesis explicitly states (in the abstract or introduction) that it targets SE, or
- the thesis is based on / includes at least two papers that have been published in an SE venue (an ISI-listed SE journal or an SE conference ranked at least C according to the CORE ranking).
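The inclusion criteria above can be read as a simple predicate. The following is only a minimal sketch of one possible reading, not the procedure we actually scripted: the `Thesis` record and its field names are hypothetical, and it assumes that being compilation-style is a precondition for all three alternatives.

```python
from dataclasses import dataclass


@dataclass
class Thesis:
    """Hypothetical record; the field names are illustrative only."""
    compilation_style: bool
    from_known_se_group: bool
    listed_as_se_degree: bool
    states_se_target: bool      # stated in the abstract or introduction
    papers_in_se_venues: int    # published in ISI-listed SE journals or CORE >= C SE conferences


def is_included(thesis: Thesis) -> bool:
    """Return True if the thesis satisfies at least one inclusion criterion."""
    # Assumption: only compilation-style theses are considered at all.
    if not thesis.compilation_style:
        return False
    return (
        (thesis.from_known_se_group and thesis.listed_as_se_degree)
        or thesis.states_se_target
        or thesis.papers_in_se_venues >= 2
    )
```

Under this reading, a thesis from an unknown group is still included if it states that it targets SE, or if it contains at least two papers published in SE venues.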
We classified papers not clearly marked as published or accepted as submitted/in submission. Papers in the “Other” category were not classified as published or submitted, as it is rarely clear for this category, e.g., for technical reports. We counted tool demos, posters, papers in doctoral symposia, and in “work-in-progress” tracks as “Other”. Finally, we excluded tutorials and keynotes.
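The classification rules in the previous paragraph could be sketched roughly as follows. This is an illustration under assumptions: the function name, the string labels, and the idea that each paper already carries a venue kind are all hypothetical, and the sketch does not decide how a conference outside the CORE ranking would be treated.

```python
from typing import Optional, Tuple

# Hypothetical labels for the venue kinds mentioned in the text.
VENUE_KINDS = {"isi journal", "non-isi journal", "conference", "workshop"}
OTHER_KINDS = {"tool demo", "poster", "doctoral symposium",
               "work-in-progress", "technical report"}
EXCLUDED_KINDS = {"tutorial", "keynote"}


def classify(kind: str, marked_published_or_accepted: bool
             ) -> Optional[Tuple[str, Optional[str]]]:
    """Return (category, status), or None if the item is not counted."""
    if kind in EXCLUDED_KINDS:
        return None                      # tutorials and keynotes are excluded
    if kind in OTHER_KINDS:
        return ("other", None)           # no published/submitted status for "Other"
    if kind not in VENUE_KINDS:
        raise ValueError(f"unknown venue kind: {kind!r}")
    # Anything not clearly marked as published or accepted counts as submitted.
    status = "published" if marked_published_or_accepted else "submitted"
    return (kind, status)
```

For example, `classify("poster", True)` yields `("other", None)`, while a journal paper without a clear publication note is counted as submitted.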
In the following, we summarise the thesis data in terms of descriptive statistics and bar charts/box plots. We focus on PhD theses, since these are most relevant for international reference. However, similar figures for Licentiate theses can easily be obtained from the raw data, which is published on Zenodo. Note that we excluded the names of PhD candidates and universities/research groups from the raw data, as we do not want to encourage unnecessary comparisons between groups. All metrics are flawed in some way, and differences in publication numbers are not sufficient to indicate the quality of a research group.
As can be seen from Fig. 1, the number of PhD theses published per year varies considerably, with 1-2 in most of the early 2000s, and up to 8 theses in 2011, 2013, and 2017. This is important to keep in mind when looking at the statistics of papers published/included in the theses per year.
Fig. 2 depicts the average number of papers included in a PhD thesis, sorted by the year of the thesis. Interestingly, there are no clear trends over the years, e.g., no visible increase in the number of publications included in a PhD thesis. Typical numbers range from 5.5 to 8 papers per thesis.
Among the papers included in a PhD thesis, typically not all are published, as some might be included as “under submission”. Fig. 3 shows the average number of published/accepted papers that are included in a PhD thesis per year. Again, there are no clear trends, with a typical number being 5, while 2001 and 2014 are clear outliers (7 and 8).
Analogous to Fig. 3, Fig. 4 shows the average number of submitted papers that are included in a PhD thesis per year. Averages of over 1.5 and under 1 are uncommon, indicating that PhD students typically have between one and two papers under submission.
Taking all years into account, Fig. 5 shows the published/accepted and submitted papers included in PhD theses. The averages are depicted by the ‘x’ symbol, while the bold horizontal bars depict the median. Most theses are in the range of 4-6 published and 1-2 submitted papers. The highest number of published papers is 10, the lowest 3. Similarly, the highest number of submitted papers is 4, the lowest number is 0.
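For readers who want to recompute such figures from the raw data, here is a minimal sketch of the per-year averages and the overall mean/median. The records below are made up for illustration and are not taken from the Zenodo data set; the tuple layout `(year, published, submitted)` is likewise an assumption.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical records: (thesis year, published papers, submitted papers).
records = [
    (2011, 5, 1),
    (2011, 6, 2),
    (2013, 4, 1),
    (2013, 7, 0),
    (2017, 5, 2),
]


def per_year_means(rows):
    """Average number of published and submitted papers per thesis, by year."""
    by_year = defaultdict(list)
    for year, published, submitted in rows:
        by_year[year].append((published, submitted))
    return {
        year: (mean(p for p, _ in vals), mean(s for _, s in vals))
        for year, vals in sorted(by_year.items())
    }


def overall_summary(rows):
    """Mean, median, minimum and maximum over all theses, as in a box plot."""
    columns = {
        "published": [p for _, p, _ in rows],
        "submitted": [s for _, _, s in rows],
    }
    return {
        name: {"mean": mean(vals), "median": median(vals),
               "min": min(vals), "max": max(vals)}
        for name, vals in columns.items()
    }


yearly = per_year_means(records)
summary = overall_summary(records)
```

Keep in mind that years with only one or two theses make the per-year averages rather noisy.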
Breaking the papers further down by venue, we see that most published papers are in conferences, followed by ISI-listed journal papers and workshops. A “typical” PhD thesis would have 2 published conference papers, 1 ISI journal paper, and 1 workshop paper. Clearly, these figures vary greatly between theses. For instance, there are several theses that include 3 published ISI journal papers, and only 1 conference paper. Similarly, there are theses with no (ISI) journal papers, 1 conference paper, and several workshop/other papers.
Finally, Fig. 7 depicts the included papers that are submitted, sorted by venue/type. It can clearly be seen that it is common to include submitted journal papers, with a mean of 1 submitted journal paper per thesis. This is sensible, since journal papers have a much longer turnaround time with unclear timelines for reviews. It is much less common to include conference or workshop papers under submission. For the “other” category, we cannot clearly tell what is being submitted, since it is not stated in the thesis. However, these might just as well be journal papers.
Summing up, we see that an archetypal PhD thesis in Sweden has about 5 published papers and 1 under submission (typically a journal paper). This pattern is fairly stable over the years, with no discernible sign of meaningful increases (“publication inflation”). However, it is also important to note that there are large deviations between theses. We see this as an indication that there is no overly formal requirement for publication numbers, but rather flexibility with respect to factors such as the strength of the individual publications, their coherence, and the novelty of the topic.