Education | Grischa Liebel

The rules and regulations for attaining a PhD degree vary substantially across countries and fields. For instance, large differences are commonplace in funding, tasks and duties, expectations towards publishing, and the overall duration of the PhD studies (and in whether it is really “studying” or more of an employment).

In this blog post, I will briefly summarise how PhD studies in Software Engineering (SE) are organised in Sweden, followed by a summary of work that Robert Feldt and I did at the end of 2018 to obtain an overview of publication statistics for Swedish SE PhD students. This work was inspired by an earlier study with similar findings. You can find a summary of that work here.

One of the main reasons for writing this blog post (and for doing the work summarised within) is that, oftentimes, the academic model used in the United States is assumed to be the norm, despite huge differences from other countries. This post will clarify some of the differences in Sweden (where the situation is, in turn, similar to that in other Nordic countries).

SE PhD in Sweden

Obtaining a PhD in SE in Sweden requires three years of full-time research work. This period is extended by another one to two years to obtain mandatory course credits and to carry out teaching duties, resulting in four to five years of PhD studies. It is also common to publish and defend a so-called “Licentiate” thesis after approximately half the time, as a sort of dry run for the final PhD degree. Positions are fully funded, with a salary that is somewhat lower than in industry. However, it is an actual salary, not a stipend or an “eat cheap pasta or stay hungry for five years” compensation. PhD salaries are standardised across Sweden. Differences might exist in the number of course credits that need to be obtained, and in teaching duties. For instance, many PhD positions are funded by industry, which often requires a certain amount of time to be spent at the industry partner in place of teaching duties.

As an example, my PhD studies (starting 2013) took five years, including one year reserved for teaching and one year for obtaining course credits. I was typically a TA in two courses per year (responsibilities ranging from more or less only grading to handling the entire course almost by myself, depending on my maturity and the responsible instructor). Regarding courses, I had to obtain 60 course credits. Included in those were a number of mandatory courses, such as in teaching and pedagogy.

In the Swedish academic system, PhD students are more often than not treated as employees rather than students, having substantial responsibilities (e.g., in teaching, student supervision) and independence (e.g., to plan their coursework, plan research studies). Studies are followed up regularly, and it is possible to change the supervisor/advisor in case of conflict or changes in staff.

Publication Statistics for SE PhDs

As SE is a field with a high frequency of publication, and as most SE PhD and Licentiate theses in Sweden are compilation-style theses (published papers, plus an introduction chapter explaining how they tie together), publications play an important role during the studies. There are no official figures on what constitutes a “good” thesis, or how many publications there should be in one. This is arguably a sensible approach, as it largely depends on the studied topic, on the different venues, and on the coherence of the different publications in a thesis (many excellent papers don’t automatically make a thesis). However, PhD students often express the desire to get some kind of reference on these different factors.

To address this issue and to get an overview of the trends, we decided in the fall of 2018 to revisit existing SE PhD and Licentiate theses in Sweden in order to obtain such a reference picture. To do so, we searched for all compilation-style PhD and Licentiate theses produced by SE research groups in Sweden until the end of 2018 for which we could obtain either a PDF or a printed copy. From the resulting set of 71 PhD and 51 Licentiate theses, we extracted the following data:

Publication year;
Total number of published and submitted papers included in the thesis;
Total number of published and submitted papers not included in the thesis, but additionally listed (“other papers, not included in the thesis”);
For included and listed papers, an additional breakdown into the number of papers submitted to/published in ISI-listed journals, non-ISI-listed journals, conferences (everything listed in the CORE ranking with at least ranking C), workshops, and other venues.

To obtain the list of theses, we relied on our knowledge of Swedish SE research, on contacts with the different research groups, and on searches in the publication databases of the different universities. We included theses according to the following criteria:

The thesis is a compilation-style PhD or Licentiate thesis from a known SE research group, and the thesis itself is listed as contributing to a PhD degree in SE, or
the thesis explicitly states (in the abstract or introduction) that it targets SE, or
the thesis is based on / includes at least two papers that have been published in an SE venue (an ISI-listed SE journal or an SE conference ranked at least C according to the CORE ranking).
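The three criteria above can be read as a simple either/or check. The following is a minimal sketch of one such reading, not the procedure we actually ran: the `ThesisInfo` record and its field names are hypothetical, and the sketch assumes that the compilation-style requirement applies to every thesis in the set.

```python
from dataclasses import dataclass


@dataclass
class ThesisInfo:
    """Hypothetical record of the facts needed to decide inclusion."""
    compilation_style: bool    # PhD or Licentiate thesis built from papers
    from_known_se_group: bool  # produced by a known SE research group
    listed_as_se_degree: bool  # listed as contributing to a PhD degree in SE
    states_se_target: bool     # abstract or introduction says it targets SE
    papers_in_se_venues: int   # papers in ISI-listed SE journals or SE
                               # conferences ranked at least C in CORE


def is_included(thesis: ThesisInfo) -> bool:
    """Return True if at least one of the three criteria holds."""
    by_group = thesis.from_known_se_group and thesis.listed_as_se_degree
    by_statement = thesis.states_se_target
    by_venues = thesis.papers_in_se_venues >= 2
    # Assumption: only compilation-style theses are considered at all.
    return thesis.compilation_style and (by_group or by_statement or by_venues)
```

Under this reading, a thesis from an unknown group that never mentions SE would still be included as long as two of its papers appeared in SE venues.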

We classified papers not clearly marked as published or accepted as submitted/in submission. Papers in the “Other” category were not classified as published or submitted, as the status is rarely clear for this category, e.g., for technical reports. We counted tool demos, posters, papers in doctoral symposia, and papers in “work-in-progress” tracks as “Other”. Finally, we excluded tutorials and keynotes.
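To make these rules concrete, here is a small sketch of how a single paper could be classified. The function name, the string labels, and the idea that each paper arrives with a `kind` and an optional status marking are assumptions for illustration; in practice the classification was done by reading the theses.

```python
from typing import Optional, Tuple

# Label sets are illustrative; they mirror the rules stated in the text.
EXCLUDED_KINDS = {"tutorial", "keynote"}
OTHER_KINDS = {"tool demo", "poster", "doctoral symposium",
               "work-in-progress", "technical report"}


def classify_paper(kind: str,
                   marked_as: Optional[str]) -> Optional[Tuple[str, Optional[str]]]:
    """Return (category, status), or None if the paper is excluded."""
    if kind in EXCLUDED_KINDS:
        return None  # tutorials and keynotes are not counted
    if kind in OTHER_KINDS:
        return ("other", None)  # no published/submitted status for "Other"
    # Anything not clearly marked as published or accepted counts as submitted.
    status = "published" if marked_as in {"published", "accepted"} else "submitted"
    return (kind, status)
```

For example, `classify_paper("journal", None)` yields `("journal", "submitted")`, because a missing status marking defaults to submitted.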

In the following, we summarise the thesis data in terms of descriptive statistics and bar charts/box plots. We focus on PhD theses, since these are most relevant for international reference. However, similar figures can easily be obtained from the raw data, which is published on Zenodo. Note that we excluded names of PhD candidates and universities/research groups from the raw data, as we do not want to encourage unnecessary comparisons between groups. All metrics are flawed in some way, and differences in publication numbers are insufficient to indicate the quality of the research groups in any way.

As can be seen from Fig. 1, the number of PhD theses published per year differs quite a lot, with 1-2 in most of the early 2000s, and up to 8 theses in 2011, 2013, and 2017. This is important to keep in mind when looking at the statistics of papers published/included in the theses per year.

Fig. 2: Average included papers per thesis

Fig. 2 depicts the number of papers included on average in a PhD thesis, sorted by the year of the thesis. Interestingly, there are no clear trends over the years, e.g., no visible increase in the number of publications included in a PhD thesis. Typical numbers range from 5.5 to 8 papers per thesis.
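For readers who want to recompute such per-year averages from the raw data, a minimal sketch could look as follows. The input format (pairs of thesis year and number of included papers) and the example values are made up for illustration and do not come from the dataset. The sketch also returns the number of theses per year, since an average over one or two theses should be read with care.

```python
from collections import defaultdict
from statistics import mean


def included_papers_per_year(records):
    """Map each year to (mean included papers, number of theses).

    records: iterable of (thesis_year, included_paper_count) pairs.
    """
    counts_by_year = defaultdict(list)
    for year, included in records:
        counts_by_year[year].append(included)
    return {year: (mean(counts), len(counts))
            for year, counts in sorted(counts_by_year.items())}


# Made-up example values, not taken from the published raw data.
example_records = [(2011, 6), (2011, 7), (2013, 5)]
print(included_papers_per_year(example_records))
```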

Fig. 3: Published papers per PhD thesis per year

Of the papers included in a PhD thesis, typically not all are published, as some might be included as “under submission”. Fig. 3 shows the average number of published/accepted papers that are included in a PhD thesis per year. Again, there are no clear trends, with a typical number being 5, while 2001 and 2014 are outliers (7 and 8).

Fig. 4: Submitted papers per PhD thesis per year

Analogous to Fig. 3, Fig. 4 shows the average number of submitted papers that are included in a PhD thesis per year. Averages above 1.5 or below 1 are uncommon, indicating that PhD students typically have between one and two papers under submission.

Fig. 5: Overall published and submitted papers included in PhD theses.

Taking all years into account, Fig. 5 shows the published/accepted and submitted papers included in PhD theses. The averages are depicted by the ‘x’ symbol, while the bold horizontal bars depict the median. Most theses are in the range of 4-6 published and 1-2 submitted papers. The highest number of published papers is 10, the lowest 3. Similarly, the highest number of submitted papers is 4, the lowest number is 0.
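The values shown in such a box plot can be reproduced with Python's standard `statistics` module. This is a sketch under assumptions: the input list is made up, and the quartiles use the default method of `statistics.quantiles`, which may differ slightly from the method used by the plotting tool behind the figures.

```python
from statistics import mean, median, quantiles


def box_plot_summary(counts):
    """Summarise per-thesis paper counts the way a box plot does."""
    q1, _, q3 = quantiles(counts, n=4)  # quartile cut points
    return {
        "mean": mean(counts),      # the 'x' symbol in the plot
        "median": median(counts),  # the bold horizontal bar
        "q1": q1,
        "q3": q3,
        "min": min(counts),
        "max": max(counts),
    }


# Made-up counts of published papers per thesis, for illustration only.
summary = box_plot_summary([3, 4, 5, 5, 6, 10])
print(summary["mean"], summary["median"])
```

A mean that sits above the median, as in this made-up example, hints at a few theses with unusually many papers.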

Fig. 6: Included published papers by venue

Breaking the papers down further by venue, we see that most published papers are in conferences, followed by ISI-ranked journal papers and workshops. A “typical” PhD thesis would have 2 published conference papers, 1 ISI journal paper, and 1 workshop paper. Clearly, these figures vary greatly between theses. For instance, there are several theses that include 3 published ISI journal papers, and only 1 conference paper. Similarly, there are theses with no (ISI) journal papers, 1 conference paper, and several workshop/other papers.

Fig. 7: Included submitted papers by venue

Finally, Fig. 7 depicts the included papers that are submitted, sorted by venue/type. It can clearly be seen that it is common to include submitted journal papers, with a mean of 1 submitted journal paper per thesis. This is sensible, since journal papers have a much longer turnaround time with unclear timelines for reviews. It is much less common to include conference or workshop papers under submission. For the “other” category, we cannot clearly tell what is being submitted, since it is not stated in the thesis. However, these might just as well be journal papers.

Summary/Discussion

Summing up, we see that an archetypal SE PhD thesis in Sweden has about 5 published papers and 1 under submission (typically a journal paper). These numbers are fairly stable over the years, with no discernible, meaningful increase (“publication inflation”). However, it is also important to note that there are large deviations between theses. We see this as an indication that there is no overly formal requirement for publication numbers, but flexibility with respect to factors such as the strength of the individual publications, their coherence, and the novelty of the topic.

In this short research summary of my 2016 article in Teaching in Higher Education (here), I make the case for team teaching that includes PhD students in your lecturing! In our specific case, we teach software modelling. However, the concept might apply to many different lecture setups.

Team Teaching and Pair Lecturing

In team teaching, multiple teachers are involved in teaching a single course. Often, the format is to include different experts to teach different parts of a multi-disciplinary course. For example, I am currently taking a course on medicine for engineers. Here, lectures on different parts of the body, e.g., the cardiovascular system and the central nervous system, are given by different specialists in the respective areas.

Another case of team teaching is pair lecturing, i.e., giving a single lecture in a pair (or a team) of teachers. In this case, the teachers take turns explaining things and jump in when needed. The rationale is that multiple lecturers can give multiple explanations of a complex topic, thus facilitating understanding. Also, while one teacher is talking, the other teacher can observe. As most of us know from our student times (and from talks we have listened to at conferences), it is rather easy to spot as a bystander when the audience is not following. In those cases, the bystander jumps in and gives another explanation that might help the audience to understand.

Pair Lecturing to Verbalise Cognitive Processes

When lecturing about software modelling, we often face the challenge that modelling is a cognitive process. Apart from design patterns, there are very few established principles on how software models are built, what they should contain, and how you construct them in a step-wise fashion. Essentially, the process of building an abstract model of something is mainly going on inside our heads. Making this process explicit to students is challenging.

One way to explain this process is, in our experience, pair lecturing. In class, we exemplify the modelling process by constructing software models together at the whiteboard. While we build the model, we have to explain to each other what we are drawing. Additionally, we might have to clarify if our explanation is insufficient, argue if our approach is questioned by the other teacher, or correct if we together reach the conclusion that a different solution is better. This discussion at the whiteboard is, essentially, a verbalisation of the cognitive process that would be hidden from students if we simply drew the solution on the board. Similarly, if one teacher just explained their own solution, much of the discussion would be omitted.

Apart from verbalising the modelling process to students, a positive side effect is that many of the questions students might have during class are already brought up by the other teacher. In our experience, this encourages further questions and participation from the students, and makes sure that questions are actually asked.

Teacher Synchronisation & Costs

Traditionally, pair lecturing has been criticised for its high costs. Clearly, including two teachers instead of one raises the teaching costs.

Our approach to lowering these costs dramatically is to include PhD students instead of members of the faculty. As the main teacher is still present, the PhD student does not need to be as knowledgeable in the topic as the main teacher. In fact, not knowing too much might be helpful, as the PhD student will in this case ask questions similar to those the audience would ask, and thereby builds a bridge between the main teacher and the students.

Now you might ask yourself: “OK, so those guys just use their PhD students because they’re cheaper. That’s it?”. No. The use of PhD students in pair lecturing is far from accidental in our case. In the Swedish university system, PhD students have a certain percentage of teaching duties. Typically, this duty is spent on group supervision in labs or project work. This, however, means that the PhD students need sufficient knowledge to supervise. Furthermore, they constantly need to synchronise with the main teacher on course contents, progress, and similar issues.

Including PhD students in the lectures effectively removes the need for additional synchronisation. At all times, the PhD student is aware of what is going on in the lectures and knows what the main teacher explained. Hence, additional synchronisation meetings are not needed. Furthermore, there is a lower risk that the PhD student will give explanations in the supervision sessions that are in contrast to the lectures.

Finally, pair lecturing using PhD students facilitates teacher succession. In our case, we had a large staff turnover in the last couple of years. This turnover suddenly left a lot of courses without a teacher. In the case of the software modelling course, the PhD student who was involved in pair lecturing knew about all course moments and the rationale behind them, and could essentially take over the course, or at least easily hand it over to the new teacher. Even when the main teacher kept teaching the course, it was easier to handle issues such as sick leave, business travel, or other cases in which the main teacher could not be present.

Summary

To sum up the entire article: In our experience, pair lecturing (lecturing with two teachers at the same time) using PhD students as part of the lecturing team helps:

by giving students multiple explanations to a single problem,
by verbalising thought processes and discussions,
and by engaging the student audience more than traditional lectures.

Our course evaluations also clearly show that students see these benefits!

With respect to the course setup, pair lecturing helps:

by synchronising between the main teacher and the teaching assistants, effectively removing the need for synchronisation meetings,
by avoiding contradictory statements between main teachers and teaching assistants,
and by effectively creating a “backup teacher” in case the main teacher is not available or even leaves the university/the course.

The full article includes a lot more details, empirical data, and reflections!

Software Engineering PhDs in Sweden: Statistics and Trends