Open Access (CC BY 4.0 license). Published by De Gruyter, October 9, 2021

Generation of surgical reports using keyword-augmented next sequence prediction

  • Richard Bieck, Valentina Wildfeuer, Viktor Kunz, Martin Sorge, Markus Pirlich, Max Rockstroh and Thomas Neumuth


Documenting a surgical procedure remains a time-consuming task that surgeons must fit into their daily routine. Because a surgical report should be written immediately after the operation, while all impressions of the procedure are still fresh, automated assistance is desirable. We therefore propose a method that generates surgical reports from keywords stated during the procedure. Our report generation is based on a sequence-to-sequence model trained on pairs of consecutive sentences from surgical reports. The known sentence is augmented with a keyword describing the next surgical action to be documented and is then passed to a language model that generates the next sentence. In this way, the complexity of predicting a vast number of possible surgical report phrasings is reduced to a next-sentence prediction task. The language model uses an encoder-decoder structure with bidirectional two-layer long short-term memory (LSTM) units in both components and an attention layer between input and output sentences. The training data consisted of 50 ear, nose, and throat (ENT) surgery reports with 1500 sentences. The model was trained in a k-fold cross-validation study with k = 10 and cross-entropy loss as the objective function. The generated reports were evaluated using the NIST, ROUGE, and METEOR metrics. Additionally, three medical experts assessed the report content for plausibility and text errors. The trained models reached an accuracy of 0.82 for next-sentence prediction. The generated reports showed consistent sentence structures and keyword correspondence for about 70 % of the provided keyword sequences. The NIST, ROUGE, and METEOR metrics reached 0.65, 0.71, and 0.64, respectively. The model underperformed on previously unseen keyword sequences and showed signs of overfitting when keyword sequences deviated from the baseline of the training set.
Our approach to the keyword-augmented generation of surgical reports shows the potential to reduce text generation complexity by providing a sequence of anchor words. However, the automated generation of surgical reports remains a difficult task because of individual report phrasings and the high variance in keyword sequences.
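
The data preparation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the separator token `<kw>`, the helper names `build_pairs` and `k_fold_indices`, the example sentences and keywords, and the choice to split folds at the report level are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Tuple

SEP = "<kw>"  # hypothetical separator token; the abstract does not name one


def build_pairs(sentences: List[str], keywords: List[str]) -> List[Tuple[str, str]]:
    """Build (source, target) pairs from one report.

    Assumes keywords[i] describes the surgical action documented in
    sentences[i]. The source is the known sentence augmented with the
    keyword of the NEXT action; the target is the next sentence.
    """
    if len(sentences) != len(keywords):
        raise ValueError("expected exactly one keyword per sentence")
    pairs = []
    for i in range(len(sentences) - 1):
        source = f"{sentences[i]} {SEP} {keywords[i + 1]}"
        target = sentences[i + 1]
        pairs.append((source, target))
    return pairs


def k_fold_indices(n_items: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_items) into k contiguous folds.

    Returns one (train_indices, validation_indices) tuple per fold.
    """
    if not 1 < k <= n_items:
        raise ValueError("k must satisfy 1 < k <= n_items")
    # Distribute any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_items // k + (1 if f < n_items % k else 0) for f in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_items) if i < start or i >= stop]
        folds.append((train, validation))
        start = stop
    return folds


# Made-up example report (not taken from the study data).
report = [
    "The patient was positioned supine.",
    "The nasal cavity was inspected with the endoscope.",
    "The uncinate process was resected.",
]
keywords = ["positioning", "inspection", "resection"]

pairs = build_pairs(report, keywords)

# Assumption: folds are formed over the 50 reports rather than over sentences.
folds = k_fold_indices(50, k=10)
```

Each source string in `pairs` would be the encoder input and each target string the decoder output of the sequence-to-sequence model. Splitting at the report level, as assumed here, keeps sentences from one report out of both the training and validation sets of the same fold.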

Published Online: 2021-10-09
Published in Print: 2021-10-01

© 2021 The Author(s), published by Walter de Gruyter GmbH, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.
