
it - Information Technology

Methods and Applications of Informatics and Information Technology

Editor-in-Chief: Conrad, Stefan / Molitor, Paul

Online ISSN 2196-7032
Volume 60, Issue 4

Predictive analytics for data driven decision support in health and care

Dieter Hayn / Sai Veeranki / Martin Kropf / Alphons Eggerth / Karl Kreiner / Diether Kramer / Günter Schreier
Published Online: 2018-07-28 | DOI: https://doi.org/10.1515/itit-2018-0004

Abstract

Due to the ever-increasing amount of data generated in healthcare each day, healthcare professionals are increasingly challenged to keep up with the available information. Predictive models based on machine learning algorithms can help to quickly identify patterns in clinical data. Requirements for data driven decision support systems for health and care (DS4H) are similar in many ways to applications in other domains. However, there are also various challenges which are specific to health and care settings. The present paper describes a) healthcare specific requirements for DS4H and b) how they were addressed in our Predictive Analytics Toolset for Health and care (PATH). PATH supports the following process: objective definition, data cleaning and pre-processing, feature engineering, evaluation, result visualization, interpretation and validation, and deployment. The current state of the toolset already allows the user to switch between the various involved levels, i. e. raw data (ECG), pre-processed data (averaged heartbeat), extracted features (QT interval), built models (to classify the ECG into a certain rhythm abnormality class) and outcome evaluation (e. g. a false positive case), and to assess the relevance of a given feature both in the currently evaluated model as a whole and for the individual decision. This allows us to gain insights as a basis for improvements in the various steps from raw data to decisions.

Keywords: Clinical decision support; Machine learning; Predictive modelling; Feature engineering

ACM CCS: Applied computing → Life and medical sciences → Health care information systems

1 Introduction

Due to the ever-increasing amount of data generated by healthcare processes each day, healthcare professionals are increasingly challenged to keep up with the available information. Scientific literature, clinical guidelines and rule-based decision support systems can be effective in areas where the domain is well understood. However, as such systems grow larger over time, they become difficult to maintain. Moreover, when rules from different disease models need to be combined (“co-morbidities”), finding patterns for new rules is very time-consuming. Predictive models based on machine learning algorithms can help to quickly identify such patterns.

1.1 Related research

Data driven decision support systems for health and care (DS4H) are a promising approach, especially in complex healthcare settings. They have already been explored in various clinical scenarios, mostly in research settings. Selected application areas are described in the following.

1.1.1 Hospital re-admissions

Prediction and prevention of re-admissions is one of the major DS4H applications. A major driver is the necessity to contain costs, e. g. by reducing the length of in-hospital stays and optimizing the number of procedures applied to the patient. However, such cost savings must be balanced against the risk of re-admissions after discharge. An increasing number of publications deal with re-admissions, some in non-disease specific populations [1], some focusing on a particular medical indication (e. g. a certain disease or surgery) [2], some considering a particular period of time after discharge (e. g. 30 days) [3], and others a predefined interval (e. g. a calendar year or quarter) [4], [5], [6], [7].

1.1.2 Adverse drug reactions

Adverse drug reactions are a common complication, especially in the treatment of elderly and chronically ill patients. It is common that patients receive 20 different drugs or more. In such complicated poly-pharmaceutical scenarios, drug-drug interactions are hard to avoid and may lead to severe adverse events, hospitalizations and death. Therefore, adverse drug reaction prediction has become a major application of machine learning in health and care (e. g. [8], [9], [10]).

1.1.3 Delirium prediction

Delirium is a neuropsychiatric syndrome with increased morbidity and mortality [11], [12]. It is often misdiagnosed in hospitalized patients [11], [12], [13], [14]. Approximately 15–30 % of elderly patients are identified with delirium on admission and ∼56 % will develop delirium during their stay in hospital [13]. Effective early detection could avoid 30–40 % of the cases [14], [15], which could save costs and shorten hospital stays [16]. Several groups predicted delirium from different risk-stratification cohort rankings [16], [17], [18]. A systematic review [19] evaluated risk factors and derivative rules [17], [20], [21], [22]. Recently, we started to analyse hospitalized patients from geronto-psychiatry and internal medicine departments to predict delirium [23].

1.1.4 Patient blood management

Blood transfusion is a highly prevalent procedure and in some scenarios, it has lifesaving potential. However, in a significant number of cases transfusion is administered to hemodynamically stable patients with no benefit, but increased odds of adverse events and substantial costs. Therefore, the concept of Patient Blood Management has gained importance to pre-empt and reduce transfusion and to identify the optimal transfusion volume for an individual patient when transfusion is indicated. In a recent study [24] we applied machine learning tools to pre-operative data to predict the amount of red blood cells to be transfused during surgery and to prospectively optimize blood ordering schedules, indicating that predictive modelling is more accurate than state of the art algorithms [24], [25].

1.2 Research contribution

DS4H have specific requirements. However, to the best of our knowledge there is currently no publication that summarizes these requirements in a compact way.

Supervised machine learning comprises two phases: learning with retrospective data and prospective prediction (Fig. 1). State of the art solutions typically address these two phases, which is adequate for many research settings. For real-world healthcare applications, however, additional steps need to be considered, e. g. data collection, de-identification and deployment.

Figure 1: Supervised machine learning phases. First, the model is trained retrospectively to predict a known target from M observations O1...OM, each consisting of N features F1...FN. Thereafter, the trained model is prospectively applied to a new dataset to predict the unknown outcome.
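
The two phases can be illustrated with a deliberately minimal sketch (in Python rather than PATH's MATLAB, with invented toy data): a single-feature threshold classifier is fitted on retrospective, labelled observations and then applied prospectively to unseen data.

```python
# Toy illustration of the two supervised learning phases (hypothetical
# data, not one of PATH's actual models): a one-feature threshold classifier.

def train(observations):
    """Phase 1: learn a decision threshold from retrospective, labelled data."""
    pos = [x for x, y in observations if y == 1]
    neg = [x for x, y in observations if y == 0]
    # the midpoint between the class means serves as the learned "model"
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, x):
    """Phase 2: apply the trained model prospectively to a new observation."""
    return 1 if x >= threshold else 0

retrospective = [(2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1)]  # (feature, known target)
model = train(retrospective)   # learned threshold: 5.0
print(predict(model, 6.5))     # predict the unknown outcome -> 1
```

Real models use many features and far richer algorithms, but the separation between a retrospective training step and a prospective prediction step is the same.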

1.3 Objectives

The present paper describes a) healthcare specific requirements for DS4H and b) how they were addressed in our predictive analytics toolset.

2 Requirements for data driven decision support systems for health and care

Requirements for DS4H are similar to applications in other domains, but there are also various challenges which are specific to health and care settings (see e. g. [26]). The development of a DS4H should follow a two-level process with two continuously repeating cycles. Cycle 1 covers data cleaning & pre-processing, feature engineering, training, evaluation and visualization, interpretation & validation. In cycle 2, deployed real-world models need to be adapted at regular, but significantly longer, intervals in terms of objective definition, data collection & de-identification and deployment (see Fig. 2).

Figure 2: Two-level process of data driven decision support in health and care.

DS4H have specific requirements for each process of both cycles, especially in terms of handling of complex data, data protection, patient safety, regulatory compliance, etc., which are described in the following.

2.1 Objective definition

Definition of the objective of a model is a complex and crucial task, which requires involvement of various stakeholders.

2.1.1 Requirements

Valuable scenarios are those in which the following requirements are fulfilled:

  • The data needed for prediction are available

  • The data needed are available in time

  • Different options are available depending on the predicted target value

2.1.2 Application scenarios and use cases

A DS4H should be applicable to various scenarios with reasonable effort, including the following applications (extended from [26]). An example use case is described for each scenario.

  • Patient centred point of care applications

    • At hospital during admission

      Use case: A patient is admitted to hospital. Alerts are generated if specific risks are present.

    • At hospital during discharge

      Use case: A patient is about to be discharged from hospital. Alerts are presented to the physician if the re-admission risk is higher than a certain threshold.

    • At out-patient physicians

      Use case: A patient visits his general practitioner. Alerts are generated if specific risks are present.

  • Patient-facing applications

    • Telehealth systems at the patient’s home

      Use case: Blood pressure data are received from a telehealth patient. Notifications are sent to a telehealth nurse if specific risks are present.

  • Hospital management applications

    • Benchmarking

      Use case: A clinical controller compares different departments of his institution. Significant differences are presented based on predictive modelling.

  • Business intelligence

    Use case: A financial controller analyses the future bed occupancy of his department. Notifications are generated in case of high risk of over- or under-booking.

  • Population management applications

    Use case: A decision maker needs to decide which population group should be addressed with a certain intervention. The effect of the intervention on different population groups is modelled.

  • Health outcome research

    Use case: A decision maker needs to decide whether a new intervention should be introduced. Outcomes with and without the intervention are modelled.

2.2 Data collection & de-identification

Health and care data are complex, including large numbers of patients, large numbers of heterogeneous variables, and data linking across multiple sources [26]. DS4H need to support such complex data structures in a way that data import for new data sources can be managed in reasonable time. Healthcare data are sensitive in terms of privacy and, therefore, need appropriate safeguarding and security concepts. Current and future data protection regulations – especially the new General Data Protection Regulation of the EU [27] – require that health care data used in research be either anonymised or pseudonymised. Since person-identifying data are not necessary during training, de-identification is recommended – especially if the inner cycle of Fig. 2 is performed outside the hospital infrastructure.

2.3 Data cleaning & pre-processing

Health and care data sets often contain missing or incomplete data [26]. Data points can also stem from different kinds of time series, both regularly sampled sequences (e. g. biosignals such as ECGs) and irregularly sampled series (e. g. multiple blood pressure measurements during a hospital stay). Many data are stored in coded form, based on standardised coding systems. Some of these codes are equal in almost all healthcare systems (e. g. ICD-10 codes for diagnoses [28]), others have regional dialects in different countries (e. g. ICHI [29]). DS4H should support data cleaning and pre-processing of such data.
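
One common pre-processing step for irregularly sampled series is to map them onto a regular grid. A minimal Python sketch (hypothetical blood pressure readings; PATH performs such steps in MATLAB) using last-observation-carried-forward:

```python
# Resample an irregularly sampled time series onto a regular grid by
# carrying the last observation forward (LOCF). Data are invented.

def resample_locf(samples, grid):
    """samples: time-sorted (time, value) pairs; grid: regular time points."""
    out, last, i = [], None, 0
    for t in grid:
        # consume all measurements up to and including grid point t
        while i < len(samples) and samples[i][0] <= t:
            last = samples[i][1]
            i += 1
        out.append(last)  # None before the first measurement (i.e. missing)
    return out

bp = [(1, 120), (4, 135), (9, 128)]           # irregular measurement times
print(resample_locf(bp, grid=[0, 2, 4, 6, 8, 10]))
# -> [None, 120, 135, 135, 135, 128]
```

The `None` entries make the missing-data problem explicit, so downstream cleaning (e. g. imputation or exclusion) can handle it deliberately.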

2.4 Feature engineering

Feature engineering aims at transforming heterogeneous data to a matrix of observations and features (“learning dataset” in Fig. 1). DS4H should support all typical operations, applicable to different “feature levels”, related to one another with 1:1, 1:N or N:M relations (e. g. patient, admission, diagnosis, etc.). Typical operation types are described in the following.

2.4.1 Operations on the same feature level

New features can be derived from other features within the dataset (see Fig. 3). For example, the Body Mass Index (BMI) can be derived from body weight and height, a comorbidity index can be computed from coded diagnoses, or a categorical parameter (e. g. anatomicalRegion = {‘head’, ‘hand’}) can be mapped to boolean parameters (e. g. anatomicalRegion_head, anatomicalRegion_hand).

Figure 3: Operations on a single feature level. For each observation O1...OM, a new feature FN+1 is calculated from the original features F1...FN of this observation.
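
Both same-level operations mentioned above can be sketched in a few lines of Python (field names such as `weight_kg` are invented for illustration; PATH specifies such operations in Excel and executes them in MATLAB):

```python
# Same-level feature derivation: BMI from weight and height, and
# one-hot mapping of a categorical parameter to boolean features.
# All field names are hypothetical.

def derive_features(obs):
    out = dict(obs)
    # derived numeric feature: BMI = weight [kg] / height [m]^2
    out["bmi"] = obs["weight_kg"] / obs["height_m"] ** 2
    # categorical -> boolean parameters
    for region in ("head", "hand"):
        out[f"anatomicalRegion_{region}"] = obs["anatomicalRegion"] == region
    return out

o = derive_features({"weight_kg": 80.0, "height_m": 2.0,
                     "anatomicalRegion": "head"})
print(o["bmi"], o["anatomicalRegion_head"], o["anatomicalRegion_hand"])
# -> 20.0 True False
```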

2.4.2 Expansion

There are datasets that support multiple corresponding data fields for a single feature level. For example, discharge data often contain one primary diagnosis plus multiple secondary diagnoses, which are represented by separate data fields (e. g. secondaryDiagnosis_1, secondaryDiagnosis_2, etc.). Each of these corresponding data fields can be expanded to an additional feature level (e. g. “diagnosis”). For many further processing steps, such an expanded data structure is easier to handle. Expansion is illustrated in Fig. 4.

Figure 4: Illustration of feature expansion. Corresponding features which are represented by separate features in sub-level 1 are mapped to separate observations in sub-level 2.
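
The discharge-diagnosis example above can be sketched as follows (a Python illustration with hypothetical field names, not PATH's implementation): the `secondaryDiagnosis_1..n` columns of one record become separate observations on a new "diagnosis" sub-level, keyed by the parent observation's ID.

```python
# Expansion: corresponding data fields on one feature level become
# separate observations on a sub-level. Field names are hypothetical.

def expand_diagnoses(record):
    rows = []
    for key, code in record.items():
        if key.startswith("secondaryDiagnosis_") and code is not None:
            # one new observation per filled diagnosis field
            rows.append({"admission_id": record["admission_id"],
                         "diagnosis": code})
    return rows

rec = {"admission_id": 42, "secondaryDiagnosis_1": "I50.9",
       "secondaryDiagnosis_2": "E11.9", "secondaryDiagnosis_3": None}
print(expand_diagnoses(rec))
```

The long format makes later steps (filtering, counting, coding-system lookups) uniform, because every diagnosis is one row regardless of how many fields the source system provided.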

2.4.3 Aggregation

All data from the various levels must finally be aggregated to the target’s root observation level (e. g. a single admission for re-admission prediction) to provide a single matrix to the machine learning algorithm. Various aggregation methods should be supported, including simple operations from descriptive statistics (e. g. max, min, number of (unique) elements, sum, etc.) and specific, complex calculations based on evidence based clinical guidelines (e. g. Charlson’s Comorbidity Index [30]); see Fig. 5.

Figure 5: During feature aggregation, all observations OS1 of sub-level 1 which correspond to the same OID are combined to a new feature FA in the root level.
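
A minimal Python sketch of such an aggregation (invented lab-value data; PATH configures this step in Excel): sub-level observations are grouped by the root observation ID and reduced with simple descriptive statistics, yielding one feature row per root observation.

```python
# Aggregation of sub-level observations to the root observation level
# with simple descriptive statistics. Data and field names are invented.
from collections import defaultdict

def aggregate(sub_rows, value_key):
    groups = defaultdict(list)
    for row in sub_rows:
        groups[row["admission_id"]].append(row[value_key])
    # one aggregated feature set per root observation
    return {oid: {"n": len(v), "max": max(v), "sum": sum(v)}
            for oid, v in groups.items()}

labs = [{"admission_id": 1, "crp": 5.0},
        {"admission_id": 1, "crp": 12.0},
        {"admission_id": 2, "crp": 3.0}]
print(aggregate(labs, "crp"))
```

Guideline-based scores such as a comorbidity index would replace the simple reducers with a clinically defined calculation, but the grouping structure stays the same.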

2.4.4 Bin management

In some scenarios, there is a need to group the data into periods of time. This is managed by “binning”, i. e. assigning all data from a certain period to a common bin: bin 1 contains all data of month 1, bin 2 contains month 2, etc. In a typical application, data of all but the last bin are used to predict a target parameter of the final bin (e. g. data from months 1–11 are used to predict a severe event in month 12). Bin management is illustrated in Fig. 6.
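
A small Python sketch of the monthly-bin example (invented event data; PATH implements binning in MATLAB): events are summed per month, and the last bin is separated out as the prediction target.

```python
# Bin management: group timestamped events into monthly bins; all but
# the last bin become features, the last bin the target. Data are invented.

def make_bins(events, n_bins):
    """events: list of (month, value) pairs; returns per-bin sums for months 1..n_bins."""
    bins = [0.0] * n_bins
    for month, value in events:
        bins[month - 1] += value
    return bins

events = [(1, 2.0), (1, 1.0), (2, 4.0), (12, 1.0)]
bins = make_bins(events, 12)
features, target = bins[:-1], bins[-1]   # months 1-11 predict month 12
print(features[0], features[1], target)  # -> 3.0 4.0 1.0
```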

2.4.5 Combination of operations

For typical DS4H, all these operations need to be combined and cascaded.

Figure 6: Illustration of bin management. In this example, data for feature FS11 from bin 1 and bin 2 are mapped to a specific feature (F1 and F2, respectively), while data from bin 3 are used as a target.

2.5 Model training

Model training should support different learning and testing schemata. Typically, the initial models developed for new applications are tested with fast schemata such as 2-fold cross validation, while in subsequent cycles, when higher accuracy is required, more time-consuming schemata such as 10-fold cross validation are preferred. The DS4H should also support different machine learning algorithms (e. g. Decision Trees, Random Forests, Support Vector Machines, Linear Regression Models, Logistic Regression, Neural Networks including Deep Learning, etc.) to allow selection of the optimal algorithm for each application scenario – depending on the type of outcome, the size of the dataset, etc.
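
The testing-schema idea can be sketched independently of any particular algorithm (a Python illustration, not PATH's MATLAB implementation): a plain k-fold splitter partitions the observations into disjoint test folds, and k trades speed against the reliability of the accuracy estimate.

```python
# Minimal k-fold splitter: partition indices 0..n-1 into k disjoint
# test folds, each paired with the complementary training indices.

def k_fold_indices(n, k):
    folds = [list(range(i, n, k)) for i in range(k)]
    return [(sorted(set(range(n)) - set(f)), f) for f in folds]

# 2-fold (fast, for early iterations) on six observations:
for train_idx, test_idx in k_fold_indices(6, 2):
    print(train_idx, test_idx)
# -> [1, 3, 5] [0, 2, 4]
# -> [0, 2, 4] [1, 3, 5]
```

Switching from 2-fold to 10-fold only changes `k`; the model-fitting code inside the loop stays untouched, which is what makes the schema an exchangeable setting.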

2.6 Evaluation

There are many statistical measures that are commonly used for model evaluation. The DS4H should support typical measures, such as the area under the receiver operating characteristic curve, accuracy, sensitivity, specificity, correlation coefficients, F-scores, etc. Since especially the inner cycle shown in Fig. 2 is typically run quite often during model development, all settings and results of different model versions should be stored and reproducibility of previous results should be supported.
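
For binary classifiers, several of the named measures follow directly from the confusion matrix. A short Python sketch with illustrative counts:

```python
# Standard binary-classification measures derived from confusion-matrix
# counts (true/false positives and negatives). Counts are illustrative.

def evaluate(tp, fp, tn, fn):
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

print(evaluate(tp=40, fp=10, tn=45, fn=5))
# accuracy 0.85; sensitivity ~0.889; specificity ~0.818
```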

2.7 Visualisation, interpretation & validation

Data cleaning, pre-processing, feature engineering and model training are complex process steps with multiple degrees of freedom. Therefore, assessing the effect of certain changes within this process on evaluation results is often difficult and time consuming, and specific visualization tools are required. Additionally, current implementation guidelines for clinical decision support systems suggest presenting the features and rules which led to the respective prediction to the physician [31]. Although this requirement is difficult to meet for many machine learning algorithms, it is essential for acceptance of the DS4H.

2.8 Deployment

For deployment of trained models to real-world applications, these models need to become an integral part of the hospital information system. During training, previously collected retrospective data are used. However, during routine care, all the data needed for prediction must be provided almost in real-time. Many tools needed for training are not needed after deployment, such as a full development environment including various tools for model optimisation, evaluation, visualisation, etc. While anonymised or pseudonymised data can be used during training, personalised data are required after deployment. Additionally, all data need to be collected in real time to provide timely predictions. If DS4H are to be implemented in real-world applications outside clinical study settings, it must be evaluated whether they need to be certified as a medical device.

3 Methods – Predictive Analytics Toolset

We have developed a Predictive Analytics Toolset for Health and care applications (PATH), which is based on MATLAB R2017 (The MathWorks, Inc., Natick, MA, USA) including the following toolboxes: Signal Processing, Database, and Statistics and Machine Learning. PATH supports the inner and the outer cycle of the process illustrated in Fig. 2. All adjustable parameters for each step are specified in Microsoft Excel, which serves as the user interface. After each step, intermediate results together with all specifications are stored on the hard disk. A table-based concept ensures that a) previously computed models with identical configurations do not need to be re-computed (which is potentially time-consuming) and b) all previously achieved results can be reproduced at any time. Additionally, we have developed a concept and approach for implementing the outer cycle of Fig. 2, including objective definition, data collection & de-identification and deployment. Implementation of the 8 phases of the process is described in the following.

3.1 Objective definition

PATH supports implementation of various scenarios, including regression models, binary and multiple classifications. Any feature available within the learning dataset can serve as the target. Up to now, we have applied PATH to the following targets: delirium [23], hospital re-admissions and adherence during telemonitoring (not published yet), patient blood management including benchmarking [24], [25], healthcare resource utilisation based on health insurance claims [4], [5], [6], [7].

3.2 Data collection and de-identification

Currently, SQL is used to retrieve data from a data warehouse; specific SQL scripts are needed for each warehouse. Data can be gathered from different tables and assigned to different data levels, such as patient, admission, diagnosis, etc. For training, all data are de-identified by omitting direct patient identifiers such as first and last name, and by transforming indirect identifiers in a way that k-anonymity is provided (e. g. transforming date of birth to age in years, omitting extremely rare diseases, etc.).
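
The described de-identification steps can be sketched as follows (a Python illustration with hypothetical field names and thresholds, not PATH's actual pipeline): direct identifiers are dropped, date of birth is coarsened to age, and rare diagnosis codes are suppressed.

```python
# De-identification sketch: drop direct identifiers, coarsen indirect
# identifiers, suppress rare codes. Field names and the k threshold
# are hypothetical.

DIRECT_IDENTIFIERS = {"first_name", "last_name"}

def de_identify(record, code_counts, min_k, reference_year):
    out = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            continue                                    # drop direct identifiers
        if key == "birth_year":
            out["age_years"] = reference_year - value   # coarsen date of birth
        elif key == "diagnosis" and code_counts.get(value, 0) < min_k:
            out[key] = None                             # suppress rare codes
        else:
            out[key] = value
    return out

rec = {"first_name": "Ada", "last_name": "L.",
       "birth_year": 1950, "diagnosis": "E84.0"}
print(de_identify(rec, code_counts={"E84.0": 2}, min_k=5, reference_year=2018))
# -> {'age_years': 68, 'diagnosis': None}
```

Real k-anonymity additionally has to consider combinations of quasi-identifiers, not each field in isolation; this sketch only shows the per-field transformations named in the text.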

3.3 Data cleaning and pre-processing

The pipeline supports various types of data and various relations between data objects. Within a separate Excel Source Definition file, data types and their relations to one another (1:1, 1:N, N:M, etc.) are specified. This file also defines the data cleaning procedures (e. g. outlier removal) to be performed in MATLAB. Single data points can be of any basic data type (e. g. date/time, boolean, integer, float, enumeration/categorical, string, etc.). To accelerate model generation, all data types are translated into purely numerical values (e. g. the boolean values TRUE / FALSE are translated to 1 / 0). These purely numeric data matrices as well as mapping tables for non-numeric data are stored on the hard disk. Data points from regularly and irregularly sampled time series are supported, and PATH is linked to our biosignal processing toolset [32]. Signal processing algorithms can be specified within the Excel settings file (e. g. mean heart rate of the signal, QRS amplitude, blood pressure trend, etc.). PATH also supports embedding external coding systems, such as ICD, ICHI, etc.
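
The numeric-translation step can be sketched in Python (illustrative values; PATH does this in MATLAB and persists the mapping tables to disk): every value becomes a number, and the mapping table for categorical values is kept so the originals remain recoverable.

```python
# Translate mixed-type values into purely numeric ones, keeping a
# mapping table for categorical values. Input values are invented.

def to_numeric(values):
    mapping = {}
    numeric = []
    for v in values:
        if isinstance(v, bool):          # check bool before int/float!
            numeric.append(1.0 if v else 0.0)
        elif isinstance(v, (int, float)):
            numeric.append(float(v))
        else:                             # categorical: assign next code
            mapping.setdefault(v, float(len(mapping) + 1))
            numeric.append(mapping[v])
    return numeric, mapping

nums, table = to_numeric([True, False, 3, "head", "hand", "head"])
print(nums, table)
# -> [1.0, 0.0, 3.0, 1.0, 2.0, 1.0] {'head': 1.0, 'hand': 2.0}
```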

3.4 Feature engineering

Within PATH, all operations described in chapter 2.4 can be specified in a Feature Set Definition Excel file.

3.5 Model training

Using a Modelling Definition Excel file, a variety of different models can be generated from the feature sets, including Decision Trees, Random Forests, Support Vector Machines, Linear Regression Models, Logistic Regression, etc. Observations in the feature set can arbitrarily be divided into subsets for training, testing and validation (e. g. 5-fold cross validation). Various modelling settings (meta parameters) can be specified (weight and cost functions, number of trees for Random Forests, etc.).

3.6 Evaluation

Based on the settings specified in the respective Excel spreadsheet, sets of methods for model evaluation are defined and applied. This includes not only standard key performance indicators such as Receiver Operating Characteristics (ROC), F-score, etc., but also additional capabilities like finding an optimal threshold or applying a threshold determined by a previous modelling procedure or a previous bin. Results from different model specifications can easily be compared to one another using various statistical tools, including box plots, scatter plots, confusion matrices, etc. Due to automated storage of models, results and specifications, PATH guarantees that previously achieved results can be reproduced at later points in time. To ensure reproducibility of the random processes involved (e. g. splitting into training and test sets, random forests, etc.), a special validation mode has been implemented which sets all random number generators to defined values.
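
The effect of such a validation mode can be sketched in Python (not PATH's MATLAB code): seeding every random process, here a train/test split, with a fixed value makes the result bit-identical on every run.

```python
# Reproducible randomness: a fixed seed makes the shuffled train/test
# split identical on every run. The split parameters are illustrative.
import random

def split(n, test_fraction, seed):
    rng = random.Random(seed)        # fixed seed => reproducible shuffle
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(n * test_fraction)
    return idx[cut:], idx[:cut]      # (train indices, test indices)

a = split(100, 0.2, seed=42)
b = split(100, 0.2, seed=42)
print(a == b)                         # -> True: identical split every time
```

Without the fixed seed, re-running an old model configuration would generally produce slightly different folds and therefore slightly different evaluation numbers.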

3.7 Visualization, interpretation and validation

We have developed application-specific tools that support result visualization, interpretation and validation. The visualization tools are linked to certain aspects of other process steps. For example, by clicking on a certain field of a confusion matrix generated during model evaluation (e. g. the False Positive category), a filtered subset with only the selected observations can be analysed. Within the viewer, quick navigation from one observation to the next is supported. For each observation, certain elements can be plotted graphically (e. g. the original ECG signal). The values of all features calculated for the respective observation are displayed and compared to the mean value of the respective feature within the whole training dataset. For each feature, the global feature importance (i. e. the importance of the feature within the whole learning dataset) and the influence of the feature on the prediction for the selected individual observation (see [32]) are shown (see Fig. 7).

Figure 7: Screenshot (adapted) of the interactive time series visualization tool, including the original signal such as an ECG (1), derived time series such as heart rate (2) and averaged heart beats (3), and a sortable list of features (4).

Figure 8: Example for visualizing prediction results from a random forest based model (symbolic data; figure adapted from [34]).

3.8 Deployment

One route towards deployment is to export standardized Predictive Model Markup Language (PMML) [33] objects from our MATLAB models. PMML models can be imported into components within the healthcare provider’s IT environment which support PMML. One possible choice is to use an R server, as in the delirium prediction use case – although the models applied there build on PATH, they are trained with R using the caret package [34]. To provide the data in time, a data warehouse is needed which provides all the data elements needed by the model, e. g. via an ODBC driver. Prediction results are presented to the user in a separate graphical user interface, and risk values can be coded in a traffic-light system. To explain a certain decision, the global feature importance and the influence on the individual decision [35] are calculated for each feature. Both are presented graphically to the user as part of an overview display within the hospital information system (see Fig. 8).
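
The traffic-light coding of risk values is straightforward; a Python sketch (the thresholds are hypothetical – in practice they would be derived from model evaluation, e. g. the optimal-threshold search described in Section 3.6):

```python
# Traffic-light coding of a predicted risk value. The two thresholds
# are hypothetical placeholders, not clinically validated cut-offs.

def traffic_light(risk, low=0.2, high=0.5):
    if risk < low:
        return "green"
    if risk < high:
        return "orange"
    return "red"

print(traffic_light(0.1), traffic_light(0.3), traffic_light(0.8))
# -> green orange red
```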

4 Results

4.1 Requirements

Tab. 1 summarises the requirements described in chapter 2 and whether they are met by PATH. 14 out of 23 requirements are completely met by PATH, 5 are met partly and 4 are not met yet. All requirements that are not yet fulfilled, and all but one of the partly fulfilled requirements, refer to cycle 2 as illustrated in Fig. 2, i. e. to tasks 1, 2 and 8.

Table 1

Requirements as described in chapter 2 within the eight tasks shown in Fig. 2 and their realization in PATH (green: complete, orange: partly, red: none).

4.2 Model performance

PATH proved to be adaptable to different scenarios in health and care. Table 2 gives an overview of our previously published prediction accuracy in different application scenarios. The delirium prediction was initially trained with R, but is reproducible with PATH (for more details, see the respective references). Our results compare very well with similar results published in the literature [1], [2], [3], [16], [17], [18].

Table 2

Accuracy for different application scenarios.

5 Discussion

5.1 Summary and interpretation

Although applications in healthcare share many general machine learning methods and aspects, there are circumstances which require specific processes in health and care scenarios. However, there is currently no publication available which describes these requirements in a compact way. In this paper, we summarized the requirements for DS4H and described the core elements of our toolset PATH, which at least partly supports all major steps needed for such applications.

Specific requirements for health and care applications are summarized in Tab. 1. These requirements might be a valuable summary for other researchers. Especially data protection, de-identification, evaluation and certification are more relevant than in other application areas. These requirements complicate and delay DS4H as compared to other application scenarios.

The specific requirements have different implications for the various phases of software development. We found that separation of the two cycles illustrated in Fig. 2 can significantly improve the software development process. Most publications so far have focussed on cycle 1, although both cycles are essential. Four requirements of cycle 2 are not yet implemented in PATH, but all others have at least been partly addressed.

Specifying the model and feature settings in Excel and processing the models in MATLAB provided the required structure and flexibility for complex application scenarios where extensive data pre-processing is needed. The current state of PATH allows the user to switch between the various involved levels, i. e. raw data (e. g. ECG), pre-processed data (averaged heartbeat), extracted features (QT interval), built models (classify as normal / pathological) and evaluation outcome (e. g. a false positive case), and to assess the relevance of a given feature in the whole model or for an individual decision. This allows us to gain insights as a basis for improvements in each step from raw data to decisions.

5.2 Limitations

The software used for model optimization is intended for the inner cycle of the predictive modelling process illustrated in Fig. 2. This process is usually executed by software developers or data scientists with deep insights into the data and procedures used. Therefore, we have not yet developed an intuitive user interface suitable for laymen, but rather a toolset for experts who have some experience with MATLAB. There are various algorithms known to be suitable in particular scenarios which we have not implemented yet, e. g. artificial neural networks and deep learning. Up to now, data de-identification requires several manual steps, which are more and more being automated. Currently, PATH is not certified as a medical device, since it is only used within clinical trials. As in most machine learning applications, our results show a certain level of accuracy, which strongly depends on the respective application scenario. It is often difficult to say which minimum level of accuracy is required. Since DS4H are often seen as “black boxes”, results derived from such systems are often hard to interpret for healthcare professionals. Although we have implemented tools to explain specific predictions to physicians (see chapter 3.8), there is still a huge need for innovative tools which help healthcare professionals a) to understand the system’s prediction and b) to react properly, e. g. if a high risk for the patient is predicted by the model.

5.3 Outlook

We are continuously developing our approach further to support additional types of data and target parameters, to consider more evidence based data, and to give further insights into data, results and models. We are continuously adapting the system to new challenges in healthcare and are working on interfaces to other predictive modelling tools, such as WEKA, Orange, R or KNIME, towards a “meta framework” for DS4H. Up to now, none of the developed models have been deployed to a real-world scenario. Currently, we are preparing the deployment of an R-based model for delirium prediction in a hospital in Austria, running on top of a SAP HANA infrastructure directly connected to the HIS. Models for re-admission management are also under development and considered for deployment.

6 Conclusion

Data driven decision support for health and care is a promising approach especially in complex clinical applications. Development and real-world application of predictive models need to be integrated into a complex process. To fulfil the requirements of real-world clinical settings, software tools need to support all process steps, including objective definition, data cleaning and pre-processing, feature engineering, evaluation, result visualization, interpretation and validation, and deployment. Implementation of a two-cycle development process is strongly recommended.


About the article

Dieter Hayn

Dieter Hayn received his MSc in biomedical engineering from the TU Graz, his PhD from the Health and Life Science University Hall/Tyrol and his MBA from the MU Graz. He is currently working as a senior scientist at AIT. His research interests include data science, predictive modelling and biosignal processing. He is a co-editor of the eHealth20xx proceedings and (co-)author of numerous journal and conference papers.

Sai Veeranki

Sai Veeranki obtained his master’s degree in IT from Alpen-Adria-Universität Klagenfurt in 2014 and graduated in health care information technology from FH Kärnten in 2016. He is currently employed at AIT and working on his PhD on predictive modelling in healthcare in cooperation with KAGes and CBmed.

Martin Kropf

Martin Kropf has been with the AIT since 2012. He received his MSc in eHealth from the FH Joanneum Graz in 2009 and is currently doing his PhD at the TU Graz. Since 2015, he has been working as a data scientist and clinical project manager at the Charité Berlin.

Alphons Eggerth

Alphons Eggerth has been studying Biomedical Engineering at the Graz University of Technology and is currently working on his PhD thesis at the AIT Austrian Institute of Technology. His research focuses on data driven decision support based on time series data from telemonitoring settings.

Karl Kreiner

Karl Kreiner has been a scientist and project manager at the AIT since 2003. He has more than 10 years of experience in telemonitoring and machine learning applications. Karl Kreiner has contributed to more than 30 national and international research and industry projects and is the author of various scientific publications.

Diether Kramer

Diether Kramer completed his studies in sociology and economics and received his PhD from the University of Graz in 2013. From 2007, he worked at the University of Graz, then freelance for the Max Planck Institute for Demographic Research, as well as for the Wirtschaftsnachrichten and AVL List. From 2014 to 2015 he worked as a consultant for IMS-Health. Since the end of 2015 he has been responsible for innovative data use at the KAGes.

Günter Schreier

Günter Schreier received the doctoral and Habilitation degrees in electrical engineering and biomedical informatics from the Graz University of Technology and is currently the thematic coordinator for “Predictive Healthcare Information Systems” with the AIT Austrian Institute of Technology. He serves as the President of the Austrian Society of Biomedical Engineering and the annual scientific eHealth conference in Vienna. He has (co-)authored 300+ scientific publications and presentations and advises the Austrian Ministry of Health and the European Commission.


Received: 2017-12-31

Revised: 2018-03-30

Accepted: 2018-04-26

Published Online: 2018-07-28

Published in Print: 2018-08-28


Parts of this work have been carried out with the support of different funding organisations, i. e. the K1 COMET Competence Center CBmed, which is funded by the Federal Ministry of Transport, Innovation and Technology (BMVIT); the Federal Ministry of Science, Research and Economy (BMWFW); Land Steiermark (Department 12, Business and Innovation); the Styrian Business Promotion Agency (SFG); and the Vienna Business Agency. The COMET program is executed by the FFG. We also thank SAP SE for their support.


Citation Information: it - Information Technology, Volume 60, Issue 4, Pages 183–194, ISSN (Online) 2196-7032, ISSN (Print) 1611-2776, DOI: https://doi.org/10.1515/itit-2018-0004.

© 2018 Walter de Gruyter GmbH, Berlin/Boston.