Background
Human immunodeficiency virus (HIV) viral failure occurs when antiretroviral therapy fails to suppress and sustain a person's viral load below 1,000 copies of viral ribonucleic acid per milliliter. For people newly diagnosed with HIV who live in settings where healthcare resources are limited, such as low- and middle-income countries, the World Health Organization recommends viral load monitoring six months after initiation of antiretroviral therapy and yearly thereafter. Deviations from this schedule are made in cases where viral failure occurs or at the discretion of the clinician. Failure to detect viral failure in a timely fashion can delay the administration of essential interventions. Clinical prediction models based on information available in the patient medical record are increasingly being developed and deployed for decision support in clinical medicine and public health. This raises the possibility that prediction models could detect potential viral failure in advance of viral load measurements, particularly when those measurements occur infrequently.

Objective
Our goal is to use electronic health record data from a large HIV care program in Kenya to characterize and compare the predictive accuracy of several statistical machine learning methods for predicting viral failure at the first and second measurements following initiation of antiretroviral therapy. Predictive accuracy is measured in terms of sensitivity, specificity, and area under the receiver operating characteristic curve.

Methods
We trained and cross-validated 10 statistical machine learning models and algorithms on data from over 10,000 patients in the Academic Model Providing Access to Healthcare care program in western Kenya. These included parametric, non-parametric, ensemble, and Bayesian methods. The input variables comprised 50 items from the clinical record, hand-picked in consultation with clinician experts.
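The evaluation protocol described in the Methods, fitting candidate models and scoring their held-out predictions fold by fold, can be sketched in plain Python. Everything below is an illustrative stand-in under simple assumptions: the synthetic data, the single-feature threshold classifier, and all function names are hypothetical and are not the study's cohort or models.

```python
import random

def kfold_indices(n, k=10, seed=0):
    """Shuffle row indices and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(X, y, fit, predict, k=10):
    """Pool confusion counts over k held-out folds; return (sensitivity, specificity)."""
    tp = fp = tn = fn = 0
    for fold in kfold_indices(len(y), k):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            pred = predict(model, X[i])
            if y[i] == 1:
                tp, fn = tp + pred, fn + (1 - pred)
            else:
                fp, tn = fp + pred, tn + (1 - pred)
    return tp / (tp + fn), tn / (tn + fp)

# Toy stand-in for a real learner: threshold a single risk score at the
# training-set mean. Purely illustrative, not one of the study's 10 methods.
def fit(X, y):
    return sum(row[0] for row in X) / len(X)

def predict(threshold, row):
    return 1 if row[0] > threshold else 0

# Synthetic cohort mimicking only the ~20% failure rate reported below.
rng = random.Random(1)
y = [1 if rng.random() < 0.2 else 0 for _ in range(500)]
X = [[rng.gauss(1.0 if label else 0.0, 0.5)] for label in y]
sens, spec = cross_validate(X, y, fit, predict)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Pooling the confusion counts across folds, rather than averaging per-fold rates, keeps the estimates stable when a fold happens to contain few positive cases.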
Predictive accuracy measures were calculated using 10-fold cross-validation.

Results
The viral failure rate in this patient cohort is about 20% at both the first and second measurements. Ensemble techniques generally outperformed other methods. For predicting viral failure at the first follow-up measurement, specificity was over 90% for these methods, but sensitivity was typically in the 50–60% range. Predictive accuracy was greater for the second follow-up measurement, with sensitivities over 80%. Super Learner, gradient boosting, and Bayesian additive regression trees consistently outperformed the other methods. At a viral failure rate of 20%, the positive predictive value of the top-performing methods is between 75% and 85%, while the negative predictive value is over 95%.

Conclusion
Evidence from this study suggests that machine learning techniques have the potential to identify patients at risk of viral failure before their scheduled measurements. Ultimately, prognostic virologic assessment can help guide the administration of earlier targeted interventions such as enhanced drug resistance monitoring, rigorous adherence counseling, or appropriate switching to next-line therapy. External validation studies should be conducted to confirm the results reported here.
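The reported predictive values follow from Bayes' rule applied to sensitivity, specificity, and prevalence. A minimal sketch, using illustrative operating points consistent with the ranges above (sensitivity about 80%, specificity about 95%, failure rate 20%) rather than the study's exact figures:

```python
def ppv_npv(sensitivity, specificity, prevalence):
    """Positive and negative predictive value via Bayes' rule."""
    tp = sensitivity * prevalence              # true positives per person screened
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)        # true negatives
    fn = (1 - sensitivity) * prevalence        # false negatives
    return tp / (tp + fp), tn / (tn + fn)

# Illustrative values, not the study's exact operating point.
ppv, npv = ppv_npv(sensitivity=0.80, specificity=0.95, prevalence=0.20)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")  # PPV = 0.80, NPV = 0.95
```

This makes explicit that the quoted 75–85% PPV and >95% NPV depend on the 20% failure rate; at a lower prevalence, the same sensitivity and specificity would yield a lower PPV.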