On the Effectiveness of Self-Training in MOOC Dropout Prediction

  • 1 University School of Information, Communication and Technology, Guru Gobind Singh (GGS) Indraprastha University, 110078, New Delhi, India
  • 2 University School of Information, Communication and Technology, Guru Gobind Singh (GGS) Indraprastha University, 110078, New Delhi, India

Abstract

Massive open online courses (MOOCs) have gained enormous popularity in recent years and have attracted learners worldwide. However, MOOCs face a crucial challenge: a high dropout rate, which ranges from 91% to 93%. The interplay between learning analytics strategies and MOOCs has emerged as a research area aimed at reducing this dropout rate. Most existing studies use click-stream features as engagement patterns to predict at-risk students. This study instead combines click-stream features with the influence of the learner’s friends, based on their demographics, to identify potential dropouts. Existing predictive models rely on supervised learning techniques that require large amounts of hand-labelled data for training. In practice, however, large labelled data sets are scarce, which makes training difficult. Therefore, this study uses self-training, a semi-supervised learning technique, to develop predictive models. Experimental results on a public data set demonstrate that the semi-supervised models attain results comparable to state-of-the-art approaches while needing only a small quantity of labelled data. Seven well-known optimizers are used to train the self-training classifiers; of these, Stochastic Gradient Descent (SGD) performed best, with an F1 score of 94.29%, which supports the relevance of the approach.
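The self-training idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: a nearest-centroid classifier stands in for the classifiers the study trains with its seven optimizers, and the function names, the toy data, and the 0.8 confidence threshold are all assumptions made for illustration. The loop fits a model on the small labelled set, pseudo-labels the unlabelled examples it is confident about, adds them to the labelled set, and refits. An F1 helper is included because the abstract reports results as an F1 score.

```python
def centroids(X, y):
    """Fit the stand-in base classifier: one mean feature vector per class."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_with_confidence(model, x):
    """Return (label, confidence) from distances to the class centroids."""
    dists = {
        c: sum((a - b) ** 2 for a, b in zip(x, m)) ** 0.5
        for c, m in model.items()
    }
    label = min(dists, key=dists.get)
    total = sum(dists.values())
    # Hypothetical confidence measure: near 1.0 when x sits on its centroid.
    conf = 1.0 - dists[label] / total if total > 0 else 1.0
    return label, conf


def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=10):
    """Self-training loop: pseudo-label confident examples, then refit."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    model = centroids(X_lab, y_lab)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for x in pool:
            label, conf = predict_with_confidence(model, x)
            if conf >= threshold:
                X_lab.append(x)
                y_lab.append(label)  # pseudo-label, not ground truth
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:
            break  # nothing confident enough is left
        model = centroids(X_lab, y_lab)
    return model


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy usage: two labelled learners (1 = dropout), four unlabelled ones.
toy_model = self_train(
    [[0.0, 0.0], [10.0, 10.0]], [0, 1],
    [[0.5, 0.2], [9.5, 9.8], [1.0, 1.0], [9.0, 9.0]],
)
toy_pred = [predict_with_confidence(toy_model, x)[0]
            for x in [[2.0, 2.0], [8.0, 8.0]]]
print(toy_pred, f1_score([0, 1], toy_pred))
```

Any classifier that outputs a confidence could replace the centroid model. The threshold controls how aggressively pseudo-labels are accepted: a lower value adds more unlabelled data per round but raises the risk of reinforcing early mistakes.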




Open Computer Science is an open access, peer-reviewed journal. The journal publishes research results in the following fields: algorithms and complexity theory, artificial intelligence, bioinformatics, networking and security systems, programming languages, system and software engineering, and theoretical foundations of computer science.
