Life Scholarship Renewal

Predicting Eligibility through Machine Learning

 

Overview 

As a Data Modeler in the Office of Enrollment Information at the College of Charleston, my primary project consisted of developing processes and models for predicting a student’s risk of failing to maintain their in-state scholarship offering. While the sensitive nature of the data and the proprietary nature of the specific findings prevent sharing those aspects of the project directly, the remarks below cover each step of the process along with the results and my recommendations for the future of this project.

In addition to information available through a student’s application to the College, the project also utilizes the student’s course schedule and academic performance throughout their first semester to improve the model’s results. In doing so, three distinct time periods were identified for creating predictions and initiating possible intervention methods: Day One, Midterm, and Fall Final, corresponding to late August, early October, and mid-December, respectively.

 

Data Cleaning

The initial R Markdown file used to process the data for modeling takes as its input a series of Cognos reports covering each student’s standardized test scores, high school performance, demographic information, and upcoming course schedule. These disparate reports were each individually processed and prepared before imputing missing values and merging the resulting tables into one cohesive dataset used to train the random forest model that classifies students based on their risk of not maintaining scholarship eligibility.
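As a rough illustration, a minimal sketch of that merge-and-impute step in R is shown below; the file names, column names, and the median-imputation rule are placeholders rather than the project’s actual Cognos exports or imputation logic.

    library(dplyr)

    # Read the individual Cognos report exports (file names are hypothetical)
    test_scores  <- read.csv("cognos_test_scores.csv")
    hs_record    <- read.csv("cognos_high_school.csv")
    demographics <- read.csv("cognos_demographics.csv")

    # Merge the individually cleaned reports on a shared student identifier
    # (schedule-level features such as Average Difficulty are derived separately
    # and joined in afterward)
    model_data <- test_scores %>%
      left_join(hs_record,    by = "student_id") %>%
      left_join(demographics, by = "student_id")

    # Placeholder imputation: fill missing numeric values with the column median
    model_data <- model_data %>%
      mutate(across(where(is.numeric),
                    ~ ifelse(is.na(.x), median(.x, na.rm = TRUE), .x)))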

Among the necessary transformations, one of the most challenging proved to be combining the standardized test scores into one properly scaled and comparable feature for the dataset. Not only did it require scaling the ACT and SAT scores together to create one variable, but, due to the changes made to the SAT in 2015, the data cleaning file also had to account for which version of the test each student reported taking and treat that test’s scores accordingly, ensuring the proper scale was used for the resulting test score attribute.
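The project’s actual concordance between the ACT, the pre-2015 SAT, and the redesigned SAT is not reproduced here. As one hedged sketch of the general idea, scores could be standardized within each test version and each student’s best standardized score kept; test_scores_long and its columns are assumed names.

    library(dplyr)

    # test_scores_long is assumed to hold one row per reported test:
    #   student_id, test_type ("ACT", "SAT_pre2015", "SAT_post2015"), test_score
    test_score_feature <- test_scores_long %>%
      group_by(test_type) %>%
      mutate(score_scaled = as.numeric(scale(test_score))) %>%  # standardize within each test version
      ungroup() %>%
      group_by(student_id) %>%
      summarise(test_score_scaled = max(score_scaled),          # keep each student's best score
                .groups = "drop")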

In addition, one of the most beneficial side effects of this data cleaning process proved to be the creation of the novel Average Difficulty metric. The first step in creating this metric for a student’s schedule was aggregating the previous years’ LIFE Scholarship recipients’ final grades across their freshman year, transforming the grades onto a typical GPA scale, and averaging the grade across each student who completed the course. This average GPA earned in the course was then multiplied by the number of credit hours, ensuring a four-hour course was weighted as appropriately more time-consuming in a student’s schedule than a similar three-hour course. Finally, the weighted values across each student’s schedule were summed and divided by the number of courses to arrive at the final Average Difficulty for that schedule. This metric proved to be a powerful predictor of scholarship retention, ranking in the top three features by importance for every model deployed.
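A minimal sketch of that calculation follows, assuming a prior_grades table of previous LIFE recipients’ freshman grades (already on a 4.0 scale) and a schedule table with one row per incoming student per enrolled course; the table and column names are assumptions.

    library(dplyr)

    # Step 1: average GPA earned in each course by prior LIFE Scholarship recipients
    course_difficulty <- prior_grades %>%
      group_by(course_id) %>%
      summarise(avg_gpa      = mean(grade_points),
                credit_hours = first(credit_hours),
                .groups = "drop") %>%
      # Step 2: weight by credit hours so a four-hour course counts more than a three-hour one
      mutate(course_weight = avg_gpa * credit_hours)

    # Step 3: sum the weighted values across each incoming student's schedule
    # and divide by the number of courses
    avg_difficulty <- schedule %>%
      left_join(course_difficulty, by = "course_id") %>%
      group_by(student_id) %>%
      summarise(avg_difficulty = sum(course_weight) / n(), .groups = "drop")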

Tangentially, this Average Difficulty metric was also deployed in a subsequent project to identify pairs and trios of courses which, when taken in conjunction with one another, were detrimental to a student’s success. This new measure proved essential in finding frequently appearing subsets of courses that could instead be spread over two semesters, making a student’s first semester more manageable and alleviating pressure that might otherwise have impacted their scholarship retention.
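As a simplified sketch of how such pairs might be surfaced (using prior students’ retention rate directly rather than the Average Difficulty metric itself; prior_schedules and outcomes are assumed input tables):

    library(dplyr)
    library(tidyr)

    pair_retention <- prior_schedules %>%
      distinct(student_id, course_id) %>%
      group_by(student_id) %>%
      filter(n() >= 2) %>%                                  # need at least two courses to form a pair
      summarise(pair = list(combn(sort(as.character(course_id)), 2,
                                  paste, collapse = " + ")),
                .groups = "drop") %>%
      unnest(pair) %>%
      left_join(outcomes, by = "student_id") %>%            # outcomes: student_id, eligible (0/1)
      group_by(pair) %>%
      summarise(n_students     = n(),
                retention_rate = mean(eligible),
                .groups = "drop") %>%
      filter(n_students >= 20) %>%                          # keep only frequently appearing pairs
      arrange(retention_rate)                               # lowest retention first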

 

Model Tuning & Training

With the dataset in place, three models were then optimized and trained, each one representing one of the aforementioned time periods for potential intervention as new data became available. Each random forest featured an optimized mtry value, which denotes how many features are considered at random by a given decision tree within the forest. Further, each model used a unique classification cutoff value tuned to reflect the intent of intervention at that point in time. For instance, the Day One model, given the relative uncertainty due to the lack of grade data, focuses on casting a wide net and classifying as many Ineligible students as possible, even at the expense of misclassifying more students who maintain their eligibility. By contrast, the Midterm and Fall Final models pivot toward more aggressive cutoff values as the intervention methods shift to more in-depth conversations and require targeting a smaller subset of students.
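A hedged sketch of that tuning step using the randomForest package is below; the train data frame, the eligibility response, and the exact cutoff values are assumptions meant to mirror the description above rather than the production code.

    library(randomForest)

    set.seed(2017)
    predictors <- train[, setdiff(names(train), "eligibility")]

    # Search over mtry using out-of-bag error
    tuned     <- tuneRF(x = predictors, y = train$eligibility,
                        stepFactor = 1.5, improve = 0.01, ntreeTry = 500,
                        trace = FALSE, plot = FALSE)
    best_mtry <- tuned[which.min(tuned[, "OOBError"]), "mtry"]

    # Day One model: a lenient cutoff casts a wider net for Ineligible students.
    # The cutoff vector follows the order of levels(train$eligibility),
    # assumed here to be c("Eligible", "Ineligible").
    day_one_rf <- randomForest(eligibility ~ ., data = train,
                               mtry = best_mtry, ntree = 500,
                               cutoff = c(0.60, 0.40),
                               importance = TRUE)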

In addition, one of the benefits of using a random forest is its ability to handle attributes with multicollinearity (multiple attributes highly correlated with one another), which occurs several times throughout the dataset used for this project. Like more traditional statistical models such as logistic regression, the random forest also outputs a probability for each student’s eligibility, here the proportion of trees voting for each class. This allowed us to use these percentages as a measure of “Predicted Risk” for a given student: essentially, how confident the model is in the student’s eligibility. Not only does this allow for more nuance than setting a cutoff point and pushing all students on either side of the threshold to one extreme or the other, but it also allows for highlighting particular subsets of students based on the model’s certainty and the timing of implementation.
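Continuing the sketch above, the class-vote proportions can be pulled directly from the fitted model and treated as a Predicted Risk score; the new_students frame and the 0.35–0.65 band below are purely illustrative.

    # Share of trees voting for each class, usable as a probability-style score
    vote_probs <- predict(day_one_rf, newdata = new_students, type = "prob")

    risk_scores <- data.frame(
      student_id     = new_students$student_id,
      predicted_risk = vote_probs[, "Ineligible"]   # share of trees voting Ineligible
    )

    # e.g., flag the students the model is least certain about for targeted outreach
    uncertain <- subset(risk_scores, predicted_risk > 0.35 & predicted_risk < 0.65)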

 

Model Evaluation

Throughout this project, results are considered in two different ways, both provided as outputs from the random forest model.

First, using the cutoff value assigned to the random forest, one can set the threshold for what percentage of decision trees within the forest must vote for a particular student to be classified as Eligible. Using the default cutoff, the model classifies Eligible students particularly well but struggles to classify Ineligible students with any deployable level of precision. By tuning the cutoff value, we can find a better balance in which both Eligible and Ineligible students are classified relatively well without sacrificing one class’s error rate for the other.

Cutoff = (.60, .40)    OOB Error    Accuracy    Sensitivity    Specificity
Day One                  30.47%       0.71         0.72           0.69
Midterm                  22.49%       0.78         0.81           0.72
Fall Final               16.44%       0.82         0.85           0.78

To highlight the value of considering various cutoff values, if one changes the cutoff to (.65, .35), the Day One model’s accuracy drops by 3%, but its specificity increases by 0.06, meaning it correctly classifies 75% of Ineligible students instead of 69%. For the 2017 cohort, this translates to identifying 189 of the 252 students who wind up Ineligible by year’s end before they’ve even taken a single course. The tradeoff, however, is that in doing so the sensitivity drops by 0.09, down to correctly identifying only 63% of Eligible students. For the Midterm and Fall Final models, this new cutoff results in a roughly 3% drop in Sensitivity for a similar increase in Specificity.

Cutoff = (.65, .35)    OOB Error    Accuracy    Sensitivity    Specificity
Day One                  32.36%       0.68         0.63           0.75
Midterm                  24.01%       0.77         0.78           0.76
Fall Final               17.60%       0.81         0.81           0.81

As these tables show, the evaluation of the three models depends largely on the cutoff values chosen, which can in turn be determined by the requirements of the intervention methods. Accuracy is therefore an adequate single metric for comparing the information gained as time passes and for comparing tuning parameters within each individual model, but in practice Specificity is likely a better gauge of the model’s real-world success, as it measures how well the model identifies Ineligible students.
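For reference, a minimal sketch of how figures like these could be recovered from a fitted model’s out-of-bag confusion matrix, continuing the assumed day_one_rf object and class labels from the sketches above:

    # Rows of $confusion are actual classes, columns are OOB-predicted classes
    conf <- day_one_rf$confusion[, c("Eligible", "Ineligible")]

    accuracy    <- sum(diag(conf)) / sum(conf)
    oob_error   <- 1 - accuracy
    sensitivity <- conf["Eligible", "Eligible"]     / sum(conf["Eligible", ])    # Eligible correctly kept
    specificity <- conf["Ineligible", "Ineligible"] / sum(conf["Ineligible", ])  # Ineligible correctly caught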

 

Recommendations

Currently, no intervention methods have been put in place based on the results of this project. Listed below, for the time period following each model’s deployment, are considerations for how those positioned to work alongside students could engage with those at risk of losing their scholarship.

Day One

In the opening week of the fall semester in August, the model would likely best serve engagement tactics that are generalized and broad in scope. Intervention may include additional communication during orientation, advising, or the first weeks of courses prior to the Add/Drop window closing. Such tactics could reach students across the spectrum of predicted risk without a considerable amount of additional resources expended. If available resources allowed a more involved intervention, focusing on the subset of students considered high-risk at this early stage is recommended, as this is a smaller cohort and gives those most at risk the most time to make adjustments.

Midterm

By the midpoint of the semester, the model’s results are increasingly confident for many of the students predicted to sit at either extreme of the spectrum of predicted risk. This provides an opportunity for intervention to continue ongoing efforts with students already engaged, but also to shift focus to individuals for whom the model still expresses uncertainty. Not only will this be a smaller portion of students than in the Day One model, but they may also prove to be students who need less involved efforts to flip from Ineligible to Eligible by the end of the year. Specifically, the timing of midterm grades lines up well with course registration for the following semester, so intervention could center on more intentional advising and consideration of the difficulty of the courses selected. It may also be an appropriate time to begin considering summer courses, either for meeting the minimum hour threshold or for improving GPA, based on Midterm projections of the fall semester’s final grades.

Fall Final

By the time the semester’s final grades are submitted and added to the dataset, it will likely be the onset of the spring semester before any intervention can be deployed. These interactions may therefore take the form of working with students whose fall GPA or level of risk is markedly different from their midterm projections, and considering alterations to their spring schedule before the Add/Drop period ends. Further, this would be an ideal time to discuss more concretely the options provided by a summer semester and to determine more definitively what grades a student would need to earn in the remaining semesters to maintain eligibility, now that the first semester’s grades are set in stone with regard to the student’s LIFE GPA.