As part of the 2017 CLPsych Shared Task, I worked with my lab-mates at CAMH to develop an automatic classifier that assigns forum posts from ReachOut.com to one of four urgency levels, so that moderators know which posts to prioritize based on possible risk to the poster. ReachOut is a message board where peers support each other in coping with mental health challenges.
We used TPOT, an automated machine learning tool which, given a set of features, uses genetic programming to search for a machine learning pipeline. Using TensorFlow and a pre-trained sentiment model from OpenAI, I extracted high-dimensional vectors representing the sentiment of each forum post. These vectors were used as inputs to the TPOT pipeline, along with manually chosen features such as time of day and season, both of which have been established as correlated with suicide risk.
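To illustrate how the manually chosen features could sit alongside the sentiment vector, here is a minimal sketch rather than the code we actually ran. The function names (`season_name`, `build_features`), the one-hot season encoding, and the choice of meteorological seasons are all assumptions made for this example; the `southern` flag defaults to Southern Hemisphere seasons only because ReachOut is an Australian service.

```python
from datetime import datetime
from typing import List

# Season order for the Southern Hemisphere, starting with the Dec-Feb block.
SEASONS = ["summer", "autumn", "winter", "spring"]


def season_name(month: int, southern: bool = True) -> str:
    """Map a month (1-12) to a meteorological season (hypothetical helper)."""
    # Dec/Jan/Feb -> 0, Mar-May -> 1, Jun-Aug -> 2, Sep-Nov -> 3
    block = (month % 12) // 3
    if not southern:
        # Northern seasons are offset by half a year.
        block = (block + 2) % 4
    return SEASONS[block]


def build_features(
    posted_at: datetime, sentiment: List[float], southern: bool = True
) -> List[float]:
    """Concatenate a sentiment vector with hour-of-day and a one-hot season."""
    season = season_name(posted_at.month, southern)
    one_hot = [1.0 if s == season else 0.0 for s in SEASONS]
    return list(sentiment) + [float(posted_at.hour)] + one_hot


if __name__ == "__main__":
    # A toy two-dimensional "sentiment vector"; the real vectors were much larger.
    row = build_features(datetime(2017, 7, 15, 23, 30), [0.1, -0.2])
    print(row)
```

Rows built this way can be stacked into a feature matrix and handed to a pipeline search tool. Encoding the hour as a raw number is the simplest option; a cyclical encoding (sine and cosine of the hour) would be a reasonable alternative.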
Though we did not win the competition, our models were generally competitive, and we achieved the best recall (true positive rate) when separating posts that required moderator action from those that did not.
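For readers unfamiliar with the metric, the sketch below shows how recall on the "needs moderator action" versus "no action needed" split can be computed by collapsing the four urgency levels into two groups. It is an illustration only, not the shared task's official scoring script: the function name `flagged_recall` and the label strings in the example are placeholders I chose, with one label (`"green"` here) assumed to mean that no action is needed.

```python
from typing import Sequence


def flagged_recall(
    true_labels: Sequence[str],
    predicted_labels: Sequence[str],
    negative_label: str = "green",
) -> float:
    """Recall for the binary 'flagged vs. not flagged' view of the labels.

    Any label other than `negative_label` counts as flagged, so a post is a
    true positive if it truly needs attention and the model assigns it any
    flagged level, even if that is not the exact level.
    """
    true_positives = 0
    false_negatives = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == negative_label:
            continue  # Recall only looks at posts that truly need action.
        if predicted != negative_label:
            true_positives += 1
        else:
            false_negatives += 1
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0


if __name__ == "__main__":
    truth = ["green", "amber", "red", "crisis", "green"]
    preds = ["green", "red", "green", "crisis", "amber"]
    # Three posts truly need action; the model flags two of them.
    print(flagged_recall(truth, preds))
```

High recall on this split means few posts that need a moderator are missed, which is arguably the costlier kind of error in this setting; it says nothing about how many harmless posts are flagged unnecessarily.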
Contributions: Implementation, Data Analysis, Report Writing. I worked with Derek Howard, Geoffrey Woollard and Prof. Leon French on this project. Images courtesy of ReachOut and the UPenn Epistasis Lab.