Integrated Capstone Project · MS in Data Science, Boston University · Dec 2025

Deconstruct Hate Speech

This capstone tackles a deceptively hard problem: can a model reliably tell when online speech crosses the line into hate speech, abuse, or toxicity — and which signals actually predict it?

I trained classifiers across five independent datasets (ConvAbuse, Dynamically Generated Hate Speech, Hate Speech in the 2020 U.S. Elections, Multilingual and Multi-Aspect Hate Speech, and Online Abusive Attacks), combining metadata features with sentence embeddings to predict hate speech, abuse level, toxicity, and the number of target groups impacted. Accuracy ranged from 70% (hate speech) to 99.8% (toxicity), and both feature-importance and SHAP analyses consistently pointed to sentiment as a key driver of harmful content.
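To make the setup concrete, here is a minimal sketch of the general idea, not the project's actual code. It joins one metadata feature (sentiment) to a stand-in embedding vector, fits a small logistic regression, and ranks features by permutation importance. Everything in it is an assumption for illustration: the data is synthetic, the "embedding" is random noise standing in for a real sentence embedding, the feature names are hypothetical, and permutation importance is swapped in for SHAP so the example needs only the standard library.

```python
import math
import random

FEATURE_NAMES = ["sentiment", "emb_0", "emb_1", "emb_2"]  # hypothetical names

def make_rows(n, rng):
    """Build synthetic rows: [sentiment] + 3-dim placeholder embedding."""
    rows, labels = [], []
    for _ in range(n):
        sentiment = rng.uniform(-1.0, 1.0)                   # metadata feature
        embedding = [rng.gauss(0.0, 1.0) for _ in range(3)]  # stand-in for a sentence embedding
        # Toy assumption: negative sentiment (plus a little noise) drives the label.
        label = 1 if sentiment + 0.1 * rng.gauss(0.0, 1.0) < 0 else 0
        rows.append([sentiment] + embedding)                 # concatenate metadata and embedding
        labels.append(label)
    return rows, labels

def predict_proba(weights, bias, row):
    z = bias + sum(w * x for w, x in zip(weights, row))
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, labels, epochs=200, lr=0.5):
    """Logistic regression fitted with plain batch gradient descent."""
    n_features = len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for row, y in zip(rows, labels):
            err = predict_proba(weights, bias, row) - y
            grad_b += err
            for j, x in enumerate(row):
                grad_w[j] += err * x
        bias -= lr * grad_b / len(rows)
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
    return weights, bias

def accuracy(weights, bias, rows, labels):
    hits = sum((predict_proba(weights, bias, r) >= 0.5) == bool(y)
               for r, y in zip(rows, labels))
    return hits / len(rows)

def permutation_importance(weights, bias, rows, labels, rng):
    """Accuracy drop when one feature column is shuffled across rows."""
    base = accuracy(weights, bias, rows, labels)
    drops = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        drops.append(base - accuracy(weights, bias, shuffled, labels))
    return drops

rng = random.Random(0)
rows, labels = make_rows(400, rng)
train_rows, train_labels = rows[:300], labels[:300]
test_rows, test_labels = rows[300:], labels[300:]

weights, bias = train(train_rows, train_labels)
test_accuracy = accuracy(weights, bias, test_rows, test_labels)
importance = dict(zip(FEATURE_NAMES,
                      permutation_importance(weights, bias, test_rows, test_labels, rng)))

print(f"held-out accuracy: {test_accuracy:.2f}")
for name, drop in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name:>10}: accuracy drop {drop:+.2f}")
```

Because the toy labels are built from sentiment, that feature dominates the ranking by construction. On real data, the ranking is what the analysis is trying to discover.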

My takeaway: platforms don’t need one universal hate-speech detector. Different signals — sentiment, parent-post toxicity, annotator-labeled context — predict different aspects of harmful content, and a moderation pipeline that combines them can catch more without over-flagging legitimate speech.
