DEBAGREEMENT is a dataset of 42,894 comment-reply pairs from the popular discussion website Reddit, each labeled as agree, neutral, or disagree. We gathered interactions from five forums: r/BlackLivesMatter, r/Brexit, r/Climate, r/Democrats, and r/Republican. The comment pairs for each forum were chosen so that, taken together, they form a user interaction graph.
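To make the idea of a user interaction graph concrete, here is a minimal sketch of how labeled comment-reply pairs could be turned into directed edges between users. The field names (`parent_author`, `reply_author`, `label`) and the sample rows are assumptions made for illustration. They are not the dataset's actual schema or contents.

```python
from collections import defaultdict

# Hypothetical records: field names and rows are illustrative only,
# not taken from the DEBAGREEMENT files.
pairs = [
    {"parent_author": "alice", "reply_author": "bob", "label": "disagree"},
    {"parent_author": "bob", "reply_author": "carol", "label": "agree"},
    {"parent_author": "alice", "reply_author": "bob", "label": "neutral"},
]


def build_interaction_graph(pairs):
    """Map each directed edge (reply_author, parent_author) to its labels."""
    graph = defaultdict(list)
    for pair in pairs:
        edge = (pair["reply_author"], pair["parent_author"])
        graph[edge].append(pair["label"])
    return dict(graph)


graph = build_interaction_graph(pairs)
for (source, target), labels in graph.items():
    print(f"{source} -> {target}: {labels}")
```

Each edge points from the replying user to the user being replied to, and repeated interactions between the same two users accumulate on one edge. A graph-based model could then consume these edges alongside the comment text.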
DEBAGREEMENT is challenging for Natural Language Processing (NLP) systems because it contains slang, irony, and topic-specific humor, all of which are common in online discussions. We compared the performance of state-of-the-art language models on a (dis)agreement detection task and examined how they use the contextual information available to them during training (graph, authorship, and temporal information).
Recent research shows that context, such as social context or knowledge graph information, helps language models perform better on downstream NLP tasks. In that light, DEBAGREEMENT offers new opportunities to combine graph-based and text-based machine learning techniques for detecting agreement and disagreement online.