What is TweetQA?

Social media has become an increasingly popular platform on which news and real-time events are reported, so automated question answering systems are critical to the effectiveness of many applications that rely on real-time knowledge. While previous question answering (QA) datasets have concentrated on formal text such as news articles and Wikipedia, we present the first large-scale dataset for QA over social media data. To ensure that the tweets are meaningful and contain interesting information, we gather tweets used by journalists to write news articles. We then ask human annotators to write questions and answers about these tweets. Unlike other QA datasets such as SQuAD, in which the answers are extractive, we allow the answers to be abstractive: the task requires a model to read a short tweet and a question and to output a text phrase (which need not appear in the tweet) as the answer.
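To make the abstractive setting concrete, here is a minimal sketch of the task interface in Python. The instance in the comment is invented for illustration and is not drawn from the dataset; this is not one of the official baselines.

def answer_question(tweet: str, question: str) -> str:
    """Map a (tweet, question) pair to a free-form answer string.

    Unlike extractive QA (e.g. SQuAD), the returned phrase does not
    have to be a contiguous span of the tweet.
    """
    raise NotImplementedError  # a real system plugs in a model here

# Invented example, not from the dataset:
# answer_question("Thrilled to open our new office in Austin today!",
#                 "Where was the office opened?")  ->  "Austin"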

TweetQA Paper (Xiong et al. ACL '19)


TweetQA Example


Comparison

1000 most distinctive domain words from SQuAD.


1000 most distinctive domain words from TweetQA.


Getting Started

We've built a few resources to help you get started with the dataset.

Download a copy of the dataset (distributed under the CC BY-SA 4.0 license):
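Once a JSON split is downloaded, it can be inspected with a few lines of Python. This is a minimal sketch: the filename and the field names (Tweet, Question, Answer, qid) reflect our assumptions about the released format and should be checked against the actual files.

import json

# Load one split of TweetQA (assumed filename; adjust to your copy).
with open("train.json", encoding="utf-8") as f:
    data = json.load(f)

# Each record is assumed to carry a tweet, a question, a list of
# human-written reference answers, and a question id.
for record in data[:3]:
    print("Tweet:   ", record["Tweet"])
    print("Question:", record["Question"])
    print("Answers: ", record["Answer"])
    print("qid:     ", record["qid"])
    print()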

To evaluate your models on the official test set, please visit our CodaLab competition site:

CodaLab Submission
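Submissions are scored with BLEU-1, METEOR, and ROUGE-L against the human-written answers (see the leaderboard below). For a rough local sanity check before submitting, the two n-gram metrics can be approximated in a few lines. This is a simplified sketch with plain whitespace tokenization and a single reference per question, not the official CodaLab scorer; METEOR is omitted because it typically requires an external package.

from collections import Counter
import math

def bleu1(candidate: str, reference: str) -> float:
    # Single-reference BLEU-1: unigram precision with brevity penalty.
    cand, ref = candidate.split(), reference.split()
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if not cand or overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

def rouge_l(candidate: str, reference: str) -> float:
    # ROUGE-L: F1 over the longest common subsequence of the two texts.
    cand, ref = candidate.split(), reference.split()
    dp = [[0] * (len(ref) + 1) for _ in range(len(cand) + 1)]
    for i, cw in enumerate(cand, 1):
        for j, rw in enumerate(ref, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if cw == rw else max(dp[i-1][j], dp[i][j-1])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    p, r = lcs / len(cand), lcs / len(ref)
    return 2 * p * r / (p + r)

# Toy check with invented strings:
print(bleu1("in paris", "paris"))    # 0.5
print(rouge_l("in paris", "paris"))  # ~0.667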

Citations

Please cite our paper as follows if you use the TweetQA dataset.


@inproceedings{xiong2019tweetqa,
  title={TweetQA: A Social Media Focused Question Answering Dataset},
  author={Xiong, Wenhan and Wu, Jiawei and Wang, Hong and Kulkarni, Vivek and Yu, Mo and Guo, Xiaoxiao and Chang, Shiyu and Wang, William Yang},
  booktitle={Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
  year={2019}
}

Have Questions?

For dataset issues, please email xwhan@cs.ucsb.edu.
For submission issues, please email Hong Wang at hongwang600@cs.ucsb.edu.


Leaderboard

Rank  Date           Model                                            BLEU-1  METEOR  ROUGE-L
-     -              EXTRACT-Upperbound (Xiong et al. ACL '19)        75.1    69.8    75.6
-     -              Human Performance (Xiong et al. ACL '19)         70.0    66.7    73.5
1     July 01, 2019  BERT Base Baseline (Xiong et al. ACL '19)        61.4    58.6    64.1
2     July 01, 2019  Generative Baseline (Xiong et al. ACL '19)       36.1    31.8    39.0
3     July 01, 2019  BiDAF Baseline (Xiong et al. ACL '19)            34.9    31.4    38.6
4     July 01, 2019  Query-Matching Baseline (Xiong et al. ACL '19)   11.2    12.1    17.4