Question Size and Corpus Size

| Year | Dataset | Question size | #Train questions | #Dev questions | #Test questions | Corpus size | #Train corpus | #Dev corpus | #Test corpus | Unit of corpus |
|---|---|---|---|---|---|---|---|---|---|---|
| 2013 | MCTest mc160 | 640 | 280 | 120 | 240 | 160 | 70 | 30 | 60 | Stories |
| 2013 | MCTest mc500 | 2000 | 1200 | 200 | 600 | 500 | 300 | 50 | 150 | Stories |
| 2015 | CNN | 387420 | 380298 | 3924 | 3198 | 92579 | 90266 | 1220 | 1093 | Documents |
| 2015 | CuratedTREC | 2180 | 1486 | N/A | 694 | N/A | N/A | N/A | N/A | N/A |
| 2015 | Daily Mail | 997467 | 879450 | 64835 | 53182 | 219506 | 196961 | 12148 | 10397 | Documents |
| 2015 | WikiQA | 3047 | 2118 | 296 | 633 | 29258 | 20360 | 2733 | 6165 | Sentences |
| 2016 | BookTest | 14160825 | 14140825 | 10000 | 10000 | 14062 | N/A | N/A | N/A | Books |
| 2016 | Facebook CBT | 687K | 669343 | 8000 | 10000 | 108 | 98 | 5 | 5 | Books |
| 2016 | Google MC-AFP | 1742618 | 1727423 | 7602 | 7593 | 1742618 | 1727423 | 7602 | 7593 | Passages |
| 2016 | LAMBADA | 10022 | 2662 | 4869 | 5153 | 12684 | 2662 | 4869 | 5153 | Passages |
| 2016 | MovieQA | 21406 | 14166 | 2844 | 4396 | 548 | 362 | 77 | 109 | Movies |
| 2016 | MS MARCO | 1010916 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | Passages |
| 2016 | NewsQA | 119633 | 107K | 6K | 6K | 1010916 | 909824 | 50546 | 50546 | Documents |
| 2016 | SQuAD1.1 | 100K | 87599 | 10570 | 9533 | 536 | 442 | 48 | 46 | Articles |
| 2016 | Who-did-What | 200K | 127786 | 10000 | 10000 | 147786 | 127786 | 10000 | 10000 | Passages |
| 2016 | WikiMovies | 100K | 96185 | N/A | 9952 | N/A | N/A | N/A | N/A | N/A |
| 2016 | WikiReading | 18.87M | 16.03M | 1.89M | 0.95M | 4.7M | N/A | N/A | N/A | Articles |
| 2017 | COMICS | N/A | N/A | N/A | N/A | 3948 | N/A | N/A | N/A | Books |
| 2017 | NarrativeQA | 46765 | 32747 | 3461 | 10557 | 1572 | 1102 | 115 | 355 | Documents |
| 2017 | Qangaroo-MEDHOP | 2508 | 1620 | 342 | 546 | 2508 | 1620 | 342 | 546 | Passages |
| 2017 | Qangaroo-WIKIHOP | 51318 | 43738 | 5129 | 2451 | 51318 | 43738 | 5129 | 2451 | Passages |
| 2017 | Quasar-S | 37362 | 31049 | 3174 | 3139 | 37362 | 31049 | 3174 | 3139 | Passages |
| 2017 | Quasar-T | 43013 | 37012 | 3000 | 3000 | 43012 | 37012 | 3000 | 3000 | Passages |
| 2017 | RACE | 97687 | 87866 | 4887 | 4934 | 27933 | 25137 | 1389 | 1407 | Passages |
| 2017 | SciQ | 13679 | 11679 | 1000 | 1000 | 0 | N/A | N/A | N/A | Passages |
| 2017 | SearchQA | 140461 | 99820 | 13393 | 27248 | 140461 | 99820 | 13393 | 27248 | Passages |
| 2017 | Textbook Question Answering (TQA) | 26260 | 15154 | 5309 | 5797 | 1076 | 666 | 200 | 210 | Lessons |
| 2017 | TriviaQA (Wiki) | 77582 | 61888 | 7993 | 7701 | 138538 | 110648 | 14229 | 13661 | Documents |
| 2017 | TriviaQA (Web) | 95956 | 76496 | 9951 | 9509 | 662659 | 528979 | 68621 | 65059 | Documents |
| 2018 | ARC-Challenge Set | 2590 | 1119 | 299 | 1172 | N/A | N/A | N/A | N/A | Passages |
| 2018 | ARC-Easy Set | 5197 | 2251 | 570 | 2376 | N/A | N/A | N/A | N/A | Passages |
| 2018 | CliCR | 104919 | 91344 | 6391 | 7184 | N/A | N/A | N/A | N/A | Passages |
| 2018 | CLOTH | 99433 | 76850 | 11067 | 11516 | 7131 | 5513 | 805 | 813 | Passages |
| 2018 | CoQA | 127K | 110K | 7K | 10K | 8399 | 7199 | 500 | 700 | Passages |
| 2018 | DuoRC-Paraphrase | 100316 | 70K | 15K | 15K | N/A | N/A | N/A | N/A | Passages |
| 2018 | DuoRC-Self | 85773 | 60K | 12K | 12K | N/A | N/A | N/A | N/A | Passages |
| 2018 | HotpotQA (Distractor Setting) | 113K | 90564 | 7405 | 7405 | N/A | N/A | N/A | N/A | Examples |
| 2018 | HotpotQA (Fullwiki Setting) | 113K | 90564 | 7405 | 7405 | N/A | N/A | N/A | N/A | Examples |
| 2018 | MCScript | 13939 | 9731 | 1411 | 2797 | 0 | N/A | N/A | N/A | Passages |
| 2018 | MultiRC | 6K | 5147 | 953 | N/A | N/A | N/A | N/A | N/A | Passages |
| 2018 | OpenBookQA | 5957 | 4957 | 500 | 500 | N/A | N/A | N/A | N/A | Passages |
| 2018 | PaperQA-Last Sentence (Park et al., 2018) | 80118 | 71804 | 4179 | 4135 | N/A | N/A | N/A | N/A | Passages |
| 2018 | PaperQA-Title (Park et al., 2018) | 84803 | 77298 | 3752 | 3753 | N/A | N/A | N/A | N/A | Passages |
| 2018 | ProPara | 488 | 391 | 54 | 43 | N/A | N/A | N/A | N/A | Passages |
| 2018 | QuAC | 98407 | 83568 | 7354 | 7353 | 8845 | 6843 | 1000 | 1002 | Unique sections |
| 2018 | RecipeQA | 36K | 29657 | 3562 | 3567 | 19779 | 15847 | 1963 | 1969 | Recipes |
| 2018 | ReCoRD | 120730 | 100730 | 10000 | 10000 | 80121 | 65709 | 7133 | 7279 | Passages |
| 2018 | ReviewQA | 587492 | 528665 | N/A | 58827 | 100000 | 90000 | N/A | 10000 | Documents |
| 2018 | SciTail | 1834 | 1542 | 121 | 171 | 27026 | 23596 | 1304 | 2126 | Examples |
| 2018 | SQuAD2.0 | 151054 | 130319 | 11873 | 8862 | 505 | 442 | 35 | 28 | Articles |
| 2019 | CommonSenseQA | 12247 | 6259 | 2449 | 2449 | 0 | N/A | N/A | N/A | Passages |
| 2019 | DREAM | 10197 | 6116 | 2040 | 2041 | 6444 | 3869 | 1288 | 1287 | Dialogues |
| 2019 | DROP | 96567 | 77409 | 9536 | 9622 | 6735 | 5565 | 582 | 588 | Passages |
| 2019 | Natural Questions (Long answer) | 323045 | 307373 | 7830 | 7842 | 323045 | 307373 | 7830 | 7842 | Wikipedia pages |
| 2019 | Natural Questions (Short answer) | 323045 | 307373 | 7830 | 7842 | 323045 | 307373 | 7830 | 7842 | Wikipedia pages |
| 2019 | ShARC | 948 | 628 | 69 | 251 | 32436 | 21890 | 2270 | 8276 | Utterances |
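
The table mixes exact counts (640), rounded counts with suffixes (687K, 18.87M), and N/A cells, so comparing a total against its train/dev/test splits takes a small normalization step. The following is a minimal sketch of such a check, not part of the original survey; the helper names `parse_size` and `split_gap` are hypothetical, and the sample rows are copied from the table above.

```python
from typing import Optional

# Multipliers for the rounded cells in the table (e.g. "687K", "18.87M").
_SUFFIXES = {"K": 1_000, "M": 1_000_000}


def parse_size(cell: str) -> Optional[float]:
    """Convert a cell such as '640', '100k', '18.87M' or 'N/A' to a number.

    Returns None for missing values ('N/A' or an empty cell).
    """
    text = cell.strip().upper()
    if text in {"N/A", ""}:
        return None
    if text[-1] in _SUFFIXES:
        return float(text[:-1]) * _SUFFIXES[text[-1]]
    return float(text)


def split_gap(total: str, train: str, dev: str, test: str) -> Optional[float]:
    """Return total - (train + dev + test), or None if any cell is missing."""
    values = [parse_size(c) for c in (total, train, dev, test)]
    if any(v is None for v in values):
        return None
    return values[0] - sum(values[1:])


# Question-size columns for a few rows, copied from the table.
rows = {
    "MCTest mc160": ("640", "280", "120", "240"),
    "SQuAD2.0": ("151054", "130319", "11873", "8862"),
    "Quasar-T": ("43013", "37012", "3000", "3000"),
    "CuratedTREC": ("2180", "1486", "N/A", "694"),
}

for name, cells in rows.items():
    print(name, split_gap(*cells))
```

A gap of zero means the splits add up to the listed total. For rows whose total is rounded (for example 687K), a nonzero gap is expected and reflects only the rounding. Rows with an N/A split are skipped rather than treated as zero, which is a design choice of this sketch.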