: A large-scale dataset containing approximately 92,000 computer science papers from 31 major conferences. It includes AI-generated summaries (GPT-3.5) designed for large-scale scientometric studies and automated literature reviews.
If you are looking for "txt" files related to AI crawling, you might be interested in the proposal.
: A massive collection of 1.14 billion content regions from historical American newspaper articles. It is used for training large language models (LLMs) and exploring world history.