Sunday, 14 December 2025

Germany 100k.zip

This dataset typically contains articles extracted from German Wikipedia. While exact versions vary (such as the dataset hosted on Hugging Face), these files generally include approximately 100,000 documents with titles, tables, and images removed to provide clean, plain text.

Many versions include a brief summary for each article, allowing models to be trained on how to condense information.

These datasets often represent millions of individual word tokens, making them suitable for training small-to-medium scale language models.

The dataset is widely used by researchers for tasks such as text summarization, where it provides a large corpus for both extractive and abstractive techniques.