Sports Named Entity Recognition (Sports NER) Dataset
This dataset is designed for Named Entity Recognition (NER) tasks in the sports domain, focusing on extracting structured information from unstructured sports-related text. Named Entity Recognition is a fundamental task in Natural Language Processing (NLP) that identifies and classifies key elements such as names, locations, and events within text data.
The dataset contains annotated sentences covering a wide range of sports contexts, including match descriptions, player performances, tournament details, and rule-based scenarios.
Each text sample is labeled with domain-specific entity categories, including:
- Player Name
- Team Name
- Tournament Name
- Location
- Equipment Name
- Rules or Penalty
- Common Sports Terms (CST)
- Date and Time
These annotations enable the development of customized NER models tailored to sports analytics, where traditional general-purpose NER systems may not perform effectively due to domain-specific terminology.
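As a minimal sketch of how the categories above could be turned into a label set for a token-classification model, the following builds a BIO tag inventory with id mappings. The short label codes (such as `PLAYER` or `RULE_PENALTY`) are assumptions made for illustration; the label strings and file format actually used in the dataset should be checked against the downloaded files.

```python
# Hypothetical short codes for the eight categories listed above;
# the dataset's real label strings may differ.
ENTITY_TYPES = [
    "PLAYER", "TEAM", "TOURNAMENT", "LOCATION",
    "EQUIPMENT", "RULE_PENALTY", "CST", "DATE_TIME",
]


def build_bio_labels(entity_types):
    """Return 'O' followed by a B- and an I- tag for each entity type."""
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")
        labels.append(f"I-{entity_type}")
    return labels


LABELS = build_bio_labels(ENTITY_TYPES)
# Mappings between tag strings and integer ids, as most training code expects.
LABEL2ID = {label: idx for idx, label in enumerate(LABELS)}
ID2LABEL = {idx: label for label, idx in LABEL2ID.items()}
```

With eight entity types this yields 17 tags in total: one `O` tag plus a `B-`/`I-` pair per type.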
The dataset can be used for various applications such as:
- Sports analytics and information extraction
- Intelligent sports news summarization
- Chatbots and question-answering systems
- Automated commentary analysis
- Knowledge graph construction in the sports domain
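Applications such as information extraction or knowledge graph construction typically need entity spans rather than per-token tags. The sketch below shows one way to group BIO-tagged tokens into `(text, type)` pairs. It assumes a BIO tagging scheme and whitespace tokenization, neither of which is confirmed for this dataset, and the example sentence and tags are invented for illustration rather than taken from the data.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs.

    An I- tag that does not continue an open span of the same type
    is treated like 'O' and dropped.
    """
    spans = []
    current_tokens, current_type = [], None

    def flush():
        # Emit the open span, if any.
        if current_tokens:
            spans.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            flush()
            current_tokens, current_type = [], None
    flush()
    return spans


# Invented example; the label names are hypothetical.
example_tokens = ["Messi", "scored", "in", "Buenos", "Aires"]
example_tags = ["B-PLAYER", "O", "O", "B-LOCATION", "I-LOCATION"]
example_spans = bio_to_spans(example_tokens, example_tags)
```

The resulting pairs can then serve as nodes in a graph or as fields in a structured record.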
This dataset is publicly available on Mendeley Data and can be accessed at the following URL: https://data.mendeley.com/datasets/rcf4kbxtf8/2
License: Creative Commons Attribution 4.0 (CC BY 4.0)
If you use this dataset in your research or project, please cite the original source.