Wikimedia Research

Language-Agnostic Topic Classification

No guarantees are made that this tool will be maintained.

This is an experimental tool hosted on Toolforge. No additional personal data is collected by this tool per the Cloud Services Terms of Use.

This tool showcases various language-agnostic topic classification models -- i.e. these models can label a Wikipedia article with one or more high-level topics using a single model that can provide predictions for any language edition of Wikipedia.

Wikidata Topic Classification

This first experimental API provides topic predictions for Wikipedia articles based on their Wikidata items.

Because it relies on Wikidata items, it is language-neutral and works for any Wikipedia article (with a Wikidata item, which is almost all of them).

Because it relies on Wikidata items, it has little data to go on when an article's Wikidata item is sparse.

You can test out the API below -- for example, with Toni Morrison (Q72334), which will give you a list of the most likely topics for the article along with the model's confidence in each.
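Since a single article can receive more than one topic, a consumer of these predictions typically keeps every topic whose confidence clears some cutoff. The following is a minimal sketch of that step, assuming the predictions are available as (topic, confidence) pairs. The function name `select_topics`, the 0.5 threshold, and the placeholder topics and scores are all hypothetical and are not taken from the API itself.

```python
from typing import List, Tuple

Prediction = Tuple[str, float]  # (topic label, model confidence in [0, 1])


def select_topics(predictions: List[Prediction], threshold: float = 0.5) -> List[Prediction]:
    """Keep topics whose confidence is at least `threshold`, highest first."""
    kept = [(topic, score) for topic, score in predictions if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Placeholder values for illustration only; these are NOT real model output.
example = [("Topic A", 0.91), ("Topic B", 0.12), ("Topic C", 0.64)]
selected = select_topics(example, threshold=0.5)
```

A lower threshold returns more topics per article at the cost of more false positives; the right value depends on the use case.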

Wikipedia Topic Classification

This second experimental API provides topic predictions for Wikipedia articles based on their links to other Wikipedia articles (outlinks only).

It represents these links as Wikidata items -- e.g., a link to en:Menhaden is represented as Q218526.
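To make the link-to-item mapping concrete, here is a rough sketch of how an article's outlinks could be resolved to Wikidata IDs with the public MediaWiki Action API (`generator=links` combined with `prop=pageprops`). This is an assumption about one possible approach, not necessarily how this tool gathers its links. The helper names are hypothetical, and the sample response is hand-written for illustration rather than fetched.

```python
from typing import Any, Dict, List


def outlink_query_params(title: str) -> Dict[str, str]:
    """Build Action API parameters asking for the Wikidata item of each outlink."""
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",      # pages come back as a list
        "titles": title,
        "generator": "links",      # iterate over the article's outlinks
        "gplnamespace": "0",       # article namespace only
        "gpllimit": "max",
        "prop": "pageprops",
        "ppprop": "wikibase_item", # the linked page's Wikidata ID
        "redirects": "1",
    }


def extract_qids(response: Dict[str, Any]) -> List[str]:
    """Collect unique Wikidata IDs from a response, skipping pages without one."""
    qids: List[str] = []
    for page in response.get("query", {}).get("pages", []):
        qid = page.get("pageprops", {}).get("wikibase_item")
        if qid and qid not in qids:
            qids.append(qid)
    return qids


# Hand-written sample in the formatversion=2 shape (illustrative, not fetched).
sample_response = {
    "query": {
        "pages": [
            {"title": "Menhaden", "pageprops": {"wikibase_item": "Q218526"}},
            {"title": "Some red link", "missing": True},
        ]
    }
}
qids = extract_qids(sample_response)
```

The parameters would be sent to a language edition's `/w/api.php` endpoint (for example, on en.wikipedia.org). The sketch ignores result continuation, so an article with many links would need repeated requests.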

Because it relies on Wikidata items, it is language-neutral and works for any Wikipedia article. However, unlike the purely Wikidata-based model above, this model uses an article's links, so its predictions can differ depending on the language version of the article. It also does not require that the article itself have an associated Wikidata item.

Because it relies on outlinks, it may have little data to go on for new articles that have few links or link to rarely-linked items (see performance).

You can test out the API below -- for example, with Toni Morrison (en).