Downstream Wikipedia link usage and migration to Wikidata

MusicBrainz has linked to Wikipedia for many years and we now have links to Wikidata as well. Wikidata, however, acts as a central repository for Wikipedia links, so it does not make sense for MusicBrainz to maintain its own separate set of Wikipedia links, especially since Wikipedia URLs are not very stable (because of page moves and deletions) and require a lot of maintenance. Most of our data with Wikipedia links is now also linked to Wikidata, so we plan to start removing Wikipedia links where we have a Wikidata link which has the same Wikipedia link.

What this means for downstream data users:

If you use Wikipedia links, we will provide Wikidata links but you will need to fetch the Wikipedia links you want from Wikidata separately. Wikidata has information on ways to access their data at https://www.wikidata.org/wiki/Wikidata:Data_access
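
As a rough illustration of what this could look like for a data user, here is a minimal Python sketch that turns a Wikidata ID into a set of Wikipedia URLs. It assumes the Wikidata `wbgetentities` API action with `props=sitelinks/urls`, which is one of several access methods described on the page above. The function names and the User-Agent string are made up for this example and are not part of MusicBrainz or Wikidata.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_sitelinks_url(wikidata_id):
    """Build a wbgetentities request URL asking only for sitelinks with URLs."""
    params = {
        "action": "wbgetentities",
        "ids": wikidata_id,
        "props": "sitelinks/urls",
        "format": "json",
    }
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def extract_wikipedia_links(response, wikidata_id, languages=None):
    """Return {language: url} for the Wikipedia sitelinks in an API response.

    Sister projects (Wikiquote, Commons, ...) are skipped by checking the
    host name of each sitelink URL.
    """
    entity = response.get("entities", {}).get(wikidata_id, {})
    links = {}
    for sitelink in entity.get("sitelinks", {}).values():
        url = sitelink.get("url", "")
        host = urllib.parse.urlparse(url).hostname or ""
        if not host.endswith(".wikipedia.org"):
            continue
        language = host[: -len(".wikipedia.org")]
        if languages is None or language in languages:
            links[language] = url
    return links


def fetch_wikipedia_links(wikidata_id, languages=None):
    """Fetch the entity from Wikidata and return its Wikipedia links."""
    request = urllib.request.Request(
        build_sitelinks_url(wikidata_id),
        # Hypothetical client name; replace it with one identifying your tool.
        headers={"User-Agent": "example-mb-client/0.1 (hypothetical)"},
    )
    with urllib.request.urlopen(request) as reply:
        response = json.load(reply)
    return extract_wikipedia_links(response, wikidata_id, languages)
```

Calling `fetch_wikipedia_links("Q42", languages=["en", "de"])` would return only the English and German Wikipedia URLs for that item, if they exist. If you process many entities, batching IDs or using the Wikidata dumps is probably a better fit than one request per item.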

We plan to start removing the links after the schema change this month, starting with the less common languages and entity types. It will take a while to work through the existing links, so we don’t expect to start removing English links from artists until after the Autumn schema change.

We recognise that some people may have code which depends on these links – if you’re using these links and the above sounds problematic, please let us know how you’re using the data (which languages and entity types) and how much time you would need to support Wikidata.

21 thoughts on “Downstream Wikipedia link usage and migration to Wikidata”

  1. Although I understand your reasons, which are perfectly logical, there is one major reason I dislike this idea.
    Wikidata links are so difficult for an average user to find from the Wikipedia page that the barrier to entry is higher and contributions will decrease.
    For example, an increasing share of traffic to sites now comes from mobile devices, and from the mobile (.m.) Wikipedia page there is no way I can see of finding out what the Wikidata number is for that page/item of data. Even on the desktop page it is buried in a link (19th link down in a few dozen links).
    If you are to do this, please have a way of automatically converting the easy-to-enter Wikipedia URL to the Wikidata number.

  2. Yes, I totally agree with Rovastar. From the end user perspective it should not matter whether I enter a Wikipedia or Wikidata URL, the MB backend should take care of this. I also hope the MB interface will still provide the users easy to follow links to Wikipedia.

    I also don’t quite understand why breaking existing apps is really required here. Couldn’t the MB webservice continue to return Wikipedia links when a Wikidata entry is available? As far as I understand the server needs to fetch and probably cache those URLs anyway for displaying the Wikipedia snippet and providing the Wikipedia links to the user.

  3. There is certainly no plan to stop allowing people to paste Wikipedia links.

    Initially, pasting Wikipedia links will continue to work just like it currently does. The only difference is that we will start to remove Wikipedia links from entities which *also* have a Wikidata link. That won’t affect how users add the URLs.

    In the long term, we should make the conversion happen as soon as the user enters the Wikipedia URL. The ticket for that is http://tickets.musicbrainz.org/browse/MBS-8333 which discusses various options.

    @phwolfer: The idea has been mentioned a few times, but it seems to be a theoretical problem and if it is, it would be a huge waste of our very limited developer time to implement and maintain. If what we plan to do _is_ a problem for someone, this blog post is their chance to let us know. 🙂

  4. > There is certainly no plan to stop allowing people to paste Wikipedia links.

    Which wouldn’t be possible anyway, as many Wikipedia pages don’t have Wikidata items.

    I suppose a bot will be able to change WP URLs to WD URLs when they are available.

  5. @nikki,

    Ok but why are you not doing that ticket first?

    I often get confused by the order in which things happen at MusicBrainz.

    I have no real issues with having wikidata links instead where applicable but you should have all the other systems in place first.
    * work out what you will see in the summary section where the current wikipedia text is.
    * Have a system to automatically convert them.
    * have a system that stops people a wikipedia link If the same wikipedia link exists
    etc

  6. @Rovastar: We *are* doing that ticket before going ahead with this. Note that this blog post is not saying “we’re doing this *now*!” but rather it is saying “we’re going to do this Soon™” and asking for feedback from data users for whom it may be a problem. “Regular” users of the site should ideally not notice (much) of a difference once this actually goes into effect.

    “work out what you will see in the summary section where the current wikipedia text is.”
    As reosarevok said, the summary section is already pulling its data using Wikidata.

    “Have a system to automatically convert them.”
    There’s a bot currently adding WD links to all entities with WP links. Nikki already pointed out that there’s a ticket to convert WP links to WD ones on user-entry.

    “have a system that stops people a wikipedia link If the same wikipedia link exists”
    The current URL entry system already handles this.

  7. The blog post certainly reads like the removal will start right after the next schema change release (“We plan to start removing the links after the schema change this month, starting with the less common languages and entity types”).

  8. Sorry for the late reply.
    Yes, I very much read it as though this is happening very soon. And I read nikki’s comments about a “long term” solution for fixing the links as something that would happen months or years away, not days or weeks away.

  9. Yes, links to non-English Wikipedia editions will start to be removed soon. This will happen one link at a time, so the process will be spread out over a longer time. Freso got it wrong in his first paragraph.

    (Also, his last paragraph is technically right, but I guess Rovastar actually meant “have a system that stops people [add] a wikipedia link if the *corresponding Wikidata* link exists”. That will not be implemented, but a redundant Wikipedia link will be bot-removed at some point.)

  10. I am still a bit confused: will the removal of the Wikipedia links mean there won’t be any links to Wikipedia directly on the MB sites? Those are the links I use frequently when researching. Could those links still be displayed using the info from the Wikidata AR?

  11. Normal users do notice the change. I suppose that more people can read German than Georgian. For http://musicbrainz.org/label/098e4074-d539-4160-80e1-d65cee643edd (Cotta’sche Buchhandlung), the summary as given now won’t help them.
    Even people who read both languages probably prefer the German Wikipedia, because it has far more information.
    The last point is important, I think. It is hard for software to find out which Wikipedia articles are worth reading.

  12. I really don’t have a problem with the idea as such (Single Source of Truth, etc.), but I would really prefer if the UI wasn’t so horrible. If I go to https://musicbrainz.org/artist/063391e2-ffec-45fe-ba01-d4b7a838955d then the external link labelled “Q11879289” really doesn’t bring much hope. Also when I click on it against all better instincts in me, I get to https://www.wikidata.org/wiki/Q11879289 and I cannot say that it makes much more sense to the average user (at least, as much as I can pretend to be an average user). Could something be done about the UI, please?

  13. While I understand the motivations behind this move, I really dislike that the useful direct Wikipedia links are being removed. Having just the Wikidata link is for me (as an editor) a loss of time and information, because you end up on a very unfriendly page, with many links that no human has sorted.

    E.g. if I search for data about a band and find valuable information on the Spanish Wikipedia, I don’t want this to be replaced by a Wikidata link (because it doesn’t give any info about which specific page has the valuable information). Also, there are cases of UK bands better known in France than in the UK, with a very poor English Wikipedia page but a very instructive French page. A German MusicBrainz user/editor may not choose to look at the French page in this case.

    I would have dropped the Wikipedia links but stored and preserved a “hint” about which Wikipedia pages were thought useful, and re-used those “hints” when it comes to displaying links to Wikipedia obtained from Wikidata.
    Note, those hints are basically just which languages have useful info about this album/artist/place/etc…

    For each entity having a Wikidata link I would always display:
    – the Wikipedia link matching the language of the user, English for non-beta servers
    – the main languages of the country the entity is from (e.g. for an artist from Spain, Spanish ones; for a place in Germany, German ones; etc…), but matching country to language isn’t perfect, and this info isn’t always available
    – every Wikipedia matching “hints” given by editors, e.g. if an editor gave a hint about the French page for a place in Germany, the French Wikipedia link should be displayed as well.

  14. Erik: The language fallback does need improving. See http://tickets.musicbrainz.org/browse/MBS-8417 for suggestions.

    Zas: We’ve never had a way to say which pages are considered good articles, we’ve always linked to English plus the artist’s native language(s).

    To everyone asking about direct Wikipedia links: We do still link directly to Wikipedia under the Wikipedia extract, so if the link under the extract doesn’t do what you want (and it’s not because it picks the wrong language which is already covered by http://tickets.musicbrainz.org/browse/MBS-8417), it’s not clear what you actually want to see and I recommend creating a ticket in JIRA describing what you would like to see and where you’d like to see it.

  15. The removal of Wikipedia links from the user interface has removed users’ access to a lot of non-English Wikipedia pages.
    The Wikipedia extract and the link below it is often useless for the minor languages, because a page in a major language is chosen, but that is not always the best page on the subject, and even worse, the language chosen may be one that the user can understand.
    Yes, I know that one can find the Wikipedia pages in the languages one prefers on the Wikidata page, but that is really not very user friendly.

  16. Sorry, the sentence: the language chosen may be one that the user can understand
    should have been: the language chosen may be one that the user doesn’t even understand.
