A common problem for users of MusicBrainz is that of synchronizing a local collection against the main MusicBrainz servers. Our current rate limit stipulates that you make at most 1 request per second, which we understand is extremely limiting – especially if you’re trying to fetch thousands of releases! During our first hack weekend, we created the beginnings of a service to allow you to get a list of MBIDs that have been updated. We have finished the preliminaries of this service, and now we need to hear from you how you’d want to utilize this.
The most basic data we currently gather is a JSON document containing a list of MBIDs that have changed per hour. For each of our data replication packets, we generate a JSON packet that summarizes all of the MBIDs that have changed, either directly or indirectly (for example, through the addition of more relationships).
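To make this concrete, here is a rough sketch of how a client might consume these hourly packets directly: merge the MBIDs from several packets and intersect them with a local library. The packet layout (a JSON object with an `mbids` key) and the function name `changed_in_library` are assumptions made for illustration; this post does not define the actual packet format.

```python
import json


def changed_in_library(raw_packets, library_mbids):
    """Return the library MBIDs that appear in at least one change packet."""
    changed = set()
    for raw in raw_packets:
        packet = json.loads(raw)
        # Assumed layout: {"mbids": ["<mbid>", ...]}
        changed.update(packet["mbids"])
    return changed & set(library_mbids)


# Two made-up hourly packets and a made-up three-release library.
packets = [
    '{"mbids": ["aaaaaaaa-0000-0000-0000-000000000001",'
    ' "aaaaaaaa-0000-0000-0000-000000000002"]}',
    '{"mbids": ["aaaaaaaa-0000-0000-0000-000000000003"]}',
]
library = [
    "aaaaaaaa-0000-0000-0000-000000000002",
    "aaaaaaaa-0000-0000-0000-000000000003",
    "aaaaaaaa-0000-0000-0000-000000000004",
]
stale = changed_in_library(packets, library)
print(sorted(stale))
```

Only the releases that are both in the library and in some packet come back, so those are the only ones the client would need to re-fetch.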
A “What’s Changed?” Service
The first piece of feedback we received was that people were not really interested in consuming this data stream, but would rather have a service that allows them to query what data has changed in a given window of time. Having to manually fetch packets and perform set intersections is not particularly difficult, but the more hoops people have to jump through, the less likely they are to even use the service. We’ve been pondering how best to implement this service, and we would like feedback on the following options:
Filter a list of MBIDs
The service would allow you to POST a set of MBIDs, and would in turn return the subset of these MBIDs that have changed. You would be able to specify any date and get all changes since that date. For example, you could find all changes to all releases in your library since you last checked 2 weeks ago.
Because every MBID would take 36 bytes to submit, there will be a limit on the number of MBIDs that can be submitted per request in order to preserve bandwidth.
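As a sketch of what this could look like from the client side, the snippet below splits a library into batches that respect a per-request limit and builds one JSON body per batch. Everything here is hypothetical: the field names `since` and `mbids`, the function name `build_filter_requests`, and the batch size are placeholders, since the request format has not been decided. No network call is made.

```python
import json
from datetime import date


def build_filter_requests(mbids, since, batch_size):
    """Build one JSON request body per batch of at most `batch_size` MBIDs."""
    unique = sorted(set(mbids))  # drop duplicates before counting against the limit
    bodies = []
    for start in range(0, len(unique), batch_size):
        batch = unique[start:start + batch_size]
        # Hypothetical body layout; the real service may differ.
        bodies.append(json.dumps({"since": since.isoformat(), "mbids": batch}))
    return bodies


# Five made-up MBIDs, each 36 characters long in canonical form.
example_mbids = [f"00000000-0000-0000-0000-{i:012d}" for i in range(5)]
bodies = build_filter_requests(example_mbids, date(2012, 1, 1), batch_size=2)
print(len(bodies), "request bodies")
```

Each body would then be sent in its own POST, and the responses merged into a single set of changed MBIDs.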
Provide client libraries
Rather than having people craft their own web service requests, MusicBrainz should provide a library to do this. This will allow us to use more advanced techniques (for example, Bloom filters) both to conserve bandwidth and to allow for larger queries. In this scheme the web service will be documented, but users are not expected to consume it directly.
MusicBrainz could also combine the two: offer a simplified API based on option 1 (filtering a list of MBIDs), while supporting larger queries through option 2 (the client libraries). For example, we might limit option 1 to a maximum of 4000 MBIDs per request/response, while the service that depends on our client libraries could handle many more.
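To show why a Bloom filter could help here, below is a minimal sketch of one possible scheme, not a description of what the client libraries would actually do. The client summarizes its library in a Bloom filter and sends that instead of the raw MBIDs; the server returns every recently changed MBID that the filter might contain; the client then intersects the answer with its real library to throw away false positives. The class, its parameters, and the sample data are all assumptions for illustration. At around 10 bits per MBID with 7 hash functions, the false positive rate is roughly 1%, compared with 288 bits for each 36-byte MBID sent in full.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter using double hashing over one SHA-256 digest."""

    def __init__(self, size_bits, num_hashes):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.size_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


# Client side: summarize a made-up library of 1000 MBIDs.
library = {f"00000000-0000-0000-0000-{i:012d}" for i in range(1000)}
bloom = BloomFilter(size_bits=10 * len(library), num_hashes=7)
for mbid in library:
    bloom.add(mbid)

# Server side (simulated): MBIDs 990..1009 changed; keep those the filter may hold.
recently_changed = [f"00000000-0000-0000-0000-{i:012d}" for i in range(990, 1010)]
candidates = [mbid for mbid in recently_changed if mbid in bloom]

# Client side again: drop any false positives the filter let through.
changed_in_library = set(candidates) & library
print(len(bloom.bits), "bytes sent instead of", 36 * len(library))
```

A Bloom filter never produces false negatives, so no real change is missed; the cost is a handful of extra MBIDs in the response, which the final intersection removes.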
Allow filtering based on collections
MusicBrainz already has the concept of collections, each of which has an associated unique identifier, so these would be used to filter the list of changes. This limits the service to dealing only with releases, and requires people to set up collections before they can run queries. Again, due to the possibility of large collections, there will likely be pagination on responses – though the per-page limit will probably be fairly high.
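For a sense of what paginated responses would mean for a client, here is a small sketch of a paging loop. The `fetch_page` callable stands in for a real web service request and is entirely hypothetical, as are its parameters (`offset`, `limit`), the collection identifier, and the page size; the stub below serves made-up data from memory.

```python
def iter_changed_releases(fetch_page, collection_id, since, page_size):
    """Yield every changed release, requesting one page at a time."""
    offset = 0
    while True:
        page = fetch_page(collection_id, since, offset, page_size)
        yield from page
        if len(page) < page_size:  # a short page means there is nothing left
            break
        offset += page_size


# In-memory stand-in for the service, holding five made-up changes.
_FAKE_CHANGES = [f"release-{i}" for i in range(5)]
requested_offsets = []


def fake_fetch_page(collection_id, since, offset, limit):
    requested_offsets.append(offset)
    return _FAKE_CHANGES[offset:offset + limit]


changed = list(
    iter_changed_releases(fake_fetch_page, "my-collection", "2012-01-01", page_size=2)
)
print(changed)
```

With a high per-page limit, most collections would come back in a single request, and the loop above would exit after the first page.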
These are the ideas that we’ve been debating, and we’d love to know which of these would work for you. If you have other ideas, we’re also very interested in hearing what those are!