
Re: Discogs Linked Data

From: Leigh Dodds <leigh.dodds@talis.com>
Date: Mon, 7 Jun 2010 13:12:25 +0100
Message-ID: <AANLkTik5nkGKRJ2_SUEUc62A8_l9CEwmehcQEPhDPea_@mail.gmail.com>
To: mats.gls@gmail.com
Cc: Kurt J <kurtjx@gmail.com>, public-lod@w3.org

As already noted, the Discogs data is still live, but I failed to load
the VoID description, which is why the home page is not resolving properly.

I'll aim to get that fixed ASAP.

As Kurt pointed out, the code for the conversion is available if anyone
needs it. I was intending to try and hack up a fix for the encoding
issue as you've described below. It's not pretty but should do the job.
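
A minimal Python 3 sketch of that kind of fix, assuming the problem is
UTF-8 bytes that were each written out as a separate decimal character
reference (as in the "&#195;&#175;" example quoted below). The name
repair_entities is hypothetical and this is not the conversion code
itself:

```python
import re

# A run of two or more decimal character references, e.g. "&#195;&#175;".
ENTITY_RUN = re.compile(r'(?:&#\d{1,3};){2,}')

def repair_entities(text):
    """Decode runs of per-byte character references as UTF-8 where possible."""
    def fix(match):
        values = [int(v) for v in re.findall(r'\d+', match.group())]
        try:
            # bytes() raises ValueError for values above 255;
            # decode() raises UnicodeDecodeError (a ValueError subclass)
            # when the bytes are not valid UTF-8.
            return bytes(values).decode('utf-8')
        except ValueError:
            return match.group()  # leave anything unexpected untouched
    return ENTITY_RUN.sub(fix, text)

print(repair_entities('Tcha&#195;&#175;kovsky'))
```

Runs that do not decode as UTF-8 are left as they are, so references
that were already correct should pass through unchanged.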

The cross-links that exist so far are generated purely from the
existing web page links in the Discogs data. So if there are problems
with those links, they may also be issues with the source data.

My overall goal here was to explore a full conversion of the dataset,
using the full Music Ontology and with as high a fidelity as possible.
However, it's a spare-time project, hence the outstanding issue list.
I'm happy to collaborate with people on extending the code as

I also hope to convince the Discogs maintainers to adopt Linked Data.



On 4 June 2010 05:15,  <mats.gls@gmail.com> wrote:
>> this is a data set i really want too!!!!  somebody know a way around
>> the unicode problem???
> Maybe find stuff like these "&#195;&#175;" with a regexp and then replace
> them with the correct unicode chars.
> In Python something like this looped through each line of the files should
> work I think:
> import re
> teststr = 'Tcha&#195;&#175;kovsky'
> # match exactly two consecutive 3-digit character references
> regex = re.compile(r'(?<!&#\d{3};)&#(\d{3});&#(\d{3});(?!&#\d{3};)')
> def fix(m):
>   # treat the two numbers as UTF-8 bytes and decode them
>   return bytes([int(m.group(1)), int(m.group(2))]).decode('utf8')
> print(regex.sub(fix, teststr))
> output>Tchaïkovsky
> I've posted a more complete version here http://pastebin.com/vuq72irC
> Cheers,
> Mats

Leigh Dodds
Programme Manager, Talis Platform
Received on Monday, 7 June 2010 12:13:17 UTC
