Re: Discogs Linked Data from zazi on 2010-06-07 (public-lod@w3.org from June 2010)

From: zazi <zazi@elbklang.net>
Date: Mon, 07 Jun 2010 14:51:33 +0200
To: public-lod@w3.org
Message-ID: <4C0CEB55.4090406@elbklang.net>
Hi Leigh,

I contacted you a while ago about the Discogs dataset. I would be glad
if you could send me the mapping code you wrote for the Discogs RDFizer.
I plan to include such an RDFizer in my Master-like project as well, so I
would be very happy to have a good starting point that I can build on.
I would like to make heavy use of the extended release concept included
in the Music Ontology 2.0.

Cheers,

Thomas

On 07.06.2010 14:12, Leigh Dodds wrote:
> Hi,
>
> As already noted, the discogs data is still live, but I failed to load
> the VoID description, hence the home page was not resolving properly.
>
> I'll aim to get that fixed ASAP.
>
> As Kurt pointed out, the code for the conversion is available if anyone
> needs it. I was intending to try and hack up a fix for the encoding
> issue as you've described below. It's not pretty but should do the job.
>
> The cross-links that exist so far are generated purely from the existing
> web page links in the discogs data. So if there are problems with those
> links, they may well be issues in the source data itself.
>
> My overall goal here was to explore a full conversion of the dataset,
> using the full music ontology and with as high a fidelity as possible.
> However, it's a spare-time project, hence the outstanding issue list.
> I'm happy to collaborate with people on extending the code as
> required.
>
> I also hope to convince the discogs maintainers to adopt Linked Data too.
>
> Cheers,
>
> L.
>
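Leigh describes the cross-links as generated purely from the web page links already present in the Discogs data. The following is a minimal sketch of that idea, not his converter: it assumes each record carries a plain list of URLs, rewrites English Wikipedia article URLs to the matching DBpedia resource URIs (DBpedia names its resources after the Wikipedia article title), links those with owl:sameAs, and records every other URL with foaf:page. The function names, the choice of predicates and the example subject URI are all hypothetical.

```python
from urllib.parse import urlsplit

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_PAGE = "http://xmlns.com/foaf/0.1/page"


def wikipedia_to_dbpedia(url):
    """Map an English Wikipedia article URL to its DBpedia resource URI, else None."""
    parts = urlsplit(url)
    if parts.netloc == "en.wikipedia.org" and parts.path.startswith("/wiki/"):
        # Keep the article title exactly as it appears in the URL path.
        return "http://dbpedia.org/resource/" + parts.path[len("/wiki/"):]
    return None


def links_to_ntriples(subject, urls):
    """Turn a record's web page links into N-Triples lines about `subject`."""
    lines = []
    for url in urls:
        target = wikipedia_to_dbpedia(url)
        if target is not None:
            # Recognised page: link to the Linked Data resource it describes.
            lines.append(f"<{subject}> <{OWL_SAME_AS}> <{target}> .")
        else:
            # Anything else: just record it as a page about the subject.
            lines.append(f"<{subject}> <{FOAF_PAGE}> <{url}> .")
    return lines


# Hypothetical artist URI; the real converter's URI scheme is not shown here.
for line in links_to_ntriples(
        "http://example.org/discogs/artist/42",
        ["http://en.wikipedia.org/wiki/Pyotr_Ilyich_Tchaikovsky",
         "http://www.example.com/"]):
    print(line)
```

Because the links are copied straight from the source URLs, a wrong or stale page link in Discogs turns into a wrong owl:sameAs, which is the data-quality caveat Leigh mentions. The sketch also does no N-Triples escaping, so URLs containing spaces or ">" would need cleaning first.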
> On 4 June 2010 05:15, <mats.gls@gmail.com> wrote:
>>> This is a data set I really want too! Does anybody know a way around
>>> the Unicode problem?
>>>
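>> Maybe find sequences like "&#195;&#175;" with a regexp and then replace
>> them with the correct Unicode characters. Each reference there is one
>> UTF-8 byte (0xC3 0xAF together encode "ï"), so the pair must be decoded
>> together. In Python, something like this, applied to each line of the
>> files, should work: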
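>> import re
>> # Exactly two consecutive 3-digit references, e.g. "&#195;&#175;":
>> # the UTF-8 bytes 0xC3 0xAF of "ï", each escaped separately.
>> regex = re.compile(r'(?<!&#\d{3};)&#(\d{3});&#(\d{3});(?!&#\d{3};)')
>> def fix_pair(m):
>>     # Rebuild the two raw bytes and decode them as UTF-8; leave the
>>     # match untouched if they are not a valid UTF-8 sequence.
>>     try:
>>         return bytes([int(m.group(1)), int(m.group(2))]).decode('utf8')
>>     except ValueError:  # byte value > 255, or invalid UTF-8
>>         return m.group(0)
>> teststr = 'Tcha&#195;&#175;kovsky'
>> # Passing a function to sub() decodes every pair, not just the first.
>> print(regex.sub(fix_pair, teststr))
>> output>Tchaïkovsky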
>> I've posted a more complete version here http://pastebin.com/vuq72irC
>> Cheers,
>> Mats
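The pair-only regex in Mats's message misses characters whose UTF-8 encoding is three or four bytes long; for example "’" (U+2019) would appear as "&#226;&#128;&#153;". A minimal sketch, assuming the damage always takes the form of one decimal reference per UTF-8 byte: match any run of two or more references with values 128 to 255 and decode the whole run at once, leaving it untouched when it is not valid UTF-8. The name fix_mojibake_refs is hypothetical, and this is not the code behind the pastebin link or the Discogs converter.

```python
import re

# One decimal character reference with a value of 128-255, i.e. a possible
# UTF-8 lead or continuation byte that was escaped on its own.
BYTE_REF = r'&#(?:12[89]|1[3-9]\d|2[0-4]\d|25[0-5]);'
# A run of two or more such references in a row.
RUN = re.compile('(?:' + BYTE_REF + '){2,}')
DIGITS = re.compile(r'&#(\d+);')


def fix_mojibake_refs(text: str) -> str:
    """Replace each run of byte-valued references with the UTF-8 text it encodes."""
    def repl(m):
        raw = bytes(int(n) for n in DIGITS.findall(m.group(0)))
        try:
            return raw.decode('utf-8')
        except UnicodeDecodeError:
            # Not valid UTF-8 (e.g. genuine Latin-1 references): keep as-is.
            return m.group(0)
    return RUN.sub(repl, text)


print(fix_mojibake_refs('Tcha&#195;&#175;kovsky'))  # → Tchaïkovsky
```

Decoding the whole run, rather than fixed-size groups, handles mixed sequences such as a 3-byte character followed by a 2-byte one. The heuristic can still misfire on genuine Latin-1 references that happen to form valid UTF-8 (such as "&#195;&#169;" intended as "Ã©"), which is rare in artist and release names but worth checking.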
Received on Tuesday, 8 June 2010 12:14:17 UTC