- From: zazi <zazi@elbklang.net>
- Date: Mon, 07 Jun 2010 14:51:33 +0200
- To: public-lod@w3.org
Hi Leigh,

I contacted you a while ago regarding the Discogs dataset. I would be glad if you could send me the mapping code you wrote for the Discogs RDFizer. I plan to include such an RDFizer in my Master-like project as well, so I would be very happy to have a good starting point to build on. I would like to make heavy use of the extended release concept, which is included in Music Ontology 2.0.

Cheers,

Thomas

On 07.06.2010 14:12, Leigh Dodds wrote:
> Hi,
>
> As already noted, the Discogs data is still live, but I failed to load
> the VoID description, hence the home page not resolving properly.
>
> I'll aim to get that fixed ASAP.
>
> As Kurt pointed out, the code for the conversion is available if anyone
> needs it. I was intending to try and hack up a fix for the encoding
> issue as you've described below. It's not pretty, but it should do the job.
>
> The cross-links that exist so far are generated purely from the existing
> web page links in the Discogs data. So if there are problems with them,
> these may also be issues with the source data.
>
> My overall goal here was to explore a full conversion of the dataset,
> using the full Music Ontology and with as high a fidelity as possible.
> However, it's a spare-time project, hence the outstanding issue list.
> I'm happy to collaborate with people on extending the code as
> required.
>
> I also hope to convince the Discogs maintainers to adopt Linked Data.
>
> Cheers,
>
> L.
>
> On 4 June 2010 05:15, <mats.gls@gmail.com> wrote:
>>> This is a dataset I really want too! Does somebody know a way around
>>> the Unicode problem?
>>>
>> Maybe find stuff like these "&#195;&#175;" with a regexp and then replace
>> them with the correct Unicode chars.
>> In Python, something like this, looped over each line of the files, should
>> work, I think:
>> # Python 2
>> import re
>> teststr = 'Tcha&#195;&#175;kovsky'
>> # Exactly two consecutive 3-digit entities, e.g. '&#195;&#175;'
>> regex = re.compile(r'(?<!(&#\d{3};))(&#\d{3};){2}(?!(&#\d{3};))')
>> rObj = re.search(regex, teststr)
>> if rObj is not None:
>>     hexValues = [hex(int(rObj.group()[2:5])), hex(int(rObj.group()[8:11]))]
>>     # Bytes 0xC3 0xAF, decoded as UTF-8, give u'\xef' ('ï')
>>     newChar = ''.join([chr(int(c, 16)) for c in hexValues]).decode('utf8')
>>     print re.sub(regex, newChar, teststr)
>> output> Tchaïkovsky
>> I've posted a more complete version here: http://pastebin.com/vuq72irC
>> Cheers,
>> Mats
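Mats's snippet is Python 2 (`print` statement, `str.decode`) and only handles runs of exactly two entities; it also replaces every match on a line with the character decoded from the first match. A minimal Python 3 sketch of the same idea follows. It assumes the dump stores each UTF-8 byte of a non-ASCII character as a decimal numeric character reference in the range 128 to 255 (e.g. `&#195;&#175;` for "ï"). The names `fix_line` and `fix_entity_run` are hypothetical and are not part of the Discogs RDFizer.

```python
import re

# Assumption: mis-encoded characters appear as runs of decimal entities,
# one per UTF-8 byte, each with a value from 128 to 255 (non-ASCII bytes).
ENTITY_RUN = re.compile(r'(?:&#(?:12[89]|1[3-9]\d|2[0-4]\d|25[0-5]);)+')


def fix_entity_run(match):
    """Decode one run of byte-valued entities as UTF-8, if possible."""
    run = match.group(0)
    values = [int(n) for n in re.findall(r'\d+', run)]
    try:
        return bytes(values).decode('utf-8')
    except UnicodeDecodeError:
        # Not valid UTF-8 (e.g. a genuine lone &#233;): keep the run as-is.
        return run


def fix_line(line):
    """Replace every entity-encoded UTF-8 sequence on one line."""
    # A callback decodes each match separately, so different characters
    # on the same line each get their own replacement.
    return ENTITY_RUN.sub(fix_entity_run, line)


if __name__ == '__main__':
    print(fix_line('Tcha&#195;&#175;kovsky'))        # Tchaïkovsky
    print(fix_line('Dvo&#197;&#153;&#195;&#161;k'))  # Dvořák
```

Applied to each line of a dump file, `fix_line` decodes sequences of any length (two-byte "ï", three-byte "…") and leaves ASCII entities such as `&#38;` untouched. The heuristic cannot tell a genuine "Ã¯" written as entities from a mis-encoded "ï", so spot-checking the output is worthwhile.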
Received on Tuesday, 8 June 2010 12:14:17 UTC