Re: Unicode Character Database in RDF?

2011/1/5 Bernard Vatant <bernard.vatant@mondeca.com>

> Hi all
>
> Unless I miss something, Unicode characters as linked data have been
> published for quite a while by Gerard de Melo (in cc) at http://lexvo.org
> See e.g., http://lexvo.org/id/char/5A34
>


If I understand Shane's request correctly, he is focusing not on Unicode
characters but on the Unicode Character Database. The latter provides a lot of
information, e.g. about character properties (see
http://unicode.org/cldr/utility/properties.jsp ), which is not incorporated
into lexvo.org.

Felix


>
> Bernard
>
> 2011/1/4 Sampo Syreeni <decoy@iki.fi>
>
> On 2011-01-03, Ivan Herman wrote:
>>
>>  I have asked for advice from Felix Sasaki (cc'd), who knows both Unicode
>>> and RDF. Here is his answer:
>>>
>>
>> In theory creating an RDF version should be a basic character manipulation
>> exercise. Take the character database file, assign surrogate keys to all of
>> the characters (they after all have already been painstakingly
>> unified/deduplicated/etc in a manner even most master data management
>> initiatives don't do), then assign a predicate to each of the fields, and
>> proceed to split the file into triples, omitting empty fields. Put up an OWL
>> schema, and you have a more than good base version of it in RDF.
>>
>> Were I running UNIX, the basic processing step would take perhaps half a
>> day, utilizing standard command line tools. I'm sure somebody around here
>> can do it even faster.
>> --
>> Sampo Syreeni, aka decoy - decoy@iki.fi, http://decoy.iki.fi/front
>> +358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
>>
>>
>
>
> --
> Bernard Vatant
> Senior Consultant
> Vocabulary & Data Engineering
> Tel:       +33 (0) 971 488 459
> Mail:     bernard.vatant@mondeca.com
> ----------------------------------------------------
> Mondeca
> 3, cité Nollez 75018 Paris France
> Web:    http://www.mondeca.com
> Blog:    http://mondeca.wordpress.com
> ----------------------------------------------------
>
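
The conversion Sampo outlines in the quoted text above (one subject per
character, one predicate per field, empty fields omitted) could be sketched
roughly as follows. This is only an illustration under assumptions: it takes
UnicodeData.txt as "the character database file", the example.org namespaces
and the property names are hypothetical, and every value is emitted as a plain
literal in N-Triples syntax.

```python
# Minimal sketch: turn one semicolon-separated UnicodeData.txt line into
# N-Triples. Namespaces and property names are hypothetical, chosen only
# for this example; they are not part of any published vocabulary.

# Names for the 15 columns of UnicodeData.txt, in file order.
FIELDS = [
    "codePoint", "name", "generalCategory", "canonicalCombiningClass",
    "bidiClass", "decomposition", "decimalValue", "digitValue",
    "numericValue", "bidiMirrored", "unicode1Name", "isoComment",
    "simpleUppercase", "simpleLowercase", "simpleTitlecase",
]

CHAR_NS = "http://example.org/ucd/char/"   # assumed subject namespace
PROP_NS = "http://example.org/ucd/prop#"   # assumed predicate namespace


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def line_to_triples(line):
    """Return one N-Triples statement per non-empty field of the line."""
    values = line.rstrip("\n").split(";")
    # The code point (first column) serves as the key for the subject URI.
    subject = "<%s%s>" % (CHAR_NS, values[0])
    triples = []
    for field, value in zip(FIELDS[1:], values[1:]):
        if value == "":
            continue  # omit empty fields, as suggested in the thread
        triples.append('%s <%s%s> "%s" .'
                       % (subject, PROP_NS, field, escape_literal(value)))
    return triples


if __name__ == "__main__":
    sample = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"
    for triple in line_to_triples(sample):
        print(triple)
```

A fuller version would probably emit the case-mapping fields as links to
other character resources rather than as literals, and would need to handle
the range entries in UnicodeData.txt (names such as "<CJK Ideograph, First>").
The other files of the database, which carry many of the properties Felix
mentions, are not covered by this sketch.
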

Received on Wednesday, 5 January 2011 08:36:34 UTC