Re: Unicode Character Database in RDF? from Bernard Vatant on 2011-01-05 (semantic-web@w3.org from January 2011)

From: Bernard Vatant <bernard.vatant@mondeca.com>
Date: Wed, 5 Jan 2011 09:17:01 +0100
To: Sampo Syreeni <decoy@iki.fi>
Cc: Ivan Herman <ivan@w3.org>, Shane Norris <norlesh@gmail.com>, W3C Semantic Web IG <semantic-web@w3.org>, Felix Sasaki <felix.sasaki@fh-potsdam.de>, Gerard de Melo <gdemelo@mpi-inf.mpg.de>
Message-ID: <AANLkTikquiT2eKC3m-MYwT8wcmZSJ3-KyednuG_0mEcM@mail.gmail.com>

Hi all

Unless I am missing something, Unicode characters have been published as
linked data for quite a while by Gerard de Melo (in cc) at http://lexvo.org.
See e.g. http://lexvo.org/id/char/5A34

Bernard

2011/1/4 Sampo Syreeni <decoy@iki.fi>

> On 2011-01-03, Ivan Herman wrote:
>
>> I have asked for advice from Felix Sasaki (cc'd), who knows both Unicode
>> and RDF. Here is his answer:
>>
>
> In theory creating an RDF version should be a basic character manipulation
> exercise. Take the character database file, assign surrogate keys to all of
> the characters (they after all have already been painstakingly
> unified/deduplicated/etc in a manner even most master data management
> initiatives don't do), then assign a predicate to each of the fields, and
> proceed to split the file into triples, omitting empty fields. Put up an OWL
> schema, and you have a more than good base version of it in RDF.
>
> Were I running UNIX, the basic processing step would take perhaps half a
> day, utilizing standard command line tools. I'm sure somebody around here
> can do it even faster.
> --
> Sampo Syreeni, aka decoy - decoy@iki.fi, http://decoy.iki.fi/front
> +358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
>
>
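
For what it is worth, the processing step described above (one subject per
character, one predicate per field, empty fields omitted) could look roughly
like the sketch below. It assumes the semicolon-separated layout of
UnicodeData.txt. The base URIs under example.org, the predicate names and the
helper names are placeholders made up for illustration, not a published
vocabulary and not the scheme lexvo.org uses. All values are emitted as plain
literals in N-Triples syntax.

```python
# Field names follow the column order of UnicodeData.txt (after the code point).
# Both base URIs are hypothetical placeholders, not a real vocabulary.
FIELDS = [
    "name", "generalCategory", "canonicalCombiningClass", "bidiClass",
    "decomposition", "decimalValue", "digitValue", "numericValue",
    "bidiMirrored", "unicode1Name", "isoComment",
    "simpleUppercase", "simpleLowercase", "simpleTitlecase",
]
CHAR_BASE = "http://example.org/char/"
PRED_BASE = "http://example.org/ucd#"


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def line_to_triples(line):
    """Turn one UnicodeData.txt-style line into a list of N-Triples statements."""
    parts = line.rstrip("\n").split(";")
    code, values = parts[0], parts[1:]
    subject = "<%s%s>" % (CHAR_BASE, code)
    triples = []
    for field, value in zip(FIELDS, values):
        if value == "":
            continue  # omit empty fields, as suggested in the thread
        triples.append(
            '%s <%s%s> "%s" .' % (subject, PRED_BASE, field, escape_literal(value))
        )
    return triples


# The entry for U+0041 as it appears in UnicodeData.txt.
sample = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"
for triple in line_to_triples(sample):
    print(triple)
```

Cross-references such as the case mappings are left as literals here; turning
them into links between character URIs, and writing the OWL schema, would be
the next steps.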


-- 
Bernard Vatant
Senior Consultant
Vocabulary & Data Engineering
Tel:       +33 (0) 971 488 459
Mail:     bernard.vatant@mondeca.com
----------------------------------------------------
Mondeca
3, cité Nollez 75018 Paris France
Web:    http://www.mondeca.com
Blog:    http://mondeca.wordpress.com
----------------------------------------------------

Received on Wednesday, 5 January 2011 08:21:36 UTC