Re: Unicode Character Database in RDF? from Sampo Syreeni on 2011-01-04 (semantic-web@w3.org from January 2011)

From: Sampo Syreeni <decoy@iki.fi>
Date: Tue, 4 Jan 2011 23:42:13 +0200 (EET)
To: Ivan Herman <ivan@w3.org>
cc: Shane Norris <norlesh@gmail.com>, W3C Semantic Web IG <semantic-web@w3.org>, Felix Sasaki <felix.sasaki@fh-potsdam.de>
Message-ID: <Pine.LNX.4.64.1101042337270.23780@lakka.kapsi.fi>

On 2011-01-03, Ivan Herman wrote:

> I have asked for advice from Felix Sasaki (cc'd), who knows both 
> Unicode and RDF. Here is his answer:

In theory, creating an RDF version should be a basic string
manipulation exercise. Take the character database file and assign
surrogate keys to all of the characters (they have, after all, already
been painstakingly unified and deduplicated, to a degree that most
master data management initiatives never reach). Then assign a
predicate to each of the fields and split the file into triples,
omitting empty fields. Put up an OWL schema, and you have a more than
adequate base version of it in RDF.
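
As a rough sketch of that splitting step, written in Python rather than
with shell tools: it assumes the semicolon-separated layout of
UnicodeData.txt, uses the code point itself as the key, and emits
N-Triples. The example.org namespace and the predicate names are
made up for illustration and are not an agreed vocabulary.

```python
# Hypothetical namespace; replace with whatever base IRI the schema uses.
BASE = "http://example.org/ucd/"

# Illustrative names for the 15 semicolon-separated fields of a
# UnicodeData.txt line. Field 0 (the code point) becomes the subject key.
FIELDS = [
    "codePoint", "name", "generalCategory", "canonicalCombiningClass",
    "bidiClass", "decomposition", "decimalValue", "digitValue",
    "numericValue", "bidiMirrored", "unicode1Name", "isoComment",
    "simpleUppercase", "simpleLowercase", "simpleTitlecase",
]


def escape(value):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def line_to_triples(line):
    """Turn one UnicodeData.txt line into a list of N-Triples strings."""
    values = line.rstrip("\n").split(";")
    subject = "<%schar/%s>" % (BASE, values[0])
    triples = []
    for field, value in zip(FIELDS[1:], values[1:]):
        if value == "":
            continue  # omit empty fields, as described above
        triples.append(
            '%s <%sprop/%s> "%s" .' % (subject, BASE, field, escape(value))
        )
    return triples


if __name__ == "__main__":
    # The entry for U+0041 as it appears in UnicodeData.txt.
    sample = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"
    for triple in line_to_triples(sample):
        print(triple)
```

Every value comes out as a plain literal here. Pointing the case
mapping fields at other character resources, or typing the numeric
fields, would be up to the OWL schema.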

Were I running UNIX, the basic processing step would take perhaps half a 
day, utilizing standard command line tools. I'm sure somebody around 
here can do it even faster.
-- 
Sampo Syreeni, aka decoy - decoy@iki.fi, http://decoy.iki.fi/front
+358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2

Received on Tuesday, 4 January 2011 21:47:54 UTC