
Re: Unicode Character Database in RDF?

From: Shane Norris <norlesh@gmail.com>
Date: Thu, 6 Jan 2011 04:34:59 +1100
Message-ID: <AANLkTi=5xOW_yJJ+m=qhmTKCSAgqkLzcckET_ZCmZw6W@mail.gmail.com>
To: Felix Sasaki <felix.sasaki@fh-potsdam.de>
Cc: Gerard de Melo <gdemelo@mpi-inf.mpg.de>, Bernard Vatant <bernard.vatant@mondeca.com>, Sampo Syreeni <decoy@iki.fi>, Ivan Herman <ivan@w3.org>, W3C Semantic Web IG <semantic-web@w3.org>
Hi and thanks to all,
    http://lexvo.org/id/char/xxxx definitely addresses my main
criterion of a public URI per code point. As for the UCD ontology, I can
just add the properties to my own ontology for now.
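As a minimal sketch of what "a public URI per code point" amounts to, the Python below derives a Lexvo-style URI from a single character. It assumes the identifier is the code point in uppercase hexadecimal, zero-padded to at least four digits, which matches the example http://lexvo.org/id/char/5A34; how Lexvo names code points outside the Basic Multilingual Plane is not confirmed here.

```python
# Sketch only: the URI pattern is inferred from the example
# http://lexvo.org/id/char/5A34 (uppercase hex, padded to 4 digits).
LEXVO_CHAR_BASE = "http://lexvo.org/id/char/"


def lexvo_char_uri(ch: str) -> str:
    """Return the assumed Lexvo URI for exactly one code point."""
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    return f"{LEXVO_CHAR_BASE}{ord(ch):04X}"


if __name__ == "__main__":
    print(lexvo_char_uri("\u5A34"))
    print(lexvo_char_uri(" "))
```

Keeping only the code point in the URI means any version-dependent properties can live in a separate vocabulary, which is the split described in the next paragraph.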
Although I didn't realise it when I posed the question, it makes more
sense now to have the UCD ontology defined in a different namespace
from the code points themselves anyway, since all the same code points have
existed since Unicode 1.0, whether they were assigned or not, while
their interpretations have been extended with successive versions of
the standard (this becomes relevant if you're dealing with other legacy

My use case is to be able to express tokenization rules (white space,
foreign words, symbols ...) using RDF.
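A rough sketch of that use case, assuming a hypothetical vocabulary at http://example.org/tok# (the `tokenClass` predicate and the `Whitespace`/`Symbol`/`Punctuation`/`WordChar`/`Other` classes are invented for illustration): each distinct character is mapped to a coarse token class from its General_Category, as reported by Python's `unicodedata`, and emitted as N-Triples against the Lexvo-style character URIs.

```python
import unicodedata

LEXVO_CHAR_BASE = "http://lexvo.org/id/char/"  # pattern taken from the lexvo.org example
TOK = "http://example.org/tok#"                # hypothetical tokenization vocabulary


def token_class(ch: str) -> str:
    """Map one character to a coarse, hypothetical token class."""
    cat = unicodedata.category(ch)  # two-letter General_Category, e.g. "Zs", "Sm"
    if ch.isspace() or cat.startswith("Z"):
        return "Whitespace"   # spaces, tabs, line/paragraph separators
    if cat.startswith("S"):
        return "Symbol"       # math, currency, modifier and other symbols
    if cat.startswith("P"):
        return "Punctuation"
    if cat.startswith(("L", "M", "N")):
        return "WordChar"     # letters, combining marks, digits
    return "Other"            # controls, format characters, unassigned, ...


def tokenization_triples(text: str) -> list:
    """Emit one N-Triples statement per distinct character in text."""
    lines = []
    for ch in sorted(set(text)):
        subj = f"<{LEXVO_CHAR_BASE}{ord(ch):04X}>"
        pred = f"<{TOK}tokenClass>"
        obj = f"<{TOK}{token_class(ch)}>"
        lines.append(f"{subj} {pred} {obj} .")
    return lines


if __name__ == "__main__":
    for triple in tokenization_triples("a + b"):
        print(triple)
```

The classification depends on the Unicode version bundled with the Python build (`unicodedata.unidata_version`), which is one concrete case of the versioning issue above: the code point URI stays stable while the derived class may change between releases.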


On Wed, Jan 5, 2011 at 10:27 PM, Felix Sasaki
<felix.sasaki@fh-potsdam.de> wrote:
> Hello Gerard,
> 2011/1/5 Gerard de Melo <gdemelo@mpi-inf.mpg.de>
>> Hello Shane and others,
>>>> Unless I miss something, Unicode characters as linked data have been
>>>> published for quite a while by Gerard de Melo (in cc) at
>>>> http://lexvo.org
>>>> See e.g., http://lexvo.org/id/char/5A34
>>> If I understand the request from Shane right, he focuses not on Unicode
>>> characters, but the Unicode character database. The latter has a lot of
>>> information e.g. about character properties available (see
>>> http://unicode.org/cldr/utility/properties.jsp ) which is not
>>> incorporated
>>> into lexvo.org
>> As the maintainer of Lexvo.org, I could easily add a few additional
>> character properties to the Lexvo ontology and RDF dump based
>> on the particular use cases you are interested in. So far, I have
>> tried to avoid creating a 1:1 mapping of all properties to predicates.
> This makes a lot of sense, since otherwise you get many triples without use
> cases. Also, there are differences between the properties in terms of
> stability and data sources. See
> http://www.unicode.org/Public/5.1.0/ucd/UCD.html for what is available in
> version 5.1. Note also that there is information about characters which is
> not in the properties' data base, e.g. whether a character can be used in an
> internationalized domain name or not, see
> http://unicode.org/cldr/utility/idna.jsp . So again, it really depends on
> the use case what information you want in the RDF representation.
> Felix
>> Best regards,
>> Gerard
>> --
>> Gerard de Melo [demelo@mpi-inf.mpg.de]
>> Max Planck Institute for Informatics
>> http://www.mpi-inf.mpg.de/~gdemelo/
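In the spirit of mapping only the properties a use case needs rather than all of them 1:1, here is a small sketch that reads lines in the semicolon-separated UnicodeData.txt format (15 fields per line) and emits N-Triples for a chosen subset of fields. The property names and the version-specific namespace http://example.org/ucd/5.1.0# are hypothetical, chosen only to keep UCD-derived properties apart from the code point URIs.

```python
LEXVO_CHAR_BASE = "http://lexvo.org/id/char/"  # pattern taken from the lexvo.org example
UCD_NS = "http://example.org/ucd/5.1.0#"       # hypothetical, version-specific namespace

# Positions of a few fields in a UnicodeData.txt line (0 = code point).
FIELDS = {"name": 1, "generalCategory": 2, "bidiClass": 4}


def _literal(value: str) -> str:
    """Quote a value as an N-Triples string literal."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def triples_from_unicodedata_line(line, wanted=("generalCategory",)):
    """Emit triples only for the selected properties of one UCD line."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    subj = f"<{LEXVO_CHAR_BASE}{fields[0]}>"
    out = []
    for prop in wanted:
        value = fields[FIELDS[prop]]
        if value:  # skip empty fields rather than emitting empty literals
            out.append(f"{subj} <{UCD_NS}{prop}> {_literal(value)} .")
    return out


if __name__ == "__main__":
    for t in triples_from_unicodedata_line(
        "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
        wanted=("name", "generalCategory"),
    ):
        print(t)
```

Adding another property for a new use case is then a one-entry change to `FIELDS`, rather than a predicate for every column in the database.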
Received on Wednesday, 5 January 2011 17:36:18 UTC