Re: Unicode Character Database in RDF? from Felix Sasaki on 2011-01-06 (semantic-web@w3.org from January 2011)

From: Felix Sasaki <felix.sasaki@fh-potsdam.de>
Date: Thu, 6 Jan 2011 15:41:01 +0100
To: Shane Norris <norlesh@gmail.com>
Cc: Gerard de Melo <gdemelo@mpi-inf.mpg.de>, Bernard Vatant <bernard.vatant@mondeca.com>, Sampo Syreeni <decoy@iki.fi>, Ivan Herman <ivan@w3.org>, W3C Semantic Web IG <semantic-web@w3.org>
Message-ID: <AANLkTingYa=_CQdYqMS7eQe5BHVq=vpkLdfDvDV1XsRJ@mail.gmail.com>

Hello Shane,

If you want to express tokenization in a locale-specific manner, the data in
CLDR might also be useful for you; see e.g. the definitions at

http://unicode.org/reports/tr35/#Character_Elements
http://unicode.org/reports/tr35/#Collation_Elements
http://unicode.org/reports/tr35/#Delimiter_Elements
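
For the locale-independent part of the use case quoted below (one public URI
per code point, plus tokenization classes kept in a separate namespace), a
rough sketch might look like the following. It is only an illustration under
stated assumptions: the http://example.org/ucd# namespace, the property names
generalCategory and tokenClass, and the coarse class names are all made up
here and are not part of Lexvo.org or the UCD. Python's unicodedata module
reflects whichever UCD version is bundled with the interpreter, and it does
not cover the locale-specific CLDR data mentioned above.

```python
import unicodedata

# URI pattern for code points as used in this thread (e.g. .../char/5A34).
LEXVO_CHAR = "http://lexvo.org/id/char/"
# Hypothetical namespace for UCD-derived properties; an assumption, not a
# published ontology.
EX = "http://example.org/ucd#"


def token_class(ch):
    """Map a single character to a coarse, illustrative tokenization class."""
    if ch.isspace():
        return "Whitespace"
    category = unicodedata.category(ch)  # two-letter General_Category, e.g. "Lo"
    if category.startswith("L"):
        return "Letter"
    if category.startswith("N"):
        return "Number"
    if category.startswith(("P", "S")):
        return "Symbol"
    return "Other"


def char_triples(ch):
    """Yield N-Triples lines describing one character."""
    subject = f"<{LEXVO_CHAR}{ord(ch):04X}>"
    yield f'{subject} <{EX}generalCategory> "{unicodedata.category(ch)}" .'
    yield f"{subject} <{EX}tokenClass> <{EX}{token_class(ch)}> ."


if __name__ == "__main__":
    for ch in "a $\u5A34":
        for line in char_triples(ch):
            print(line)
```

Keeping the properties under their own namespace means the code point URIs
stay stable while the property data can follow a particular UCD version.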

Felix

2011/1/5 Shane Norris <norlesh@gmail.com>

> Hi and thanks to all,
>    http://lexvo.org/id/char/xxxx definitely addresses my main
> criteria of a public URI per code point. As for the UCD ontology I can
> just add the properties to my own ontology for now.
> Although I didn't realise it when I posed the question, it makes more
> sense now to have the UCD ontology defined in a different namespace
> from the code point itself anyway, since all the same code points have
> existed since Unicode 1.0 whether they were assigned or not, while
> their interpretations have been added to with successive versions of
> the standard (this becomes relevant if you're dealing with other legacy
> specifications).
>
> My use case is to be able to express tokenization rules (white space,
> foreign words, symbols ...) using RDF.
>
> Shane
>
> On Wed, Jan 5, 2011 at 10:27 PM, Felix Sasaki
> <felix.sasaki@fh-potsdam.de> wrote:
> > Hello Gerard,
> >
> > 2011/1/5 Gerard de Melo <gdemelo@mpi-inf.mpg.de>
> >>
> >> Hello Shane and others,
> >>>>
> >>>> Unless I miss something, Unicode characters as linked data have been
> >>>> published for quite a while by Gerard de Melo (in cc) at
> >>>> http://lexvo.org
> >>>> See e.g., http://lexvo.org/id/char/5A34
> >>>>
> >>>
> >>> If I understand the request from Shane right, he focuses not on Unicode
> >>> characters, but the Unicode character database. The latter has a lot of
> >>> information e.g. about character properties available (see
> >>> http://unicode.org/cldr/utility/properties.jsp ) which is not
> >>> incorporated into lexvo.org.
> >>
> >> As the maintainer of Lexvo.org, I could easily add a few additional
> >> character properties to the Lexvo ontology and RDF dump based
> >> on the particular use cases you are interested in. So far, I have
> >> tried to avoid creating a 1:1 mapping of all properties to predicates.
> >
> > This makes a lot of sense, since otherwise you get many triples without
> > use cases. Also, there are differences between the properties in terms of
> > stability and data sources. See
> > http://www.unicode.org/Public/5.1.0/ucd/UCD.html for what is available in
> > version 5.1. Note also that there is information about characters which is
> > not in the property database, e.g. whether a character can be used in an
> > internationalized domain name or not, see
> > http://unicode.org/cldr/utility/idna.jsp . So again, it really depends on
> > the use case what information you want in the RDF representation.
> >
> > Felix
> >
> >>
> >> Best regards,
> >> Gerard
> >>
> >> --
> >> Gerard de Melo [demelo@mpi-inf.mpg.de]
> >> Max Planck Institute for Informatics
> >> http://www.mpi-inf.mpg.de/~gdemelo/
> >>
> >
> >
>

Received on Thursday, 6 January 2011 14:42:26 UTC