- From: Felix Sasaki <felix.sasaki@fh-potsdam.de>
- Date: Thu, 6 Jan 2011 15:41:01 +0100
- To: Shane Norris <norlesh@gmail.com>
- Cc: Gerard de Melo <gdemelo@mpi-inf.mpg.de>, Bernard Vatant <bernard.vatant@mondeca.com>, Sampo Syreeni <decoy@iki.fi>, Ivan Herman <ivan@w3.org>, W3C Semantic Web IG <semantic-web@w3.org>
- Message-ID: <AANLkTingYa=_CQdYqMS7eQe5BHVq=vpkLdfDvDV1XsRJ@mail.gmail.com>
Hello Shane,

if you want to express tokenization in a locale-specific manner, the data in CLDR might also be useful for you; see e.g. the definitions at

http://unicode.org/reports/tr35/#Character_Elements
http://unicode.org/reports/tr35/#Collation_Elements
http://unicode.org/reports/tr35/#Delimiter_Elements

Felix

2011/1/5 Shane Norris <norlesh@gmail.com>
> Hi and thanks to all,
>
> http://lexvo.org/id/char/xxxx definitely addresses my main criterion of a
> public URI per code point. As for the UCD ontology, I can just add the
> properties to my own ontology for now.
>
> Although I didn't realise it when I posed the question, it makes more
> sense now to have the UCD ontology defined in a different namespace from
> the code points themselves anyway, since the same code points have existed
> since Unicode 1.0 whether they were assigned or not, while their
> interpretations have been extended with successive versions of the
> standard (this becomes relevant if you're dealing with other legacy
> specifications).
>
> My use case is to be able to express tokenization rules (white space,
> foreign words, symbols ...) using RDF.
>
> Shane
>
> On Wed, Jan 5, 2011 at 10:27 PM, Felix Sasaki
> <felix.sasaki@fh-potsdam.de> wrote:
> > Hello Gerard,
> >
> > 2011/1/5 Gerard de Melo <gdemelo@mpi-inf.mpg.de>
> >>
> >> Hello Shane and others,
> >>>>
> >>>> Unless I miss something, Unicode characters as linked data have been
> >>>> published for quite a while by Gerard de Melo (in cc) at
> >>>> http://lexvo.org
> >>>> See e.g., http://lexvo.org/id/char/5A34
> >>>>
> >>>
> >>> If I understand the request from Shane correctly, he focuses not on
> >>> Unicode characters but on the Unicode Character Database. The latter
> >>> has a lot of information available, e.g. about character properties
> >>> (see http://unicode.org/cldr/utility/properties.jsp ), which is not
> >>> incorporated into lexvo.org.
> >>
> >> As the maintainer of Lexvo.org, I could easily add a few additional
> >> character properties to the Lexvo ontology and RDF dump based
> >> on the particular use cases you are interested in. So far, I have
> >> tried to avoid creating a 1:1 mapping of all properties to predicates.
> >
> > This makes a lot of sense, since otherwise you get many triples without
> > use cases. Also, there are differences between the properties in terms
> > of stability and data sources. See
> > http://www.unicode.org/Public/5.1.0/ucd/UCD.html for what is available
> > in version 5.1. Note also that there is information about characters
> > which is not in the properties database, e.g. whether a character can
> > be used in an internationalized domain name or not, see
> > http://unicode.org/cldr/utility/idna.jsp . So again, it really depends
> > on the use case what information you want in the RDF representation.
> >
> > Felix
> >
> >>
> >> Best regards,
> >> Gerard
> >>
> >> --
> >> Gerard de Melo [demelo@mpi-inf.mpg.de]
> >> Max Planck Institute for Informatics
> >> http://www.mpi-inf.mpg.de/~gdemelo/
> >>
> >
>
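For concreteness, here is a minimal Turtle sketch of what character-property triples attached to a lexvo.org code point URI might look like for the tokenization use case. The ex: namespace and its predicate names are invented for illustration; they are not the actual Lexvo or UCD ontology terms.

```turtle
@prefix ex:  <http://example.org/ucd#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Hypothetical UCD-style properties for U+0020 (SPACE), attached to the
# per-code-point URI pattern discussed in this thread.  The ex: predicates
# are placeholders, not real Lexvo ontology terms.
<http://lexvo.org/id/char/0020>
    ex:generalCategory "Zs" ;                   # Separator, space
    ex:whiteSpace      "true"^^xsd:boolean ;    # White_Space property
    ex:script          "Common" .

# A tokenization rule could then reference such data directly, e.g.
# "break tokens on any code point whose ex:whiteSpace value is true".
```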
Received on Thursday, 6 January 2011 14:42:26 UTC