- From: Martin Duerst <duerst@w3.org>
- Date: Tue, 12 Mar 2002 17:16:08 +0900
- To: "Jeremy Carroll" <jjc@hplb.hpl.hp.com>, <w3c-rdfcore-wg@w3.org>, <w3c-i18n-ig@w3.org>
Hello Jeremy,

Many thanks for your examples, which show the cases very well.

At 11:23 02/03/04 +0000, Jeremy Carroll wrote:
>I wish to make the NFC issue more real to the wg by constructing a
>realistic fraudulent piece of RDF which could be used as part of a scam.
>error003.rdf in the zip shows how a fraudulent promisary note could be
>constructed.
>
>error001a.rdf and error002a.rdf are identical as RDF to error001 and
>error002 but use XML character references so that the XML is in NFC, and
>the examples written in this way no longer appear fraudulent, just odd.

They don't appear fraudulent if you look at the XML source. But if you look at them with a tool, they will still look so, I guess.

>error001b.rdf is in NFC but uses XML comments to separate the two parts of
>the non normalized characters. The current version of charmod makes the
>treatment of files like this as language dependent:
>http://www.w3.org/TR/2002/WD-charmod-20020220#sec-fully-normalized
>[[[
>Formal languages define constructs, which are identifiable pieces occuring
>in instances of the language such as comments, element tags, processing
>instructions, runs of character data, etc. Which of those constructs need
>to be constrained not to begin with a composing character is
>language-dependent and depends on what processing the language undergoes.
>]]]
>
>I18N-IG, if we were to make non normalized literals ill-formed RDF how
>easy is it to check that a string is in NFC? Is there public domain code
>we can use? Pointers please.

There is public-domain code:

- Charlint (http://www.w3.org/International/charlint/), in perl and written more for clarity (I hope) than efficiency, in particular because it reads in the whole Unicode data file before doing anything.
- http://www.unicode.org/unicode/reports/tr15/Normalizer.html, a small demo working on a subset of base and combining characters.
- ICU (http://oss.software.ibm.com/icu/userguide/normalization.html).
- Unicode::Normalize, a perl module (I guess much more efficient than charlint) at http://homepage1.nifty.com/nomenclator/perl/Unicode-Normalize.html

Regards, Martin.
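To give a rough idea of what the NFC check asked about above might look like, here is a minimal sketch in Python. It assumes the standard library's `unicodedata` module, which is not one of the tools listed in this message, and the helper names `is_nfc` and `starts_with_combining` are made up for illustration. The second helper only approximates charmod's "begins with a composing character" test by looking for a non-zero canonical combining class; charmod's own definition may cover more characters than that.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if text is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", text) == text


def starts_with_combining(text: str) -> bool:
    """Approximate check: does text begin with a combining mark?

    Assumption: a non-zero canonical combining class stands in for
    charmod's notion of a composing character.
    """
    return bool(text) and unicodedata.combining(text[0]) != 0


# "e" followed by U+0301 COMBINING ACUTE ACCENT, versus precomposed U+00E9.
decomposed = "e\u0301"
composed = "\u00e9"

for label, value in [("decomposed", decomposed), ("composed", composed)]:
    print(label, "is NFC:", is_nfc(value))

# Each half is in NFC on its own, but the concatenation is not; this is
# the situation where a construct boundary separates base and accent.
left, right = "e", "\u0301"
print("halves NFC:", is_nfc(left), is_nfc(right))
print("joined NFC:", is_nfc(left + right))
print("right half starts with combining mark:", starts_with_combining(right))
```

The last three lines mirror the error001b.rdf case: checking each run of text separately for NFC is not enough, so a check on the first character of each construct is also needed.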
Received on Tuesday, 12 March 2002 03:42:11 UTC