Re: NFC and RDF Security consideration

From: Martin Duerst <duerst@w3.org>
Date: Tue, 12 Mar 2002 17:16:08 +0900
To: "Jeremy Carroll" <jjc@hplb.hpl.hp.com>, <w3c-rdfcore-wg@w3.org>, <w3c-i18n-ig@w3.org>

Hello Jeremy,

Many thanks for your examples, which show the cases very well.

At 11:23 02/03/04 +0000, Jeremy Carroll wrote:

>I wish to make the NFC issue more real to the wg by constructing a 
>realistic fraudulent piece of RDF which could be used as part of a scam.

>error003.rdf in the zip shows how a fraudulent promissory note could be 
>error001a.rdf and error002a.rdf are identical as RDF to error001 and 
>error002 but use XML character references so that the XML is in NFC, and 
>the examples written in this way no longer appear fraudulent, just odd.

They don't appear fraudulent if you look at the XML source.
But if you look at them with a tool that resolves the character
references, I guess they will still look fraudulent.
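
As a rough illustration of this point, here is a minimal Python sketch. It is not taken from Jeremy's examples: the element name `note`, the sample word, and the helper `is_nfc` are all made up for illustration. The source text is plain ASCII and therefore in NFC, but once an XML parser has resolved the character reference, the string a tool sees is no longer normalized.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical sample: U+0301 (combining acute accent) is written as a
# character reference, so the XML source itself contains only ASCII.
source = "<note>cafe&#x301;</note>"


def is_nfc(text: str) -> bool:
    """Return True if normalizing to NFC leaves the text unchanged."""
    return unicodedata.normalize("NFC", text) == text


# The parser replaces the reference with the actual combining character.
parsed = ET.fromstring(source).text

source_is_nfc = is_nfc(source)   # the check applied to the raw XML text
parsed_is_nfc = is_nfc(parsed)   # the check applied to what a tool sees
```

Under these assumptions the raw source passes the check while the parsed content does not, which is why a check on the XML source alone would not catch such a file.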

>error001b.rdf is in NFC but uses XML comments to separate the two parts of 
>the non-normalized characters. The current version of charmod makes the 
>treatment of files like this language-dependent:

>Formal languages define constructs, which are identifiable pieces occurring 
>in instances of the language such as comments, element tags, processing 
>instructions, runs of character data, etc. Which of those constructs need 
>to be constrained not to begin with a composing character is 
>language-dependent and depends on what processing the language undergoes.

>I18N-IG, if we were to make non-normalized literals ill-formed RDF, how 
>easy is it to check that a string is in NFC? Is there public domain code 
>we can use? Pointers please.

There is public-domain code:

- Charlint (http://www.w3.org/International/charlint/), written in Perl,
   more for clarity (I hope) than for efficiency; in particular,
   it reads in the whole Unicode data file before doing anything.

- http://www.unicode.org/unicode/reports/tr15/Normalizer.html, a small
   demo working on a subset of base and combining characters.

- ICU (http://oss.software.ibm.com/icu/userguide/normalization.html).

- Unicode::Normalize, a Perl module (I guess much more efficient than
   charlint) at
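
To give a feel for how small the check itself can be when a normalization library is at hand, here is a minimal sketch in Python. It uses the standard `unicodedata` module, which is not one of the tools listed above. The helper names `is_nfc` and `starts_with_combining` are hypothetical. The second helper only approximates charmod's "begins with a composing character" test by looking for a non-zero canonical combining class, which is narrower than charmod's definition.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if the string is already in Normalization Form C."""
    # Simple but not the fastest approach: normalize and compare.
    return unicodedata.normalize("NFC", text) == text


def starts_with_combining(text: str) -> bool:
    """Approximate test for a construct that begins with a composing character.

    Assumption: only characters with a non-zero canonical combining class
    are treated as composing here.
    """
    return bool(text) and unicodedata.combining(text[0]) != 0


precomposed = "caf\u00e9"      # e with acute as a single code point
decomposed = "cafe\u0301"      # e followed by a combining acute accent

precomposed_ok = is_nfc(precomposed)
decomposed_ok = is_nfc(decomposed)
dangling = starts_with_combining("\u0301 after a comment")
```

The normalize-and-compare approach does a full normalization pass on every string; newer Python versions (3.8 and later) also offer `unicodedata.is_normalized`, which can avoid that work for strings that are already normalized.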

Regards,    Martin.
Received on Tuesday, 12 March 2002 03:42:11 UTC