Re: NFC and RDF Security consideration from Martin Duerst on 2002-03-12 (w3c-rdfcore-wg@w3.org from March 2002)

From: Martin Duerst <duerst@w3.org>
Date: Tue, 12 Mar 2002 17:16:08 +0900
To: "Jeremy Carroll" <jjc@hplb.hpl.hp.com>, <w3c-rdfcore-wg@w3.org>, <w3c-i18n-ig@w3.org>
Message-Id: <4.2.0.58.J.20020312170341.030fbe40@localhost>

Hello Jeremy,

Many thanks for your examples, which show the cases very well.

At 11:23 02/03/04 +0000, Jeremy Carroll wrote:


>I wish to make the NFC issue more real to the wg by constructing a 
>realistic fraudulent piece of RDF which could be used as part of a scam.

>error003.rdf in the zip shows how a fraudulent promissory note could be
>constructed.
>
>error001a.rdf and error002a.rdf are identical as RDF to error001 and 
>error002 but use XML character references so that the XML is in NFC, and 
>the examples written in this way no longer appear fraudulent, just odd.

They don't appear fraudulent if you look at the XML source.
But if you look at them with a tool that displays the parsed
character data, I guess they will still look fraudulent.
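A minimal sketch of why a tool would still see the problem, assuming Python 3.8+ (for `unicodedata.is_normalized`) and a hypothetical one-element document, not the actual contents of error001a.rdf. The XML source is in NFC, but the string the parser hands to an application is not.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical literal (element name "lit" is made up): a plain "e"
# followed by a character reference for U+0301 COMBINING ACUTE ACCENT.
source = "<lit>e&#x301;</lit>"

# The XML source is pure ASCII, hence trivially in NFC.
print(unicodedata.is_normalized("NFC", source))        # True

# After parsing, the character reference has been expanded, and the
# resulting string is NOT in NFC: NFC would use precomposed U+00E9.
text = ET.fromstring(source).text
print(text == "e\u0301")                                # True
print(unicodedata.is_normalized("NFC", text))           # False
print(unicodedata.normalize("NFC", text) == "\u00e9")   # True
```

Any tool that renders or compares the parsed literal works with `text`, not with the NFC source.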


>error001b.rdf is in NFC but uses XML comments to separate the two parts of
>the non-normalized characters. The current version of charmod makes the
>treatment of files like this language-dependent:
>http://www.w3.org/TR/2002/WD-charmod-20020220#sec-fully-normalized
>[[[
>Formal languages define constructs, which are identifiable pieces occurring
>in instances of the language such as comments, element tags, processing 
>instructions, runs of character data, etc. Which of those constructs need 
>to be constrained not to begin with a composing character is 
>language-dependent and depends on what processing the language undergoes.
>]]]
>
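A minimal sketch of the error001b.rdf situation, assuming Python 3.8+ and a hypothetical literal rather than the real file. It uses SAX because a `ContentHandler` is never told about comments, so the character data on both sides of the comment simply runs together. `begins_with_combining` approximates charmod's "composing character" by a nonzero canonical combining class, which is narrower than charmod's own definition.

```python
import unicodedata
import xml.sax

class TextCollector(xml.sax.ContentHandler):
    """Concatenates all character data; comments are not reported here."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def characters(self, content):
        self.parts.append(content)

def parsed_text(xml_source: str) -> str:
    handler = TextCollector()
    xml.sax.parseString(xml_source.encode("utf-8"), handler)
    return "".join(handler.parts)

def begins_with_combining(run: str) -> bool:
    # Approximation only: "nonzero canonical combining class" stands in
    # for charmod's broader notion of a composing character.
    return bool(run) and unicodedata.combining(run[0]) != 0

# Hypothetical literal: "e", an empty comment, then a raw U+0301.
source = "<lit>e<!-- -->\u0301</lit>"

print(unicodedata.is_normalized("NFC", source))  # True: ">" + U+0301 has no composite
text = parsed_text(source)
print(text == "e\u0301")                          # True: the comment has vanished
print(unicodedata.is_normalized("NFC", text))     # False

# The run of character data after the comment starts with a combining
# mark, which is what a "fully normalized" check would have to reject.
after_comment = source.split("-->", 1)[1].split("<", 1)[0]
print(begins_with_combining(after_comment))       # True
```

So checking the source for NFC alone is not enough here; the language has to say which constructs may not begin with a composing character.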
>I18N-IG: if we were to make non-normalized literals ill-formed RDF, how
>easy is it to check that a string is in NFC? Is there public-domain code
>we can use? Pointers, please.

There is freely available code:

- Charlint (http://www.w3.org/International/charlint/), in Perl,
   written more for clarity (I hope) than for efficiency; in particular,
   it reads in the whole Unicode data file before doing anything.

- http://www.unicode.org/unicode/reports/tr15/Normalizer.html, a small
   demo working on a subset of base and combining characters.

- ICU (http://oss.software.ibm.com/icu/userguide/normalization.html).

- Unicode::Normalize, a Perl module (I guess much more efficient than
   Charlint), at
   http://homepage1.nifty.com/nomenclator/perl/Unicode-Normalize.html
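For comparison, and not one of the packages listed above: a minimal sketch of the check using Python's standard `unicodedata` module (Python 3.8+ for `is_normalized`, which postdates this message). `check_literal` is a hypothetical hook showing how a parser might treat non-NFC literals as ill-formed; it is not part of any RDF toolkit.

```python
import unicodedata

def is_nfc(s: str) -> bool:
    # Definition: s is in NFC iff normalizing it leaves it unchanged.
    return unicodedata.normalize("NFC", s) == s

def is_nfc_fast(s: str) -> bool:
    # Same answer as is_nfc; CPython first tries the NFC_Quick_Check
    # property and only normalizes when the quick check is inconclusive.
    return unicodedata.is_normalized("NFC", s)

def check_literal(lexical_form: str) -> None:
    """Hypothetical validation hook: reject literals not in NFC."""
    if not is_nfc_fast(lexical_form):
        normalized = unicodedata.normalize("NFC", lexical_form)
        raise ValueError(
            f"literal not in NFC: {lexical_form!r} (NFC: {normalized!r})"
        )
```

For example, `check_literal("caf\u00e9")` returns quietly, while `check_literal("cafe\u0301")` raises `ValueError`. Singletons are caught as well: `is_nfc("\u212b")` (ANGSTROM SIGN) is false, because NFC maps it to U+00C5.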


Regards,    Martin.

Received on Tuesday, 12 March 2002 03:42:11 UTC