RE: forbiddenCharacters data category - related to [ACTION-189] from Yves Savourel on 2012-08-27 (public-multilingualweb-lt@w3.org from August 2012)

From: Yves Savourel <ysavourel@enlaso.com>
Date: Mon, 27 Aug 2012 05:52:50 -0600
To: "'Felix Sasaki'" <felix.sasaki@dfki.de>
CC: <public-multilingualweb-lt@w3.org>
Message-ID: <assp.058616c3ac.assp.05864b2f4a.004a01cd844a$7c4c8b90$74e5a2b0$@com>

Hi Felix,

> - the escaping mechanism with \uHHHH would need to be 
> converted to numeric character references &#xHHHH;

Using &#xHHHH; for all of them would be fine, as it would work with all engines.
I was using \uHHHH to work around the ever-problematic issue of characters that are invalid in XML.
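
To illustrate the conversion discussed above, here is a minimal sketch in Python that rewrites \uHHHH escapes as &#xHHHH; references. The function name u_escapes_to_ncr is hypothetical, and the sketch assumes every \uHHHH sequence in the value is meant as an escape (it does not handle an escaped backslash in front of the u).

```python
import re

# Matches a backslash, the letter u, and exactly four hex digits.
_U_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def u_escapes_to_ncr(pattern):
    # Hypothetical helper: rewrite each escape as a numeric character reference.
    return _U_ESCAPE.sub(lambda m: "&#x" + m.group(1).upper() + ";", pattern)


print(u_escapes_to_ncr(r"[\u00A0\u2028]"))  # prints [&#x00A0;&#x2028;]
```

Note that the output is only usable in an XML file if the referenced characters are themselves allowed in XML.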


> - <> need to be converted to &lt;&gt; 

Do you mean that in the XML source I need to have forbiddenCharacters="&lt;&gt;" or forbiddenCharacters="&amp;lt;&amp;gt;" ?
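
The difference between the two spellings can be checked with a short sketch, here using Python's standard ElementTree parser. The element name rule is an assumption made only for this illustration; the point is what value the application receives after XML parsing.

```python
import xml.etree.ElementTree as ET


def attr_value(xml_text):
    # Return the forbiddenCharacters value as seen after XML parsing.
    return ET.fromstring(xml_text).get("forbiddenCharacters")


# "rule" is a hypothetical element name used only for this example.
single = attr_value('<rule forbiddenCharacters="&lt;&gt;"/>')
double = attr_value('<rule forbiddenCharacters="&amp;lt;&amp;gt;"/>')

print(single)  # prints <>
print(double)  # prints &lt;&gt;
```

With single escaping the regex engine receives the two literal characters; with double escaping it receives the eight-character string &lt;&gt;.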

I didn't see anything special about < and > in the XML regex (besides that < literal must be &lt; when in an XML file, but that 


> - Both \u0000 and \u001F are forbidden characters in XML.

The characters U+0000 and U+001F themselves are forbidden, but not when written in \uHHHH notation. That's why I wasn't using &#xHHHH;
But I see your point.

My question then is: how do you work with such characters and XML regex?
If you can't then that's one more reason to avoid using XML regex.
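
The problem can be reproduced with a small sketch, assuming an XML 1.0 parser (here Python's ElementTree, which is built on expat) and a hypothetical rule element. A numeric character reference to U+001F makes the document ill-formed, while the same character written as the plain text \u001F passes through the parser untouched.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    # True if the XML 1.0 parser accepts the document, False otherwise.
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# "rule" is a hypothetical element name used only for this example.
with_ncr = '<rule forbiddenCharacters="[&#x1F;]"/>'
with_escape = '<rule forbiddenCharacters="[\\u001F]"/>'

print(is_well_formed(with_ncr))     # prints False
print(is_well_formed(with_escape))  # prints True
```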


> We should either drop the regex at all, use XML Schema 
> regex (I say your counter arguments, so this is probably no option)
> or define a clear specification about what to do when one 
> uses XML Schema regex, e.g. have a pointer to characters that are 
> disallowed in XML and XML Schema regex anyway. 

It seems to me that the third option would be the way to go. I'll try to post something asap.

Thanks for the feedback.
-yves

Received on Monday, 27 August 2012 11:53:23 UTC