RE: forbiddenCharacters data category - related to [ACTION-189]

From: Yves Savourel <ysavourel@enlaso.com>
Date: Mon, 27 Aug 2012 05:52:50 -0600
To: "'Felix Sasaki'" <felix.sasaki@dfki.de>
CC: <public-multilingualweb-lt@w3.org>
Message-ID: <assp.058616c3ac.assp.05864b2f4a.004a01cd844a$7c4c8b90$74e5a2b0$@com>

Hi Felix,

> - the escaping mechanism with \uHHHH would need to be 
> converted to numeric character references &#xHHHH;

Using &#xHHHH; for all would be fine as it would work with all engines.
I was using \uHHHH to work around the ever-problematic issue of characters that are invalid in XML.
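
A minimal sketch of that conversion, in Python. The function name to_ncr is hypothetical, and the sketch assumes every escape has exactly four hex digits and that the backslash is not itself escaped:

```python
import re

# Matches a backslash, the letter u, and exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def to_ncr(pattern: str) -> str:
    """Rewrite each \\uHHHH escape as an XML numeric character reference.

    Assumption: the backslash is never escaped itself, so an input like
    a doubled backslash followed by u0041 is not handled specially.
    """
    return _ESCAPE.sub(lambda m: "&#x" + m.group(1).upper() + ";", pattern)


if __name__ == "__main__":
    print(to_ncr(r"[\u002A-\u00ff]"))
```

Note that this only changes the notation. A reference to a character that XML does not allow is still rejected by the parser.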

> - <> need to be converted to &lt;&gt; 

Do you mean that in the XML source I need to have forbiddenCharacters="&lt;&gt;" or forbiddenCharacters="&amp;lt;&amp;gt;" ?

I didn't see anything special about < and > in the XML regex (besides that < literal must be &lt; when in an XML file, but that 

> - Both \u0000 and \u001F are forbidden characters in XML.

The characters U+0000 and U+001F are, but not when written in \uHHHH notation. That's why I wasn't using &#xHHHH;
But I see your point.

My question then is: how do you work with such characters and XML regex?
If you can't then that's one more reason to avoid using XML regex.
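
To illustrate the problem, here is a small check against the Char production of XML 1.0, plus a parse attempt with Python's standard xml.etree.ElementTree. The helper names is_xml10_char and parses are made up for this sketch, and the attribute name is only used as a sample:

```python
import xml.etree.ElementTree as ET


def is_xml10_char(cp: int) -> bool:
    """True if the code point matches the Char production of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def parses(xml_text: str) -> bool:
    """True if the text is accepted by the XML parser as well-formed."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    for cp in (0x0000, 0x001F, 0x003C):
        print(f"U+{cp:04X} allowed in XML 1.0: {is_xml10_char(cp)}")
    # A character reference does not make a forbidden character legal.
    print(parses('<r forbiddenCharacters="&#x1F;"/>'))
    print(parses('<r forbiddenCharacters="&#x3C;"/>'))
```

In XML 1.0 neither U+0000 nor U+001F can appear in a document, whether literally or as &#xHHHH;, which is why a notation like \uHHHH that the XML parser never interprets avoids the problem.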

> We should either drop the regex at all, use XML Schema 
> regex (I saw your counter arguments, so this is probably no option)
> or define a clear specification about what to do when one 
> uses XML Schema regex, e.g. have a pointer to characters that are 
> disallowed in XML and XML Schema regex anyway. 

It seems to me that the third option would be the way to go. I'll try to post something ASAP.

Thanks for the feedback.
Received on Monday, 27 August 2012 11:53:23 UTC
