
Re: RDF-ISSUE-75 (#x0): Valid plain literals containing #x0 are no longer valid in RDF 1.1

From: Richard Cyganiak <richard@cyganiak.de>
Date: Sun, 21 Aug 2011 15:06:04 +0100
Cc: RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <305CD5D5-0B4C-4C71-8E9F-98FAB01B3EAD@cyganiak.de>
To: Ivan Herman <ivan@w3.org>
On 20 Aug 2011, at 05:39, Ivan Herman wrote:
> Do we know of any place whatsoever where #x0 was used?

I remember a D2RQ support question where a database contained #x0 and that caused an error when serializing it to RDF/XML. So yes, it happens.

> I would propose we flag this explicitly as an issue in the document asking for feedback, with the expectation that we will have this restriction in 1.1

Works for me.

Richard


> 
> 
> Ivan
> 
> On Aug 19, 2011, at 20:44 , RDF Working Group Issue Tracker wrote:
> 
>> 
>> RDF-ISSUE-75 (#x0): Valid plain literals containing #x0 are no longer valid in RDF 1.1
>> 
>> http://www.w3.org/2011/rdf-wg/track/issues/75
>> 
>> Raised by: Richard Cyganiak
>> On product: 
>> 
>> The lexical space of xsd:string doesn't cover all Unicode strings.
>> 
>> I assume we will end up referring to XSD 1.1 for the definition of xsd:string [1]. That document leaves it up to implementations whether they support XML 1.0 or XML 1.1; accordingly, the definition of allowed characters in an xsd:string is either [2] or [3].
>> 
>> The more permissive one from XML 1.1:
>> 
>>   Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
>> 
>> This excludes #x0, Unicode codepoint U+0000. XML 1.0 additionally excludes every control code in the #x1-#x1F range except #x9, #xA and #xD.
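
A minimal sketch of the check these two productions imply, assuming Python. The function names are hypothetical and not taken from any spec or library; the XML 1.0 ranges are copied from the Char production in [2].

```python
def is_xml11_char(cp):
    # XML 1.1: [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml10_char(cp):
    # XML 1.0: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def in_string_lexical_space(s, xml_version="1.1"):
    """Return True if every code point of s matches the chosen Char production."""
    allowed = is_xml11_char if xml_version == "1.1" else is_xml10_char
    return all(allowed(ord(ch)) for ch in s)
```

Under either production a string containing U+0000 is rejected, while a string containing U+0001 is accepted only under the XML 1.1 production.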
>> 
>> The definition of “lexical form” in RDF 2004 [4] says “Unicode string”, which according to [5] includes *all* codepoints including the control codes.
>> 
>> So, any string that includes #x0 was a valid untagged plain literal in RDF 2004. In RDF 1.1, it will be typed as an xsd:string, and thus will be an ill-typed literal.
>> 
>> (On the other hand, such strings could never be serialized in RDF/XML or XHTML+RDFa; they were serializable only in N-Triples and Turtle.)
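
The asymmetry between the serializations can be sketched as follows, assuming Python and its standard-library XML parser (which implements XML 1.0). `ntriples_literal` and `xml_accepts_char_reference` are hypothetical helpers written for this illustration; the escaping follows the 2004 N-Triples rules, where characters outside #x20-#x7E are written as \u or \U escapes.

```python
import xml.etree.ElementTree as ET


def ntriples_literal(s):
    """Escape s as an N-Triples plain literal (illustrative helper only)."""
    out = []
    for ch in s:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\r":
            out.append("\\r")
        elif ch == "\t":
            out.append("\\t")
        elif cp < 0x20 or cp > 0x7E:
            # Everything else outside printable ASCII becomes a numeric escape.
            out.append("\\u%04X" % cp if cp <= 0xFFFF else "\\U%08X" % cp)
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


def xml_accepts_char_reference(cp):
    """Return True if an XML 1.0 parser accepts a character reference to cp."""
    try:
        ET.fromstring("<v>&#%d;</v>" % cp)
        return True
    except ET.ParseError:
        return False
```

Here `ntriples_literal("a\x00b")` yields the escaped form `"a\u0000b"`, whereas `xml_accepts_char_reference(0)` returns False: the same code point has an N-Triples spelling but no well-formed XML 1.0 spelling, not even as a character reference.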
>> 
>> Is this a problem? Can we go ahead with the new literal design despite this restriction? Should we acknowledge it in the RDF Concepts spec?
>> 
>> [1] http://www.w3.org/TR/2005/WD-xmlschema11-2-20050224/datatypes.html#string
>> [2] http://www.w3.org/TR/REC-xml/#dt-character
>> [3] http://www.w3.org/TR/xml11/#NT-Char
>> [4] http://www.w3.org/TR/rdf-concepts/#dfn-lexical-form
>> [5] http://www.unicode.org/versions/Unicode6.0.0/UnicodeStandard-6.0.pdf
>> 
>> 
>> 
> 
> 
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> PGP Key: http://www.ivan-herman.net/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Sunday, 21 August 2011 14:06:34 GMT
