Re: 2001-09-07#6: ns qualified parseType values from Graham Klyne on 2001-09-19 (w3c-rdfcore-wg@w3.org from September 2001)

From: Graham Klyne <Graham.Klyne@MIMEsweeper.com>
Date: Wed, 19 Sep 2001 09:41:34 +0100
To: "dehora" <dehora@eircom.net>
Cc: "W3C Rdfcore" <w3c-rdfcore-wg@w3.org>
Message-Id: <5.1.0.14.2.20010919091311.038ec6a0@joy.songbird.com>

Bill,

I've studied this in a little more detail than for my first 
response.  While I rather like the idea of using qualified names (hence 
URI-refs) to identify parseType values, I am concerned that there are 
aspects of your proposal that are unclear or possibly over-complicated.

I think the key points in your proposal are:

(1) presence of a rdf:parseType attribute, whatever its value, means the 
literal value must be well-formed XML.  (Otherwise, the literal may be an 
arbitrary Unicode string.)

(2) rdf:parseType values "Literal" and "Resource" stand with their current 
meanings.  Values rdf:Literal and rdf:Resource (where 'rdf:' is prefix for 
the namespace of RDF syntax) are equivalent values.

** 1st complication:  some old parsers may misinterpret rdf:Resource and 
Literal;  I think that results in loss of functionality rather than 
complete failure of interoperability, but it's still messy.

(3) a new value, rdf:Canonical is proposed for canonical XML

** 2nd complication:  what does this affect?  Does it mean that the 
contained literal must be canonical XML, or that the value emitted by the 
parser is to be canonicalized?  I think you mean the latter, but this raises the 
point that the rdf:parseType value may sometimes modify the allowed form of 
the XML literal ("Resource"), and sometimes may modify the way it is 
handled by the parser.  My intuition is that this is an overloading that 
may give rise to confusion, or worse.

(4) other rdf:parseType options may be introduced using namespace qualified 
values.

...

An alternative to (2) might be to say simply that the specification of 
unqualified values is reserved to the RDF specification, including its 
future revisions and versions.  Thus, "Literal" and "Resource" stand, and 
the alternative forms "rdf:Literal", "rdf:Resource" are not invoked.  Other 
values may be defined in future, but only by RDF specs.  Then, point (4) is 
established as *the* way to introduce separately-defined rdf:parseType values.

I find point (3) is more difficult.  Why do we want to canonicalize XML?  I 
think the answer is to define a form of equivalence that can be tested by 
string comparison.  I'd rather target the equivalence issue directly, and I 
think this is an issue separate from rdf:parseType since it also arises 
with non-XML literals (e.g. dates, numbers, etc.).  Anyway, 
canonicalization doesn't completely solve the problem in the case of 
rdf:parseType="Resource", where the logical definition of equivalence would 
be two literals yielding the same RDF graph.
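
To make the string-comparison point concrete, here is a minimal sketch.  It 
assumes Python 3.8 or later and its xml.etree.ElementTree.canonicalize 
function, which implements C14N 2.0 rather than the Canonical XML of 2001, 
so treat it as a stand-in.  The helper name same_xml_literal is 
hypothetical, not anything from the proposal.

```python
from xml.etree.ElementTree import canonicalize


def same_xml_literal(a: str, b: str) -> bool:
    """Compare two well-formed XML literals by their canonical string forms.

    Assumption: canonical-form string equality is the equivalence we want.
    This says nothing about parseType="Resource" content, where the natural
    test would be whether the two literals yield the same RDF graph.
    """
    return canonicalize(a) == canonicalize(b)


# Attribute order and empty-element syntax differ, but the canonical forms agree.
print(same_xml_literal('<a b="1" c="2"/>', '<a c="2"   b="1"></a>'))

# Different character content is still different after canonicalization.
print(same_xml_literal('<a>x</a>', '<a>y</a>'))
```

Note that this does nothing for non-XML literals: the strings "1.0" and 
"1.00" still compare unequal, which is why I would treat equivalence as a 
separate question.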

In summary, I suggest:
(a) retain "Literal", "Resource" as is, and reserve unqualified values to 
present and future RDF specs.
(b) indicate qualified names as the way for other specs to introduce 
alternative rdf:parseType values (noting that the element content must 
always be well-formed XML).
(c) drop all reference to canonicalization, and address the 
literal-equivalence question separately, or later.
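
A minimal sketch of how a parser might apply (a) and (b), under stated 
assumptions: the function name classify_parse_type, the returned labels, 
and the example.org namespace are all hypothetical, and prefix resolution 
is reduced to a plain dictionary of in-scope bindings.

```python
# Unqualified values defined by the current RDF spec, per suggestion (a).
RDF_DEFINED = {"Literal", "Resource"}


def classify_parse_type(value, ns_bindings):
    """Classify an rdf:parseType attribute value.

    ns_bindings maps in-scope namespace prefixes to namespace URIs.
    Returns a (kind, identifier) pair, where kind is one of:
      "rdf-defined" - an unqualified value the RDF spec defines today
      "reserved"    - an unqualified value reserved to future RDF specs
      "extension"   - a qualified name, expanded to a URI-ref, per (b)
    """
    if ":" not in value:
        # Values are case-sensitive, so "literal" is not "Literal".
        if value in RDF_DEFINED:
            return ("rdf-defined", value)
        return ("reserved", value)

    prefix, local = value.split(":", 1)
    if prefix not in ns_bindings:
        raise ValueError("unbound namespace prefix: " + prefix)
    # Default namespaces are deliberately not consulted here.
    return ("extension", ns_bindings[prefix] + local)


bindings = {"ex": "http://example.org/ns#"}  # hypothetical extension namespace
print(classify_parse_type("Literal", bindings))
print(classify_parse_type("ex:collection", bindings))
```

What a processor then does with a "reserved" or unrecognized "extension" 
value is a separate decision; the existing M&S text says to treat it as 
"Literal".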

#g
--

At 05:19 AM 9/14/01 +0100, dehora wrote:
>"Prepare a proposal for namespace qualification values of rdf:parseType
>attributes upon which the group can make a decision."
>
>Forces:
>
>parseType="Literal" is, in the words of the M&S, "a minimum-level
>solution to the requirement to express an RDF statement with a value
>that has XML markup. Additional complexities of XML such as
>canonicalization of whitespace are not yet well defined. Future work of
>the W3C is expected to resolve such issues in a uniform manner for all
>applications based on XML."
>
>Ongoing proposals in the wg may lead to the interpretation that the
>fundamental representation of a literal is a Unicode string with
>optional language tag upon which serializations might have to impose
>syntactic constraints. This is not catered for in the current M&S XML
>syntax.
>
>There is interest in treating XML literal element content as something
>other than well formed XML (such as infoset). There may be interest in
>the future in applying well-known data types over literals.
>
>Any future RDF recommendation should recognise practice, notably by the
>DAML effort, which is using parseType as extensibility mechanism beyond
>that mandated by the current M&S. Further the wg should indicate an
>appropriate method for people wishing to extend parseType.
>
>Clearly, changes to the existing interpretation of the parseType
>attribute value Literal will be backwards incompatible with current
>processors and RDF-XML data modelling assumptions. Therefore, the
>interpretations and reservation of both the attribute values Literal and
>Resource must stand.
>
>The following is the key text in M&S re RDF-XML parseType:
>
>"(P203) The parseType attribute changes the interpretation of the
>element content. The parseType attribute should have one of the values
>'Literal' or 'Resource'. The value is case-sensitive. The value
>'Literal' specifies that the element content is to be treated as an
>RDF/XML literal; that is, the content must not be interpreted by an RDF
>processor. The value 'Resource' specifies that the element content must
>be treated as if it were the content of a Description element. Other
>values of parseType are reserved for future specification by RDF. With
>RDF 1.0 other values must be treated as identical to 'Literal'. In all
>cases, the content of an element having a parseType attribute must be
>well-formed XML. The content of an element having a parseType="Resource"
>attribute must further match the production for the content of a
>Description element.
>
>The RDF Model and Syntax Working Group acknowledges that the
>parseType='Literal' mechanism is a minimum-level solution to the
>requirement to express an RDF statement with a value that has XML
>markup. Additional complexities of XML such as canonicalization of
>whitespace are not yet well defined. Future work of the W3C is expected
>to resolve such issues in a uniform manner for all applications based on
>XML. Future versions of RDF will inherit this work and may extend it as
>we gain insight from further application experience."
>
>
>Proposed revised and extended wording. Please note that (p4) is largely
>illustrative at this point and could reasonably be excised with the
>other paragraphs remaining in place:
>
>(p1) The parseType attribute changes the interpretation of the element
>content.
>
>(p2) It is recognized that parseType is useful as an extensibility
>mechanism. The preferred technique to extend parseType is through the
>use of qualified names, as discussed in the XML Namespaces
>recommendation. The purpose of using namespaces to denote parseType
>values is to allow extensions to be associated with a vocabulary or
>schema. Note: the XML namespaces notion of default namespaces shall not
>apply to parseType values, in precisely the same sense that default
>namespaces do not apply to attributes. Unqualified values of parseType
>must not be considered to be in any namespace.
>
>(p3) The non-namespaced values specified are 'Literal' and 'Resource'.
>The values are case-sensitive. The value 'Literal' specifies that the
>element content is to be treated as an RDF literal. The element content
>is considered opaque: that is, it must not be interpreted or passed on
>by an RDF processor as RDF. The element content however must be well
>formed XML and there are syntactic constraints imposed by this
>serialization [see fixme]. The value 'Resource' specifies that the
>element content must be treated as if it were the content of a
>Description element. The content of an element having a
>parseType="Resource" attribute must further match the production for the
>content of a Description.
>
>(p4) The local part values specified are 'literal', 'resource' and
>'canonical'. These three are bound to the RDF namespace [fixme]. By
>convention, the prefix 'rdf' is used as the namespace qualifier,
>although any prefix can be used. Future editions of this document may
>add new local parts as deemed appropriate. The local part 'literal' has
>the same interpretation as 'Literal'. The local part 'resource' has the
>same interpretation as 'Resource'. The local part 'canonical' specifies
>that the literal should be treated as canonical XML [see fixme];  fixme:
>more+markup examples.
>
>(p5) An RDF-XML processor encountering an unrecognised parseType value
>must continue to behave as if that value were 'Literal'. Note: how XML
>attributes which affect the interpretation of literals are passed along
>is not specified in this document (xml:lang is another case in point).
>Nonetheless processors which find unrecognised parseType values should
>pass on the found parseType value rather than the default case
>'Literal', where they are capable of doing so.
>
>Bill de hÓra

------------------------------------------------------------
Graham Klyne                    MIMEsweeper Group
Strategic Research              <http://www.mimesweeper.com>
<Graham.Klyne@MIMEsweeper.com>
------------------------------------------------------------
Received on Wednesday, 19 September 2001 06:01:23 UTC