RE: 2001-09-07#6: ns qualified parseType values

Hi Graham,

> Graham:
>
>
> ** 1st complication:  some old parsers may misinterpret 
> rdf:Resource and 
> Literal;  I think that results in loss of functionality rather than 
> complete failure of interoperability, but it's still messy.

In the sense that they'll treat rdf:Literal as a single string? 

That's why I mentioned the convention of using 'rdf' as the namespace
prefix. This the best effort approach taken by XML Schema, so we'd be no
messier than they. On the other hand, we'll see most XML tools upgraded
to see namespaces inside attribute values as a result of XML Schema's
blessing of this idiom: indeed I wouldn't be surprised if the tools were
ready before our documents get to recommendation status; DOM2 and SAX2
already expose this information if I recall correctly. The worst case is
an old parser treating 'rdf:Resource' as 'Literal': I acknowledge that
interop would fail in this case where receiving software has not been
upgraded, but that would a software defect rather than a spec defect.
Asking implementers to inspect parseType attribute value strings
containing ':' and check against the qualified name for the RDF
namespace is a simple enough task, certainly no harder that checking the
attribute itself for namespace qualification, which we're already asking
people to do.

> 
> (3) a new value, rdf:Canonical is proposed for canonical XML

As I mentioned, adding new parseTypes would be optional; yes it's a
feature add. My p4 can be dropped and keep the spirit of the proposal.

> 
> ** 2nd complication:  what does this effect?  Does it mean that the 
> contained literal must be canonical XML [...]?

Yes, I mean this. RDF-XML writers that want to mark RDF-XML Literals as
having a canonical parseType _must_ canonicize the Literal on
serialization. It's the writers job to make sense. The meaning of
parseType in my mind is a lexing/parsing clue not a transformation clue.



> An alternative to (2) might be to say simply that the 
> specification of 
> unqualified values is reserved to the RDF specification, 
> including its 
> future revisions and versions.  Thus, "Literal" and 
> "Resource" stand, and 
> the alternative forms "rdf:Literal", "rdf:Resource" are not 
> invoked.  Other 
> values may be defined in future, but only by RDF specs.  

I could live with this, Jeremy has also suggested as much. But it seems
odd that we don't eat our own food.


> Then, point (4) is 
> established as *the* way to introduce separately-defined 
> rdf:parseType values.

I wish I could be as succinct :) 


> I find point (3) is more difficult.  Why do we want to 
> canonicalize XML?  I 
> think the answer is to define a form of equivalence that can 
> be tested by 
> string comparison.  I'd rather target the equivalence issue 
> directly, and I 
> think this is an issue separate from rdf:parseType since it 
> also arises 
> with non-XML literals (e.g. dates, numbers, etc.).  Anyway, 
> canonicalization doesn't completely solve the problem in the case of 
> rdf:parseType="Resource", where the logical definition of 
> equivalence would 
> be two literals yielding the same RDF graph.

[I don't consider whatever is at the end of parseType="Resource" to be a
literal with structure, it's expanded as RDF.]

Literals will require a canonical form, or if one prefers, a shared
abstraction of a literal, for two reasons I see:

-equivalence operations as you point out.
-serialization and roundtripping.

I observe that well formed XML is free to be munged by XML processors,
to the extent that equivalence and round tripping operations on XML-RDF
literals can't be made consistently. So when the M&S says that literals
are to be matched character by character, I see a serious operational
constraint being imposed by the XML syntax at best, and quite probably a
spec defect. 

It's perfectly ok for people to move well formed XML literals around;
but they need to be told that their literals may get chewed up and that
string matching and round tripping may not be viable operations among
parties that don't have out of band agreements on RDF-XML literal
processing. We can point them at the XML Infoset and Canonical XML
documents if they haven't already gone running over there already, but
they still have to generate their own Qnames for these XML formats; why
not just serve them up?

My approach has been to look at Unicode rather than data structures
(though I'm aware that some people have made good arguments for literal
as structure) for the following reasons:

-RDF forbids making statements about literals (literals as structures
might as well be made resources)
-the M&S talks about literals as strings as well as XML (albeit not very
clearly)
-we have a charter to follow (insert handwaving exercise)
-we have a bias to strings in the MT cr and I don't want to bother Pat
with a rewrite ;) 
-Unicode is more general than well formed XML.
-Unicode is more easily matched than well formed XML.

Canonical XML falls naturally out of a Unicode baseline, hence the
suggestion.

With regard to stuff like dates and numbers, that's another issue again:
datatyping. I observe that if we allow namespaced parseTypes values,
people can simply slot in XML Schema simple types for ad hoc literal
processing. You get XML Schema pretty much for free this way.
 


> In summary, I suggest:
> (a) retain "Literal", "Resource" as is, and reserve 
> unqualified values to 
> present and future RDF specs.

Ok.


> (b) indicate qualified names as the way for other specs to introduce 
> alternative rdf:parseType values (noting that they musty always be 
> well-formed XML).

Ok.


> (c) drop all reference to canonicalization, and address the 
> literal-equivalence question separately, or later.

Ok. Do we have a new issue?

[I'll follow this up with revised text]

Bill

Received on Wednesday, 19 September 2001 08:18:35 UTC