- From: dehora <dehora@eircom.net>
- Date: Wed, 19 Sep 2001 13:17:13 +0100
- To: "'Graham Klyne'" <Graham.Klyne@MIMEsweeper.com>
- Cc: "'W3C Rdfcore'" <w3c-rdfcore-wg@w3.org>
Hi Graham, > Graham: > > > ** 1st complication: some old parsers may misinterpret > rdf:Resource and > Literal; I think that results in loss of functionality rather than > complete failure of interoperability, but it's still messy. In the sense that they'll treat rdf:Literal as a single string? That's why I mentioned the convention of using 'rdf' as the namespace prefix. This the best effort approach taken by XML Schema, so we'd be no messier than they. On the other hand, we'll see most XML tools upgraded to see namespaces inside attribute values as a result of XML Schema's blessing of this idiom: indeed I wouldn't be surprised if the tools were ready before our documents get to recommendation status; DOM2 and SAX2 already expose this information if I recall correctly. The worst case is an old parser treating 'rdf:Resource' as 'Literal': I acknowledge that interop would fail in this case where receiving software has not been upgraded, but that would a software defect rather than a spec defect. Asking implementers to inspect parseType attribute value strings containing ':' and check against the qualified name for the RDF namespace is a simple enough task, certainly no harder that checking the attribute itself for namespace qualification, which we're already asking people to do. > > (3) a new value, rdf:Canonical is proposed for canonical XML As I mentioned, adding new parseTypes would be optional; yes it's a feature add. My p4 can be dropped and keep the spirit of the proposal. > > ** 2nd complication: what does this effect? Does it mean that the > contained literal must be canonical XML [...]? Yes, I mean this. RDF-XML writers that want to mark RDF-XML Literals as having a canonical parseType _must_ canonicize the Literal on serialization. It's the writers job to make sense. The meaning of parseType in my mind is a lexing/parsing clue not a transformation clue. > An alternative to (2) might be to say simply that the > specification of > unqualified values is reserved to the RDF specification, > including its > future revisions and versions. Thus, "Literal" and > "Resource" stand, and > the alternative forms "rdf:Literal", "rdf:Resource" are not > invoked. Other > values may be defined in future, but only by RDF specs. I could live with this, Jeremy has also suggested as much. But it seems odd that we don't eat our own food. > Then, point (4) is > established as *the* way to introduce separately-defined > rdf:parseType values. I wish I could be as succinct :) > I find point (3) is more difficult. Why do we want to > canonicalize XML? I > think the answer is to define a form of equivalence that can > be tested by > string comparison. I'd rather target the equivalence issue > directly, and I > think this is an issue separate from rdf:parseType since it > also arises > with non-XML literals (e.g. dates, numbers, etc.). Anyway, > canonicalization doesn't completely solve the problem in the case of > rdf:parseType="Resource", where the logical definition of > equivalence would > be two literals yielding the same RDF graph. [I don't consider whatever is at the end of parseType="Resource" to be a literal with structure, it's expanded as RDF.] Literals will require a canonical form, or if one prefers, a shared abstraction of a literal, for two reasons I see: -equivalence operations as you point out. -serialization and roundtripping. I observe that well formed XML is free to be munged by XML processors, to the extent that equivalence and round tripping operations on XML-RDF literals can't be made consistently. So when the M&S says that literals are to be matched character by character, I see a serious operational constraint being imposed by the XML syntax at best, and quite probably a spec defect. It's perfectly ok for people to move well formed XML literals around; but they need to be told that their literals may get chewed up and that string matching and round tripping may not be viable operations among parties that don't have out of band agreements on RDF-XML literal processing. We can point them at the XML Infoset and Canonical XML documents if they haven't already gone running over there already, but they still have to generate their own Qnames for these XML formats; why not just serve them up? My approach has been to look at Unicode rather than data structures (though I'm aware that some people have made good arguments for literal as structure) for the following reasons: -RDF forbids making statements about literals (literals as structures might as well be made resources) -the M&S talks about literals as strings as well as XML (albeit not very clearly) -we have a charter to follow (insert handwaving exercise) -we have a bias to strings in the MT cr and I don't want to bother Pat with a rewrite ;) -Unicode is more general than well formed XML. -Unicode is more easily matched than well formed XML. Canonical XML falls naturally out of a Unicode baseline, hence the suggestion. With regard to stuff like dates and numbers, that's another issue again: datatyping. I observe that if we allow namespaced parseTypes values, people can simply slot in XML Schema simple types for ad hoc literal processing. You get XML Schema pretty much for free this way. > In summary, I suggest: > (a) retain "Literal", "Resource" as is, and reserve > unqualified values to > present and future RDF specs. Ok. > (b) indicate qualified names as the way for other specs to introduce > alternative rdf:parseType values (noting that they musty always be > well-formed XML). Ok. > (c) drop all reference to canonicalization, and address the > literal-equivalence question separately, or later. Ok. Do we have a new issue? [I'll follow this up with revised text] Bill
Received on Wednesday, 19 September 2001 08:18:35 UTC