- From: Graham Klyne <gk@ninebynine.org>
- Date: Fri, 23 May 2003 11:19:50 +0100
- To: <Patrick.Stickler@nokia.com>, <duerst@w3.org>, <jjc@hplb.hpl.hp.com>
- Cc: <w3c-rdfcore-wg@w3.org>
At 10:47 23/05/03 +0300, Patrick.Stickler@nokia.com wrote:
>I.e.
>
>   <foo><span xml:lang='en'>blargh</span></foo>
>
>should then equal and/or entail both
>
>   <foo><span xml:lang="en">blargh</span></foo>
>
>and
>
>   <foo rdf:parseType="Literal">
>      <span xml:lang="en">blargh</span>
>   </foo>
>
>Note the difference in single and double quotes around 'en',
>which are subject to canonicalization.

[Martin, I'd be interested to hear if this is close to what you were suggesting. In what follows it's effectively fleshed out with three test cases.]

Well, I didn't see it quite that way, but since there's no right or wrong here, all I can do is discuss possibilities.

In order to analyze this, I think we need to consider the examples in two steps:
(1) translation from XML to abstract syntax (graph) -- what an RDF/XML parser does;
(2) entailment between graphs.

So, taking your three cases above and making them complete RDF statements, I would anticipate:

Test case 1:
------------
<Subj>
   <foo><span xml:lang='en'>blargh</span></foo>
</Subj>
-->
<Subj> <foo> "&lt;span xml:lang='en'&gt;blargh&lt;/span&gt;" .
or is it this?:
<Subj> <foo> "<span xml:lang='en'>blargh</span>" .

(I couldn't find a test case for this, but amp-in-url/test001 [1][2] suggests to me the latter is correct. I'll assume the latter case for the remaining examples. I think the XML mapping to infoset replaces the character entities.)

[1] http://www.w3.org/2000/10/rdf-tests/rdfcore/amp-in-url/test001.rdf
[2] http://www.w3.org/2000/10/rdf-tests/rdfcore/amp-in-url/test001.nt

Test case 2:
------------
<Subj>
   <foo><span xml:lang="en">blargh</span></foo>
</Subj>
-->
<Subj> <foo> "<span xml:lang=\"en\">blargh</span>" .

(using \ to escape quote-in-string -- sorry, I can't remember the correct form)

I see this as containing a different character sequence, so it neither entails nor is entailed by the first case.
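The difference between test cases 1 and 2, and how C14N would erase it, can be sketched in Python with the standard library's xml.etree.ElementTree. This is a minimal illustration, not an RDF/XML parser: the input strings and the helper plain_literal are hypothetical, and the sketch assumes the span markup reached the parser escaped as character entities, which the archived text does not show.

```python
import xml.etree.ElementTree as ET

# Hypothetical property elements for test cases 1 and 2, assuming the
# span markup was escaped as character entities in the RDF/XML input.
RDF_XML_CASE_1 = "<foo>&lt;span xml:lang='en'&gt;blargh&lt;/span&gt;</foo>"
RDF_XML_CASE_2 = '<foo>&lt;span xml:lang="en"&gt;blargh&lt;/span&gt;</foo>'


def plain_literal(property_element: str) -> str:
    """Hypothetical helper: return the element's character content as an
    XML parser reports it, i.e. with character entities already replaced."""
    return ET.fromstring(property_element).text or ""


lit1 = plain_literal(RDF_XML_CASE_1)
lit2 = plain_literal(RDF_XML_CASE_2)

print(lit1)  # <span xml:lang='en'>blargh</span>
print(lit2)  # <span xml:lang="en">blargh</span>

# Compared as plain character sequences, the literals differ (quote style).
print(lit1 == lit2)  # False

# For contrast: Canonical XML normalizes attribute quoting,
# so the canonical forms of the two fragments coincide.
print(ET.canonicalize(lit1) == ET.canonicalize(lit2))  # True
```

ET.canonicalize is available from Python 3.8 onward and implements C14N 2.0, which is not identical to the Canonical XML 1.0 under discussion but normalizes attribute quoting in the same way.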
Test case 3:
------------
<Subj>
   <foo rdf:parseType="Literal">
      <span xml:lang="en">blargh</span>
   </foo>
</Subj>
-->
<Subj> <foo> "<span xml:lang=\"en\">blargh</span>" .

Because this is parseType="Literal", the literal content is canonicalized by the parser, so we end up with a statement that is entailed by and entails that in test case 2, but not test case 1.

---------------

So my take on Martin's suggestion is that all (plain) literals are character sequences, some of which might just happen to be valid XML fragments, and are compared accordingly. When parseType="Literal" is used, C14N is applied by the parser, so that equivalent XML thus tagged resolves to the same literal value.

Entailment, then, is based on plain literals simply denoting themselves, without any regard for whether or not they were obtained by C14N of XML.

This, to me, seems like a useful simplification of what we currently have, and I'm not aware of any practical application scenario where the small differences in entailments thus achieved are likely to be damaging.

#g

-------------------
Graham Klyne
<GK@NineByNine.org>
PGP: 0FAA 69FF C083 000B A2E9 A131 01B9 1C7A DBCA CB5E
Received on Friday, 23 May 2003 07:06:08 UTC