- From: Martin Duerst <duerst@w3.org>
- Date: Tue, 01 Jul 2003 12:57:01 -0400
- To: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Cc: Graham Klyne <gk@ninebynine.org>, Dan Connolly <connolly@w3.org>, w3c-i18n-ig@w3.org, "Ralph R. Swick" <swick@w3.org>, misha.wolf@reuters.com, Tim Berners-Lee <timbl@w3.org>, w3c-rdfcore-wg@w3.org, reagle@w3.org
At 20:46 03/06/30 +0100, Jeremy Carroll wrote:

>Martin Duerst wrote:
>...
>
>>2) To have the RDF parser handle the fact that for plain text strings,
>>   sometimes there may be an rdf:parseType="Literal", and sometimes not?
>
>...
>
>>In my view, the best solution is clearly 2).
>>
>>By the way, I was just trying to check to what extent the actual RDF
>>Model and Syntax spec is expressing the fact that its authors (or at
>>least one of them, Ralph) thought that rdf:parseType="Literal" without
>>any actual markup is the same as a plain literal.
>>Here is what I have found:
>>   3. If E is an empty element (no content), v is the resource whose
>>      identifier is given by the resource attribute of E. If the content
>>      of E contains no XML markup or if parseType="Literal" is specified
>>      in the start tag of E then v is the content of E (a literal).
>>      Otherwise, the content of E must be another Description or
>>      container and v is the resource named by the (possibly implicit)
>>      ID or about of that Description or container.
>>This does not make any distinction WHATSOEVER between
>>   <foo>literal text</foo>
>>and
>>   <foo rdf:parseType="Literal">literal text</foo>
>>Also, the definition of Literal does not distinguish between what's
>>now called 'plain' and 'XML' literals:
>>Literal
>>   The most primitive value type represented in RDF, typically a string
>>   of characters. The content of a literal is not interpreted by RDF
>>   itself and may contain additional XML markup. Literals are
>>   distinguished from Resources in that the RDF model does not permit
>>   literals to be the subject of a statement.
>>If you have found evidence to the contrary, please tell me.
>
>
>I agree with your reading of M&S (although I would defer to Brian or DaveB
>on this one),

Good.

>unfortunately that was not found workable. Applications needed to know
>whether the markup was an XML literal or not.
>In the absence of helpful
>advice from M&S some RDF applications returned effectively an additional
>bit of information indicating whether it was a parseType="Literal" or not.

I'm not sure I understand this.

It is clear that applications need to know whether markup originating
in RDF/XML was part of an XML literal, or was part of other RDF
(e.g. parseType='Resource' or so). But this seems self-evident and
not at issue.

Assuming that applications get plain literals and XML literals as
native string datatypes, and assuming that the application doesn't
want to escape '&' and '<' for plain literals, it is also clear that
applications need to make a distinction in some cases. [The two
assumptions above are both reasonable implementation choices, but
they are not the only choices.]

The distinction they need to make is whether something that looks
like XML markup in a literal (when passed to the application as a
text string) is actually XML markup, or is just a string that looks
like XML. For example, applications need to be able, in RDF/XML,
to distinguish between
   <foo rdf:parseType="Literal">Hello <em>World</em>!</foo>
and
   <foo rdf:parseType="Literal">Hello &lt;em&gt;World&lt;/em&gt;!</foo>
(the latter e.g. being used in an example explaining XML).

But this does NOT imply that applications need to distinguish between
   <foo rdf:parseType="Literal">Hello &lt;em&gt;World&lt;/em&gt;!</foo>
and
   <foo>Hello &lt;em&gt;World&lt;/em&gt;!</foo>

So we can conclude that the fact that some RDF applications (I assume
these are parsers or stores rather than actual applications) returned
an additional bit is not wrong. That the RDF Core WG decided to model
this additional bit by defining a new type is again not wrong.

The problem is that rather than limiting the distinction to those
cases where it was needed (actual markup vs. text that looks like
markup), it was based on a syntactic detail of RDF/XML, namely the
presence or absence of rdf:parseType="Literal", leading to
unnecessary distinctions.
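The distinction argued for above can be sketched in a few lines. This is
only an illustration, assuming Python's standard xml.etree.ElementTree;
the helper classify() and its "plain"/"xml" labels are hypothetical and
come from neither M&S nor the RDF Core drafts. The extra bit is derived
from whether the content holds real markup, not from the presence of
rdf:parseType="Literal".

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def classify(property_element: str) -> tuple:
    """Hypothetical helper: return (kind, text) for one property element.

    kind is "xml" only when the content holds real child elements;
    rdf:parseType="Literal" on its own does not change the result.
    """
    elem = ET.fromstring(property_element)
    text = elem.text or ""
    if len(elem) == 0:
        # No child elements: any "<em>" in the text was escaped in the source.
        return ("plain", text)
    # tostring() on a child also emits the text that follows it (its tail).
    for child in elem:
        text += ET.tostring(child, encoding="unicode")
    return ("xml", text)


# Real markup inside parseType="Literal".
markup = classify(
    f'<foo xmlns:rdf="{RDF_NS}" rdf:parseType="Literal">'
    "Hello <em>World</em>!</foo>"
)
# Escaped text, once with and once without parseType="Literal".
escaped_literal = classify(
    f'<foo xmlns:rdf="{RDF_NS}" rdf:parseType="Literal">'
    "Hello &lt;em&gt;World&lt;/em&gt;!</foo>"
)
escaped_plain = classify("<foo>Hello &lt;em&gt;World&lt;/em&gt;!</foo>")

print(markup)
print(escaped_literal)
print(escaped_plain)
```

In this sketch all three inputs hand the application the same character
string, so the string alone cannot tell markup from text; only the first
is flagged as "xml", and the two escaped forms compare equal.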
>RDF Core was chartered to fix bugs in M&S and this was an area where there
>were definitely bugs.

I do not consider it a bug that M&S describes
   <foo rdf:parseType="Literal">some text here</foo>
and
   <foo>some text here</foo>
as equivalent.

>e.g. the mathml example in M&S requires mechanisms that are not even
>hinted at,

I can agree with saying that it requires quite some thought to come
up with an implementation that does what M&S specifies. But the fact
that we both agree that the current proposal conflicts with M&S in
several ways seems to be a clear indication that M&S wasn't as
undefined as it might seem.

>and we have not provided with clear, if somewhat difficult text, deferring
>to exc-c14n.

I don't understand what you wanted to say here. Is it
"and we have now provided clear, ..." or
"and we are not provided with clear, ..."?

>So in brief, M&S was broken, and we were required to fix it.

I agree that M&S was not perfect. But I don't agree with fixing
what wasn't broken in the first place.

>...
>
>
>>>The current phrasing in the editors draft defers to the term exclusive
>>>canonical XML:
>>>http://www.w3.org/TR/2002/REC-xml-exc-c14n-20020718/#def-exclusive-canonical-XML
>
>Martin:
>
>>Just before we forget it, at that place, 'exclusive canonicalization'
>>is defined as follows:
>>"The exclusive canonical form of a document subset is a physical
>>representation of the XPath node-set, as an octet sequence, produced by
>>the method described in this specification"
>>While the 'physical representation' may have been important for the people
>>working on digital signatures, it seems definitely the wrong thing for RDF.
>>I hope this can be fixed.
>
>
>
>I agree its clunky - I don't believe it is cost effective to fix it.

Stating that it is exclusive canonicalization, but in terms of
characters, not necessarily UTF-8, should not be too difficult to
fix (it could be done at CR).
Referring to a specific octet representation in the day and age of the
Semantic Web just doesn't seem right.

>RDF Core should be deferring to an XML group as to appropriate
>representations of XML.

I agree. XML 1.0 clearly says that XML documents are defined as
sequences of characters, not octets.

>We require that equality is well-defined. The only XML groups we found
>when we determined the main outline of this design two years ago was the
>c14n group. When they also penned exc-c14n it was clearly a better fit.

I don't disagree that exc-c14n is overall a good fit for your
purposes. But that does not mean that you have to throw out the
language information.

>>What is much more important, if using exclusive canonical XML means that
>>the xml:lang context of the XML literal in the RDF document is ignored,
>>then that's totally wrong.
>
>
>If that's totally wrong, then why is it not wrong for SOAP, or other
>applications of exc-c14n?

exc-c14n clearly says under what conditions it can be used, so it is
an issue for the user to choose it or not depending on his/her needs.
Rather than saying "there is this exc-c14n, that seems about right,
so we are going to ignore xml:lang", it should be "we need to preserve
xml:lang, in a similar way as we do with plain literals, so let's
see how we can use exc-c14n the right way". Using exc-c14n with an
additional wrapper element would be one easy solution.

As for SOAP, I have not found any reference to exc-c14n in any of
the three SOAP 1.2 Recommendation documents just recently published.
Please tell me if I'm overlooking something.
As far as I remember, the question was discussed whether in SOAP,
elements such as Envelope and Body should allow xml:lang. It was
decided that it was okay for these elements not to allow xml:lang,
for three reasons: the elements themselves did not contain any real
text; language information on that level could not be canceled
(remember that this was some time ago, when the solution xml:lang=""
was not yet agreed upon); and the structure of header and body was
coarse enough, and closer to the actual application, to require the
necessary language info to go there. This is quite different from
RDF, where we have very fine granularity and an already well
established language inheritance (which is used for plain literals).

>This seems to be a comment about exc-c14n rather than RDF.
>
>>It:
>>- has never been accepted by the I18N WG (RDF Core agreed with that)
>
>agreed
>
>>- is against the XML 1.0 Recommendation
>
>in as much as exc-c14n is.

See above.

>>- is against the RDF Model and Syntax Recommendation
>
>M&S is somewhat vague, but I would concede this point.

M&S is somewhat vague in that it allows applications to consider
or ignore xml:lang. But it didn't say anything about pick-and-choose.

>>- is against the recent RDF last calls
>
>yes.
>
>>- is the opposite of what happens with plain literals, and therefore
>>  highly confusing for users.
>
>depends on the application.
>I would suspect this is true for XHTML based XML literals, which I would
>view as the main application.
>See below about confusion.
>
>>To make sure xml:lang is not thrown away for XML literals, there is
>>no need to change exclusive canonical XML.
>
>We lose xml:lang by using exc-c14n out of the box ...
>viz:
>[[
>attributes in the XML namespace, such as xml:lang and xml:space are not
>imported into orphan nodes of the document subset
>]]
>
>Because of this, in the LC docs we had a complicated and confusing
>work-around that involved putting the xml-literal inside an <rdf-wrapper>
>tag, whose sole purpose was to hold the xml:lang attribute. It is
>certainly less confusing to have ditched all of that.

If the only purpose of the wrapper was to hold the xml:lang attribute,
then I think a solution similar to the one for plain literals should
also work for XML literals.

>>As for plain literals,
>>xml:lang can be carried separately.
>
>This is current behaviour.

Yes, I know. Is there any reason that the same solution cannot be
used for XML literals, if it turns out that <rdf-wrapper> is too
clumsy?

Regards,    Martin.
Received on Tuesday, 1 July 2003 14:36:23 UTC