- From: Martin Duerst <duerst@w3.org>
- Date: Mon, 04 Aug 2003 14:26:51 -0400
- To: w3c-i18n-ig@w3.org, w3c-rdfcore-wg@w3.org
Dear colleagues, The following discussion resulted privately after our teleconference. I proposed, and Brian agreed, that it be sent to the groups. Some purely administrative or otherwise confidential bits have been removed. Regards, Martin. >Date: Sun, 03 Aug 2003 13:57:17 -0400 >To: Brian McBride <bwm@hplb.hpl.hp.com> >From: Martin Duerst <duerst@w3.org> >Hello Brian, > >I think this is an interesting discussion, and I would like to bring >it back to the lists if you allow. I'm following up below, and >would forward it later with your permission. (without some >team-related or administrative bits) > >At 00:10 03/08/01 +0100, Brian McBride wrote: >>Martin Duerst wrote: >> >>[...] > >>>Thanks for writing them up! Actually, I realize that it was only >>>one. Do you think that covers our discussion? >> >>I think the one case may cover both points. But even the one needs >>further refining. > >I'm not sure what you mean with 'both points'. Can you list them >or point them out? > > >>>> - With respect to the issues of the denotation of XML literals and the >>>>unification of text types (plain literals and xml literals being >>>>disjoint), I18N can live with the current design. >>> >>>We have not made any formal decision on this (and maybe never >>>will have to), but: >>>- With respect to the issue of plain literals and XML literals >>> containing only text being disjoint, I think we can live with it >>> (and because this was the state at last call, and we didn't >>> comment on it at last call, we should actually live with it) >>>- The issue is predicated on the outcome of how language information >>> is associated with XML Literals. >> >>Outcome of what? Language has to be explitily expressed in the xml literal. > >If language has to be explicitly expressed within the XML literal, >as it does in the post-lastcall version of the spec, then this >makes it significantly more difficult to do comparison/unification >by applications, and as an overall result, we may not be satisfied >with how plain literals and XML literals can be compared/have >been unified. > >If we return to some state similar to the last-call spec, where >language was externally associated with XML literals, then we know >it is sufficiently easy to compare/unify plain literals and XML >literals, and so we think we can live without an objection. > >So as a summary, I see no reason for an objection on this specific >point, but the considerations in this point are contributing to our >objection on the issue of how language information is associated >with XML literals. > > >>> We can live with it if it is >>> easy for applications to connect plain literals and XML >>> literals without markup and with the same text (this is the >>> point just after the two marked as important in >>> http://www.w3.org/International/2003/07/rdf-literal-issues.html) >> >>hmm, the doc says it should be easy to compare plain literals and xml >>literals. It is - the current spec says they are always different. I >>guess that's not what you want, though. > >You guessed right. The equivalence we are interested in is the >one that would e.g. result from the character/markup semantics >that I described recently. > > >>>- With respect to the denotation of XML literals as sequences of >>> octets, I'm not satisfied. During the teleconf, I was under the >>> impression that this had been changed. I have learned in the >>> meantime that this has not been the case. I have submitted a >>> test case. I think that this is against the Character Model, >>> and collides with other XML Schema types. >> >>I think that you are right that the value space of rdf:XMLLiteral >>intersects with other xsd datatypes. I'm not very comfortable with that, >>but I haven't seen a case that its really broken. > >I think there was a lot of talk in the discussion we had a month >or so ago about XML Literal basically standing for complex content. >My guess is that every single member of the XML Schema WG would >think that saying that complex content is the same as some >octet sequences in hexBinary is broken. > >Turned around, would you think the RDF community would be happy >if the XML Schema WG suddenly decided that they wanted a datatype >for RDF, and (assuming something like cannonical RDF in XML was >available) they decided to make that a sequence of octets? > > >>Could you be more specific about the charmod problem. > >Charmod uses the concept that octets represent characters, >and characters in turn represent other things, such as >markup, text, integers,... Charmod mostly deals with the >lower part, octets representing characters, because that's >where all the character encoding issues come in. The XML >spec (XML 1.0) defines XML (the XML syntax) in terms of >characters. There are some clauses in the spec and an >appendix to describe the bootstrapping from octets, but >there is overall clearly first an octet->character decoding >step and then the rest of the parsing (which may include >some escaping decoding, which is independent of the >(octet->character) encoding because numeric character >references all use Unicode. > >In summary, there is a low-level representational layer >using octets, but it is carefully isolated to not interfere >with the rest of parsing/processing/representation... (which >hopefully at some point higher up includes something >like semantics). > >If we suddenly say that something higher up *IS* something >two layers down, we have a clear layer violation. > > >I have to admit that it is not completely clear how to read >this from the character model document. But this is more a >deficiency of the document as written, rather than of the >architecture and the abstraction as described. > >The most clear-cut sentence in the character model is: > >(http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-RefProcModel) >[S ] Specifications MUST be defined in terms of Unicode characters, not >bytes or glyphs. > >We have received some comments that this may not be appropriate >e.g. for SVG when defining glyphs, or for some compression specification >when defining the result of a compression step in terms of octets/bytes. >I think we also should re-think the description in terms of *process*; >you earlier told us that you don't have a process model, but you have >a representational/denotational model (and processing is implicit in >the RDF/XML syntax spec that describes how to construct a graph). >So we probably should reword the whole thing so that it becomes >clearer that it applies both to a process viewpoint as well as >a more static, representational viewpoint. > > >>> I have no idea whether there would be any procedural points that >>> would allow us, or not allow us, to raise this as a formal >>> objection. >> >>RDFCore have changed the definition of the value space of XMLLiteral, but >>not that part of it - it was canonical XML at last call. Grey area. > >Thanks for your take on this. From some recent mails on rdfcore, >it seems that this issue may be moving in the right direction, >which would be great. >(http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Aug/0011.html) > > >Regards, Martin.
Received on Monday, 4 August 2003 14:27:41 UTC