Some discussion after teleconference from Martin Duerst on 2003-08-04 (w3c-rdfcore-wg@w3.org from August 2003)

From: Martin Duerst <duerst@w3.org>
Date: Mon, 04 Aug 2003 14:26:51 -0400
To: w3c-i18n-ig@w3.org, w3c-rdfcore-wg@w3.org
Message-Id: <4.2.0.58.J.20030804142108.05043010@localhost>
Dear colleagues,

The following discussion resulted privately after our teleconference.
I proposed, and Brian agreed, that it be sent to the groups.
Some purely administrative or otherwise confidential bits have
been removed.

Regards,    Martin.


>Date: Sun, 03 Aug 2003 13:57:17 -0400
>To: Brian McBride <bwm@hplb.hpl.hp.com>
>From: Martin Duerst <duerst@w3.org>

>Hello Brian,
>
>I think this is an interesting discussion, and I would like to bring
>it back to the lists if you allow. I'm following up below, and
>would forward it later with your permission. (without some
>team-related or administrative bits)
>
>At 00:10 03/08/01 +0100, Brian McBride wrote:
>>Martin Duerst wrote:
>>
>>[...]
>
>>>Thanks for writing them up! Actually, I realize that it was only
>>>one. Do you think that covers our discussion?
>>
>>I think the one case may cover both points.  But even the one needs 
>>further refining.
>
>I'm not sure what you mean with 'both points'. Can you list them
>or point them out?
>
>
>>>>   - With respect to the issues of the denotation of XML literals and the
>>>>unification of text types (plain literals and xml literals being
>>>>disjoint), I18N can live with the current design.
>>>
>>>We have not made any formal decision on this (and maybe never
>>>will have to), but:
>>>- With respect to the issue of plain literals and XML literals
>>>   containing only text being disjoint, I think we can live with it
>>>   (and because this was the state at last call, and we didn't
>>>    comment on it at last call, we should actually live with it)
>>>- The issue is predicated on the outcome of how language information
>>>   is associated with XML Literals.
>>
>>Outcome of what?  Language has to be explitily expressed in the xml literal.
>
>If language has to be explicitly expressed within the XML literal,
>as it does in the post-lastcall version of the spec, then this
>makes it significantly more difficult to do comparison/unification
>by applications, and as an overall result, we may not be satisfied
>with how plain literals and XML literals can be compared/have
>been unified.
>
>If we return to some state similar to the last-call spec, where
>language was externally associated with XML literals, then we know
>it is sufficiently easy to compare/unify plain literals and XML
>literals, and so we think we can live without an objection.
>
>So as a summary, I see no reason for an objection on this specific
>point, but the considerations in this point are contributing to our
>objection on the issue of how language information is associated
>with XML literals.
>
>
>>>   We can live with it if it is
>>>   easy for applications to connect plain literals and XML
>>>   literals without markup and with the same text (this is the
>>>   point just after the two marked as important in
>>>   http://www.w3.org/International/2003/07/rdf-literal-issues.html)
>>
>>hmm, the doc says it should be easy to compare plain literals and xml 
>>literals.  It is - the current spec says they are always different.  I 
>>guess that's not what you want, though.
>
>You guessed right. The equivalence we are interested in is the
>one that would e.g. result from the character/markup semantics
>that I described recently.
>
>
>>>- With respect to the denotation of XML literals as sequences of
>>>   octets, I'm not satisfied. During the teleconf, I was under the
>>>   impression that this had been changed. I have learned in the
>>>   meantime that this has not been the case. I have submitted a
>>>   test case. I think that this is against the Character Model,
>>>   and collides with other XML Schema types.
>>
>>I think that you are right that the value space of rdf:XMLLiteral 
>>intersects with other xsd datatypes.  I'm not very comfortable with that, 
>>but I haven't seen a case that its really broken.
>
>I think there was a lot of talk in the discussion we had a month
>or so ago about XML Literal basically standing for complex content.
>My guess is that every single member of the XML Schema WG would
>think that saying that complex content is the same as some
>octet sequences in hexBinary is broken.
>
>Turned around, would you think the RDF community would be happy
>if the XML Schema WG suddenly decided that they wanted a datatype
>for RDF, and (assuming something like cannonical RDF in XML was
>available) they decided to make that a sequence of octets?
>
>
>>Could you be more specific about the charmod problem.
>
>Charmod uses the concept that octets represent characters,
>and characters in turn represent other things, such as
>markup, text, integers,... Charmod mostly deals with the
>lower part, octets representing characters, because that's
>where all the character encoding issues come in. The XML
>spec (XML 1.0) defines XML (the XML syntax) in terms of
>characters. There are some clauses in the spec and an
>appendix to describe the bootstrapping from octets, but
>there is overall clearly first an octet->character decoding
>step and then the rest of the parsing (which may include
>some escaping decoding, which is independent of the
>(octet->character) encoding because numeric character
>references all use Unicode.
>
>In summary, there is a low-level representational layer
>using octets, but it is carefully isolated to not interfere
>with the rest of parsing/processing/representation... (which
>hopefully at some point higher up includes something
>like semantics).
>
>If we suddenly say that something higher up *IS* something
>two layers down, we have a clear layer violation.
>
>
>I have to admit that it is not completely clear how to read
>this from the character model document. But this is more a
>deficiency of the document as written, rather than of the
>architecture and the abstraction as described.
>
>The most clear-cut sentence in the character model is:
>
>(http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-RefProcModel)
>[S ] Specifications MUST be defined in terms of Unicode characters, not 
>bytes or glyphs.
>
>We have received some comments that this may not be appropriate
>e.g. for SVG when defining glyphs, or for some compression specification
>when defining the result of a compression step in terms of octets/bytes.
>I think we also should re-think the description in terms of *process*;
>you earlier told us that you don't have a process model, but you have
>a representational/denotational model (and processing is implicit in
>the RDF/XML syntax spec that describes how to construct a graph).
>So we probably should reword the whole thing so that it becomes
>clearer that it applies both to a process viewpoint as well as
>a more static, representational viewpoint.
>
>
>>>   I have no idea whether there would be any procedural points that
>>>   would allow us, or not allow us, to raise this as a formal
>>>   objection.
>>
>>RDFCore have changed the definition of the value space of XMLLiteral, but 
>>not that part of it - it was canonical XML at last call.  Grey area.
>
>Thanks for your take on this. From some recent mails on rdfcore,
>it seems that this issue may be moving in the right direction,
>which would be great.
>(http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Aug/0011.html)
>
>
>Regards,    Martin.
Received on Monday, 4 August 2003 14:27:41 UTC