- From: Graham Klyne <gk@ninebynine.org>
- Date: Wed, 30 Jul 2003 09:33:54 +0100
- To: Martin Duerst <duerst@w3.org>, Brian McBride <bwm@hplb.hpl.hp.com>
- Cc: rdf core <w3c-rdfcore-wg@w3.org>, i18n <w3c-i18n-ig@w3.org>
At 17:21 29/07/03 -0400, Martin Duerst wrote:
>>C(x) is canonicalization of x, encoded as a UTF8 octet sequence, e.g.
>>C("&") is the octet sequence corresponding to "&".
>
>This is dangerous, because you have missed one escaping level.
>What is C("&")? If this works the same as C("<br/>"), it
>should be "&" (in UTF-8 if we still keep that, but I think
>Patrick said that was removed).
I think that's fine.  To answer your question:
   C("&") = UTF8("&amp;")
(where UTF8(x) is the UTF-8 encoding of character sequence x)
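A minimal sketch of C(x) for the simplest case, a literal that contains only character data and no markup, may help pin down the escaping level under discussion. It assumes the text-node escaping rules of Canonical XML 1.0 (&, <, > and carriage return). The names escapeText, utf8 and canonicalText are hypothetical. The UTF-8 encoder is written by hand so that only the base library is needed.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical: Canonical XML 1.0 escaping for text content only
-- (no elements or attributes are handled here).
escapeText :: String -> String
escapeText = concatMap esc
  where
    esc '&'  = "&amp;"
    esc '<'  = "&lt;"
    esc '>'  = "&gt;"
    esc '\r' = "&#xD;"
    esc c    = [c]

-- Hypothetical: UTF8(x), the UTF-8 encoding of a character sequence.
utf8 :: String -> [Word8]
utf8 = concatMap (map fromIntegral . enc . ord)
  where
    cont n = 0x80 .|. (n .&. 0x3F)          -- continuation byte
    enc n
      | n < 0x80    = [n]
      | n < 0x800   = [0xC0 .|. shiftR n 6, cont n]
      | n < 0x10000 = [0xE0 .|. shiftR n 12, cont (shiftR n 6), cont n]
      | otherwise   = [ 0xF0 .|. shiftR n 18, cont (shiftR n 12)
                      , cont (shiftR n 6), cont n ]

-- Hypothetical: C(x) for character-only content; escape, then encode.
canonicalText :: String -> [Word8]
canonicalText = utf8 . escapeText

main :: IO ()
main = do
  print (canonicalText "&" == utf8 "&amp;")  -- True
  print (canonicalText "&")                  -- [38,97,109,112,59]
```

The denotation is an octet sequence, and the '&' has gained one escaping level before encoding; the same function maps the characters of "<br/>" to UTF8("&lt;br/&gt;").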
Moving on to your examples, assuming we're talking about the current 
specifications, I disagree with:
>(3)
>Concrete Syntax: <eg:prop pt="L"><br/></eg:prop>
>(additional example)
>
>Abstract Syntax: "<br/>"^^rdf:XML
>
>Denotation:
>     sequence(character('<'), character('b'), character('r'),
>          character('/'), character('>'))
(a) it's not a character sequence, but an octet sequence,
(b) in the canonical XML representation the '<' and '>' should be escaped.
>(4)
>Concrete Syntax: <eg:prop pt="L"><br/></eg:prop>
>
>Abstract Syntax: "<br></br>"^^rdf:XML
>
>Denotation:
>     sequence(markup('<br>'), markup('</br>'))
I don't understand what you mean by markup(x).  If you mean something like 
C(x), I would agree, so we would have:
    UTF8("<br></br>")
... <breaks off>:
Now I see what you're doing, I think:  using character() and markup() not 
as mapping functions but as type designators.
If I may lapse into Haskell [2] for a moment, our current design has the 
denotation of a literal being:
    type XMLLiteralDenotation = [Octet]
What you are doing here is changing the type of the denotation to something 
more like:
    data XMLAtom = Character Char | Markup String
    type XMLLiteralDenotation = [XMLAtom]
(The data statement defines a new datatype that is a kind of discriminated 
union:  a character labelled "Character" or a string labelled "Markup". 
See [2] for more about the notation.)
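To make the discriminated union concrete, here is a sketch that turns a [XMLAtom] back into character text: Character atoms are escaped, while Markup atoms are copied verbatim. The function name render is hypothetical, and the escaping assumes Canonical XML text-node rules.

```haskell
-- XMLAtom as defined in the message.
data XMLAtom = Character Char | Markup String
  deriving (Eq, Show)

-- Hypothetical: character atoms are escaped, markup atoms are not.
render :: [XMLAtom] -> String
render = concatMap atom
  where
    atom (Character '&')  = "&amp;"
    atom (Character '<')  = "&lt;"
    atom (Character '>')  = "&gt;"
    atom (Character '\r') = "&#xD;"
    atom (Character c)    = [c]
    atom (Markup s)       = s

main :: IO ()
main = do
  -- Example (3): the five characters of "<br/>"
  putStrLn (render (map Character "<br/>"))          -- &lt;br/&gt;
  -- Example (4): two pieces of markup
  putStrLn (render [Markup "<br>", Markup "</br>"])  -- <br></br>
```

Composing render with a UTF-8 encoder yields a value of the current [Octet] form. The richer type carries the character/markup distinction in the value itself; the [Octet] form carries it only through escaping.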
I think there is a possible approach here that satisfies your goals, but it 
represents a fundamental redesign of the way that literals are handled in 
the formal semantics:  plain literals are no longer self-denoting.
Using Haskell type notation again, a plain literal is:
    type PlainLiteral = [Char]
    type PlainLiteralDenotation = [Char]
and the mapping function is:
    plainLiteralL2V :: PlainLiteral -> PlainLiteralDenotation
    plainLiteralL2V = id   -- identity function
Your proposal changes this quite radically:
    type PlainLiteral = [Char]
    type PlainLiteralDenotation = [XMLAtom] -- XMLAtom as above
    plainLiteralL2V :: PlainLiteral -> PlainLiteralDenotation
    plainLiteralL2V = map Character  -- force character interpretation
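The two mapping functions can be placed side by side in one compilable module. Haskell function names must begin with a lower-case letter, so this sketch uses plainLiteralL2V for the current design and a hypothetical plainLiteralL2VProposed for the alternative. PlainLiteralDenotation is replaced by the concrete result type in each case.

```haskell
-- XMLAtom as defined earlier in the message.
data XMLAtom = Character Char | Markup String
  deriving (Eq, Show)

type PlainLiteral = [Char]

-- Current design: a plain literal denotes itself.
plainLiteralL2V :: PlainLiteral -> [Char]
plainLiteralL2V = id

-- Proposed design (hypothetical name): every character is forced to a
-- Character atom, so the denotation has a different type from the literal.
plainLiteralL2VProposed :: PlainLiteral -> [XMLAtom]
plainLiteralL2VProposed = map Character

main :: IO ()
main = do
  print (plainLiteralL2V "a<b")          -- "a<b"
  print (plainLiteralL2VProposed "a<b")  -- [Character 'a',Character '<',Character 'b']
```

Because the proposed result type is [XMLAtom] rather than [Char], an expression such as plainLiteralL2VProposed s == s does not even type-check. This is one way of seeing that plain literals stop being self-denoting.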
My earlier posting [1] on this topic made it clear that I was arguing for 
proceeding without any such fundamental change.  I think there's a real 
danger that if we make a fundamental design change this late in the 
process, we'll inadvertently introduce some more damaging error.
#g
--
[1] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Jul/0339.html
[[
The only way we're going to be able to represent this kind of data, *and* to
handle markup in the same uniform framework, is to completely revisit the
design of RDF literal data so that a lexical form is not just a sequence of
Unicode characters, and is self-denoting.  To change that would be a
late-stage fundamental change to the design with who-knows-what kinds of
repercussion.
]]
[2] http://www.haskell.org/tutorial/
     (sections 2.2 and 2.3 cover the type notation used above.)
-------------------
Graham Klyne
<GK@NineByNine.org>
PGP: 0FAA 69FF C083 000B A2E9  A131 01B9 1C7A DBCA CB5E
Received on Wednesday, 30 July 2003 05:27:48 UTC