- From: Graham Klyne <gk@ninebynine.org>
- Date: Wed, 30 Jul 2003 09:33:54 +0100
- To: Martin Duerst <duerst@w3.org>, Brian McBride <bwm@hplb.hpl.hp.com>
- Cc: rdf core <w3c-rdfcore-wg@w3.org>, i18n <w3c-i18n-ig@w3.org>
At 17:21 29/07/03 -0400, Martin Duerst wrote:
>>C(x) is canonicalization of x, encoded as a UTF8 octet sequence, e.g.
>>C("&") is the octet sequence corresponding to "&".
>
>This is dangerous, because you have missed one escaping level.
>What is C("&")? If this works the same as C("<br/>"), it
>should be "&" (in UTF-8 if we still keep that, but I think
>Patrick said that was removed).
I think that's fine. To answer your question:
C("&") = UTF8("&amp;")
(where UTF8(x) is the UTF-8 encoding of character sequence x)
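To make that equation concrete, here is a rough Haskell sketch of C for the character-data case only. It assumes the canonical form escapes '&', '<' and '>' in text, and it ignores everything else canonicalization does (markup, attributes, namespaces). The names escapeText, utf8 and c14nText are my own, for illustration, and not taken from any specification.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

type Octet = Word8

-- Escape the characters that may not appear literally in XML character data.
-- (Hypothetical helper: text content only, no markup handling.)
escapeText :: String -> String
escapeText = concatMap esc
  where
    esc '&' = "&amp;"
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc c   = [c]

-- UTF-8 encode a sequence of Unicode characters.
utf8 :: String -> [Octet]
utf8 = concatMap (enc . ord)
  where
    enc n
      | n < 0x80    = [fromIntegral n]
      | n < 0x800   = [0xC0 .|. hi 6, cont 0]
      | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
      | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
      where
        -- leading bits of the code point, shifted into the first octet
        hi s   = fromIntegral (n `shiftR` s)
        -- a continuation octet carrying six bits of the code point
        cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- C(x) for plain character data: escape first, then encode.
c14nText :: String -> [Octet]
c14nText = utf8 . escapeText
```

With these definitions, c14nText "&" and utf8 "&amp;" are the same five octets, which is the equation above.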
Moving on to your examples, assuming we're talking about the current
specifications, I disagree with:
>(3)
>Concrete Syntax: <eg:prop pt="L"><br/></eg:prop>
>(additional example)
>
>Abstract Syntax: "<br/>"^^rdf:XML
>
>Denotation:
> sequence(character('<'), character('b'), character('r'),
> character('/'), character('>'))
(a) it's not a character sequence, but an octet sequence,
(b) in the canonical XML representation the '<' and '>' should be escaped.
>(4)
>Concrete Syntax: <eg:prop pt="L"><br/></eg:prop>
>
>Abstract Syntax: "<br></br>"^^rdf:XML
>
>Denotation:
> sequence(markup('<br>'), markup('</br>'))
I don't understand what you mean by markup(x). If you mean something like
C(x), I would agree, so we would have:
UTF8("<br></br>")
... <breaks off>:
Now I see what you're doing, I think: using character() and markup() not
as mapping functions but as type designators.
If I may lapse into Haskell [2] for a moment, our current design has the
denotation of a literal being:
type XMLLiteralDenotation = [Octet]
What you are doing here is changing the type of the denotation to something
more like:
data XMLAtom = Character Char | Markup String
type XMLLiteralDenotation = [XMLAtom]
(The data statement defines a new datatype that is a kind of discriminated
union: a single character labelled as "Character" or a string labelled as
"Markup". See [2] for more about the notation)
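As I read them, your examples (3) and (4) would then come out as values of this type roughly as follows. This is only a sketch of my understanding; the names example3 and example4 are mine.

```haskell
data XMLAtom = Character Char | Markup String
  deriving (Eq, Show)

type XMLLiteralDenotation = [XMLAtom]

-- Example (3): five atoms, each labelled as a plain character.
example3 :: XMLLiteralDenotation
example3 = map Character "<br/>"

-- Example (4): two atoms, each labelled as a piece of markup.
example4 :: XMLLiteralDenotation
example4 = [Markup "<br>", Markup "</br>"]
```

The two values are distinct even though they are built from similar text, because the label is part of the value.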
I think there is a possible approach here that satisfies your goals, but it
represents a fundamental redesign of the way that literals are handled in
the formal semantics: plain literals are no longer self-denoting.
Using Haskell type notation again, a plain literal is:
type PlainLiteral = [Char]
type PlainLiteralDenotation = [Char]
and the mapping function is:
plainLiteralL2V :: PlainLiteral -> PlainLiteralDenotation
plainLiteralL2V = id -- identity function
Your proposal changes this quite radically:
type PlainLiteral = [Char]
type PlainLiteralDenotation = [XMLAtom] -- XMLAtom as above
plainLiteralL2V :: PlainLiteral -> PlainLiteralDenotation
plainLiteralL2V = map Character -- force character interpretation
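Putting the two mappings side by side in one compilable fragment may make the difference easier to see. The names currentL2V and proposedL2V are hypothetical, chosen only so that both versions can coexist in one module. Note that Haskell function names have to start with a lower-case letter.

```haskell
data XMLAtom = Character Char | Markup String
  deriving (Eq, Show)

type PlainLiteral = String

-- Current design: a plain literal denotes itself.
currentL2V :: PlainLiteral -> String
currentL2V = id

-- Proposed design (as I understand it): every character of the
-- lexical form is wrapped as a Character atom.
proposedL2V :: PlainLiteral -> [XMLAtom]
proposedL2V = map Character
```

Under the current design the literal and its denotation have the same type and the same value. Under the proposal the denotation of the plain literal "<br/>" is a list of five Character atoms, which is a different kind of thing from the lexical form, and also different from [Markup "<br>", Markup "</br>"].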
My earlier posting [1] on this topic was quite clear that the position I
was stating was one of proceeding on the basis of no such fundamental
change. I think there's a real danger that if we make a fundamental design
change this late in the process we'll inadvertently introduce some more
damaging error.
#g
--
[1] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Jul/0339.html
[[
The only way we're going to be able represent this kind of data, *and* to
handle markup in the same uniform framework, is to completely revisit the
design of RDF literal data so that a lexical form is not just a sequence of
Unicode characters, and is self-denoting. To change that would be a
late-stage fundamental change to the design with who-knows-what kinds of
repercussion.
]]
[2] http://www.haskell.org/tutorial/
(sections 2.2 and 2.3 cover the type notation used above.)
-------------------
Graham Klyne
<GK@NineByNine.org>
PGP: 0FAA 69FF C083 000B A2E9 A131 01B9 1C7A DBCA CB5E
Received on Wednesday, 30 July 2003 05:27:48 UTC