Re: first pass parseType="Literal" text for primer

From: Graham Klyne <gk@ninebynine.org>
Date: Wed, 30 Jul 2003 09:33:54 +0100
To: Martin Duerst <duerst@w3.org>, Brian McBride <bwm@hplb.hpl.hp.com>
Cc: rdf core <w3c-rdfcore-wg@w3.org>, i18n <w3c-i18n-ig@w3.org>

At 17:21 29/07/03 -0400, Martin Duerst wrote:
>>C(x) is canonicalization of x, encoded as a UTF8 octet sequence, e.g.
>>C("&") is the octet sequence corresponding to "&amp;".
>This is dangerous, because you have missed one escaping level.
>What is C("&amp;")? If this works the same as C("<br/>"), it
>should be "&amp;" (in UTF-8 if we still keep that, but I think
>Patrick said that was removed).

I think that's fine.  To answer your question:

   C("&amp;") = UTF8("&amp;amp;")

(where UTF8(x) is the UTF-8 encoding of character sequence x)
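
As a rough sketch of that equation, assuming x is treated as character data and escaped the way Canonical XML escapes text nodes: the names escapeText, utf8 and canonText below are made up for illustration, and the encoder is hand-written only to avoid library dependencies.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Escape character data as Canonical XML does for text nodes:
-- '&', '<', '>' and carriage return are replaced by references.
escapeText :: String -> String
escapeText = concatMap esc
  where
    esc '&'  = "&amp;"
    esc '<'  = "&lt;"
    esc '>'  = "&gt;"
    esc '\r' = "&#xD;"
    esc ch   = [ch]

-- UTF8(x): encode a character sequence as UTF-8 octets.
utf8 :: String -> [Word8]
utf8 = concatMap (map fromIntegral . enc . ord)
  where
    enc n
      | n < 0x80    = [n]
      | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
      | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
      | otherwise   = [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                      , cont (n `shiftR` 6), cont n ]
    -- Continuation octet: 10xxxxxx carrying the low six bits.
    cont m = 0x80 .|. (m .&. 0x3F)

-- C(x) under the assumption that x is character data, not markup.
canonText :: String -> [Word8]
canonText = utf8 . escapeText
```

With these definitions, canonText "&amp;" and utf8 "&amp;amp;" are the same octet sequence, which is the one extra escaping level in question.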

Moving on to your examples, assuming we're talking about the current 
specifications, I disagree with:

>Concrete Syntax: <eg:prop pt="L">&lt;br/&gt;</eg:prop>
>(additional example)
>Abstract Syntax: "&lt;br/&gt;"^^rdf:XML
>     sequence(character('<'), character('b'), character('r'),
>          character('/'), character('>'))

(a) it's not a character sequence, but an octet sequence,
(b) in the canonical XML representation the '<' and '>' should be escaped.

>Concrete Syntax: <eg:prop pt="L"><br/></eg:prop>
>Abstract Syntax: "<br></br>"^^rdf:XML
>     sequence(markup('<br>'), markup('</br>'))

I don't understand what you mean by markup(x).  If you mean something like 
C(x), I would agree, so we would have:


... <breaks off>:

Now I see what you're doing, I think:  using character() and  markup() not 
as mapping functions but as type designators.

If I may lapse into Haskell [2] for a moment, our current design has the 
denotation of a literal being:

    type XMLLiteralDenotation = [Octet]

What you are doing here is changing the type of the denotation to something 
more like:

    data XMLAtom = Character Char | Markup String
    type XMLLiteralDenotation = [XMLAtom]

(The data statement defines a new datatype that is a kind of discriminated 
union:  a string labelled as "Character" or a string labelled as "Markup". 
See [2] for more about the notation)
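
To make the distinction concrete, here is a small sketch of how the two concrete-syntax examples quoted above might come out under that type. The value names escapedBr and elementBr are invented for illustration, and the split of the element into a start tag and an end tag simply follows the markup('<br>'), markup('</br>') notation in the quoted example.

```haskell
data XMLAtom = Character Char | Markup String
  deriving (Eq, Show)

type XMLLiteralDenotation = [XMLAtom]

-- <eg:prop pt="L">&lt;br/&gt;</eg:prop>
-- Five characters that merely look like a tag; no markup involved.
escapedBr :: XMLLiteralDenotation
escapedBr = map Character "<br/>"

-- <eg:prop pt="L"><br/></eg:prop>
-- A real element, represented as start-tag and end-tag markup atoms.
elementBr :: XMLLiteralDenotation
elementBr = [Markup "<br>", Markup "</br>"]
```

The two values are distinguishable by their constructors alone, which is exactly what a plain octet sequence cannot express without escaping.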

I think there is a possible approach here that satisfies your goals, but it 
represents a fundamental redesign of the way that literals are handled in 
the formal semantics:  plain literals are no longer self denoting.

Using Haskell type notation again, a plain literal is:

    type PlainLiteral = [Char]
    type PlainLiteralDenotation = [Char]

and the mapping function is:

    plainLiteralL2V :: PlainLiteral -> PlainLiteralDenotation
    plainLiteralL2V = id   -- identity function

Your proposal changes this quite radically:

    type PlainLiteral = [Char]
    type PlainLiteralDenotation = [XMLAtom] -- XMLAtom as above

    plainLiteralL2V :: PlainLiteral -> PlainLiteralDenotation
    plainLiteralL2V = map Character  -- force character interpretation
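
The two designs can be put side by side in one loadable sketch. The names currentL2V and proposedL2V are hypothetical, chosen only so that both mappings can live in the same module; Haskell also requires function names to begin with a lower-case letter.

```haskell
data XMLAtom = Character Char | Markup String
  deriving (Eq, Show)

type PlainLiteral = [Char]

-- Current design: a plain literal denotes itself.
currentL2V :: PlainLiteral -> [Char]
currentL2V = id

-- Proposed design: each character is wrapped, so the denotation
-- is a different kind of value from the lexical form.
proposedL2V :: PlainLiteral -> [XMLAtom]
proposedL2V = map Character
```

Note that the result types differ: under the proposed mapping a literal and its denotation can no longer be compared directly, which is the sense in which plain literals stop being self-denoting.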

My earlier posting [1] on this topic was quite clear that the position I 
was stating was one of proceeding on the basis of no such fundamental 
change.  I think there's a real danger that if we make a fundamental design 
change this late in the process we'll inadvertently introduce some more 
damaging error.


[1] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Jul/0339.html

The only way we're going to be able to represent this kind of data, *and* to
handle markup in the same uniform framework, is to completely revisit the
design of RDF literal data so that a lexical form is not just a sequence of
Unicode characters, and is self-denoting.  To change that would be a
late-stage fundamental change to the design with who-knows-what kinds of

[2] http://www.haskell.org/tutorial/
     (sections 2.2 and 2.3 cover the type notation used above.)

Graham Klyne
PGP: 0FAA 69FF C083 000B A2E9  A131 01B9 1C7A DBCA CB5E
Received on Wednesday, 30 July 2003 05:27:48 UTC
