Re: first pass parseType="Literal" text for primer from Martin Duerst on 2003-07-30 (w3c-rdfcore-wg@w3.org from July 2003)

From: Martin Duerst <duerst@w3.org>
Date: Wed, 30 Jul 2003 10:03:56 -0400
To: Graham Klyne <gk@ninebynine.org>, Brian McBride <bwm@hplb.hpl.hp.com>
Cc: rdf core <w3c-rdfcore-wg@w3.org>, i18n <w3c-i18n-ig@w3.org>
Message-Id: <4.2.0.58.J.20030730092853.0697c3b8@localhost>
At 09:33 03/07/30 +0100, Graham Klyne wrote:

>At 17:21 29/07/03 -0400, Martin Duerst wrote:

>Moving on two your examples, assuming we're talking about the current 
>specifications,

I was not speaking about the current specifications.
I was trying to show how we can construct denotations
where denotations of some plain Literals and some
(appropriate!) XML Literals are the same. This is not
so in the current spec, so it would be impossible for
me to talk about the current spec.


>I disagree with:
>
>>(3)
>>Concrete Syntax: <eg:prop pt="L">&lt;br/&gt;</eg:prop>
>>(additional example)
>>
>>Abstract Syntax: "&lt;br/&gt;"^^rdf:XML
>>
>>Denotation:
>>     sequence(character('<'), character('b'), character('r'),
>>          character('/'), character('>'))
>
>(a) it's not a character sequence, but an octet sequence,

In the spec a week ago. Not in my proposal. Not, as far as
I understood from
http://lists.w3.org/Archives/Public/www-rdf-comments/2003JulSep/0105.html.
But maybe I misinterpreted that, and XML Literals are still defined
as denoting octet sequences? That would be very unfortunate.


>(b) in the canonical XML representation the '<' and '>' should be escaped.

Yes, but if we explicitly say they are single characters,
there is no need anymore to unescape them.


>>(4)
>>Concrete Syntax: <eg:prop pt="L"><br/></eg:prop>
>>
>>Abstract Syntax: "<br></br>"^^rdf:XML
>>
>>Denotation:
>>     sequence(markup('<br>'), markup('</br>'))
>
>I don't understand what you mean by markup(x).  If you mean something like 
>C(x), I would agree, so we would have:
>
>    UTF8("<br></br>")
>
>... <breaks off>:
>
>Now I see what you're doing, I think:  using character() and  markup() not 
>as mapping functions but as type designators.

Yes, I guess that comes close.


>If I may lapse into Haskell [2] for a moment, our current design has the 
>denotation of a literal being:
>
>    type XMLLiteralDenotation = [Octet]
>
>What you are doing here is changing the type of the denotation to 
>something more like:
>
>    data XMLAtom = Character Char | Markup String
>    type XMLLiteralDenotation = [XMLAtom]
>
>(The data statement defines a new datatype that is a kind of discriminated 
>union:  a string labelled as "Character" or a string labelled as "Markup". 
>See [2] for more about the notation)

That seems to be what I wanted, as far as I understand.


>I think there is a possible approach here that satisfies your goals, but 
>it represents a fundamental redesign of the way that literals are handled 
>in the formal semantics:  plain literals are no longer self denoting.
>
>Using Haskell type notation again, a plain literal is:
>
>    type PlainLiteral = [Char]
>    type PlainLiteralDenotation = [Char]
>
>and the mapping function is:
>
>    PlainLiteralL2V :: PlainLiteral -> PlainLiteralDenotation
>    PlainLiteralL2V = id   -- identity function
>
>Your proposal changes this quite radically:
>
>    type PlainLiteral = [Char]
>    type PlainLiteralDenotation = [XMLAtom] -- XMLAtom as above

Well, that could as well stay as [Char], because the only
'XMLAtom's allowed in PlainLiteralDenotation are characters.

Programming languages are not very good at handling such
kinds of restrictions (the restriction of a sequence of
an union of characters and markup-pieces to only characters
is a sequence of characters) because that's not how
implementations work. Haskell is most probably not as tightly
tied to implementations as C or C++, but still there must
have been such considerations, or such a case was just
considered too rare to deal with.

In a purely denotational framework, I don't see any problem
with this, because obviously a sequence of characters is a
sequence of characters, whether this is implicit (as in the
case of plain literals) or explicit (as in the case of XML
literals that don't contain any markup).

In other words, the notation
     sequence(character('<'), character('b'), character('r'),
          character('/'), character('>'))
is just a notation to make it clear how XML Literals and
plain literals relate, not a notation to indicate an
explicit type identifier.

In still other words, in a denotational world, characters
are characters, and markup is markup if we define that it
is markup, in the same way as in the real world, apples
are apples, and oranges are oranges, without the need
for any of them to have an explicit flag "Hello, I'm an
Orange". A box full of apples is a box full of apples,
independently of whether some other boxes also contain
oranges or not, and independently of whether this box
is allowed to contain oranges or not.


>    PlainLiteralL2V :: PlainLiteral -> PlainLiteralDenotation
>    PlainLiteralL2V = map Character  -- force character interpretation
>
>My earlier positing [1] on this topic was quite clear that the position I 
>was stating was one of proceeding on the basis of no such fundamental 
>change.  I think there's a real danger that if we make a fundamental 
>design change this late in the process we'll inadvertently introduce some 
>more damaging error.

I agree that we shouldn't make such a fundamental change.
But I don't think there is any need to make it.
Denotations are clearly restricted by logic, but they
should not be restricted by any particular programming
language, even a very logical one.


Regards,    Martin.



>#g
>--
>
>[1] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Jul/0339.html
>
>[[
>The only way we're going to be able represent this kind of data, *and* to
>handle markup in the same uniform framework, is to completely revisit the
>design of RDF literal data so that a lexical form is not just a sequence of
>Unicode characters, and is self-denoting.  To change that would be a
>late-stage fundamental change to the design with who-knows-what kinds of
>repercussion.
>]]
>
>[2] http://www.haskell.org/tutorial/
>     (sections 2.2 and 2.3 cover the type notation used above.)
>
>
>-------------------
>Graham Klyne
><GK@NineByNine.org>
>PGP: 0FAA 69FF C083 000B A2E9  A131 01B9 1C7A DBCA CB5E
Received on Wednesday, 30 July 2003 11:03:57 UTC