Re: first pass parseType="Literal" text for primer from Martin Duerst on 2003-07-30 (w3c-rdfcore-wg@w3.org from July 2003)

From: Martin Duerst <duerst@w3.org>
Date: Wed, 30 Jul 2003 17:33:32 -0400
To: Graham Klyne <gk@ninebynine.org>, Brian McBride <bwm@hplb.hpl.hp.com>
Cc: rdf core <w3c-rdfcore-wg@w3.org>, i18n <w3c-i18n-ig@w3.org>
Message-Id: <4.2.0.58.J.20030730164703.0691c068@localhost>
Hello Graham,

At 21:02 03/07/30 +0100, Graham Klyne wrote:
>At 10:03 30/07/03 -0400, Martin Duerst wrote:
>
>>>If I may lapse into Haskell [2] for a moment, our current design has the 
>>>denotation of a literal being:
>>>
>>>    type XMLLiteralDenotation = [Octet]
>>>
>>>What you are doing here is changing the type of the denotation to 
>>>something more like:
>>>
>>>    data XMLAtom = Character Char | Markup String
>>>    type XMLLiteralDenotation = [XMLAtom]

>>>    type PlainLiteral = [Char]
>>>    type PlainLiteralDenotation = [XMLAtom] -- XMLAtom as above
>>
>>Well, that could as well stay as [Char], because the only
>>'XMLAtom's allowed in PlainLiteralDenotation are characters.
>
>But, in any way that I can see, as soon as you want to seamlessly 
>integrate XML literals with plain literals, then you have to account for 
>the fact that XML literals can also contain Markup values.  That is the 
>problem I cannot see a way around.

I just tried this in Haskell, and it indeed didn't work.
But let's just start very simply:

Do you agree with the following statement?
     The set of sequences of apples is a subset of the set of
     sequences that can contain both apples and oranges.

Now, do you agree with the following statement?
     The set of sequences of characters is a subset of the
     set of sequences that can contain both characters and
     'markup values' (where 'markup values' are things such
     as start tags, end tags, comments, PIs,...).

Both of these statements seem to be simple set theory.
If you can find a sequence that is in the later set, but not
the former set (for either of the examples), thus disproving
either (or both) of the subset relationships, I would be
very surprised. It is exactly this subset relationship that
I mean when I say 'seamlessly integrate'. If you mean
something else, please tell me.

The fact that Haskell does not handle subset relationships
of this type doesn't mean that they don't hold, I think.


>>In a purely denotational framework, I don't see any problem
>>with this, because obviously a sequence of characters is a
>>sequence of characters, whether this is implicit (as in the
>>case of plain literals) or explicit (as in the case of XML
>>literals that don't contain any markup).
>
>My choice of Haskell here was precisely because I have found it to be as 
>good as any other notation for dealing with matters like this, and doesn't 
>allow one to get away with any handwaving.

The subset relationships above don't involve handwaving,
or do they?


>Haskell *is* purely denotational, as I understand that term.

I admit that I don't understand that term very well. What
I meant was "in a world that we can just create the way
we want to (or need to), with the restrictions of logic,
but no implementation restrictions".



>This, I think, is where we differ.  Given some value, we *do* need some 
>way to tell if it's character data or markup.  This is clear from the 
>notation that you used:  why did you need the character(...) and 
>Markup(...) indicators?

Given an apple, we know it's an apple. Given an orange, we know
it's an orange. The reason I used character(...) and markup(...) was
because I choose a very explicit notation. Also, I didn't make the
effort of distinguishing characters and markup as such inside the
parentheses. But let's change the notation a bit (back to old examples):

(1) Concrete Syntax: <eg:prop>a</eg:prop>
     Abstract Syntax: "a"
     Denotation:      'a'

(2) Concrete Syntax: <eg:prop>&lt;br/&gt;</eg:prop>
     Abstract Syntax: "<br/>"
     Denotation:      '<', 'b', 'r', '/', '>'

(3) Concrete Syntax: <eg:prop pt="L">&lt;br/&gt;</eg:prop>
     Abstract Syntax: "&lt;br/&gt;"^^rdf:XML
     Denotation:      '<', 'b', 'r', '/', '>'

(4) Concrete Syntax: <eg:prop pt="L"><br/></eg:prop>
     Abstract Syntax: "<br></br>"^^rdf:XML
     Denotation:      <br>, </br>


(5) Concrete Syntax: <eg:prop pt="L">&amp;</eg:prop>
     Abstract Syntax: "&amp"^^rdf:XML
     Denotation:      '&'

(6) Concrete Syntax: <eg:prop>&amp;</eg:prop>
     Abstract Syntax: "&"
     Denotation:      '&'

(7) Concrete Syntax: <eg:prop pt="L"><em>Wow!</em></eg:prop>
     Abstract Syntax: "<em>Wow!</em>"^^rdf:XML
     Denotation:      <em>, 'W', 'o', 'w', '!', </em>

Characters are delimited by single quotes, markup by <>.
The later won't work in Haskell, and it may need some details
worked out (but if we use canonicalization, then all literal
">" characters, e.g. in attribute values, are escaped as
"&gt;", so we can pretty much just scan for the next actual ">",
and this may work for comments and PIs, too.
Now it's very obvious what's characters, and what's markup,
the same way it's obvious when you have an apple and when
you have an orange.


>In the case of plain literals, we know that they consist of characters 
>only, so no additional indication is needed, but when you can mix 
>characters with other things then you need a way to tell them apart.

Why? Why should I have to label apples as apples just because
I also have oranges?


>Your box of apples/oranges argument doen't hold, because there you have 
>containers, not of denotations, but of the actual things.

Let's say that the actual things are bananas and beer bottles,
but they denote apples and oranges. Does that make any difference?
If yes, how?


>If, however, you have a box of tokens that entitle the bearer to collect 
>an apple or an orange, then it's important that there is a way to know 
>which (either by coming from a box of just apple-tokens or orange-tokens, 
>or by having additional information embedded in the token).

I don't understand this analogy. First, I don't understand how
it relates to RDF and XML Literals. Second, I don't understand
how it would be necessary to have a way to know which. If I have
a token that entitles me  to collect an apple or an orange,
I may as well just pick my favorite one out of a mixed box.


>>Denotations are clearly restricted by logic, but they
>>should not be restricted by any particular programming
>>language, even a very logical one.
>
>But it is precisely the logic here (of the formal semantics of literals) 
>that does not allow for the kind of seamless mixing that you request.  We 
>could all fudge around with a programming language and make something 
>work, but a big part of this groups work has been to come up with a sound 
>logical framework that defines the appropriate outcomes independently of 
>any programming language.

Yes, of course. The recent change in the denotation of xsd:string
literals seems to indicate that we can handle subset relationships.
The set of denotations corresponding to the set of xsd:string
literals is a subset of the set of denotations corresponding to
the set of plain literals (due to the fact that there are plain
literals with language information, but no xsd:string literals with
language information). If we can handle subset relationships, then
there should not be such a big problem (or actually, no problem at
all) to handle the subset relationship I discussed above: The
set of denotations corresponding to the set of plain literals
is a subset of the set of denotations corresponding to the
set of XML Literals (due to the fact that XML literal denotations
can contain 'markup values', but plain literals can't).


>(My use of Haskell to illustrate this was not to make any point about 
>implementation, but because it is (also) a logical notation for reasoning 
>about such things.)

Yes, but Haskell doesn't seem to be able to 'reason' about the
subset relationship that I described, although I can see nothing
wrong with that subset relationship.


Regards,    Martin.
Received on Thursday, 31 July 2003 12:06:38 UTC