Re: first pass parseType="Literal" text for primer

At 10:03 30/07/03 -0400, Martin Duerst wrote:

>>If I may lapse into Haskell [2] for a moment, our current design has the 
>>denotation of a literal being:
>>
>>    type XMLLiteralDenotation = [Octet]
>>
>>What you are doing here is changing the type of the denotation to 
>>something more like:
>>
>>    data XMLAtom = Character Char | Markup String
>>    type XMLLiteralDenotation = [XMLAtom]
>>
>>(The data statement defines a new datatype that is a kind of 
>>discriminated union:  a string labelled as "Character" or a string 
>>labelled as "Markup". See [2] for more about the notation)
>
>That seems to be what I wanted, as far as I understand.
>
>
>>I think there is a possible approach here that satisfies your goals, but 
>>it represents a fundamental redesign of the way that literals are handled 
>>in the formal semantics:  plain literals are no longer self denoting.
>>
>>Using Haskell type notation again, a plain literal is:
>>
>>    type PlainLiteral = [Char]
>>    type PlainLiteralDenotation = [Char]
>>
>>and the mapping function is:
>>
>>    PlainLiteralL2V :: PlainLiteral -> PlainLiteralDenotation
>>    PlainLiteralL2V = id   -- identity function
>>
>>Your proposal changes this quite radically:
>>
>>    type PlainLiteral = [Char]
>>    type PlainLiteralDenotation = [XMLAtom] -- XMLAtom as above
>
>Well, that could as well stay as [Char], because the only
>'XMLAtom's allowed in PlainLiteralDenotation are characters.

But, in any way that I can see, as soon as you want to seamlessly integrate 
XML literals with plain literals, then you have to account for the fact 
that XML literals can also contain Markup values.  That is the problem I 
cannot see a way around.

>Programming languages are not very good at handling such
>kinds of restrictions (the restriction of a sequence of
>an union of characters and markup-pieces to only characters
>is a sequence of characters) because that's not how
>implementations work. Haskell is most probably not as tightly
>tied to implementations as C or C++, but still there must
>have been such considerations, or such a case was just
>considered too rare to deal with.
>
>In a purely denotational framework, I don't see any problem
>with this, because obviously a sequence of characters is a
>sequence of characters, whether this is implicit (as in the
>case of plain literals) or explicit (as in the case of XML
>literals that don't contain any markup).

My choice of Haskell here was precisely because I have found it to be as 
good as any other notation for dealing with matters like this, and doesn't 
allow one to get away with any handwaving.  Haskell *is* purely 
denotational, as I understand that term.

>In other words, the notation
>     sequence(character('<'), character('b'), character('r'),
>          character('/'), character('>'))
>is just a notation to make it clear how XML Literals and
>plain literals relate, not a notation to indicate an
>explicit type identifier.

I'm not sure how that's significantly different from what I presented.  I 
gave type expressions dfrom which I thought the corresponding values were 
obvious, where you have listed a typed expression.

In Haskell, a the above plain literal based on XMLAtom would be:

   [Character '<', Character 'b', Character 'r', Character '/', Character '>']

more easily written as:
   map Character ['<','b','r','/','>']

or just:
   map Character "<br/>"

>In still other words, in a denotational world, characters
>are characters, and markup is markup if we define that it
>is markup, in the same way as in the real world, apples
>are apples, and oranges are oranges, without the need
>for any of them to have an explicit flag "Hello, I'm an
>Orange". A box full of apples is a box full of apples,
>independently of whether some other boxes also contain
>oranges or not, and independently of whether this box
>is allowed to contain oranges or not.

This, I think, is where we differ.  Given some value, we *do* need some way 
to tell if it's character data or markup.  This is clear from the notation 
that you used:  why did you need the character(...) and Markup(...) indicators?

In the case of plain literals, we know that they consist of characters 
only, so no additional indication is needed, but when you can mix 
characters with other things then you need a way to tell them apart.

Your box of apples/oranges argument doen't hold, because there you have 
containers, not of denotations, but of the actual things.  If, however, you 
have a box of tokens that entitle the bearer to collect an apple or an 
orange, then it's important that there is a way to know which (either by 
coming from a box of just apple-tokens or orange-tokens, or by having 
additional information embedded in the token).

>>    PlainLiteralL2V :: PlainLiteral -> PlainLiteralDenotation
>>    PlainLiteralL2V = map Character  -- force character interpretation
>>
>>My earlier positing [1] on this topic was quite clear that the position I 
>>was stating was one of proceeding on the basis of no such fundamental 
>>change.  I think there's a real danger that if we make a fundamental 
>>design change this late in the process we'll inadvertently introduce some 
>>more damaging error.
>
>I agree that we shouldn't make such a fundamental change.
>But I don't think there is any need to make it.
>Denotations are clearly restricted by logic, but they
>should not be restricted by any particular programming
>language, even a very logical one.

But it is precisely the logic here (of the formal semantics of literals) 
that does not allow for the kind of seamless mixing that you request.  We 
could all fudge around with a programming language and make something work, 
but a big part of this groups work has been to come up with a sound logical 
framework that defines the appropriate outcomes independently of any 
programming language.

(My use of Haskell to illustrate this was not to make any point about 
implementation, but because it is (also) a logical notation for reasoning 
about such things.)

#g


-------------------
Graham Klyne
<GK@NineByNine.org>
PGP: 0FAA 69FF C083 000B A2E9  A131 01B9 1C7A DBCA CB5E

Received on Wednesday, 30 July 2003 16:09:21 UTC