Re: DAML ObjectProp vs DatatypeProp from Jonathan Borden on 2001-05-16 (www-rdf-logic@w3.org from May 2001)

From: Jonathan Borden <jborden@mediaone.net>
Date: Wed, 16 May 2001 11:22:25 -0400
To: "Drew McDermott" <drew.mcdermott@yale.edu>, <www-rdf-logic@w3.org>
Message-ID: <11eb01c0de1c$03243910$0a2e249b@nemc.org>
Drew McDermott wrote:

> ...The problem is that DAML has inherited from the SGML/XML
> tradition this vagueness about exactly what the leaves of the tree are
> in a marked-up document.

err... http://www.w3.org/TR/REC-xml provides a set of 89 EBNF productions
that precisely define the XML abstract syntax on top of a Unicode character
stream.

SGML has had a tradition of precise specification regarding every aspect of
the trees it describes (they are called groves):
http://www.prescod.net/groves/shorttut/


> There are two sorts of leaves:
>
>    Attributes:   <tag name="Smith"> .... </tag>
>
>    Elements with no markup inside:
>                  <name>Smith</name>

Actually there are 89 productions. Not all of them are leaves, but there are
more than two sorts of leaves. Perhaps these two are relevant:

[39]    element    ::=    EmptyElemTag
   | STag content ETag [WFC: Element Type Match]
    [VC: Element Valid]

and

[43]    content    ::=    CharData?
   ((element | Reference | CDSect | PI | Comment) CharData?)*
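
As a rough illustration of the two leaf kinds quoted above (an attribute value
and the character data of an element), here is a minimal sketch. It assumes
Python's standard xml.etree.ElementTree parser and a made-up sample document;
neither comes from the XML spec. Without a DTD or schema, both leaves reach the
application as plain strings.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample combining both leaf kinds from the quoted message.
doc = '<tag name="Smith"><name>Smith</name></tag>'
root = ET.fromstring(doc)

attr_leaf = root.get("name")         # attribute value on <tag>
text_leaf = root.find("name").text   # character data inside <name>

# Both leaves are reported as str; no further typing is applied by the parser.
print(type(attr_leaf).__name__, repr(attr_leaf))
print(type(text_leaf).__name__, repr(text_leaf))
```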

>
> My CS instincts tell me that I should be looking for some notion of a
> "literal" at this point.  I.e., in Java I can write
>
>      name = "Smith";
>
> and the compiler treats "Smith" as a literal string.  Each language
> defines a syntax for literals that makes it unambiguous what the
> literal denotes.  At least, I believe this to be the case.  Does
> anyone know of any exceptions?  For instance, in C-like languages
> 76 is the integer 76, and 0x76 is the integer 118 (because the "0x"
> makes the literal hexadecimal).
>
> Unfortunately, this is not how XML works.  First, there is the
> regrettable choice of quotes to surround all attribute values.

Shrug. SGML doesn't have this restriction. XML makes some choices to provide
a simplified syntax.

> It
> seems to imply that all attributes have string values, which they
> emphatically do not.

First, XML does specifically define its own set of types for attributes:

[54]    AttType    ::=    StringType | TokenizedType | EnumeratedType
[55]    StringType    ::=    'CDATA'
[56]    TokenizedType    ::=    'ID' [VC: ID]
    [VC: One ID per Element Type]
    [VC: ID Attribute Default]
   | 'IDREF' [VC: IDREF]
   | 'IDREFS' [VC: IDREF]
   | 'ENTITY' [VC: Entity Name]
   | 'ENTITIES' [VC: Entity Name]
   | 'NMTOKEN' [VC: Name Token]
   | 'NMTOKENS' [VC: Name Token]

but more to the point, the XML Schema datatypes specification provides for
'binary' attribute datatypes.
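
To make the point concrete: with declared datatypes, the same quoted lexical
form denotes different values depending on the declared type. The sketch below
is only an illustration. The table LEXICAL_TO_VALUE and the function
parse_typed are hypothetical names, not any real schema processor's API, and
the type names follow the XML Schema Part 2 Recommendation, where the binary
types appear as hexBinary and base64Binary.

```python
# Hypothetical mapping from a declared datatype name to a lexical-form parser.
LEXICAL_TO_VALUE = {
    "string": str,               # "76" stays the two-character string
    "integer": int,              # "76" becomes the number 76
    "hexBinary": bytes.fromhex,  # "76" becomes the single octet 0x76
}

def parse_typed(lexical: str, datatype: str):
    """Interpret an attribute's lexical form under its declared datatype."""
    return LEXICAL_TO_VALUE[datatype](lexical)

# The same attribute text, three different denotations.
for datatype in ("string", "integer", "hexBinary"):
    print(datatype, repr(parse_typed("76", datatype)))
```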

I can go on and quote large sections of the XML spec, but I suspect that at
the end of the day we will find that your issues are not with XML itself but
rather with RDF?


>
> Anyway, can someone point me to the authoritative source on literal
> data in RDF/DAML?  If there isn't one, I would be inclined to
> recommend:
>
> a) That literals occur *only* as attribute values.  Text in elements
> is just too unconstrained.  The notion of "markup-free text" is rather
> wobbly (probably deprecated by the Authorities); it's not clear even
> how to handle whitespace.

Do you mean that this is an issue _in the absence of a DTD/Schema_? Because
mixed content is found in widely deployed applications such as HTML, and is
not a candidate for deprecation.

> b) That there be a unambiguous syntax for literal data, so that one
> would *not* have to declare the intended datatype of every attribute
> value.  The convention that "Smith" sometimes refers to "Smith" and
> sometimes to Smith should be done away with.  If a string is intended,
> there should be a syntax for specifying strings, either '"Smith"',
> "'Smith'", "\"Smith\"", or ""Smith"".  (That last one is kind of
> cute.)

Do you mean XML Schema datatype patterns?

Jonathan Borden
The Open Healthcare Group
http://www.openhealth.org
Received on Wednesday, 16 May 2001 11:38:45 UTC