- From: Jonathan Borden <jborden@mediaone.net>
- Date: Wed, 16 May 2001 11:22:25 -0400
- To: "Drew McDermott" <drew.mcdermott@yale.edu>, <www-rdf-logic@w3.org>
Drew McDermott wrote:
> ...The problem is that DAML has inherited from the SGML/XML
> tradition this vagueness about exactly what the leaves of the tree are
> in a marked-up document.
err... http://www.w3.org/TR/REC-xml provides a set of 89 EBNF productions
that precisely define the XML abstract syntax on top of a UNICODE character
stream.
SGML has had a tradition of precise specification regarding every aspect of
the trees it describes (it's called Groves):
http://www.prescod.net/groves/shorttut/
> There are two sorts of leaves:
>
> Attributes: <tag name="Smith"> .... </tag>
>
> Elements with no markup inside:
> <name>Smith</name>
Actually there are 89 productions. Not all of them describe leaves, but
there are more than two sorts of leaves. Perhaps these two are relevant:
[39] element ::= EmptyElemTag
               | STag content ETag    [WFC: Element Type Match]
                                      [VC: Element Valid]
and
[43] content ::= CharData? ((element | Reference | CDSect | PI | Comment)
                 CharData?)*
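To make production [43] concrete, here is a minimal sketch that lists the kinds of children a parser reports inside one element. It assumes Python's standard xml.dom.minidom; the sample document and the child_kinds helper are made up for illustration, and the labels simply reuse the production names.

```python
from xml.dom import minidom, Node

# Hypothetical sample: one element whose content mixes several of the
# alternatives allowed by production [43].
DOC = "<doc>Smith<child/><![CDATA[<raw>]]><?target data?><!-- note --></doc>"

# Map DOM node types onto the names used in the XML grammar.
KIND_NAMES = {
    Node.TEXT_NODE: "CharData",
    Node.ELEMENT_NODE: "element",
    Node.CDATA_SECTION_NODE: "CDSect",
    Node.PROCESSING_INSTRUCTION_NODE: "PI",
    Node.COMMENT_NODE: "Comment",
}


def child_kinds(xml_text):
    """Return the grammar-level kind of each child of the root element."""
    root = minidom.parseString(xml_text).documentElement
    return [KIND_NAMES.get(node.nodeType, "other") for node in root.childNodes]


if __name__ == "__main__":
    print(child_kinds(DOC))
```

Only the first child here is plain character data; the others are distinct node kinds that an application has to decide how to treat.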
>
> My CS instincts tell me that I should be looking for some notion of a
> "literal" at this point. I.e., in Java I can write
>
> name = "Smith";
>
> and the compiler treats "Smith" as a literal string. Each language
> defines a syntax for literals that makes it unambiguous what the
> literal denotes. At least, I believe this to be the case. Does
> anyone know of any exceptions? For instance, in C-like languages
> 76 is the integer 76, and 0x76 is the integer 118 (because the "0x"
> makes the literal hexadecimal).
>
> Unfortunately, this is not how XML works. First, there is the
> regrettable choice of quotes to surround all attribute values.
Shrug. SGML doesn't have this restriction. XML makes some choices to provide
a simplified syntax.
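As a rough illustration of what the quoting means in practice, the sketch below parses a made-up element with Python's standard xml.etree.ElementTree. Without a DTD or schema, every attribute value reaches the application as a string, and any further typing is the application's choice. The element, its attribute names, and the as_int helper are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical element; the attribute names are invented for this example.
elem = ET.fromstring('<reading dec="76" hex="0x76" name="Smith"/>')

# With no DTD or schema in play, the parser hands back plain strings.
raw = dict(elem.attrib)


def as_int(lexical):
    """Interpret a lexical form the way a C-like language reads a literal.

    Base 0 tells int() to honour prefixes such as 0x.
    """
    return int(lexical, 0)


if __name__ == "__main__":
    for name, value in raw.items():
        print(name, repr(value), type(value).__name__)
    print(as_int(raw["dec"]), as_int(raw["hex"]))
```

The quotes are delimiters of the attribute value in the markup; what the value denotes is decided by a declaration or by the consuming application.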
> It
> seems to imply that all attributes have string values, which they
> emphatically do not.
First, XML does specifically define its own set of types for attributes:
[54] AttType       ::= StringType | TokenizedType | EnumeratedType
[55] StringType    ::= 'CDATA'
[56] TokenizedType ::= 'ID'          [VC: ID]
                                     [VC: One ID per Element Type]
                                     [VC: ID Attribute Default]
                     | 'IDREF'       [VC: IDREF]
                     | 'IDREFS'      [VC: IDREF]
                     | 'ENTITY'      [VC: Entity Name]
                     | 'ENTITIES'    [VC: Entity Name]
                     | 'NMTOKEN'     [VC: Name Token]
                     | 'NMTOKENS'    [VC: Name Token]
but more to the point, the XML Schema datatypes specification provides for
'binary' attribute datatypes.
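For what it is worth, the declared attribute types are visible to an application. The sketch below assumes Python's standard xml.parsers.expat and its AttlistDeclHandler callback; the person document, its DTD, and the declared_attribute_types helper are invented for illustration.

```python
import xml.parsers.expat

# Hypothetical document with an internal DTD subset that declares
# three attributes using types from productions [55] and [56].
DOC = """<?xml version="1.0"?>
<!DOCTYPE person [
  <!ELEMENT person EMPTY>
  <!ATTLIST person
      id    ID        #REQUIRED
      name  CDATA     #IMPLIED
      tags  NMTOKENS  #IMPLIED>
]>
<person id="p1" name="Smith" tags="a b"/>
"""


def declared_attribute_types(document):
    """Collect {(element, attribute): declared type} from ATTLIST declarations."""
    types = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_attlist(elname, attname, type_, default, required):
        # Called once per attribute declared in an ATTLIST declaration.
        types[(elname, attname)] = type_

    parser.AttlistDeclHandler = on_attlist
    parser.Parse(document, True)
    return types


if __name__ == "__main__":
    for key, declared in declared_attribute_types(DOC).items():
        print(key, declared)
```

So the type information lives in the declaration, not in the quoting of the value in the instance document.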
I can go on and quote large sections of the XML spec, but I suspect that at
the end of the day we will find that your issues are not with XML itself,
but rather with RDF?
>
> Anyway, can someone point me to the authoritative source on literal
> data in RDF/DAML? If there isn't one, I would be inclined to
> recommend:
>
> a) That literals occur *only* as attribute values. Text in elements
> is just too unconstrained. The notion of "markup-free text" is rather
> wobbly (probably deprecated by the Authorities); it's not clear even
> how to handle whitespace.
Do you mean that this is an issue _in the absence of a DTD/Schema_? Because
mixed content is found in widely deployed apps such as HTML, and is not a
candidate for deprecation.
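A small sketch of how mixed content looks to an application, assuming Python's standard xml.etree.ElementTree, which exposes the character data around child elements as text and tail. The HTML-like fragment and the flatten helper are made up for illustration.

```python
import xml.etree.ElementTree as ET


def flatten(elem):
    """List the direct content of elem in document order.

    Text before the first child is elem.text; text following each child
    is stored on that child as child.tail.
    """
    parts = []
    if elem.text:
        parts.append(("text", elem.text))
    for child in elem:
        parts.append(("element", child.tag))
        if child.tail:
            parts.append(("text", child.tail))
    return parts


# Hypothetical HTML-like fragment with mixed content.
p = ET.fromstring("<p>Hello <b>big</b> world</p>")

if __name__ == "__main__":
    print(flatten(p))
    print("".join(p.itertext()))
```

Whitespace in the text runs is preserved as written; whether it is significant is a question for a DTD, a schema, or the application.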
> b) That there be a unambiguous syntax for literal data, so that one
> would *not* have to declare the intended datatype of every attribute
> value. The convention that "Smith" sometimes refers to "Smith" and
> sometimes to Smith should be done away with. If a string is intended,
> there should be a syntax for specifying strings, either '"Smith"',
> "'Smith'", "\"Smith\"", or ""Smith"". (That last one is kind of
> cute.)
Do you mean XML Schema datatype patterns?
Jonathan Borden
The Open Healthcare Group
http://www.openhealth.org
Received on Wednesday, 16 May 2001 11:38:45 UTC