- From: Jonathan Borden <jborden@mediaone.net>
- Date: Wed, 16 May 2001 11:22:25 -0400
- To: "Drew McDermott" <drew.mcdermott@yale.edu>, <www-rdf-logic@w3.org>
Drew McDermott wrote:

> ...The problem is that DAML has inherited from the SGML/XML
> tradition this vagueness about exactly what the leaves of the tree are
> in a marked-up document.

Err... http://www.w3.org/TR/REC-xml provides a set of 89 EBNF productions
that precisely define the XML abstract syntax on top of a Unicode
character stream. SGML has had a tradition of precise specification
regarding every aspect of the trees it describes (it's called Groves):

http://www.prescod.net/groves/shorttut/

> There are two sorts of leaves:
>
> Attributes: <tag name="Smith"> .... </tag>
>
> Elements with no markup inside:
> <name>Smith</name>

Actually there are 89 productions. Not all of them are leaves, but there
are more than two sorts of leaves. Perhaps these two are relevant:

[39] element ::= EmptyElemTag
               | STag content ETag    [WFC: Element Type Match]
                                      [VC: Element Valid]

and

[43] content ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)* /* */

>
> My CS instincts tell me that I should be looking for some notion of a
> "literal" at this point. I.e., in Java I can write
>
> name = "Smith";
>
> and the compiler treats "Smith" as a literal string. Each language
> defines a syntax for literals that makes it unambiguous what the
> literal denotes. At least, I believe this to be the case. Does
> anyone know of any exceptions? For instance, in C-like languages
> 76 is the integer 76, and 0x76 is the integer 118 (because the "0x"
> makes the literal hexadecimal).
>
> Unfortunately, this is not how XML works. First, there is the
> regrettable choice of quotes to surround all attribute values.

Shrug. SGML doesn't have this restriction. XML makes some choices to
provide a simplified syntax.

> It seems to imply that all attributes have string values, which they
> emphatically do not.
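To make the two kinds of leaves under discussion concrete, here is a
minimal sketch in Python using the standard xml.etree.ElementTree
module. The sample document, the leaves() helper and the DECLARED table
are all hypothetical, invented for illustration; in particular, reading
"0x76" as hexadecimal is an application-level convention borrowed from
the C example above, not something XML or XML Schema prescribes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: one attribute leaf per attribute,
# one character-data leaf inside <name>.
doc = '<person name="Smith" age="0x76"><name>Smith</name></person>'
root = ET.fromstring(doc)


def leaves(elem):
    """Yield (kind, label, value) for attribute and character-data leaves."""
    for attr, value in elem.attrib.items():
        yield ("attribute", attr, value)
    # Character data directly inside this element, before any child.
    if elem.text and elem.text.strip():
        yield ("text", elem.tag, elem.text)
    for child in elem:
        yield from leaves(child)
        # Character data following a child belongs to the parent element.
        if child.tail and child.tail.strip():
            yield ("text", elem.tag, child.tail)


found = list(leaves(root))

# Assumption: a datatype declared outside the instance document (by a
# DTD, a schema, or the application) says how to read a lexical form.
# int(s, 0) accepts C-style prefixes such as "0x".
DECLARED = {"age": lambda s: int(s, 0)}

typed = {
    label: DECLARED.get(label, str)(value)
    for kind, label, value in found
    if kind == "attribute"
}

for leaf in found:
    print(leaf)
print(typed)
```

In this sketch the parser hands back every leaf as a string, whether it
came from an attribute or from element content. Whatever typing there is
comes from a declaration that sits alongside the document rather than
from the quoting syntax of the instance itself.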
First, XML does specifically define its own set of types for attributes:

[54] AttType       ::= StringType | TokenizedType | EnumeratedType
[55] StringType    ::= 'CDATA'
[56] TokenizedType ::= 'ID'        [VC: ID]
                                   [VC: One ID per Element Type]
                                   [VC: ID Attribute Default]
                     | 'IDREF'     [VC: IDREF]
                     | 'IDREFS'    [VC: IDREF]
                     | 'ENTITY'    [VC: Entity Name]
                     | 'ENTITIES'  [VC: Entity Name]
                     | 'NMTOKEN'   [VC: Name Token]
                     | 'NMTOKENS'  [VC: Name Token]

but more to the point, the XML Schema datatypes specification provides for
'binary' attribute datatypes.

I can go on and quote large sections of the XML spec, but I suspect that at
the end of the day we will find that your issues are not with XML itself
but rather with RDF?

>
> Anyway, can someone point me to the authoritative source on literal
> data in RDF/DAML? If there isn't one, I would be inclined to
> recommend:
>
> a) That literals occur *only* as attribute values. Text in elements
> is just too unconstrained. The notion of "markup-free text" is rather
> wobbly (probably deprecated by the Authorities); it's not clear even
> how to handle whitespace.

Do you mean that this is an issue _in the absence of a DTD/Schema_?
Because mixed content is found in widely deployed applications such as
HTML, and is not a candidate for deprecation.

> b) That there be a unambiguous syntax for literal data, so that one
> would *not* have to declare the intended datatype of every attribute
> value. The convention that "Smith" sometimes refers to "Smith" and
> sometimes to Smith should be done away with. If a string is intended,
> there should be a syntax for specifying strings, either '"Smith"',
> "'Smith'", "\"Smith\"", or ""Smith"". (That last one is kind of
> cute.)

Do you mean XML Schema datatype patterns?

Jonathan Borden
The Open Healthcare Group
http://www.openhealth.org
Received on Wednesday, 16 May 2001 11:38:45 UTC