- From: Dave Beckett <dave.beckett@bristol.ac.uk>
- Date: Thu, 18 Apr 2002 16:33:25 +0100
- To: w3c-rdfcore-wg@w3.org
Rick Jelliffe asked me to forward this to the list. He didn't want to post directly, given that this is a draft of a WD and clearly we are still busy discussing it internally and aren't really expecting external comments.

Dave

------- Forwarded Message

From: Rick Jelliffe <ricko@topologi.com>
Subject: Comment on RDF datatypes draft
Date: Fri, 19 Apr 2002 01:35:05 +1000

While I like the draft, I think it tends to perpetuate XML Schema's fuzzy approach to datatyping: the distinction between lexical space and value space is trumpeted, but not exploited in a way that makes it usable for many (most?) kinds of idiomatic data.

The problem can be exemplified like this: "How do I say 'This value is a US-format date'?"

In XML Schema datatypes, we have a date value space (which I leave experts to argue about). But we only have a single lexical space, which corresponds more or less to ISO 8601: a format no non-geek uses.

This desire for a single lexical space (except in the case of boolean) creates several problems:

1) It hinders people who have data in some format already. For example, people who want to make their DTDs RDF-compatible.

2) It requires an extra layer of software to localize it: therefore it is skewed against thin or simple clients and towards back-end data interchange. Thus it is "internationalized" without allowing "localization", which is ultimately always needed to become usable.

3) It is conceptually weak, because it lumps all lexical values together higgledy-piggledy, as if "true" is the opposite of "0".

4) It only works when referring to data in XML: you cannot type outside data, let alone provide type information about binary data (say, embedded in XML as Bin64).

How could the RDF Datatypes proposal be strengthened to cover these cases? The lexical space needs to be compartmentalized into nameable subspaces.

In the RDF Datatypes draft, there is a notion of rdfd:lex Lexical Form Idiom.
However, this idea is already present in XML, XML Schemas and, most importantly, ISO 8879 SGML: it is called NOTATION. In SGML, a NOTATION is a physical/lexical form (perhaps even a binary format) which has an implied type (which may be a structured type). Because NOTATION is a property of some resource or range in a resource, the idea of a value space without a lexical space never really crops up. So a NOTATION is, to all intents and purposes, the name of a type. This seems pretty much the same as rdfd:lex, except that it strengthens it to include non-text formats.

So XML is itself a notation. An XML document is a tree of notations, just as much as it is a tree of elements and a tree of entities. The MIME content types such as text/plain are notations. Compression is a notation. Encoding in UTF-8 is, at another extreme, a notation. So a particular physical document may not only contain multiple nested notations, it may itself be transformed through various notations to get to particular physical forms.

"Lexical space" is just a notation which uses Unicode. So the lexical space can be compartmented into particular notations, but there can be non-text notations too, as alternative forms for the same value space. For example, that you should interpret a binary file as a list of integers serialized out to words in little-endian order is its notation. So "lexical space" is a particular range of notations.

So I suggest revising the datatypes draft:

1) Substitute rdfd:notation for rdfd:lex
2) The "canonical lexical space" becomes the "canonical notation"
3) The "canonical lexical mapping" becomes the "canonical namespace mapping"
4) A datatyped literal is a triple <value space, notation, string>
5) Notations can be named by URIs (as in XML Schemas & DTDs)
6) Redefine other things, such as the definition of range, to apply to particular notations, thus allowing the same lexical representation to map to different values in different notations for the same type.
(Say we have a type which is an enumeration of abstract courses in a meal: "entrée" in the US locale means "main course", while in the EU locale it means "initial course". Or, 3/2/01 will mean a different date depending on the locale.)
7) Work through the ramifications, to allow typing of non-XML and embedded binary data

Cheers
Rick Jelliffe
www.topologi.com

------- End of Forwarded Message
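
A minimal sketch, not part of Rick's message, of how his points 4 and 6 might look in practice: a datatyped literal held as a <value space, notation, string> triple, where the notation decides how the string maps into the value space. The example.org URIs, the `TypedLiteral` class and the `PARSERS` table are all hypothetical names invented for illustration; nothing here comes from the RDF datatypes draft.

```python
from dataclasses import dataclass
from datetime import date, datetime

# Hypothetical URIs naming one value space and three notations for it.
DATE_VALUES = "http://example.org/valuespace/date"
US_DATE = "http://example.org/notation/date-us"        # month/day/year
EU_DATE = "http://example.org/notation/date-eu"        # day/month/year
ISO_DATE = "http://example.org/notation/date-iso8601"  # yyyy-mm-dd

# Each notation supplies its own mapping from strings to date values.
PARSERS = {
    US_DATE: lambda s: datetime.strptime(s, "%m/%d/%y").date(),
    EU_DATE: lambda s: datetime.strptime(s, "%d/%m/%y").date(),
    ISO_DATE: date.fromisoformat,
}


@dataclass(frozen=True)
class TypedLiteral:
    """A literal as the triple <value space, notation, string>."""
    value_space: str
    notation: str
    string: str

    def value(self) -> date:
        # The notation, not the value space, selects the parser.
        return PARSERS[self.notation](self.string)


us = TypedLiteral(DATE_VALUES, US_DATE, "3/2/01")
eu = TypedLiteral(DATE_VALUES, EU_DATE, "3/2/01")
iso = TypedLiteral(DATE_VALUES, ISO_DATE, "2001-03-02")

print(us.value())                 # 2001-03-02
print(eu.value())                 # 2001-02-03
print(us.value() == iso.value())  # True
```

The same string "3/2/01" yields two different values under the two locale notations, while the US and ISO literals differ as strings yet denote the same value.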
Received on Thursday, 18 April 2002 11:33:27 UTC