- From: Patrick Stickler <patrick.stickler@nokia.com>
- Date: Thu, 12 Sep 2002 16:56:58 +0300
- To: "ext Jeremy Carroll" <jjc@hplb.hpl.hp.com>, "RDF Core" <w3c-rdfcore-wg@w3.org>, "ext Brian McBride" <bwm@hplb.hpl.hp.com>
Firstly, let me say that I am very sympathetic to Jeremy's interests in making the abstract syntax as easy as possible to realize in various RDF applications, and in principle, am very much in favor of making it as close as possible to practical concrete representations that might be used. That said... ;-) ----- Original Message ----- From: "ext Jeremy Carroll" <jjc@hplb.hpl.hp.com> To: "Jeremy Carroll" <jjc@hplb.hpl.hp.com>; "Patrick Stickler" <patrick.stickler@nokia.com>; "RDF Core" <w3c-rdfcore-wg@w3.org>; "ext Brian McBride" <bwm@hplb.hpl.hp.com> Sent: 12 September, 2002 15:50 Subject: RE: Proposal for abstract syntax representation of inline literals (was Re: weekly call for agenda items) > > I suggest an abstract syntax along the lines of: > > > > An RDF Literal Node can be labelled with one of: > > > > - an RDF String Literal (as now) > > - an RDF XML Literal (as now) > > - a value from the value space of a datatype. > > > > We simply note that an implementation that is unaware of a > > specific datatype > > used in an RDF/XML document will need to store the datatype URI + lexical > > form pair as a fall-back. > > > > An RDF Graph contains precisely one triple for each Literal node in the > > graph. > > > > [As I have previously indicated this extreme syntactic untidiness is > > practically indistinguishable from extreme tidiness, but leaves Pat more > > room to wriggle.] > > Benefits: > > 1. Implementations don't need to store the original string, fits with almost > all plausible implementations of values. (e.g. in a database) What about round-tripping? Does the application then just choose any suitable datatype and lexical representation as it likes, rather than the original pair it recieved? What if the RDF/XML is being returned to the system it originated from, and a datatype that is not recognized or supported by the originating system is used? An application is free to derive and store a native value for each typed literal pairing as part of the internal representation of the typed literal node while still respecting and preserving the datatype/lexicalform label that exists in the abstract graph. The abstract graph should exemplify genericity, system neutrality, and portability. It should not presume anything about system or platform specific characteristics or innate knowledge about datatypes. It is a tool for interchange between *any* arbitrary RDF systems, irregardless of what extra-RDF knowledge those systems might have about particular datatypes, or particular URI schemes, etc. True, many applications will grok the XML Schema predefined types, and probably a number of other commonly used types, and those applications are certainly free to (and would be expected to) interpret typed literal nodes according to the actual values denoted when making comparisons, etc. But that does not mean that the abstract graph should do anything but reflect the actual assertions made, and those assertions are made in terms of a *specific* datatype URIref and a *specific* lexical form, and just as the XML flag and xml:lang information, that should not be discarded in the abstract syntax, but preserved and respected. > 2. Makes the Closed World Assumption on datatyping. This is more accurate > than an Open World Assumption. > When the document author writes > <my:datatype>"10" > they know already whether they mean the string "10", the number 2 (binary), > or 10 or 16 (hex). Yes, as will any application that knows about the datatype <my:datatype>. But knowing that the value is 2 is not necessary to denote the value unambiguously in the abstract syntax. > They are not saying that they are going to make up their mind later about > what the "10" means. <my:datatype> is already defined and they are using > that definition. Someone else shouldn't come and redefine <my:datatype> so > that "10" means something else. How could they? The "owner" of the datatype says what value the lexical form denotes. Someone else can't come along and change the L2V mapping. Using the pairing of datatype URIref and lexical form is just as "closed world" as using the value as the label of the node. The "owner" of a URIref says what it denotes, and if they say that it denotes a particular datatype with particular characteristics, someone else can't come along and change that (at least not in any socially acceptable fashion). > > I agree with Patrick that it has a weakness to do with implementations that > do not know a particular datatype have to fallback to something like his > proposal; which licenses two slightly different behaviours. But even more, how to applications that interact with e.g. a triples store that has opted to map as many typed literal pairing labels to values tell that triples store which datatypes the client application also groks so that the triples store knows which to re-map back into pairing labels versus leaving as values?! This introduces a *huge* interoperability issue, and completely unnecessarily. Each application should itself decide which typed literal nodes it is able to interpret as values, and if it likes, store that value statically internally, but the label of that node is not the value, but the bi-part name of the value consisting of the datatype URIref and a lexical form, and it is that bi-part name that it should use by default when interacting with other RDF applications by generic means (e.g. RDF/XML). > However, note > that XML Schema Datatypes is closed. There are only the predefined basic > types and (possibly user defined) types derived from them. Hence, at least > in principle, an RDF implementation could know all possible predefined > datatypes and the rules for understanding a user definition of a new type. Actually no. XML Schema does not provide the machinery for actually defining the L2V mapping for user-defined datatypes which are direct subtypes of xsd:anySimpleType, therefore it is not possible for an RDF application to know what some arbitrary datatype might mean simply based on the XML Schema machinery. Also, RDF datatyping is not restricted to XML Schema datatypes, and should not be. -- I believe that having values as labels in the abstact graph will introduce portability and interoperability problems between applications as well as misunderstandings and conflict between application developers insofar as different applications/platforms are able to natively interpret and represent different sets of datatyped literals. I also believe it further complicates round-tripping of RDF/XML which already is problemmatic and has been clearly identified as an important functionality of RDF systems. Furthermore, I don't see that having values as labels in the abstract graph actually provides any substantial benefit to developers, as they are free to use actual values in their internal representation anyway, and will anyway have to use bi-part labels for unsupported datatypes. It introduces a variable abstract graph from application to application and two variant representations for typed literals whether the datatype is understood or not. It seems to me to be far clearer, consistent, portable, and reliable to label typed literal nodes with generic pairings than actual values and leave application developers to decide themselves if/how they wish to intern the actual denoted values into their system-specific representations. Cheers, Patrick
Received on Thursday, 12 September 2002 09:57:02 UTC