- From: <Patrick.Stickler@nokia.com>
- Date: Thu, 29 Aug 2002 12:27:50 +0300
- To: <seth@robustai.net>, <www-rdf-comments@w3.org>
> -----Original Message-----
> From: ext Seth Russell [mailto:seth@robustai.net]
> Sent: 28 August, 2002 20:26
> To: www-rdf-comments@w3.org
> Subject: Untidy literals
>
> re:
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Aug/0247.html
>
> Where Patrick.Stickler@nokia.com says:
>
> [[ The present situation, as I see it, is that
> 4. The community clearly favors untidy literals ]]
>
> Well I was there and I certainly don't remember being asked if I favored
> untidy literals or not. I do remember being asked to choose between
> mutually distasteful options.

Fair enough ;-)

> ... that being said ...
>
> As a implementer I'm not necessarily against untidy literals, I just simply
> do not understand how literals being untidy in the MT will effect my
> implementation, if at all.
>
> How will (should) untidy literals in the MT affect an implementation of
> a RDF application ??
>
> ... that being asked ....
>
> Let me see if my application view of untidy literals matches with the WG :

Well, I won't speak for the WG, but I'll offer some comments in terms of what my understanding of tidy versus untidy literals encompasses.

> I think of a literal as a fixed sequence of binary digits .. for example
> '1001100110011001' that is presented to my application as a sequence of
> Unicode characters of some other such thing depending on the middleware I'm
> using. My application can store that sequence of characters in dozens of
> places in memory ... in that sense I would be dealing with that literal as
> untidy .. just like I deal with a bNodes.

Well, there is the issue of syntactic untidiness (multiple occurrences of the same literal string repeated in memory) and, more importantly, semantic untidiness (each occurrence of the same string-equal literal may denote a different datatype value).
What you are talking about here is syntactic untidiness, which one would expect to avoid in an actual implementation, so long as it can be done without losing semantic untidiness. I.e., compressing multiple occurrences of the same string-equal literal into a single memory location is fine, so long as that doesn't preclude assigning different interpretations to the occurrences themselves, based on the context of each literal occurrence.

> To be efficient, (because there are a lot of these strings and some of
> them are extremely long), my application contrives to store that string
> just once and points to it wherever it is used. In that sense, may I
> assume that is dealing with the literal itself as tidy.

Syntactically tidy, yes.

> Now I can contrive that nobody form the outside of my application can
> tell whether I am doing that or not .. this I can do by dealing with the
> pointers to the literals in a untidy manner. But must I build in this
> extra level of untidiness in my application? I simply do not know based
> upon the discussions I have heard.
>
> Philosophically speaking, are literals actually untidy?

Insofar as literals may constitute lexical forms and the interpretation of a lexical form is contextual according to the datatype in question, yes.

Much, if not most, use of inline literals presumes a datatype akin to xsd:string, but a lot of inline literals are meant to be interpreted according to other datatypes, which to date has simply been left unspecified at the RDF layer and relegated to application-specific semantics. CC/PP is a good example of this, where e.g. BytesPerPixel takes an inline literal, a lexical form, which is interpreted as denoting an integer value. The true value of the BytesPerPixel property is not a string, it's an integer, so this should be explicit at the RDF layer, not the application layer (IMO, others may disagree).
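To make the syntactic/semantic distinction concrete, here is a minimal Python sketch, not taken from any actual RDF implementation. The names LiteralStore and LiteralOccurrence are hypothetical. It assumes an application that stores each lexical form once (syntactically tidy) while every occurrence keeps its own datatype context (semantically untidy).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class LiteralOccurrence:
    """One occurrence of a literal in the graph (hypothetical type)."""
    lexical: str             # points at the single shared copy of the string
    datatype: Optional[str]  # context of this occurrence; None if unknown


class LiteralStore:
    """Stores each lexical form once, but hands out distinct occurrences."""

    def __init__(self) -> None:
        self._pool: Dict[str, str] = {}

    def occurrence(self, lexical: str,
                   datatype: Optional[str] = None) -> LiteralOccurrence:
        # setdefault returns the copy already in the pool if an equal
        # string was stored earlier, so the characters are kept only once.
        shared = self._pool.setdefault(lexical, lexical)
        return LiteralOccurrence(shared, datatype)


store = LiteralStore()
# Build the first string at run time so it starts out as a separate object.
depth = store.occurrence("".join(["1", "0"]), "xsd:integer")
title = store.occurrence("10", "xsd:string")

# Syntactically tidy: both occurrences share one stored string.
assert depth.lexical is title.lexical
# Semantically untidy: the occurrences themselves are not interchangeable.
assert depth != title
```

The design choice here is that the sharing is purely a storage detail: nothing outside the store can infer from the shared string that two occurrences mean the same thing.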
> I mean every time you encounter '1001100110011001' do you encounter the
> *same* '1001100110011001' or is it a different one?

It depends on whether you are talking about the lexical form (string) or what that lexical form denotes. You may repeatedly encounter the same lexical form but not necessarily encounter the same value as denoted by that lexical form. In fact, every single occurrence of that lexical form may denote a completely different value, a completely different thing in the universe. Literals are just local names, and local names are ambiguous. That's why we have constructs such as URIs, so that we have a means to attach names to things which have globally consistent meaning and are never ambiguous.

> Certainly you encounter it in a different context, ..... yes ... but is
> it a different thing every time you encounter it ? Well, *outside of the
> context of the encounter* , can you distinguish one of the
> '1001100110011001' from another one of the '1001100110011001' ?
>
> I think not.

Well, given the lack of machinery in RDF at present, I agree that it is difficult to distinguish between different contextual interpretations of the same lexical form (at least in a standardized manner). But that is what the untidy datatyping approach is meant to rectify (the alternative tidy approach simply formalizes this inability to express the contextual meaning of inline literals at the RDF layer).

Let's take a simple example. Given the lexical representation "10", does that always mean the same thing? Does that always denote the same value? Consider the following literals-in-context:

   (xsd:integer, "10")
   (xsd:gDay, "10")
   (xsd:string, "10")

Now, in the first case, "10" denotes the integer value 'ten'. In the second case, "10" denotes the tenth day of the month. And in the third case, "10" denotes the unicode string '10'.
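The three literals-in-context above can be sketched in Python as a lexical-to-value mapping per datatype. This is only an illustration: the table LEXICAL_TO_VALUE and the helpers denote and parse_gday are made-up names, and the gDay parser follows the simplified "10" form used in this message (the actual XML Schema lexical form for xsd:gDay is ---DD).

```python
from typing import Callable, Dict, Tuple


def parse_gday(lexical: str) -> Tuple[str, int]:
    """Simplified stand-in for a gDay parser: a tagged day-of-month value."""
    day = int(lexical)
    if not 1 <= day <= 31:
        raise ValueError(f"not a day of the month: {lexical!r}")
    return ("gDay", day)


# Hypothetical table: each datatype supplies its own lexical-to-value mapping.
LEXICAL_TO_VALUE: Dict[str, Callable[[str], object]] = {
    "xsd:integer": int,
    "xsd:gDay": parse_gday,
    "xsd:string": str,
}


def denote(datatype: str, lexical: str) -> object:
    """Return the value a lexical form denotes in a given datatype context."""
    return LEXICAL_TO_VALUE[datatype](lexical)


as_integer = denote("xsd:integer", "10")  # the integer ten
as_gday = denote("xsd:gDay", "10")        # the tenth day of the month
as_string = denote("xsd:string", "10")    # the two-character string

# Same lexical form, three pairwise different values.
assert as_integer != as_gday
assert as_gday != as_string
assert as_integer != as_string
```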
Thus, the semantics of the lexical form "10" is contextual and untidy -- it does not act as a global constant as does a URIref or bnode ID. It does not always mean the same thing. The integer ten is not equal to the tenth day of the month is not equal to the string '10' even if they all have identical lexical representations.

In this sense, a literal is similar to an XML local name. And a datatype context is similar to a namespace. The local name 'foo' may mean different things in different namespaces, just as a given lexical form such as "10" may mean different things for different datatypes.

Now, we could in fact decide that at the RDF layer, we won't capture the contextual untidy semantics of literals, but just say that all we are dealing with are lexical forms (strings) and applications are free to impose contextualized interpretations on those strings as they choose. This would be the tidy option. Fair enough (technically, at least).

But that means that RDF reasoners which base their inferences on the RDF MT alone will never be able to capture the fact that two different values are in fact meant (at some level) by the same lexical form. They have no choice but to treat all string-equal literals as equivalent in meaning (which technically they would be, given a tidy MT), but this could lead to entailments which would arguably be non-intuitive to users and contrary to the intended meaning.

E.g. with a tidy MT, the following entailment would hold:

   Jenny age "10" .
   Fred payday "10" .
   Movie title "10" .

entails

   Jenny age _:x .
   Fred payday _:x .
   Movie title _:x .
i.e., the precise technical meaning of the above entailment is that the lexical form for Jenny's age is the same as the lexical form for Fred's payday, which is the same as the lexical form for the movie's title. That is technically correct. But per the likely intended and/or perceived meaning of the above statements (and in terms of what applications are likely to interpret them as meaning), the entailment says that Jenny's age is the same as Fred's payday is the same as the movie's title -- i.e. an integer is the same as a day of the month is the same as a string, which clearly is false insofar as the real world is concerned (at least the one I live in ;-)

Now, if we adopted untidy literal semantics (and abstract syntax), the above entailment would not hold: the RDF MT could not assert any equality between lexical representations apart from their context, and determining value equality, other than in the single case where both datatype and lexical form are identical, would be relegated to an extra-RDF application that groks the datatypes in question. I.e.

   Jenny age _:a"10" .
   Fred payday _:b"10" .
   Movie title _:c"10" .

does not entail

   Jenny age _:x .
   Fred payday _:x .
   Movie title _:x .

For all we know, the above literals *could* have an equivalent meaning, but we can't know that given the information provided above.

However, we may make the datatyping assertions which are implicit in the property names explicit in the RDF, thus

   age rdfs:range xsd:integer .
   payday rdfs:range xsd:gDay .
   title rdfs:range xsd:string .

where, knowing the semantics of the above datatypes, it becomes crystal clear that we are talking about an integer value, a day of the month value, and a string value which, simply by coincidence, happen to have the same lexical representation.

We could also make this distinction explicit for each occurrence, in each statement, by specifying the datatype for each of the literals:

   Jenny age xsd:integer"10" .
   Fred payday xsd:gDay"10" .
   Movie title xsd:string"10" .

etc.

In the case of the implicit, inline literal _:a"10", the systemID '_:a' is taken to denote "some" datatype, which is simply not specified for the individual occurrence, but is provided by a global range assertion on the property. Thus given

   age rdfs:range xsd:integer .
   Jenny age _:a"10" .

then the MT gives us

   I(_:a"10") = I(xsd:integer"10")

I.e., the node _:a"10" denotes the integer value ten.

> In fact, when you say a literal is untidy, I believe you are confusing the
> mark with the use of the mark. Isn't that distinction very much like the
> distinction that Frege introduced by distinguishing between the sense and
> denotation of a name ? I think the sense of a literal must be untidy, but
> the literal itself (which sits in the model in the domain of discourse as
> that thing denoted) must me be fixed and tidy.
>
> ... or am I confused as usual .... ?

Well, with untidy literal semantics, the sense would be untidy, and contextual, but one could allow the denotation, the mark, to be syntactically tidy, as an issue of graph compression and memory efficiency, so long as one would not infer semantic tidiness from the syntactic tidiness.

In fact, I would not expect any triple store worth its salt to mirror an untidy abstract graph syntax religiously, even if it must maintain the untidy semantics reflected in that abstract syntax. There are many ways to optimize the internal representation of the abstract graph with untidy literal nodes while preserving the untidy semantics.

I hope the above was at least a little bit helpful.

Cheers,

Patrick
Received on Thursday, 29 August 2002 05:27:53 UTC