On equivalence of tidy/untidy (was: Re: Reopening tidy/untidy decision)

Summary: it seems that tidy/untidy is an implementation detail...

Frank Manola wrote:

> Sergey--
> 
> I'd like to see some further discussion of points (a) and (3) you're 
> making here, since I think that, while they are key points, I don't feel 
> that they are entirely "substantiated" (at least not yet to my 
> satisfaction), and I'd like some more details.  So adding this stuff to 
> the document is great.  I don't feel the same about point (b) because I 
> agree with it, but I don't think it matters that much.  I don't think 
> anyone has claimed that, via specifying a datatype like integer for a 
> value, you are going to capture all the application semantics that are 
> associated with the use of that value in a property, and hence 
> automatically forbid things like comparing ages and shoe sizes.


My impression was that the key arguments for untidiness were built on the 
assumption that using strings as ranges of properties such as dc:Creator 
or :age was unacceptable, and had to be effectively forbidden by 
treating untyped literals as a kind of labeled existential variable. 
All I wanted to clarify is that doing so simply elevates the problem of 
heterogeneity one level higher, and does not help applications to 
interoperate.

> If you want to go to additional 
> lengths to further specify the types (like defining types for age and 
> shoe size, as some people would do), you can further constrain the 
> interpretations, but clearly most people draw the line somewhere.  Not 
> to mention the fact that you might not want to preclude yourself from 
> doing some data mining type of operation that you hadn't thought of when 
> you designed the type system that involves comparing people's ages and 
> shoe sizes [this gets into my point about wanting different comparison 
> operators, which I'll not get into here].  It seems to me the point 
> we're trying to address here is somewhat simpler:  we've now introduced 
> a datatype facility into RDF, where literals can be typed in several 
> ways.  The question is (unless I'm mistaken), how does *RDF* interpret 
> those literals that haven't been explicitly assigned a datatype by one 
> of these mechanisms?  Do we say they have an implicit datatype of some 
> sort (or have a fixed interpretation in some other way), or do we say 
> they are the lexical things we talk about in the datatype facility, but 
> we don't know what type they are?  Either way, applications are going to 
> associate additional semantics with the values they get from RDF, and 
> RDF won't know anything about those semantics.


I absolutely agree with your conclusion. I think part of the problem is 
that "RDF" does not interpret anything ;) Now, seriously, imagine that 
there is an application layer that is common to every RDF application 
(this is where "RDF" interpretation kicks in). This layer is capable of 
parsing RDF/XML documents into graphs, and provides a set of routines 
for traversing and updating the graphs. (This is, I guess, a rough 
characterization of what "RDF APIs" currently do.) This "API" layer has 
no schema support, knows nothing about rules, and has no built-in 
semantics for any RDF properties.

As you formulated the question above, we are talking about two ways of 
implementing this API layer. In one case, all occurrences of an untyped 
literal having the same string content map to one graph node, in the 
other case, each occurrence results in a separate node. These separate 
nodes have internal structure: they contain a single string label. 
Notice that even if they contain, say, some system IDs in a concrete 
implementation, these IDs are supposed to be transparent to applications 
and the layer itself: each such ID can be replaced by another unique ID 
without change in semantics.

The funny thing is that both ways of dealing with the untyped literals 
sketched above are isomorphic. In more formal terms, the information 
capacity of each of the two data models is equivalent. That is, there is 
a bijective function between the set of "tidy" graphs and the set of 
"untidy" graphs. In fact, each edge of an untidy graph (s, p, o), where 
o is an untidy literal, can be mapped to an edge (s, p, 
stringValueOf(o)) of a tidy graph. A reverse mapping takes (s, p, o) as 
input and creates (s, p, uniqueUntidy(o)) for each untyped literal o.
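
To make the two mappings concrete, here is a minimal sketch in Python. 
It is only an illustration under stated assumptions: graphs are modeled 
as plain edge lists, resources as strings, and the names TidyLiteral, 
UntidyLiteral, to_tidy and to_untidy are invented for this example and 
belong to no actual RDF API. to_tidy plays the role of stringValueOf, 
and to_untidy plays the role of uniqueUntidy.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class TidyLiteral:
    """Untyped literal identified by its string content alone (hypothetical)."""
    label: str


@dataclass(frozen=True)
class UntidyLiteral:
    """One occurrence of an untyped literal: opaque ID plus string label (hypothetical)."""
    node_id: int
    label: str


def to_tidy(untidy_edges):
    """Map each edge (s, p, o) to (s, p, stringValueOf(o))."""
    return [
        (s, p, TidyLiteral(o.label) if isinstance(o, UntidyLiteral) else o)
        for (s, p, o) in untidy_edges
    ]


def to_untidy(tidy_edges):
    """Map each edge (s, p, o) to (s, p, uniqueUntidy(o)) for untyped o."""
    fresh_ids = count()  # any source of unique IDs would do
    return [
        (s, p, UntidyLiteral(next(fresh_ids), o.label) if isinstance(o, TidyLiteral) else o)
        for (s, p, o) in tidy_edges
    ]


tidy = [
    ("ex:alice", "ex:age", TidyLiteral("10")),
    ("ex:bob", "ex:shoeSize", TidyLiteral("10")),
    ("ex:alice", "ex:knows", "ex:bob"),
]
untidy = to_untidy(tidy)
round_trip = to_tidy(untidy)
```

In this sketch the two occurrences of "10" become distinct nodes in the 
untidy edge list, and converting back yields the original tidy edge 
list. Going the other way, the result matches the starting untidy graph 
only up to a renaming of the node IDs, which is consistent with the IDs 
being replaceable without change in semantics.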

The above effectively proves that each conceivable application that 
assumes untidy (or tidy) semantics behaves equivalently if we change the 
graph semantics to tidy (or untidy) and plug in an intermediate 
"conversion" layer between the application and the original untidy (or 
tidy) API layer. That is, "RDF" does not care about (un)tidiness. 
Consider the following "Melnik" test (modestly named after the Turing test):

Given: an application X that communicates with the external world using 
RDF/XML documents.
Goal: find out whether X assumes tidy or untidy semantics for untyped 
literals.

My conjecture is that there is no way to distinguish whether an 
application employs tidy or untidy semantics. Therefore, it's an 
implementation detail, which matters only for defining a standard, 
W3C-blessed RDF API, and is irrelevant for the spec we are working on.

Sergey



> --Frank
> 
> Sergey Melnik wrote:
> 
>>
>> Brian McBride wrote:
>>
>>>
>>> At 22:21 26/09/2002 +0300, Patrick Stickler wrote:
>>>
>>>
>>>> I ask that the proponents of string-based (tidy) semantics
>>>> present their arguments to the WG in the same manner
>>>> as the proponents of value-based (untidy) semantics were
>>>> asked to do prior to last Friday's vote.
>>>
>>>
>>>
>>>
>>> That seems sensible.  I suggest we collect all the reasons for and 
>>> against each proposal into the rationale document we started this week.  
>>
>>
>>
>>
>> Brian,
>>
>> how can "tidy" folks contribute to that document? I'd like the 
>> reasoning of [1,2] to be included. The points substantiated in [1,2] 
>> are these:
>>
>> a) Untidiness is not required for correct modeling, or 
>> forward/backward compatibility.
>>
>> b) Untidiness does not solve a general issue of using substitute 
>> artifacts in property ranges (claimed by untidy folks). Examples are 
>> using strings instead of names, names instead of persons, strings 
>> instead of integers, integers instead of kilograms, kilograms instead 
>> of masses, integers instead of masses, strings instead of masses. This 
>> is common modeling practice and cannot possibly be forbidden, let 
>> alone by using untidy literals.
>>
>> 3) Untidiness requires changes in existing apps and APIs, whereas tidy 
>> interpretation does not.
>>
>>
>> Sergey
>>
>> [1] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Sep/0283.html
>> [2] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Sep/0297.html
>>
> 
> 

Received on Monday, 30 September 2002 12:18:10 UTC