Re: locator syntax, resources, etc. from David G. Durand on 1999-07-23 (www-rdf-comments@w3.org from July to September 1999)

From: David G. Durand <david@dynamicdiagrams.com>
Date: Fri, 23 Jul 1999 00:39:30 -0400
To: w3c-xml-linking-ig@w3.org
Cc: www-rdf-comments@w3.org, www-wca@w3.org
Message-Id: <v04011700b3bd8e8d6ff0@[199.170.24.196]>

At 10:46 AM -0400 7/21/99, Paul Prescod wrote:
>"David G. Durand" wrote:
>> That is exactly why an operation like GetXLinkAnchors is too limiting to be
>> useful.
>
>What do you propose in exchange? My position is "the application" should
>not need to be expert about every media type? If "the application" is (for
>example) a link database or RDF property database, it should be able to
>track references for media types that it does not "natively" understand.
>That implies that we need a plugin that is the equivalent of a "parser"
>but for link targets.

We are only defining targets (XPath results) for XML destinations. I find it
completely acceptable that an application that intends to do meaningful
processing on a target must be "expert" in the type of data being
referenced. Certainly XPath (and most useful pointer syntaxes) do _not_
allow identity checking given two arbitrary pointer values without access
to the instance. In fact, there is little useful processing that can be
done on arbitrary pointers without an instance. This is a reason that some
applications will want to use more restricted pointers.
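
To make the identity point concrete, here is a minimal sketch in Python. It
assumes the limited XPath subset of the standard xml.etree.ElementTree
module rather than full XPath; the sample documents and the helper name
same_target are hypothetical. The same two path strings name the same node
in one instance and different nodes in another, so comparing the strings
alone cannot settle the question.

```python
import xml.etree.ElementTree as ET

# Two hypothetical instances; only the id attributes differ.
DOC_1 = "<doc><sec id='a'><p>one</p></sec><sec id='b'><p>two</p></sec></doc>"
DOC_2 = "<doc><sec id='b'><p>one</p></sec><sec id='c'><p>two</p></sec></doc>"

def same_target(root, path_a, path_b):
    """Resolve both paths against a parsed instance and compare node identity."""
    nodes_a = root.findall(path_a)
    nodes_b = root.findall(path_b)
    return len(nodes_a) == len(nodes_b) and all(
        a is b for a, b in zip(nodes_a, nodes_b)
    )

BY_POSITION = "./sec[2]/p"      # second <sec> child of the root
BY_ID = "./sec[@id='b']/p"      # the <sec> whose id attribute is 'b'

# The answer depends on the instance, not on the two pointer strings.
print(same_target(ET.fromstring(DOC_1), BY_POSITION, BY_ID))
print(same_target(ET.fromstring(DOC_2), BY_POSITION, BY_ID))
```

In the first instance both paths resolve to the same element; in the second
they do not, although neither pointer string has changed.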

I realize that we're not connecting here as well as we should. I think I
understand your position (although I do not agree with it). If you
understand mine, then we may not need to discuss it further, at least on
our own account.

>A parser returns "nodes" (DOM nodes, grove nodes, etc.) What does a "link
>parser" return? That's my fundamental question.

Addresses that can be interpreted in conjunction with a parsed instance.
Function call return is a limited model that does not encompass the use
cases of spans, which are necessary and are not an integral number of
nodes. This says to me that nodes are not as fundamental as you think they
are -- it indicates to you that spans are not worth implementing
precisely. We may not be able to agree on this tradeoff between ourselves,
since our perceptions of the same problem lead to incompatible conclusions.
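
As an illustration of a span that is not an integral number of nodes, here
is a minimal Python sketch. It assumes a span is a pair of character
offsets into the concatenated text of the document, which is only one
possible representation; the sample markup and the helper name split_span
are hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<p>Hello <em>brave</em> new world</p>")

def split_span(root, start, end):
    """Return (text_run, covered_part, fully_covered) for each run the span touches."""
    touched = []
    pos = 0  # offset of the current text run in the concatenated text
    for run in root.itertext():
        lo, hi = max(start, pos), min(end, pos + len(run))
        if lo < hi:
            whole = lo == pos and hi == pos + len(run)
            touched.append((run, run[lo - pos:hi - pos], whole))
        pos += len(run)
    return touched

# Characters 3..9 of "Hello brave new world", i.e. "lo bra".
for run, part, whole in split_span(root, 3, 9):
    print(repr(run), repr(part), whole)
```

The span starts in the middle of one text run and ends in the middle of the
next, inside the em element, so no list of whole nodes describes it.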

If I could see a compromise, I'd be ecstatic. As it is, I can't, and I don't
see this discussion as productive of much except bytes at this point. Do you
see any intermediate or union position that I'm too dense to figure out?

>Here's my requirement, given an RDF or XLink annotation database, I need
>to be able to "plug in" media handlers so that I can answer questions like
>"do these two RDF properties apply to the same node?"

This requires a very different kind of model. Since URIs themselves
explicitly do _not_ allow this operation, XLink is not going to be able to
support it either. I think this is an RDF problem. Even URNs don't support
this operation since multiple URNs may designate the same object without
there being any way to detect this without already knowing the resources
they designate. In a way this is a weakness of the web model: the resource
designated by a name is part of the conceptual model, but not part of the
interface. Naturally representations of the value of a resource _are_ part of
the model, but this is not the same thing. It took me some months to get
this for real. I'd recommend reading the archives of the IETF's various URN
working groups of 4-6 years ago, but I'm not sure what URL to point you at.
Maybe Ron (who fought the good fight consistently in that group) can give
you a pointer.

>Can you specify an implementation model that supports this and is
>implementable for all media types on top of the XLink/XPath/URI/fragment
>machinery?

Since I believe this model to be precluded by the semantics of URIs, I
claim that support for this is impossible in the general case. It's also
practically impossible for XPath, I think.

>It's more powerful but also unimplementable in a manner that scales to
>multiple media types and large databases. Given a ten terabyte database
>and a request for a "delete this element" operation I do not have the time
>to parse every document that COULD POSSIBLY refer to that element. I must
>have that list ready in advance.

Right. This is why engineering 10-terabyte databases is hard. I think that
the terabyte textbase problem is hard in many ways that this group is not
chartered to handle and that I, at least, am not competent to evaluate,
except in the most trivial way.

>I tend to consider the primary application of out-of-line XLinks to be
>distributed annotation databases. If we implement XLink so that these
>databases are impossible to write, then I think that we will have made a
>large mistake.

I think the word impossible is ill-chosen. Large databases of all sorts are
very difficult. The particular problem you are talking about occurs when
individual web resources are terabytes in size. I think when that is
commonplace, we will know more, and be able to more sensibly evaluate this
problem.

Call me short-sighted, but I think we can't successfully design that
infrastructure and those tradeoffs any better than Ted Nelson could
anticipate the (imperfect) form of the WWW when considering his ideal
vision of Xanadu (even though he was the _only_ person thinking on the
right scale at the time).

>Let's define something concrete and implementable today and extend it to
>the weird cases tomorrow. That's the "way of the Web."

Arbitrary text selections strike me as natural and common. Terabyte
web-resources served by HTTP strike me as "weird cases". Your mileage
obviously varies.

>
>I would buy a data model/query model/schema model/API that supported
>n-dimensional spans -- as long as we don't give up on the goal of having a
>data model/query model/schema model/API that is appropriate to all media
>types.

We are only defining fragment IDs for the XML media type. Experts in other
media types must take care of themselves. That's what MIME is for!

>Let me put it this way: there are a variety of places in our information
>system (including query languages, schemas languages and APIs) where we
>must cross the boundary from one media to another. The result of crossing
>that boundary should defined someway. There must be some underlying
>concepts that remain the same so that we can understand and implement
>these cross-boundary situations in an open, extensible, well-defined
>manner.

This would be nice, but I believe that no such conceptual framework exists.
How do I map a link between the span of all colors from 100,200,100 to
200,200,200 in the HSV space, to the 10 characters at the beginning of this
sentence? We don't know... Only the (weirdo) author of an application that
makes such a link can define that meaning.

>The most obvious and fundamental concept that must be the same across
>media types is the concept of identity. RDF assertions that attach
>properties to /FOO and /FOO/. should be equivalent. When and if SMIL and
>MPEG have similar concepts, we need to be able to unify them. If they all
>have concepts of range and dimension, then we can unify those also.

These are excellent examples of things that are explicitly _inequivalent_
according to the relevant standards. Does knowing that the architects of
the URI/URL explicitly ruled these equivalences out of bounds help you?

>But please let's not wave away the problem and just make media types black
>boxes. If we have to change core applications to support new media types
>we might as well give up. We'll drown in the deluge.

That's why we have media types. That's in fact how they are handled
everywhere they are used, and the practical problems are what slow down the
pace of media-type adoption, but the world has not ended in the years that
MIME has been percolating through the email world. In fact, it's been
improving slowly. The web has improved much more rapidly. Black boxes are
the best way to get modularity, which for data, usually does mean that
people who want to process the data (or interpret references into it) need
to understand what the data format is.

   -- David
_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
http://www.cs.bu.edu/students/grads/dgd/  \  Director of Development
    Graduate Student no more!              \  Dynamic Diagrams
--------------------------------------------\  http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________
Received on Friday, 23 July 1999 13:24:38 UTC