RE: Dispositions of Dave Beckett's comments from Ron Daniel on 2001-04-03 (www-rdf-interest@w3.org from April 2001)

From: Ron Daniel <rdaniel@interwoven.com>
Date: Mon, 2 Apr 2001 17:24:48 -0700
To: "'Aaron Swartz'" <aswartz@swartzfam.com>, "'Dave Beckett'" <dave.beckett@bristol.ac.uk>, "'RDF Interest'" <www-rdf-interest@w3.org>
Cc: "'spec-comments'" <spec-comments@prismstandard.org>
Message-ID: <00a501c0bbd4$7eb13f20$e814000a@interwoven.com>
People interested in the upshot of this thread can
skip to the end to see what I'm currently putting into the
PRISM spec.

----

Aaron said:

> it seems to me that having a file which claims to be RDF but means
> one thing to an RDF processor and another to a PRISM processor seems
> to be a bad thing.

Yes, that would be bad if it were really the case. It is not
clear to me that it is. Correct me if I am wrong, but two
RDF models will 'mean' different things if and only if they
return different results for a query such as
    (some.doc dc:creator ?X)
-- modulo the order of the results that X binds to.
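
To make that test concrete, here is a minimal sketch in Python. It
assumes a toy representation that is not part of any RDF toolkit: a
model is a plain set of (subject, predicate, object) triples, and the
query function is a hypothetical helper written for this example.

```python
# Toy model: an RDF graph as an unordered set of (subject, predicate, object)
# triples. This representation and the query() helper are assumptions made
# for illustration only.

def query(model, subject, predicate):
    """Return the set of values ?X binds to for (subject predicate ?X)."""
    return {o for (s, p, o) in model if s == subject and p == predicate}

# The same two statements, read from two files in different orders.
file_a = [
    ("some.doc", "dc:creator", "Contributor 1"),
    ("some.doc", "dc:creator", "Contributor 2"),
]
file_b = list(reversed(file_a))

model_a = set(file_a)
model_b = set(file_b)

# Comparing result sets ignores binding order, so the two models
# 'mean' the same thing under the test described above.
same_meaning = query(model_a, "some.doc", "dc:creator") == query(
    model_b, "some.doc", "dc:creator"
)
```

Under these assumptions the two files differ only in the order of the
results, which is exactly the part the query above ignores.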

Dave Beckett's suggestion simply says that the order, which
I think you claim not to care about, should be allowed to go a
certain way. Kind of like adding an equivalent of SQL's ORDER BY
clause to the RDF query language, and allowing 'original
document order' as a value for it.

The only way the RDF would 'mean' something different would
be if someone goes in and starts changing the RDF model
derived from the input, adding an rdf:Seq where none was
originally. That is certainly not my intent.

It would be a perfectly legitimate implementation technique to
use rdf:Seq to track the order of statements. But it would
be improper to add an rdf:Seq to the model. The tracking of
statement order needs to be held externally, as a system
annotation about the model it has imported. And there are
many different ways of implementing this that have nothing
to do with rdf:Seq. I'll spare you the litany in the interests
of space.
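
As one illustration of holding the order externally, here is a rough
Python sketch. The ImportedModel class, its attribute names, and the
document_order flag are all hypothetical, invented for this example;
it is one possible technique among the many alluded to above, not how
any particular RDF processor does it.

```python
# Sketch: keep the RDF model as an unordered set of triples, and keep the
# original statement order in a separate side table owned by the system.
# ImportedModel and document_order are hypothetical names for illustration.

class ImportedModel:
    def __init__(self, statements):
        # The model proper: unordered, with nothing added to it.
        self.triples = set(statements)
        # System annotation about the import, held outside the model:
        # the position at which each statement first appeared.
        self._order = {}
        for position, statement in enumerate(statements):
            self._order.setdefault(statement, position)

    def objects(self, subject, predicate, document_order=False):
        """Values for (subject predicate ?X), optionally in original order."""
        matches = [
            t for t in self.triples if t[0] == subject and t[1] == predicate
        ]
        if document_order:
            # Rough analogue of ORDER BY 'original document order'.
            matches.sort(key=self._order.__getitem__)
        return [t[2] for t in matches]


model = ImportedModel([
    ("some.doc", "dc:creator", "Contributor 1"),
    ("some.doc", "dc:creator", "Contributor 2"),
])
authors = model.objects("some.doc", "dc:creator", document_order=True)
```

The point of the design is that model.triples contains only the
statements that were in the input. No rdf:Seq is introduced; the
ordering lives in the side table and is consulted only on request.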

This is not a data model issue. It is a quality of
implementation issue, which Dave's earlier message
clarified for me. PRISM implementations should prefer
to be implemented on top of RDF software that can
reconstruct the original order, just as they should
prefer to be implemented on top of XML software that
knows about the xml:base attribute.

> Furthermore, I don't see why this is necessary. There is a simple
> RDF-compatible way to deal with this situation, and I'm not sure why
> you can't use it. As Roland pointed out, simply put these in an
> rdf:Seq and this will indicate that order should be maintained to any
> RDF processor. Just like this:
> 
> <dc:creator>
>   <rdf:Seq>
>      <rdf:li>Contributor 1</rdf:li>
>      <rdf:li>Contributor 2</rdf:li>
>   </rdf:Seq>
> </dc:creator>
> 
> Is there any reason why this can't be done?

Yes. What is 'simple' to you and me is not simple to
everyone. Making an
explicit Seq means that the order IS significant, and
as Roland pointed out, there are many times when it is
not. So you introduce options into how things are handled.
By the time you deal with Seq for significant
order, Bag for authorship by committee without
individual attribution, and repeated elements for
other cases, you have done four VERY BAD THINGS:
  1) Raised the cost of training catalogers on which
     model to use.
  2) Raised the implementation and maintenance cost of
     software that will analyze the record and do things
     with it.
  3) Raised the cost of cataloging, because people are being
     asked to make subtle distinctions.
  4) Raised the error rate in the models because:
     a) People use the wrong model (e.g. using Bag when
        the work was written by individuals, not a group).
        Since the distinctions are subtle, the error rate will
        be high.
     b) The predefined models are often close, but
        not quite a match. Real life is too ambiguous to
        be accurately modeled with so coarse a set of tools
        as Bag, Seq, Alt, and their absence.

Raising costs means I can't sell this stuff to publishers.
Raising the error rates means that logical inference code can't
handle it as cleanly, thus slowing the benefits of the semantic
web.

You have heard of the difference between accuracy and
precision? 'Simply' using Seq provides precision, but
not accuracy. Beware of its unintended consequences.

---------------
Based on discussions so far, here is what I plan on
doing:

The current wording of the part of the spec that responds to
Dave Beckett's suggestion will be changed to:

4.8.3 Further Qualifications
...
Note that although a sequence of dc:creator elements in an
RDF/XML file implicitly defines a sequence (in the XML world), 
RDF parsers have no obligation to preserve that ordering, 
as they would if an explicit rdf:Seq were given. PRISM implementers 
are advised that there are quality of implementation issues
between different RDF processors. In general, implementers
MAY prefer to build on top of an RDF parser that allows
the original order of the statements to be reconstructed.
That will allow the original order of the
authors on a piece to be reconstructed, which might or
might not carry additional meaning to the viewer of a styled version
of the record. Similarly, XML software that can handle the
almost-standardized xml:base attribute will be preferred.
...
Received on Monday, 2 April 2001 20:26:19 UTC