Re: Dispositions of Dave Beckett's comments from Brian McBride on 2001-04-05 (www-rdf-interest@w3.org from April 2001)

From: Brian McBride <bwm@hplb.hpl.hp.com>
Date: Thu, 05 Apr 2001 12:16:09 +0100
To: rdaniel@interwoven.com
CC: "'Aaron Swartz'" <aswartz@swartzfam.com>, "'Dave Beckett'" <dave.beckett@bristol.ac.uk>, "'RDF Interest'" <www-rdf-interest@w3.org>, "'spec-comments'" <spec-comments@prismstandard.org>
Message-ID: <3ACC53F9.6982671A@hplb.hpl.hp.com>

Hi Ron,

You've made some good points here:

  o one can't expect to 'fully' capture all the semantics -
    careless language on my part

  o not all tools are created equal - they will/should have
    different characteristics

  o we'll learn a lot from the needs of RDF applications
    such as PRISM

  o RDF tools which don't preserve document order
    can still do most things with PRISM

Right on!

But I've got a nagging feeling there is more to this than a
'quality of implementation' issue.  So maybe here is a
chance for me to learn something :)

I take your point about wanting to reserve Seq for when
author order is 'really' significant.  This is an 
interesting modeling point, which will come up in
many situations. What do we do when a sequence may be ordered
by different criteria?

One way would be to add a property to the sequence which 
described the ordering criteria.  In PRISM, the absence of
such a property could be interpreted as 'document order' and
there is still the freedom to add different orders 'down the
line'.

This approach would

  o not be a burden on cataloguers
  o be reasonable for developers to cope with
  o encode the ordering info in the RDF model
  o allow for addition of alternative ordering criteria in 
    a flexible manner
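
A rough sketch of the idea, in Python rather than RDF/XML. The names
ex:orderedBy, ex:documentOrder and ex:alphabetical are made up for
illustration; neither PRISM nor RDF defines them. Triples are plain
tuples, and rdf:_1, rdf:_2 are the usual container membership
properties.

```python
# Hypothetical vocabulary: ex:orderedBy and ex:documentOrder are invented
# for this sketch and are not part of PRISM or RDF.
RDF_TYPE = "rdf:type"
RDF_SEQ = "rdf:Seq"
ORDERED_BY = "ex:orderedBy"
DOCUMENT_ORDER = "ex:documentOrder"
MEMBER_PREFIX = "rdf:_"


def ordering_criterion(triples, seq):
    """Return the criterion attached to a sequence, or document order if none is given."""
    for s, p, o in triples:
        if s == seq and p == ORDERED_BY:
            return o
    return DOCUMENT_ORDER


def members(triples, seq):
    """Return the members of a sequence sorted by their rdf:_n index."""
    indexed = [
        (int(p[len(MEMBER_PREFIX):]), o)
        for s, p, o in triples
        if s == seq
        and p.startswith(MEMBER_PREFIX)
        and p[len(MEMBER_PREFIX):].isdigit()
    ]
    return [o for _, o in sorted(indexed)]


# A Python set has no inherent order, much like a triple store.
graph = {
    ("_:authors", RDF_TYPE, RDF_SEQ),
    ("_:authors", "rdf:_2", "Jones"),
    ("_:authors", "rdf:_1", "Smith"),
    ("_:sorted", RDF_TYPE, RDF_SEQ),
    ("_:sorted", ORDERED_BY, "ex:alphabetical"),
    ("_:sorted", "rdf:_1", "Jones"),
    ("_:sorted", "rdf:_2", "Smith"),
}

print(ordering_criterion(graph, "_:authors"), members(graph, "_:authors"))
print(ordering_criterion(graph, "_:sorted"), members(graph, "_:sorted"))
```

The first sequence carries no ordering property, so a consumer falls
back to document order; the second states its criterion explicitly.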

Your suggestion about requirements for an RDF Query
language is interesting.  Query languages, as I
understand them, are defined in terms of some data model.
For example, XML Query is defined in terms of an extended
infoset model.  The RDF data model has no concept of
document order.  And in cases of applications which are
directly updating a database, I'm not sure what document
order would mean.
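
A toy illustration of that point, assuming a graph is modelled simply
as a set of triples (a simplification on my part, with made-up resource
names and a one-statement-per-line format that is not RDF/XML): two
documents that list the same statements in a different order yield the
same graph, so the order is not there for a query to ask about.

```python
def parse(lines):
    """Parse 'subject predicate object' lines into a set of triples (toy format)."""
    return frozenset(tuple(line.split(" ", 2)) for line in lines)


doc_a = [
    "ex:article dc:creator Smith",
    "ex:article dc:creator Jones",
]
doc_b = list(reversed(doc_a))  # same statements, opposite document order

graph_a = parse(doc_a)
graph_b = parse(doc_b)

# The two graphs compare equal: document order did not survive parsing.
same_graph = graph_a == graph_b
print(same_graph)
```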

A key question is whether PRISM is relying on 
preservation of document order to meet its requirements.
If applications will not function correctly unless document
order is preserved, can this be characterized as a 'quality
of implementation' issue?  When you say that users would prefer
to see authors listed in their original order, it seems like
a requirement to me.

Brian  



Ron Daniel wrote:
> 
> Hi Brian,
> 
> You said:
> 
> > I think the solution you propose means that, for example, if one stored
> > PRISM data in an RDF database, e.g. RDFDB, that one would lose
> > information essential to PRISM.
> 
> Wait a sec. Let's make a distinction between 'important' and
> 'mission critical'. Preserving things like order of authors is
> important for some things, but not mission critical.
> 
> PRISM's main use cases showed goals of
>   1) discovery of resources
>   2) fast determination of rights or rights owner
>   3) enhancement of the content
>   4) targeted distribution of the content
> 
> For discovery, it would be very nice to be able to display
> a record about the resource that listed the authors in the original
> order instead of alphabetical. But is that MANDATORY? Will
> companies fail to find things because of it? No. Would the
> users prefer to see the authors listed in the original order?
> Probably. So stating it as a quality of implementation issue
> seems reasonable.
> 
> > I'm concerned that:
> >
> >   o PRISM applications won't be able to make full use of standard
> > RDF/semantic web tools and components
> 
> I'm certainly planning to use existing RDF tools instead of
> reinventing everything.
> But the simple fact is that not all RDF tools are created equal.
> For asking questions like
>   "gimme all the documents where X is an author"
> they will all return the same results. But not in the same time,
> using the same disc space, on the same platforms, at the same level
> of development effort, at the same cost, or in the same order.
> 
> All the PRISM spec currently says is that implementers should be
> aware of this, and they MAY prefer to use an underlying RDF engine
> that does offer some control over the order. But it is not a MUST
> or even a SHOULD.
> 
> RDF implementations will improve and offer more functionality over
> time. They will do so based on demand from RDF users such as PRISM.
> 
> >   o Semantic web tools won't capture the full semantics of PRISM data,
> > so their ability to reason about it will be impaired
> 
> FYI, this is a hot button issue for me, but I will keep it short.
> 1) You can't capture the 'full semantics' of anything. Nothing exists
>    in isolation except Platonic ideals. We are building models of
>    reality. All models are abstractions of reality which throw
>    away lots of stuff *on purpose* in order to concentrate on what
>    is essential for a particular problem.
> 2) PRISM's purpose is *explicitly* not general reasoning or description.
>    Its purpose is to meet the 4 goals mentioned above.
>    It has been shown that people will pay real costs to achieve those
>    goals because the benefits can exceed the cost. By explicitly limiting
>    the problems PRISM tackles, we can define simple answers for
>    certain issues such as 'The Mona Lisa Problem' that bedevil others.
> 
> >   o If PRISM claims to be RDF compliant whilst having a different data
> > model, this will cause confusion not only for PRISM
> > developers, but also
> > for RDF developers.
> 
> Again, I reject this assertion that PRISM has a different data model.
> If you ask identical RDF queries, you get identical results, modulo the
> order of the results, which you claim is unspecified. 'Unspecified'
> does not mean 'randomized'. It means it is up to the implementer.
> 
> There are, of course, PRISM-specific behaviors, such as how to
> process PRL clauses. But that is not at the RDF level.
> 
> On a more practical note, a feature like SQL's ORDER BY clause will
> prove to be important to many people as they try to put RDF to use
> in real applications. Relational databases regard order as
> insignificant, unless a query says otherwise. Makes sense to me.
> 
> I would accept a milder assertion, that PRISM recommends certain things
> that are not commonly implemented in all RDF processors. But that
> is customer demand.
> 
> > If we could find a way of meeting PRISM's needs whilst fully
> > representing PRISM's semantics in standard RDF syntax, then
> > standard RDF
> > tools are more likely to be useful in PRISM applications and we would
> > reduce the risk of adding further confusion about what the
> > RDF datamodel
> > really is.
> 
> As mentioned above, I take it as axiomatic that one cannot
> fully represent the semantics of anything. What one can do is
> represent them to a degree of accuracy such that the errors are
> acceptable for a given purpose.
> Reordering errors are tolerable given PRISM's goals, but I predict
> that users would show a decided preference for not arbitrarily
> messing around with the order of things.
> 
> > I think we'd probably all agree that the RDF model (not the augmented
> > PRISM model) needed to represent the order of authors needs to use a
> > sequence.
> 
> No, I would not agree with that at all.
> 
> > I'm not sure I understand your objection to using it.
> 
> Because Seq, like Bag and Alt, has a particular meaning that is not
> ALWAYS appropriate.
> 
> > Since the solution you are proposing regards the ordering of
> > authors as
> > always significant
> 
> No, I do not regard the order as ALWAYS significant. It is
> sometimes significant, sometimes not. This is a place where
> simple modeling breaks. You either make the model more complex
> to deal with it, or you do something outside the model.
> Proposing "always use Seq" does not fix the problem. It explicitly
> states that the order is always important, so important that it
> has to be encoded in the model. And that is not true. For the model
> to be more correct, you have to allow Seq to appear sometimes, and not
> appear others. At that point, why bother? The costs of the
> reordering errors look to be less than the costs of the added
> complexity.
> 
> >  - you could avoid burdening your cataloguers with
> > making any decisions by simply always requiring the sequence element.
> 
> I find this a surprising statement from anyone who wants general
> purpose 'logic' tools to operate a semantic web on top of many
> collections of RDF data. "Oh, it doesn't matter if the order is
> really significant, just always say that it is". I prefer to
> NOT say such things to machines. No telling where they will run
> off to with it.
> 
> > Presuming there is tool support for generating the RDF/XML,
> > this burdens
> > the cataloguers not a whit.
> 
> That is true. Tool-wise, it doesn't bother the catalogers. It
> does take a little explanation to the tool-builders, but not a lot.
> But it will play hell down the line when people have to decide
> if a Seq element was put in because the order was REALLY important
> or just because somebody wanted to enforce a syntactic rule.
> 
> > I'd really like to help find a way to enable PRISM to fully represent
> > its semantics in standard RDF.
> 
> Put an 'ORDER BY' clause into the requirements list for the eventual
> RDF query language, and make sure that 'document order' is one of its
> allowed expressions. Then, logic engines can decide when order is
> important by analyzing the queries and not the underlying data.
> 
> Ron
Received on Thursday, 5 April 2001 07:15:41 UTC