RE: Dispositions of Dave Beckett's comments from jos.deroo.jd@belgium.agfa.com on 2001-04-03 (www-rdf-interest@w3.org from April 2001)

From: <jos.deroo.jd@belgium.agfa.com>
Date: Tue, 3 Apr 2001 18:23:30 +0100
To: rdaniel@interwoven.com
Cc: bwm@hplb.hpl.hp.com, spec-comments@prismstandard.org, www-rdf-interest@w3.org, dave.beckett@bristol.ac.uk, aswartz@swartzfam.com
Message-Id: <OFA5306792.FD9C3F3B-ON41256A23.005DFE8B@bayer-ag.com>
I really like the sense of this reply and that should give us
a lot of confidence!
It was in a similar reply somewhere that I wrote:
"""Yes, one of the things to give up is completeness and
   *the all knowledge is contained in here* thing.
   Maybe this is so obvious in our digital works that we
   forgot it, but we mostly work with *analogies* which are
   not analog at all, but discrete, finite (logical) forms.
   So giving up that, we can only be *as good as possible*
   (agap, there is a gap :-) and we must be aware of that."""

--
Jos De Roo, AGFA http://www.agfa.com/w3c/jdroo/

rdaniel@interwoven.com@INTERNET@w3.org on 04/03/2001 03:59:52 PM

Please respond to rdaniel@interwoven.com@INTERNET

Sent by:  www-rdf-interest-request@w3.org


To:   bwm@hplb.hpl.hp.com@INTERNET
cc:   spec-comments@prismstandard.org@INTERNET,
      www-rdf-interest@w3.org@INTERNET, dave.beckett@bristol.ac.uk@INTERNET,
      aswartz@swartzfam.com@INTERNET
Subject:  RE: Dispositions of Dave Beckett's comments
Hi Brian,

You said:

> I think the solution you propose means that, for example, if one stored
> PRISM data in an RDF database, e.g. RDFDB, that one would lose
> information essential to PRISM.

Wait a sec. Let's make a distinction between 'important' and
'mission critical'. Preserving things like order of authors is
important for some things, but not mission critical.

PRISM's main use cases showed goals of
  1) discovery of resources
  2) fast determination of rights or rights owner
  3) enhancement of the content
  4) targeted distribution of the content

For discovery, it would be very nice to be able to display
a record about the resource that listed the authors in the original
order instead of alphabetical. But is that MANDATORY? Will
companies fail to find things because of it? No. Would the
users prefer to see the authors listed in the original order?
Probably. So stating it as a quality of implementation issue
seems reasonable.

> I'm concerned that:
>
>   o PRISM applications won't be able to make full use of standard
> RDF/semantic web tools and components

I'm certainly planning to use existing RDF tools instead of
reinventing everything.
But the simple fact is that not all RDF tools are created equal.
For asking questions like
  "gimme all the documents where X is an author"
they will all return the same results. But not in the same time,
using the same disc space, on the same platforms, at the same level
of development effort, at the same cost, or in the same order.
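
A minimal sketch of that point, in Python. The two "stores" and the
triples are hypothetical stand-ins, not real RDF tools; the only thing
shown is that both answer the same question with the same set of
results while disagreeing on the order.

```python
# Hypothetical triples (subject, predicate, object); names are illustrative only.
TRIPLES = [
    ("doc1", "dc:creator", "Smith"),
    ("doc2", "dc:creator", "Jones"),
    ("doc3", "dc:creator", "Smith"),
]

def query_store_a(triples, author):
    """Store A: scans statements in the order they were added."""
    return [s for (s, p, o) in triples if p == "dc:creator" and o == author]

def query_store_b(triples, author):
    """Store B: happens to scan the same statements back to front."""
    return [s for (s, p, o) in reversed(triples)
            if p == "dc:creator" and o == author]

a = query_store_a(TRIPLES, "Smith")
b = query_store_b(TRIPLES, "Smith")

same_results = set(a) == set(b)   # the answers agree as sets
same_order = a == b               # but the sequence differs
print(a, b, same_results, same_order)
```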

All the PRISM spec currently says is that implementers should be
aware of this, and they MAY prefer to use an underlying RDF engine
that does offer some control over the order. But it is not a MUST
or even a SHOULD.

RDF implementations will improve and offer more functionality over
time. They will do so based on demand from RDF users such as PRISM.

>   o Semantic web tools won't capture the full semantics of PRISM data,
> so their ability to reason about it will be impaired

FYI, this is a hot button issue for me, but I will keep it short.
1) You can't capture the 'full semantics' of anything. Nothing exists
   in isolation except Platonic ideals. We are building models of
   reality. All models are abstractions of reality which throw
   away lots of stuff *on purpose* in order to concentrate on what
   is essential for a particular problem.
2) PRISM's purpose is *explicitly* not general reasoning or description.
   Its purpose is to meet the 4 goals mentioned above.
   It has been shown that people will pay real costs to achieve those
   goals because the benefits can exceed the cost. By explicitly limiting
   the problems PRISM tackles, we can define simple answers for
   certain issues such as 'The Mona Lisa Problem' that bedevil others.

>   o If PRISM claims to be RDF compliant whilst having a different data
> model, this will cause confusion not only for PRISM
> developers, but also
> for RDF developers.

Again, I reject this assertion that PRISM has a different data model.
If you ask identical RDF queries, you get identical results, modulo the
order of the results, which you claim is unspecified. 'Unspecified'
does not mean 'randomized'. It means it is up to the implementer.

There are, of course, PRISM-specific behaviors, such as how to
process PRL clauses. But that is not at the RDF level.

On a more practical note, a feature like SQL's ORDER BY clause will
prove to be important to many people as they try to put RDF to use
in real applications. Relational databases regard order as
insignificant, unless a query says otherwise. Makes sense to me.
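
As a rough illustration of the relational behavior, here is a sketch
using Python's built-in sqlite3 module. The table layout and the
"position" column are assumptions made up for this example; they are
not part of PRISM or RDF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per (document, author), plus the author's
# position in the original byline.
conn.execute("CREATE TABLE author (doc TEXT, name TEXT, position INTEGER)")
conn.executemany(
    "INSERT INTO author VALUES (?, ?, ?)",
    [("doc1", "Zed", 1), ("doc1", "Adams", 2), ("doc1", "Moore", 3)],
)

# No ORDER BY: SQL makes no promise about the order of the rows.
unordered = [r[0] for r in conn.execute(
    "SELECT name FROM author WHERE doc = ?", ("doc1",))]

# ORDER BY: the query, not the stored data, says which order matters.
original = [r[0] for r in conn.execute(
    "SELECT name FROM author WHERE doc = ? ORDER BY position", ("doc1",))]
alphabetical = [r[0] for r in conn.execute(
    "SELECT name FROM author WHERE doc = ? ORDER BY name", ("doc1",))]

print(original)
print(alphabetical)
conn.close()
```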

I would accept a milder assertion, that PRISM recommends certain things
that are not commonly implemented in all RDF processors. But that
is customer demand.

> If we could find a way of meeting PRISM's needs whilst fully
> representing PRISM's semantics in standard RDF syntax, then
> standard RDF
> tools are more likely to be useful in PRISM applications and we would
> reduce the risk of adding further confusion about what the
> RDF datamodel
> really is.

As mentioned above, I take it as axiomatic that one cannot
fully represent the semantics of anything. What one can do is
represent them to a degree of accuracy such that the errors are
acceptable for a given purpose.
Reordering errors are tolerable given PRISM's goals, but I predict
that users would show a decided preference for not arbitrarily
messing around with the order of things.

> I think we'd probably all agree that the RDF model (not the augmented
> PRISM model) needed to represent the order of authors needs to use a
> sequence.

No, I would not agree with that at all.

> I'm not sure I understand your objection to using it.

Because Seq, like Bag and Alt, has a particular meaning that is not
ALWAYS appropriate.

> Since the solution you are proposing regards the ordering of
> authors as
> always significant

No, I do not regard the order as ALWAYS significant. It is
sometimes significant, sometimes not. This is a place where
simple modeling breaks. You either make the model more complex
to deal with it, or you do something outside the model.
Proposing "always use Seq" does not fix the problem. It explicitly
states that the order is always important, so important that it
has to be encoded in the model. And that is not true. For the model
to be more correct, you have to allow Seq to appear sometimes and not
at other times. At that point, why bother? The costs of the
reordering errors look to be less than the costs of the added
complexity.
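
To make the added complexity concrete, here is a sketch in Python with
statements written as plain tuples. The document and author names are
hypothetical, and the helper is only an illustration of what a consumer
has to do once Seq may or may not be present: it must recognize both
shapes and report whether an order was actually asserted.

```python
# Shape 1: repeated property, no order asserted.
plain = [
    ("doc1", "dc:creator", "Zed"),
    ("doc1", "dc:creator", "Adams"),
]

# Shape 2: the property points at an rdf:Seq whose members are
# numbered rdf:_1, rdf:_2, ...
with_seq = [
    ("doc1", "dc:creator", "_:s1"),
    ("_:s1", "rdf:type", "rdf:Seq"),
    ("_:s1", "rdf:_1", "Zed"),
    ("_:s1", "rdf:_2", "Adams"),
]

def creators(triples, doc):
    """Return (names, order_asserted) for either shape."""
    values = [o for (s, p, o) in triples if s == doc and p == "dc:creator"]
    seq_nodes = [v for v in values if (v, "rdf:type", "rdf:Seq") in triples]
    if not seq_nodes:
        # No Seq: any order is as good as another, so pick a stable one.
        return sorted(values), False
    node = seq_nodes[0]
    members = [(int(p[len("rdf:_"):]), o)
               for (s, p, o) in triples
               if s == node and p.startswith("rdf:_")]
    return [o for _, o in sorted(members)], True

print(creators(plain, "doc1"))
print(creators(with_seq, "doc1"))
```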

> - you could avoid burdening your cataloguers with
> making any decisions by simply always requiring the sequence element.

I find this a surprising statement from anyone who wants general
purpose 'logic' tools to operate a semantic web on top of many
collections of RDF data. "Oh, it doesn't matter if the order is
really significant, just always say that it is". I prefer to
NOT say such things to machines. No telling where they will run
off to with it.

> Presuming there is tool support for generating the RDF/XML,
> this burdens
> the cataloguers not a whit.

That is true. Tool-wise, it doesn't bother the catalogers. It
does take a little explanation to the tool-builders, but not a lot.
But it will play hell down the line when people have to decide
if a Seq element was put in because the order was REALLY important
or just because somebody wanted to enforce a syntactic rule.

> I'd really like to help find a way to enable PRISM to fully represent
> its semantics in standard RDF.

Put an 'ORDER BY' clause into the requirements list for the eventual
RDF query language, and make sure that 'document order' is one of its
allowed expressions. Then, logic engines can decide when order is
important by analyzing the queries and not the underlying data.
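
A sketch of what that might look like, in Python. Everything here is
hypothetical: the function name, the order_by values, and the idea of
storing each statement's position in the source document are
assumptions for illustration, not features of any existing RDF query
language.

```python
# Each statement carries its position in the source document (an assumption).
STATEMENTS = [
    (0, "doc1", "dc:creator", "Zed"),
    (1, "doc1", "dc:creator", "Adams"),
    (2, "doc1", "dc:creator", "Moore"),
]

def select_objects(statements, subject, predicate, order_by=None):
    """Return matching objects; the caller decides whether order matters."""
    rows = [(pos, o) for (pos, s, p, o) in statements
            if s == subject and p == predicate]
    if order_by is None:
        # No order requested: return a set so nobody relies on one.
        return {o for _, o in rows}
    if order_by == "document":
        return [o for _, o in sorted(rows)]
    if order_by == "value":
        return sorted(o for _, o in rows)
    raise ValueError("unknown order_by: %r" % (order_by,))

in_document_order = select_objects(
    STATEMENTS, "doc1", "dc:creator", order_by="document")
by_value = select_objects(STATEMENTS, "doc1", "dc:creator", order_by="value")
no_order = select_objects(STATEMENTS, "doc1", "dc:creator")
print(in_document_order, by_value, no_order)
```

The data stays the same in all three calls; only the query says
whether order is significant.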

Ron
Received on Tuesday, 3 April 2001 12:25:54 UTC