RE: Use of www-rdf-dspace for comments re: early draft note, DSpace History System from Seaborne, Andy on 2003-05-07 (www-rdf-dspace@w3.org from May 2003)

From: Seaborne, Andy <Andy_Seaborne@hplb.hpl.hp.com>
Date: Wed, 7 May 2003 16:37:17 +0100
To: "'SIMILE public list'" <www-rdf-dspace@w3.org>
Message-ID: <5E13A1874524D411A876006008CD059F06C23641@0-mail-1.hpl.hp.com>
[1] http://web.mit.edu/simile/www/resources/history-harmony/history-statement-of-work.htm
[2] http://web.mit.edu/simile/www/documents/historySystem/descriptiveNote/descriptiveNote.pdf

Comments on "DSpace History System Descriptive Note" version of 1/May/2003 [2]
==============================================================================
    Andy Seaborne
    7 May 2003

Overall Comments
----------------

1/ Some use cases would be a good idea to give more detail to the
recommendations for changes and explain where the existing DSpace History
mechanism approach can continue as is.

Specifically, a sample of queries that need to be supported would be useful.

2/ RDFSchema

A better term is "vocabulary" as an RDF Schema isn't like an XML schema. An
RDF Schema (vocabulary) is a collection of definitions of classes and
properties.  It MAY be used for validation but that is not the sole
intention.  Strictly, RDF can never be "invalid" at the RDF level (open world
assumption - there is one corner case in datatypes which does allow a
contradiction) but an application may impose an additional rule that it
wishes to ensure that only terms from known vocabularies are used.

Presumably this is the case for the history system - no additional
annotation or other data is allowed.  That is a design choice to be made
(actually, this is not clear as later the document discusses annotation with
HTML).
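
A minimal sketch of such an application-level rule, in Python, with triples
held as plain tuples.  The list of accepted namespaces and the sample data
are assumptions made up for illustration; this is not the DSpace History
code.  The point is that the check is a policy layered on top of RDF: the
graph is still legal RDF either way.

```python
# Hypothetical list of namespaces the application chooses to accept.
KNOWN_NAMESPACES = (
    "http://purl.org/dc/elements/1.1/",
    "http://purl.org/dc/terms/",
    "http://metadata.net/harmony#",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
)


def unknown_properties(triples):
    """Return the set of predicate URIs that fall outside the known vocabularies."""
    return {p for (_s, p, _o) in triples if not p.startswith(KNOWN_NAMESPACES)}


# Made-up sample data: one Dublin Core statement, one free annotation.
sample = [
    ("http://example.org/item/1", "http://purl.org/dc/elements/1.1/title", "A title"),
    ("http://example.org/item/1", "http://example.org/notes#comment", "free annotation"),
]

print(unknown_properties(sample))
```

An application that wants to allow annotation would simply not run this
check, or would add the annotation namespace to the list.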

3/ The history store should be a single database to allow searching across
all records, not the current database of indices to serializations in a
file.


Section 1.2
-----------

"The current history system uses RDF as a model for generating XML that is
stored in order to track the history of a managed item"

I'm not quite sure what this means - this section seems to use "model" in two
senses - "RDF model" (a collection of statements - a term used in M&S 1999
and now being downplayed in the latest specs) and "model" as in "to design".

The use of XML is surely just the syntax in which the RDF is stored.  The
same RDF graph can be encoded in XML in more than one way.

Section 1.3.1
-------------

"For query purposes, values should be simple types, and RDF resources should
be used whenever applicable."

I am not sure why we are restricting the RDF here.  Some values are not
simple types (example from Dublin Core: people's names).  A query might be
"find all books [a type] with author family name 'Rowling'".

Better would be to adopt whatever is the practice of the vocabularies and
domains employed.
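
As a sketch of why structured values need not hurt query, here is the
"family name" query in Python over triples held as tuples.  The ex: terms
(Book, familyName, givenName), the book URIs and the data are all
hypothetical; only dc:creator is a real Dublin Core term.

```python
DC = "http://purl.org/dc/elements/1.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/terms#"  # hypothetical vocabulary for the example

# Made-up data: each creator is a structured value (a bNode), not a string.
triples = [
    ("http://example.org/book/1", RDF_TYPE, EX + "Book"),
    ("http://example.org/book/1", DC + "creator", "_:a1"),
    ("_:a1", EX + "familyName", "Rowling"),
    ("_:a1", EX + "givenName", "J. K."),
    ("http://example.org/book/2", RDF_TYPE, EX + "Book"),
    ("http://example.org/book/2", DC + "creator", "_:a2"),
    ("_:a2", EX + "familyName", "Pratchett"),
]


def books_by_family_name(triples, family_name):
    """Find resources typed ex:Book whose dc:creator has the given family name."""
    people = {s for (s, p, o) in triples
              if p == EX + "familyName" and o == family_name}
    books = {s for (s, p, o) in triples
             if p == RDF_TYPE and o == EX + "Book"}
    return sorted(s for (s, p, o) in triples
                  if p == DC + "creator" and o in people and s in books)


print(books_by_family_name(triples, "Rowling"))
```

The query is a two-step path rather than a single property match, which is
the only extra cost of the structured value.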

"RDF resources" : I don't understand this part of the sentence.


Section 3
---------

Use cases would help ground the choices.

I presume that the history store will become a single database so that
queries can be effectively made over many records.  As I understand it, at
the moment, this is not the case.

SIMILE may still wish to retain a record-per-file for management and
tracking of the history store - the main storage should be the database.

(aside: the examples seem to be Jena.  Namespace names are not preserved so
"dc:" for Dublin Core is used and retained to aid readability).

Section: 3.1.1
--------------

Agreed: SIMILE should use standard vocabularies, not import them.  It may
wish to add or refine properties/classes as well.


Section 3.1.2
-------------

"Duplicate Properties" : confusing title.  It's not multiple statements on
the same resource with the same property; it's multiple definitions of the
same concept.

Use of standard vocabularies means unique URIs : the same short name has
different URIs.

Dublin Core: http://purl.org/dc/terms/hasPart

I could not find hasPart in Harmony ABC:
http://metadata.net/harmony/ABC/ABC.rdfs
I did find http://metadata.net/harmony#isPartOf.


[The namespace URI http://metadata.net/harmony# does not lead to a
vocabulary: the vocabulary definition is at URL
http://metadata.net/harmony/ABC/ABC.rdfs]

We should define when two properties are the same : i.e. they are
subProperties of each other (RDFS) or owl:sameAs.

This also raises the issue of RDFS-level inference.  One possibility is to
run a forward-chainer over data input once and not do run-time inference.
The choice will depend on the complexity of the vocabularies and cross term
mappings.
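
A minimal sketch of the forward-chaining option, in Python, limited to
rdfs:subPropertyOf (the RDFS rules usually labelled rdfs5 and rdfs7).  The
two property URIs and the data are hypothetical; a real system would more
likely use the rule engine of its RDF toolkit.  The example declares two
properties as sub-properties of each other, which is one way of saying they
are the same.

```python
SUBPROP = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"


def forward_chain(triples):
    """Add every triple implied by rdfs:subPropertyOf until nothing new appears."""
    closed = set(triples)
    while True:
        sub = {(s, o) for (s, p, o) in closed if p == SUBPROP}
        new = set()
        # rdfs5: subPropertyOf is transitive.
        for (a, b) in sub:
            for (c, d) in sub:
                if b == c:
                    new.add((a, SUBPROP, d))
        # rdfs7: (x p y) and (p subPropertyOf q) give (x q y).
        for (s, p, o) in closed:
            for (a, b) in sub:
                if p == a:
                    new.add((s, b, o))
        if new <= closed:
            return closed
        closed |= new


# Hypothetical properties from two vocabularies, declared equivalent.
A = "http://example.org/vocabA#hasPart"
B = "http://example.org/vocabB#contains"
data = [
    (A, SUBPROP, B),
    (B, SUBPROP, A),
    ("http://example.org/collection/1", A, "http://example.org/item/7"),
]

closed = forward_chain(data)
print(("http://example.org/collection/1", B, "http://example.org/item/7") in closed)
```

Run once at load time, this trades storage for query simplicity: queries
then match stored triples directly and need no run-time inference.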


Section 3.1.3
-------------

"An RDF Resource is the highest level of abstraction
within RDF and can represent any concrete or abstract resource."

I don't understand this - it seems to be 'resources represent resources'.

I agree that type information (by which I mean use of rdf:type) would be
good but that does not mean query is more difficult.  Query can find things
by some defining property (e.g. book by ISBN).

A resource can have multiple types in RDF.

I agree that URIs should be opaque.

"After applying recommendation" example is wrong

1- it's not valid XML.
2- rdf:Collection is not a valid type defined in the RDF namespace
(rdf:parseType="Collection" is an RDF/XML syntax device for things of type
rdf:List).
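
To make the second point concrete, here is a sketch in Python of the triples
that rdf:parseType="Collection" stands for: a chain of nodes linked by
rdf:first and rdf:rest and closed by rdf:nil.  The bNode labels and the
function name are invented for the example; a real parser generates its own
labels.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def collection_triples(items):
    """Return (head, triples) for the rdf:first/rdf:rest chain over items."""
    if not items:
        # An empty collection is just rdf:nil.
        return RDF + "nil", []
    nodes = ["_:list%d" % i for i in range(len(items))]
    triples = []
    for i, item in enumerate(items):
        rest = nodes[i + 1] if i + 1 < len(items) else RDF + "nil"
        triples.append((nodes[i], RDF + "first", item))
        triples.append((nodes[i], RDF + "rest", rest))
    return nodes[0], triples


head, chain = collection_triples(["http://example.org/item/1",
                                  "http://example.org/item/2"])
for t in chain:
    print(t)
```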


Section 3.1.4
-------------

Agree.

The property value is the empty string.  That is the RDF definition of
<x></x>.


Section 3.2.1
-------------

Agree: Use Dublin Core as defined.  Dublin Core qualifiers are sub
properties of the core terms.  Either use query rewriting (yuk!) or simple
inference (forward or backward).
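
For comparison, a sketch of the query-rewriting option in Python: the query
for a core term is expanded to cover its qualifiers before matching.  The
sub-property table is hand-written here for two Dublin Core date qualifiers;
the item name and the data are made up.

```python
DC_DATE = "http://purl.org/dc/elements/1.1/date"
DCT = "http://purl.org/dc/terms/"

# Declared sub-property links (child -> parent), written out by hand here.
SUB_PROPERTY_OF = {
    DCT + "created": DC_DATE,
    DCT + "modified": DC_DATE,
}


def expand(prop):
    """Return prop plus every property that is, directly or not, a sub-property of it."""
    result = {prop}
    changed = True
    while changed:
        changed = False
        for child, parent in SUB_PROPERTY_OF.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def values(triples, subject, prop):
    """Rewritten query: match prop or any of its sub-properties."""
    wanted = expand(prop)
    return sorted(o for (s, p, o) in triples if s == subject and p in wanted)


triples = [
    ("item", DCT + "created", "2003-05-01"),
    ("item", DCT + "modified", "2003-05-07"),
    ("item", "http://purl.org/dc/elements/1.1/title", "A title"),
]

print(values(triples, "item", DC_DATE))
```

Every query has to go through the rewriting step, which is the "yuk"; with
inference the stored or derived triples already answer the plain query.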

Section 3.2.2
-------------

Agree: Use RDF mechanisms.  Keep URIs opaque.

However, there is another mechanism to consider if there is an open ended
set of bit_stream_logo; use bNodes and identifying properties (this may be
closer to the DSpace History design):

 <rdf:Description rdf:about="http://www.dspace.org/collection/1721/113">
  <ex:logo_bitstream>
     <ex:BitStreamLogo>
        <ex:bit_stream_logo_id>202</ex:bit_stream_logo_id>
     </ex:BitStreamLogo>
  </ex:logo_bitstream>
 </rdf:Description>

or in N3:

<http://www.dspace.org/collection/1721/113>
      ex:logo_bitstream [ rdf:type               ex:BitStreamLogo ;
                          ex:bit_stream_logo_id  "202"
                        ] .

This also gives a rdf:type to the bNode.

Section 3.2.3
-------------

Agreed.  And the History system becomes one database, doesn't it (needed for
searching)?


Section 3.2.4
-------------

Not sure about this.  I would like to understand the use case for annotation
here.

Use of XML literals/XHTML may be possible but would mandate XHTML which is
probably not sensible.  Need to decide on what is being stored: (X)HTML
document or fragments?  If fragments, what context is used to display it
(style sheet etc)?

Also: can use CDATA instead of escaping HTML.
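
A small check, in Python with the standard library XML parser, that the two
forms carry the same character data.  The element name "note" and the HTML
fragment are made up for the example.

```python
import xml.etree.ElementTree as ET

# The same HTML fragment, once entity-escaped and once wrapped in CDATA.
escaped = "<note>&lt;b&gt;bold&lt;/b&gt; &amp; more</note>"
cdata = "<note><![CDATA[<b>bold</b> & more]]></note>"

escaped_text = ET.fromstring(escaped).text
cdata_text = ET.fromstring(cdata).text

# Both parse to the identical string, so the choice is purely one of syntax.
print(escaped_text == cdata_text)
```

CDATA keeps the stored fragment readable, but the fragment must not itself
contain the sequence "]]>".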


Section 3.3.1
-------------

Agreed.  Note that the Harmony namespace does not have date or version
number.  But then the project is officially over so the vocabulary should be
frozen.


Section 3.3.2
-------------

I think I agree: SIMILE should adopt the Harmony approach unchanged if
possible as this aids interaction with other systems.


Section 4
---------

See comment about maintaining a single database of history.
Received on Wednesday, 7 May 2003 11:37:42 UTC