- From: Seaborne, Andy <Andy_Seaborne@hplb.hpl.hp.com>
- Date: Wed, 7 May 2003 16:37:17 +0100
- To: "'SIMILE public list'" <www-rdf-dspace@w3.org>
[1] http://web.mit.edu/simile/www/resources/history-harmony/history-statement-of-work.htm
[2] http://web.mit.edu/simile/www/documents/historySystem/descriptiveNote/descriptiveNote.pdf

Comments on "DSpace History System Descriptive Note" version of 1/May/2003 [2]
==============================================================================

Andy Seaborne
7 May 2003

Overall Comments
----------------

1/ Some use cases would be a good idea, both to give more detail to the recommendations for changes and to explain where the existing DSpace History mechanism can continue as is. Specifically, a sample of the queries that need to be supported would be useful.

2/ RDFSchema

A better term is "vocabulary", as an RDF Schema isn't like an XML schema. An RDF Schema (vocabulary) is a collection of definitions of classes and properties. It MAY be used for validation but that is not the sole intention. Strictly, RDF can never be "invalid" at the RDF level (open world assumption - there is one corner case in datatypes which does allow a contradiction) but an application may impose an additional rule to ensure that only terms from known vocabularies are used. Presumably this is the case for the history system - no additional annotation or other data is allowed. That is a design choice to be made (actually, this is not clear, as later the document discusses annotation with HTML).

3/ The history store should be a single database to allow searching across all records, not the current database of indices to serializations in a file.

Section 1.2
-----------

"The current history system uses RDF as a model for generating XML that is stored in order to track the history of a managed item"

I'm not quite sure what this means - this section seems to use "model" in two senses - "RDF model" (a collection of statements - a term used in M&S 1999 and now being downplayed in the latest specs) and "model" as in "to design".

The use of XML is surely just the syntax in which the RDF is stored.
The same RDF graph can be encoded in XML in more than one way.

Section 1.3.1
-------------

"For query purposes, values should be simple types, and RDF resources should be used whenever applicable."

I am not sure why we are restricting the RDF here. Some values are not simple types (example from Dublin Core: people's names). A query might be "find all books [a type] with author family name 'Rowling'". Better would be to adopt whatever is the practice of the vocabularies and domains employed.

"RDF resources" : I don't understand this part of the sentence.

Section 3
---------

Use cases would help ground the choices.

I presume that the history store will become a single database so that queries can be effectively made over many records. As I understand it, at the moment, this is not the case. SIMILE may still wish to retain a record-per-file for management and tracking of the history store - the main storage should be the database.

(aside: the examples seem to be Jena. Namespace names are not preserved so "dc:" for Dublin Core is used and retained to aid readability).

Section 3.1.1
-------------

Agreed: SIMILE should use standard vocabularies, not import them. It may wish to add or refine properties/classes as well.

Section 3.1.2
-------------

"Duplicate Properties" : confusing title. It's not multiple statements on the same resource with the same property; it's multiple definitions of the same concept.

Use of standard vocabularies means unique URIs : the same short name has different URIs.

Dublin Core: http://purl.org/dc/terms/hasPart

I could not find hasPart in Harmony ABC:
  http://metadata.net/harmony/ABC/ABC.rdfs
I did find http://metadata.net/harmony#isPartOf.

[The namespace URI http://metadata.net/harmony# does not lead to a vocabulary: the vocabulary definition is at URL http://metadata.net/harmony/ABC/ABC.rdfs]

We should define when two properties are the same : i.e. they are subProperties of each other (RDFS) or owl:sameAs.
This also raises the issue of RDFS-level inference. One possibility is to run a forward-chainer over data input once and not do run-time inference. The choice will depend on the complexity of the vocabularies and cross-term mappings.

Section 3.1.3
-------------

"An RDF Resource is the highest level of abstraction within RDF and can represent any concrete or abstract resource."

I don't understand this - it seems to be 'resources represent resources'.

I agree that type information (by which I mean use of rdf:type) would be good, but its absence does not mean query is more difficult. Query can find things by some defining property (e.g. book by ISBN). A resource can have multiple types in RDF.

I agree that URIs should be opaque.

"After applying recommendation" example is wrong:
1- it's not valid XML.
2- rdf:Collection is not a valid type defined in the RDF namespace (rdf:parseType="Collection" is an RDF/XML syntax device for things of type rdf:List).

Section 3.1.4
-------------

Agree. The property value is the empty string. That is the RDF definition of <x></x>.

Section 3.2.1
-------------

Agree: Use Dublin Core as defined.

Dublin Core qualifiers are sub properties of the core terms. Either use query rewriting (yuk!) or simple inference (forward or backward).

Section 3.2.2
-------------

Agree: Use RDF mechanisms. Keep URIs opaque.

However, there is another mechanism to consider if there is an open-ended set of bit_stream_logo: use bNodes and identifying properties (this may be closer to the DSpace History design):

  <rdf:Description rdf:about="http://www.dspace.org/collection/1721/113">
    <ex:logo_bitstream>
      <ex:BitStreamLogo>
        <ex:bit_stream_logo_id>202</ex:bit_stream_logo_id>
      </ex:BitStreamLogo>
    </ex:logo_bitstream>
  </rdf:Description>

or in N3:

  <http://www.dspace.org/collection/1721/113>
      ex:logo_bitstream [ rdf:type ex:BitStreamLogo ;
                          ex:bit_stream_logo_id "202" ] .

This also gives an rdf:type to the bNode.

Section 3.2.3
-------------

Agreed.
And the History system becomes one database, doesn't it (needed for searching)?

Section 3.2.4
-------------

Not sure about this. I would like to understand the use case for annotation here. Use of XML literals/XHTML may be possible but would mandate XHTML, which is probably not sensible.

Need to decide on what is being stored: (X)HTML documents or fragments? If fragments, what context is used to display them (style sheet etc.)?

Also: can use CDATA instead of escaping HTML.

Section 3.3.1
-------------

Agreed. Note that the Harmony namespace does not have a date or version number. But then the project is officially over, so the vocabulary should be frozen.

Section 3.3.2
-------------

I think I agree: SIMILE should adopt the Harmony approach unchanged if possible, as this aids interaction with other systems.

Section 4
---------

See comment about maintaining a single database of history.
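
Illustration for the inference option raised under Sections 3.1.2 and 3.2.1: a minimal sketch of forward-chaining sub-property inference done once at load time, so that a query on a core term also finds data recorded with a refinement of it. It assumes triples are held as plain (subject, predicate, object) tuples of strings; forward_chain and the ex: names are hypothetical and are not the API of Jena or of the DSpace History System.

```python
# Hypothetical sketch: triples are plain (subject, predicate, object) tuples,
# not the API of Jena or any other RDF toolkit.
SUBPROP = "rdfs:subPropertyOf"


def forward_chain(triples):
    """Return the closure of `triples` under sub-property inference."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        # All (sub, super) property pairs known so far.
        sub = [(s, o) for (s, p, o) in closed if p == SUBPROP]
        new = set()
        # rdfs:subPropertyOf is transitive: a < b and b < c gives a < c.
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, SUBPROP, d))
        # A statement made with a sub-property also holds for the super-property.
        for s, p, o in closed:
            for a, b in sub:
                if p == a:
                    new.add((s, b, o))
        # Repeat until a pass adds nothing new (fixed point).
        if not new <= closed:
            closed |= new
            changed = True
    return closed


# Example data (the ex: resources are made up for illustration).
data = {
    ("dcterms:hasPart", SUBPROP, "dc:relation"),
    ("ex:collection1", "dcterms:hasPart", "ex:item7"),
}
closure = forward_chain(data)
print(("ex:collection1", "dc:relation", "ex:item7") in closure)
```

Running the chainer once over the input means a query for dc:relation needs neither query rewriting nor run-time inference, at the cost of storing the extra statements.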
Received on Wednesday, 7 May 2003 11:37:42 UTC