- From: Seaborne, Andy <Andy_Seaborne@hplb.hpl.hp.com>
- Date: Thu, 8 May 2003 18:41:10 +0100
- To: "'jason_kinner@dynamicdigitalmedia.com'" <jason_kinner@dynamicdigitalmedia.com>, "'www-rdf-dspace@w3.org'" <www-rdf-dspace@w3.org>
Jason,

> 3. It was offered that the History System might apply the RDF-Schema that
> defines the object model to perform inferencing on-write, meaning that
> subproperty values and their parent property values would be stored
> side-by-side. This may have an advantage if the query engine (likely to be
> Joseki in this case) does not support inferencing on its own. I'd like to
> ask for comments on whether this may be a good idea.

Just a point of clarification: Joseki does expose inference in Jena2 (you
need Joseki2 - the released Joseki1 works with Jena1).

Given this, the list a-d is not visible to external clients but is a
time/space tradeoff at the server. When the schema is changing a lot
(development time), making late (runtime) inferences is natural. When the
schema is stable, the server may wish to spend space to reduce runtime CPU
costs. In other words, it is not a fixed choice but can change over time;
the external functional appearance should be the same.

In a production system, the schema should be versioned (its namespace
changed) if its meaning changes, except possibly for additions that do not
influence the existing part of the schema. A new property/class will then
say whether it is the same as the old one. Interpretation of the data with
respect to the schema can then be controlled to be the old one, the new
one, or the "current" one. Different parts of SIMILE may require different
policies at different times.

The choice suggested, inference-on-write, I take to mean expanding all the
inferred triples during the ingestion process. This form is fastest and
largest. There are some half-way forms, such as expanding the
class/property hierarchy but leaving the data alone, that gain some in the
time dimension for rather less space.
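To make the inference-on-write option concrete, here is a minimal sketch in
plain Python (not Jena, and not any actual SIMILE code): at ingestion time,
every statement whose property has declared superproperties is stored
alongside copies using those superproperties, so an inference-less query
engine sees both. Triples are modelled as simple 3-tuples of strings; the
function names are illustrative.

```python
RDFS_SUBPROPERTYOF = "rdfs:subPropertyOf"

def superproperties(schema, prop):
    """All (transitive) superproperties of prop per rdfs:subPropertyOf."""
    found, frontier = set(), {prop}
    while frontier:
        nxt = set()
        for s, p, o in schema:
            if p == RDFS_SUBPROPERTYOF and s in frontier and o not in found:
                found.add(o)
                nxt.add(o)
        frontier = nxt
    return found

def expand_on_write(schema, ground):
    """Return the ground triples plus all rdfs:subPropertyOf entailments."""
    expanded = set(ground)
    for s, p, o in ground:
        for q in superproperties(schema, p):
            expanded.add((s, q, o))
    return expanded

schema = [("<p>", RDFS_SUBPROPERTYOF, "<q>")]
ground = [("<x>", "<p>", '"foo"')]
store = expand_on_write(schema, ground)
# store holds both <x> <p> "foo" and the entailed <x> <q> "foo"
```

This is the "fastest and largest" form: if the schema changes, `store` has
to be rebuilt from the retained ground data.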
I would suggest that initially (prototypes/demos) either all inference is
done late, at access time, with the ground data and vocabularies used
unmodified, or the suggested inference-on-ingest is done, which can be used
in an inference-less runtime environment. The design should cope with this
changing over time. Both of these are simple to implement and clear. Other
tradeoffs can wait until beyond prototypes, when we have sufficient data to
do performance experiments.

[[Whatever is decided here, I hope that all data ingested is kept somewhere
exactly as received, even if the SIMILE system works on a processed form.]]

	Andy

-----

Detail for clarity: the stack is, simplified:

    RDF NetAPI
    Query (RDQL, SPO [a simple triples access language])
        in the Model API abstraction
    Graph pattern matching
    Inference system (may be null)
    Storage

Query and inference are separate issues - all inference for data-level
access is done through a generic interface of access to RDF statements.
Those statements may be inferred or may be ground facts. [There are
specialised use cases for a query controlling inference rules in particular
queries - RQL can do this - wanting to know if X is a direct subclass of Y.
This is needed more for validating the data or accessing the schema than
for application use.]

Example: given

    <p> rdfs:subPropertyOf <q> .
    <x> <p> "foo" .

the statement

    <x> <q> "foo" .

is also seen in the RDF model. The schema may be in the same model, or in a
different one bound to the first.

Some inference services, like validation, cannot be performed directly like
this, but query/runtime data-level access is this way. This is aside from
testing the values of literals.

	Andy

-----Original Message-----
From: jason_kinner@dynamicdigitalmedia.com
[mailto:jason_kinner@dynamicdigitalmedia.com]
Sent: 8 May 2003 16:38
To: www-rdf-dspace@w3.org
Subject: Open Items from SIMILE/History System Call Today

All -

We had a good discussion about the initial draft of the History System
descriptive note.
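The complementary late-inference alternative can be sketched the same way,
again in plain Python rather than Jena: the ground data is stored
unmodified, and the subPropertyOf entailment is computed when statements
are accessed through an SPO-style pattern interface, so <x> <q> "foo" is
"seen" without ever being stored. The helper names are illustrative, not
any real API.

```python
def subproperties(schema, prop):
    """All properties transitively rdfs:subPropertyOf prop, plus prop."""
    found, frontier = {prop}, {prop}
    while frontier:
        nxt = {s for s, p, o in schema
               if p == "rdfs:subPropertyOf" and o in frontier
               and s not in found}
        found |= nxt
        frontier = nxt
    return found

def match(schema, ground, s=None, p=None, o=None):
    """SPO-style access: yield statements matching the (s, p, o) pattern,
    treating a statement with any subproperty of p as also asserting p."""
    props = subproperties(schema, p) if p is not None else None
    for gs, gp, go in ground:
        if s is not None and gs != s:
            continue
        if props is not None and gp not in props:
            continue
        if o is not None and go != o:
            continue
        yield (gs, p if p is not None else gp, go)

schema = [("<p>", "rdfs:subPropertyOf", "<q>")]
ground = [("<x>", "<p>", '"foo"')]
hits = list(match(schema, ground, p="<q>"))
# hits contains ("<x>", "<q>", '"foo"'), inferred at access time, not stored
```

Nothing external to `match` can tell whether the entailed triple was
materialised at ingest or computed here, which is the point about the
external functional appearance staying the same.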
Thanks to everyone who participated, by email or by phone. Other than
simple errors and omissions, the following topics remain open:

1. The DSpace system uses CNRI Handles to identify certain objects. Given
that Handles are generally useful for referring to resources, should the
Handle refer to the current version of the resource, or to the version
current at the time of creation? Should Handles be used universally to
refer to all objects, or just to those actually retrievable by resolving
the Handle?

2. When referring to items that are referenced in the current History
System output using a database identifier (typically an integer), how
should the revised History System refer to them? The descriptive note
recommends URIs in order to capture metadata about the item, but a few
ideas were thrown around during the call:

a. GUIDs - Every item gets a GUID instead of a database ID
b. URNs - Keep the database ID, but make it part of a URN
   (e.g. - urn:bitstream:555)
c. URLs - If the resource is accessible via a URL, use the URL
d. Handles - See #1, above

3. It was offered that the History System might apply the RDF-Schema that
defines the object model to perform inferencing on-write, meaning that
subproperty values and their parent property values would be stored
side-by-side. This may have an advantage if the query engine (likely to be
Joseki in this case) does not support inferencing on its own. I'd like to
ask for comments on whether this may be a good idea. Some observations:

a. The data stored would not be resilient to changes in the schema,
   requiring a "rebuild" if the schema changes.
b. The query engine would not need to be schema-aware at all.
c. The query engine would be isolated from changes in the schema.
d. The stored RDF models would also be isolated from changes in the schema.

An issue in #3 is whether stored metadata, which was created in the context
of one version of the schema, /should/ inherit changes to the schema.
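Option (b) in item 2 above is mechanically trivial, which is part of its
appeal; a two-line sketch in Python, reading the example's "bistream"
namespace as DSpace's "bitstream" object type (the helper names and the
namespace layout are assumptions for illustration, not a proposal for the
actual URN scheme):

```python
def to_urn(object_type: str, db_id: int) -> str:
    """Wrap an existing integer database ID in a URN."""
    return f"urn:{object_type}:{db_id}"

def from_urn(urn: str):
    """Recover the object type and database ID from such a URN."""
    scheme, object_type, db_id = urn.split(":")
    if scheme != "urn":
        raise ValueError(f"not a URN: {urn}")
    return object_type, int(db_id)

urn = to_urn("bitstream", 555)   # "urn:bitstream:555"
```

The round trip preserves the database ID, so existing records need no
rewriting; the open question from the call is whether such URNs should be
resolvable at all, which is what pushes toward options (c) and (d).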
Mark Butler made several valid RDF-processing points in a prior post, and
that information should probably play into this discussion.

Thanks again for your feedback. I will be working on another draft for
early next week.

Regards,

Jason Kinner
Dynamic Digital Media, LLC
856.296.5711 (mobile)
215.243.7377 (phone)
jason_kinner@dynamicdigitalmedia.com
Received on Thursday, 8 May 2003 13:41:30 UTC