- From: Seaborne, Andy <Andy_Seaborne@hplb.hpl.hp.com>
- Date: Thu, 8 May 2003 18:41:10 +0100
- To: "'jason_kinner@dynamicdigitalmedia.com'" <jason_kinner@dynamicdigitalmedia.com>, "'www-rdf-dspace@w3.org'" <www-rdf-dspace@w3.org>
Jason,

> 3. It was offered that the History System might apply the RDF-Schema that
> defines the object model to perform inferencing on-write, meaning that
> subproperty values and their parent property values would be stored
> side-by-side. This may have an advantage if the query engine (likely to be
> Joseki in this case) does not support inferencing on its own. I'd like to
> ask for comments on whether this may be a good idea.

Just a point of clarification: Joseki does expose inference in Jena2 (you
need Joseki2 - the released Joseki1 works with Jena1).

Given this, the list a-d is not visible to external clients but is a
time/space tradeoff at the server. When the schema is changing a lot
(development time), making late (runtime) inferences is natural. When the
schema is stable, the server may wish to spend space to reduce runtime CPU
costs. In other words, it is not a fixed choice but can change over time;
the external functional appearance should be the same.

In a production system, the schema should be versioned (its namespace
changed) if its meaning changes, except possibly for additions that do not
influence the existing part of the schema. A new property/class will then
say whether it is the same as the old one. Interpretation of the data with
respect to the schema can then be controlled to be the old one, the new
one, or the "current" one. Different parts of SIMILE may require different
policies at different times.

The choice suggested, inference-on-write, I take to mean expanding all the
inferred triples during the ingestion process. This form is fastest and
largest. There are some half-way forms, such as expanding the
class/property hierarchy but leaving the data alone, that gain some in the
time dimension for rather less space.
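To make the inference-on-write option concrete, here is a minimal sketch in
plain Python (not Jena, and not any actual SIMILE code): at ingestion time,
every statement whose property has declared superproperties is stored
alongside copies using those superproperties, so an inference-less query
engine sees both. Triples are modelled as simple 3-tuples of strings; the
function names are illustrative.

```python
RDFS_SUBPROPERTYOF = "rdfs:subPropertyOf"

def superproperties(schema, prop):
    """All (transitive) superproperties of prop per rdfs:subPropertyOf."""
    found, frontier = set(), {prop}
    while frontier:
        nxt = set()
        for s, p, o in schema:
            if p == RDFS_SUBPROPERTYOF and s in frontier and o not in found:
                found.add(o)
                nxt.add(o)
        frontier = nxt
    return found

def expand_on_write(schema, ground):
    """Return the ground triples plus all rdfs:subPropertyOf entailments."""
    expanded = set(ground)
    for s, p, o in ground:
        for q in superproperties(schema, p):
            expanded.add((s, q, o))
    return expanded

schema = [("<p>", RDFS_SUBPROPERTYOF, "<q>")]
ground = [("<x>", "<p>", '"foo"')]
store = expand_on_write(schema, ground)
# store holds both <x> <p> "foo" and the entailed <x> <q> "foo"
```

This is the "fastest and largest" form: if the schema changes, `store` has
to be rebuilt from the retained ground data.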
I would suggest that initially (prototypes/demos) either all inference is
done late, at access time, with the ground data and vocabularies used
unmodified, or the suggested inference-on-ingest is done, which can be used
in an inference-less runtime environment. The design should cope with this
changing over time. Both of these are simple to implement and clear. Other
tradeoffs can wait until beyond prototypes, when we have sufficient data to
do performance experiments.

[[Whatever is decided here, I hope that all data ingested is kept somewhere
exactly as received, even if the SIMILE system works on a processed form.]]

	Andy

-----

Detail for clarity: the stack is, simplified:

    RDF NetAPI
    Query (RDQL, SPO [a simple triples access language])
        in the Model API abstraction
    Graph pattern matching
    Inference system (may be null)
    Storage

Query and inference are separate issues - all inference for data-level
access is done through a generic interface of access to RDF statements.
Those statements may be inferred or may be ground facts. [There are
specialised use cases for a query controlling inference rules in particular
queries - RQL can do this - wanting to know if X is a direct subclass of Y.
This is needed more for validating the data or accessing the schema than
for application use.]

Example: given

    <p> rdfs:subPropertyOf <q> .
    <x> <p> "foo" .

the statement

    <x> <q> "foo" .

is also seen in the RDF model. The schema may be in the same model, or in a
different one bound to the first.

Some inference services, like validation, cannot be performed directly like
this, but query/runtime data-level access is this way. This is aside from
testing the values of literals.

	Andy

-----Original Message-----
From: jason_kinner@dynamicdigitalmedia.com
[mailto:jason_kinner@dynamicdigitalmedia.com]
Sent: 8 May 2003 16:38
To: www-rdf-dspace@w3.org
Subject: Open Items from SIMILE/History System Call Today

All -

We had a good discussion about the initial draft of the History System
descriptive note.
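The complementary late-inference alternative can be sketched the same way,
again in plain Python rather than Jena: the ground data is stored
unmodified, and the subPropertyOf entailment is computed when statements
are accessed through an SPO-style pattern interface, so <x> <q> "foo" is
"seen" without ever being stored. The helper names are illustrative, not
any real API.

```python
def subproperties(schema, prop):
    """All properties transitively rdfs:subPropertyOf prop, plus prop."""
    found, frontier = {prop}, {prop}
    while frontier:
        nxt = {s for s, p, o in schema
               if p == "rdfs:subPropertyOf" and o in frontier
               and s not in found}
        found |= nxt
        frontier = nxt
    return found

def match(schema, ground, s=None, p=None, o=None):
    """SPO-style access: yield statements matching the (s, p, o) pattern,
    treating a statement with any subproperty of p as also asserting p."""
    props = subproperties(schema, p) if p is not None else None
    for gs, gp, go in ground:
        if s is not None and gs != s:
            continue
        if props is not None and gp not in props:
            continue
        if o is not None and go != o:
            continue
        yield (gs, p if p is not None else gp, go)

schema = [("<p>", "rdfs:subPropertyOf", "<q>")]
ground = [("<x>", "<p>", '"foo"')]
hits = list(match(schema, ground, p="<q>"))
# hits contains ("<x>", "<q>", '"foo"'), inferred at access time, not stored
```

Nothing external to `match` can tell whether the entailed triple was
materialised at ingest or computed here, which is the point about the
external functional appearance staying the same.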
Thanks to everyone who participated, by email or by phone. Other than
simple errors and omissions, the following topics remain open:

1. The DSpace system uses CNRI Handles to identify certain objects. Given
that Handles are generally useful for referring to resources, should the
Handle refer to the current version of the resource, or to the version
current at the time of creation? Should Handles be used universally to
refer to all objects, or just to those actually retrievable by resolving
the Handle?

2. When referring to items that are referenced in the current History
System output using a database identifier (typically an integer), how
should the revised History System refer to them? The descriptive note
recommends URIs in order to capture metadata about the item, but a few
ideas were thrown around during the call:

a. GUIDs - Every item gets a GUID instead of a database ID
b. URNs - Keep the database ID, but make it part of a URN
   (e.g. - urn:bitstream:555)
c. URLs - If the resource is accessible via a URL, use the URL
d. Handles - See #1, above

3. It was offered that the History System might apply the RDF-Schema that
defines the object model to perform inferencing on-write, meaning that
subproperty values and their parent property values would be stored
side-by-side. This may have an advantage if the query engine (likely to be
Joseki in this case) does not support inferencing on its own. I'd like to
ask for comments on whether this may be a good idea. Some observations:

a. The data stored would not be resilient to changes in the schema,
   requiring a "rebuild" if the schema changes.
b. The query engine would not need to be schema-aware at all.
c. The query engine would be isolated from changes in the schema.
d. The stored RDF models would also be isolated from changes in the schema.

An issue in #3 is whether stored metadata, which was created in the context
of one version of the schema, /should/ inherit changes to the schema.
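Option (b) in item 2 above is mechanically trivial, which is part of its
appeal; a two-line sketch in Python, reading the example's "bistream"
namespace as DSpace's "bitstream" object type (the helper names and the
namespace layout are assumptions for illustration, not a proposal for the
actual URN scheme):

```python
def to_urn(object_type: str, db_id: int) -> str:
    """Wrap an existing integer database ID in a URN."""
    return f"urn:{object_type}:{db_id}"

def from_urn(urn: str):
    """Recover the object type and database ID from such a URN."""
    scheme, object_type, db_id = urn.split(":")
    if scheme != "urn":
        raise ValueError(f"not a URN: {urn}")
    return object_type, int(db_id)

urn = to_urn("bitstream", 555)   # "urn:bitstream:555"
```

The round trip preserves the database ID, so existing records need no
rewriting; the open question from the call is whether such URNs should be
resolvable at all, which is what pushes toward options (c) and (d).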
Mark Butler made several valid RDF-processing points in a prior post, and
that information should probably play into this discussion.

Thanks again for your feedback. I will be working on another draft for
early next week.

Regards,

Jason Kinner
Dynamic Digital Media, LLC
856.296.5711 (mobile)
215.243.7377 (phone)
jason_kinner@dynamicdigitalmedia.com
Received on Thursday, 8 May 2003 13:41:30 UTC