- From: Kevin Smathers <kevin.smathers@hp.com>
- Date: Wed, 25 Jun 2003 14:15:45 -0700
- To: "John S. Erickson" <john.erickson@hp.com>
- Cc: www-rdf-dspace@w3.org
Quads aren't inherently representable in RDF, although they can certainly be translated into reified statements. There are a couple of issues here that we've just started exploring in Genesis. First there is the overhead of reifying statements (according to spec), and then the complexity of querying the reified statements. Both are problematic, but if you want your internal reification information to be carried out into the RDF document then you will have to choose some means of representing it.

Borrowing from the example of four-statement reification in RDF, Genesis does something similar for larger graphs by changing the direct statements of the graph into indirect statements tied together through a node that gives the combined graph of statements its identity. For well-known graphs this works pretty well; the graph can be identified, but the storage overhead is reduced to just a couple of extra statements per graph rather than a couple of extra statements per statement. Our current prototype takes this approach.

My own favorite alternative is to make graph identity (or statement identity) equivalent to the wrapper that contains the serialized RDF. A round trip from RDF to internal quads sets the identity element of the quad based on e.g. the filename of the file (or attachment, or PGP signed block of text, etc.) which was read to create those statements. Assuming you can identify the source of the statements, you can recreate the quads. The downside is that if you want each statement to have an identity which can be distinguished from the rest of its graph, then you are left with the need to query or otherwise subindex from the source.

Another alternative is to abandon round-trip through standard RDF and either add local extensions that represent the identity of the statements in the serialized form, or else assign a new identity to the statements after each round-trip.

I think there are reasonable arguments to be made for any of these approaches.
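To make the trade-off a little more concrete, here is a minimal Python sketch. It is not Genesis code: the `Triple` and `Quad` tuples, the function names, the `ex:inGraph` property, and the use of file names as the identity element are all assumptions made for illustration. It contrasts per-statement reification (four reification statements plus one hypothetical statement carrying the identity) with taking identity from the wrapper that the statements were read from.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    s: str
    p: str
    o: str


class Quad(NamedTuple):
    s: str
    p: str
    o: str
    g: str  # identity element: the graph/source the statement belongs to


def reify_per_statement(quads: List[Quad]) -> List[Triple]:
    """Four-statement reification, plus one extra triple per statement
    that records its identity. 'ex:inGraph' is a hypothetical property."""
    out: List[Triple] = []
    for i, q in enumerate(quads):
        node = f"_:stmt{i}"
        out += [
            Triple(node, "rdf:type", "rdf:Statement"),
            Triple(node, "rdf:subject", q.s),
            Triple(node, "rdf:predicate", q.p),
            Triple(node, "rdf:object", q.o),
            Triple(node, "ex:inGraph", q.g),
        ]
    return out


def serialize_by_source(quads: List[Quad]) -> Dict[str, List[Triple]]:
    """Write plain triples into one 'file' per identity; no extra statements."""
    files: Dict[str, List[Triple]] = {}
    for q in quads:
        files.setdefault(q.g, []).append(Triple(q.s, q.p, q.o))
    return files


def quads_from_source(triples: List[Triple], source: str) -> List[Quad]:
    """Reading a wrapper back in: the source name becomes the identity."""
    return [Quad(t.s, t.p, t.o, source) for t in triples]


quads = [
    Quad("ex:doc1", "dc:creator", "ex:alice", "file:a.rdf"),
    Quad("ex:doc1", "dc:title", '"Report"', "file:a.rdf"),
    Quad("ex:doc2", "dc:creator", "ex:bob", "file:b.rdf"),
]

files = serialize_by_source(quads)
restored = [q for name, ts in files.items() for q in quads_from_source(ts, name)]

print(len(reify_per_statement(quads)), "triples when reified per statement")
print(sum(len(ts) for ts in files.values()), "triples when identity is the wrapper")
print("round trip preserved quads:", sorted(restored) == sorted(quads))
```

In this sketch the wrapper approach recovers the quads only because each source name is known when the triples are read back; nothing inside a file distinguishes one statement from another, which is the subindexing problem mentioned above.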
Cheers,
-kls

John S. Erickson wrote:

>This looks pretty good, Mark!
>
>Might need to explode the definition of "provenance" in this context --- here
>you imply *some* definition of whereItCameFrom and whoAuthoredIt, but there
>might in fact be domain-specific definitions of "provenance objects" (i.e.
>aggregations of provenance-informing properties that are useful to a
>*particular* community).
>
>John
>
>----- Original Message -----
>From: "Butler, Mark" <Mark_Butler@hplb.hpl.hp.com>
>To: <www-rdf-dspace@w3.org>
>Sent: Wednesday, June 25, 2003 11:38 AM
>Subject: Provenance for section 3 in technologies.tex
>
>
>>Some proposed text to describe metadata provenance in section 3 - any
>>comments?
>>
>>Metadata provenance: One of the key differences between the Semantic Web and
>>pre-existing systems is that the Semantic Web relies on using metadata from
>>many disparate sources, rather than having a centrally managed store of
>>metadata information. This means it is important to consider the provenance
>>of the metadata i.e. where it came from and who authored it. This
>>information is important because it enables the system processing the
>>metadata to make decisions about how to use it, for example if it possesses
>>several varying versions of metadata about the same object. In order to
>>guarantee provenance it may be necessary to use additional technologies e.g.
>>cryptographically ensure that the originator information is correct and that
>>the metadata has not been tampered with. Once the metadata has been ingested
>>by the system, the system can also make choices about how to represent the
>>provenance information e.g. by reifying individual statements or whether
>>adopting representations like quads that record the origin of individual
>>statements. Note that the usage of the term provenance is quite different to
>>its usage in the library community where it is used to refer to the record
>>of ownership of the item described by the metadata.
>>

--
========================================================
Kevin Smathers                kevin.smathers@hp.com
Hewlett-Packard               kevin@ank.com
Palo Alto Research Lab
1501 Page Mill Rd.            650-857-4477 work
M/S 1135                      650-852-8186 fax
Palo Alto, CA 94304           510-247-1031 home
========================================================
use "Standard::Disclaimer";
carp("This message was printed on 100% recycled bits.");
Received on Wednesday, 25 June 2003 17:16:55 UTC