Re: dataset syntax metadata from Lee Feigenbaum on 2012-09-26 (public-rdf-wg@w3.org from September 2012)

From: Lee Feigenbaum <lee@thefigtrees.net>
Date: Wed, 26 Sep 2012 12:50:48 -0400
To: Sandro Hawke <sandro@w3.org>
CC: W3C RDF WG <public-rdf-wg@w3.org>
Message-ID: <50633268.5040606@thefigtrees.net>

I'm not sure if this is at all helpful input, but here's how we handle 
metadata -- in general -- in Anzo. Pat, you may avert your eyes because 
the semantics are inconsistent at best.

A couple of "regular" named graphs:

<p1> { <p1> a ex:Person ; foaf:name "Lee" ...  }
<p2> { <p2> a ex:Person ; foaf:name "Lynn" ... }

Named graphs have corresponding "metadata" graphs:

<mdg1> { <mdg1> a anzo:MetadataGraph .
         <p1> a anzo:NamedGraph ;
              anzo:hasMetadataGraph <mdg1> ;
              anzo:createdBy ... ;
              anzo:lastModifiedBy ... ;
              anzo:lastModifiedAt ... ; ... }
<mdg2> { <mdg2> a anzo:MetadataGraph .
         <p2> a anzo:NamedGraph ;
              anzo:hasMetadataGraph <mdg2> ;
              anzo:createdBy ... ;
              anzo:lastModifiedBy ... ;
              anzo:lastModifiedAt ... ; ... }

We also have first-class datasets, that are represented roughly like:

<ds1> { <ds1> a anzo:Dataset ;
              anzo:hasDefaultGraph <p1> ;
              anzo:hasNamedGraph <p1>, <p2> }

Of course, <ds1> is also a regular named graph, so there's a 
corresponding metadata graph with metadata about the dataset:

<mdg3> { <mdg3> a anzo:MetadataGraph .
         <ds1> a anzo:NamedGraph ;
               anzo:hasMetadataGraph <mdg3> ;
               anzo:createdBy ... ;
               anzo:lastModifiedBy ... ;
               anzo:lastModifiedAt ... ; ... }

Among other things, we use these datasets directly within SPARQL by 
extending SPARQL with a FROM DATASET clause:

SELECT ...
FROM DATASET <ds1>
WHERE { ... }

...which would be equivalent in this example to

SELECT ...
FROM <p1>
FROM NAMED <p1>
FROM NAMED <p2>
WHERE { ... }
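
To make the equivalence concrete, here is a minimal Python sketch of that expansion done as plain text rewriting. This is only an illustration, not how Anzo implements FROM DATASET: the DATASETS dictionary is a hypothetical in-memory stand-in for the contents of the <ds1> graph, and expand_from_dataset is a made-up helper name.

```python
import re

# Hypothetical stand-in for the <ds1> graph: which graphs the dataset
# lists via anzo:hasDefaultGraph and anzo:hasNamedGraph.
DATASETS = {
    "ds1": {"default": ["p1"], "named": ["p1", "p2"]},
}

def expand_from_dataset(query, datasets):
    """Rewrite each `FROM DATASET <x>` into FROM / FROM NAMED clauses."""
    def replace(match):
        ds = datasets[match.group(1)]  # KeyError if the dataset is unknown
        lines = [f"FROM <{g}>" for g in ds["default"]]
        lines += [f"FROM NAMED <{g}>" for g in ds["named"]]
        return "\n".join(lines)
    return re.sub(r"FROM DATASET <([^>]+)>", replace, query)

query = "SELECT ...\nFROM DATASET <ds1>\nWHERE { ... }"
print(expand_from_dataset(query, DATASETS))
```

Run against the example above, this prints the second query form, with one FROM line for the default graph and one FROM NAMED line per named graph.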

When we import TriG, we are generally doing either a replace or an 
add on the data in the named graphs in the TriG file. We generally don't 
automatically create anzo:Dataset instances based on the contents of a 
particular TriG file. Instead, if we were exporting and then importing a 
dataset, we'd include the <ds1> graph in our export so that we'd get it 
back on a future import.
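
As a rough sketch of the two import modes, assuming a store modelled as a plain Python dict from graph name to a set of triples (the import_trig name and the "add"/"replace" mode strings are hypothetical, not Anzo's API):

```python
def import_trig(store, incoming, mode="add"):
    """Merge parsed TriG content into the store, graph by graph.

    store and incoming map a graph name to a set of (s, p, o) triples.
    """
    if mode not in ("add", "replace"):
        raise ValueError(f"unknown import mode: {mode}")
    for graph, triples in incoming.items():
        if mode == "replace":
            # Discard whatever the named graph held before.
            store[graph] = set(triples)
        else:
            # Keep existing triples and add the new ones.
            store.setdefault(graph, set()).update(triples)
    return store

store = {"p1": {("p1", "foaf:name", "Lee")}}
incoming = {
    "p1": {("p1", "rdf:type", "ex:Person")},
    # The dataset graph only comes back because the export included it.
    "ds1": {("ds1", "anzo:hasNamedGraph", "p1")},
}
import_trig(store, incoming, mode="add")
```

Note that nothing here creates a dataset on its own: <ds1> exists after the import only because the file carried the <ds1> graph.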

Regarding your question (a), Sandro, you can always find the metadata 
graph for a particular graph (including a dataset graph) simply by 
querying for the anzo:hasMetadataGraph triple.
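
For illustration, that lookup can be sketched in Python over an in-memory list of quads. The (graph, subject, predicate, object) tuple layout, the QUADS sample data, and the metadata_graphs_for helper are all assumptions made for this example; in practice it would be a SPARQL query against the store.

```python
HAS_MD = "anzo:hasMetadataGraph"

# Hypothetical sample quads: (graph, subject, predicate, object),
# mirroring the <mdg1> and <mdg3> examples above.
QUADS = [
    ("mdg1", "mdg1", "rdf:type", "anzo:MetadataGraph"),
    ("mdg1", "p1", "rdf:type", "anzo:NamedGraph"),
    ("mdg1", "p1", HAS_MD, "mdg1"),
    ("mdg3", "mdg3", "rdf:type", "anzo:MetadataGraph"),
    ("mdg3", "ds1", "rdf:type", "anzo:NamedGraph"),
    ("mdg3", "ds1", HAS_MD, "mdg3"),
]

def metadata_graphs_for(graph, quads):
    """Return the metadata graph(s) linked from `graph`, sorted by name."""
    return sorted({o for (_g, s, p, o) in quads if s == graph and p == HAS_MD})

print(metadata_graphs_for("ds1", QUADS))
```

The same call works for a regular named graph such as "p1" and for a dataset graph such as "ds1", since both are linked to their metadata graph in the same way.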

Anyway, for what it's worth.

Lee

On 9/26/2012 8:53 AM, Sandro Hawke wrote:
> I'm surprised at some of the responses about the metadata questions in 
> my "Dataset Syntax - checking for consensus" email [1].
>
> When people publish RDF for real, don't they usually put some triples 
> in it which indicate who created it, when it was created, and maybe 
> why?   Maybe some folks don't do this, but many people consider this 
> an essential practice.   My sense is that every computer format either 
> has a metadata mechanism built into it, or one somehow gets hacked in 
> later (like the javadoc conventions). In a few cases (like the Adobe 
> formats) that metadata is expressed in RDF.
>
> When people publish an RDF dataset, aren't they going to want to do 
> the same thing?
>
> Yes, sometimes you can just throw that metadata into a named graph, 
> but what if (a) you don't get a chance to tell the consumer which 
> named graph you put it in, and (b) some named graphs are 
> opaque/untrusted, perhaps because they contain old information or 
> information from other sources (e.g. a Web crawl).  (While these might 
> not be the cases you work with, it seems to me they'll be quite common 
> if this syntax ever catches on.)
>
> Folks who are not convinced we need a metadata mechanism -- how do you 
> imagine solving this problem?  How can someone reading a serialized 
> dataset figure out which triples are the metadata?
>
>       -- Sandro
>
>
>
> [1] http://lists.w3.org/Archives/Public/public-rdf-wg/2012Sep/0249.html
>
>
Received on Wednesday, 26 September 2012 16:51:16 UTC