Re: dataset syntax metadata from Sandro Hawke on 2012-09-26 (public-rdf-wg@w3.org from September 2012)

From: Sandro Hawke <sandro@w3.org>
Date: Wed, 26 Sep 2012 13:48:43 -0400
To: Lee Feigenbaum <lee@thefigtrees.net>
CC: W3C RDF WG <public-rdf-wg@w3.org>
Message-ID: <50633FFB.8030600@w3.org>

On 09/26/2012 12:50 PM, Lee Feigenbaum wrote:
> I'm not sure if this is at all helpful input, but here's how we handle 
> metadata -- in general -- in Anzo. Pat, you may avert your eyes 
> because the semantics are inconsistent at best.
>

:-)  Thanks for the details...

> A couple of "regular" named graphs
>
> <p1> { <p1> a ex:Person ; foaf:name "Lee" ...  }
> <p2> { <p2> a ex:Person ; foaf:name "Lynn" ... }
>
> Named graphs have corresponding "metadata" graphs
>
> <mdg1> { <mdg1> a anzo:MetadataGraph . <p1> a anzo:NamedGraph ; 
> anzo:hasMetadataGraph <mdg1> ; anzo:createdBy ... ; 
> anzo:lastModifiedBy ... ; anzo:lastModifiedAt ... ; ... }
> <mdg2> { <mdg2> a anzo:MetadataGraph . <p2> a anzo:NamedGraph ; 
> anzo:hasMetadataGraph <mdg2> ; anzo:createdBy ... ; 
> anzo:lastModifiedBy ... ; anzo:lastModifiedAt ... ; ... }
>
> We also have first-class datasets, that are represented roughly like:
>
> <ds1> { <ds1> a anzo:Dataset ; anzo:hasDefaultGraph <p1> ; 
> anzo:hasNamedGraph <p1>, <p2> }
>
> Of course, <ds1> is also a regular named graph, so there's a 
> corresponding metadata graph with metadata about the dataset:
>
> <mdg3> { <mdg3> a anzo:MetadataGraph . <ds1> a anzo:NamedGraph ; 
> anzo:hasMetadataGraph <mdg3> ; anzo:createdBy ... ; 
> anzo:lastModifiedBy ... ; anzo:lastModifiedAt ... ; ... }
>
> Among other things, we use these datasets directly within SPARQL by 
> extending SPARQL with a FROM DATASET clause:
>
> SELECT ...
> FROM DATASET <ds1>
> WHERE { ... }
>
> ...which would be equivalent in this example to
>
> SELECT ...
> FROM <p1>
> FROM NAMED <p1>
> FROM NAMED <p2>
> WHERE { ... }
>
> When we import TriG, we generally are just doing either a replace or 
> an add on the data in the named graphs in the TriG file. We generally 
> don't automatically create anzo:Dataset's based on the contents of a 
> particular TriG file. Instead, if we were exporting and then importing 
> a dataset, we'd just include the <ds1> graph in our export so we'd 
> have it back again in an import in the future.
>
> Regarding your question (a), Sandro, you can always find the metadata 
> graph for a particular graph (including a dataset graph) simply by 
> querying for the anzo:hasMetadataGraph triple.
>
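
To make the FROM DATASET equivalence quoted above concrete, here is a minimal sketch of the expansion as I read it. The dictionary layout and the function name `expand_from_dataset` are assumptions made for illustration only; nothing here reflects how Anzo actually stores or resolves datasets.

```python
# Hypothetical in-memory description of a dataset graph such as <ds1>.
# The keys "default" and "named" stand in for anzo:hasDefaultGraph and
# anzo:hasNamedGraph; this layout is an assumption, not Anzo's model.
DATASETS = {
    "ds1": {"default": ["p1"], "named": ["p1", "p2"]},
}


def expand_from_dataset(dataset_id, datasets=DATASETS):
    """Return the FROM / FROM NAMED clauses equivalent to FROM DATASET <id>."""
    ds = datasets[dataset_id]
    clauses = [f"FROM <{g}>" for g in ds["default"]]
    clauses += [f"FROM NAMED <{g}>" for g in ds["named"]]
    return clauses


if __name__ == "__main__":
    print("\n".join(expand_from_dataset("ds1")))
```

For <ds1> as described, this yields the same three clauses as the hand-written query in the quoted message.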

What if I put some anzo:hasMetadataGraph triples in my [other-vendor]
SPARQL system, then told Anzo to incorporate that data into my corporate
processing system?  That could really confuse the system, right?

In your commercial environment I guess that's not a big problem -- you
can just say "well, don't do that!".  Or do you support the idea of
from-the-wild data feeds, which are then filtered and queried?  What if
some of those accidentally or maliciously had hasMetadataGraph triples
in them?  I suppose you could block those on import, but that
wouldn't work for other use cases, where you're trying to exchange
datasets with metadata.
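
As a rough sketch of what "block those on import" might look like, assuming quads are plain (graph, subject, predicate, object) tuples of strings. The function name `split_untrusted` and the tuple representation are hypothetical, and this is not a description of any Anzo feature.

```python
# Predicate written as the prefixed name used in this thread; a real
# importer would compare full IRIs instead (assumption for illustration).
HAS_METADATA_GRAPH = "anzo:hasMetadataGraph"


def split_untrusted(quads, blocked=frozenset({HAS_METADATA_GRAPH})):
    """Separate quads with a blocked predicate from those safe to import."""
    kept, rejected = [], []
    for quad in quads:
        _graph, _subject, predicate, _object = quad
        if predicate in blocked:
            rejected.append(quad)
        else:
            kept.append(quad)
    return kept, rejected


if __name__ == "__main__":
    feed = [
        ("g1", "p1", "foaf:name", '"Lee"'),
        ("g1", "p1", "anzo:hasMetadataGraph", "mdgX"),
    ]
    kept, rejected = split_untrusted(feed)
    print(len(kept), "kept;", len(rejected), "rejected")
```

The limitation is the one noted above: a filter like this also strips the metadata links from a dataset that is legitimately being exchanged together with its metadata.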


> Anyway, for what it's worth.
>

It is nice to be grounded in reality.   Plus, Anzo is cool.

       - s

> Lee
>
> On 9/26/2012 8:53 AM, Sandro Hawke wrote:
>> I'm surprised at some of the responses about the metadata questions 
>> in my "Dataset Syntax - checking for consensus" email [1].
>>
>> When people publish RDF for real, don't they usually put some triples 
>> in it which indicate who created it, when it was created, and maybe
>> why?   Maybe some folks don't do this, but many people consider this 
>> an essential practice.   My sense is that every computer format 
>> either has a metadata mechanism built into it, or one somehow gets 
>> hacked in later (like the javadoc conventions). In a few cases (like 
>> the Adobe formats) that metadata is expressed in RDF.
>>
>> When people publish an RDF dataset, aren't they going to want to do 
>> the same thing?
>>
>> Yes, sometimes you can just throw that metadata into a named graph, 
>> but what if (a) you don't get a chance to tell the consumer which 
>> named graph you put it in, and (b) some named graphs are 
>> opaque/untrusted, perhaps because they contain old information or
>> information from other sources (e.g. a Web Crawl).  (While these might
>> not be the cases you work with, it seems to me they'll be quite 
>> common if this syntax ever catches on.)
>>
>> Folks who are not convinced we need a metadata mechanism -- how do 
>> you imagine solving this problem?  How can someone reading a 
>> serialized dataset figure out which triples are the metadata?
>>
>>       -- Sandro
>>
>>
>>
>> [1] http://lists.w3.org/Archives/Public/public-rdf-wg/2012Sep/0249.html
>>
>>
>
>
Received on Wednesday, 26 September 2012 17:48:57 UTC