Re: dataset syntax metadata from Lee Feigenbaum on 2012-09-26 (public-rdf-wg@w3.org from September 2012)

From: Lee Feigenbaum <lee@thefigtrees.net>
Date: Wed, 26 Sep 2012 13:54:22 -0400
To: Sandro Hawke <sandro@w3.org>, RDF WG <public-rdf-wg@w3.org>
Message-ID: <5063414E.6000407@thefigtrees.net>

On 9/26/2012 1:48 PM, Sandro Hawke wrote:
> On 09/26/2012 12:50 PM, Lee Feigenbaum wrote:
>> I'm not sure if this is at all helpful input, but here's how we handle
>> metadata -- in general -- in Anzo. Pat, you may avert your eyes
>> because the semantics are inconsistent at best.
>>
>
> :-)  Thanks for the details...
>
>> A couple of "regular" named graphs
>>
>> <p1> { <p1> a ex:Person ; foaf:name "Lee" ...  }
>> <p2> { <p2> a ex:Person ; foaf:name "Lynn" ... }
>>
>> Named graphs have corresponding "metadata" graphs
>>
>> <mdg1> { <mdg1> a anzo:MetadataGraph . <p1> a anzo:NamedGraph ;
>> anzo:hasMetadataGraph <mdg1> ; anzo:createdBy ... ;
>> anzo:lastModifiedBy ... ; anzo:lastModifiedAt ... ; ... }
>> <mdg2> { <mdg2> a anzo:MetadataGraph . <p2> a anzo:NamedGraph ;
>> anzo:hasMetadataGraph <mdg2> ; anzo:createdBy ... ;
>> anzo:lastModifiedBy ... ; anzo:lastModifiedAt ... ; ... }
>>
>> We also have first-class datasets, that are represented roughly like:
>>
>> <ds1> { <ds1> a anzo:Dataset ; anzo:hasDefaultGraph <p1> ;
>> anzo:hasNamedGraph <p1>, <p2> }
>>
>> Of course, <ds1> is also a regular named graph, so there's a
>> corresponding metadata graph with metadata about the dataset:
>>
>> <mdg3> { <mdg3> a anzo:MetadataGraph . <ds1> a anzo:NamedGraph ;
>> anzo:hasMetadataGraph <mdg3> ; anzo:createdBy ... ;
>> anzo:lastModifiedBy ... ; anzo:lastModifiedAt ... ; ... }
>>
>> Among other things, we use these datasets directly within SPARQL by
>> extending SPARQL with a FROM DATASET clause:
>>
>> SELECT ...
>> FROM DATASET <ds1>
>> WHERE { ... }
>>
>> ...which would be equivalent in this example to
>>
>> SELECT ...
>> FROM <p1>
>> FROM NAMED <p1>
>> FROM NAMED <p2>
>> WHERE { ... }
>>
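
(An illustrative aside on the expansion quoted above. This is a minimal Python sketch, not Anzo's implementation: the function name expand_from_dataset and the plain-tuple representation of triples are assumptions made for the example. It reads the dataset graph's triples and produces the FROM and FROM NAMED clauses that a FROM DATASET reference would stand for.)

```python
# Hypothetical sketch: derive the FROM / FROM NAMED clauses that a
# FROM DATASET <ds> reference stands for, given the dataset graph's triples.
HAS_DEFAULT_GRAPH = "anzo:hasDefaultGraph"
HAS_NAMED_GRAPH = "anzo:hasNamedGraph"


def expand_from_dataset(dataset, triples):
    """Return the equivalent FROM and FROM NAMED clauses for `dataset`."""
    defaults = [o for s, p, o in triples if s == dataset and p == HAS_DEFAULT_GRAPH]
    named = [o for s, p, o in triples if s == dataset and p == HAS_NAMED_GRAPH]
    return [f"FROM <{g}>" for g in defaults] + [f"FROM NAMED <{g}>" for g in named]


# Contents of the <ds1> graph from the example, as (subject, predicate, object).
ds1_graph = [
    ("ds1", "rdf:type", "anzo:Dataset"),
    ("ds1", HAS_DEFAULT_GRAPH, "p1"),
    ("ds1", HAS_NAMED_GRAPH, "p1"),
    ("ds1", HAS_NAMED_GRAPH, "p2"),
]

clauses = expand_from_dataset("ds1", ds1_graph)
print("\n".join(clauses))
```

For the <ds1> graph shown earlier, this yields the same three clauses as the hand-written query: one FROM for the default graph and one FROM NAMED per named graph.
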
>> When we import TriG, we generally are just doing either a replace or
>> an add on the data in the named graphs in the TriG file. We generally
>> don't automatically create anzo:Dataset instances based on the contents of a
>> particular TriG file. Instead, if we were exporting and then importing
>> a dataset, we'd just include the <ds1> graph in our export so we'd
>> have it back again in an import in the future.
>>
>> Regarding your question (a), Sandro, you can always find the metadata
>> graph for a particular graph (including a dataset graph) simply by
>> querying for the anzo:hasMetadataGraph triple.
>>
>
> What if I put some anzo:hasMetadataGraph triples in my [other-vendor]
> SPARQL system, then told Anzo to incorporate that data into my corporate
> processing system.   That could really confuse the system, right?

It could. I think we block that and some other system-managed predicates 
at import. But really, unless it's malicious, there's no cause for 
someone to do that. (So we protect against the malicious case, and don't 
concern ourselves with the incidental case that is highly unlikely.)
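
For concreteness, here is a minimal Python sketch of that kind of import-time filtering. It is not Anzo's actual code: the function name, the (graph, subject, predicate, object) tuple layout, and the exact set of blocked predicates are assumptions for illustration. Only anzo:hasMetadataGraph is named above as blocked; the other entries are guesses at what "system-managed" might cover.

```python
# Assumed set of system-managed predicates; only anzo:hasMetadataGraph is
# confirmed by the discussion, the rest are illustrative guesses.
SYSTEM_MANAGED = frozenset({
    "anzo:hasMetadataGraph",
    "anzo:createdBy",
    "anzo:lastModifiedBy",
    "anzo:lastModifiedAt",
})


def filter_import(quads, blocked=SYSTEM_MANAGED):
    """Split incoming quads into (accepted, rejected) by predicate."""
    accepted, rejected = [], []
    for quad in quads:
        _graph, _subject, predicate, _obj = quad
        # Statements using a system-managed predicate are never imported.
        (rejected if predicate in blocked else accepted).append(quad)
    return accepted, rejected


incoming = [
    ("p1", "p1", "foaf:name", "Lee"),
    ("p1", "p1", "anzo:hasMetadataGraph", "evil"),
    ("p2", "p2", "foaf:name", "Lynn"),
]

accepted, rejected = filter_import(incoming)
print(len(accepted), "accepted;", len(rejected), "rejected")
```

Returning the rejected quads, rather than silently dropping them, would let an importer log or report what was blocked.
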

> In your commercial environment I guess that's not a big problem -- you
> can just say "well, don't do that!".    Or do you support the idea of
> from-the-wild data feeds, which are then filtered and queried?   What if
> some of those accidentally or maliciously had hasMetadataGraph triples
> in them?

I guess I already answered this -- we protect against the malicious case 
and the accidental case just... doesn't happen. It's the social benefits 
of naming things with URIs -- you can be pretty sure that if two people 
are using the same URI in good faith that they mean the same thing.

Lee

> I suppose you could block those on import, but that
> wouldn't work for other use cases, where you're trying to exchange
> datasets with metadata.


>
>> Anyway, for what it's worth.
>>
>
> It is nice to be grounded in reality.   Plus, Anzo is cool.
>
>        - s
>
>> Lee
>>
>> On 9/26/2012 8:53 AM, Sandro Hawke wrote:
>>> I'm surprised at some of the responses about the metadata questions
>>> in my "Dataset Syntax - checking for consensus" email [1].
>>>
>>> When people publish RDF for real, don't they usually put some triples
>>> in it which indicates who created it, when it was created, and maybe
>>> why?   Maybe some folks don't do this, but many people consider this
>>> an essential practice.   My sense is that every computer format
>>> either has a metadata mechanism built into it, or one somehow gets
>>> hacked in later (like the javadoc conventions). In a few cases (like
>>> the Adobe formats) that metadata is expressed in RDF.
>>>
>>> When people publish an RDF dataset, aren't they going to want to do
>>> the same thing?
>>>
>>> Yes, sometimes you can just throw that metadata into a named graph,
>>> but what if (a) you don't get a chance to tell the consumer which
>>> named graph you put it in, and (b) some named graphs are
>>> opaque/untrusted, perhaps because they contain old information or
>>> information from other sources (e.g. a Web Crawl).    (While these might
>>> not be the cases you work with, it seems to me they'll be quite
>>> common if this syntax ever catches on.)
>>>
>>> Folks who are not convinced we need a metadata mechanism -- how do
>>> you imagine solving this problem?  How can someone reading a
>>> serialized dataset figure out which triples are the metadata?
>>>
>>>       -- Sandro
>>>
>>>
>>>
>>> [1] http://lists.w3.org/Archives/Public/public-rdf-wg/2012Sep/0249.html
>>>
>>>
>>
>>
>
>
Received on Wednesday, 26 September 2012 17:54:53 UTC