- From: Sandro Hawke <sandro@w3.org>
- Date: Wed, 26 Sep 2012 14:05:22 -0400
- To: Lee Feigenbaum <lee@thefigtrees.net>
- CC: RDF WG <public-rdf-wg@w3.org>
On 09/26/2012 01:54 PM, Lee Feigenbaum wrote:
> On 9/26/2012 1:48 PM, Sandro Hawke wrote:
>> On 09/26/2012 12:50 PM, Lee Feigenbaum wrote:
>>> I'm not sure if this is at all helpful input, but here's how we handle
>>> metadata -- in general -- in Anzo. Pat, you may avert your eyes
>>> because the semantics are inconsistent at best.
>>>
>>
>> :-) Thanks for the details...
>>
>>> A couple of "regular" named graphs
>>>
>>> <p1> { <p1> a ex:Person ; foaf:name "Lee" ... }
>>> <p2> { <p2> a ex:Person ; foaf:name "Lynn" ... }
>>>
>>> Named graphs have corresponding "metadata" graphs
>>>
>>> <mdg1> { <mdg1> a anzo:MetadataGraph . <p1> a anzo:NamedGraph ;
>>> anzo:hasMetadataGraph <mdg1> ; anzo:createdBy ... ;
>>> anzo:lastModifiedBy ... ; anzo:lastModifiedAt ... ; ... }
>>> <mdg2> { <mdg2> a anzo:MetadataGraph . <p2> a anzo:NamedGraph ;
>>> anzo:hasMetadataGraph <mdg2> ; anzo:createdBy ... ;
>>> anzo:lastModifiedBy ... ; anzo:lastModifiedAt ... ; ... }
>>>
>>> We also have first-class datasets, that are represented roughly like:
>>>
>>> <ds1> { <ds1> a anzo:Dataset ; anzo:hasDefaultGraph <p1> ;
>>> anzo:hasNamedGraph <p1>, <p2> }
>>>
>>> Of course, <ds1> is also a regular named graph, so there's a
>>> corresponding metadata graph with metadata about the dataset:
>>>
>>> <mdg3> { <mdg3> a anzo:MetadataGraph . <ds1> a anzo:NamedGraph ;
>>> anzo:hasMetadataGraph <mdg3> ; anzo:createdBy ... ;
>>> anzo:lastModifiedBy ... ; anzo:lastModifiedAt ... ; ... }
>>>
>>> Among other things, we use these datasets directly within SPARQL by
>>> extending SPARQL with a FROM DATASET clause:
>>>
>>> SELECT ...
>>> FROM DATASET <ds1>
>>> WHERE { ... }
>>>
>>> ...which would be equivalent in this example to
>>>
>>> SELECT ...
>>> FROM <p1>
>>> FROM NAMED <p1>
>>> FROM NAMED <p2>
>>> WHERE { ... }
>>>
>>> When we import TriG, we generally are just doing either a replace or
>>> an add on the data in the named graphs in the TriG file. We generally
>>> don't automatically create anzo:Dataset's based on the contents of a
>>> particular TriG file. Instead, if we were exporting and then importing
>>> a dataset, we'd just include the <ds1> graph in our export so we'd
>>> have it back again in an import in the future.
>>>
>>> Regarding your question (a), Sandro, you can always find the metadata
>>> graph for a particular graph (including a dataset graph) simply by
>>> querying for the anzo:hasMetadataGraph triple.
>>>
>>
>> What if I put some anzo:hasMetadataGraph triples in my [other-vendor]
>> SPARQL system, then told Anzo to incorporate that data into my corporate
>> processing system. That could really confuse the system, right?
>
> It could. I think we block that and some other system-managed
> predicates at import. But really, unless it's malicious, there's no
> cause for someone to do that. (So we protect against the malicious
> case, and don't concern ourselves with the incidental case that is
> highly unlikely.)
>
>> In your commercial environment I guess that's not a big problem -- you
>> can just say "well, don't do that!". Or do you support the idea of
>> from-the-wild data feeds, which are then filtered and queried? What if
>> some of those accidentally or maliciously had hasMetadataGraph triples
>> in them?
>
> I guess I already answered this -- we protect against the malicious
> case and the accidental case just... doesn't happen. It's the social
> benefits of naming things with URIs -- you can be pretty sure that if
> two people are using the same URI in good faith that they mean the
> same thing.
>

Right, but...

> Lee
>
>> I suppose you could block those on import, but that
>> wouldn't work for other use cases, where you're trying to exchange
>> datasets with metadata.
>

For instance, exchanging the results of a web crawl. That might quite
reasonably, non-maliciously contain hasMetadataGraph triples inside the
graphs; meanwhile, the crawler needs to communicate metadata to the client.
This is where we need @meta or use-the-default-graph or something, yes?

    -- Sandro

>>
>>> Anyway, for what it's worth.
>>>
>>
>> It is nice to be grounded in reality. Plus, Anzo is cool.
>>
>> - s
>>
>>> Lee
>>>
>>> On 9/26/2012 8:53 AM, Sandro Hawke wrote:
>>>> I'm surprised at some of the responses about the metadata questions
>>>> in my "Dataset Syntax - checking for consensus" email [1].
>>>>
>>>> When people publish RDF for real, don't they usually put some triples
>>>> in it which indicates who created it, when it was created, and maybe
>>>> why? Maybe some folks don't do this, but many people consider this
>>>> an essential practice. My sense is that every computer format
>>>> either has a metadata mechanism built into it, or one somehow gets
>>>> hacked in later (like the javadoc conventions). In a few cases (like
>>>> the Adobe formats) that metadata is expressed in RDF.
>>>>
>>>> When people publish an RDF dataset, aren't they going to want to do
>>>> the same thing?
>>>>
>>>> Yes, sometimes you can just throw that metadata into a named graph,
>>>> but what if (a) you don't get a chance to tell the consumer which
>>>> named graph you put it in, and (b) some named graphs are
>>>> opaque/untrusted, perhaps because they contain old information or
>>>> information from other sources (eg a Web Crawl). (While these might
>>>> not be the cases you work with, it seems to me they'll be quite
>>>> common if this syntax ever catches on.)
>>>>
>>>> Folks who are not convinced we need a metadata mechanism -- how do
>>>> you imagine solving this problem? How can someone reading a
>>>> serialized dataset figure out which triples are the metadata?
>>>>
>>>> -- Sandro
>>>>
>>>> [1]
>>>> http://lists.w3.org/Archives/Public/public-rdf-wg/2012Sep/0249.html
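
For readers following the FROM DATASET example quoted above, here is a minimal sketch of the expansion Lee describes: a dataset's default and named graphs become FROM and FROM NAMED clauses. This is only an illustration, not Anzo's implementation; the function name expand_from_dataset and the dictionary layout used for <ds1> are hypothetical, invented for this example.

```python
def expand_from_dataset(default_graphs, named_graphs):
    """Build the FROM / FROM NAMED clauses for a dataset description.

    Hypothetical helper: it mirrors the equivalence stated in the thread,
    where each anzo:hasDefaultGraph value yields a FROM clause and each
    anzo:hasNamedGraph value yields a FROM NAMED clause.
    """
    clauses = [f"FROM <{graph}>" for graph in default_graphs]
    clauses += [f"FROM NAMED <{graph}>" for graph in named_graphs]
    return clauses


# Assumed in-memory form of the <ds1> graph from the thread:
# hasDefaultGraph <p1>; hasNamedGraph <p1>, <p2>.
ds1 = {"default": ["p1"], "named": ["p1", "p2"]}

clauses = expand_from_dataset(ds1["default"], ds1["named"])
print("\n".join(clauses))
```

The sketch only covers the clause rewriting; looking up the dataset graph and reading its anzo:hasDefaultGraph and anzo:hasNamedGraph triples is left out.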
Received on Wednesday, 26 September 2012 18:05:31 UTC