Re: dataset syntax metadata from Sandro Hawke on 2012-09-26 (public-rdf-wg@w3.org from September 2012)

From: Sandro Hawke <sandro@w3.org>
Date: Wed, 26 Sep 2012 10:37:25 -0400
To: Andy Seaborne <andy.seaborne@epimorphics.com>
CC: public-rdf-wg@w3.org
Message-ID: <50631325.3010407@w3.org>

On 09/26/2012 10:09 AM, Andy Seaborne wrote:
>
>
> On 26/09/12 13:53, Sandro Hawke wrote:
>> I'm surprised at some of the responses about the metadata questions in
>> my "Dataset Syntax - checking for consensus" email [1].
>>
>> When people publish RDF for real, don't they usually put some triples in
>> it which indicate who created it, when it was created, and maybe why?
>> Maybe some folks don't do this, but many people consider this an
>> essential practice. My sense is that every computer format either has
>> a metadata mechanism built into it, or one somehow gets hacked in later
>> (like the javadoc conventions).  In a few cases (like the Adobe formats)
>> that metadata is expressed in RDF.
>
> We have RDF -  it can already express metadata!
>
>> When people publish an RDF dataset, aren't they going to want to do the
>> same thing?
>
> Dunno - maybe they are just putting a collection of graphs on the web 
> and linking to it (e.g. N-Quads dumps).
>
> The "what it is" and "where it came from" is out-of-band e.g. on the 
> web page linking to the file.
>

My understanding is that in many situations, embedded metadata (in 
contrast to metadata that has to be maintained elsewhere) has proven its 
value enough to be considered an absolute requirement.


>> Yes, sometimes you can just throw that metadata into a named graph, but
>> what if (a) you don't get a chance to tell the consumer which named
>> graph you put it in, and (b) some named graphs are opaque/untrusted,
>> perhaps because they contain old information or information from other
>> sources (e.g. a Web crawl). (While these might not be the cases you work
>> with, it seems to me they'll be quite common if this syntax ever catches
>> on.)
>>
>> Folks who are not convinced we need a metadata mechanism -- how do you
>> imagine solving this problem?  How can someone reading a serialized
>> dataset figure out which triples are the metadata?
>
> Can't they look for it with a query?
>
> SELECT * { GRAPH ?g { :s rdf:type :metadataRecord } }
>

No, because (in case (b) above) there might be some obsolete or 
incorrect metadataRecords in some of the data being managed.
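
To make case (b) concrete, here is a minimal sketch in Python. It models a dataset as a plain mapping from graph name to a set of triples and mimics the query above. The graph names, the :created property and the dates are all made up for illustration; only :s, rdf:type and :metadataRecord come from the query.

```python
# Toy model of an RDF dataset: graph name -> set of (s, p, o) triples.
# Every graph name, property and date below is hypothetical.
RDF_TYPE = "rdf:type"
METADATA_RECORD = ":metadataRecord"

dataset = {
    # Graph the publisher intends as the current metadata.
    "http://example.org/g/current": {
        (":s", RDF_TYPE, METADATA_RECORD),
        (":s", ":created", "2012-09-26"),
    },
    # Graph copied in from an old crawl; it carries its own, stale record.
    "http://example.org/g/crawl-2009": {
        (":s", RDF_TYPE, METADATA_RECORD),
        (":s", ":created", "2009-01-15"),
    },
    # Ordinary data graph with no metadata record.
    "http://example.org/g/data": {
        (":a", ":knows", ":b"),
    },
}


def graphs_with_metadata_record(ds):
    """Roughly: SELECT * { GRAPH ?g { :s rdf:type :metadataRecord } }."""
    pattern = (":s", RDF_TYPE, METADATA_RECORD)
    return sorted(g for g, triples in ds.items() if pattern in triples)


matches = graphs_with_metadata_record(dataset)
print(matches)
```

The pattern matches both the current graph and the crawled one, and nothing in the result tells the consumer which of the two the publisher actually meant.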

> although the unnamed graph is a good place to put it IMO.
>
> Just don't invent a fixed name for the metagraph.
>

The Giant Global Graph?  :-)

I think you're saying not to use something like:

      <http://www.w3.org/ns/metagraph> { ... metadata here ... }


That hadn't even occurred to me, and I don't really like it.

I think it would be better than nothing, though -- it would at least 
address the use case of a client just given a dataset figuring out how 
the dataset was intended to be used. If the group does NOT provide a 
standard metadata mechanism, this might end up being the best option in 
the community, sadly, since at least it minimizes any kind of conflict 
or misunderstanding.
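
As a rough sketch of how a client might use such a fixed name (Python again, with the dataset modelled as a plain mapping from graph name to triples): the metagraph URI is the one from the example above and is not defined anywhere; the dcterms properties, the other graph name and the values are assumptions for illustration.

```python
# Hypothetical well-known graph name, taken from the example in this message.
METAGRAPH = "http://www.w3.org/ns/metagraph"

# Toy dataset: graph name -> set of (s, p, o) triples. All values are made up.
dataset = {
    METAGRAPH: {
        ("<>", "dcterms:creator", "Example Publisher"),
        ("<>", "dcterms:created", "2012-09-26"),
    },
    # Untrusted graph that also makes claims about who created the dataset.
    "http://example.org/g/crawl": {
        ("<>", "dcterms:creator", "Someone Else"),
    },
}


def dataset_metadata(ds):
    """Return only the triples in the fixed-name graph (empty set if absent)."""
    return ds.get(METAGRAPH, set())


meta = dataset_metadata(dataset)
creators = {o for (s, p, o) in meta if p == "dcterms:creator"}
print(creators)
```

The client never has to inspect the crawled graph, so its conflicting creator claim is simply ignored.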

       -- Sandro

>     Andy
>
>
>

Received on Wednesday, 26 September 2012 14:37:40 UTC