
Re: dataset semantics

From: Dan Brickley <danbri@danbri.org>
Date: Mon, 19 Dec 2011 10:04:54 +0100
Message-ID: <CAFNgM+b7-HLwut_J6PZok2VpPZ6cyqctZkqAL7ozuxgwBSJ9cA@mail.gmail.com>
To: Sandro Hawke <sandro@w3.org>
Cc: Pat Hayes <phayes@ihmc.us>, David Wood <david@3roundstones.com>, RDF WG <public-rdf-wg@w3.org>
On 19 December 2011 06:26, Sandro Hawke <sandro@w3.org> wrote:
>> > The other part of the truth conditions has to do with the relationship
>> > between the things named by the label URIs and the graphs they label.
>> >
>> > Unfortunately, I think we need to allow for several possible
>> > relationships there, MAYBE even in the same dataset, which makes things
>> > rather complicated.
>> Blech. Why do we NEED to do this?
> Well, Dan Brickley was arguing for this most strongly.  I wasn't
> convinced.   I was thinking we'd show how to do even this, then maybe
> simplify if it turns out not to be needed.    That may be a bad tactic,
> since it's much easier to add functionality later than to remove it.

Hmm, now I ought to remember that.  What I do remember is saying that
at least two idioms are very common and useful.

(you may have nice names for these)
1. I'm running a SPARQL store and I use a graph URI such as
<http://danbri.org/foaf.rdf> as the label for a graph whose contents
are the results of a recent fetch of that data.
2. I do the same, but I want to keep a tighter paper trail and
clearer logs, so I use a different transaction ID and graph each
time I 'ask the Web'; these might be UUID- or HTTP-based somehow, but
each lookup gets a different archived named graph, with its own URI.
3. I do (2.) but also record my
where-and-how-I-got-it-and-what-URI-I-gave-it in a systematic fashion,
in terms of vocabulary and also of where I find this 'table of
contents'.
My strongest opinion is that pattern (1.) is both common and also
commonly inadequate; different REST consultations very commonly give
different results. Relatedly, nobody has yet claimed that log:semantics
in N3 should be a functional property; in other words, different GET /
REST lookup results are naturally part of our world.

Now we could force everyone to manage things in style (2.). That would
work, but it would be hugely disruptive and there's a good chance a
lot of folk would ignore us and keep using the URI they fetched it
from as the graph URI that labels the stored graph.

Since both are useful and neither is likely to win, I can't see a path
other than allowing both idioms. Am I missing something?
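For concreteness, a minimal sketch of idioms (1.) and (2.), modelling a
store as a plain dict from graph-label IRIs to sets of triples; all
IRIs, triples, and function names here are invented for illustration:

```python
import uuid

# Idiom (1): the source URL doubles as the graph label, so each
# re-fetch of the same URL overwrites the previously stored graph.
def store_fetch_by_url(store, source_url, triples):
    store[source_url] = set(triples)  # latest fetch wins

# Idiom (2): every fetch is archived under a fresh UUID-based label,
# so earlier snapshots survive; the (source_url, label) pairing is
# recorded separately, as in pattern (3.).
def store_fetch_archived(store, log, source_url, triples):
    label = f"urn:uuid:{uuid.uuid4()}"
    store[label] = set(triples)
    log.append((source_url, label))
    return label

# Two fetches of the same source that returned different triples:
fetch_1 = {("ex:danbri", "foaf:name", '"Dan"')}
fetch_2 = {("ex:danbri", "foaf:name", '"Dan Brickley"')}

store1 = {}
store_fetch_by_url(store1, "http://danbri.org/foaf.rdf", fetch_1)
store_fetch_by_url(store1, "http://danbri.org/foaf.rdf", fetch_2)
# store1 now holds only the second result under the one label.

store2, log = {}, []
store_fetch_archived(store2, log, "http://danbri.org/foaf.rdf", fetch_1)
store_fetch_archived(store2, log, "http://danbri.org/foaf.rdf", fetch_2)
# store2 keeps both snapshots, each under its own URI.
```

The point is just that (1.) silently loses earlier fetch results, while
(2.) archives them at the cost of a separate record pairing labels with
sources.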


>> > One example of the relationship is what I called graphState in a
>> > different thread.  In that case, the dataset being true would imply that
>> > for each <U,G> in the dataset, the state of the resource U is the graph
>> > G.   (Here, I mean "state" and "resource" in exactly the REST sense.)
>> And that this graph is true? Ie, is the graph itself asserted when the dataset is asserted?
> (No, discussed in another email thread.)
>> > Another example is an out of date version of graphState, maybe call it
>> > graphStateWas.  In this case, the dataset being true would imply that
>> > for each <U,G> in the dataset, the state of the resource U is, or used
>> > to be, graph G.
>> Why would we need this? Surely when something is changed, it is no longer asserting what it did before the change. That is kind of the point of allowing change, seems to me.
> I might not have framed that quite right.  I was aiming for this:
> I fetch some resources, draw some conclusions, and want to be able to
> publish them along with references to my sources.  Since I know the
> sources can change, I want to make copies of all of them.  Then I'll
> refer to the original sources, indicating the time I accessed them, and
> pointing to the copy I'm maintaining of them, as I saw them at the time.
> (Or it could be a copy some shared service is maintaining, like
> archive.org.)
>> >
>> > Another example of the relationship is something I gather Cambridge
>> > Semantics uses, which I'll call subjectOf.   (In one of their deployment
>> > modes, triples are divided into two types, which I'll call A and B, based
>> > on which predicate they use.  The dataset is constructed such that for
>> > each <U, G> in the dataset, every type-A triple in G is of the form
>> > { <U> ?P ?O }.  The type-B triples are a little more complicated.)  In
>> > this case, the dataset being true would imply the dataset being
>> > segmented in this complicated but useful way.
>> With all respect to Cambridge Semantics, if they are the only user of this odd convention, then I really dont think we as a WG should even be considering standardizing it. Unless someone can make a case for why it is going to be generally useful.
>> And in any case, this sounds like a syntactic restriction rather than a semantic condition. Having the dataset be segmented is not going to alter the interpretations of any of the triples (is it?). So the semantics (and hence the entailments) can ignore this.
> Well, if we're putting this whole concept into the subjectOf predicate,
> isn't that considered part of the semantics of that predicate, rather
> like a range restriction?
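If I follow, that segmentation condition is mechanically checkable; a
rough sketch, where the type-A predicate set and the sample triples are
invented for illustration:

```python
# Rough sketch of the "subjectOf" segmentation: given some partition
# of predicates into type A and type B, every type-A triple in the
# graph labelled <U> must have <U> as its subject. The predicate
# partition and the sample graphs below are made up.
TYPE_A = {"ex:name", "ex:homepage"}

def satisfies_subject_of(label, graph):
    return all(s == label for (s, p, o) in graph if p in TYPE_A)

graph = {
    ("ex:alice", "ex:name", '"Alice"'),   # type A, subject matches label
    ("ex:alice", "ex:knows", "ex:bob"),   # type B, unconstrained here
}
ok = satisfies_subject_of("ex:alice", graph)

# Adding a type-A triple with the wrong subject breaks the condition:
bad_graph = graph | {("ex:bob", "ex:name", '"Bob"')}
bad = satisfies_subject_of("ex:alice", bad_graph)
```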
>> >
>> > It's *rather* tempting to just use triples for this, making graphState,
>> > graphStateWas, subjectOf, etc, be predicates.   That way the semantics
>> > of datasets would be much simpler, with the complications bundled into
>> > the semantics of those particular predicates.
>> >
>> > I guess I'm suggesting extending the definition of dataset to be a
>> > default graph and, rather than a set of pairs <U,G>, a set of triples
>> > <U, R, G>, where R is optional.  If R is omitted, you have the kind of
>> > dataset we're used to now, where we have no idea what that relation is
>> > supposed to be (unless the author tells us humans).
>> So I should interpret <U, R, G> to mean that the relation R holds between the resource U and the graph G, and U is *never* simply a name of the graph, is that right? That is, we never have the graph simply being the resource identified by the IRI?
> Well, if R is "=", then you do.   But you have to say that, explicitly.
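For concreteness, here's one way the <U, R, G> proposal might be
modelled; a hedged sketch in plain Python, with all relation names,
labels, and triples invented for illustration:

```python
from typing import NamedTuple, Optional, FrozenSet, Tuple

Triple = Tuple[str, str, str]

class Entry(NamedTuple):
    label: str                 # U: the label IRI
    relation: Optional[str]    # R: e.g. "graphState", "=", or None
    graph: FrozenSet[Triple]   # G: the labelled graph

entries = [
    # R = "graphState": the state of the resource U is the graph G.
    Entry("http://danbri.org/foaf.rdf", "graphState",
          frozenset({("ex:danbri", "foaf:name", '"Dan"')})),
    # R omitted: the classic <U, G> pair, relation left unspecified.
    Entry("urn:uuid:1234", None,
          frozenset({("ex:x", "ex:p", "ex:y")})),
    # R = "=": U simply names the graph itself.
    Entry("ex:g1", "=",
          frozenset({("ex:a", "ex:b", "ex:c")})),
]

def label_denotes_graph(e: Entry) -> bool:
    return e.relation == "="
```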
>> >
>> >> Can one assert a dataset (ie claim it to be true)?
>> >
>> > Yes.
>> >
>> >> How does one do that?
>> >
>> > The same way you do with RDF.  It kind of depends on your application.
>> > Maybe you publish it on the web; maybe you send it to some agent; maybe
>> > you publish it and send the URL somewhere, etc.
>> And is this in fact done? Do people transmit SPARQL datasets around the Web? What would be a typical transaction involving a dataset? When it is done, what typically happens to the RDF triples in the graphs in the dataset? Do other applications extract them and mash them up with other RDF? Or are they always kept in their dataset 'context'?
> I don't know if anyone is doing this, and I rather doubt anyone is doing
> it in a standardized manner.   Well, there's CKAN.   I haven't looked at
> what's dataset-like there.   (VoID uses the word "dataset" to mean
> g-box, so it may be a bit hard to tell.)  *shrug*
> I think this kind of work, as with RDF, may need to be done rather
> speculatively, because the benefits to adopting this technology *before*
> it's a standard are so slight.   People exchange datasets inside their
> company and with people they know, but without standards for what it
> means, why would one publish them on the open Web?
> But, yeah, honestly, I have no idea.    I do know a lot of people asked
> for a standard for "Named Graphs", and when I ask what they mean, I
> understand them to be saying they want to be able to, in their RDF,
> refer to other bits of RDF.
>   -- Sandro
>> Pat
>> >
>> >   -- Sandro
>> >
>> >
>> >
>> >
>> ------------------------------------------------------------
>> IHMC                                     (850)434 8903 or (650)494 3973
>> 40 South Alcaniz St.           (850)202 4416   office
>> Pensacola                            (850)202 4440   fax
>> FL 32502                              (850)291 0667   mobile
>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Monday, 19 December 2011 11:02:01 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 22:02:02 UTC