Re: Web Semantics of Datasets (v0.2)

On Mon, 2011-10-10 at 15:03 +0200, Ivan Herman wrote:
> Sandro,
> 
> I need clarifications on 'R'.
> 
> I presume 'R' is a time interval. 

Well, it's a set of points in time.   I don't think we need to constrain
it more than that.   (That is, we don't need to make it all the points
between a starting time and an ending time, even though that's how people will
often model it.)
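
To illustrate the difference, here is a rough Python sketch of R as a set
of time points that is not one contiguous interval.  The TimeSet class
and the utc() helper are hypothetical names invented for this example;
nothing here comes from an RDF spec, and a union of closed intervals is
only one possible way to model such a set.

```python
from datetime import datetime, timezone


class TimeSet:
    """A set of points in time, modelled as a union of closed intervals.

    Hypothetical helper for illustration only.
    """

    def __init__(self, *intervals):
        # Each interval is a (start, end) pair of timezone-aware datetimes.
        self.intervals = list(intervals)

    def __contains__(self, t):
        # t is in the set if it falls inside any one of the intervals.
        return any(start <= t <= end for start, end in self.intervals)


def utc(*args):
    """Build a timezone-aware UTC datetime."""
    return datetime(*args, tzinfo=timezone.utc)


# R need not be contiguous: here it is two separate stretches of 2011.
R = TimeSet(
    (utc(2011, 1, 1), utc(2011, 3, 31)),
    (utc(2011, 7, 1), utc(2011, 9, 30)),
)
```

With this R, a point in February 2011 is a member and a point in May 2011
is not, even though May lies between the earliest and latest points.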

> Does it mean that for any dataset to be valid, a time interval should be defined for it? I guess we can say that if there is no such 'R' as part of a dataset definition, it is considered to be... undefined? 'All Time'?

All RDF has this kind of implied R, at least when it talks about
real-world stuff, since there is time in the real world.

Right?   Any time you have an RDF graph that makes claims about the real
world, there's an implied or out-of-band set of points in time which the
author and reader understand it to apply to.  When Tim says:

  timbl:i foaf:name "Tim Berners-Lee".

we understand that he means that's his name at some points in time,
roughly "now".  We know it might have been different in the past, and
might be different in the future.   For most purposes, we don't really
care too much exactly when R starts and stops.

As Richard has pointed out, for some applications -- eg government
statistical data -- we do care, a lot.  data.gov.uk has been using the
dc:temporal property to convey R, I believe.  (Jeni told me this in a
meeting; I haven't had a chance to look into the details.  It's in-scope
for the Government Linked Data WG to make a Recommendation about this.)

So, pretty much all RDF in practice has an R, and usually it's not
declared and we muddle along.  Sometimes it really needs to be declared
for us to use the data well.   There's no standard way to do that yet.

Datasets, in this Web Semantics proposal, are no different, except that
I think it's important to be more clear about it, because the part of
the real world datasets are talking about is the Web, which machines
interact with directly.  They have a harder time with "roughly now".  So
I think we should be more clear about when, exactly, the dataset's named
g-boxes have the given contents.
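
As a rough illustration of that condition, the Python sketch below checks
whether every named g-box has the claimed contents at a handful of sample
times taken from R.  All names (dataset_holds, state_of, the ex: terms and
the example.org URL) are made up for this example.  It makes several
simplifying assumptions: a graph is a frozenset of triples compared by
plain equality (no blank-node handling), the truth of the default graph is
not checked, and only the sampled times are tested, not all of R.

```python
def dataset_holds(named_graphs, state_of, sample_times):
    """Check the <N,G> condition at the given sample times.

    named_graphs: dict mapping a name N to its claimed graph G
    state_of:     callable (N, T) -> the g-box's actual graph at time T
    sample_times: iterable of times T, assumed to be drawn from R
    """
    return all(
        state_of(name, t) == graph
        for t in sample_times
        for name, graph in named_graphs.items()
    )


# Two toy graphs, each a frozenset of (subject, predicate, object) triples.
G1 = frozenset({("ex:a", "ex:p", "ex:b")})
G2 = frozenset({("ex:a", "ex:p", "ex:c")})


def state_of(name, t):
    # Toy g-box whose contents change at t == 10 (times are plain numbers).
    return G1 if t < 10 else G2


# The dataset claims that the g-box has contents G1.
D = {"http://example.org/g": G1}
```

Here dataset_holds(D, state_of, [1, 5, 9]) is true, while including a time
after the change, such as 12, makes it false.  That is the sense in which
the same dataset can be true for one R and false for another.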

> How does this affect deployed datasets that may have G-s that vary in time already, but where there is no such time definition? Should we require SPARQL 1.1 to have a function that returns 'R' for a given dataset?

It would be good to provide ways for dataset providers to publish their
R, but I don't think we should require they do it.   Leaving it implied
is sometimes good enough.

> I wonder whether we can shy away from mentioning time altogether and accept the fact that <N,G> refers to a name for a G-box, ie, to something that can change over time, and our spec remains silent on this...

We can, but I think we would be doing the users a significant
disservice.  There is an observable connection between g-boxes with
dereferenceable names and their contents.  I think we need to make sure
people understand when that observable connection will line up with the
connection shown in datasets.
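
A minimal sketch of what that means for a client, in Python, assuming the
client already has the graph it got by dereferencing N at time t and some
way to test membership in R.  The function name, the result strings, and
the toy values are all hypothetical; graphs are again plain frozensets of
triples, so blank nodes are not handled.

```python
def check_observation(t, observed_graph, claimed_graph, in_R):
    """Compare a dereference made at time t with a dataset's claim.

    in_R is a callable returning True when t is a member of R.
    Returns "no-claim" when t is outside R, since the dataset says
    nothing about the g-box's state at that time.
    """
    if not in_R(t):
        return "no-claim"
    if observed_graph == claimed_graph:
        return "consistent"
    return "contradiction"


claimed = frozenset({("ex:a", "ex:p", "ex:b")})
changed = frozenset({("ex:a", "ex:p", "ex:c")})


def in_R(t):
    # Toy R: the numeric times 0 through 9.
    return 0 <= t <= 9
```

A different graph observed at time 12 is not evidence against the dataset,
because 12 is outside R; the same observation at time 5 would be.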

 -- Sandro

> Ivan
> 
> 
> On Oct 10, 2011, at 13:30 , Sandro Hawke wrote:
> 
> > Here's some revised wording for the proposal, getting a bit closer to
> > spec text.   It's still somewhat informal, and mixing normative and
> > non-normative bits, and best-practice.   And it's not as clear as it
> > should be about handling change over time.
> > 
> >    -- Sandro
> > ===
> >  A dataset D is true iff (1) its default graph is true and (2) for
> >  every pair of <N,G> in D, N names something (a "resource", sometimes
> >  called a "g-box") which, at every time T in R, has G as its current
> >  state.
> > 
> >  It follows from AWWW that if N is an IRI which can be dereferenced,
> >  a successful, correct dereference of N at any time T in R must yield
> >  a serialization ("representation") of G.
> > 
> >  In order to know whether a dereference occurs at a time in R, it is
> >  useful to have R declared in the default graph of D, or in another
> >  nearby, easy-to-find data source.  Where possible, it is helpful to
> >  have R be All Time; that is, having N name a resource whose state,
> >  by definition, never changes.
> > 
> >  In RDF data, N may be used (1) directly, to name the g-box,
> >  expressing things like the license that applies to its state, or who
> >  controls it; and (2) indirectly, to refer to G as the current state
> >  of the g-box.  Indirect reference can be used to express things
> >  about an RDF Graph (a "g-snap"), like that it was the graph some
> >  entity asserted at some time.  Indirection is done in the semantics
> >  of the predicates with which N is used.
> > 
> >  When N is used indirectly, the reference to G only holds inside time
> >  range R, of course.  Care must be taken not to use N as if it
> >  necessarily referred to G, outside of R.  Since R is defined to be
> >  the same for all elements of D, indirect reference is safe in the
> >  default graph.   
> > 
> > 
> > 
> > 
> 
> 
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> PGP Key: http://www.ivan-herman.net/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
> 
> 
> 
> 
> 

Received on Monday, 10 October 2011 14:27:16 UTC