Re: Web Semantics of Datasets (v0.2)

On Oct 10, 2011, at 9:27 AM, Sandro Hawke wrote:

> On Mon, 2011-10-10 at 15:03 +0200, Ivan Herman wrote:
>> Sandro,
>> 
>> I need clarifications on 'R'.
>> 
>> I presume 'R' is a time interval. 
> 
> Well, it's a set of points in time.   I don't think we need to constrain
> it more than that.   (That is, we don't need to make it all the points
> between a starting time and an ending time, even if that's how people will
> often model it.)

> 
>> Does it mean that for any dataset to be valid, a time interval should be defined for it? I guess we can say that if there is no such 'R' as part of a dataset definition, it is considered to be... undefined? 'All Time'?
> 
> All RDF has this kind of implied R, at least when it talks about
> real-world stuff, since there is time in the real world.

Oh Sandro, that is complete nonsense. You might as well say that RDF has quantum electrodynamics built into it. And anyway, having time in the 'world' being described, and having temporality built into the logic you are using to describe it, are two VERY different ideas, and the former most definitely does not entail the latter.

You are talking about 'time' as though it was something obvious and simple. It is not. There are literally DOZENS of alternative temporal ontologies of times, and we will presumably have to choose one. Just for one example, your casual choice, above, of allowing arbitrary point-sets rather than intervals means that you have a structure which has never, AFAIK, been given a proper algebra, and for which (again AFAIK) no temporal logic or axioms have ever been written.

> 
> Right?   Any time you have an RDF graph that makes claims about the real
> world, there's an implied or out-of-band set of points in time which the
> author and reader understand it to apply to.

No, completely and totally wrong. Or at any rate, if this is true, then the entire RDF standard so far has been wrong. 

>  When Tim says:
> 
>  timbl:i foaf:name "Tim Berners-Lee".
> 
> we understand that he means that's his name at some points in time,
> roughly "now".  

And when we read this triple, is it true at that "now" (which is likely to be very different from the "now" when Tim published it)? Which "now" are you talking about?

> We know it might have been different in the past, and
> might be different in the future.   For most purposes, we don't really
> care too much exactly when R starts and stops.
> 
> As Richard has pointed out, for some applications -- eg government
> statistical data -- we do care, a lot.  data.gov.uk has been using the
> dc:temporal property to convey R, I believe.  (Jeni told me this in a
> meeting; I haven't had a chance to look into the details.  It's in-scope
> for the Government Linked Data WG to make a Recommendation about this.)

Wrong. I don't care about the details, but if they are using ANY kind of property to record time, then they are working with the RDF semantics, and using a timeless logic to describe times and time relationships. That is not what you are advocating: it is not using a temporally located logic with a built-in notion of 'present' or 'now'.

Check out the distinction between 'valid time' and 'transaction time' in temporal RDBs.
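
As a rough illustration of that distinction (a sketch only: the Fact record, the ex: names and all the dates are made up for the example, and nothing here is taken from data.gov.uk or any real dataset), valid time says when a statement holds in the world being described, while transaction time says when the statement was recorded. Both are just ordinary data attached to a timeless assertion:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    """A hypothetical bitemporal record, not an RDF or SPARQL construct."""
    subject: str
    predicate: str
    value: str
    valid_from: date   # valid time: start of the period the fact describes
    valid_to: date     # valid time: end of that period (exclusive)
    recorded_on: date  # transaction time: when the fact entered the store


def holds_at(fact: Fact, when: date) -> bool:
    """True if 'when' falls inside the fact's valid-time period."""
    return fact.valid_from <= when < fact.valid_to


def known_at(fact: Fact, when: date) -> bool:
    """True if the fact had already been recorded by 'when'."""
    return fact.recorded_on <= when


def names_valid_at(facts: list[Fact], when: date) -> list[str]:
    """Values whose valid-time period covers 'when', whenever they were recorded."""
    return [f.value for f in facts if holds_at(f, when)]


# Both rows were recorded on the same day but describe different periods.
FACTS = [
    Fact("ex:alice", "ex:name", "Alice Smith",
         date(1990, 1, 1), date(2005, 6, 1), date(2011, 10, 10)),
    Fact("ex:alice", "ex:name", "Alice Jones",
         date(2005, 6, 1), date(9999, 12, 31), date(2011, 10, 10)),
]
```

Asking names_valid_at(FACTS, date(2000, 1, 1)) is a question about the described world; asking known_at(FACTS[0], date(2000, 1, 1)) is a question about the store. Neither needs a logic with a built-in 'now'.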

> 
> So, pretty much all RDF in practice has an R, and usually it's not
> declared and we muddle along.  Sometimes it really needs to be declared
> for us to use the data well.   There's no standard way to do that yet.
> 
> Datasets, in this Web Semantics proposal, are no different, except that
> I think it's important to be more clear about it, because the part of
> the real world datasets are talking about is the Web, which machines
> interact with directly.  They have a harder time with "roughly now".  So
> I think we should be more clear about when, exactly, the datasets named
> g-boxes have the given contents. 
> 
>> How does this affect deployed datasets that may have G-s that vary in time already, but where there is no such time definition? Should we require SPARQL 1.1 to have a function that returns 'R' for a given dataset?
> 
> It would be good to provide ways for dataset providers to publish their
> R, but I don't think we should require they do it.   Leaving it implied
> is sometimes good enough.
> 
>> I wonder whether we can shy away from mentioning time altogether and accept the fact that <N,G> refers to a name for a G-box, ie, to something that can change over time, and our spec remain silent on this...
> 
> We can, but I think we would be doing the users a significant
> disservice.  There is an observable connection between g-boxes with
> dereferenceable names and their contents.  I think we need to make sure
> people understand when that observable connection will line up with the
> connection shown in datasets.

Seriously, as I cannot be at the F2F: if the WG decides to incorporate time into the RDF semantic model in the way being suggested here, then I will quit the WG and someone else can edit the revised semantics document. This is a research project and I would not undertake it without substantial funding for at least a year's full-time effort. Three years would be more realistic.

Pat

> 
> -- Sandro
> 
>> Ivan
>> 
>> 
>> On Oct 10, 2011, at 13:30 , Sandro Hawke wrote:
>> 
>>> Here's some revised wording for the proposal, getting a bit closer to
>>> spec text.   It's still somewhat informal, and mixing normative and
>>> non-normative bits, and best-practice.   And it's not as clear as it
>>> should be about handling change over time.
>>> 
>>>   -- Sandro
>>> ===
>>> A dataset D is true iff (1) its default graph is true and (2) for
>>> every pair of <N,G> in D, N names something (a "resource", sometimes
>>> called a "g-box") which, at every time T in R, has G as its current
>>> state.
>>> 
>>> It follows from AWWW that if N is an IRI which can be dereferenced,
>>> a successful, correct dereference of N at any time T in R must yield
>>> a serialization ("representation") of G.
>>> 
>>> In order to know whether a dereference occurs at a time in R, it is
>>> useful to have R declared in the default graph of D, or in another
>>> nearby, easy-to-find data source.  Where possible, it is helpful to
>>> have R be All Time; that is, having N name a resource whose state,
>>> by definition, never changes.
>>> 
>>> In RDF data, N may be used (1) directly, to name the g-box,
>>> expressing things like the license that applies to its state, or who
>>> controls it; and (2) indirectly, to refer to G as the current state
>>> of the g-box.  Indirect reference can be used to express things
>>> about an RDF Graph (a "g-snap"), like that it was the graph some
>>> entity asserted at some time.  Indirection is done in the semantics
>>> of the predicates with which N is used.
>>> 
>>> When N is used indirectly, the reference to G only holds inside time
>>> range R, of course.  Care must be taken not to use N as if it
>>> necessarily referred to G, outside of R.  Since R is defined to be
>>> the same for all elements of D, indirect reference is safe in the
>>> default graph.   
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> ----
>> Ivan Herman, W3C Semantic Web Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>> FOAF: http://www.ivan-herman.net/foaf.rdf
>> 
>> 
>> 
>> 
>> 
> 
> 
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes

Received on Monday, 10 October 2011 21:31:43 UTC