Re: ISSUE-29: Proposal to resolve DOM origin generalization from Ivan Herman on 2010-08-12 (public-rdfa-wg@w3.org from August 2010)

From: Ivan Herman <ivan@w3.org>
Date: Thu, 12 Aug 2010 13:22:28 +0200
To: Manu Sporny <msporny@digitalbazaar.com>
Cc: RDFa WG <public-rdfa-wg@w3.org>
Message-Id: <E07831AC-A8C4-4C29-9779-FCF0AC31041D@w3.org>

Manu,

exactly as you state, the issue is on the 'lower' level interfaces of IRI and *Literal. (As an aside, the current interface does not have an origin for a blank node; why?) As long as we maintain the approach (which I think we should!) that, given an implementation of those basic interfaces and of RDFTriple, the remaining interfaces (e.g., anything you describe on property groups) can be implemented on top of them, we should first agree on what can and should be done at that level before we run ahead and look at the details of property groups. The way I understand it, property groups are immensely useful macro-like operations in this respect. So let me concentrate on the basics only.

You say, below:

[[[
> Both Ivan and Nathan raised concerns that comparing triples would cause
> confusion because of the .info property on each subject, predicate,
> object in a triple. Toby mentioned that adding the .info property to the
> triple would solve that issue. Ivan and Nathan stated that it would
> still make triple-to-triple comparison difficult.
]]]

This is not exactly what I said. In fact, I agreed with Toby on this. The way I envisaged things is to say

- there is no origin/source/info (or whatever) attribute on IRI, *Literal or Bnode
- each triple is, essentially, a quad; the fourth entry, let us call it 'info' for the time being, is a structure that contains (or may contain?) information on the origins of the subject, the predicate, and the object

When we get to triples (ie, with the filter operation) we should have the choice of:

- get triples only, in which case two triples that differ only on the info will be returned only once because they are conceptually merged, and the fourth element in the returned structure is Null
- get quads in which case, well, if the fourth elements are different for two triples/quads then the two quads will be returned independently

(Probably the clean approach is to have two operations, i.e., filter and filter_quad or something similar.)

Of course, when serializing the graph the fourth element will disappear, ie, the triples will be queried along the first alternative.

Ie, we do _not_ have to define comparison of triples; we have to define how we _get_ to triples, ie, whether we get them as triples or quads.
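
A minimal sketch of the two operations, in Python since RDFLib is the analogy. The names Store, Quad, and filter_quad, and the shape of the info structure, are all assumptions for illustration and not anything in the current draft:

```python
from dataclasses import dataclass
from typing import Any, List, Optional

Term = str  # stand-in for IRI, *Literal, or BNode; carries no info itself


@dataclass(frozen=True)
class Quad:
    subject: Term
    predicate: Term
    object: Term
    info: Optional[Any] = None  # hypothetical: origins of s, p, and o


class Store:
    def __init__(self) -> None:
        self._quads: List[Quad] = []

    def add(self, s: Term, p: Term, o: Term, info: Any = None) -> None:
        self._quads.append(Quad(s, p, o, info))

    @staticmethod
    def _matches(q: Quad, s, p, o) -> bool:
        # None acts as a wildcard for that position.
        pairs = ((s, q.subject), (p, q.predicate), (o, q.object))
        return all(want is None or want == got for want, got in pairs)

    def filter(self, s=None, p=None, o=None) -> List[Quad]:
        """Triples only: quads differing only in info are merged,
        and the fourth element of each result is None."""
        merged = {}
        for q in self._quads:
            if self._matches(q, s, p, o):
                key = (q.subject, q.predicate, q.object)
                merged.setdefault(key, Quad(*key, None))
        return list(merged.values())

    def filter_quad(self, s=None, p=None, o=None) -> List[Quad]:
        """Quads: each distinct info yields its own result."""
        out: List[Quad] = []
        for q in self._quads:
            if self._matches(q, s, p, o) and q not in out:
                out.append(q)
        return out


store = Store()
store.add("_:a", "foaf:name", "Ivan", info={"object": "span#1"})
store.add("_:a", "foaf:name", "Ivan", info={"object": "span#2"})
triples = store.filter(p="foaf:name")
quads = store.filter_quad(p="foaf:name")
```

Here filter returns one merged triple whose info is None, while filter_quad returns the two quads separately. No comparison operator on triples is needed beyond plain equality of the first three elements.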

How this translates to the higher level operations of property groups is a secondary issue in this sense. It becomes a question of convenience.

As I said before: if we push the idea of an info down to IRIs, Literals, or BNodes, _then_ we have a problem if we want to use any existing system out there. At least that is my experience so far... And even if I found a way around that, using quads is a fairly 'standard' way of handling triple stores (the two filters have their direct analogy in RDFLib).

(Truth be told: a system like RDFLib uses the fourth element for a URI, really, so I am pushing things a bit by using it for a separate data structure. Oh well...)

Ivan


On Aug 12, 2010, at 05:14 , Manu Sporny wrote:

> On 08/09/2010 12:17 AM, Manu Sporny wrote:
>> 3) ISSUE-29: DOM origin generalization (on Manu)
>>   http://www.w3.org/2010/02/rdfa/track/issues/29
>> 
>> We need to re-think how .origin is exposed via the RDFa API as well
>> as what it means for a triple to have an origin. .origin also doesn't
>> make sense if the RDFa API is not implemented in a DOM environment.
> 
> In a previous spec, the .origin property was specified for the subject,
> predicate, and object for a triple. This had three problems:
> 
> 1. .origin didn't make sense in a non-DOM environment.
> 2. Just specifying the .origin wasn't very flexible when it came to
>   carrying other data that may be important to developers.
> 3. The name "origin" could be confused with the Origin property in
>   HTTP.
> 
> This resulted in several changes to the editors draft spec:
> 
> 1. .origin was renamed to .source and placed in a dictionary called
>   .info for each subject, predicate and object in a triple.
> 2. .source was optional, so non-DOM environments would still be able
>   to be compliant with the RDFa API.
> 3. .info could be used to carry any arbitrary developer information.
> 4. .source would not be easily confused with other DOM/HTTP concepts.
> 
> Both Ivan and Nathan raised concerns that comparing triples would cause
> confusion because of the .info property on each subject, predicate,
> object in a triple. Toby mentioned that adding the .info property to the
> triple would solve that issue. Ivan and Nathan stated that it would
> still make triple-to-triple comparison difficult.
> 
> I had mentioned that we can state that default comparisons should ignore
> the .info property altogether unless the developer specifically overrode
> the comparison operator to take the .info property into account.
> 
> Mark suggested removing the .info property from subjects, predicates,
> objects and triples and migrating it to the Property Group object. This
> way, comparison wouldn't be affected and the developer could specify
> whether or not they wanted to retrieve the DOM node associated with a
> particular part of a Property Group. I forgot that we had already done
> this, but hadn't explained how the .info gets populated and what type of
> information it can store. However, we may want to remove .info entirely
> and replace it with a method and leave the implementation of how to
> track .info up to developers. More on this below...
> 
> Mark also mentioned modifying the Data Query interface such that a
> developer could specify whether or not they wanted to include DOM
> information. This is the current select() interface on DataQuery:
> 
>   Sequence<PropertyGroup> select (in Object? query,
>                                   in optional Object template);
> 
> We would change that interface to this:
> 
>   Sequence<PropertyGroup> select (in Object? query,
>                                   in optional Object template,
>                                   in optional array options);
> 
> 'options' could be an array of string options that should be used to
> build the resulting Property Groups when performing a query.
> 
> So, a query might look like the following:
> 
> pgs = query.select(..., ..., ["source",]);
> 
> This would return an array of property groups that include the source
> information for the PG and all properties of the PG. We would have to
> extend the PropertyGroup interface by adding one of the two following
> methods:
> 
>    Sequence<any> source (in string predicate);
> 
> OR
> 
>    Sequence<any> info (in optional string predicate,
>                        in optional string name);
> 
> I'm leaning toward the latter because it allows the developer more
> flexibility in having many more informational items associated with a
> PropertyGroup. For example, with the latter all of these are possible:
> 
> // get the Property Group's subject declaration elements:
> subjectElements = pg.info(null, "source");
> 
> // get the object declaration elements for all "foaf:name" properties
> objectElements = pg.info("foaf:name", "source");
> 
> One drawback to this approach is that you don't know where the
> predicates' sources are, but I couldn't think of a use case where you'd
> care, as most predicates are found in @rel/@property/@typeof attributes
> and I couldn't think of a case where you'd want to retrieve those.
> 
> I guess we could do something like this:
> 
> subjectElements = pg.info("foaf:name", "subjectSource");
> predicateElements = pg.info("foaf:name", "predicateSource");
> objectElements = pg.info("foaf:name", "objectSource");
> 
> A non-DOM-based RDFa API would return an empty array for each of these
> queries, which would be fine.
> 
> Now, as far as implementations go, I'd expect that each subject,
> predicate, object of a triple would store the source element (which is
> exactly what the current draft text states). So, Ivan, Nathan and Toby's
> concerns are still there because developers may not know that we intend
> that comparisons should be done with only the raw triple data, excluding
> source. So we will have to state something about doing comparisons
> between core RDF types.
> 
> This is all preliminary of course and is open to debate and general
> discussion. I haven't spent a great deal of time thinking through every
> detail of this proposal, so there may be holes in it or someone may have
> a different approach that we haven't heard about yet. Thoughts?
> 
> -- manu
> 
> -- 
> Manu Sporny (skype: msporny, twitter: manusporny)
> President/CEO - Digital Bazaar, Inc.
> blog: WebID - Universal Login for the Web
> http://blog.digitalbazaar.com/2010/08/07/webid/2/
> 


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Thursday, 12 August 2010 11:21:14 UTC