Re: establishing conversational context (was: Creating non-Atom LDPRs: AtomStrict & AtomRelax) from Henry Story on 2013-02-04 (public-ldp-wg@w3.org from February 2013)

From: Henry Story <henry.story@bblfish.net>
Date: Mon, 4 Feb 2013 11:37:18 +0100
To: "Wilde, Erik" <Erik.Wilde@emc.com>
Cc: public-ldp-wg@w3.org, Tim Berners-Lee <timbl@w3.org>
Message-Id: <AE64338C-29DE-4988-AE97-ACED130DD0CC@bblfish.net>
On 4 Feb 2013, at 08:57, "Wilde, Erik" <Erik.Wilde@emc.com> wrote:

> hello henry.
> 
> i am just responding briefly because i have work to do today. i like a lot
> where you're taking this, but i'd like to respond in more detail to some
> of the things you've been writing. most importantly, i'd like to stress
> again that i always meant my comparisons with atom and atompub as looking
> at design patterns. so we should try to learn from how they work (and
> where they may be awkward), but of course we should use RDF's different
> capabilities and metamodel.

I am cc'ing Tim Berners-Lee because, as you will see, this may be of interest to
him. Given that he is very busy, we'll both have time to get some work done
today while waiting to see if he has something to add :-) This will also
allow others to catch up.

As you explain below, the Atom protocol essentially makes a distinction 
between application/atom+xml content POSTed to a container and everything
else. 

The questions that need answering are then:

Q1: what is the role of the Atom metadata <entry> wrapper ? 
 ( I illustrated this role in the previous mail 
   http://lists.w3.org/Archives/Public/public-ldp-wg/2013Feb/0021.html )
Q2: Do we need this semantic equivalent of the <entry> or <feed> metadata wrapper ?
Q3: What are the different ways of getting the same effect?

To move quickly, let me answer all three questions at once
with an example that also shows how, in RDF, we can avoid the escaped
markup anti-pattern shown in the Turtle here:

> 
> On 2013-02-03 14:54 , "Henry Story" <henry.story@bblfish.net> wrote:
>> <> a :Entry;
>>  :title "My foaf profile";
>>  :updated "2013-02-02T16:34:06Z"^^xsd:dateTime;
>>  :author [ :name "Jack Daniels" ];
>>  :content """
>>    <> a foaf:PersonalProfileDocument;
>>      foaf:primaryTopic <#i> .
>> 
>> <#i> a foaf:Person;
>>    foaf:name "Jack Daniels";
>>    foaf:interest <http://en.wikipedia.org/wiki/Whiskey>;
>>    foaf:knows <http://www.spiritofspeyside.com/#j>,
>>               <http://www.bunnahabhain.com/#jim> .
>> 
>> [] a <http://dbpedia.org/resource/Whisky> ;
>>    likes <#i>, <http://www.bunnahabhain.com/#jim> .
>> """^^^xxx:Turtle .
>> Here we only have 1 syntax but we still have the data and the
>> metadata as two seperate contents.
> 
> thanks for spelling this out. you make good points why we don't want this,
> and i think if there's one thing that's a very well-established
> anti-pattern in protocol design, it's escaped markup. it's terrible to
> handle, brittle, processing model hell, and even worse in multi-syntax
> environments such as RDF.

So let me rewrite this in N3
( first version: http://infomesh.net/2002/notation3/
  latest version: http://www.w3.org/TeamSubmission/n3/ ),
which permits us to speak about data, i.e. it contains syntax for
graphs (as SPARQL does).

POSTing an Atom entry in N3 would be done elegantly 
like this:

----------------------------------
POST /account/ HTTP/1.1
Content-Type: text/n3
...

@prefix log: <http://www.w3.org/2000/10/swap/log#> .
@prefix : <http://bblfish.net/work/atom-owl/2006-06-06/#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<> a :Entry;
  :title "My foaf profile";
  :updated "2013-02-02T16:34:06Z"^^xsd:dateTime;
  :author [ :name "Jack Daniels" ];
  log:semantics {
   <> a foaf:PersonalProfileDocument;
       foaf:primaryTopic [ = <#i> ;
         a foaf:Person;
         foaf:name "Jack Daniels";
         foaf:interest <http://en.wikipedia.org/wiki/Whiskey>;
         foaf:knows <http://www.spiritofspeyside.com/#j>,
                    <http://www.bunnahabhain.com/#jim> 
       ] .

  [] a <http://dbpedia.org/resource/Whisky> ;
     foaf:likes <#i>, <http://www.bunnahabhain.com/#jim> .
  } .
---------------------------------------

The above parses correctly in cwm, though I am not sure
exactly what the <> inside the braces refers to;
I think the same as the external <>.

The above seems like the right way to specify the semantics
of Atom. We have replaced the :content relation with the
log:semantics relation, and we have thereby removed the escaped
markup anti-pattern.

( Whether the log:semantics is the right relation is something
for further discussion. )

We have now made the distinction between data and metadata
explicit (whilst also moving into the space of modal logics).
What is inside the { } is the data that is being spoken about
by the relations outside the { }. This is the same distinction we
know in everyday language between

S1: Luke, I am your father.
S2: D. Vader told Luke at 234:22 Stellar Time near a large hole that he was Luke's father .

You can commit to S2 whilst suspending judgement on S1 (whereas it is not
possible to commit to S2 without also agreeing that D. Vader can speak).
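To make the S1/S2 distinction concrete, here is a toy Python model. It is only an illustration, not cwm's API: triples are plain tuples of strings, and the hypothetical Formula class stands in for an N3 { } graph. The triples inside a Formula are mentioned by the entry but not asserted by it, just as S2 reports S1 without endorsing it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Formula:
    """Hypothetical stand-in for an N3 quoted graph { ... }: its triples are mentioned, not asserted."""
    triples: frozenset

def commits_to(graph: set, triple: tuple) -> bool:
    # A reader of `graph` commits only to its top-level (asserted) triples.
    return triple in graph

def mentions(graph: set, triple: tuple) -> bool:
    # True if the triple is asserted, or appears inside a quoted graph used as an object.
    if triple in graph:
        return True
    return any(isinstance(o, Formula) and mentions(set(o.triples), triple)
               for (_s, _p, o) in graph)

# The entry metadata is asserted; the profile data is only quoted via log:semantics.
profile = Formula(frozenset({
    ("<>", "rdf:type", "foaf:PersonalProfileDocument"),
    ("<#i>", "foaf:name", '"Jack Daniels"'),
}))
entry = {
    ("<>", "rdf:type", ":Entry"),
    ("<>", ":title", '"My foaf profile"'),
    ("<>", "log:semantics", profile),
}

print(commits_to(entry, ("<#i>", "foaf:name", '"Jack Daniels"')))
print(mentions(entry, ("<#i>", "foaf:name", '"Jack Daniels"')))
```

A server storing such an entry would add only the top-level triples to what it asserts, and keep the quoted graph as the content being described.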

This then also meshes nicely with the mime type problem you mention below,
since N3's mime type is not "text/turtle" but "text/n3".

It is just that your claim now amounts to saying that we need N3 to model
Atom properly. If that is true, I think Tim Berners-Lee will be very happy
to know about it, because he can then start a working group to finish
specifying N3 (now that Turtle is pretty much finished).

Todo:
---- 

You need to help us understand which use cases we cannot solve without
the Atom metadata wrapper, semanticised with the equivalent N3 notation,
or at least which ones we cannot solve elegantly (Turtle allows us to
wrap the content in a string, but we then lose the relative URI notation
that is so useful).

To narrow down this task, we need to see what, if anything, we can do without.
So you need to show us what we cannot do by putting metadata
in the content, or by putting it in the HTTP headers.

These two options are described below:

Provisional Solution 1: Metadata in content
-------------------------------------------

It is already perfectly possible, for example, to POST graphs that say
things about themselves. We could move the metadata into the data,
like this:

{
 <> a foaf:PersonalProfileDocument, :Entry;
    :title "My foaf profile";
    :updated "2013-02-02T16:34:06Z"^^xsd:dateTime;
    :author <#i>;
    foaf:primaryTopic <#i> .

 <#i> a foaf:Person;
     foaf:name "Jack Daniels";
     foaf:interest <http://en.wikipedia.org/wiki/Whiskey>;
     foaf:knows <http://www.spiritofspeyside.com/#j>,
                <http://www.bunnahabhain.com/#jim> .

  [] a <http://dbpedia.org/resource/Whisky> ;
     foaf:likes <#i>, <http://www.bunnahabhain.com/#jim> .
}

This is relevant to ISSUE-11 "Do we need to define server-managed properties 
or do we leave them to applications?" http://www.w3.org/2012/ldp/track/issues/11

For such a POST we could have a rule such that 

{ ?g log:includes { <> :title ?obj }   } => { <> :title ?obj } 
{ ?g log:includes { <> :author ?obj }   } => { <> :author ?obj }  

and so on for a limited list of properties. I.e. when the server
receives the POST just above, it can create an entry with the
metadata:

 <> rdfs:member [ = <jack> ;
                  :title "My Foaf Profile";
                  :author <jack#i> ] .
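A rough Python sketch of the effect these two rules would have on the Solution 1 example, assuming triples are tuples of strings and that the "limited list of properties" is the hypothetical LIFTED set. It is not a rule engine, only the outcome: matching triples about <> are copied, and relative URIs are rebased onto the new member (here "jack"), giving the <jack> metadata above.

```python
# Hypothetical list of properties the server lifts from the POSTed graph into the
# entry metadata ("and so on for a limited list of properties").
LIFTED = {":title", ":author", ":updated"}

def rebase(term: str, member: str) -> str:
    # Resolve the relative references "<>" and "<#frag>" against the new member's URI.
    if term == "<>":
        return f"<{member}>"
    if term.startswith("<#"):
        return f"<{member}{term[1:-1]}>"
    return term

def entry_metadata(posted: set, member: str) -> set:
    # Effect of { ?g log:includes { <> :p ?o } } => { <> :p ?o } for every :p in LIFTED,
    # with <> and <#...> rewritten so they name the newly created member.
    return {(rebase(s, member), p, rebase(o, member))
            for (s, p, o) in posted
            if s == "<>" and p in LIFTED}

posted = {
    ("<>", "rdf:type", ":Entry"),
    ("<>", ":title", '"My foaf profile"'),
    ("<>", ":author", "<#i>"),
    ("<>", "foaf:primaryTopic", "<#i>"),
    ("<#i>", "foaf:name", '"Jack Daniels"'),
}
for triple in sorted(entry_metadata(posted, "jack")):
    print(triple)
```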


One advantage of a rule like this is that we could get such metadata
out of all kinds of formats, such as HTML, JPEGs, etc.
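For HTML, the counterpart of the :title rule could be as simple as reading the <title> element. A minimal sketch using Python's standard html.parser; the function name and the mapping of <title> to :title are assumptions for illustration, and the literal is quoted naively, without escaping.

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collects the text of the first <title> element."""
    def __init__(self):
        super().__init__()
        self.title = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def html_entry_metadata(html: str) -> set:
    # Hypothetical mapping: HTML <title> -> :title of the posted document <>.
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    if parser.title is None:
        return set()
    return {("<>", ":title", f'"{parser.title.strip()}"')}

print(html_entry_metadata("<html><head><title>My foaf profile</title></head></html>"))
```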

Provisional Solution 2: Metadata in Headers
--------------------------------------------

The other solution is to move the metadata into the headers,
using the Web Linking RFC5988
http://tools.ietf.org/html/rfc5988

HTTP, after all, comes with a method to distinguish data and
metadata: headers and content. This is not super-elegant,
because the HTTP syntax is pretty awkward and its semantics
vague, but as far as the semantics goes that is not that
different from Atom, and the advantage is that it is widely
deployed.

So the solution proposed here ( which would be one way of
answering ISSUE-15: "sharing binary resources and metadata" )
would be to send the following:

----------------------------------
POST /account/ HTTP/1.1
Content-Type: text/turtle
Link: <#i>; rel="author"
Link: <>; title="My foaf Profile"
...

@prefix log: <http://www.w3.org/2000/10/swap/log#> .
@prefix : <http://bblfish.net/work/atom-owl/2006-06-06/#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<> a foaf:PersonalProfileDocument;
   foaf:primaryTopic <#i> .

<#i> a foaf:Person;
    foaf:name "Jack Daniels";
    foaf:interest <http://en.wikipedia.org/wiki/Whiskey>;
    foaf:knows <http://www.spiritofspeyside.com/#j>,
               <http://www.bunnahabhain.com/#jim> .

[] a <http://dbpedia.org/resource/Whisky> ;
   foaf:likes <#i>, <http://www.bunnahabhain.com/#jim> .
---------------------------------------
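A minimal Python sketch of how a server might turn those Link headers into metadata about the new member. Two things here are assumptions: the parser is deliberately simplified (one link-value per header line, single-token rel values, no commas or escaped quotes in values), and the links are taken to describe the member being created, whereas RFC 5988's default context is the requested resource, i.e. the /account/ container, unless an "anchor" parameter says otherwise. The example.org host is made up.

```python
import re
from urllib.parse import urljoin

# Simplifications (assumptions): one link-value per header line, single-token rel values,
# and no commas or escaped quotes inside parameter values.
LINK_RE = re.compile(r'^\s*<([^>]*)>(.*)$')
PARAM_RE = re.compile(r';\s*([\w*.-]+)\s*=\s*(?:"([^"]*)"|([^;\s]+))')

def parse_link(value: str):
    m = LINK_RE.match(value)
    if m is None:
        raise ValueError(f"not a link-value: {value!r}")
    target, rest = m.groups()
    params = {name.lower(): quoted or token
              for name, quoted, token in PARAM_RE.findall(rest)}
    return target, params

def link_metadata(headers, member: str) -> set:
    # Assumption: the links describe the member being created, not the container.
    triples = set()
    for value in headers:
        target, params = parse_link(value)
        iri = urljoin(member, target)  # "<>" -> member, "<#i>" -> member#i
        if "rel" in params:
            triples.add((member, params["rel"], iri))
        if "title" in params:
            triples.add((iri, "title", params["title"]))
    return triples

headers = ['<#i>; rel="author"', '<>; title="My foaf Profile"']
for triple in sorted(link_metadata(headers, "http://example.org/account/jack")):
    print(triple)
```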

Now the ball is in your court. What is it that we cannot do using
either Provisional Solution 1 or 2? This is not a trick question,
but a serious one that could help Tim argue for something we
have long been waiting for: namely N3 (which I think should
be called N4) as a W3C standard.

> 
> however, why did we end up with this unfortunate example? because in this
> case, only escaping, as you mention, allows us to separate protocol data
> and content. we can avoid getting in this mess by setting conversational
> context via media types, ideally (sorry for repeating this) by actually
> using a protocol media type, or by using profiles for augmenting the
> generic RDF media type identifiers.
> 
> why am i repeating this: it's the problem we started with in many
> different places quite a while ago: how can i distinguish between just
> putting LDP triples in an entry's content, or adding LDP data. in atom,
> this all works perfectly just because of media types. for example, if you
> have a collection that exposes application/atom+xml and application/xml,
> then this is exactly and all you need. you can do these two things:
> 
> POST collection
> Content-Type: application/atom+xml
> <entry> .... </entry>
> 
> POST collection
> Content-Type: application/xml
> <entry> .... </entry>

So the answer to this has to be that, in the LDP case,
we should distinguish by using Turtle for the second
case and N3 for the first. Only N3 gives us the
expressive power to represent Atom's semantics correctly.
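Here is a sketch of that dispatch in Python, combined with the two-URI behaviour you describe below (collection/member/42 and collection/member/42/content). The Container class, the URI layout and the ":src" property are assumptions for illustration, not anything LDP has specified: text/n3 is read as protocol (the body is the entry), text/turtle as opaque payload for which the server writes its own entry, and anything else is refused.

```python
from dataclasses import dataclass, field
from itertools import count

@dataclass
class Container:
    """Hypothetical LDP container that takes the conversational context from Content-Type."""
    base: str
    resources: dict = field(default_factory=dict)  # URI -> (media type, body)
    ids: count = field(default_factory=lambda: count(1))

    def post(self, content_type: str, body: str) -> str:
        media_type = content_type.split(";")[0].strip().lower()
        if media_type not in ("text/n3", "text/turtle"):
            raise ValueError(f"415 Unsupported Media Type: {media_type}")
        member = f"{self.base}member/{next(self.ids)}"
        if media_type == "text/n3":
            # Protocol context: the body is itself the entry (metadata plus
            # log:semantics data), so it becomes the member.
            self.resources[member] = ("text/n3", body)
        else:
            # Payload context: the body is opaque data stored as a media resource;
            # the server writes its own entry pointing at it (":src" is made up).
            content = f"{member}/content"
            self.resources[content] = ("text/turtle", body)
            self.resources[member] = ("text/n3", f"<> a :Entry; :src <{content}> .")
        return member

c = Container("http://example.org/collection/")
entry = c.post("text/n3", '<> a :Entry; :title "My foaf profile" .')
media = c.post("text/turtle; charset=utf-8", "<#i> a foaf:Person .")
print(sorted(c.resources))
```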

> 
> and they do different things, because of the media type. in the first
> case, we use the protocol to send a request about creating an entry with
> some metadata and probably content in it. fine, this will become an
> "aggregate" member.
> 
> in the second case, we use the protocol to ask the LDP service to please
> create a new XML resource with opaque XML in it, and the entry we're
> POSTing will just be the content of the newly created member (it will
> become a "media resource"). in the second case, LDP will create an <entry>
> by itself, and redirect us to it, so that in the end we have created two
> entries with one POST: the LDP member that is the entry created by LDP
> itself, and the <entry> markup that we have POSTed that is treated by LDP
> as being opaque XML.
> 
> and that's not all: in the second case, i'll end up with two URIs: let's
> call them collection/member/42 for the LDP entry created by the LDP
> service, and collection/member/42/content for the XML that i have POSTed.
> i can GET both of them, and i can send the URIs of both of them to
> anybody. again, media types make this work robustly:
> 
> - when GETting collection/member/42, the server responds with a
> application/atom+xml resource, making it clear that this is not just XML
> that happens to look like atom, it is actually a resource that promises to
> follow the rules of the conversational context of atom. thus i can start
> processing the contents as atom: i use atom's processing model and
> protocol. in that entry, you'll find a link to the entry's content at
> collection/member/42/content
> 
> - when getting collection/member/42/content, the server responds with a
> application/xml resource, making it clear that it's serving XML that's
> payload (a media resource) and not protocol, so clients shouldn't use
> atom's processing model and protocol to interpret it. you could look at
> this like it's "escaped by media type", the generic media type makes it
> clear that this resource (as it's being served from this server) has no
> web-level behavior associated with it, it's just XML data. a media type
> is a promise to follow rules, and for this resource, the atom server does
> not promise to follow any rules, because the XML could be anything and the
> server neither knows nor cares.
> 
> last but not least: because of these clear rules, it is logically
> impossible to POST something that is labeled as application/atom+xml but
> does not follow the media type rules. said differently, you cannot have a
> collection that accepts application/atom+xml as media resources: if you
> want to build an atompub server managing XML, it has to expose other XML
> media types than that, either specific ones, or the generic
> application/xml.
> 
> i know that i am mostly reiterating things here, but i am hoping that we
> are at a point now where these things make more sense, because of the
> issues we run into with our protocol design. i do believe that when
> designing protocols, being able to clearly establish conversational
> context is essential. how we do it is something we still need to discuss,
> but for now all i want to say strongly is that we definitely should avoid
> escaped markup. but i think we all agree on that one anyway ;-)
> 
> cheers,
> 
> dret.
> 

Social Web Architect
http://bblfish.net/
Received on Monday, 4 February 2013 10:37:55 UTC