Re: Linked Data Platform ISSUE-20: What is the base URI of a POSTed document?

On 10/11/2012 05:06 AM, Andy Seaborne wrote:
> On 10/10/12 18:05, Henry Story wrote:
>> On 10 Oct 2012, at 17:03, Andy Seaborne
>> <> wrote:
>>> This does have a real consequence to implementation:
>>> A design that
>>> 1/ receive POST -- some general receipt handling
>>> 2/ content-type: parse body as RDF
>>> 3/ Decide it's a container
>>> 4/ dispatch request to container
>>> 5/ Create new BPR
> trying to create an abstraction of "incoming RDF" does not work
> because the parsing happens before the operation is known to be a
> container with specific action of creating the new BPR.
>> There are a few answers to that:
>>   A. you simply don't parse the RDF and just serialise it to disk into
>>    the file name created around 3 in your design. Doing that everything
>>    will work just right, because the relative URLs will automatically
>>    turn into the right URLs when fetched in the next round.
>>     (I imagine that this is exactly what MUST happen in WebDAV or Atom)
> Aside: I think this is pushing it a bit too far - RDF is a data model,
> Turtle a transfer syntax.  The Turtle bytes aren't the data - the RDF
> triples (absolute URIs) are.
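To make the base-URI question concrete, here is a minimal Python sketch using the standard library's `urllib.parse.urljoin`, which follows RFC 3986 reference resolution for cases like these. The Container URL `http://example.org/people/` and the minted resource URL `http://example.org/people/alice` are hypothetical, and it is an assumption that a parser running at step 2 of the design above can only fall back to the request URI (the Container) as its base.

```python
from urllib.parse import urljoin

# Hypothetical URLs: the Container the client POSTs to, and the URI the
# Container mints for the new resource only once it decides to create one.
CONTAINER = "http://example.org/people/"
NEW_RESOURCE = "http://example.org/people/alice"

# Relative references as they might appear in a POSTed Turtle body.
relative_refs = ["", "#me", "#address"]

resolved = {}
for ref in relative_refs:
    early = urljoin(CONTAINER, ref)    # base available at step 2: the request URI
    late = urljoin(NEW_RESOURCE, ref)  # base the client intended: the new resource
    resolved[ref] = (early, late)
    print(f"{ref!r:12} early={early}  late={late}")
```

The bytes are identical in both cases; only the base changes, which is why the order of parsing and dispatching matters.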

I totally agree with "The Turtle bytes aren't the data". But imposing
absolute URIs to define an RDF graph is plain wrong, and highly
impractical. Please let me describe something we do for an internal
project at W3C, using banana-rdf [1].

Basically, we need to go back and forth between Java/Scala objects and
RDF graphs. We call that "binding", but it's very similar to
"mapping", as in ORM. This is a very common thing that people would
like to do in the IT industry :-)

When you transform an object into a graph in the RDF world, you don't
need to have absolute URIs. Actually, you do *not* want to have
absolute URIs. People doing Functional Programming will just tell you
that at this point, you're interested in the *state* of the graph (its
shape and the literals at the leaves) but not yet in the *identity* of the
object (a global identifier that you could share).
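A minimal sketch of such a binding in Python, assuming a hypothetical `Person` type and FOAF predicates. This is not banana-rdf's API; it only shows a binding that captures the graph's state while leaving the subject as the relative references `<>` and `<#me>`, so identity is deferred. Terms are written as Turtle-style strings for brevity, and literals are not escaped.

```python
from dataclasses import dataclass

FOAF = "http://xmlns.com/foaf/0.1/"

# Terms are Turtle-style strings: "<...>" for (possibly relative) IRIs,
# '"..."' for literals. A triple is a (subject, predicate, object) tuple.
Triple = tuple[str, str, str]


@dataclass
class Person:
    """Hypothetical application type T."""
    name: str
    nick: str


def bind_person(p: Person) -> list[Triple]:
    """Hypothetical binding strategy S for Person.

    Only the *state* is captured: the subject is the relative reference
    <#me>, so the graph says nothing yet about the object's identity.
    Literal values are not escaped (sketch only).
    """
    me = "<#me>"
    return [
        ("<>", f"<{FOAF}primaryTopic>", me),
        (me, f"<{FOAF}name>", f'"{p.name}"'),
        (me, f"<{FOAF}nick>", f'"{p.nick}"'),
    ]


graph = bind_person(Person(name="Alice", nick="alice"))
for s, p, o in graph:
    print(s, p, o, ".")
```

Printed this way, each line is a valid Turtle triple whose subject only becomes absolute once a base is known.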

Then, you want to store this object, using LDP for example (that's
what we do already). In this case, POSTing to the Container holding
objects of the same kind is a common use case for us. The cool thing
is that the Container will be in charge of giving an identity to this
RDF graph.

So, here is a summary of the workflow:
* given an object O of type T
* given a binding strategy S for objects of type T
* given a Container C (as in LDP) in your application
* apply S to O and get an RDF graph G *with relative URIs*
* POST G to C
* you get back the identity of G as an absolute URI
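The workflow above can be sketched in Python with an in-memory stand-in for the Container. Everything here is hypothetical: `InMemoryContainer`, the `res1`-style URI minting, and the one-triple binding are illustrations, not an LDP client or server. The Container resolves the graph's relative references against the URI it mints, not against its own URL.

```python
import itertools
from urllib.parse import urljoin

# Terms are Turtle-style strings: "<...>" for (possibly relative) IRIs,
# '"..."' for literals. A triple is a (subject, predicate, object) tuple.
Triple = tuple[str, str, str]


def resolve_term(term: str, base: str) -> str:
    """Resolve an IRI term against base; literals are returned unchanged."""
    if term.startswith("<") and term.endswith(">"):
        return "<" + urljoin(base, term[1:-1]) + ">"
    return term


class InMemoryContainer:
    """Hypothetical stand-in for an LDP Container (no HTTP involved)."""

    def __init__(self, url: str):
        self.url = url
        self.resources: dict[str, list[Triple]] = {}
        self._ids = itertools.count(1)

    def post(self, graph: list[Triple]) -> str:
        new_uri = urljoin(self.url, f"res{next(self._ids)}")  # identity assigned here
        self.resources[new_uri] = [
            tuple(resolve_term(term, new_uri) for term in triple) for triple in graph
        ]
        return new_uri  # roughly the Location header of a 201 Created response


def bind(person: dict) -> list[Triple]:
    """Hypothetical binding strategy S: state only, subject left relative."""
    return [("<#me>", "<http://xmlns.com/foaf/0.1/name>", f'"{person["name"]}"')]


obj = {"name": "Alice"}                                      # object O
container = InMemoryContainer("http://example.org/people/")  # Container C
g = bind(obj)                                                # graph G, relative URIs
uri = container.post(g)                                      # POST G to C
print(uri)
print(container.resources[uri])
```

The client never needs the final URI in advance, much like a database assigning a rowid; resolution happens on the Container side once identity is known.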

This is very similar to a Relational Database assigning a rowid, or
even your programming language letting you manipulate variable names
for things that are actually addresses in memory: not knowing where it
will be stored must not prevent you from defining the data.

In conclusion: being able to define relative graphs that you can POST
to a container is just too important to give up.



>>   B. You parse the incoming stream into a graph that accepts relative
>>     URLs, and then in 3/ either
>>       a- place it into a store that accepts relative URLs
>>       b- resolve the URLs against the full store url
>>   C. You delay the parsing until around 3 or 4 when you know the full
>>     URL.
> No dispute it can be implemented but if the particular implementation
> choice is forced I think we are in (a minor) "willful violation" of RFC
> 3986. Implementation choices should be invisible.
>> The fact that A works is a very good reason to believe that my proposal
>> (which Steve Battle named A) is the correct design.
>> B seems to make a good case for having at least parsers that can parse
>> documents with relative URLs without needing to resolve them.
> That would be a change - the output would not be strict RDF.  The data
> would have to be modified later to "correct" the URIs.
>> C should be quite possible to do, since downloading documents should
>> be done asynchronously and takes time, whereas finding out from the
>> path that a resource needs to be created can be done extremely quickly.
> In the SPARQL GSP, POST to a graph means "add triples" - this is in line
> with RFC 2616 where it says POST can be "Extending a database through an
> append operation".   The base URI is the target graph, there being only
> one URI to consider.
> We could phrase it as "the base URI is the target of the request" and
> then make the target the newly created resource.
> Base URIs matter a lot more in RDF syntax - we're just pushing the
> boundaries of specs not designed with the current (new) usage in mind.
>      Andy
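Henry's option C, combined with Andy's suggestion that the target of the request be the newly created resource, can be sketched as a handler that picks the target from the request URI before touching the body, and only then parses with that target as base. All of it is hypothetical: the "trailing slash means Container" rule, the `res1`-style minting, and the toy one-triple-per-line syntax, which stands in for a real Turtle parser.

```python
import itertools
import re
from urllib.parse import urljoin

# Toy syntax, a hypothetical tiny subset of Turtle: one triple per line,
# terms are <IRI> (relative allowed) or "literal" (no escapes), then " .".
TERM = re.compile(r'<[^>]*>|"[^"]*"')

_ids = itertools.count(1)  # hypothetical URI minting for new resources


def parse_with_base(body: str, base: str) -> list[tuple[str, ...]]:
    """Parse the toy syntax, resolving each IRI against base as it is read."""
    triples = []
    for line in body.splitlines():
        terms = TERM.findall(line)
        if not terms:
            continue  # skip blank lines
        if len(terms) != 3:
            raise ValueError(f"not a triple: {line!r}")
        triples.append(tuple(
            "<" + urljoin(base, t[1:-1]) + ">" if t.startswith("<") else t
            for t in terms
        ))
    return triples


def handle_post(request_uri: str, body: str) -> tuple[str, list[tuple[str, ...]]]:
    """Option C: decide what the POST does from the URI *before* parsing."""
    if request_uri.endswith("/"):  # hypothetical rule: trailing slash = Container
        target = urljoin(request_uri, f"res{next(_ids)}")  # newly created resource
    else:
        target = request_uri  # GSP-style append to an existing graph
    # Only now is the body parsed, with the effective target as base.
    return target, parse_with_base(body, base=target)


body = '<#me> <http://xmlns.com/foaf/0.1/name> "Alice" .\n'
target, triples = handle_post("http://example.org/people/", body)
print(target, triples)
```

POSTing the same body to a non-container URI resolves `<#me>` against that graph's URI instead, matching the GSP reading where the target graph is the only URI to consider.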

Received on Thursday, 11 October 2012 12:46:46 UTC