Re: RDF-ISSUE-131 (mobile-datasets): How can one create an RDF dataset without being a web server? [RDF Graphs]

On 05/17/2013 10:49 AM, Richard Cyganiak wrote:
> I'm not really sure why RDF-WG is spending time on this issue at all.

It's in our charter to provide a way to work with multiple graphs. That 
was one of the top two priorities for RDF when this group was created.  
Being able to do this in datasets created outside of web servers seems 
to me to be important.

>   As Sandro's research shows, there are several adequate ways of addressing this already.

I said there were options.  I don't see how any of them is workable 
without Working Group action.

> (Personally I find option 3 particularly elegant, and don't find that it contradicts any spec I know of. Yes, protocols that want to use URI-bearing payloads should define how a base URL can be established for a given payload. If a protocol doesn't do that, then consumers have to make up a base. That doesn't break anything, as long as the consumer re-serialises everything using relative URIs again.)

I think if we're going to go with option 3, we need to explicitly 
license this in the specs, saying it's okay to communicate with an 
undefined base.  Maybe that's a single well-worded sentence, but it's 
actually a big mental shift, easily missed or misunderstood.  Maybe 
something along the lines of the Skolemization section of RDF Concepts: 
https://dvcs.w3.org/hg/rdf/raw-file/5d1f10084f79/rdf-concepts/index.html#section-skolemization

If the client sends the same document twice, are they to be understood 
as serializing the same graph?

I'm not sure how the re-serialization you're talking about could work.   
Is that the same graph or a different graph?
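
To make the question concrete, here's a rough Python sketch.  It assumes 
a hypothetical receiver that mints a fresh base (under the placeholder 
host example.org) for every document it receives; mint_base, 
resolve_all and relativize are made-up names, not anyone's API.  
Resolving the same relative name against two such bases gives two 
different IRIs, while relativizing against the matching base gives back 
the original text, which is the round trip Richard describes.  Nothing 
here says whether those two IRIs name "the same graph"; that's exactly 
what the specs would have to decide.

```python
import uuid
from urllib.parse import urljoin

# Relative graph names as a client might send them under Option 3,
# with no base known to the sender.
RELATIVE_NAMES = ["#g3", "#g5"]


def mint_base() -> str:
    """Hypothetical receiver policy: a fresh, unique base per received document.
    example.org is a placeholder authority."""
    return f"http://example.org/received/{uuid.uuid4()}"


def resolve_all(names, base):
    """Resolve relative graph names against the receiver-chosen base (RFC 3986)."""
    return [urljoin(base, name) for name in names]


def relativize(iri, base):
    """Turn an IRI back into a fragment-only reference if it sits under base."""
    return iri[len(base):] if iri.startswith(base + "#") else iri


# The same document received twice gets two different bases...
base_a, base_b = mint_base(), mint_base()
graphs_a = resolve_all(RELATIVE_NAMES, base_a)
graphs_b = resolve_all(RELATIVE_NAMES, base_b)

# ...so the absolute graph IRIs differ between the two receipts,
print(graphs_a[0] == graphs_b[0])  # False
# ...yet re-serializing against the matching base restores the original text.
print([relativize(i, base_a) for i in graphs_a] == RELATIVE_NAMES)  # True
```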

It seems to me that these unresolved URIs are so close to blank nodes 
that they should just be blank nodes, so that we know how to deal with them.
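
The "know how to deal with them" part is mostly Skolemization, which the 
issue text below mentions for Option 4.  A minimal Python sketch, 
assuming quads are (subject, predicate, object, graph name) tuples, IRIs 
are plain strings, literals aren't modelled, and BNode and skolemize are 
illustrative names only.  The /.well-known/genid/ path follows the Skolem 
IRI convention in RDF Concepts; the host is a placeholder.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class BNode:
    """A blank node, identified only by its document-local label."""
    label: str


# Placeholder authority; the /.well-known/genid/ path is the Skolem IRI
# convention from RDF Concepts.
GENID_BASE = "http://example.org/.well-known/genid/"


def skolemize(quads):
    """Replace blank nodes in any position, graph name included, with fresh
    Skolem IRIs. Each label maps to one IRI across the whole dataset."""
    mapping = {}

    def sk(term):
        if isinstance(term, BNode):
            if term.label not in mapping:
                mapping[term.label] = GENID_BASE + uuid.uuid4().hex
            return mapping[term.label]
        return term

    return [tuple(sk(t) for t in quad) for quad in quads], mapping


EX = "http://example.org/"
dataset = [  # (subject, predicate, object, graph name)
    (EX + "s", EX + "p", EX + "o", BNode("g4")),
    (BNode("x"), EX + "p", EX + "o", BNode("g4")),
]
skolemized, mapping = skolemize(dataset)
print(skolemized[0][3] == skolemized[1][3])  # True: both quads stay in one graph
```

Since the labels never leave the dataset, the receiver is free to pick 
whatever Skolem IRIs it likes; nobody else could have referenced _:g4.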

(Relative nodes vs. blank nodes...)

(cf LDP-ISSUE-53, but LDP is very much about defining a protocol, so for 
them it's much simpler to just declare the answer by fiat, as they do.)

Hm.   Looking at SPARQL.   If the client wants to INSERT DATA on the 
server, and it needs named graphs, ...  it's going to need a server 
that's extended to understand blank node graph names, or it needs to 
invent UUIDs, or ... no, option 3 doesn't work, because there's already 
a defined base.   So Option 3 requires changing the behavior of existing 
SPARQL; Option 4 requires (someday) extending SPARQL in an obvious way.
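
For comparison, the "invent UUIDs" route (Option 1) needs nothing new 
from SPARQL 1.1 Update.  A rough Python sketch of the request such a 
client might build; the function name is made up, graph bodies are 
assumed to be already-serialized triples, and actually sending the 
request to an endpoint is left out.  As the issue text notes, urn:uuid 
names aren't resolvable, which is bad practice for Linked Data.

```python
import uuid


def insert_data_with_uuid_graphs(graph_bodies):
    """Build a SPARQL 1.1 INSERT DATA request that puts each graph body into a
    freshly invented urn:uuid graph name. Returns (update_text, graph_names)."""
    names = [f"urn:uuid:{uuid.uuid4()}" for _ in graph_bodies]
    blocks = [f"  GRAPH <{name}> {{ {body} }}"
              for name, body in zip(names, graph_bodies)]
    return "INSERT DATA {\n" + "\n".join(blocks) + "\n}", names


update, names = insert_data_with_uuid_graphs([
    "<http://example.org/s> <http://example.org/p> <http://example.org/o> .",
])
print(update)
```

Python's uuid4() draws its bits from os.urandom, so the "good source of 
randomness" assumption in Option 1 is at least plausible here.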

     -- Sandro

>
> Best,
> Richard
>
>
> On 17 May 2013, at 14:24, "RDF Working Group Issue Tracker" <sysbot+tracker@w3.org> wrote:
>
>> RDF-ISSUE-131 (mobile-datasets): How can one create an RDF dataset without being a web server? [RDF Graphs]
>>
>> http://www.w3.org/2011/rdf-wg/track/issues/131
>>
>> Raised by: Sandro Hawke
>> On product: RDF Graphs
>>
>> In general, the SPARQL definition of datasets (adopted into RDF 1.1 by WG resolution on 29 October 2012) satisfies our charter deliverable of allowing people to work with multiple graphs.   However, it requires that each graph be labeled with an IRI, and creating such an IRI can be problematic.
>>
>> It's easy enough for software to make up IRIs for graphs if it happens to be a web server, in charge of some range of web addresses.   But how can other software do this?   For instance, how can a web client create a dataset to send as one of several parameters in an HTTP POST operation?   And how can a web client use datasets for HTTP PATCH (as the LDP Working Group wants to do)?   And how can something use datasets in a UDP- or TCP-based protocol?
>>
>> At the moment, a few options come to mind:
>>
>> Option 1 - Use RFC-4122 Random UUIDs as graph names.   These are IRIs that look like urn:uuid:7a745845-5a5e-46ad-9ae7-6ec202741183, where 122 of the 128 bits are random and the remaining 6 are fixed (the version and variant fields).   In theory, collision is unlikely if a good source of randomness is available.  Perhaps the randomness can be improved by including a hash of the other parts of the dataset.   Note that using non-resolvable IRIs like this is bad practice for Linked Data.
>>
>>   <urn:uuid:7a745845-5a5e-46ad-9ae7-6ec202741183> { ... contents of graph ... }
>>
>> Option 2 - Use a UUID-like string as an IRI base or prefix for graph names.   (Slight variation on Option 1.)  By going outside the RFC-4122 syntax, we can include a "local part" in the IRI.   Something like:
>>
>>   @prefix my: <tag:w3.org,2013:uuid:7a745845-5a5e-46ad-9ae7-6ec202741183:> .
>>   ...
>>   my:g2 { ... contents of graph 2 ... }
>>
>> Option 3 - Use a "relative" dataset, where the graph names are written as relative IRIs but the base for IRI-resolution is not known to the system generating the dataset and is assigned to some new, unique IRI base by each receiver.    This is arguably not licensed by the current RDF drafts or the SPARQL 1.1 spec.  Some client libraries will not store or serialize RDF with relative IRIs.
>>
>>   <#g3> { ... contents of graph 3 ... }
>>
>> Option 4 - Use blank nodes as graph names.  This is not allowed in Datasets as defined in the current RDF drafts or the SPARQL 1.1 spec.  Some client libraries will not store or serialize RDF datasets with blank node graph names.   As with other uses of blank nodes, knowing they cannot be referenced by other documents allows certain optimizations, and they can be Skolemized for use in systems that do not want/allow blank nodes.
>>
>>   _:g4 { ... contents of graph 4 ... }
>>
>> Option 5 - Do not directly support this use case in RDF 1.1.   Instead, require systems to use an extended RDF which allows blank node graph names, e.g., JSON-LD, or variations on TriG and N-Quads which may arise for this purpose.
>>
>>
>>
>>
>

Received on Tuesday, 21 May 2013 12:19:53 UTC