Re: Question about implementing triple pattern fragments client

Hi Greg,

Welcome to the list!
Happy to see that you're apparently working on / considering
the implementation of a TPF client :-)

> 1. Provide the client with the endpoint URL  
> `http://fragments.dbpedia.org/2014/en` (known a priori)

Note that, with triple pattern fragments,
there is no such thing as “the” endpoint URL.
Each fragment can serve as a starting point.

For instance, these fragments could be starting points
of the same dataset:
- http://fragments.dbpedia.org/2014/en?subject=&predicate=http%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23type&object=http%3A%2F%2Fdbpedia.org%2Fontology%2FArtist
- http://fragments.dbpedia.org/2014/en?subject=&predicate=&object=http%3A%2F%2Fdbpedia.org%2Fresource%2FBelgium
- http://fragments.dbpedia.org/2014/en?subject=&predicate=http%3A%2F%2Fdbpedia.org%2Fontology%2FbirthPlace&object=

This is why my implementation calls it a "start fragment"
rather than an endpoint.

> my concern is that the URL dereferenced in step 2 may end up being  
> the same as the URL for the unbounded triple pattern `{ ?s ?p ?o }`.

This could be the case, and it may seem logical to humans,
but it doesn't have to be, as illustrated above.

> This seems to be the case for the DBPedia endpoint, and while the  
> DBPedia endpoint pages its data, the TPF spec is [pretty  
> clear](http://www.hydra-cg.com/spec/latest/triple-pattern-fragments/#paging)  
> about paging being optional. So is it the case that all TPF clients  
> need to be concerned about the possibility of requesting the entire  
> dataset when all they are after is the hypermedia controls?

That is indeed a valid concern. However:
- The start fragment can be chosen arbitrarily.
   (But yes, its URL needs to be obtained from somewhere.
    However, the server could generate URLs independently of the controls.)
- While such a fragment might be large, it can be parsed in a streaming way,
   and the client can stop retrieving it as soon as the controls have arrived.
- If the fragment is large, the server itself strongly benefits from pagination.
   It is thus not unreasonable to assume the server will page content.
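To make the streaming point concrete, here is a minimal sketch in Python. It assumes the fragment is requested as N-Triples (one triple per line, full IRIs) and that the controls contain a `hydra:template` literal; `find_template` and `fetch_template` are hypothetical names, not part of any existing TPF client.

```python
import re
import urllib.request
from typing import Iterable, Optional

HYDRA_TEMPLATE = "http://www.w3.org/ns/hydra/core#template"

# Assumption: N-Triples serialization, so each triple sits on its own line, e.g.
#   _:b <http://www.w3.org/ns/hydra/core#template> "http://x{?s,p,o}" .
TEMPLATE_LINE = re.compile(
    r'^\S+\s+<' + re.escape(HYDRA_TEMPLATE) + r'>\s+"((?:[^"\\]|\\.)*)"'
)


def find_template(lines: Iterable[str]) -> Optional[str]:
    """Consume lines one at a time and return the first hydra:template value.

    Returning early means the remaining lines are never read, so the caller
    can abandon the rest of a (possibly huge) unpaged fragment."""
    for line in lines:
        match = TEMPLATE_LINE.match(line.strip())
        if match:
            return match.group(1)
    return None


def fetch_template(start_fragment_url: str) -> Optional[str]:
    """Hypothetical helper: stream a start fragment and stop at the controls."""
    request = urllib.request.Request(
        start_fragment_url, headers={"Accept": "application/n-triples"}
    )
    # Leaving the `with` block closes the connection, so whatever part of
    # the body has not been read yet is simply never downloaded.
    with urllib.request.urlopen(request) as response:
        return find_template(raw.decode("utf-8") for raw in response)
```

This only saves bandwidth if the server emits the controls early in the stream; where they appear is server-dependent, which is another reason why servers paging large fragments makes life easier for clients.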

> 1. Suggest ("MUST" or "SHOULD") that the public endpoint URL that is  
> used as the entry point to a TPF server when no other URL is known  
> be different from the URL of the unbound triple pattern fragment.

As there is no dedicated endpoint URL, we cannot suggest it exactly like that.
However, we could suggest that the URL of some small fragment
be communicated as a starting point.
(That fragment could even be empty.)

> 2. Either require paging ("MUST" or at least "SHOULD") or require  
> that there be a separate URL to retrieve estimated counts for a  
> fragment.

I'm less in favor of this: it complicates the API,
but above all, the resulting media type would be less useful.
On the human Web, we wouldn't have a page
that contains only a number either.

On a side note, we are currently benchmarking the influence
of page size on clients (and caches) of triple pattern fragments.
One particular case we consider is that the first page is empty
and the second page contains all triples.
This would lead to the effect you describe.

>> The use of the requested URL in the representation is especially  
>> important in
>> the common case in which fragments or pages are accessible through (subtly)
>> different URLs, such as `http://example.org/example?s=a&page=3` and
>> `http://example.org/example?page=3&s=a`.
>
> If the server provides a URI template for this resource, presumably  
> something like `http://example.org/example{?s,p,o}`, why wouldn't a  
> request with the URI `http://example.org/example?page=3&s=a` be  
> invalid?

Sure, it would be an invalid expansion of the template!
But the server is still allowed to serve a fragment at this URL
(yet this behavior is entirely server-dependent).
Maybe “page” is part of another set of hypermedia controls
that extends triple pattern fragments; maybe it's something else.

The point is that, as a client, you shouldn't make any assumptions.
Indeed, you are only allowed to expand the template with s/p/o,
but the server has more freedom.
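As an illustration of "only expand the template with s/p/o", here is a minimal client-side sketch in Python. It assumes the template uses the RFC 6570 form-style query syntax `base{?a,b,c}`, like the `{?s,p,o}` example above, and handles nothing else; `expand_form_query` is a hypothetical name. It refuses variables the template does not declare, so the client never invents parameters such as `page`, while the server stays free to serve other URLs.

```python
import re
from typing import Dict
from urllib.parse import quote

# Only the form-style query operator is supported, e.g.
# "http://example.org/example{?s,p,o}".
FORM_QUERY = re.compile(r"^(?P<base>[^{]*)\{\?(?P<names>[^}]+)\}$")


def expand_form_query(template: str, values: Dict[str, str]) -> str:
    """Expand a 'base{?a,b,c}' template with the given variable values."""
    match = FORM_QUERY.match(template)
    if match is None:
        raise ValueError("only templates of the form 'base{?a,b,c}' are handled")
    names = match.group("names").split(",")
    undeclared = sorted(set(values) - set(names))
    if undeclared:
        # The client must not add parameters the controls do not describe.
        raise ValueError(f"template does not declare {undeclared}")
    # Keep the template's variable order and omit variables without a value;
    # an empty string still yields "name=". quote(..., safe="") leaves only
    # unreserved characters unescaped, as RFC 6570 simple expansion requires.
    pairs = [f"{name}={quote(values[name], safe='')}"
             for name in names if name in values]
    return match.group("base") + ("?" + "&".join(pairs) if pairs else "")
```

For instance, assuming the DBpedia controls advertise `http://fragments.dbpedia.org/2014/en{?subject,predicate,object}` (the variable names are taken from the start fragment URLs above), expanding it with an empty subject and predicate and the object `http://dbpedia.org/resource/Belgium` yields exactly the Belgium start fragment URL, whereas passing a `page` value raises a `ValueError`.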

To address this issue, I would add the above explanation
to the specification document, especially the “invalid” remark.

Best,

Ruben

Received on Friday, 16 January 2015 08:43:57 UTC