Question about implementing triple pattern fragments client from Gregory Williams on 2015-01-15 (public-hydra@w3.org from January 2015)

From: Gregory Williams <greg@evilfunhouse.com>
Date: Thu, 15 Jan 2015 14:36:01 -0800
To: public-linked-data-fragments@w3.org
Message-Id: <82508553-3375-413C-828B-56A3E78AC71D@evilfunhouse.com>

Hello,

In writing some client code to support triple pattern fragments, I had some questions/concerns about the TPF spec that I’d appreciate feedback on.

Accessing Hypermedia Controls
-----------------------------

[Section 3.5](http://www.hydra-cg.com/spec/latest/triple-pattern-fragments/#controls) describes the expected formatting of hypermedia controls and says:

> The hypermedia control fulfills the hypermedia constraint that each
> representation should contain the controls towards next steps. As a result,
> clients can use Triple Pattern Fragments without any prior knowledge.

However, I have concerns about how using TPF "without any prior knowledge" would work in the general case. I believe the process would go something like this (using the DBpedia TPF endpoint as an example):

1. Provide the client with the endpoint URL `http://fragments.dbpedia.org/2014/en` (known a priori)
2. Client dereferences endpoint URL and parses the content to access the hypermedia controls
3. Client expands the URI template found in the hypermedia controls to begin query execution (accessing triple counts, ordering triples for execution, etc.)
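
To make step 3 concrete, here is a minimal Python sketch of the expansion. It is an illustration under assumptions, not the spec's or any library's implementation: the template string and its variable names `s`, `p`, `o` are assumed (a real server may advertise different names in its controls), and the hypothetical helper `expand_form_query` handles only the `{?a,b,c}` form of RFC 6570.

```python
import re
from urllib.parse import quote

# Assumption: a template of this shape was read from the hypermedia controls.
# The variable names s, p, o are illustrative; a real server may use others.
ASSUMED_TEMPLATE = "http://fragments.dbpedia.org/2014/en{?s,p,o}"


def expand_form_query(template, values):
    """Expand `{?a,b,c}` expressions (RFC 6570 form-style query) only.

    Variables that are missing or None are omitted, as the RFC requires
    for undefined values. Other RFC 6570 operators are not handled.
    """
    def replace(match):
        names = match.group(1).split(",")
        pairs = [
            # Percent-encode everything except unreserved characters.
            f"{name}={quote(str(values[name]), safe='')}"
            for name in names
            if values.get(name) is not None
        ]
        return "?" + "&".join(pairs) if pairs else ""

    return re.sub(r"\{\?([^}]+)\}", replace, template)


# No variables bound: this corresponds to the pattern { ?s ?p ?o }.
unbound_url = expand_form_query(ASSUMED_TEMPLATE, {})

# Only the predicate bound.
typed_url = expand_form_query(
    ASSUMED_TEMPLATE,
    {"p": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"},
)
```

Under these assumptions, expanding the template with no bound variables yields exactly the entry-point URL from step 1.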

If this is correct, my concern is that the URL dereferenced in step 2 may be the same as the URL for the unbound triple pattern `{ ?s ?p ?o }`. This seems to be the case for the DBpedia endpoint, and while that endpoint pages its data, the TPF spec is [pretty clear](http://www.hydra-cg.com/spec/latest/triple-pattern-fragments/#paging) that paging is optional. So do all TPF clients need to be concerned about the possibility of requesting the entire dataset when all they are after is the hypermedia controls? Similarly, if paging isn't used, fetching triple pattern counts in order to sort triple patterns before query execution may cause massive overhead, since unnecessary data is returned on all but one of the requests (and at that point you've already got all the data needed to simply execute the query locally).
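
The count-based ordering mentioned above could look roughly like the following Python sketch. The names `order_by_estimated_count` and `estimate_count` are hypothetical; `estimate_count` stands in for one HTTP request per triple pattern that reads the fragment's estimated count from its metadata. Without paging, each such call would also transfer the full fragment, which is the overhead in question.

```python
from typing import Callable, List, Tuple

# A triple pattern as (subject, predicate, object); strings starting
# with "?" are variables. This representation is an assumption.
Pattern = Tuple[str, str, str]


def order_by_estimated_count(
    patterns: List[Pattern],
    estimate_count: Callable[[Pattern], int],
) -> List[Pattern]:
    """Return the patterns sorted from fewest to most estimated matches.

    estimate_count is called exactly once per pattern, mirroring one
    request per pattern against the server.
    """
    counted = [(estimate_count(pattern), pattern) for pattern in patterns]
    counted.sort(key=lambda pair: pair[0])
    return [pattern for _, pattern in counted]
```

A client would then start execution with the first pattern in the returned list.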

If I've understood this correctly, in my opinion a few changes would be of great practical benefit to client implementations:

1. Specify ("MUST" or "SHOULD") that the public endpoint URL used as the entry point to a TPF server, when no other URL is known, be different from the URL of the unbound triple pattern fragment.

2. Either require paging ("MUST" or at least "SHOULD") or require that there be a separate URL to retrieve estimated counts for a fragment.



URI template expansion
----------------------

The note in [section 3.6](http://www.hydra-cg.com/spec/latest/triple-pattern-fragments/#paging) says:

> The use of the requested URL in the representation is especially important in
> the common case in which fragments or pages are accessible through (subtly)
> different URLs, such as `http://example.org/example?s=a&page=3` and
> `http://example.org/example?page=3&s=a`.

If the server provides a URI template for this resource, presumably something like `http://example.org/example{?s,p,o}`, why wouldn't a request with the URI `http://example.org/example?page=3&s=a` be invalid? Appending extra query parameters to a valid, expanded URI template seems acceptable, but arbitrarily adding a query parameter in the middle of a URI template seems like it should probably be either forbidden or undefined behavior. I would think that either the URI template should define the placement of next/previous page data, or the server should consistently generate URIs that have next/previous page data (which isn't defined by the URI template) appended to the otherwise-valid expansion of the URI template.
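
The two options can be illustrated with a short Python sketch. Both helper names are hypothetical, and the `page` parameter name is taken from the example URLs in the spec's note rather than from any template. `append_page` shows the "append to the expanded template" convention; `same_query_ignoring_order` shows why the two URLs from the note are different strings even though they carry the same parameters. Whether a given server treats them as the same resource is not something a client can assume.

```python
from urllib.parse import parse_qsl, urlsplit


def append_page(expanded_url: str, page: int) -> str:
    """Append a page parameter after an already expanded template.

    Sketch only: assumes the URL has no fragment part.
    """
    separator = "&" if "?" in expanded_url else "?"
    return f"{expanded_url}{separator}page={page}"


def same_query_ignoring_order(url_a: str, url_b: str) -> bool:
    """True if both URLs match apart from query parameter order."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    same_base = (a.scheme, a.netloc, a.path) == (b.scheme, b.netloc, b.path)
    params_a = sorted(parse_qsl(a.query, keep_blank_values=True))
    params_b = sorted(parse_qsl(b.query, keep_blank_values=True))
    return same_base and params_a == params_b


# Expansion of http://example.org/example{?s,p,o} with s=a.
expanded = "http://example.org/example?s=a"
paged = append_page(expanded, 3)
```

Here `paged` matches the first URL from the note, while the second URL, with `page=3` placed before `s=a`, cannot be produced by expanding the template and then appending.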


thanks,
.greg
Received on Thursday, 15 January 2015 22:36:28 UTC