Re: Question about implementing triple pattern fragments client from Ruben Verborgh on 2015-01-20 (public-hydra@w3.org from January 2015)

From: Ruben Verborgh <ruben.verborgh@ugent.be>
Date: Tue, 20 Jan 2015 12:08:06 +0100
To: Gregory Williams <greg@evilfunhouse.com>
Cc: public-linked-data-fragments@w3.org, Kjetil Kjernsmo <kjetil@kjernsmo.net>
Message-Id: <6EBC4251-09FE-4607-B930-A7357253EE61@ugent.be>

Hi Greg,

>> Note that, with triple pattern fragments,
>> there isn't something like “the” endpoint URL.
>> Each fragment can serve as a starting point.
> 
> Yes, I understand that. However, simply for usability reasons, won’t there often be a URI that you pass around as a preferred entry fragment (e.g. the fragment with the shortest URI)? Or do you honestly believe that any time somebody wants to access a new TPF server they’ll use an arbitrary/random fragment as the entry point?

You're right, there will probably be a single URI the server advertises as a starting point.
But the current spec allows for this: what the server can communicate is not restricted.

I expect the easiest way would indeed be to communicate the shortest URI;
another option (uncommon on today's Web) is to communicate the template URI,
or, by extension, the entire form. The server can decide.
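
To illustrate the template option, here is a minimal sketch of how a client
could turn an advertised template URI into a fragment URI. The template string,
the variable names (subject, predicate, object) and the helper name
expand_form_query are assumptions for illustration only, and the helper handles
just the form-style query operator `{?a,b,c}` of RFC 6570, nothing else.

```python
import re
from urllib.parse import quote


def expand_form_query(template: str, values: dict) -> str:
    """Expand only the RFC 6570 form-style query operator, e.g. {?a,b,c}.

    Variables that are missing (or None) are skipped; if none are defined,
    the expression expands to the empty string.
    """
    def repl(match):
        names = match.group(1).split(",")
        pairs = [
            f"{name}={quote(str(values[name]), safe='')}"
            for name in names
            if values.get(name) is not None
        ]
        return "?" + "&".join(pairs) if pairs else ""

    return re.sub(r"\{\?([^}]+)\}", repl, template)


# Hypothetical template; a real server advertises its own.
template = "http://example.org/dataset{?subject,predicate,object}"
url = expand_form_query(template, {"predicate": "http://xmlns.com/foaf/0.1/name"})
print(url)
```

With no values at all, the same template expands to the bare entry URI, which
is one way to see the "shortest URI" and the template as two views of the
same starting point.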

Do you think the spec needs a change to mention this?

> I obviously come at this from a SPARQL perspective, but as I began to look at the TPF spec and the DBPedia fragments server, I had to do much more mucking about with curl and rapper than I had expected just to figure out what the URI template was. I had expected there to be an explanation on the front page pointing me at an entry point URI (instead of having it be silently linked from the “Welcome to DBPedia!” text near the top (but completely different from the link on the “DBpedia – Linked Data Fragments” page header text).

I fully agree there are usability issues with the current DBpedia fragments landing page.
It is a quickly created hybrid between http://data.linkeddatafragments.org/ and http://client.linkeddatafragments.org/.

Note that, funnily, the machine-readable version is actually better and more complete:
    $ curl -s -H 'Accept: text/turtle' http://fragments.dbpedia.org/ | grep "> a void:Dataset"
    <http://fragments.dbpedia.org/2014/en#dataset> a void:Dataset;
    <http://fragments.dbpedia.org/2014/nl#dataset> a void:Dataset;
    <http://fragments.dbpedia.org/#dataset> a void:Dataset;
It leads you straight to the dataset (and exposes one that is hidden in the HTML version ;-)

I will make the DBpedia human landing page clearer at some point.
For the time being, http://data.linkeddatafragments.org/ is the better example.

> Parsing the result in a streaming fashion doesn’t help much if the response payload contains an entire, huge dataset, and the hypermedia controls are only appended to the end of the dataset triples.

Fully agree.

> FWIW, appending the hypermedia controls to the data seems to be exactly what the DBPedia server does.

Actually, the hypermedia controls are output as soon as possible
(https://github.com/LinkedDataFragments/Server.js/blob/v1.1.3/lib/writers/TurtleWriter.js#L51):
you'll encounter fragments with the controls on top, in the middle, and at the bottom.
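
For a client that only needs the controls, this means scanning the stream and
stopping as soon as they show up. A rough sketch, assuming a line-oriented
Turtle response in which the search template appears as a `hydra:template`
literal on a single line (a real client would use a streaming Turtle parser;
find_template is a hypothetical helper, not part of Server.js):

```python
import re
from typing import Iterable, Optional, Tuple

# Assumption: the template is serialized as  hydra:template "..."  on one line.
TEMPLATE_RE = re.compile(r'hydra:template\s+"([^"]+)"')


def find_template(lines: Iterable[str]) -> Tuple[Optional[str], int]:
    """Return (template, lines_consumed), stopping at the first match.

    If the stream ends without a template, returns (None, lines_consumed).
    """
    consumed = 0
    for line in lines:
        consumed += 1
        match = TEMPLATE_RE.search(line)
        if match:
            return match.group(1), consumed
    return None, consumed


# Made-up response lines with the controls in the middle.
response = iter([
    "<a> <b> <c>.",
    '<#dataset> hydra:search [ hydra:template "http://example.org/ds{?subject,predicate,object}" ].',
    "<d> <e> <f>.",
])
template, consumed = find_template(response)
```

Because the function returns at the first match, the remaining lines are never
read, so the client can abort the download at that point. If the controls
happen to come last in an unpaged response, nothing is saved.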

> It might be a benefit for a server to choose to page, but that isn’t the concern I asked about. As someone thinking about implementing a *client*, I can’t rely on it being sensible for a server to do the right thing.

> This is an issue I think needs to be discussed if the spec is going to leave the choice of paging entirely to the implementation and/or the server configuration.

You're right. How about making it a SHOULD in the spec,
detailing the reason (i.e., only needing controls)?

> OK. Are you opposed to the idea of a separate entry point URL that contains the hypermedia controls but isn’t a fragment?

I'm certainly not opposed to that. In fact, nothing prevents a server from doing so.
I fully agree with Kjetil here:

>> FWIW, I stuffed this in the general VoID description in my implementation. I think it is useful as a best practice to have the controls in not just the fragments, discovery of that is also an issue.


However, I don't think this should be part of the spec, for the very reason that it is already possible.
That really illustrates the flexibility of hypermedia APIs.
Note the similarity with SPARQL query <form>s, which are also not spec'ed.

(Since the TPF spec is, as far as I know, the first spec for a hypermedia API,
 we are obviously still learning here. An editorial review is in progress BTW.)

>>> 2. Either require paging ("MUST" or at least "SHOULD") or require that there be a separate URL to retrieve estimated counts for a fragment.
>> 
>> I'm less in favor of this, as it complicates the API,
>> but foremost, because the resulting media type would be less useful.
>> If we compare this to the human Web, we wouldn't have a page
>> that has only a number on it.
> 
> It would complicate the API, but without that or some other complication (like the one you discuss below), I don’t think TPFs can be effectively used for query answering as much of your work suggests.

We've found the HTTP request itself to be the main overhead.
Sure, we could gain bandwidth by removing things,
such as for instance the hypermedia controls.

However, the current API has a much more evolvable design.
The server can, for instance, even send additional metadata.
If we need a dedicated operation for a count,
what do we do with the other pieces of metadata?

Hypermedia APIs have different design goals than RPC APIs.

> It can obviously be used when certain assumptions hold, but those assumptions (like requiring paging) don’t seem to be explicit.

Paging seems indeed a major assumption here.
If we make paging a SHOULD, would this address the concerns?

> Yes, that would address this issue. If nothing else, I think algorithms and pseudocode that discuss using the triple counts for query execution need to address the potential case where data is not paged.

In a sense, they do: the page size is then just infinity.
The algorithms would be inefficient then, true, but they'd still work.
Another reason for the SHOULD.
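
A small sketch of that "page size is infinity" view, with a hypothetical helper
fragment_cost that is not taken from the spec or from any client. It shows how
many requests are needed to read a whole fragment, and how many triples come
along with the first response, which is the one a client fetches to learn the
count:

```python
import math


def fragment_cost(total_triples: int, page_size: float = math.inf):
    """Return (pages, triples_in_first_page) for a fragment.

    An unpaged server is modelled as page_size = infinity.
    At least one request is always needed, even for an empty fragment.
    """
    pages = max(1, math.ceil(total_triples / page_size))
    triples_in_first_page = min(total_triples, page_size)
    return pages, triples_in_first_page


paged = fragment_cost(250, page_size=100)
unpaged = fragment_cost(250)
print(paged, unpaged)
```

In the unpaged case the page count is still well defined (one page), but
reading the count costs the entire fragment, which is the inefficiency
mentioned above.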

> I’m afraid I wasn’t very clear about my concern here. The NOTE in section 3.6 talks about what the server should do when it receives various requests from a client. It gives two example URLs that seem like they should reference the same resource, the first with "?s=a&page=3” and the second with "?page=3&s=a”. The text doesn’t indicate where those URLs came from. We’re left to assume that they originated with the server, but intuitively the server should have only generated one of those (either that or the server implementation is order-agnostic with respect to the query parameters).

I agree the example in the paper might be confusing indeed.
Perhaps we should find something simpler.

This paragraph is supposed to mean:
ensure the IRI you use to describe the fragment in your representation
is exactly the same IRI through which the fragment was requested.

Reading that again… yeah, we can do with a simpler example :-)
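
Perhaps something along these lines, as a sketch only. The example.org IRIs,
the helper names, and the use of void:triples for the count are assumptions for
illustration; the point is just that the server echoes the request IRI verbatim
instead of rebuilding it from parsed parameters.

```python
from urllib.parse import parse_qsl, urlsplit


def same_parameters(iri_a: str, iri_b: str) -> bool:
    """True if both IRIs carry the same query parameters, in any order."""
    params_a = sorted(parse_qsl(urlsplit(iri_a).query))
    params_b = sorted(parse_qsl(urlsplit(iri_b).query))
    return params_a == params_b


def describe_fragment(request_iri: str, estimated_count: int) -> str:
    """Metadata triple about the fragment, using the request IRI as-is."""
    return f"<{request_iri}> void:triples {estimated_count}."


first = "http://example.org/dataset?s=a&page=3"
second = "http://example.org/dataset?page=3&s=a"

# Same parameters, yet each response describes the IRI it was requested with.
print(same_parameters(first, second))
print(describe_fragment(first, 42))
print(describe_fragment(second, 42))
```

A server that normalized the parameter order before writing the metadata would
describe an IRI the client never asked for, which is exactly the mismatch the
paragraph is trying to rule out.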

Ruben
Received on Tuesday, 20 January 2015 11:08:37 UTC