Re: Question about implementing triple pattern fragments client

On Jan 20, 2015, at 3:08 AM, Ruben Verborgh <ruben.verborgh@ugent.be> wrote:
> 
>> Yes, I understand that. However, simply for usability reasons, won’t there often be a URI that you pass around as a preferred entry fragment (e.g. the fragment with the shortest URI)? Or do you honestly believe that any time somebody wants to access a new TPF server they’ll use an arbitrary/random fragment as the entry point?
> 
> You're right, there will probably be a single URI the server advertises as a starting point.
> But the current spec allows for this: what the server can communicate is not restricted.
> 
> I expect the easiest way would indeed be to communicate the shortest URI;
> another option (uncommon on today's Web) is to communicate the template URI,
> or, by extension, the entire form. The server can decide.
> 
> Do you think the spec needs a change to mention this?

I agree about the shortest URI being the best option. But that was what caused my initial concern: for servers with URI templates like /{?s,p,o}, the shortest fragment is going to be /, which will be the URL for the maximal fragment. As long as the paging issue is addressed (discussed below), I don’t think the spec needs to change here, but best practices might be developed for communicating the shortest fragment URL as a default entry point.
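
To make that concern concrete, here's a tiny sketch (using Python's third-party uritemplate package and a made-up base URL; purely illustrative, not anything taken from the spec):

    # Sketch: with a template like /{?s,p,o}, the "shortest" URI
    # (no variables bound) is the root URL, i.e. the maximal fragment.
    # Requires the third-party "uritemplate" package.
    from uritemplate import URITemplate

    # Hypothetical TPF server root; the template shape mirrors the one above.
    template = URITemplate("http://example.org/{?s,p,o}")

    print(template.expand())              # -> http://example.org/   (the maximal fragment)
    print(template.expand(s="ex:alice"))  # -> http://example.org/?s=ex%3Aalice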

>> I obviously come at this from a SPARQL perspective, but as I began to look at the TPF spec and the DBpedia fragments server, I had to do much more mucking about with curl and rapper than I had expected just to figure out what the URI template was. I had expected there to be an explanation on the front page pointing me at an entry point URI, instead of having it be silently linked from the “Welcome to DBPedia!” text near the top (via a link that is, confusingly, completely different from the one on the “DBpedia – Linked Data Fragments” page header text).
> 
> I fully agree there are usability issues with the current DBpedia fragments landing page.
> It is a quickly created hybrid between http://data.linkeddatafragments.org/ and http://client.linkeddatafragments.org/.
> 
> Note that, funnily, the machine-readable version is actually better and more complete:
>    $ curl -s -H 'Accept: text/turtle' http://fragments.dbpedia.org/ | grep "> a void:Dataset"
>    <http://fragments.dbpedia.org/2014/en#dataset> a void:Dataset;
>    <http://fragments.dbpedia.org/2014/nl#dataset> a void:Dataset;
>    <http://fragments.dbpedia.org/#dataset> a void:Dataset;
> It leads you straight to the dataset (and exposes one that is hidden in the HTML version ;-)
> 
> I will make the DBpedia human landing page more clear at some point.
> For the time being, http://data.linkeddatafragments.org/ is the better example.
> 
>> Parsing the result in a streaming fashion doesn’t help much if the response payload contains an entire, huge dataset, and the hypermedia controls are only appended to the end of the dataset triples.
> 
> Fully agree.
> 
>> FWIW, appending the hypermedia controls to the data seems to be exactly what the DBPedia server does.
> 
> Actually, the hypermedia controls are output as soon as possible
> (https://github.com/LinkedDataFragments/Server.js/blob/v1.1.3/lib/writers/TurtleWriter.js#L51):
> you'll encounter fragments with the controls on top, in the middle, and at the bottom.

Oops. This was my mistake. I looked too quickly at the output and saw some of the hydra metadata at the end of the content and assumed that’s where the hypermedia controls were, too.
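
For what it's worth, the kind of client behaviour I had in mind looks roughly like this (a crude sketch only: it streams a fragment page and stops as soon as it sees the count metadata; a real client would use a proper streaming Turtle parser rather than this naive line scan, and the void:triples predicate is just my assumption about where the count appears):

    # Crude sketch: stream a fragment page and stop reading once the
    # estimated-count metadata has been seen, instead of buffering the
    # whole (possibly huge) response.  The naive line scan and the
    # void:triples predicate are illustrative assumptions only.
    import re
    import requests

    COUNT_PATTERN = re.compile(r'void:triples\s+"?(\d+)"?')

    def estimated_count(fragment_url):
        with requests.get(fragment_url, headers={"Accept": "text/turtle"},
                          stream=True, timeout=30) as resp:
            resp.raise_for_status()
            for line in resp.iter_lines(decode_unicode=True):
                match = COUNT_PATTERN.search(line or "")
                if match:
                    return int(match.group(1))  # stop early; the connection closes on exit
        return None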

>> It might be a benefit for a server to choose to page, but that isn’t the concern I asked about. As someone thinking about implementing a *client*, I can’t rely on it being sensible for a server to do the right thing.
> 
>> This is an issue I think needs to be discussed if the spec is going to leave the choice of paging entirely to the implementation and/or the server configuration.
> 
> You're right. How about making it a SHOULD in the spec,
> detailing the reason (i.e., only needing controls)?

Yes, I’d be very happy to see the spec recommend paging (and mention the potential pitfalls if implementations choose not to page).


>> OK. Are you opposed to the idea of a separate entry point URL that contains the hypermedia controls but isn’t a fragment?
> 
> I'm certainly not opposed to that. In fact, nothing prevents a server from doing so.
> I fully agree with Kjetil here:
> 
>>> FWIW, I stuffed this in the general VoID description in my implementation. I think it is useful as a best practice to have the controls not just in the fragments; discovery of them is also an issue.
> 
> 
> However, I don't think this should be part of the spec, for the very reason that it is already possible.
> That really illustrates the flexibility of hypermedia APIs.

Understood. What I’ve been trying to communicate, though, is that the spec leaves a lot of choices to the implementation, without discussing the impact of those choices.


> Note the similarity with SPARQL query <form>s, which are also not spec’ed.

I don’t understand the comparison here. Could you explain?

> (Since the TPF spec is, as far as I know, the first spec for a hypermedia API,
> we are obviously still learning here. An editorial review is in progress BTW.)

Yes, understood. And I hope my feedback is taken as trying to help that process, not just complain.

>>>> 2. Either require paging ("MUST" or at least "SHOULD") or require that there be a separate URL to retrieve estimated counts for a fragment.
>>> 
>>> I'm less in favor of this, as it complicates the API,
>>> but foremost, because the resulting media type would be less useful.
>>> If we compare this to the human Web, we wouldn't have a page
>>> that has only a number on it.
>> 
>> It would complicate the API, but without that or some other complication (like the one you discuss below), I don’t think TPFs can be effectively used for query answering as much of your work suggests.
> 
> We've found the HTTP request itself to be the main overhead.
> Sure, we could gain bandwidth by removing things,
> such as for instance the hypermedia controls.
> 
> However, the current API has a much more evolvable design.
> The server can, for instance, even send additional metadata.
> If we need a dedicated operation for a count,
> what do we do with the other pieces of metadata?
> 
> Hypermedia APIs have different design goals than RPC APIs.

My suggestion of a separate API for accessing triple counts was one possible way to deal with the problems that arise without paging, though I’ll admit it isn’t the nicest solution. My problem (and proposed solution) essentially goes away if the spec recommends paging (and if work on query execution using TPF addresses what happens when paging isn’t used).

>> It can obviously be used when certain assumptions hold, but those assumptions (like requiring paging) don’t seem to be explicit.
> 
> Paging seems indeed a major assumption here.
> If we make paging a SHOULD, would this address the concerns?

Yes!

>> Yes, that would address this issue. If nothing else, I think algorithms and pseudocode that discuss using the triple counts for query execution need to address the potential case where data is not paged.
> 
> In a sense, they do: the page size is then just infinity.
> The algorithms would be inefficient then, true, but they'd still work.
> Another reason for the SHOULD.

They might work in theory, but as an implementor I don’t take much comfort from that. For practical reasons, I’ll still have to guard against misbehaving servers that might try to send me an entire dataset when I request a fragment, and I’ll have to deal with the potential bad outcomes of such a request, like connection timeouts.
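
Concretely, I expect my client will end up doing something like the following (a defensive sketch with arbitrary limits, not anything mandated by the spec):

    # Defensive sketch: cap both the time and the number of bytes we're
    # willing to read for a single fragment request, in case a server
    # doesn't page and tries to send an entire dataset.  The limits are
    # arbitrary placeholders.
    import requests

    MAX_BYTES = 5 * 1024 * 1024   # give up after ~5 MB of response body
    TIMEOUT = (5, 30)             # connect / read timeouts, in seconds

    def fetch_fragment(url):
        body = bytearray()
        with requests.get(url, headers={"Accept": "text/turtle"},
                          stream=True, timeout=TIMEOUT) as resp:
            resp.raise_for_status()
            for chunk in resp.iter_content(chunk_size=8192):
                body.extend(chunk)
                if len(body) > MAX_BYTES:
                    raise RuntimeError("%s exceeded %d bytes; the server "
                                       "does not appear to be paging" % (url, MAX_BYTES))
        return bytes(body)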

>> I’m afraid I wasn’t very clear about my concern here. The NOTE in section 3.6 talks about what the server should do when it receives various requests from a client. It gives two example URLs that seem like they should reference the same resource, the first with "?s=a&page=3" and the second with "?page=3&s=a". The text doesn’t indicate where those URLs came from. We’re left to assume that they originated with the server, but intuitively the server should have only generated one of those (either that or the server implementation is order-agnostic with respect to the query parameters).
> 
> I agree the example in the paper might be confusing indeed.
> Perhaps we should find something more simple.
> 
> This paragraph is supposed to mean:
> ensure the IRI you use to describe the fragment with in your representation
> is exactly the same IRI through which the fragment was requested.
> 
> Reading that again… yeah, we can do with a simpler example :-)

If that’s all you’re trying to say, then I think we’re in agreement (on both the intended point and the desire for a simpler example).
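
That said, on the client side I’ll probably guard against the parameter-order case anyway. Something like this minimal sketch (purely illustrative, with a made-up URL) treats two fragment IRIs as equivalent if they differ only in query parameter order:

    # Workaround sketch, not something the spec requires: treat two
    # fragment IRIs as equivalent if they differ only in the order of
    # their query parameters.  Ideally the server simply echoes back
    # the exact IRI that was requested.
    from urllib.parse import urlsplit, parse_qsl

    def same_fragment_iri(a, b):
        pa, pb = urlsplit(a), urlsplit(b)
        return ((pa.scheme, pa.netloc, pa.path) == (pb.scheme, pb.netloc, pb.path)
                and sorted(parse_qsl(pa.query)) == sorted(parse_qsl(pb.query)))

    assert same_fragment_iri("http://example.org/?s=a&page=3",
                             "http://example.org/?page=3&s=a")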

thanks,
.greg

Received on Tuesday, 20 January 2015 23:44:34 UTC