Re: [dxwg] Profile negotiation [RPFN] from Annette Greiner on 2018-06-06 (public-dxwg-wg@w3.org from June 2018)

From: Annette Greiner <amgreiner@lbl.gov>
Date: Tue, 5 Jun 2018 18:17:57 -0700
To: Rob Atkinson <rob@metalinkage.com.au>
Cc: Ruben Verborgh <Ruben.Verborgh@ugent.be>, "public-dxwg-wg@w3.org" <public-dxwg-wg@w3.org>
Message-ID: <f6b03a6d-5d97-d238-81e2-435c59cc1974@lbl.gov>

You say:

"This provides an optional mechanism that significantly simplifies the 
user experience - at the cost of more server smarts.  Server smarts are 
paid for once is the good news in that scenario. currently all the 
burden is on the user with no standardised mechanisms and the user pays 
(or in practice more likely is unable to access data)"

In that paragraph, you refer to "the user" and I don't know what user 
you mean. Can you clarify that, please?


On 6/5/18 6:12 PM, Rob Atkinson wrote:
> The combination of dcterms:conformsTo on a distribution and 
> profileDesc gives that option for a catalog using DCAT...
>
> And all of the above are clients a user might use...
>
>
> On Wed, 6 Jun 2018, 10:57 Annette Greiner <amgreiner@lbl.gov 
> <mailto:amgreiner@lbl.gov>> wrote:
>
>     One thing to consider is how important it might be for data
>     catalogs to have information about what profile options are
>     available. As an end user seeking a dataset with which I might
>     build an app or a visualization, I think it would be nice to be
>     able to select a profile in a search and get a list of datasets
>     that use that profile and also meet my other criteria. I wouldn't
>     want to have to test each result separately to see if I can use it.
>
>     But I want to understand who you mean by "the user" here. A human?
>     A developer writing a web app? A script? Who are they the client
>     of? The original publisher? The data catalog publisher?
>
>
>     On 6/5/18 5:22 PM, Rob Atkinson wrote:
>>
>>     This is what this UC is trying to cover ..
>>     https://github.com/w3c/dxwg/issues/239
>>
>>     All you say is correct - and at one level profile negotiation
>>     adds a mechanism which is extra complexity. From the user's
>>     perspective, however, it means that an object identifier becomes a
>>     potential source of the meta-information - you don't have all the
>>     extra complexity of dealing with a catalog to find this info - or
>>     even finding the right catalog. The server is its own catalog, if
>>     you like (and in fact it may even be implemented that way).
>>
>>     This provides an optional mechanism that significantly simplifies
>>     the user experience - at the cost of more server smarts.  Server
>>     smarts are paid for once is the good news in that scenario.
>>     currently all the burden is on the user with no standardised
>>     mechanisms and the user pays (or in practice more likely is
>>     unable to access data)
>>
>>     So - a great conversation to keep in mind all these factors and
>>     see if we can find the right set of tools and recommendations for
>>     the best solution for a Web of Data outcome, recognising that
>>     point solutions for smaller communities already exist and will
>>     remain attractive. Cataloguing these things is still probably the
>>     only option. Just better if we have one information model for
>>     both cases.
>>
>>
>>
>>
>>
>>     On Wed, 6 Jun 2018 at 10:09 Annette Greiner <amgreiner@lbl.gov
>>     <mailto:amgreiner@lbl.gov>> wrote:
>>
>>         What I'm seeing a requirement for is a standardized way to
>>         indicate the
>>         availability of alternative forms of a dataset with different
>>         profiles
>>         and to enable the end user (human or script) to receive the most
>>         appropriate one for their use.
>>
>>         Consider the case where the client is a human, browsing to
>>         find a
>>         dataset that matches a certain profile that they like. If
>>         they are using
>>         a typical commercial browser, they don't have a ready
>>         facility to use
>>         content negotiation.
>>
>>         Consider the case where the client is a script harvesting
>>         datasets for a
>>         catalog. If the catalog publishers want to be able to
>>         indicate which
>>         profiles are available for a dataset, they need to capture a
>>         list of
>>         available profile options. Using content negotiation, they
>>         need to make
>>         a request and then capture the list of available formats that
>>         the server
>>         returns in the header. For that to work, the script needs to
>>         be written
>>         to expect negotiation as one way it can get such data. If
>>         everyone
>>         publishes their data this way, that's fine. But what if content
>>         negotiation by profile follows the adoption trend of content
>>         negotiation
>>         by other dimensions? Then the script would need to expect
>>         other means of
>>         offering the list of possible profiles. Certainly at least
>>         initially,
>>         adoption will be low. So adding negotiation to the mix adds
>>         complexity
>>         rather than removing it.
>>
>>         Consider the case where the client is a script for a web
>>         application.
>>         The script needs data with a specific profile to work at
>>         all.  This case
>>         works with negotiation, but it's not clear to me that it
>>         wouldn't work
>>         as well with a link-based approach, e.g. a link with an
>>         attribute that
>>         indicates its profile. The threshold to use on the
>>         publisher's side is
>>         extremely low for that approach. On the client side, it's
>>         easier and
>>         faster to check an attribute in a link than to try to follow
>>         it and then
>>         parse the header to see if you received what you wanted.
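
As an illustration of the link-based check described above, here is a
minimal Python sketch. It assumes the publisher lists the alternatives in
an HTTP Link header and marks each one with a "profile" attribute. The
attribute name, the example.org URLs, and the helper names are
hypothetical and are not taken from any proposal in this thread.

```python
import re

# Hypothetical Link header value. The "profile" attribute is an assumed
# extension attribute, not a registered link parameter.
LINK_HEADER = (
    '<http://example.org/data/x-a.json>; rel="alternate"; '
    'profile="http://example.org/profiles/a", '
    '<http://example.org/data/x-b.json>; rel="alternate"; '
    'profile="http://example.org/profiles/b"'
)

# Simplified grammar: only handles attributes with double-quoted values.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_links(header):
    """Return a list of (target URL, attribute dict) pairs."""
    return [
        (target, dict(PARAM_RE.findall(params)))
        for target, params in LINK_RE.findall(header)
    ]


def find_by_profile(header, wanted_profile):
    """Return the first link target whose profile attribute matches, else None."""
    for target, attrs in parse_links(header):
        if attrs.get("profile") == wanted_profile:
            return target
    return None
```

Under these assumptions, the client can pick the right URL by calling
find_by_profile(LINK_HEADER, "http://example.org/profiles/b") before it
requests any of the alternatives.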
>>
>>         Re registration, if you want user agents to be able to do
>>         anything with
>>         your MIME type other than download it, it needs to be
>>         registered. I
>>         suppose that, if the profile creator wants user agents to be
>>         able to do
>>         anything profile-specific with a dataset, they would supply a
>>         dereferenceable IRI.
>>
>>         Re representations vs resources, I think we agree that they are
>>         something of a continuum. That's what I mean when I say it's
>>         a choice
>>         whether to treat an entity as one or the other. I'm thinking
>>         of content
>>         negotiation, where a resource is a thing with a URL and a
>>         representation
>>         is a version of it that a user agent may receive depending on
>>         the accept
>>         headers in the request.
>>
>>         -Annette
>>
>>
>>
>>         On 6/5/18 2:13 PM, Ruben Verborgh wrote:
>>         > Hi Annette,
>>         >
>>         >> What do you mean? Links are already available in HTTP.
>>         > Yeah, but you'd need a standardized way to say
>>         > "this link points to representation of X with profiles Y, Z"
>>         >
>>         >>> Content negotiation is simply an existing mechanism
>>         >>> for connecting a resource to representations,
>>         >>> so reusing it seems better than inventing a new
>>         link-based negotiation mechanism.
>>         >> You are assuming the need for negotiation. That's what I'm
>>         asking you to justify.
>>         > No, I'm assuming a need for clients
>>         > to automatically find the representation they want,
>>         > and I'm proposing content negotiation for that
>>         > as opposed to a link-only mechanism.
>>         >
>>         >>> Furthermore, linking assumes that there is a finite
>>         number of representations,
>>         >>> and not a combinatorial explosion of all combinations
>>         that can be made.
>>         >> There *is* a finite number of representations that would
>>         be available.
>>         > Finite, yes. Necessarily small, no.
>>         >
>>         >> You would have to configure the server to return the right
>>         representations, and you would have to have created each of
>>         those representations.
>>         > In any case, but that's independent of the mechanism to
>>         find them.
>>         >
>>         >>> Finally, it integrates with negotiation in other
>>         dimensions, such as
>>         >>> "give me the French document in XML conforming to
>>         profiles X, Y, Z".
>>         >> Yes, that is nice. But there are other possible dimensions
>>         to data. Why negotiate for this one?
>>         > Quite the contrary: let's negotiate all dimensions.
>>         > We already do this for content type and language.
>>         >
>>         >> One can think of different versions of datasets as
>>         different resources if one wants.
>>         > Yes, the usage of content negotiation does not alter that.
>>         >
>>         >> In fact, one could argue that it is always a different
>>         resource because it contains different values.
>>         > Sure, but that is independent of the mechanism to arrive at
>>         the right one.
>>         >
>>         >> It's a choice to decide that it should be treated as a
>>         representation. What motivates that choice?
>>         > You seem to use "representation" as an opposite of
>>         "resource", but that's not correct.
>>         > As I've explained on GitHub, "representation" is a relative
>>         notion, not an absolute one:
>>         >
>>         >>> To understand this, it's important to see that the
>>         "representation" concept is a relative notion. E.g., in the
>>         sentence "A is a representation of B", B is the resource that A
>>         is the representation of. However, A is a resource in its own
>>         right.
>>         >>>
>>         >>> An example to clarify:
>>         >>>
>>         >>>     1. http://example.org/weather/amsterdam/2018-06-01 is
>>         the weather report for Amsterdam for 1 June
>>         >>>     2.
>>         http://example.org/weather/amsterdam/2018-06-01.html is the
>>         weather report for Amsterdam for 1 June in HTML
>>         >>>
>>         >>> Regardless of whether 2 has its own URL, all of the
>>         following hold:
>>         >>>
>>         >>>     • 1 is a resource
>>         >>>     • 2 is a resource
>>         >>>     • 2 is a representation of 1
>>         >>>> Why is automated discovery needed?
>>         >>> Because it's a manual thing otherwise.
>>         >> That is a tautology.
>>         > I'll try to explain better.
>>         >
>>         > If you have a client that fetches resources represented in
>>         a certain profile,
>>         > do you want it to ask you every time what link it should
>>         follow,
>>         > or do you want it to be able to select the right link itself?
>>         >
>>         >>> You don't want your client to ask you what links to follow.
>>         >> Why not? That is how hypermedia APIs work.
>>         > Nothing in hypermedia APIs requires clients to ask you such
>>         things.
>>         > It is a possibility, but not a requirement.
>>         >
>>         >> Adding negotiation as a new alternative means that
>>         crawling the web of data has to involve checking for profile
>>         options by content negotiation in addition to checking what
>>         is available through links.
>>         > You're still free to link to them.
>>         >
>>         >> But I get the feeling you have a specific use case in mind
>>         where this all makes immediate sense. *What is that use case?*
>>         > I have a client that can read certain JSON profiles.
>>         > I want that client to operate on dataset X.
>>         > The client should be able to get X in a profile it understands.
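
The use case above can be sketched as a single selection step. The
Python below is only an illustration: it assumes the client states the
profiles it understands as (IRI, weight) pairs, in the manner of q-values
in existing content negotiation. How those preferences would travel in a
request is not settled in this thread, so the sketch covers the selection
logic only. All names, IRIs, and URLs are hypothetical.

```python
# Hypothetical profiles in which dataset X is available, mapped to the
# URL of the matching representation.
AVAILABLE = {
    "http://example.org/profiles/a": "http://example.org/data/x-a.json",
    "http://example.org/profiles/b": "http://example.org/data/x-b.json",
}

# Hypothetical client preferences: (profile IRI, weight between 0 and 1).
ACCEPTED = [
    ("http://example.org/profiles/c", 1.0),
    ("http://example.org/profiles/b", 0.8),
    ("http://example.org/profiles/a", 0.5),
]


def choose_profile(accepted, available):
    """Return (profile IRI, URL) for the best mutually supported profile.

    Returns None when the client and the server share no profile.
    """
    # sorted() is stable, so equal weights keep the client's own order.
    ranked = sorted(accepted, key=lambda pair: pair[1], reverse=True)
    for iri, weight in ranked:
        # A weight of 0 means "not acceptable", as with q=0 in Accept.
        if weight > 0 and iri in available:
            return iri, available[iri]
    return None
```

With the sample data, choose_profile(ACCEPTED, AVAILABLE) skips profile c,
which the server does not offer, and settles on profile b.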
>>         >
>>         >> Registration of new MIME types is needed.
>>         > I'm afraid that's not correct.
>>         > I can just start using application/vnd.my-thing whenever I
>>         want to,
>>         > and I do not need to register that with IETF.
>>         >
>>         >> How do you get around new profiles needing to be registered?
>>         > You mint an IRI for them.
>>         >
>>         > Best,
>>         >
>>         > Ruben
>>
>>         -- 
>>         Annette Greiner
>>         NERSC Data and Analytics Services
>>         Lawrence Berkeley National Laboratory
>>
>>
>
>     -- 
>     Annette Greiner
>     NERSC Data and Analytics Services
>     Lawrence Berkeley National Laboratory
>

-- 
Annette Greiner
NERSC Data and Analytics Services
Lawrence Berkeley National Laboratory
Received on Wednesday, 6 June 2018 01:18:32 UTC