RE: Comments on WD-HTTP-in-RDF-20070301

Hi Shadi

Thanks for your reply - clarifications below ...

> -----Original Message-----
> From: Shadi Abou-Zahra [mailto:shadi@w3.org]
> Sent: 15 March 2007 09:43
> To: Jo Rabin
> Cc: public-wai-ert@w3.org
> Subject: Re: Comments on WD-HTTP-in-RDF-20070301
> 
> Hi Jo,
> 
> Thank you for your comments on our internal draft. We were about to
> publish this document as an updated Working Draft in the next few days
> so don't be surprised if you see it pop up, we will address your
> comments either before or after publication. Please find some initial
> responses to your comments inline below:
> 
> 
> Jo Rabin wrote:
> > 1. Is it in scope of this work to represent when a request fails outside
> > of HTTP - e.g. the response is not valid HTTP, or the TCP connection fails
> > for some reason or another. It would be convenient to have a consistent
> > representation of success and failure cases.
> 
> The current scope is strictly focused on recording the request/response
> messages rather than making any statements about them (such as "valid",
> "successful", "failed", etc.). Can you elaborate a use case scenario to
> help us consider such a scope extension?

Our use case is that we want to gather together all the resources that are
relevant to mobileOK for a particular URI. So if resolving the URI results
in an HTML document containing references to stylesheets and to images, for
example, we will go and get those and record the retrievals. If for some
reason a TCP connection cannot be established (DNS resolution failure, for
example) - or the connection resets after some receipt of data, then we'd
like to record that as part of the test result. Rather than establishing a
mechanism that is orthogonal to the HTTP-in-RDF mechanism it would be
convenient to use an elaboration of it so that we can show we tried to get
the resource, but it failed because of network problems, in the same format
as we show that we tried to get the resource but it failed through e.g. a
5xx error. I'm not sure we care terribly whether the failure is caused by
server mal-operation or network mal-operation. A failure is a failure.
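
To make that concrete, here is a rough Python sketch of what I mean by "the
same format" for both kinds of failure. The RetrievalRecord type, its field
names and the retrieve() helper are my own illustrative inventions, not
anything from the HTTP-in-RDF draft; the opener parameter exists only so the
sketch can be exercised without a network.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetrievalRecord:
    """Hypothetical uniform record for one retrieval attempt."""
    uri: str
    outcome: str                  # "response" or "network-failure"
    status: Optional[int] = None  # HTTP status code, if a response arrived
    detail: Optional[str] = None  # free-text reason for a network failure

    @property
    def failed(self) -> bool:
        # A failure is a failure: no response at all, or a 4xx/5xx status.
        return self.status is None or self.status >= 400


def retrieve(uri, timeout=10.0, opener=urllib.request.urlopen):
    """Attempt a retrieval and always return a RetrievalRecord."""
    try:
        with opener(uri, timeout=timeout) as resp:
            return RetrievalRecord(uri, "response", status=resp.status)
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 5xx.
        return RetrievalRecord(uri, "response", status=exc.code)
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, reset, timeout and so on.
        return RetrievalRecord(uri, "network-failure", detail=str(exc))
```

The sketch does not read the body, so a reset part-way through the data would
need the same treatment around the read.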

> 
> 
> > 2. It would be useful to timestamp requests and responses.
> 
> Also pushes the scope a little even though it seems useful, I'll bring
> this back to the group.

OK, I regret that I haven't examined any scope-related documentation; I'm
just looking at the suitability of the spec for meeting our requirements.

Just to reinforce this pushing of the envelope: in many cases what you get
from a URI is more dependent on the time of the retrieval than on the
headers presented, so this may be more important information to us.

> 
> 
> > 3. I understand that there is an extension mechanism in HTTP for the
> > request method. Is this modelled in this specification?
> 
> Do you mean for providing additional headers or something else? Would be
> good if you can give a pointer to an RFC or such.
> 
Yes, sure, should have included the reference before:

http://www.ietf.org/rfc/rfc2616.txt

5.1.1 Method

   The Method  token indicates the method to be performed on the
   resource identified by the Request-URI. The method is case-sensitive.

       Method         = "OPTIONS"                ; Section 9.2
                      | "GET"                    ; Section 9.3
                      | "HEAD"                   ; Section 9.4
                      | "POST"                   ; Section 9.5
                      | "PUT"                    ; Section 9.6
                      | "DELETE"                 ; Section 9.7
                      | "TRACE"                  ; Section 9.8
                      | "CONNECT"                ; Section 9.9
                      | extension-method
       extension-method = token
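
As a rough illustration of what an application would have to cope with, the
following Python sketch classifies a method string against the grammar quoted
above. The separator list is taken from section 2.2 of the same RFC; the
function names and the three labels are purely my own.

```python
# The eight methods enumerated in RFC 2616 section 5.1.1.
STANDARD_METHODS = {
    "OPTIONS", "GET", "HEAD", "POST", "PUT", "DELETE", "TRACE", "CONNECT",
}

# Separators from RFC 2616 section 2.2 (includes space and horizontal tab).
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')


def is_token(value: str) -> bool:
    """True if value is one or more CHARs that are not CTLs or separators."""
    return bool(value) and all(
        32 < ord(ch) < 127 and ch not in SEPARATORS for ch in value
    )


def classify_method(method: str) -> str:
    """Return "standard", "extension" or "invalid" (illustrative labels)."""
    if method in STANDARD_METHODS:  # comparison is case-sensitive
        return "standard"
    if is_token(method):
        return "extension"
    return "invalid"
```

Note that "get" in lower case comes out as an extension method here, because
the method is case-sensitive.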


> 
> > 4. It's potentially useful to record both the absolute URI used in a
> > request and the relative URI that was used to form it - e.g. when checking
> > links from an HTML document.
> 
> Currently we only record whatever was sent to/from the server without
> any transformation or interpretation. Even such an expansion is left to
> the application level.
> 
I can see that it may be of borderline relevance in many cases, but in our
case it will be useful to be able to check that the checker has offset a
relative reference to a base correctly.
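
For example, a minimal Python sketch of the kind of record I have in mind,
keeping the base, the reference as written and the resolved URI side by side
so that the resolution can be audited. The record_reference() helper and the
dictionary keys are hypothetical; urljoin is the standard library resolver.

```python
from urllib.parse import urljoin


def record_reference(base: str, reference: str) -> dict:
    """Keep the original reference alongside the absolute URI derived from it."""
    return {
        "base": base,
        "reference": reference,
        "absolute": urljoin(base, reference),
    }


entry = record_reference("http://example.org/a/b.html", "../style/main.css")
# entry["absolute"] is "http://example.org/style/main.css"
```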

> 
> > 5. I'm not clear as to what normalisation is pre-supposed on the
> > contents of the various header field values. For our purposes it would be
> > useful to have those values in a normal form, where possible. Equally it
> > would be useful, for audit purposes, to have a literal representation of
> > the unprocessed headers.
> 
> Can you reformulate the question, I'm not sure I quite understand it.

1. For audit purposes it will be useful to (optionally) have a textual
transcription of the headers, as they appeared on the connection, to show
that they have been correctly formulated in HTTP-in-RDF.

2. nocache and NOCACHE, for example, are both fine as values. Someone has to
normalise the values somewhere and since the application that is producing
HTTP-in-RDF is already manipulating the Headers it will be useful to have
the values in a normal form with respect to capitalisation and white-space.
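
A rough Python sketch of the two points together: keep the header line exactly
as received, and alongside it a normalised name and value. The function name,
the dictionary keys and the short list of headers whose values I assume are
safe to case-fold are all my own illustration, not part of the draft.

```python
import re

# Assumption: header names whose values may be lower-cased without changing
# their meaning. Illustrative and deliberately short; values such as ETag or
# cookies must not be case-folded.
CASE_INSENSITIVE_VALUES = {"cache-control", "pragma", "connection"}


def normalise_header(raw_line: str) -> dict:
    """Return the raw header line plus a normalised name and value."""
    name, _, value = raw_line.partition(":")
    name = name.strip().lower()                    # field names are case-insensitive
    value = re.sub(r"[ \t]+", " ", value.strip())  # collapse runs of white-space
    if name in CASE_INSENSITIVE_VALUES:
        value = value.lower()
    return {"raw": raw_line, "name": name, "value": value}
```

The "raw" entry serves the audit purpose in point 1; the other two serve the
normal form in point 2.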

> 
> 
> > 6. It would be useful for those header field values that have structure
> > to be represented so that their components are exposed in a way that
> > allows easy access via XPATH expressions.
> 
> Do you mean pre-parsing the literal values and expressing them in RDF
> vocabulary? For example, currently the content-type header contains a
> literal value like "application/xhtml+xml". This is the string sent by
> the server as-is. Do you mean having explicit RDF terms for certain
> values such as "application/xhtml+xml"?
> 
No, not a vocabulary as I think that would suffer extensibility problems.

What I mean is, for example, that the Content-Type value can be something
like:

application/xhtml+xml; charset=UTF-8

Rather than asking each user application to implement a parser for this, it
would be very handy for HTTP-in-RDF to specify a decomposition of field
values and their parameters so that the information is more readily
accessible.
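
The decomposition I mean is roughly what this Python sketch produces. It is a
deliberately simple illustration with a made-up function name: it does not
cope with quoted parameter values that themselves contain a semicolon.

```python
def split_content_type(value: str):
    """Split a Content-Type value into (media type, parameter dictionary)."""
    media_type, _, rest = value.partition(";")
    params = {}
    for part in rest.split(";"):
        name, sep, val = part.partition("=")
        if sep:  # skip empty or malformed fragments
            params[name.strip().lower()] = val.strip().strip('"')
    return media_type.strip().lower(), params


media_type, params = split_content_type("application/xhtml+xml; charset=UTF-8")
# media_type is "application/xhtml+xml", params is {"charset": "UTF-8"}
```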


> 
> > 7. It's a little inconvenient to have two different representations for
> > Headers. Is it an error to use an additionalHeader object where a
> > specific object could have been used?
> 
> Yes, we should elaborate this (possibly in the conformance section)
> -additionalHeader is only intended to be used for expressing headers
> that are not listed in the schema.

As I mentioned, from the user application point of view, this means that you
have to implement two different access methods, which is inconvenient but
not terminal.

> 
> 
> > 8. You provide a linkage between a request and its response - it might
> > be useful to provide also a linkage between a response and a request
> > (either the one that it relates to, or more interestingly, perhaps, a
> > request that was triggered by a redirect or by following a link within
> > the body).
> 
> The first case should be covered by RDF, you can query the data to find
> all requests that are related to a response.
> 
> As to the latter case, this would require an interpretation of the
> content and is currently out of scope. Specifically different user
> agents may trigger different reactions to a response, for example load
> or ignore linked CSS pages, RSS feeds etc. It would be significantly
> tricky to relate requests/responses, thus currently we only record the
> mere interaction.
> 
> 
> > 9. On the representation of the Body
> > a. When you say "XML?" in the flow chart, do you mean that the content-type
> > indicates XML or do you mean that the content is well-formed XML?
> 
> Well-formed XML.


Isn't there a problem with things parsing as XML but not being intentionally
XML?

> 
> 
> > b. If the content is XML delivered with the content type text/html then
> > is this considered XML?

How do you know it is XML if it is delivered with that content type?

> 
> Yes, if it is well-formed (and doesn't break the RDF document it is
> embedded in).
> 
> 
> > c. Isn't there the possibility that a malformed document would break
> > this document when included?
> 
> It needs to be well-formed, and an additional check for impact on the
> RDF environment is also required (but not elaborated in the description
> of the algorithm, we will fix this).
> 
> 
> > d. Is there an issue with the use of a CDATA section? What happens if
> > the data itself contains a CDATA section?
> 
> Yup, we missed a "does it break the RDF? -> then record as a byte
> sequence" step in the algorithm.

I meant to add that it is very important to us to record the XML declaration
and the DOCTYPE. Clearly inclusion of either of these will immediately break
the document.
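
To illustrate, a small Python sketch of the "does it break the enclosing
document?" check, falling back to a byte-sequence encoding when it does. The
wrapper element, the function name and the "literal"/"base64" labels are my
own stand-ins rather than anything in the draft, and the check assumes UTF-8.

```python
import base64
import xml.etree.ElementTree as ET


def encode_body(body: bytes):
    """Return ("literal", text) if the body can be embedded as-is inside an
    XML element, otherwise ("base64", encoded) so nothing is lost."""
    try:
        text = body.decode("utf-8")
        # Stand-in for the enclosing RDF/XML element: if the combination is
        # not well-formed, embedding the body would break the document.
        ET.fromstring("<wrapper>" + text + "</wrapper>")
        return ("literal", text)
    except (UnicodeDecodeError, ET.ParseError):
        return ("base64", base64.b64encode(body).decode("ascii"))
```

With this check, a body that starts with an XML declaration or a DOCTYPE, or
that contains a stray end tag such as </http:body>, ends up in the base64
branch, which at least preserves the declaration and DOCTYPE byte for byte.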

> 
> 
> > e. Is there an issue with transparency of data - in that if the body
> > itself contains the literal string </http:body> does this cause a problem?
> 
> See above.
> 
> 
> > 10. HTTP Response code - what should one do if the response code is not
> > one of those enumerated?
> 
> Good point, we have a closed enumeration right now. This is because it
> is the enumeration in the respective RFC but I agree that it should be
> extensible for other purposes.
> 
> 
> > 11. It would be useful to record the size of the headers and the body.
> 
> You keep trying to push the scope, ay? ;) Similar to the timestamps in
> #2, this seems like a scope creep though quite easy to do.

As above, I apologise that I didn't inform myself on your scope, but from an
application perspective having this information in the same place as the
rest of the HTTP information would be very convenient.

> 
> 
> > 12. I just realised that the connection structure is intended for
> > modelling the requests on a keep-alive connection ... the order of the
> > requests is significant, I suppose?
> 
> Yes, I think there should be a sequence list somewhere (currently the
> order is not captured by default).
> 
> 
Thanks
Jo

Received on Thursday, 15 March 2007 10:21:33 UTC