Re: Comments on WD-HTTP-in-RDF-20070301

Hi Jo,

Thanks for further clarifying your comments. I think we are mostly set 
now, and the ball is in our court to address them. Please find some 
additional comments below.


Jo Rabin wrote:
>>> 1. Is it in scope of this work to represent when a request fails
>>> outside of HTTP - e.g. the response is not valid HTTP, or the TCP
>>> connection fails for some reason or another. It would be convenient to
>>> have a consistent representation of success and failure cases.
>> The current scope is strictly focused on recording the request/response
>> messages rather than making any statements about them (such as "valid",
>> "successful", "failed", etc.). Can you elaborate a use case scenario to
>> help us consider such a scope extension?
> 
> Our use case is that we want to gather together all the resources that are
> relevant to mobileOK for a particular URI. So if resolving the URI results
> in an HTML document containing references to stylesheets and to images, for
> example, we will go and get those and record the retrievals. If for some
> reason a TCP connection cannot be established (DNS resolution failure, for
> example) - or the connection resets after some receipt of data, then we'd
> like to record that as part of the test result. Rather than establishing a
> mechanism that is orthogonal to the HTTP-in-RDF mechanism it would be
> convenient to use an elaboration of it so that we can show we tried to get
> the resource, but it failed because of network problems, in the same format
> as we show that we tried to get the resource but it failed through e.g. a
> 5xx error. I'm not sure we care terribly whether the failure is caused by
> server mal-operation or network mal-operation. A failure is a failure.

Right. Our model was that such information should go into EARL rather 
than into the HTTP vocabulary. Here is a sample scenario:

1. person initiates mobileOK checker for a given URI
2. mobileOK checker attempts to access the URI document
3. HTTP vocabulary is used to record the request/response
4. EARL is used to record the result of the given check

In your case, step 2 is not successful. The checker should still record 
whatever request it sent to the server and whatever it may have gotten 
in response (if available); then record a "cannotTell" test result (with 
an appropriate description) in EARL. Does that work for your needs?
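
For illustration, such a combination might look roughly like the 
following Turtle (only a sketch; the namespaces and several term names, 
e.g. http:GetRequest, http:requestURI and dc:description, are my 
assumptions and may not match the drafts exactly):

   @prefix http: <http://www.w3.org/2006/http#> .
   @prefix earl: <http://www.w3.org/ns/earl#> .
   @prefix dc:   <http://purl.org/dc/elements/1.1/> .

   # the request that was attempted; no response was received
   <#req1> a http:GetRequest ;
      http:requestURI "http://example.org/style.css" .

   # the EARL test result: the check could not be carried out
   <#result1> a earl:TestResult ;
      earl:outcome earl:cannotTell ;
      dc:description "DNS lookup failed; no response was received." .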


>>> 2. It would be useful to timestamp requests and responses.
>> Also pushes the scope a little even though it seems useful, I'll bring
>> this back to the group.
> 
> OK, I regret that I haven't examined any scope related documentation, I'm
> just looking at the suitability of the spec for meeting our requirements.
> 
> Just to reinforce this pushing of the envelope: in many cases what you get
> from a URI is more dependent on the time of the retrieval than on the
> headers presented, so this may be more important information to us.

If you use the HTTP vocabulary from within EARL, then you can use the 
dc:date timestamp that describes the Test Subject. But I see where you 
are coming from; as said, I will take this back to the group.
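
For example, roughly (again only a sketch; earl:TestSubject and the 
exact shape of the description are assumptions on my part):

   @prefix earl: <http://www.w3.org/ns/earl#> .
   @prefix dc:   <http://purl.org/dc/elements/1.1/> .

   # timestamp on the Test Subject, i.e. the resource as retrieved
   <#subject1> a earl:TestSubject ;
      dc:date "2007-03-15T11:00:00Z" .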


>>> 3. I understand that there is an extension mechanism in HTTP for the
>>> request method. Is this modelled in this specification?
>> Do you mean for providing additional headers or something else? Would be
>> good if you can give a pointer to an RFC or such.
>>
> Yes, sure, should have included the reference before:
> 
> http://www.ietf.org/rfc/rfc2616.txt
> 
> 5.1.1 Method
> 
>    The Method  token indicates the method to be performed on the
>    resource identified by the Request-URI. The method is case-sensitive.
> 
>        Method         = "OPTIONS"                ; Section 9.2
>                       | "GET"                    ; Section 9.3
>                       | "HEAD"                   ; Section 9.4
>                       | "POST"                   ; Section 9.5
>                       | "PUT"                    ; Section 9.6
>                       | "DELETE"                 ; Section 9.7
>                       | "TRACE"                  ; Section 9.8
>                       | "CONNECT"                ; Section 9.9
>                       | extension-method
>        extension-method = token

OK. Not sure if we have a mechanism other than subclassing the generic 
Request class. I'll check with the group.
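
For example, an extension method such as PATCH could in principle be 
expressed by declaring a subclass (a sketch only, assuming RDFS and the 
http: namespace of the draft; the ex: term is purely hypothetical):

   @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
   @prefix http: <http://www.w3.org/2006/http#> .
   @prefix ex:   <http://example.org/http-ext#> .

   # hypothetical subclass for an extension-method request
   ex:PatchRequest rdfs:subClassOf http:Request ;
      rdfs:label "PATCH request (extension method)" .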


>>> 4. It's potentially useful to record both the absolute URI used in a
>>> request and the relative URI that was used to form it - e.g. when
>>> checking links from an HTML document.
>> Currently we only record whatever was sent to/from the server without
>> any transformation or interpretation. Even such an expansion is left to
>> the application level.
>>
> I can see that it may be of borderline relevance in many cases, but in our
> case it will be useful to be able to check that the checker has offset a
> relative reference to a base correctly.

It seems you are looking at the vocabulary from an "inside the checker" 
perspective, whereas I think we mostly looked at it from a "between the 
client and server" view. Interesting feedback for us to consider.


>>> 5. I'm not clear as to what normalisation is pre-supposed on the
>>> contents of the various header field values. For our purposes it would
>>> be useful to have those values in a normal form, where possible.
>>> Equally it would be useful, for audit purposes, to have a literal
>>> representation of the unprocessed headers.
>> Can you reformulate the question, I'm not sure I quite understand it.
> 
> 1. For audit purposes it will be useful to (optionally) have a textual
> transcription of the headers, as they appeared on the connection, to show
> that they have been correctly formulated in HTTP-in-RDF.
> 
> 2. nocache and NOCACHE, for example, are both fine as values. Someone has to
> normalise the values somewhere and since the application that is producing
> HTTP-in-RDF is already manipulating the Headers it will be useful to have
> the values in a normal form with respect to capitalisation and white-space.

Ah, OK. We only do #1 so far; I think #2 refers again to the "inside 
checker" vs "between client/server" perspective issue.


>>> 6. It would be useful for those header field values that have structure
>>> to be represented so that their components are exposed in a way that
>>> allows easy access via XPATH expressions.
>> Do you mean pre-parsing the literal values and expressing them in RDF
>> vocabulary? For example, currently the content-type header contains a
>> literal value like "application/xhtml+xml". This is the string sent by
>> the server as-is. Do you mean having explicit RDF terms for certain
>> values such as "application/xhtml+xml"?
>>
> No, not a vocabulary as I think that would suffer extensibility problems.
> 
> What I mean is, for example, that the Content-Type value can be something
> like:
> 
> application/xhtml+xml; charset=UTF-8
> 
> rather than asking each user application to implement a parser for this it
> would be very handy for HTTP-in-RDF to specify a decomposition of field
> values and their parameters so that the information is more readily
> accessible.

Same as above; I will also take this back to the group.
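
Just to make sure I understand the suggestion, something roughly along 
these lines, where the ex: terms are purely hypothetical and 
http:fieldName/http:fieldValue stand in for whatever the draft uses for 
headers:

   @prefix http: <http://www.w3.org/2006/http#> .
   @prefix ex:   <http://example.org/header-structure#> .

   <#header1> http:fieldName "Content-Type" ;
      # the raw value, recorded as-is
      http:fieldValue "application/xhtml+xml; charset=UTF-8" ;
      # hypothetical decomposition into media type and parameters
      ex:mediaType "application/xhtml+xml" ;
      ex:parameter [ ex:paramName "charset" ; ex:paramValue "UTF-8" ] .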


>>> 7. It's a little inconvenient to have two different representations for
>>> Headers. Is it an error to use an additionalHeader object where a
>>> specific object could have been used?
>> Yes, we should elaborate this (possibly in the conformance section)
>> -additionalHeader is only intended to be used for expressing headers
>> that are not listed in the schema.
> 
> As I mention, from the user application point of view, this means that you
> have to implement two different access methods, which is inconvenient but
> not terminal.

OK, got your point now. Not sure how to address it though...


>>> 8. You provide a linkage between a request and its response - it might
>>> be useful to provide also a linkage between a response and a request
>>> (either the one that it relates to, or more interestingly, perhaps, a
>>> request that was triggered by a redirect or by following a link within
>>> the body).
>> The first case should be covered by RDF, you can query the data to find
>> all requests that are related to a response.
>>
>> As to the latter case, this would require an interpretation of the
>> content and is currently out of scope. Specifically different user
>> agents may trigger different reactions to a response, for example load
>> or ignore linked CSS pages, RSS feeds etc. It would be significantly
>> tricky to relate requests/response thus currently we only record the
>> mere interaction.
>>
>>
>>> 9. On the representation of the Body
>>> a. when you say XML? in the flow chart, do you mean that the
>>> content-type indicates XML or do you mean that the content is
>>> well-formed XML?
>> Well-formed XML.
> 
> 
> Isn't there a problem with things parsing as XML but not being intentionally
> XML?
> 
>>
>>> b. If the content is XML delivered with the content type text/html
>>> then is this considered XML?
> 
> How do you know it is XML if it is delivered with that content type?
> 
>> Yes, if it is well-formed (and doesn't break the RDF document it is
>> embedded in).
>>
>>
>>> c. Isn't there the possibility that a malformed document would break
>>> this document when included?
>> It needs to be well-formed, and an additional check for impact on the
>> RDF environment is also required (but not elaborated in the description
>> of the algorithm, we will fix this).
>>
>>
>>> d. Is there an issue with the use of a CDATA section? What happens if
>>> the data itself contains a CDATA section?
>> Yup, we missed a "does it break the RDF? -> then record as a byte
>> sequence" step in the algorithm.
> 
> I meant to add that it is very important to us to record the XML declaration
> and the DOCTYPE. Clearly inclusion of either of these will immediately break
> the document.
> 
>>
>>> e. Is there an issue with transparency of data - in that if the body
>>> itself contains the literal string </http:body> does this cause a
>>> problem?
>> See above.

Valid points, will consider them and refine the algorithm accordingly.
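
For example, one possible fallback (purely a sketch; whether 
xsd:base64Binary or some other encoding would be used is an open 
question) would be to record such content as an encoded literal:

   @prefix http: <http://www.w3.org/2006/http#> .
   @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

   # a body containing "<!DOCTYPE html>" would break the enclosing
   # RDF/XML, so it is recorded as a base64-encoded literal instead
   <#resp1> http:body "PCFET0NUWVBFIGh0bWw+"^^xsd:base64Binary .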


>>> 10. HTTP Response code - what should one do if the response code is
>>> not one of those enumerated?
>> Good point, we have a closed enumeration right now. This is because it
>> is the enumeration in the respective RFC but I agree that it should be
>> extensible for other purposes.
>>
>>
>>> 11. It would be useful to record the size of the headers and the body.
>> You keep trying to push the scope, ay? ;) Similar to the timestamps in
>> #2, this seems like a scope creep though quite easy to do.
> 
> As above, I apologise that I didn't inform myself on your scope, but from an
> application perspective having this information in the same place as the
> rest of the HTTP information would be very convenient.

OK, the need is acknowledged.


>>> 12. I just realised that the connection structure is intended for
>>> modelling the requests on a keep-alive connection ... the order of the
>>> requests is significant, I suppose?
>> Yes, I think there should be a sequence list somewhere (currently the
>> order is not captured by default).
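
To illustrate what such a sequence list might look like (a sketch only; 
the http:requests property name is an assumption):

   @prefix http: <http://www.w3.org/2006/http#> .
   @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

   # requests on a keep-alive connection, in the order they were sent
   <#conn1> a http:Connection ;
      http:requests [ a rdf:Seq ;
         rdf:_1 <#req1> ;
         rdf:_2 <#req2> ] .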


Regards,
   Shadi


-- 
Shadi Abou-Zahra     Web Accessibility Specialist for Europe |
Chair & Staff Contact for the Evaluation and Repair Tools WG |
World Wide Web Consortium (W3C)           http://www.w3.org/ |
Web Accessibility Initiative (WAI),   http://www.w3.org/WAI/ |
WAI-TIES Project,                http://www.w3.org/WAI/TIES/ |
Evaluation and Repair Tools WG,    http://www.w3.org/WAI/ER/ |
2004, Route des Lucioles - 06560,  Sophia-Antipolis - France |
Voice: +33(0)4 92 38 50 64          Fax: +33(0)4 92 38 78 22 |
