- From: Shadi Abou-Zahra <shadi@w3.org>
- Date: Thu, 15 Mar 2007 12:16:22 +0100
- To: Jo Rabin <jo@linguafranca.org>
- Cc: public-wai-ert@w3.org, public-mobileok-checker@w3.org
Hi Jo,

Thanks for further clarifying your comments, I think we are mostly set now
and the ball is in our court to address them. Find some additional comments
below.

Jo Rabin wrote:
>>> 1. Is it in scope of this work to represent when a request fails outside
>>> of HTTP - e.g. the response is not valid HTTP, or the TCP connection
>>> fails for some reason or another. It would be convenient to have a
>>> consistent representation of success and failure cases.
>>
>> The current scope is strictly focused on recording the request/response
>> messages rather than making any statements about them (such as "valid",
>> "successful", "failed", etc.). Can you elaborate a use case scenario to
>> help us consider such a scope extension?
>
> Our use case is that we want to gather together all the resources that are
> relevant to mobileOK for a particular URI. So if resolving the URI results
> in an HTML document containing references to stylesheets and to images, for
> example, we will go and get those and record the retrievals. If for some
> reason a TCP connection cannot be established (DNS resolution failure, for
> example) - or the connection resets after some receipt of data, then we'd
> like to record that as part of the test result. Rather than establishing a
> mechanism that is orthogonal to the HTTP-in-RDF mechanism it would be
> convenient to use an elaboration of it so that we can show we tried to get
> the resource, but it failed because of network problems, in the same format
> as we show that we tried to get the resource but it failed through e.g. a
> 5xx error. I'm not sure we care terribly whether the failure is caused by
> server mal-operation or network mal-operation. A failure is a failure.

Right. Our model was that such information should go into EARL rather than
into the HTTP vocabulary. Here is a sample scenario:

1. person initiates mobileOK checker for a given URI
2. mobileOK checker attempts to access the URI document
3. HTTP vocabulary is used to record the request/response
4. EARL is used to record the result of the given check

In your case, step 2 is not successful. The checker should still record
whatever request it sent to the server and whatever it may have gotten in
response (if available); then record a "cannotTell" test result (with an
appropriate description) in EARL. Does that work for your needs?

>>> 2. It would be useful to timestamp requests and responses.
>>
>> Also pushes the scope a little even though it seems useful, I'll bring
>> this back to the group.
>
> OK, I regret that I haven't examined any scope related documentation, I'm
> just looking at the suitability of the spec for meeting our requirements.
>
> Just to reinforce this pushing of the envelope: in many cases what you get
> from a URI is more dependent on the time of the retrieval than on the
> headers presented, so this may be more important information to us.

If you use HTTP vocabulary from within EARL, then you can use the dc:date
timestamp that describes the Test Subject. But I see where you are coming
from, as said I will take this back to the group.
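To illustrate how I picture the scenario above working (including the
dc:date timestamp), below is a rough sketch of such a record for a retrieval
that failed at the network level. assertedBy and test are omitted for
brevity; apart from "cannotTell" and dc:date, the term and namespace names
are only indicative and not necessarily those of the current drafts:

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:earl="http://www.w3.org/ns/earl#"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:http="http://www.w3.org/2006/http#">
    <earl:Assertion>
      <earl:subject>
        <rdf:Description rdf:about="http://example.org/page">
          <!-- timestamp of the retrieval attempt, on the Test Subject -->
          <dc:date>2007-03-15T12:00:00+01:00</dc:date>
          <!-- the request that was sent; no response is recorded because
               the TCP connection could not be established -->
          <http:request>
            <http:Request>
              <http:methodName>GET</http:methodName>
            </http:Request>
          </http:request>
        </rdf:Description>
      </earl:subject>
      <earl:result>
        <earl:TestResult>
          <earl:outcome rdf:resource="http://www.w3.org/ns/earl#cannotTell"/>
          <dc:description>Retrieval failed: DNS lookup error</dc:description>
        </earl:TestResult>
      </earl:result>
    </earl:Assertion>
  </rdf:RDF>

In other words, the HTTP part only records what actually went over the wire
(possibly just the request), and the judgement that the retrieval failed
lives in the EARL test result.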
>>> 3. I understand that there is an extension mechanism in HTTP for the
>>> request method. Is this modelled in this specification?
>>
>> Do you mean for providing additional headers or something else? Would be
>> good if you can give a pointer to an RFC or such.
>
> Yes, sure, should have included the reference before:
>
> http://www.ietf.org/rfc/rfc2616.txt
>
> 5.1.1 Method
>
>    The Method token indicates the method to be performed on the
>    resource identified by the Request-URI. The method is case-sensitive.
>
>        Method         = "OPTIONS"              ; Section 9.2
>                       | "GET"                  ; Section 9.3
>                       | "HEAD"                 ; Section 9.4
>                       | "POST"                 ; Section 9.5
>                       | "PUT"                  ; Section 9.6
>                       | "DELETE"               ; Section 9.7
>                       | "TRACE"                ; Section 9.8
>                       | "CONNECT"              ; Section 9.9
>                       | extension-method
>
>        extension-method = token

OK. Not sure if we have a mechanism other than subclassing the generic
Request class. I'll check with the group.

>>> 4. It's potentially useful to record both the absolute URI used in a
>>> request and the relative URI that was used to form it - e.g. when
>>> checking links from an HTML document.
>>
>> Currently we only record whatever was sent to/from the server without
>> any transformation or interpretation. Even such an expansion is left to
>> the application level.
>
> I can see that it may be of borderline relevance in many cases, but in our
> case it will be useful to be able to check that the checker has offset a
> relative reference to a base correctly.

It seems you are looking at the vocabulary from an "inside the checker"
perspective whereas I think we mostly looked at it from a "between the
client and server" view. Interesting feedback for us to consider.

>>> 5. I'm not clear as to what normalisation is pre-supposed on the
>>> contents of the various header field values. For our purposes it would
>>> be useful to have those values in a normal form, where possible. Equally
>>> it would be useful, for audit purposes, to have a literal representation
>>> of the unprocessed headers.
>>
>> Can you reformulate the question, I'm not sure I quite understand it.
>
> 1. For audit purposes it will be useful to (optionally) have a textual
> transcription of the headers, as they appeared on the connection, to show
> that they have been correctly formulated in HTTP-in-RDF.
>
> 2. nocache and NOCACHE, for example, are both fine as values. Someone has
> to normalise the values somewhere and since the application that is
> producing HTTP-in-RDF is already manipulating the Headers it will be
> useful to have the values in a normal form with respect to capitalisation
> and white-space.

Ah, OK. We only do #1 so far, I think #2 refers again to the "inside
checker" vs "between client/server" perspective issue.

>>> 6. It would be useful for those header field values that have structure
>>> to be represented so that their components are exposed in a way that
>>> allows easy access via XPATH expressions.
>>
>> Do you mean pre-parsing the literal values and expressing them in RDF
>> vocabulary? For example, currently the content-type header contains a
>> literal value like "application/xhtml+xml". This is the string sent by
>> the server as-is. Do you mean having explicit RDF terms for certain
>> values such as "application/xhtml+xml"?
>
> No, not a vocabulary as I think that would suffer extensibility problems.
>
> What I mean is, for example, that the Content-Type value can be something
> like:
>
>     application/xhtml+xml; charset=UTF-8
>
> rather than asking each user application to implement a parser for this it
> would be very handy for HTTP-in-RDF to specify a decomposition of field
> values and their parameters so that the information is more readily
> accessible.

Same as above, will also take back to the group.
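Just to check that I understand what you are after on point 6: something
along the lines below, where the raw field value is kept for audit and the
media type and parameters are also broken out? All of the element names here
are invented for the sake of the example - the draft does not currently
define any such decomposition.

  <ex:header xmlns:ex="http://example.org/header-parts#"
             xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <!-- the literal value exactly as sent by the server -->
    <ex:fieldName>Content-Type</ex:fieldName>
    <ex:fieldValue>application/xhtml+xml; charset=UTF-8</ex:fieldValue>
    <!-- one possible decomposition into media type and parameters -->
    <ex:mediaType>application/xhtml+xml</ex:mediaType>
    <ex:parameter rdf:parseType="Resource">
      <ex:paramName>charset</ex:paramName>
      <ex:paramValue>UTF-8</ex:paramValue>
    </ex:parameter>
  </ex:header>

That way a consumer could get at the charset with a simple XPath expression
instead of re-implementing the Content-Type grammar, while the unprocessed
value stays available for your audit purposes under point 5.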
>>> 7. It's a little inconvenient to have two different representations for
>>> Headers. Is it an error to use an additionalHeader object where a
>>> specific object could have been used?
>>
>> Yes, we should elaborate this (possibly in the conformance section) -
>> additionalHeader is only intended to be used for expressing headers that
>> are not listed in the schema.
>
> As I mention, from the user application point of view, this means that you
> have to implement two different access methods, which is inconvenient but
> not terminal.

OK, got your point now. Not sure how to address it though...

>>> 8. You provide a linkage between a request and its response - it might
>>> be useful to provide also a linkage between a response and a request
>>> (either the one that it relates to, or more interestingly, perhaps, a
>>> request that was triggered by a redirect or by following a link within
>>> the body).
>>
>> The first case should be covered by RDF, you can query the data to find
>> all requests that are related to a response.
>>
>> As to the latter case, this would require an interpretation of the
>> content and is currently out of scope. Specifically, different user
>> agents may trigger different reactions to a response, for example load
>> or ignore linked CSS pages, RSS feeds etc. It would be significantly
>> tricky to relate requests/responses, thus currently we only record the
>> mere interaction.

>>> 9. On the representation of the Body
>>> a. When you say XML in the flow chart, do you mean that the content-type
>>> indicates XML or do you mean that the content is well-formed XML?
>>
>> Well-formed XML.
>
> Isn't there a problem with things parsing as XML but not being
> intentionally XML?

>>> b. If the content is XML delivered with the content type text/html then
>>> is this considered XML?
>
> How do you know it is XML if it is delivered with that content type?
>
>> Yes, if it is well-formed (and doesn't break the RDF document it is
>> embedded in).

>>> c. Isn't there the possibility that a malformed document would break
>>> this document when included?
>>
>> It needs to be well-formed, and an additional check for impact on the RDF
>> environment is also required (but not elaborated in the description of
>> the algorithm, we will fix this).

>>> d. Is there an issue with the use of a CDATA section? What happens if
>>> the data itself contains a CDATA section?
>>
>> Yup, we missed a "does it break the RDF? -> then record as a byte
>> sequence" step in the algorithm.
>
> I meant to add that it is very important to us to record the XML
> declaration and the DOCTYPE. Clearly inclusion of either of these will
> immediately break the document.

>>> e. Is there an issue with transparency of data - in that if the body
>>> itself contains the literal string </http:body> does this cause a
>>> problem?
>>
>> See above.

Valid points, will consider them and refine the algorithm accordingly.
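On 9d and 9e specifically: if the body is carried in a CDATA section then a
literal "</http:body>" inside it is not a problem, since markup is not
recognised there; the only troublesome sequence is "]]>", which cannot
appear inside a CDATA section and has to be split across two sections (or
the body recorded as a byte sequence instead). Roughly:

  <!-- recording a body whose text is exactly:  foo]]>bar -->
  <!-- the two CDATA sections concatenate back to the original text -->
  <http:body xmlns:http="http://www.w3.org/2006/http#"><![CDATA[foo]]]]><![CDATA[>bar]]></http:body>

(The namespace URI above is indicative only.)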
>>> 10. HTTP Response code - what should one do if the response code is not
>>> one of those enumerated?
>>
>> Good point, we have a closed enumeration right now. This is because it
>> is the enumeration in the respective RFC but I agree that it should be
>> extensible for other purposes.

>>> 11. It would be useful to record the size of the headers and the body.
>>
>> You keep trying to push the scope, ay? ;) Similar to the timestamps in
>> #2, this seems like a scope creep though quite easy to do.
>
> As above, I apologise that I didn't inform myself on your scope, but from
> an application perspective having this information in the same place as
> the rest of the HTTP information would be very convenient.

OK, need acknowledged.

>>> 12. I just realised that the connection structure is intended for
>>> modelling the requests on a keep-alive connection ... the order of the
>>> requests is significant, I suppose?
>>
>> Yes, I think there should be a sequence list somewhere (currently the
>> order is not captured by default).
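For instance, a plain rdf:Seq hanging off the connection would capture the
order; apart from rdf:Seq itself, the class and property names below are
again only indicative:

  <http:Connection xmlns:http="http://www.w3.org/2006/http#"
                   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <!-- the requests sent on a keep-alive connection, in order -->
    <http:requests>
      <rdf:Seq>
        <rdf:li rdf:resource="#request-1"/>
        <rdf:li rdf:resource="#request-2"/>
      </rdf:Seq>
    </http:requests>
  </http:Connection>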
Regards,
  Shadi

--
Shadi Abou-Zahra                                               |
Web Accessibility Specialist for Europe                        |
Chair & Staff Contact for the Evaluation and Repair Tools WG   |
World Wide Web Consortium (W3C), http://www.w3.org/            |
Web Accessibility Initiative (WAI), http://www.w3.org/WAI/     |
WAI-TIES Project, http://www.w3.org/WAI/TIES/                  |
Evaluation and Repair Tools WG, http://www.w3.org/WAI/ER/      |
2004, Route des Lucioles - 06560, Sophia-Antipolis - France    |
Voice: +33(0)4 92 38 50 64        Fax: +33(0)4 92 38 78 22     |

Received on Thursday, 15 March 2007 11:16:29 UTC