RE: mobileOK intermediate format (moki) from Jo Rabin on 2007-04-20 (public-mobileok-checker@w3.org from April 2007)

From: Jo Rabin <jrabin@mtld.mobi>
Date: Fri, 20 Apr 2007 13:49:25 +0100
To: "James Pearce" <jpearce@mtld.mobi>, <public-mobileok-checker@w3.org>
Message-ID: <C8FFD98530207F40BD8D2CAD608B50B42364E5@mtldsvr01.DotMobi.local>
Hi James

Thanks, this is very useful. The process of translation from a mind map to
an XML example has answered some of your points and raised some others.

More in line below.

Jo


> -----Original Message-----
> From: James Pearce [mailto:jpearce@mtld.mobi]
> Sent: 20 April 2007 09:23
> To: Jo Rabin; public-mobileok-checker@w3.org
> Subject: RE: mobileOK intermediate format (moki)
> 
> A few obvious points sprang to mind:
> 
>  * Will you capture all HTTP request headers (literal/decomposed as for
> the response)?
>    - I know they're generally DDC so potentially constant, but should
> the schema assume that?
>    - what happens when DDC changes in the future?
>    - I guess you could put them in the "record once" section as "browser
> characteristics"
>    - of course I am thinking of implementations that will use other UA
> simulations

As the example stands (as of 2300 last night!), we capture the sent
headers explicitly, for the reasons you outline: the format is more
generally useful if you relax the assumption about DDC, and in particular
if the implementation uses things like cookies, which introduce
variability between requests.

For the DDC itself, you could record once. However, there's a nearly
explicit rule that a single XPath expression should give you the answer,
rather than "if the info is not there, look somewhere else". It's clearly
a trade-off between the efficiency/simplicity of XPath and document
weight. I made the opposite trade-off when looking at recording tester
configuration, viz. I thought it was worth reducing the document weight
at the expense of allowing details to be stored either in the document
itself or in (say) an external file, on the basis that this info, though
crucial, will rarely be looked at.
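To illustrate the "single expression" side of that trade-off, here is a minimal Python sketch. It assumes hypothetical element and attribute names (moki, retrieval, request, header, name, value) and a made-up user agent string, not those of the actual example document. Because every request records its own sent headers, one path expression answers "what was sent?" with no fallback to a record-once section.

```python
import xml.etree.ElementTree as ET

# Hypothetical moki fragment: element names, attribute names and the
# user agent string are illustrative assumptions only.
MOKI = """
<moki>
  <retrieval id="r1">
    <request method="GET" uri="http://example.mobi/">
      <header name="User-Agent" value="HypotheticalUA/1.0"/>
      <header name="Accept" value="application/xhtml+xml"/>
    </request>
  </retrieval>
  <retrieval id="r2">
    <request method="GET" uri="http://example.mobi/logo.gif">
      <header name="User-Agent" value="HypotheticalUA/1.0"/>
      <header name="Cookie" value="session=abc"/>
    </request>
  </retrieval>
</moki>
"""

root = ET.fromstring(MOKI)


def sent_header(retrieval_id: str, name: str):
    """Return the value of a header sent with one retrieval, or None.

    A single path expression suffices because each request carries its
    own headers; a record-once design would need a second lookup when
    the header is absent here.
    """
    node = root.find(
        f"./retrieval[@id='{retrieval_id}']/request/header[@name='{name}']"
    )
    return None if node is None else node.get("value")
```

The cost is document weight: the User-Agent header is repeated for every retrieval even when it never changes.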

> 
>  * I think it would be good to be rigorous about timing points if the
> stack is exposing them
>    - e.g. not just start-end, but
> start-DNS-connect-startRequest-endRequest-startResponse-endResponse
>    - or make the schema support 0-N <timing> nodes for extensibility
>    - again, not important for mOk per se, but thinking of
> implementations that will record performance

"I don't disagree" (TM). However, how many stacks actually expose this?
And because of the way TCP coalesces buffers and so on, how accurate
would it be anyway?

One of the things that the HTTP in RDF document made me think about more
carefully than I had before was the reuse of sockets under keep-alive. I
think it's a good idea to record this, though it raises the ugly spectre
of having to think about threaded models for retrievals, and how to
serialise the log of those activities and spool it simply into a file
(if that's the way you choose to do it).
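One common way to serialise a log from threaded retrievals is a single writer that drains a queue, so concurrent threads never interleave partial lines. The Python sketch below assumes that pattern (the thread does not settle on one) and uses hypothetical names throughout (spool, run, and the retrieval/conn/event record fields). Each record carries a connection id so keep-alive socket reuse can be reconstructed afterwards.

```python
import queue
import threading

SENTINEL = None  # tells the writer thread to stop


def spool(records: "queue.Queue", out) -> None:
    """Single writer: drain the queue and write one line per record."""
    while True:
        rec = records.get()
        if rec is SENTINEL:
            break
        out.write("{retrieval} conn={conn} {event}\n".format(**rec))


def run(jobs, out) -> None:
    """jobs: list of (retrieval_id, connection_id) pairs (hypothetical).

    Several retrievals may share a connection id, modelling keep-alive
    reuse of one socket.
    """
    records = queue.Queue()
    writer = threading.Thread(target=spool, args=(records, out))
    writer.start()

    def fetch(retrieval_id, conn_id):
        # A real checker would perform the HTTP exchange here; this sketch
        # only enqueues the log events such an exchange would produce.
        for event in ("start", "end"):
            records.put({"retrieval": retrieval_id, "conn": conn_id,
                         "event": event})

    workers = [threading.Thread(target=fetch, args=job) for job in jobs]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # Every worker has finished, so the sentinel is the last item queued.
    records.put(SENTINEL)
    writer.join()
```

Usage: `run([("r1", "c1"), ("r2", "c1"), ("r3", "c2")], open("moki.log", "w"))` logs r1 and r2 on the same connection c1. Line order across threads is not deterministic, but each line is written whole.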

I think more thought needs to be given to timing, so this would be a
useful thread to start.
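As a starting point for that thread, here is a minimal Python sketch of James's "0-N <timing> nodes" idea. The timing and point/ms names, the point labels and the millisecond values are all hypothetical. A stack that exposes only start and end yields one phase; a richer stack yields more, with no schema change. It says nothing about the accuracy concern above: the numbers are only as good as what the stack reports.

```python
import xml.etree.ElementTree as ET

# Hypothetical timing markup: 0-N <timing> nodes per retrieval, each naming
# a point reached by the HTTP stack and a timestamp in milliseconds.
RETRIEVAL = """
<retrieval id="r1">
  <timing point="start" ms="0"/>
  <timing point="dnsResolved" ms="12"/>
  <timing point="connected" ms="40"/>
  <timing point="requestSent" ms="41"/>
  <timing point="firstByte" ms="180"/>
  <timing point="end" ms="230"/>
</retrieval>
"""


def phases(xml_text):
    """Return (from_point, to_point, duration_ms) for consecutive points."""
    elem = ET.fromstring(xml_text)
    points = [(t.get("point"), int(t.get("ms"))) for t in elem.findall("timing")]
    points.sort(key=lambda p: p[1])  # order by timestamp, not document order
    return [(a, b, tb - ta) for (a, ta), (b, tb) in zip(points, points[1:])]
```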

> 
>  * "decompression is moot"
>    - again, I think it may be brave to make the schema assume DDC
> requests

Yes, that's true. 

> 
>  * You need some referentiality between requests - why was a request
> made?
>    - was it the root URL?
>    - was it as a result of a redirect?
>    - was it an object referenced in another request? which?
>    - was it linked to from another page? which? (thinking of
> implementations that will crawl)
>    - i.e. apart from the root, you need one of redirectorID, parentID,
> refererID in each request

The proposed XML does contain references for each object that is
retrieved explaining why it was retrieved (and if a caching model is in
operation, why not).
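A minimal Python sketch of how such references can be used: walk from any retrieval back to the root to explain why it happened. The reason and cause attributes and their values (root, redirect, embedded, link) are hypothetical, loosely following James's redirectorID/parentID/refererID list, and are not the names in the proposed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: every retrieval except the root names why it was
# made and which earlier retrieval caused it.
MOKI = """
<moki>
  <retrieval id="r1" reason="root"/>
  <retrieval id="r2" reason="redirect" cause="r1"/>
  <retrieval id="r3" reason="embedded" cause="r2"/>
  <retrieval id="r4" reason="link" cause="r2"/>
</moki>
"""


def provenance(xml_text, retrieval_id):
    """Follow cause references back towards the root.

    Returns the chain of (id, reason) pairs starting at the given
    retrieval. A dangling cause reference simply ends the chain.
    """
    by_id = {r.get("id"): r for r in ET.fromstring(xml_text).iter("retrieval")}
    chain, seen = [], set()
    current = by_id[retrieval_id]
    while current is not None:
        rid = current.get("id")
        if rid in seen:
            raise ValueError(f"cycle in cause references at {rid}")
        seen.add(rid)
        chain.append((rid, current.get("reason")))
        cause = current.get("cause")
        current = by_id.get(cause) if cause else None
    return chain
```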

> 
>  * Errors, yes. Like timings, how about 0-N <error> nodes for
> extensibility?
>    - in general, error classes might include
>      - "system" (e.g. checker has broken)
>      - "network" (e.g. couldn't get anything)
>      - "parser" (e.g. got something, couldn't make sense of it)
>      - "test results"
>    - NB the boundaries between these classes may not always be clear
> cut:
>      - Nothing came back: was that my fault or the server's?
> 
Yes, this is an area for further elaboration. FWIW, the current proposed
XML makes a start at defining a multiple-error node structure, though I
haven't had time to work out what that looks like when applied to DNS
errors, broken TCP connections etc.

Definitely more thought is needed on the error class structure and how
it is serialised.
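A minimal Python sketch of a multiple-error node structure using James's four classes. The error element, the class/code attributes and the codes themselves are hypothetical, since a system for allocating codes is still an open question. Rejecting unknown classes keeps the class list closed; where a failure straddles two classes ("my fault or the server's?"), this sketch simply forces the producer to pick one.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Classes from James's list; "test" stands for test results.
ERROR_CLASSES = {"system", "network", "parser", "test"}

# Hypothetical error markup: 0-N <error> nodes per retrieval.
RETRIEVAL = """
<retrieval id="r1">
  <error class="network" code="DNS_FAILURE">no address for host</error>
  <error class="parser" code="NOT_WELL_FORMED">unclosed element at line 3</error>
</retrieval>
"""


def errors_by_class(xml_text):
    """Group (code, detail) pairs by error class; reject unknown classes."""
    grouped = defaultdict(list)
    for err in ET.fromstring(xml_text).findall("error"):
        cls = err.get("class")
        if cls not in ERROR_CLASSES:
            raise ValueError(f"unknown error class: {cls!r}")
        grouped[cls].append((err.get("code"), (err.text or "").strip()))
    return dict(grouped)
```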

> 
> Some fleeting thoughts for a Friday am.
> 
> James
> 
> 
> 
> -----Original Message-----
> From: public-mobileok-checker-request@w3.org
> [mailto:public-mobileok-checker-request@w3.org] On Behalf Of Jo Rabin
> Sent: 19 April 2007 23:35
> To: public-mobileok-checker@w3.org
> Subject: mobileOK intermediate format (moki)
> 
> Hi
> 
> My action was to develop the schema for this, and apologies as it has
> taken me longer than expected. Well, longer than expected, given that
> you expect things to take longer than expected.
> 
> It's very much based on the ideas we developed while batting examples
> back and forth between me and CTIC, and it also takes into account
> the requirements we drew up in Dublin.[1] I know some things are
> missing (e.g. CSS analysis for MEASURES, and error examples). Hopefully
> not that much.
> 
> [1]
> http://lists.w3.org/Archives/Public/public-mobileok-checker/2007Apr/att-0009/Intermediate_Document_Format.svg
> 
> I have really struggled with trying to work the HTTP in RDF stuff in. In
> the end, taking to heart Nacho's words of a couple of weeks ago, viz. it
> can't be used as RDF if it is in an XML document, I thought the best
> thing was to take inspiration from that document and copy it where
> possible. I have taken the opportunity to try to reduce the verbosity
> and to allow single, consistent XPath expressions to extract items of
> interest to mobileOK. This is of course a personal decision and can be
> reversed.
> 
> A couple of other points.
> 
> a) This is an example doc and not a schema. I'll go about generating a
> schema and some documentation once there has been some feedback and
> iteration.
> 
> b) I wonder if it would be a good idea to separate out the http part
> (and possibly others) into standalone schemas?
> 
> c) Error codes. We do need a system for allocating codes for errors,
> warnings, info messages etc. Possibly we should look at the validation
> engines and see what we need to represent before diving too deeply into
> this.
> 
> d) I decided to call it moki. Provisionally.
> 
> Jo
Received on Friday, 20 April 2007 12:51:14 UTC