RE: Proposed changes to Moki - External Resources Test from Jo Rabin on 2007-07-30 (public-mobileok-checker@w3.org from July 2007)

From: Jo Rabin <jrabin@mtld.mobi>
Date: Tue, 31 Jul 2007 00:35:16 +0100
To: "Sean Owen" <srowen@google.com>
Cc: "public-mobileok-checker" <public-mobileok-checker@w3.org>
Message-ID: <C8FFD98530207F40BD8D2CAD608B50B4552666@mtldsvr01.DotMobi.local>

I'm not sure we are on exactly the same wavelength here as I don't think
that we are all that worried about unidentified content types. And I
don't think that we are going very far down the "too general" route
really.

The issue with respect to images at least is that when presented with an IMG or
OBJECT element one does not know what the content type is in advance of
retrieving it. So one must retrieve it and construct an appropriate
object depending on the image type. If the image type is not one we are
interested in then we should construct an "unknown" image object to hold
at least the content type for later inspection. 
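
As a rough illustration only, the construction step could be sketched in
Java along the following lines. All class and method names here
(ImageResource, UnknownImageResource, ImageResourceFactory and so on) are
hypothetical and are not taken from the checker's code; the sketch assumes
the retrieval has already happened and that only the Content-Type header
value is needed to choose the object.

```java
import java.util.Locale;

/** Hypothetical base type: holds the URI and the media type seen in the response. */
abstract class ImageResource {
    final String uri;
    final String contentType; // null when no usable Content-Type header was present

    ImageResource(String uri, String contentType) {
        this.uri = uri;
        this.contentType = contentType;
    }
}

final class GifResource extends ImageResource {
    GifResource(String uri, String contentType) { super(uri, contentType); }
}

final class JpegResource extends ImageResource {
    JpegResource(String uri, String contentType) { super(uri, contentType); }
}

/** Any type we are not interested in: kept only so the content type can be inspected later. */
final class UnknownImageResource extends ImageResource {
    UnknownImageResource(String uri, String contentType) { super(uri, contentType); }
}

final class ImageResourceFactory {
    private ImageResourceFactory() {}

    /** Chooses a resource class from the Content-Type header of a completed retrieval. */
    static ImageResource fromResponse(String uri, String contentTypeHeader) {
        String mediaType = normalize(contentTypeHeader);
        if ("image/gif".equals(mediaType)) {
            return new GifResource(uri, mediaType);
        }
        if ("image/jpeg".equals(mediaType)) {
            return new JpegResource(uri, mediaType);
        }
        return new UnknownImageResource(uri, mediaType);
    }

    /** Drops any parameters after ';' and lower-cases the bare media type. */
    static String normalize(String header) {
        if (header == null) {
            return null;
        }
        int semi = header.indexOf(';');
        String bare = (semi >= 0 ? header.substring(0, semi) : header)
                .trim().toLowerCase(Locale.ROOT);
        return bare.isEmpty() ? null : bare;
    }
}
```

A missing header and an unsupported type both end up as an "unknown"
object in this sketch, so the simple treatment and the later inspection
can share the same code path.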

Reporting where the references to images are found does seem
important for a 1.0 implementation, especially when those images are
being retrieved as a result of a stylesheet which itself might be
imported from another stylesheet. Without that information I think
developers are likely to be very lost when they see results referring to
images they have never heard of and won't find in the primary resource.
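
To make that concrete, here is a minimal sketch in Java of an index that
remembers, for each resource URI, every place it was referenced from. The
names ReferenceSite and ReferenceIndex are invented for illustration, and
the sketch assumes URIs have already been made absolute (and canonical, if
that is the matching strategy chosen) before they are recorded.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical record of where a reference was found. */
final class ReferenceSite {
    final String referrerUri; // the document or stylesheet containing the reference
    final int line;           // 1-based, or -1 when the position is not known
    final int column;         // 1-based, or -1 when the position is not known

    ReferenceSite(String referrerUri, int line, int column) {
        this.referrerUri = referrerUri;
        this.line = line;
        this.column = column;
    }
}

/** Maps each resource URI to every site that refers to it, in discovery order. */
final class ReferenceIndex {
    private final Map<String, List<ReferenceSite>> sites = new LinkedHashMap<>();

    /** Records one more reference; duplicates are kept so they can all be reported. */
    void record(String resourceUri, ReferenceSite site) {
        sites.computeIfAbsent(resourceUri, k -> new ArrayList<>()).add(site);
    }

    /** Returns every site referring to the resource, or an empty list if none. */
    List<ReferenceSite> sitesFor(String resourceUri) {
        List<ReferenceSite> found = sites.get(resourceUri);
        return found == null
                ? Collections.<ReferenceSite>emptyList()
                : Collections.unmodifiableList(found);
    }

    /** Number of distinct resources, however many times each was referenced. */
    int uniqueResourceCount() {
        return sites.size();
    }
}
```

With something like this, an image pulled in by an imported stylesheet can
be reported together with the stylesheet that referenced it, and the count
of unique resources falls out of the same structure.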

Jo
 

> -----Original Message-----
> From: Sean Owen [mailto:srowen@google.com]
> Sent: 30 July 2007 23:34
> To: Jo Rabin
> Cc: public-mobileok-checker
> Subject: Re: Proposed changes to Moki - External Resources Test
> 
> My general reaction is that this is getting too complicated -- for
> version 1.0 at the very least. Retrieve images, examine their
> Content-Type. If missing, assume it's not a supported image. If
> present and it's GIF or JPEG, great, parse it. If it's something else,
> assume it's not supported.
> 
> I strongly believe we need to favor simple solutions that solve the
> problem of "implement mobileOK Basic 1.0" first. It's good to develop
> this into a more general platform for evaluating a web resource but we
> haven't quite signed on for that just yet. At the moment scope and
> complexity appear to be outpacing progress towards an implementation.
> This particular issue -- unidentified content types -- feels
> corner-case-ish to me. I am happy for version 1.0 to go out with a
> crude reaction to this situation as long as it's handling the 99% of
> other cases usefully. And then this can be tackled.
> 
> On 7/25/07, Jo Rabin <jrabin@mtld.mobi> wrote:
> >
> >
> >
> >
> > Hi Laura
> >
> >
> >
> > Sorry about taking a long time to get back, you ask good questions!
> > Some thoughts.
> >
> >
> >
> > We should include all resources pointed to by img tags. I noticed in
> > the preprocess method that the checker seems to want to assess
> > content type by looking for the file extension. But that is really
> > not right at all. It's not necessarily the case that images will have
> > a file extension, and even when they do, it's an error to infer the
> > content type from them - see e.g. [1] and [2] which make it clear
> > that the resource must be retrieved and its Content-Type header
> > examined in order to determine its type.
> >
> >
> >
> > [1] http://www.w3.org/2001/tag/doc/metaDataInURI-31.html#erroneous
> >
> > [2] http://www.w3.org/2001/tag/doc/mime-respect#missing
> >
> >
> >
> > Even though the object element allows the specification of content
> > type, browsers typically taste the content of nested objects even in
> > the presence of this information to determine the actual content
> > type. Given that they stop when they find something they like, it's a
> > good question to ask whether the checker should continue and whether
> > and where it should put those references in moki. It's an even better
> > question to wonder how the xslt would differentiate between those
> > objects that should be counted and those that should not.
> >
> >
> >
> > So some thoughts about the code:
> >
> >
> >
> > 1. Given that the image type is not known in advance of retrieving
> > it, and given that the image may not be of a known type, there seems
> > to be the need for a factory somewhere which constructs a JPEG
> > resource, a GIF resource or a generic image resource depending on the
> > result of the retrieval. It looks like the image element in moki
> > needs to be extended to include an image type which should be set to
> > the media type of the response under the imageInfo element.
> >
> >
> >
> > 2. When processing images (and links and so on) in the primary
> > document, I think that duplicates should not be suppressed and the
> > duplicate detection should be handled in the preprocess method. Aside
> > from anything else, the detection of duplicates should be done on a
> > canonical URI not just a text match (and on the absolute version of
> > the URI, for that matter). Though, as we saw from a little test that
> > Dom put together, real browsers do appear to do a textual match, so
> > that aspect of the behaviour needs to be centralized so we can change
> > it easily or control it by a switch.
> >
> >
> >
> > 3. In the CSSResource class, an image list needs to be constructed
> > and then processed as above.
> >
> >
> >
> > 4. The same observation applies to link elements as to images. Since
> > CSS files can include other CSS files they need to have a list of
> > included CSS and that needs to be preprocessed according to the same
> > URI matching strategy.
> >
> >
> >
> > 5. Ideally, each of the lists of URIs should provide a reference to
> > where they were found in the source of whatever document they were
> > found in for error reporting purposes. (Did I hear a collective groan
> > about line and column number references? :-( ) and so that the moki
> > document can provide the info that an image/css was in error and is
> > referenced in 7 rather than just one place.
> >
> >
> >
> > 6. I think there is a need for an objects element in moki. It should
> > contain objects and the objects should say a) what their content type
> > is and b) whether they should be counted as an external reference.
> > That should be easy enough to do. What's not so obvious is what to do
> > about text/html when it is found in an object and I think the answer
> > is that it should be counted and skipped.
> >
> >
> >
> > 7. Oh, and finally, before I forget. There is the case (401
> > Authentication) where both the page presented with the response and
> > the primary document are tested and the external resources from the
> > authentication page are added to the total. On reflection, I think we
> > should think again about this behaviour before we go to the next last
> > call of the mobileOK doc. And not worry about it in the code for now.
> > (Famous last words)
> >
> >
> >
> > I've just checked in some updates with a couple of TODOs in the
> > relevant places, I hope.
> >
> >
> >
> > I'll also update the moki example doc with the suggestions I made.
> > And while I am about it I will generate a schema for moki. It's about
> > time.
> >
> >
> >
> > Hope this helps. Oh and these are just my suggestions, you or anyone
> > else may have better ones.
> >
> >
> >
> > Jo
> >
> >
> >
> >
> >  ________________________________
> >
> >
> > From: public-mobileok-checker-request@w3.org
> > [mailto:public-mobileok-checker-request@w3.org] On Behalf Of Laura Holmes
> >  Sent: 24 July 2007 23:54
> >  To: public-mobileok-checker
> >  Subject: Proposed changes to Moki - External Resources Test
> >
> >
> >
> >
> > Hi all,
> >  I just wanted to run some changes by you all and get some feedback.
> > Currently, I'm working on the ExternalResourcesTest and am running
> > into conditions that haven't been accounted for in the existing code.
> > These conditions include:
> >
> >  1) counting references contained in objects that are not jpeg or
> > gif:
> >  there are many other image types and other types of objects (such as
> > applications or audio) that may be included on a page. I'm assuming
> > that we want to include these references even if they can't be
> > rendered on a mobile phone due to a comment made regarding nested
> > objects: "For nested object elements, count only the number of
> > objects that need to be assessed before content matching the request
> > header defined in 2.3.2 HTTP Request is found." So, we want to assess
> > content types other than jpeg and gif when counting external
> > resources.
> >
> >  2) keeping track of unique references to resources that are other
> > than jpeg or gif:
> >  If two references are made in the primary document to the same
> > image, it is only counted once, but if we reference the same image in
> > css, we currently don't have a way of tracking this.
> >
> >  3) references contained in nested objects are counted regardless of
> > whether or not the reference is actually reached:
> >  We only identify object nodes by name, not in serial order.
> >
> >  Here are the changes I want to make, which would entail changing the
> > shape of the moki doc a bit:
> >
> >  We create an ArrayList of URIs that is maintained throughout the
> > entirety of the parsing process. When a reference to a resource is
> > encountered, we check to see if the list already contains that URI.
> > This list will contain all the resources contained in both the
> > primary doc and css files. At the end of the parsing process, we can
> > add an additional node to any location in the moki that states the
> > length of the list (i.e. how many unique resources were encountered).
> > I propose adding this as its own node under moki, as it spans
> > information in the primary doc, images, and css. Because we only want
> > to record the number of unique references, I can't see any other way
> > to pull it from the moki document using xsl. I'm open to any other
> > suggestions.
> >
> >  As to the nested object problem, I'm at a loss for solutions given
> > our current implementation of the DOM. Suggestions?
> >
> >  Thanks for your input in advance,
> >  Laura
> >
> >
Received on Monday, 30 July 2007 23:35:41 UTC