Re: Proposed changes to Moki - External Resources Test from Laura Holmes on 2007-07-27 (public-mobileok-checker@w3.org from July 2007)

From: Laura Holmes <holmes@google.com>
Date: Fri, 27 Jul 2007 10:05:59 -0400
To: "Jo Rabin" <jrabin@mtld.mobi>, public-mobileok-checker <public-mobileok-checker@w3.org>
Message-ID: <135a9f560707270705u3b0635d6s5a7c66951112dd56@mail.gmail.com>
On second thought, here's an addendum to that class I mentioned...

In the master list of external resources, the class/struct should really be
made up of a key (the actual URI of the resource) and a list of references.
That way, when the resource is processed via its URI, the references can be
added as nodes within the resource object.
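As a rough illustration of this addendum, here is a minimal Java sketch. The names Reference, ExternalResource and MasterResourceList are hypothetical, not the checker's actual classes, and the URI key is a plain string for simplicity. Each URI appears once in the master list, every reference to it is attached to that entry, and the number of unique resources is simply the size of the map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: one place where a resource is referenced.
final class Reference {
    final String sourceUri; // document that contains the reference
    final int line;         // line number within that document

    Reference(String sourceUri, int line) {
        this.sourceUri = sourceUri;
        this.line = line;
    }
}

// Hypothetical: one entry in the master list, keyed by the resource URI.
final class ExternalResource {
    final String uri;
    final List<Reference> references = new ArrayList<>();

    ExternalResource(String uri) {
        this.uri = uri;
    }
}

// Hypothetical master list: each URI is retrieved once, every reference is kept.
final class MasterResourceList {
    private final Map<String, ExternalResource> byUri = new LinkedHashMap<>();

    /** Records a reference; returns true only the first time this URI is seen. */
    boolean addReference(String uri, Reference ref) {
        ExternalResource res = byUri.get(uri);
        boolean isNew = (res == null);
        if (isNew) {
            res = new ExternalResource(uri);
            byUri.put(uri, res);
        }
        res.references.add(ref);
        return isNew;
    }

    ExternalResource get(String uri) {
        return byUri.get(uri);
    }

    /** Number of unique resources, i.e. how many retrievals are needed. */
    int uniqueCount() {
        return byUri.size();
    }
}
```

Keying on the raw string is a simplification; matching on canonical, absolute URIs would only change how the key is computed, not the shape of the structure.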

On 7/26/07, Laura Holmes <holmes@google.com> wrote:
>
> > 3. In the CSSResource class, an image list needs to be constructed and
> then processed as above.
>
> >It sounds as if you're suggesting building up a list of references within
> the style sheet independent of the images contained in the primary document.
> I proposed the idea of a universal list of references because it's very
> possible that a stylesheet could reference the same images as referenced in
> the primary document. If we construct an independent resource list from the
> css and the primary document, we might get some duplications.
>
>  >Well, what I am suggesting is that rather than passing a single list
> around the place, each resource is responsible for managing and maintaining
> its own list. I agree that there needs to be a central "cache" list so that
> when about to check a resource you look at the list and see if you already
> did it. I think that it should be noted specifically in the moki document
> that you didn't retrieve it because you think it's the same as some
> specified other resource.
> What I think you mean is that each resource will manage its own list of
> resources, but we'll pull all those resources together to create a master
> list before we process them. If each resource manages its own list, as a
> list of HTTPResource objects, duplicate resources will exist under different
> resources' lists. In order to make sure we get the whole master list of
> resources before processing them, we'd probably have to completely dive into
> the css, create the list of image and object resources to be processed, and
> then process that list.
>
> Just so I'm completely certain of the structure of the moki, here's an
> example of how I see things being structured right now. Correct me if I'm
> thinking about this the wrong way or missing any nuances.
>
> <moki>
>    <primaryDoc/>
>    <stylesheets>
>         <stylesheet type="external"/>
>         <!-- references the stylesheet listed below, but you'd only
>              know if you looked at the source of this css document -->
>         <stylesheet type="external"/>
>    </stylesheets>
>    <images>
>          <image>
>                <reference/>
>                <reference/>
>          </image>
>          <image>
>                <reference/>
>                <reference/>
>          </image>
>     </images>
>     <objects>
>           <object>
>                 <reference/>
>                 <reference/>
>            </object>
>     </objects>
> </moki>
>
> The way of keeping track of the resource list that most closely resembles
> the final output represented by this model would be an array list of
> external resources, where each entry keeps track of where it is referenced.
> This might involve a simple external resource class, such as:
>
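> import java.util.ArrayList;
> public class ExternalResource {
>     // every place where this resource is referenced
>     private ArrayList<Reference> references;
>     // HTTP____Resource: placeholder for the specific HTTP resource class
>     private HTTP____Resource resource;
> }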
>
> I'm still figuring out how this would affect the flow of preprocessing,
> but I think fully processing the stylesheets as we encounter them would
> work, while each css style sheet maintains its own list of resource URIs. If
> one of those resources is a stylesheet, then we process the stylesheet while
> maintaining a simple cache list of stylesheets already processed. Then, as
> soon as all stylesheets are processed, we coalesce all the external
> resources that are either images or objects into a master list, and then we
> create the master list of fully fledged HTTPResources.
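A minimal sketch of that flow, under stated assumptions: the Stylesheet and StylesheetWalker names are hypothetical, retrieval is faked with an in-memory map, and media type restrictions on @import are ignored. Each stylesheet keeps its own URI lists, a set of already-processed stylesheet URIs acts as the simple cache (and also stops @import cycles), and only after every stylesheet has been processed are the image URIs coalesced into one de-duplicated list.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical parsed stylesheet: each one keeps its own lists of URIs.
final class Stylesheet {
    final List<String> imports = new ArrayList<>(); // @import targets
    final List<String> images = new ArrayList<>();  // url(...) image references
}

final class StylesheetWalker {
    // Stand-in for retrieval and parsing; a real checker would fetch over HTTP.
    private final Map<String, Stylesheet> fetched;
    // Simple cache of stylesheet URIs already processed (also stops @import cycles).
    private final Set<String> processed = new HashSet<>();
    private final List<Stylesheet> sheets = new ArrayList<>();

    StylesheetWalker(Map<String, Stylesheet> fetched) {
        this.fetched = fetched;
    }

    /** Fully processes a stylesheet and, recursively, everything it imports. */
    void process(String uri) {
        if (!processed.add(uri)) {
            return; // already processed: do not retrieve it again
        }
        Stylesheet sheet = fetched.get(uri);
        if (sheet == null) {
            return; // retrieval failed; a real checker would record an error here
        }
        sheets.add(sheet);
        for (String imported : sheet.imports) {
            process(imported);
        }
    }

    /** Once all stylesheets are processed, merges image URIs into one master list. */
    Set<String> coalesceImages(List<String> primaryDocImages) {
        Set<String> master = new LinkedHashSet<>(primaryDocImages);
        for (Stylesheet sheet : sheets) {
            master.addAll(sheet.images);
        }
        return master;
    }

    int retrievedStylesheetCount() {
        return sheets.size();
    }
}
```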
>
> Thoughts?
>
> Cheers,
> Laura
>
> P.S. Jo - I'd recommend Coupa Cafe as it's one of my favs and has free
> wireless internet. :)
>
> On 7/26/07, Jo Rabin <jrabin@mtld.mobi> wrote:
> >
> >   *From:* Laura Holmes [mailto:holmes@google.com]
> > *Sent:* 26 July 2007 16:13
> > *To:* Jo Rabin
> > *Subject:* Re: Proposed changes to Moki - External Resources Test
> >
> >
> >
> > > It looks like the image element in moki needs to be extended to
> > include an image type which should be set to the media type of the response
> > under the imageInfo element.
> > Given that there may be a difference between declared image type and the
> > actual image type, if we include an element that states the retrieved file
> > type and there's an element that states the declared image type, should we
> > issue a warning if these two pieces of information don't match?
> >
> > Sounds like a good idea, though this will only happen for Objects,
> > right? Img doesn't allow you to state the content type.
> >
> > > 2. When processing images (and links and so on) in the primary
> > document, I think that duplicates should not be suppressed and the duplicate
> > detection should be handled in the preprocess method.
> >
> > I'm unclear as to what we do after we've detected a duplicate, or how it
> > should be handled. I think we had a conversation a while ago about how
> > exactly multiple references should be represented within the moki document,
> > but I'm not entirely sure what we decided on. What I felt was the most
> > likely conclusion was that we record all references to that image, request
> > the image once, and then in the corresponding image moki information have
> > the image info mentioned once and all the listed references with line
> > numbers included.
> >
> >  Yes, that sounds good to me
> >
> > Is some of this answered with the moki example doc? Sean's out of town
> > and I don't have the link. If someone could send that to me, that'd be
> > great.
> >
> >  No, it's not, as I haven't updated it to do so :( I will do so over the
> > next couple of days, if it's not too late. FWIW, the current example doc is
> > referenced from the TF home page [1].
> >
> > [1] http://www.w3.org/2005/MWI/BPWG/Group/TaskForces/Checker/Overview.html
> >
> >
> > > 3. In the CSSResource class, an image list needs to be constructed and
> > then processed as above.
> >
> > It sounds as if you're suggesting building up a list of references
> > within the style sheet independent of the images contained in the primary
> > document. I proposed the idea of a universal list of references because it's
> > very possible that a stylesheet could reference the same images as
> > referenced in the primary document. If we construct an independent resource
> > list from the css and the primary document, we might get some duplications.
> >
> >  Well, what I am suggesting is that rather than passing a single list
> > around the place, each resource is responsible for managing and maintaining
> > its own list. I agree that there needs to be a central "cache" list so that
> > when about to check a resource you look at the list and see if you already
> > did it. I think that it should be noted specifically in the moki document
> > that you didn't retrieve it because you think it's the same as some
> > specified other resource.
> >
> >
> > > 4. The same observation applies to link elements as to images. Since
> > CSS files can include other CSS files they need to have a list of included
> > CSS and that needs to be preprocessed according to the same URI matching
> > strategy.
> >
> > How deep are we diving as far as included resources? If a stylesheet
> > @imports another css file, do we evaluate that stylesheet as well? Or do we
> > just include the URI as an external resource?
> >
> >  We go as far as it takes (modulo media type restrictions) as that is
> > what a browser would (should) do.
> >
> >
> > > (Did I hear a collective groan about line and column number references
> > > :( )
> >
> > Yes, but not because it's impossible, it's just imperfect. As for final
> > recording of errors, we can reference line numbers (not column numbers) from
> > the primaryDoc/docContent of the moki document. This would give a roughly
> > accurate location, except that if the original source document had reduced
> > white space (as we suggest), the line numbers we report would not be the
> > actual line numbers of the source document. However, if we chose to report
> > the snippet of code that corresponds to that line number, that snippet would
> > provide much more specific and useful information.
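If the snippet approach is adopted, pulling the reported line back out of docContent is straightforward. A hedged sketch, assuming docContent is available as a Java string and line numbers are 1-based; SourceSnippets.lineAt is a hypothetical helper, not part of the checker.

```java
// Hypothetical helper for error reporting from the moki's docContent.
final class SourceSnippets {
    /** Returns the 1-based line from docContent, with surrounding whitespace trimmed. */
    static String lineAt(String docContent, int lineNumber) {
        // "\\R" matches any line break (\n, \r\n or \r); -1 keeps trailing empty lines.
        String[] lines = docContent.split("\\R", -1);
        if (lineNumber < 1 || lineNumber > lines.length) {
            throw new IllegalArgumentException("no such line: " + lineNumber);
        }
        return lines[lineNumber - 1].trim();
    }
}
```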
> >
> > However, right now we only include the docContent for the primary doc,
> > not for the included CSSResources. For our current line-reporting solution,
> > we would also have to include the source of the css pages. This helps error
> > reporting at the expense of a larger moki; would everyone like to include
> > that information?
> >
> > Yes, I think that is quite important. If we report an error in a
> > resource that is not actually referenced from the primary document it could
> > leave a developer scratching their head for quite a long time, unless there
> > is some way of finding what resource the error is in and where it is in that
> > resource.
> >
> > > 6. b) whether they should be counted as an external reference
> >
> > I think counting objects as an external resource is a good idea, but I'm
> > not sure what the criteria for being counted as an external resource would
> > include. As soon as I know, I can start working on it.
> >
> > That it would actually be retrieved in realistic situations. So each
> > object and its fall-back gets retrieved until one is found that matches the
> > request criteria.
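A sketch of that counting rule, assuming each nested object has already been retrieved so its response media type is known, and that the request criteria can be approximated by a set of acceptable media types (a real Accept header also has wildcards and q-values, which this ignores). ObjectCandidate and NestedObjects are hypothetical names.

```java
import java.util.List;
import java.util.Set;

// Hypothetical: one object in a nested chain (outermost first), with the
// media type the server actually returned for it.
final class ObjectCandidate {
    final String uri;
    final String retrievedType;

    ObjectCandidate(String uri, String retrievedType) {
        this.uri = uri;
        this.retrievedType = retrievedType;
    }
}

final class NestedObjects {
    /**
     * Counts the objects retrieved before (and including) the first one whose
     * type is acceptable. If none is acceptable, every object was retrieved.
     */
    static int countRetrieved(List<ObjectCandidate> chain, Set<String> acceptable) {
        int count = 0;
        for (ObjectCandidate candidate : chain) {
            count++; // this object had to be retrieved to be assessed
            if (acceptable.contains(candidate.retrievedType)) {
                break; // a browser would stop here and ignore the fall-backs
            }
        }
        return count;
    }
}
```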
> >
> >
> >
> > Thanks for your detailed response, Jo.
> >
> >  Sorry I haven't updated the moki example doc yet; I will do that soon.
> > I'm flying to Palo Alto tomorrow and will do it once there. From a coffee
> > shop on University Avenue, maybe :)
> >
> > Jo
> >
> >
> >
> >  On 7/25/07, *Jo Rabin* <jrabin@mtld.mobi> wrote:
> >
> > Hi Laura
> >
> >
> >
> > Sorry about taking a long time to get back, you ask good questions! Some
> > thoughts.
> >
> >
> >
> > We should include all resources pointed to by img tags. I noticed in the
> > preprocess method that the checker seems to want to assess content type by
> > looking for the file extension. But that is really not right at all. It's
> > not necessarily the case that images will have a file extension, and even
> > when they do, it's an error to infer the content type from them – see
> > e.g. [1] and [2] which make it clear that the resource must be retrieved
> > and its Content-Type header examined in order to determine its type.
> >
> >
> >
> > [1] http://www.w3.org/2001/tag/doc/metaDataInURI-31.html#erroneous
> >
> > [2] http://www.w3.org/2001/tag/doc/mime-respect#missing
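For illustration, a minimal Java sketch of taking the type from the response rather than from the URI, using the standard java.net.URLConnection.getContentType(). The MediaTypes class is hypothetical, and the checker's own HTTP layer would presumably do the retrieval differently. Parameters such as charset are stripped, and the type is lower-cased because media types are case-insensitive.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.util.Locale;

// Hypothetical helper: the media type always comes from Content-Type,
// never from a file extension in the URI.
final class MediaTypes {
    /** Media type from a Content-Type value without parameters; null if absent. */
    static String fromHeader(String contentType) {
        if (contentType == null) {
            return null;
        }
        int semi = contentType.indexOf(';');
        String type = (semi >= 0) ? contentType.substring(0, semi) : contentType;
        type = type.trim().toLowerCase(Locale.ROOT);
        return type.isEmpty() ? null : type;
    }

    /** Retrieves the resource and reads its declared type; the URI is not inspected. */
    static String retrieveMediaType(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        try {
            // getContentType() connects if needed and returns null if unknown.
            return fromHeader(conn.getContentType());
        } finally {
            if (conn instanceof HttpURLConnection) {
                ((HttpURLConnection) conn).disconnect();
            }
        }
    }
}
```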
> >
> >
> >
> > Even though the object element allows the specification of content type,
> > browsers typically taste the content of nested objects even in the presence
> > of this information to determine the actual content type. Given that they
> > stop when they find something they like, it's a good question to ask whether
> > the checker should continue and whether and where it should put those
> > references in moki. It's an even better question to wonder how the xslt
> > would differentiate between those objects that should be counted and those
> > that should not.
> >
> >
> >
> > So some thoughts about the code:
> >
> >
> >
> > 1. Given that the image type is not known in advance of retrieving it,
> > and given that the image may not be of a known type, there seems to be the
> > need for a factory somewhere which constructs a JPEG resource, a GIF
> > resource or a generic image resource depending on the result of the
> > retrieval. It looks like the image element in moki needs to be extended to
> > include an image type which should be set to the media type of the response
> > under the imageInfo element.
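A hedged sketch of such a factory. The class names (ImageResource, JpegResource, GifResource, GenericImageResource, ImageResourceFactory) are hypothetical; the point is only that the concrete class is chosen after retrieval, from the response media type, which is also the value that could be written into the proposed image type under imageInfo.

```java
// Hypothetical resource hierarchy; the checker's real classes may differ.
abstract class ImageResource {
    final String uri;
    final String mediaType; // as reported in the Content-Type response header

    ImageResource(String uri, String mediaType) {
        this.uri = uri;
        this.mediaType = mediaType;
    }
}

final class JpegResource extends ImageResource {
    JpegResource(String uri) { super(uri, "image/jpeg"); }
}

final class GifResource extends ImageResource {
    GifResource(String uri) { super(uri, "image/gif"); }
}

// Any other (possibly unknown) image type is kept, with its media type recorded.
final class GenericImageResource extends ImageResource {
    GenericImageResource(String uri, String mediaType) { super(uri, mediaType); }
}

final class ImageResourceFactory {
    /** Chooses the concrete class only once retrieval has revealed the type. */
    static ImageResource create(String uri, String mediaType) {
        if ("image/jpeg".equals(mediaType)) {
            return new JpegResource(uri);
        }
        if ("image/gif".equals(mediaType)) {
            return new GifResource(uri);
        }
        return new GenericImageResource(uri, mediaType);
    }
}
```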
> >
> >
> >
> > 2. When processing images (and links and so on) in the primary document,
> > I think that duplicates should not be suppressed and the duplicate detection
> > should be handled in the preprocess method. Aside from anything else, the
> > detection of duplicates should be done on a canonical URI not just a text
> > match (and on the absolute version of the URI, for that matter). That said,
> > as we saw from a little test that Dom put together, real browsers do appear
> > to do a textual match, so that aspect of the behaviour needs to be
> > centralized so we can change it easily or control it by a switch.
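A minimal sketch of keeping the matching rule in one place, assuming a hypothetical DuplicateKeyPolicy class. With the switch off, it resolves the reference against the document's base URI and removes "." and ".." segments using java.net.URI; this is only a partial canonicalisation (it does not, for example, fold the case of the scheme or host or drop default ports). With the switch on, it returns the reference text unchanged, mimicking what the browsers in Dom's test appeared to do.

```java
import java.net.URI;

// Hypothetical: the single place that decides when two references are "the same".
final class DuplicateKeyPolicy {
    // true mimics textual matching; false compares resolved, normalised URIs
    private final boolean textualMatch;

    DuplicateKeyPolicy(boolean textualMatch) {
        this.textualMatch = textualMatch;
    }

    /**
     * Returns the key used to detect duplicate references.
     * Throws IllegalArgumentException if the reference is not a legal URI.
     */
    String keyFor(URI base, String reference) {
        if (textualMatch) {
            return reference;
        }
        // Make the reference absolute, then remove "." and ".." path segments.
        return base.resolve(reference).normalize().toString();
    }
}
```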
> >
> >
> >
> > 3. In the CSSResource class, an image list needs to be constructed and
> > then processed as above.
> >
> >
> >
> > 4. The same observation applies to link elements as to images. Since CSS
> > files can include other CSS files they need to have a list of included CSS
> > and that needs to be preprocessed according to the same URI matching
> > strategy.
> >
> >
> >
> > 5. Ideally, each of the lists of URIs should provide a reference to
> > where they were found in the source of whatever document they were found in
> > for error reporting purposes. (Did I hear a collective groan about line and
> > column number references :( ) and so that the moki document can provide
> > the info that an image/css was in error and is referenced in 7 rather than
> > just one place.
> >
> >
> >
> > 6. I think there is a need for an objects element in moki. It should
> > contain objects and the objects should say a) what their content type is and
> > b) whether they should be counted as an external reference. That should be
> > easy enough to do. What's not so obvious is what to do about text/html when
> > it is found in an object and I think the answer is that it should be counted
> > and skipped.
> >
> >
> >
> > 7. Oh, and finally, before I forget. There is the case (401
> > Authentication) where both the page presented with the response and the
> > primary document are tested and the external resources from the
> > authentication page are added to the total. On reflection, I think we should
> > think again about this behaviour before we go to the next last call of the
> > mobileOK doc. And not worry about it in the code for now. (Famous last
> > words)
> >
> >
> >
> > I've just checked in some updates with a couple of TODOs in the relevant
> > places, I hope.
> >
> >
> >
> > I'll also update the moki example doc with the suggestions I made. And
> > while I am about it I will generate a schema for moki. It's about time.
> >
> >
> >
> > Hope this helps. Oh and these are just my suggestions, you or anyone
> > else may have better ones.
> >
> >
> >
> > Jo
> >
> >
> >   ------------------------------
> >
> > *From:* public-mobileok-checker-request@w3.org [mailto:
> > public-mobileok-checker-request@w3.org] *On Behalf Of *Laura Holmes
> > *Sent:* 24 July 2007 23:54
> > *To:* public-mobileok-checker
> > *Subject:* Proposed changes to Moki - External Resources Test
> >
> >
> >
> > Hi all,
> > I just wanted to run some changes by you all and get some feedback.
> > Currently, I'm working on the ExternalResourcesTest and am running into
> > conditions that haven't been accounted for in the existing code. These
> > conditions include:
> >
> > 1) counting references contained in objects that are not jpeg or gif:
> > there are many other image types and other types of objects (such as
> > applications or audio) that may be included on a page. I'm assuming that we
> > want to include these references even if they can't be rendered on a mobile
> > phone, because of a comment regarding nested objects: "For nested object
> > elements, count only the number of objects that need to be assessed before
> > content matching the request header defined in 2.3.2 HTTP Request
> > <http://www.w3.org/TR/mobileOK-basic10-tests/#http_request> is found." So we
> > want to assess content types other than jpeg and gif when counting external
> > resources.
> >
> > 2) keeping track of unique references to resources that are other than
> > jpeg or gif:
> > If two references are made in the primary document to the same image, it
> > is only counted once, but if we reference the same image in css, we
> > currently don't have a way of tracking this.
> >
> > 3) references contained in nested objects are counted regardless of
> > whether or not the reference is actually reached:
> > We only identify object nodes by name, not in serial order.
> >
> > Here are my proposed changes I want to make, which would entail changing
> > the shape of the moki doc a bit:
> >
> > We create an ArrayList of URIs that is maintained throughout the
> > entirety of the parsing process. When a reference to a resource is
> > encountered, we check to see if the list already contains that URI. This
> > list will contain a list of all the resources contained in both the primary
> > doc and css files. At the end of the parsing process, we can add an
> > additional node to any location in the moki that states the length of the
> > list (i.e. how many unique resources were encountered). I propose
> > adding this as its own node under moki, as it spans information in the
> > primary doc, images, and css. Because we only want to record the number of
> > unique references, I can't see any other way to pull it from the moki
> > document using xsl. I'm open to any other suggestions.
> >
> > As to the nested object problem, I'm at a loss for solutions given our
> > current implementation of the DOM. Suggestions?
> >
> > Thanks for your input in advance,
> > Laura
> >
> >
> >
>
>
Received on Friday, 27 July 2007 14:06:21 UTC