- From: Sean Owen <srowen@google.com>
- Date: Mon, 30 Jul 2007 18:34:15 -0400
- To: "Jo Rabin" <jrabin@mtld.mobi>
- Cc: public-mobileok-checker <public-mobileok-checker@w3.org>
My general reaction is that this is getting too complicated -- for
version 1.0 at the very least. Retrieve images, examine their
Content-Type. If missing, assume it's not a supported image. If present
and it's GIF or JPEG, great, parse it. If it's something else, assume
it's not supported.

I strongly believe we need to favor simple solutions that solve the
problem of "implement mobileOK Basic 1.0" first. It's good to develop
this into a more general platform for evaluating a web resource but we
haven't quite signed on for that just yet. At the moment scope and
complexity appear to be outpacing progress towards an implementation.

This particular issue -- unidentified content types -- feels
corner-case-ish to me. I am happy for version 1.0 to go out with a crude
reaction to this situation as long as it's handling the 99% of other
cases usefully. And then this can be tackled.

On 7/25/07, Jo Rabin <jrabin@mtld.mobi> wrote:
>
> Hi Laura
>
> Sorry about taking a long time to get back, you ask good questions! Some
> thoughts.
>
> We should include all resources pointed to by img tags. I noticed in the
> preprocess method that the checker seems to want to assess content type by
> looking for the file extension. But that is really not right at all. It's
> not necessarily the case that images will have a file extension, and even
> when they do, it's an error to infer the content type from them – see e.g.
> [1] and [2] which make it clear that the resource must be retrieved and its
> Content-Type header examined in order to determine its type.
>
> [1] http://www.w3.org/2001/tag/doc/metaDataInURI-31.html#erroneous
>
> [2] http://www.w3.org/2001/tag/doc/mime-respect#missing
>
> Even though the object element allows the specification of content type,
> browsers typically taste the content of nested objects even in the presence
> of this information to determine the actual content type.
> Given that they stop when they find something they like, it's a good
> question to ask whether the checker should continue and whether and where
> it should put those references in moki. It's an even better question to
> wonder how the xslt would differentiate between those objects that should
> be counted and those that should not.
>
> So some thoughts about the code:
>
> 1. Given that the image type is not known in advance of retrieving it, and
> given that the image may not be of a known type, there seems to be the need
> for a factory somewhere which constructs a JPEG resource, a GIF resource or
> a generic image resource depending on the result of the retrieval. It looks
> like the image element in moki needs to be extended to include an image type
> which should be set to the media type of the response under the imageInfo
> element.
>
> 2. When processing images (and links and so on) in the primary document, I
> think that duplicates should not be suppressed and the duplicate detection
> should be handled in the preprocess method. Aside from anything else, the
> detection of duplicates should be done on a canonical URI not just a text
> match (and on the absolute version of the URI, for that matter). Though as
> we saw from a little test that Dom put together real browsers do appear to
> do a textual match, so that aspect of the behaviour needs to be centralized
> so we can change it easily or control it by a switch.
>
> 3. In the CSSResource class, an image list needs to be constructed and then
> processed as above.
>
> 4. The same observation applies to link elements as to images. Since CSS
> files can include other CSS files they need to have a list of included CSS
> and that needs to be preprocessed according to the same URI matching
> strategy.
>
> 5. Ideally, each of the lists of URIs should provide a reference to where
> they were found in the source of whatever document they were found in for
> error reporting purposes.
> (Did I hear a collective groan about line and column number references :-( )
> and so that the moki document can provide the info that an image/css was in
> error and is referenced in 7 rather than just one place.
>
> 6. I think there is a need for an objects element in moki. It should contain
> objects and the objects should say a) what their content type is and b)
> whether they should be counted as an external reference. That should be easy
> enough to do. What's not so obvious is what to do about text/html when it is
> found in an object and I think the answer is that it should be counted and
> skipped.
>
> 7. Oh, and finally, before I forget. There is the case (401 Authentication)
> where both the page presented with the response and the primary document are
> tested and the external resources from the authentication page are added to
> the total. On reflection, I think we should think again about this behaviour
> before we go to the next last call of the mobileOK doc. And not worry about
> it in the code for now. (Famous last words)
>
> I've just checked in some updates with a couple of TODOs in the relevant
> places, I hope.
>
> I'll also update the moki example doc with the suggestions I made. And while
> I am about it I will generate a schema for moki. It's about time.
>
> Hope this helps. Oh and these are just my suggestions, you or anyone else
> may have better ones.
>
> Jo
>
> ________________________________
>
> From: public-mobileok-checker-request@w3.org
> [mailto:public-mobileok-checker-request@w3.org] On Behalf Of Laura Holmes
> Sent: 24 July 2007 23:54
> To: public-mobileok-checker
> Subject: Proposed changes to Moki - External Resources Test
>
> Hi all,
> I just wanted to run some changes by you all and get some feedback.
> Currently, I'm working on the ExternalResourcesTest and am running into
> conditions that haven't been accounted for in the existing code.
> These conditions include:
>
> 1) counting references contained in objects that are not jpeg or gif:
> there are many other image types and other types of objects (such as
> applications or audio) that may be included on a page. I'm assuming that we
> want to include these references even if they can't be rendered on a mobile
> phone due to a comment made regarding nested objects: "For nested object
> elements, count only the number of objects that need to be assessed before
> content matching the request header defined in 2.3.2 HTTP Request is found."
> So, we want to assess content types other than jpeg and gif when counting
> external resources.
>
> 2) keeping track of unique references to resources that are other than jpeg
> or gif:
> If two references are made in the primary document to the same image, it is
> only counted once, but if we reference the same image in css, we currently
> don't have a way of tracking this.
>
> 3) references contained in nested objects are counted regardless of whether
> or not the reference is actually reached:
> We only identify object nodes by name, not in serial order.
>
> Here are my proposed changes I want to make, which would entail changing
> the shape of the moki doc a bit:
>
> We create an ArrayList of URIs that is maintained throughout the entirety
> of the parsing process. When a reference to a resource is encountered, we
> check to see if the list already contains that URI. This list will contain
> all the resources referenced in both the primary doc and css files. At the
> end of the parsing process, we can add an additional node to any location in
> the moki that states the length of the list (i.e. how many unique resources
> were encountered). I propose adding this as its own node under moki, as it
> spans information in the primary doc, images, and css. Because we only want
> to record the number of unique references, I can't see any other way to pull
> it from the moki document using xsl.
> I'm open to any other suggestions.
>
> As to the nested object problem, I'm at a loss for solutions given our
> current implementation of the DOM. Suggestions?
>
> Thanks for your input in advance,
> Laura
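
For illustration only, the Content-Type rule stated at the top of this
message (retrieve the image, treat a missing header as unsupported, accept
only GIF or JPEG) might be sketched in Java roughly as follows. The class
and method names are hypothetical and are not taken from the checker's
code; stripping header parameters and ignoring case are assumptions, not
something the thread specifies.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of the actual checker. */
final class ImageTypeCheck {

    private ImageTypeCheck() {}

    /**
     * Returns true only when the Content-Type header value names GIF or JPEG.
     * A missing header (null) or any other media type counts as unsupported.
     */
    static boolean isSupportedImage(String contentType) {
        if (contentType == null) {
            return false; // no Content-Type header: assume not a supported image
        }
        // Assumption: ignore parameters after ';' and compare case-insensitively.
        int semicolon = contentType.indexOf(';');
        String mediaType =
                (semicolon >= 0 ? contentType.substring(0, semicolon) : contentType)
                        .trim()
                        .toLowerCase(Locale.ROOT);
        return mediaType.equals("image/gif") || mediaType.equals("image/jpeg");
    }

    public static void main(String[] args) {
        // Print the decision for a few sample header values.
        String[] samples = {"image/gif", "image/jpeg", "image/png", "text/html", null};
        for (String sample : samples) {
            System.out.println(sample + " -> " + isSupportedImage(sample));
        }
    }
}
```

The file extension of the URI is deliberately never consulted here; only
the header value decides, in line with the point made in the quoted
message about not inferring type from the extension.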
Received on Monday, 30 July 2007 22:39:43 UTC