- From: Sean Owen <srowen@google.com>
- Date: Fri, 18 May 2007 15:33:31 -0400
- To: "Miguel Garcia" <miguel.garcia@fundacionctic.org>
- Cc: public-mobileok-checker@w3.org
Excellent info. It sounds like the W3C Validator is the complete solution for validating the stylesheet, but it's very slow. SAC doesn't do as much but is very fast.

Shall we just use both? Validate with the W3C code, then parse with SAC? That seems to work, but then we face a real performance problem.

Shall we see if we can launch a side effort to speed up the W3C code? Maybe if it's used slightly differently it's much faster. Or maybe, with a bit of profiling (and I'm a crack JProfiler expert), we can uncover some optimizations that benefit everyone.

I believe we will have to manually grab other stylesheets referenced by @import rules, yes.

On 5/18/07, Miguel Garcia <miguel.garcia@fundacionctic.org> wrote:
>
> Hi
>
> We're making a thorough study of all the third-party libraries we'll use in the mobile checker. We hope to have a document ready in time for the next teleconference.
> We're focusing on what information the third-party libraries provide and how to handle their error messages.
>
> As a foretaste, I'll talk about the two possible CSS libraries, the SAC library and the W3C CSS-Validator, and try to point out their advantages and disadvantages.
>
> SAC is an event-driven API (like SAX) which provides access to the different tokens of a CSS document. A SAC parser accepts two different handlers: a DocumentHandler and an ErrorHandler.
> The DocumentHandler, basically, registers selectors, properties, at-rules and other events such as the start of the document.
> SAC is a low-level API: it just provides access to the different tokens, and in our code we must check that properties have the expected value.
> For example, looking for absolute font-size values:
>
> public void property(String property, LexicalUnit value, boolean important) throws CSSException {
>     if ( property.equalsIgnoreCase("font-size") ) {
>         if ( absoluteFontSize(value.getLexicalUnitType()) ) {
>             // Do something
>         }
>     }
> }
>
> // True for the absolute length units (px, in, cm, mm, pt, pc).
> private boolean absoluteFontSize(short lexicalUnitType) {
>     switch ( lexicalUnitType ) {
>         case LexicalUnit.SAC_PIXEL:
>         case LexicalUnit.SAC_INCH:
>         case LexicalUnit.SAC_CENTIMETER:
>         case LexicalUnit.SAC_MILLIMETER:
>         case LexicalUnit.SAC_POINT:
>         case LexicalUnit.SAC_PICA:
>             return true;
>         default:
>             return false;
>     }
> }
>
> Some CSS properties have a shorthand form (font-size can also be defined through the font property); in that case we must skip the values we are not interested in. (I'm not sure at this moment whether we will have to deal with shorthand properties, but just in case.)
>
> public void property(String property, LexicalUnit value, boolean important) throws CSSException {
>     if ( property.equalsIgnoreCase("font") ) {
>         // Skip font-style, font-variant and font-weight until we reach the font-size value
>         while ( value != null && !isFontSizeValue(value) )
>             value = value.getNextLexicalUnit();
>         if ( value != null && absoluteFontSize(value.getLexicalUnitType()) ) {
>             // Do something
>         }
>     }
> }
>
> private boolean isFontSizeValue(LexicalUnit lu) {
>     // font: font-style font-variant font-weight font-size/line-height font-family....
>     switch ( lu.getLexicalUnitType() )
>     {
>         case LexicalUnit.SAC_IDENT:
>             // Keyword sizes: absolute (xx-small .. xx-large) and relative (smaller, larger)
>             String stringValue = lu.getStringValue().toLowerCase();
>             return stringValue.equals("xx-small") || stringValue.equals("x-small") || stringValue.equals("small") ||
>                    stringValue.equals("xx-large") || stringValue.equals("x-large") || stringValue.equals("large") ||
>                    stringValue.equals("medium") || stringValue.equals("smaller") || stringValue.equals("larger");
>         case LexicalUnit.SAC_PIXEL:
>         case LexicalUnit.SAC_INCH:
>         case LexicalUnit.SAC_CENTIMETER:
>         case LexicalUnit.SAC_MILLIMETER:
>         case LexicalUnit.SAC_POINT:
>         case LexicalUnit.SAC_PICA:
>         case LexicalUnit.SAC_EM:
>         case LexicalUnit.SAC_EX:
>         case LexicalUnit.SAC_PERCENTAGE:
>             return true;
>         default:
>             return false;
>     }
> }
>
> The strong point of the SAC library is its speed: it is really fast. Later I'll compare the speed of both libraries. On the other hand, SAC doesn't perform grammar validation; it only reports lexical errors (such as unclosed brackets).
> For example, a CSS chunk that is well-formed but grammatically invalid would be:
> body { non-existent-property: non-existent-value };
>
>
> Another key point is import rules: SAC parsing only reports at-rules, it doesn't do anything with them. We'd deal with import rules by creating a new CSSResource and testing it.
>
>
> Another alternative is the W3C CSS-Validator library. CSS-Validator is a high-level API which performs grammar checking against different CSS profiles.
> After a style sheet is parsed, there is a method to get all the selectors, and for each selector you can get its properties.
> The first example (font-size) with this library would be something like:
>
> org.w3c.css.css.StyleSheet ss = css.getStyleSheet();
> java.util.Enumeration e = ss.getRules().keys();
> org.w3c.css.parser.CssStyle style;
> org.w3c.css.properties.css1.CssFontSizeCSS2 fontSize;
> while ( e.hasMoreElements() ) {
>     style = ss.getStyle( (org.w3c.css.parser.CssSelectors)e.nextElement() );
>     Css1Style css1 = (Css1Style) style;
>     fontSize = css1.getFontSizeCSS2();
>     if ( fontSize!=null && fontSize.isByUser() )
>     {
>         if ( fontSize.get() instanceof org.w3c.css.values.CssLength )
>         {
>             org.w3c.css.values.CssLength cssLength = (org.w3c.css.values.CssLength)fontSize.get();
>             // Any length unit other than em or ex is treated as absolute
>             if ( !cssLength.getUnit().equalsIgnoreCase("em") && !cssLength.getUnit().equalsIgnoreCase("ex") )
>             {
>                 // Do something
>             }
>         }
>     }
> }
>
> With this library we also get the font sizes defined by the shorthand font property, so no more code is required to handle them.
> Furthermore, when you parse a style sheet with this library, it transparently adds any imported style sheets, so at the end you get all the styles.
>
>
> What are we using at CTIC? We are using both of them.
> We use CSS-Validator just for checking grammar validity and SAC parsing to inspect the CSS, so we get the best of both worlds.
>
> Our primary scope is non-mobile web pages, and those usually have big CSS files (1000+ lines). With that kind of style sheet the css-validator library takes quite a long time to loop through each property.
>
> To give an idea, parsing CTIC's main CSS [http://www.fundacionctic.org/web/export/sites/default/es/resources/css/style.css] looking for font-size properties takes around 8 sec with css-validator, while SAC takes only about 0.1 sec.
> If we only validate the CSS grammar with css-validator, it takes around 0.9 sec. If we assign the grammar validation task to the css-validator library and CSS inspection to the SAC library, the total is around 1 sec.
> This is an important gain over relying only on css-validator.
>
> If we take a small style sheet, for example Google mobile [http://www.google.com/xhtml], and again parse it looking for font-size properties, css-validator takes only 0.56 sec and SAC 0.03 sec.
> With small style sheets there is still a significant gap between the two libraries, but in this case grammar validation alone takes 0.54 sec, almost the same as the full css-validator pass. So here there would be no significant gain from using both libraries.
>
> Regarding error message handling, both libraries provide error messages without any identifier (just the string). The error messages are defined in properties files, so we could translate them, and both parsers allow setting their locale. This implies that the mobile checker should somehow provide the desired locale, much as the CSS validator service does: it has several web pages in different languages, and each page presents its results in its own language.
>
> Regards,
> Miguel
>
>
>
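On the point above about manually grabbing stylesheets referenced by import rules, here is a minimal sketch of the collection step only. It is an assumption-laden illustration, not the SAC callback API and not the checker's actual code: it uses a plain java.util.regex scan, and the names ImportScanner and importedUris are hypothetical. It ignores CSS comments and escape sequences, so a real implementation would more likely take the URIs from the parser's at-rule events.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the URIs named by @import rules in a style sheet.
class ImportScanner {
    // Accepts @import "x";  @import 'x';  @import url(x);  @import url("x") media;
    private static final Pattern IMPORT = Pattern.compile(
        "@import\\s+(?:url\\(\\s*)?[\"']?([^\"')\\s;]+)[\"']?\\s*\\)?[^;]*;",
        Pattern.CASE_INSENSITIVE);

    // Returns the imported URIs in document order (unresolved, exactly as written).
    static List<String> importedUris(String css) {
        List<String> uris = new ArrayList<>();
        Matcher m = IMPORT.matcher(css);
        while (m.find()) {
            uris.add(m.group(1)); // group 1 is the bare URI without quotes or url(...)
        }
        return uris;
    }
}
```

Each returned URI would still need to be resolved against the URL of the importing style sheet, fetched, and then run through the same checks as the original, which is presumably where the new CSSResource mentioned above would come in.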
Received on Friday, 18 May 2007 19:34:04 UTC