The moki intermediate document draft has several parts containing informational messages that come from third-party tools. These parts are the following:
If we want to build an intermediate document that is not coupled to specific third parties, it would be desirable to have our own message codes (or, better, groups of them) and mappings between the third parties' codes and ours. That way, anyone can easily replace one tool with another.
A brief example (grammar validity aside) would be:
A fragment of the moki document:
<error code="002">
  <!-- Specific tool messages -->
  <location type="line">30</location>
  <messages>Here would be a specific tool message</messages>
  <location type="line">40</location>
  <messages>Here would be a specific tool message</messages>
</error>
In another file we would have descriptions of the error codes:
<messages>
  <error code="002">
    <description>Brief description of what this code represents</description>
  </error>
</messages>
A mapping between third-party messages and our codes:
<messages>
  <tool>JHOVE</tool>
  <!-- mappings between our codes and tool codes -->
  <code id="002">
    <toolcodes>
      <toolcode>001</toolcode>
      <toolcode>002</toolcode>
      <toolcode>003</toolcode>
    </toolcodes>
  </code>
</messages>
Note that this approach also makes the messages easier to internationalize.
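As a sketch of how such a mapping could be consumed by the checker, the translation from tool codes to moki codes reduces to a lookup table. The class and method names below are hypothetical (not part of any existing tool); in practice the table would be loaded from the mapping XML shown above:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: translate third-party (tool) message codes into
// our own tool-independent moki codes.
public class MessageCodeMapper {

    // key: "TOOL:toolCode" -> value: moki code
    private final Map<String, String> toolToMoki = new HashMap<>();

    public void addMapping(String tool, String toolCode, String mokiCode) {
        toolToMoki.put(tool + ":" + toolCode, mokiCode);
    }

    /** Returns the moki code for a tool message, or null if unmapped. */
    public String mokiCodeFor(String tool, String toolCode) {
        return toolToMoki.get(tool + ":" + toolCode);
    }

    public static void main(String[] args) {
        MessageCodeMapper mapper = new MessageCodeMapper();
        // From the example above: JHOVE codes 001-003 all map to our "002"
        mapper.addMapping("JHOVE", "001", "002");
        mapper.addMapping("JHOVE", "002", "002");
        mapper.addMapping("JHOVE", "003", "002");
        System.out.println(mapper.mokiCodeFor("JHOVE", "003")); // prints 002
    }
}
```

Replacing one tool with another would then only require a new mapping file, not changes to the reporting code.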
Before we can take a decision on this, a deeper look inside the validation engines is necessary, in order to analyse the feasibility and cost of this approach.
This section describes the tools chosen so far. The analysis centres on the kind of output these tools generate and on whether they provide the information needed by the tests.
This tool's purpose is to clean the code that will be the input for the checker. In the tests we have done, we only got a tidied copy of the file, but nothing like a report of the actions taken. So we looked into the source code and found that the tool uses SAX internally, but in an ad hoc way: when it catches something wrong, it repairs it without any report of the fix made.
Problems
Therefore, using TagSoup we cannot report any output about the repair actions in the moki document.
Other tidy-tool possibilities, proposed in Sean's mail [http://lists.w3.org/Archives/Public/public-mobileok-checker/2007Mar/0046.html], are:
[TODO: these tools are still pending analysis regarding message management]
This tool has been selected for the validation of images and of the XHTML code. JHOVE has several modules to validate the input. Although at first sight it seems that it does not validate the Basic profile [http://hul.harvard.edu/jhove/index.html], a deeper look inside the source code reveals the opposite.
We describe the JPEG/GIF and XHTML modules separately.
JHOVE has modules for both GIF and JPEG images. It is possible to validate the formats against the specification imposed by mobileOK Basic:
JPEG [http://hul.harvard.edu/jhove/references.html#t.81]
GIF [http://hul.harvard.edu/jhove/references.html#gif89a]
Problems
The output provided by these modules does not have any kind of error identification, and the messages are embedded in the source code (so internationalization is not possible).
For example, this is an error reported by the GifModule:
info.setMessage(new ErrorMessage(
    "End of file reached without encountering Trailer block", _nByte));
Another problem is the impossibility of checking whether all pixels of an image are transparent. JHOVE only detects whether the alpha channel (transparency) is used; it does not check whether every pixel is transparent.
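Since JHOVE cannot tell whether every pixel is actually transparent, such a check would have to be done at pixel level ourselves, for instance with the standard java.awt.image API. A minimal sketch (illustration only, not mobileOK checker code):

```java
import java.awt.image.BufferedImage;

// Sketch: JHOVE only reports that an alpha channel exists; to know whether
// *all* pixels are transparent we must inspect the pixels ourselves.
public class TransparencyCheck {

    /** Returns true if every pixel of the image is fully transparent. */
    public static boolean allPixelsTransparent(BufferedImage img) {
        if (!img.getColorModel().hasAlpha()) {
            return false; // no alpha channel at all
        }
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int alpha = img.getRGB(x, y) >>> 24; // alpha is the top byte
                if (alpha != 0) {
                    return false; // found an opaque (or partially opaque) pixel
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(2, 2, BufferedImage.TYPE_INT_ARGB);
        System.out.println(allPixelsTransparent(img)); // true: ARGB images start fully transparent
        img.setRGB(0, 0, 0xFF000000); // make one pixel opaque black
        System.out.println(allPixelsTransparent(img)); // false
    }
}
```

For images decoded with javax.imageio (see the summary table below), the decoded BufferedImage can be fed directly to a check like this.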
The validation of the XHTML grammar is done by the JHOVE XML module, which internally uses the SAX interface. SAX performs the validation using the declared DTD and reports the messages. The problem is that, although SAX uses message codes (as short strings) internally, the API only exposes the long message strings without any code. For the following internal message, we only get the error string:
XMLLangInvalid=The xml\:lang attribute value "{0}" is an invalid language identifier.
The JHOVE XHTML module includes some common DTDs as resources (XHTML/HTML), but the XHTML Basic/Mobile Profile DTDs are not included. For performance reasons (to avoid the overhead of network connections and so on) it would be desirable to include them as resources too.
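Both points can be illustrated with the standard JAXP/SAX API alone, outside JHOVE: an ErrorHandler only ever receives the final formatted string (no code), and an EntityResolver is the usual way to serve DTDs from local resources instead of the network. A minimal sketch, where the toy DTD and document stand in for XHTML Basic and a real page:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class LocalDtdValidation {

    // Toy DTD standing in for a locally bundled XHTML Basic DTD.
    static final String DTD = "<!ELEMENT root (child)><!ELEMENT child EMPTY>";
    // Invalid document: the DTD requires <root> to contain one <child>.
    static final String XML =
        "<!DOCTYPE root SYSTEM 'http://example.org/toy.dtd'><root></root>";

    public static List<String> validate() throws Exception {
        final List<String> messages = new ArrayList<>();
        SAXParserFactory f = SAXParserFactory.newInstance();
        f.setValidating(true);
        SAXParser parser = f.newSAXParser();
        parser.parse(new InputSource(new StringReader(XML)), new DefaultHandler() {
            // Serve the DTD from a local resource instead of the network.
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                return new InputSource(new StringReader(DTD));
            }
            // Only the formatted message string is available, no error code.
            @Override
            public void error(SAXParseException e) {
                messages.add(e.getMessage());
            }
        });
        return messages;
    }

    public static void main(String[] args) throws Exception {
        for (String m : validate()) {
            System.out.println(m);
        }
    }
}
```

The same resolveEntity trick is how bundled XHTML Basic/Mobile Profile DTDs could be served from the checker's own resources.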
JXCSS is a SAX adapter for SAC parsing, so it shares the features (and flaws) of SAC. JXCSS is a library for writing a CSS document out in XML format; it does not do any processing of the CSS grammar.
SAC is an event-driven API (like SAX) which provides access to the different tokens of a CSS document. A SAC parser accepts two different handlers: a DocumentHandler and an ErrorHandler. The DocumentHandler, basically, registers selectors, properties, at-rules and other events such as the start of the document.
SAC is a low-level API: it just provides access to the different tokens, and in our code we must check that properties have the expected values. For example, looking for absolute font-size values:
public void property(String property, LexicalUnit value, boolean important)
        throws CSSException {
    if ( property.equalsIgnoreCase("font-size") ) {
        if ( absoluteFontSize(value.getLexicalUnitType()) ) {
            // Do something
        }
    }
}

private boolean absoluteFontSize(short lexicalUnitType) {
    switch ( lexicalUnitType ) {
        case LexicalUnit.SAC_PIXEL:
        case LexicalUnit.SAC_INCH:
        case LexicalUnit.SAC_CENTIMETER:
        case LexicalUnit.SAC_MILLIMETER:
        case LexicalUnit.SAC_POINT:
        case LexicalUnit.SAC_PICA:
            return true;
        default:
            return false;
    }
}
Some CSS properties have a shorthand form (font-size can also be defined via the font property); in that case we must skip the values we are not interested in. (We are not sure at this moment whether we will have to deal with shorthand properties, but just in case:)
public void property(String property, LexicalUnit value, boolean important)
        throws CSSException {
    if ( property.equalsIgnoreCase("font") ) {
        while ( value != null && !isFontSizeValue(value) )
            value = value.getNextLexicalUnit();
        if ( value != null && absoluteFontSize(value.getLexicalUnitType()) ) {
            // Do something
        }
    }
}
private boolean isFontSizeValue(LexicalUnit lu) {
    // font: font-style font-variant font-weight font-size/line-height font-family...
    switch ( lu.getLexicalUnitType() ) {
        case LexicalUnit.SAC_IDENT:
            String value = lu.getStringValue().toLowerCase();
            return value.equals("xx-small") || value.equals("x-small")
                || value.equals("small") || value.equals("medium")
                || value.equals("large") || value.equals("x-large")
                || value.equals("xx-large") || value.equals("smaller")
                || value.equals("larger");
        case LexicalUnit.SAC_PIXEL:
        case LexicalUnit.SAC_INCH:
        case LexicalUnit.SAC_CENTIMETER:
        case LexicalUnit.SAC_MILLIMETER:
        case LexicalUnit.SAC_POINT:
        case LexicalUnit.SAC_PICA:
        case LexicalUnit.SAC_EM:
        case LexicalUnit.SAC_EX:
        case LexicalUnit.SAC_PERCENTAGE:
            return true;
        default:
            return false;
    }
}
The strong point of the SAC library is its speed: it is really fast. On the other hand, SAC does not perform grammar validation; it only reports lexical errors (such as unclosed brackets). For example, the following CSS chunk is well formed but grammatically invalid: body { non-existent-property: nonExistentValue }
Error messages are handled by the ErrorHandler class and split into three categories: warning, error and fatal. Each category is reported in its own method and carries the message string but no error code. Error messages can be localized via the setLocale method, so we could at least get the error message in a locale-dependent manner.
SAC is just an API and there are several implementations; the two best known are probably Flute (from the W3C) and Batik (from Apache). Unfortunately, the Flute library does not implement the setLocale method yet, so only the Batik implementation remains as a choice. Batik provides internationalization through properties files, so we would need to translate them and set the locale.
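The properties-file mechanism Batik relies on is the standard Java ResourceBundle lookup: one bundle per locale, looked up by key, and adding a translation means adding one more file for the target locale. A self-contained sketch of the same pattern (using ListResourceBundle classes instead of .properties files purely so the example runs on its own; the keys and messages are invented):

```java
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

// Base bundle (default / English messages).
class Messages extends ListResourceBundle {
    protected Object[][] getContents() {
        return new Object[][] {{"eof.error", "Unexpected end of file"}};
    }
}

// Spanish translation: same keys, translated values.
class Messages_es extends ListResourceBundle {
    protected Object[][] getContents() {
        return new Object[][] {{"eof.error", "Fin de fichero inesperado"}};
    }
}

public class LocalizedMessages {
    static String message(String key, Locale locale) {
        // ResourceBundle picks the most specific bundle for the locale,
        // falling back to the base bundle when no translation exists.
        return ResourceBundle.getBundle("Messages", locale).getString(key);
    }

    public static void main(String[] args) {
        System.out.println(message("eof.error", Locale.ROOT));
        System.out.println(message("eof.error", new Locale("es")));
    }
}
```

Translating Batik's messages would follow the same shape: copy its properties files, translate the values, and select the locale at parse time.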
CSS-Validator is a high-level API which performs grammar checking against different CSS profiles. After a style sheet is parsed, there is a method to get all the selectors, and for each selector you can get its properties. The first example (font-size) with this library would be something like:
org.w3c.css.css.StyleSheet ss = css.getStyleSheet();
org.w3c.css.parser.CssStyle style;
org.w3c.css.properties.css1.CssFontSizeCSS2 fontSize;
java.util.Enumeration e = ss.getRules().keys();
while ( e.hasMoreElements() ) {
    style = ss.getStyle( (org.w3c.css.parser.CssSelectors) e.nextElement() );
    Css1Style css1 = (Css1Style) style;
    fontSize = css1.getFontSizeCSS2();
    if ( fontSize != null && fontSize.isByUser() ) {
        if ( fontSize.get() instanceof org.w3c.css.values.CssLength ) {
            org.w3c.css.values.CssLength cssLength =
                (org.w3c.css.values.CssLength) fontSize.get();
            if ( !cssLength.getUnit().equalsIgnoreCase("em")
                    && !cssLength.getUnit().equalsIgnoreCase("ex") ) {
                // Do something
            }
        }
    }
}
With this library we also get the font sizes defined through the shorthand font property, so no extra code is required to handle them. Furthermore, when you parse a CSS file with this library, it transparently adds any imported style sheets, so at the end you get all the styles.
Errors are handled through exceptions, and only the message string is provided. As with the SAC parser, css-validator can localize its error messages (this time via an ApplContext object), so it can provide messages in different languages. Error messages are taken from properties files, and there are at least 8 translations.
Whichever CSS tool (or combination of tools) we finally use, none of them uses message codes. We would have to modify their source code.
Before reaching our conclusions, we summarize all the information in the following table:
Library | Message code | Properties file | Notes |
---|---|---|---|
Tag Soup | no | no | Does not provide any kind of message |
JHOVE Image Module | no | no | Validates the specific mobileOK Basic formats |
Package javax.imageio | no | no | Low-level API, useful for checking transparency. Any error will be wrapped by mobileOK checker code. |
JHOVE XHTML Module | no | no | Uses SAX as its validation engine |
SAX | internal | no | SAX parser messages can be localized (setLocale). Implementation dependent? |
W3C Markup Validator | yes (see OpenSP) | yes* (see OpenSP) | A wrapper library written in Perl for OpenSP |
OpenSP | yes | yes* | The properties are loaded into code during the build process. This library could be useful if we build a JNI wrapper |
W3C SOAP Validation Service | internal | yes* | SOAP entry point to the W3C Markup Validator |
SAC (Batik) | internal | yes | Useful for searching CSS properties |
CSS-Validator | internal | yes | Useful for validating the CSS grammar |
JXCSS | no | no | Useful for representing CSS in XML |
None of these tools has message management that satisfies the needs introduced at the beginning of this document. So far we see two possible solutions:
Perhaps the best solution is a combination of both. (Note that in some cases, like TagSoup, it is not possible to include any kind of message.)
As message handling is very heterogeneous across the tools, we think a reasonable solution would be to treat each tool separately: not looking for the best solution in all cases, but for a compromise between development agility and quality.