Re: EXI WG's inquiry about ISSUE-2050

Hi Takuki,

On May 16, 2012, at 01:44 , Takuki Kamiya wrote:
> The rule of thumb in the better design of schemas for EXI is 
> that the more rigorous the schema is the more compactness 
> you can achieve out of EXI.

True, but "rigorous" can be hard to define :)

> Also, I am interested in knowing the aspects of the
> relaxNG schema SVG is exercising that are not supported 
> by XSD 1.0, and the rationale that led SVG 1.x to depend 
> on them. This is to see if it is totally out of whack to
> apply XML Schema, or is manageable.

It's been a loooooong time, so I do not claim that this description here is correct — it is based largely on old memories.

One thing that XML Schema could not capture but RNG could was the <a> element's content model. Essentially, wherever <a> is allowed, it may contain anything that is allowed in the same place as itself, minus itself (at any level down). To give an example, assume that <foo> can contain <a>, <foo>, and <bar>. An <a> inside a <foo> can therefore contain <foo> and <bar>, and a <foo> inside an <a> can in turn only contain <foo> and <bar> (recursively). That's impossible to express in XSD 1.0 (to be fair, I'm not sure I understand how we captured that in RNG; it broke some brains).
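To make the <foo>/<bar>/<a> example concrete, here is a rough Python sketch of the rule written as a hand-rolled check. It is not how the RNG schema expressed it; the ALLOWED table and the is_valid function are made up for illustration and only cover the hypothetical vocabulary from the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical content models taken from the example in the text:
# <foo> may contain <a>, <foo> and <bar>; <bar> is empty.
ALLOWED = {"foo": {"a", "foo", "bar"}, "bar": set()}

def is_valid(elem, context=frozenset(), inside_a=False):
    """Check elem and its subtree.

    context  -- names allowed at the position where elem sits
    inside_a -- True once any ancestor is an <a>
    """
    if elem.tag == "a":
        if inside_a:
            return False              # <a> may not nest, at any depth
        allowed = context - {"a"}     # inherit the surrounding model, minus <a>
        inside_a = True
    else:
        allowed = ALLOWED.get(elem.tag)
        if allowed is None:
            return False              # element unknown to this toy vocabulary
        if inside_a:
            allowed = allowed - {"a"} # the exclusion follows us down the tree
    for child in elem:
        if child.tag not in allowed:
            return False
        if not is_valid(child, allowed, inside_a):
            return False
    return True

ok = ET.fromstring("<foo><a><foo><bar/></foo></a></foo>")
bad = ET.fromstring("<foo><a><foo><a/></foo></a></foo>")
print(is_valid(ok), is_valid(bad))
```

The point of the sketch is that the check needs two pieces of inherited state (the surrounding content model and an "inside <a>" flag), which is exactly the kind of context a grammar with purely local content models has trouble carrying.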

XSD also couldn't capture context-dependent constraints. For instance, at least in Tiny, the root <svg> element was allowed some attributes that were forbidden when <svg> appeared nested deeper inside the document.
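As an illustration of what such a context-dependent rule looks like when checked by hand, here is a small Python sketch. The attribute name "rootOnlyAttr" is a placeholder I made up (I am not listing the actual attributes from memory), and namespaces are ignored to keep it short.

```python
import xml.etree.ElementTree as ET

# Placeholder name, NOT a real SVG attribute: stands in for whatever
# attributes are permitted on the root <svg> only.
ROOT_ONLY = {"rootOnlyAttr"}

def misplaced_root_attrs(root):
    """Return (tag, attribute) pairs for root-only attributes on nested <svg>."""
    problems = []
    for elem in root.iter("svg"):     # iter() also yields the root itself
        if elem is root:
            continue                  # the root is allowed to carry them
        for name in elem.attrib:
            if name in ROOT_ONLY:
                problems.append((elem.tag, name))
    return problems

doc = ET.fromstring('<svg rootOnlyAttr="x"><g><svg rootOnlyAttr="y"/></g></svg>')
print(misplaced_root_attrs(doc))
```

The same element name gets two different attribute sets depending on where it sits, which is the part a schema with one declaration per element name struggles with.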

I believe that co-occurrence constraints were also part of the picture, with some content models depending on attributes present elsewhere.

I also recall running into UPA (Unique Particle Attribution) problems all over the place when building an XML Schema for SVG. The only solution was to make the schema more permissive than it needed to be.

Note that this is experience from a while back. It's not impossible that in the meantime XML Schema 1.1 may have addressed a number of these issues. Also, the lack of interoperability between XML Schema processors ruled out some of the more creative constructs that we looked at (which ones, I don't recall). That's something that ought to be a lot better today.

Overall, I don't think that you will have major problems producing an XML Schema for SVG, but the result will be rather loose. SVG is an authoring syntax and a lot of stuff is optional in a lot of places. My experience with binarising SVG is that you gain most from custom codecs (or by changing the syntax, which is essentially the same) and less than you'd hope from the structural redundancy.

One thing that's worth testing on real-world data is to exclude rare elements and attributes completely from your schema. For instance, <title> and <metadata> are acceptable in lots of places, but rarely used; same for attributes like contentScriptType and a *lot* of properties. With all of those in the schema, you quickly end up in a situation where you have to encode a lot of 0 bits with each element. If you ditch those elements completely and encode in fault-tolerant mode (so that they aren't lost), the odds are you win overall (possibly by a lot).
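A back-of-the-envelope way to see the trade-off is sketched below in Python. This is a toy cost model, not the actual EXI grammar or event-code algorithm: it simply assumes a fixed-width code per choice point, one extra "escape" slot in the trimmed schema, and a flat penalty whenever a rare item does show up. The function names and every number are assumptions for illustration only; real figures have to come from measuring real documents.

```python
from math import ceil, log2

def code_bits(n_choices):
    """Bits for a fixed-width code that distinguishes n_choices alternatives."""
    return ceil(log2(n_choices)) if n_choices > 1 else 0

def expected_bits(n_common, n_rare, p_rare, escape_bits):
    """Expected bits per choice point for (full schema, trimmed schema).

    n_common    -- alternatives kept in the trimmed schema
    n_rare      -- alternatives dropped from it
    p_rare      -- assumed probability that a dropped alternative occurs
    escape_bits -- assumed extra cost of encoding a dropped alternative
    """
    full = code_bits(n_common + n_rare)
    # Trimmed schema: common alternatives plus one escape slot, and the
    # escape penalty is only paid when a rare item actually appears.
    trimmed = code_bits(n_common + 1) + p_rare * escape_bits
    return full, trimmed

full, trimmed = expected_bits(n_common=6, n_rare=58, p_rare=0.01, escape_bits=40)
print(full, trimmed)
```

With these made-up inputs the full schema costs 6 bits per choice point and the trimmed one roughly 3.4 on average, so the trimmed schema wins as long as the rare items stay rare and the escape penalty stays bounded.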

-- 
Robin Berjon - http://berjon.com/ - @robinberjon

Received on Monday, 21 May 2012 12:04:55 UTC