Re: A really micro schema language from Stephen D Green on 2012-12-27 (public-microxml@w3.org from December 2012)

From: Stephen D Green <stephengreenubl@gmail.com>
Date: Thu, 27 Dec 2012 14:31:18 +0000
To: James Clark <jjc@jclark.com>
Cc: public-microxml@w3.org
Message-ID: <CAA0AChXudW_WDR2RtDC427Oc0c0QNuC4DL5egA1WPwkGfWyGGA@mail.gmail.com>

If you are going to have a schema which is essentially a list
of XPath (subset?) expressions, it might be important
to ensure that

1) the expressions give some sense of 100% coverage of the
MicroXML instance (where needed, i.e. unless the schema
is deliberately partial)
2) there is some way to eliminate duplication: some kind
of canonicity of expressions, say, such that no two
expressions say the same thing
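One possible reading of 1), as a minimal Python sketch. It
assumes (hypothetically) that an element name counts as
"covered" once some expression mentions it, and that an
expression is just element names joined by '//'. The helper
names are illustrative only, not part of any proposal.

```python
import xml.etree.ElementTree as ET

def element_names(xml_text):
    """All distinct element names in a MicroXML instance."""
    return {e.tag for e in ET.fromstring(xml_text).iter()}

def names_in_expressions(expressions):
    """Element names mentioned by expressions such as 'form//form'.
    Hypothetical simplification: an expression is element names joined by '//'."""
    return {step for expr in expressions for step in expr.split("//") if step}

def uncovered(xml_text, expressions):
    """Element names in the instance that no expression mentions."""
    return element_names(xml_text) - names_in_expressions(expressions)

print(sorted(uncovered("<html><form/><p/></html>", ["form", "//form//form"])))
# ['html', 'p']
```

An empty result would mean every element name in the instance
is addressed by at least one expression.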

For 2) it might be worth trying to ensure that there are as
few ways as possible (as close as possible to exactly one
way) to express any particular constraint. Then duplicate
logic will show up as duplicate expressions.
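As a rough illustration of 2), this Python sketch reduces each
expression to an assumed canonical form (all whitespace removed,
any leading '//' dropped, so '//form//form' and 'form//form'
coincide) and reports canonical forms that occur more than once.
The canonicalisation rule is a hypothetical choice, not taken
from any specification.

```python
from collections import Counter

def canonical(expr):
    """Hypothetical canonical form: drop all whitespace and any leading '//'."""
    expr = "".join(expr.split())
    while expr.startswith("//"):
        expr = expr[2:]
    return expr

def duplicates(expressions):
    """Canonical forms that more than one expression reduces to."""
    counts = Counter(canonical(e) for e in expressions)
    return sorted(form for form, n in counts.items() if n > 1)

print(duplicates(["//form//form", "form // form", "form"]))  # ['form//form']
```

The fewer equivalent spellings the language allows, the less
work a canonicaliser like this has to do.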

I would think a strict, perhaps minimal, subset of XPath
might be a way to achieve this. I'd guess it would prefer
the more succinct shorthand ways of saying something.
However, it does get complicated to say something simple
like count(//form//form)=0 or count(//form)>=0, even with
the shorthand, so an even shorter shorthand might be needed,
as has already been implied: e.g. dropping the 'count()' and
the leading '//', and perhaps replacing the '=0' or '>=0'
with something like the Kleene characters used in DTDs.
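To show what the two count() expressions actually test, here
is a Python sketch using only xml.etree.ElementTree. The
helper count_nested is hypothetical; it counts the distinct
inner elements that have an outer ancestor, which is the value
count(//outer//inner) would give. Note that count(//form)>=0
holds for every document, so it only records that form is
permitted.

```python
import xml.etree.ElementTree as ET

def count_nested(root, outer, inner):
    """Distinct <inner> elements with an <outer> ancestor,
    i.e. the value of count(//outer//inner)."""
    hits = set()
    for anc in root.iter(outer):      # iter() also yields anc itself...
        for d in anc.iter(inner):
            if d is not anc:          # ...so skip it
                hits.add(id(d))       # a set, so each element is counted once
    return len(hits)

doc = ET.fromstring("<body><form><div><form/></div></form><form/></body>")
no_nested_forms = count_nested(doc, "form", "form") == 0  # count(//form//form)=0
forms_allowed = len(list(doc.iter("form"))) >= 0          # count(//form)>=0
```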
If Kleene characters like * and + are used, then it might be
best to combine several MicroXPath expressions and their
cardinality characters on one line, so I suggest separators
like those I mentioned recently on XML-Dev in a similar
discussion:
http://lists.xml.org/archives/xml-dev/201212/msg00058.html
There I suggested using characters that cannot appear
literally in XML text, such as ampersand and less-than, so
that line endings need not act as separators (in case they
are needed as part of the actual expressions). Then you
could have something like

//form&+<//form//form&-

or, even more abbreviated (relying on more implicit assumptions):

form&+<form//form&-

to say that a form element can be included (anywhere)
but cannot have a descendant element named 'form'.
(The & separates the MicroXPath-esque expression from
the Kleene cardinality character and the < separates one
such combined statement from the next.)
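A minimal Python sketch of how this combined syntax might be
parsed and checked against an instance. Assumptions, none of
them from a specification: '<' separates statements and the
last '&' in each separates the path from its cardinality
character; a path is element names joined by '//', with the
leading '//' optional (so both examples above parse the same
way); and '*', '+', '?' keep their DTD meanings while '-'
means "must match nothing". All function names are
hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed meanings: '*', '+', '?' as in DTD content models; '-' = "must not occur".
CARDINALITY = {
    "*": lambda n: True,
    "+": lambda n: n >= 1,
    "?": lambda n: n <= 1,
    "-": lambda n: n == 0,
}

def parse(schema):
    """Split 'path&card<path&card...' into (steps, card) pairs."""
    statements = []
    for stmt in schema.split("<"):
        path, _, card = stmt.rpartition("&")
        steps = [s for s in path.split("//") if s]  # leading '//' is optional
        if not steps or card not in CARDINALITY:
            raise ValueError(f"cannot parse statement {stmt!r}")
        statements.append((steps, card))
    return statements

def matches(root, steps):
    """Elements matching steps[0]//steps[1]//...//steps[-1] anywhere."""
    found = []

    def walk(elem, done):
        # done = how many leading steps are already matched (greedily) by ancestors
        if done == len(steps) - 1:
            if elem.tag == steps[-1]:
                found.append(elem)
        elif elem.tag == steps[done]:
            done += 1
        for child in elem:
            walk(child, done)

    walk(root, 0)
    return found

def check(schema, xml_text):
    """Return the statements the instance violates (empty list = valid)."""
    root = ET.fromstring(xml_text)
    return ["//".join(steps) + "&" + card
            for steps, card in parse(schema)
            if not CARDINALITY[card](len(matches(root, steps)))]

print(check("form&+<form//form&-", "<html><form><p><form/></p></form></html>"))
# ['form//form&-']
```

Under this DTD reading, 'form&+' requires at least one form
element; if '+' is meant as "may occur anywhere", '*' is the
closer DTD character.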

Having just two (or perhaps three) parts to a statement,
and such a limited subset that, as nearly as possible,
exactly one way exists to state any given thing, helps
to ensure that there can be a clear determination of what
constitutes as close as possible to 100% coverage of
a MicroXML instance.

----
Stephen D Green

On 19 December 2012 04:16, Liam R E Quin <liam@w3.org> wrote:

> On Tue, 2012-12-18 at 16:49 +0700, James Clark wrote:
> > Here's an idea I was playing around with a while ago.  It relates to the
> > PossibleChildren property John mentioned.
> >
> > Imagine a really, really simple schema language that
> >
> > - uses a non-XML syntax;
>
> I'm not sure I want to do that. Why should I need a second parser when
> I've already got microXML and it's supposed to be perfect for this sort
> of thing? If not MicroXML, why not JSON?
>
> >  p !/ p
> >
> > A p element must not have a p child element.
>
> If you're really going to invent an expression language, !(p / p) is at
> least a little clearer. Or, not(p/p) and use a subset of XPath.
>
> Or, almost examplotron-style,
>
>   <p><not><p/></not></p>
>
> I know CSS selectors have also been mentioned. But they are complex and
> hopelessly non-general and ad-hoc, and tend to hard-wire knowledge of
> HTML rather too easily.
>
> Liam
>
> --
> Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
> Pictures from old books: http://fromoldbooks.org/

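The quoted constraint "p !/ p" restricts children only,
whereas 'form//form' above concerns all descendants. A
minimal Python sketch of the child-only check; it returns
True exactly when the XPath //p/p would select something.
The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def violates_no_p_in_p(xml_text):
    """True if some <p> has a <p> *child* (the 'p !/ p' constraint).
    A <p> nested deeper, e.g. inside a <b>, does not count."""
    root = ET.fromstring(xml_text)
    # Iterating over an Element yields its direct children only.
    return any(child.tag == "p" for p in root.iter("p") for child in p)
```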
Received on Thursday, 27 December 2012 14:32:06 UTC