Re: NVDL, SVG validation (Was: first questions on validator.nu) from Henri Sivonen on 2008-01-09 (public-qa-dev@w3.org from January 2008)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Wed, 9 Jan 2008 16:38:32 +0200
To: Jirka Kosek <jirka@kosek.cz>
Cc: public-qa-dev@w3.org
Message-Id: <CF0576FF-689A-469F-B82A-CE393342A471@iki.fi>
On Jan 3, 2008, at 11:18, Jirka Kosek wrote:

>> No, I don't think so either, since NVDL has its rules and triggers
>> based on namespaces, as far as I can tell. This would mean that using
>> a different schema for different versions or profiles of a language
>> using one namespace (as SVG, or indeed XHTML, do) would need a
>> pre-parse. In the case of SVG it's not that bad since the
>> version/baseProfile information is on the root element, but it does
>> need XML pre-parsing.
>>
>> I wonder if there's a part of NVDL that does that, and which I haven't
>> found yet.
>
> NVDL itself doesn't have provision for this. But you are not the first
> with such a requirement. Our implementation, JNVDL, has an extension
> for this; see:
>
> http://jnvdl.sourceforge.net/extensions.html#extensions-usewhen
>
> I think that something very similar will get into the next official
> version of the NVDL language.

Allowing arbitrary XPath there would preclude reasonable streaming
implementations and would be overkill for the SVG profiling use
case. Precluding streaming implementations seems like a bad idea. Have
you discussed this with the oNVDL developer(s)?
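
For comparison, here is a minimal sketch of what the SVG profiling case needs when done in a streaming way. It assumes that the only inputs are the version and baseProfile attributes on the root element. The class name and the schema keys it returns are hypothetical and are not part of oNVDL, JNVDL or Validator.nu. A namespace-aware SAX handler reads the root start tag, then aborts the parse by throwing, so no tree is built and no XPath is evaluated.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: picks a schema key from the SVG root element only.
class SvgProfileSniffer {
    static final String SVG_NS = "http://www.w3.org/2000/svg";

    // Thrown from the handler to stop parsing right after the root start tag.
    static final class RootSeen extends SAXException {
        final String uri, localName, version, baseProfile;

        RootSeen(String uri, String localName, String version, String baseProfile) {
            super("root element seen");
            this.uri = uri;
            this.localName = localName;
            this.version = version;
            this.baseProfile = baseProfile;
        }
    }

    // Returns e.g. "svg-1.2-tiny", or "not-svg" if the root is not an SVG <svg> element.
    static String pickSchema(InputSource in) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        try {
            factory.newSAXParser().parse(in, new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName,
                                         Attributes atts) throws SAXException {
                    // Unprefixed attributes have the empty namespace URI.
                    throw new RootSeen(uri, localName,
                            atts.getValue("", "version"), atts.getValue("", "baseProfile"));
                }
            });
        } catch (RootSeen root) {
            if (!SVG_NS.equals(root.uri) || !"svg".equals(root.localName)) {
                return "not-svg";
            }
            String v = root.version == null ? "unspecified" : root.version;
            String p = root.baseProfile == null ? "unspecified" : root.baseProfile;
            return "svg-" + v + "-" + p; // hypothetical key for a schema lookup table
        }
        // Not reached in practice: a successful parse always reports a root element.
        throw new IllegalStateException("no root element");
    }
}
```

Because the handler gives up at the first startElement event, the decision depends only on the root start tag. An arbitrary XPath condition can, in general, depend on content anywhere in the document.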

[profiles and subsets]
> For example, you can have a renderer that turns SVG into a PDF file
> intended for print. You might want to validate that the SVG file is not
> using any animations and other dynamic effects that would be lost on
> paper.


The Validator.nu default behavior for XHTML/SVG/MathML is meant for  
Web documents, though.

>>> It might be that the NVDL part of oNVDL allows something like this
>>> already. I haven't investigated yet. I have considered this issue,
>>> though, and I am planning on punching a hole in the XHTML5 schema
>>> for embedded RDF, for example.
>>
>> I think that'd be a perfect job for NVDL. See this example in the
>> JNVDL documentation:
>> http://jnvdl.sourceforge.net/tutorial.html#d4e311
>
> Indeed. Managing such holes for more complex compound documents can be
> very tricky, see:
>
> http://2007.xtech.org/public/schedule/paper/76#validation

The paper mentions that having to modify an official schema is a  
problem. I don't think it makes sense to introduce additional levels  
of abstraction/complexity in order to avoid editing the RELAX NG  
files. Besides, XHTML5, XHTML 1.0, SVG 1.1 and MathML 2.0 don't even  
have official RELAX NG schemas (SVG 1.1 comes closest by having a
W3C-issued schema).

I found that with XHTML 1.0, SVG 1.1 and MathML 2.0 I needed to edit  
the schemas anyway in order to incorporate features from the XHTML5  
schema (e.g. scheme-aware HTTP IRI checking with IDN support).  
Moreover, I had to convert the schemas to Compact Syntax anyway to  
make them more human-friendly.
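
As a rough illustration of what "scheme-aware HTTP IRI checking with IDN support" can involve, here is a minimal sketch. It is not Validator.nu's datatype code, and the class and method names are hypothetical. It assumes that the IDNA 2003 rules implemented by java.net.IDN are an acceptable stand-in for the rules a real checker applies. For http and https IRIs it extracts the host and asks IDN.toASCII to convert it, treating a conversion failure as an error. Other schemes pass through, because the HTTP-specific rules do not apply to them.

```java
import java.net.IDN;
import java.util.Locale;

// Hypothetical helper: extra checks that only apply to http/https IRIs.
class HttpIriCheck {
    // Returns null when no problem is found, otherwise a short error message.
    static String check(String iri) {
        String lower = iri.toLowerCase(Locale.ROOT);
        String rest;
        if (lower.startsWith("http://")) rest = iri.substring(7);
        else if (lower.startsWith("https://")) rest = iri.substring(8);
        else return null; // other schemes: the HTTP-specific rules below do not apply

        // The authority ends at the first '/', '?' or '#'.
        int end = rest.length();
        for (char c : new char[] {'/', '?', '#'}) {
            int i = rest.indexOf(c);
            if (i >= 0 && i < end) end = i;
        }
        String authority = rest.substring(0, end);
        String hostPort = authority.substring(authority.lastIndexOf('@') + 1); // drop userinfo
        if (hostPort.startsWith("[")) return null; // IP literals are not handled in this sketch

        String host = hostPort;
        int colon = hostPort.lastIndexOf(':');
        if (colon >= 0) {
            host = hostPort.substring(0, colon);
            String port = hostPort.substring(colon + 1);
            if (!port.chars().allMatch(ch -> ch >= '0' && ch <= '9')) return "Bad port: " + port;
        }
        if (host.isEmpty()) return "HTTP IRI has an empty host";
        try {
            // Converts an internationalized host to its ASCII (Punycode) form and
            // rejects ASCII characters that are not allowed in host names.
            IDN.toASCII(host, IDN.USE_STD3_ASCII_RULES);
        } catch (IllegalArgumentException e) {
            return "Bad host \"" + host + "\": " + e.getMessage();
        }
        return null;
    }
}
```

With this sketch, an IRI such as http://bücher.example/ passes, while http://exa mple.com/ is reported because the space is not a valid host-name character.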

> Using RELAX NG patterns for this is OK for a combination of two or three
> languages, but for more languages it's really not nice. Also, the
> resulting schema is very big and some validators can have problems with it.

So far, the combination of XHTML, SVG and MathML with holes for RDF,  
Inkscape and OpenMath has worked for me using the RELAX NG engine of  
oNVDL (originally Jing). These schemas are even a bit more complex  
than the compound document schemas in Relaxed, because the  
Validator.nu schemas allow recursion through MathML annotation-xml and  
SVG foreignObject.

It is well-known that Jing needs less stack space than MSV for the
same validation operation. I had to increase the JVM stack space a bit,
but only within quite reasonable limits.
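
The usual way to give the JVM more stack is HotSpot's -Xss option (for example, java -Xss4m ...). An alternative is to run only the validation on a dedicated thread created with the four-argument Thread constructor, whose last argument requests a stack size. The Javadoc warns that some platforms may ignore this request. The following is a minimal sketch that assumes the validation call can be wrapped in a Callable. The class and method names are hypothetical. The recursive depth method only stands in for validation that recurses through nested vocabularies, such as SVG foreignObject or MathML annotation-xml.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: runs a task on a thread that requests a larger stack.
class BigStackRunner {
    static <T> T runWithStack(long stackBytes, Callable<T> task) throws Exception {
        Object[] result = new Object[1];
        Throwable[] failure = new Throwable[1];
        // Thread(ThreadGroup, Runnable, String, long stackSize): the size is only a request.
        Thread worker = new Thread(null, () -> {
            try {
                result[0] = task.call();
            } catch (Throwable t) {
                failure[0] = t; // also captures StackOverflowError
            }
        }, "validation", stackBytes);
        worker.start();
        worker.join(); // join() makes the worker's writes visible to this thread
        if (failure[0] instanceof Exception) throw (Exception) failure[0];
        if (failure[0] instanceof Error) throw (Error) failure[0];
        if (failure[0] != null) throw new RuntimeException(failure[0]);
        @SuppressWarnings("unchecked")
        T value = (T) result[0];
        return value;
    }

    // Stand-in for validation that recurses once per nesting level.
    static int depth(int n) {
        return n == 0 ? 0 : 1 + depth(n - 1);
    }
}
```

For example, runWithStack(64L * 1024 * 1024, () -> depth(10_000)) returns 10000. Keeping the larger stack on one worker thread avoids raising the stack size of every thread in the JVM.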

>>> I guess the SVG schema should get that hole, too. I have also
>>> considered an option to filter out unknown namespaces, but I'm not
>>> sure if it is good to open such "anything goes" holes in a validator.
>>
>> Not a big fan of it myself, but there is an audience for it. I've
>> heard it asked more than a few times now: a number of hackers want the
>> right to play with the X in XML *and* get validation.
>
> I think that the biggest mistake of XHTML is its definition of
> "strictly conformant document" which prevents extensibility.

I agree that the definition of "strictly conforming" documents is bad,  
but my foremost beef with it is that it requires a doctype. In my  
opinion, XML content on the Web (as opposed to text/html content)  
should be written without doctypes.

> It's perfectly OK to put custom elements/attributes from other
> namespaces almost everywhere in almost any XML document.

It depends on what the elements/attributes are about. Inkscape-style
enhanced editing round-tripping is mostly harmless even if
underdocumented. However, embracing and extending Web formats with
undocumented extensions that lock users into a particular browser
would not be perfectly OK.

> For example, our Relaxed (http://relaxed.vse.cz) validator (RELAX NG
> + Schematron based) allows it -- the basic schema for HTML/XHTML simply
> ignores everything that is not in the XHTML namespace. But you can also
> use a stricter version of the schema, which rejects everything that is
> not in the XHTML namespace.

I think validators shouldn't silently ignore foreign namespaces. In
the simplest case, doing so fails to catch incorrectly copied-and-pasted
namespace URIs. I think validators should make it easy for users to
extend validation or to request that certain bits be ignored, or else
they should notify users that some bits of the document went unchecked.
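
The "notify the user" option can be sketched in a SAX-based pipeline. A handler records every element or attribute whose namespace has no schema and that the user did not explicitly ask to ignore, so that the validator can report what went unchecked. The class name and the set of checked namespaces are hypothetical. A real validator would derive that set from its schema configuration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports namespaces that no schema covered.
class UncheckedNamespaceReport {
    // Assumed set of namespaces the validator has schemas (or built-in rules) for.
    static final Set<String> CHECKED = Set.of(
            "http://www.w3.org/1999/xhtml",
            "http://www.w3.org/2000/svg",
            "http://www.w3.org/1998/Math/MathML",
            "http://www.w3.org/1999/xlink",
            "http://www.w3.org/XML/1998/namespace");

    // Maps each unchecked namespace URI to the number of elements and attributes in it.
    static Map<String, Integer> report(InputSource in, Set<String> userIgnored) throws Exception {
        Map<String, Integer> unchecked = new TreeMap<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // xmlns declarations are then not reported as attributes
        factory.newSAXParser().parse(in, new DefaultHandler() {
            void note(String uri) {
                if (CHECKED.contains(uri) || userIgnored.contains(uri)) return;
                unchecked.merge(uri.isEmpty() ? "(no namespace)" : uri, 1, Integer::sum);
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                note(uri); // elements in no namespace are reported too
                for (int i = 0; i < atts.getLength(); i++) {
                    // Unprefixed attributes belong to their element's vocabulary.
                    if (!atts.getURI(i).isEmpty()) note(atts.getURI(i));
                }
            }
        });
        return unchecked;
    }
}
```

Consider a document whose root is <html xmlns="http://www.w3.org/1999/xhtml/">, where the namespace URI has a stray trailing slash. This sketch reports that URI instead of silently skipping the whole document.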

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Wednesday, 9 January 2008 14:38:51 UTC