Re: XML, namespaces, extensibility and validation from olivier Thereaux on 2007-06-25 (www-tag@w3.org from June 2007)

From: olivier Thereaux <ot@w3.org>
Date: Mon, 25 Jun 2007 15:22:37 +0900
To: www-tag@w3.org
Message-Id: <D570F48B-F99B-4CB4-88D9-03F24B278EC6@w3.org>

Hi Tim, Hi TAG list,
(note I'm not subscribed, would appreciate being kept in Cc)

On Jun 22, 2007, at 22:17 , Tim Berners-Lee wrote:
> - The 'X' in 'XML' is supposed to be for extensible

Absolutely.

> - The HTML language has always allowed for extension by saying that  
> unknown tags or attributes should be ignored.

Correct, yet to be accurate one has to note this has always been a  
requirement for _UAs_, whereas conformance requirements for HTML  
_documents_ have always been based on technologies that do not allow  
for foreign elements. That's the paradox I would like to get rid of.  
That's why I sent mail on this topic.


> - Therefore, groups like ARIA ought to be able to extend XHTML by  
> introducing new elements and attributes.

I think we have seen some progress in this area. See how XHTML+RDFa  
was created, and can be validated (for now, only with the beta  
validator that is soon to be released). Note that they had to go and  
create a profile and DTD; that is, an XHTML+RDFa document cannot claim  
to be an XHTML document with some stuff in a foreign namespace slapped  
into it. That, I believe, is contrary to the basic conformance  
statement for XHTML: http://www.w3.org/TR/xhtml1/#strict

> This hasn't happened, and one factor has been that developers don't  
> want to upset the W3C validator. So, let us make the validator more  
> constructive.

Yes. And since the validator is generally merely following the HTML  
specs, let's make the HTML specs more constructive, too. There lies  
the chicken/egg problem. Blaming some issues on the validator is  
overlooking the actual cause of a number of these issues.

> - allow people to add new elements and tags, using namespaces.

This is mostly something the XHTML specs should allow.

> - give a list of extensions used that the validator does not know  
> about.  This is a warning, not an error.

Could you give an example of that?

> - warn them if they are squatting on a namespace without the  
> group's OK.

This sounds interesting. I assume this can only be done if the  
validator knows all elements/attributes officially tied to a given  
namespace. This, I believe, leads us to your following comment:

> - if  XSD or RelaxNG from the namespace document etc can be used to  
> check that the new items are syntactically correct additions, do so
>    (if not, warn them that it can't, or give error if the syntax  
> can be demonstrated to be wrong)
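
To make the two cases above concrete, here is a minimal sketch in Python. It is not how the validator works today: the KNOWN table, the function names and the example.org namespace are all made up for illustration, and a real tool would presumably derive the list of known names from a DTD, XSD or RelaxNG schema rather than hard-code it.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical registry: namespace URI -> element names known for it.
# Assumption for illustration only; a real tool would load a schema.
KNOWN = {XHTML_NS: {"html", "head", "title", "body", "p"}}


def split_tag(tag):
    """Split ElementTree's '{uri}local' form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def report_extensions(xml_text, known=KNOWN):
    """Return (unknown, squatting) lists of messages for element names."""
    unknown, squatting = [], []
    for elem in ET.fromstring(xml_text).iter():
        uri, local = split_tag(elem.tag)
        if uri not in known:
            # Extension in a namespace the checker knows nothing about.
            unknown.append(f"unknown namespace: {uri} ({local})")
        elif local not in known[uri]:
            # Name added to a known namespace that does not define it.
            squatting.append(f"{local} is not defined in {uri}")
    return unknown, squatting


DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ex="http://example.org/ext">
  <head><title>t</title></head>
  <body><p>hi</p><ex:widget/><blink/></body>
</html>"""

unknown, squatting = report_extensions(DOC)
```

Here ex:widget would end up in the "unknown" list (a warning, in your terms), while blink would be flagged as squatting on the XHTML namespace.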

> - congratulate them if a namespace document gives info about the  
> new namespace

You mean, if a ns URI dereferences, or if it is human readable, or...?

> Also on my wish-list would be:
>
> - check the mime type, content-encoding and other HTTP headers  
> intelligently

This is done to some extent, though I suppose you'd have to clarify  
"intelligently" to know what you have in mind.

What is done at this point is:
* check that the mime type is one the validator knows is for a known  
markup language
* check if the document type matches the mime type, and whine/make  
suggestions if not
e.g. for an SVG document served as text/html:
http://validator-test.w3.org/check?uri=http://qa-dev.w3.org/wmvs/HEAD/dev/tests/REC-SVG-1_0-minimal.html#preparse_warnings
I would be interested to hear what you had in mind for  
Content-Encoding and other headers.
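
For illustration, the two checks listed above could be sketched roughly as follows in Python. This is not the validator's actual code: the ACCEPTABLE_ROOT_NS table and the preparse_warnings function are hypothetical names, and comparing the root element's namespace with the media type is only one simple way to guess the document type.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical table: media type -> root namespaces we accept for it.
ACCEPTABLE_ROOT_NS = {
    "text/html": {"", XHTML_NS},
    "application/xhtml+xml": {XHTML_NS},
    "image/svg+xml": {SVG_NS},
}


def preparse_warnings(content_type, body):
    """Return a list of warnings about the media type of a document."""
    # Drop parameters such as '; charset=utf-8' before looking up the type.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in ACCEPTABLE_ROOT_NS:
        return [f"{media_type} is not a media type this checker knows"]
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return []  # not well-formed XML, so there is no root to compare
    tag = root.tag
    root_ns = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
    if root_ns not in ACCEPTABLE_ROOT_NS[media_type]:
        return [f"root element in namespace '{root_ns}' "
                f"does not match media type {media_type}"]
    return []


SVG = '<svg xmlns="http://www.w3.org/2000/svg"/>'
mismatch = preparse_warnings("text/html; charset=utf-8", SVG)
```

With the SVG example, serving the document as text/html yields one warning, while serving it as image/svg+xml yields none.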

> - check CSS linked and inline automatically

Our Unicorn project does just that.

http://www.w3.org/QA/Tools/Unicorn/

> - check Javascript linked and inline for syntax.

Yes. I wonder if there is any open source JavaScript lint tool or  
parser we could use.

> - give advice about other things we believe in such as  
> accessibility, i18n

Also in the Unicorn scope. There are not many tools yet to check  
for those things.

> - derive RDF data from the page using GRDDL, according to a  
> putative spec  of what the current algorithm (GRDDL, embedded RDF  
> syntaxes etc)

Getting rather out of scope for a "validator", IMHO, but interesting  
nevertheless.

> I'd like similar things for an RDF validator.
>
> - request application/xml+rdf and text/rdf+n3

I think our RDF validator only does application/rdf+xml at this point.
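
As a rough sketch of what requesting both types could look like (Python standard library; the rdf_request name, the example.org URL and the q values are assumptions of mine, not anything our RDF validator does):

```python
from urllib.request import Request


def rdf_request(url):
    """Build a request that prefers RDF/XML and also accepts N3."""
    # The q values are illustrative; they only express an order of preference.
    accept = "application/rdf+xml, text/rdf+n3;q=0.9, */*;q=0.1"
    return Request(url, headers={"Accept": accept})


# Nothing is sent here; the request object is only constructed.
req = rdf_request("http://example.org/data")
```

The Content-Type of the response would then tell the validator which parser to use, which ties in with your "check mime types returned" item.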

> - check mime types returned
> - for XML, check namespace & name of document element to decide how/ 
> whether to parse for RDF
> - understand and check RDF/XML and N3/turtle

> - check links from HTML files, GRDDL etc

How would you do that?

> - check that each class and property used is mentioned in its  
> namespace document (else warn)
> - check that classes and properties have labels, ideally  in  
> multiple languages (else weak warning)

Your suggestions for the RDF validator sound good. I don't know if  
there is much of a development effort for it at the moment. Do you  
think there would be takers in the semweb hackers community?

-- 
olivier
Received on Monday, 25 June 2007 06:22:39 UTC