Re: On schema quality and schema limitations

* Dominique Hazaël-Massieux wrote:
>On Fri 09/04/2004 at 07:23, Bjoern Hoehrmann wrote:
>>   While trying to figure out what the lexical space of this "charset"
>> thing is (which I need to know for my Charmod Fundamentals and XHTML
>> Print review and probably for implementation of XHTML Print in the
>> MarkUp Validator...) I somehow ended up writing 
>> 
>>   http://lists.w3.org/Archives/Public/www-archive/2004Apr/0043.html
>
>Very interesting reading! I expect you'll turn most of its contents into
>comments for the XHTML Modularization SE?

Hmm, maybe... I am not sure what to tell them.

>Do you mean by that your mail should be formally added to our issues
>list as another set of points to solve? i.e., are there any other issues
>that you would like to see opened as a follow-up to your email?

I would say that mail just contains more illustration for the issue.
Adding a link to it from the issue might be a good idea, but I don't
think there is anything new here.

>Getting back to your email, you raise a point that I think is really
>worth trying to solve in W3C as a quality process: how to check that the
>formal language (defined in schema, dtd, ...) matches the
>English-written specification ;

Agreed.

>Have you tried to check whether a conformant XHTML document is declared
>as XML Schema invalid, based on the errors you found?

Well, what is a conformant XHTML document? Regarding the FrameTarget
type in <http://www.w3.org/TR/xhtml1-schema/>, here is a test case

  http://www.websitedev.de/markup/validator/tests/uppercase-blank-in-target.html,sv

The ,sv is a redirect to <http://schneegans.de/sv/>, Christoph
Schneegans' XHTML 1.0 schema validator. HTML 4.01 says in

  http://www.w3.org/TR/html4/present/frames.html#adef-target

the target attribute is case-insensitive, hence it seems that it
does not matter whether I use _blank or _BLANK. Hmm, I commented on
XHTML 1.0 FE that <input type="TEXT"> is not allowed per the XHTML 1.0
DTDs but that this is not documented in the specification; they added

  http://www.w3.org/TR/xhtml1/#h-4.11
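
To make the _blank versus _BLANK question concrete, here is a minimal
Python sketch. The pattern is only a hypothetical stand-in for a
case-sensitive schema type in the spirit of FrameTarget; it is an
assumption, not a quotation from the schema, and the function names are
made up for illustration.

```python
import re

# Hypothetical stand-in for a case-sensitive schema pattern: either one of
# the four reserved names in lower case, or a name starting with a letter.
SCHEMA_LIKE = re.compile(r"_(blank|self|parent|top)|[A-Za-z][\w.:-]*")


def matches_schema_like(target: str) -> bool:
    """Case-sensitive check, the way a schema pattern facet would behave."""
    return SCHEMA_LIKE.fullmatch(target) is not None


def matches_case_insensitive(target: str) -> bool:
    """Same check after lower-casing, i.e. the HTML 4.01 [CI] reading."""
    return matches_schema_like(target.lower())


for value in ("_blank", "_BLANK", "main", "_bogus"):
    print(value, matches_schema_like(value), matches_case_insensitive(value))
```

Under these assumptions _BLANK passes only the second check, which is
the mismatch the test case above is meant to exercise.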

Now I do not know whether the target attribute falls into this category.
It does not suffer from the case-sensitivity problem in XML DTDs... Maybe
the CI in HTML 4.01 is an error (there are a couple...). Hmm,

  http://www.w3.org/mid/1077199561.30689.263.camel@stratustier

is a similar issue, 

  http://www.w3.org/TR/html4/struct/global.html#adef-profile

says,

  profile = uri [CT] 

  This attribute specifies the location of one or more meta data
  profiles, separated by white space. For future extensions, user agents
  should consider the value to be a list even though this specification
  only considers the first URI to be significant. Profiles are discussed
  below in the section on meta data.

The specification consistently uses %URI; to refer to the lexical space
of this attribute. HTML 4.01 has no %URIs; data type; it uses CDATA for
attributes such as <object archive> ... wait, they have one in

  http://www.w3.org/TR/html4/struct/objects.html#adef-archive-OBJECT

it is "uri-list", which (like e.g. "cdata-list") is practically
undefined in the specification... And it is used only in a few places...
It seems that multiple URIs are prohibited in the profile attribute...

  http://www.w3.org/People/Raggett/tidy4aug00.zip

attrs.c:attrlist[] agrees, it has

    {"profile",          VERS_HTML40,            URL},      /* HEAD */

  % perl -e "print qq(<head profile='a b'>)" | tidy-current
  ...
  line 1 column 1 - Warning: <head> escaping malformed URI reference
  ...
  <head profile='a%20b'>
  ...

Hmm... People put spaces into URIs

  <a href='Foo Bar Baz.doc'>...</a>

These do not co-exist very well... If multiple URIs are not allowed,
Tidy should fix them... How would one review Schemas that are based on
a contradictory specification that is not fixed? And how to figure out
what to do for Tidy? A new data type? Special-case the profile attribute
to complain about spaces but not fix them?
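
The competing readings of profile='a b' can be sketched in a few lines
of Python. This is only an illustration of the options, not Tidy's
actual logic; the function names and the set of characters left
unescaped are assumptions.

```python
from urllib.parse import quote

# Characters left untouched when escaping; '%' is kept so that existing
# escapes are not escaped twice. The exact set is an assumption.
URI_SAFE = "/:?#[]@!$&'()*+,;=%"


def as_uri_list(value: str) -> list[str]:
    """Reading 1: a whitespace-separated list of URIs."""
    return value.split()


def as_single_escaped_uri(value: str) -> str:
    """Reading 2: one URI whose embedded spaces get percent-encoded."""
    return quote(value.strip(), safe=URI_SAFE)


def complain_only(value: str) -> list[str]:
    """Reading 3: leave the value alone and only report the spaces."""
    if any(ch.isspace() for ch in value.strip()):
        return ["profile contains white space: list or malformed URI?"]
    return []


print(as_uri_list("a b"))
print(as_single_escaped_uri("a b"))
print(complain_only("a b"))
```

The second function is what makes sense for href='Foo Bar Baz.doc'; the
first is what the prose of the profile definition asks for.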

Back to your question, I did not really have a chance to test anything
for M12N SE since it does not provide language-specific schemas. There
is an "XML Schema driver for XHTML 1.1" in the .zip file; maybe that
would do, but only for a small subset of XHTML 1.0 Transitional. I would
need to write my own Schema for e.g. XHTML 1.0 Transitional before I
could do some testing. Hmm, there is an xhtml-frameset example and there
is a test file...

  <!-- Width in th/td/tr not working -->
  <!-- Frames Not working, complains body element is missing-->
  <!-- Attribute color values, is it case sensitive or not like Black
       or black -->
  <!-- Character DataType validate with somebody-->
  ...

That looks a bit scary. Maybe these comments are outdated... If I had a
complete Schema I would need to "fix" the .xsd files since all my XML
Schema processors suffer from

  http://www.w3.org/XML/xml-names-19990114-errata#NE05

They do not like

  xmlns:xml = '...'

and refuse to load the schema. As the erratum points out, "It may, but
need not, be declared", but even though many people complained about it,
the response has so far been limited to

  http://lists.w3.org/Archives/Public/www-html-editor/2002OctDec/0017.html

  Well-known bug of MSXML.
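
The kind of "fix" I have in mind for the .xsd files could look roughly
like the following Python sketch, which drops an explicit xmlns:xml
declaration before the schema is handed to a processor that chokes on
it. The function name is hypothetical, and since this works on the raw
text it would also touch matching strings inside comments or CDATA
sections.

```python
import re

# The namespace name the "xml" prefix is bound to by definition.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# An explicit xmlns:xml='...' attribute with either quote style.
XML_PREFIX_DECL = re.compile(r"""\s+xmlns:xml\s*=\s*(["'])(.*?)\1""")


def strip_xml_prefix_declaration(source: str) -> str:
    """Remove redundant xmlns:xml declarations from schema source text."""

    def drop(match: re.Match) -> str:
        # Binding the prefix to anything else is an error, so do not hide it.
        if match.group(2) != XML_NS:
            raise ValueError("xml prefix bound to " + match.group(2))
        return ""

    return XML_PREFIX_DECL.sub(drop, source)


sample = (
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"'
    " xmlns:xml='http://www.w3.org/XML/1998/namespace'>"
)
print(strip_xml_prefix_declaration(sample))
```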

Probably likewise for the DTD M12N: it contains a processing instruction

  <?IS10744:arch ... ?>

which causes similar trouble

  http://lists.w3.org/Archives/Public/www-html-editor/2002JulSep/0010.html

The W3C CSS Validator, for example, refused to validate style sheets
referenced from XHTML Basic documents for this reason.

I still don't know whether this is actually allowed; so far I have got
no response to

  http://lists.w3.org/Archives/Public/xml-names-editor/2004Jan/0006.html

...

What I do have is 

  http://www.websitedev.de/markup/validator/tests/

which might be of some interest in this regard.

Received on Friday, 9 April 2004 19:35:13 UTC