Re: Problems validating XML

Hi Martin,

On May 30, 2007, at 18:22, Martin Duerst wrote:
>> you want to submit patches to make it
>> better in this regard, without being detrimental to its main job,
>
> I can definitely submit a patch that goes into XML mode if an
> XML declaration is present. I don't consider this as being
> detrimental to the validator's job, quite to the contrary.
> If that's not what you mean, please tell me.

I meant that in a general way. I don't think adding a trigger for XML
mode when the XML declaration is present is a bad thing; it does look
sane. The discussions about XML detection/triggering that I mentioned
in my previous message are in the following two Bugzilla entries:
XHTML Detection is over-eager [Bug 14]
XHTML-sent-as-text/html is parsed as XML [Bug 1500]

The latter has been marked INVALID following a clarification from the
XHTML working group, and I don't think the former is actually valid,
but it raises interesting questions relevant to this discussion.
[Bug 14] http://www.w3.org/Bugs/Public/show_bug.cgi?id=14
[Bug 1500] http://www.w3.org/Bugs/Public/show_bug.cgi?id=1500
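
As a rough illustration of the trigger discussed above (switching to
XML mode when an XML declaration is present), a minimal sketch in
Python might look like the following. This is not validator code: the
function names and the "xml"/"sgml" mode labels are made up for the
example, and it only handles ASCII-compatible encodings (a UTF-16
document would need separate handling).

```python
import re

# An XML declaration has to be the very first thing in the document,
# apart from an optional UTF-8 byte order mark. Requiring whitespace
# after "<?xml" keeps processing instructions such as <?xml-stylesheet ...?>
# from matching.
XML_DECL = re.compile(
    rb"^(?:\xef\xbb\xbf)?<\?xml\s+version\s*=\s*(?:\"[^\"]*\"|'[^']*')"
)


def has_xml_declaration(document: bytes) -> bool:
    """Return True if the raw document starts with an XML declaration."""
    return XML_DECL.match(document) is not None


def choose_parse_mode(document: bytes, default: str = "sgml") -> str:
    """Pick a parse mode label (hypothetical names, for illustration)."""
    return "xml" if has_xml_declaration(document) else default
```

A document without the declaration simply falls through to whatever
mode the existing detection would have picked, so the trigger only adds
a case and does not change the others.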

>> I believe you're familiar with the code,
>
> Well, that was quite some time ago, and a lot of work has
> gone into the validator since, but to some extent, yes.

I think the code has indeed changed quite a bit since you last  
touched it, but its structure should be familiar.
A few of us here on the list can answer questions, too.

>> We don't do relative SIs. Yet.
>> http://www.w3.org/Bugs/Public/show_bug.cgi?id=1521
>
> If that can be handled in the validator code, I'll try to
> submit a patch. But it might take a while.

It would be great if you could look into it, but I believe this is a
tricky one. Our parser does not validate online documents directly: we
retrieve them first, then validate the document's content locally as a
string. In this context, making the validator aware of relative URIs
for system identifiers isn't trivial: you'd have to modify the doctype
declaration on the fly to add the URI base. Alternatively, a patch to
OpenSP to tell it "here is the URI base you should use to dereference
relative SYSTEM URIs" could do the job, but I am not familiar enough
with its code to tell how hard that would be.
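
To make the first option a bit more concrete (rewriting the doctype
declaration on the fly to add the URI base), here is a minimal sketch
in Python. It is an illustration only, not validator code: the function
name is made up, it assumes the document has already been decoded to a
string, and the regular expression is deliberately naive (it would be
fooled by a doctype-like string inside a comment, for instance).

```python
import re
from urllib.parse import urljoin

# Matches the system identifier of the first doctype declaration, either
# after a PUBLIC identifier or directly after the SYSTEM keyword.
# Group 1: everything up to the system identifier's opening quote.
# Group 2: the quote character. Group 3: the system identifier itself.
DOCTYPE_SI = re.compile(
    r'(<!DOCTYPE\s+[^\s>\[]+\s+'
    r'(?:PUBLIC\s+(?:"[^"]*"|\'[^\']*\')\s+|SYSTEM\s+))'
    r'(["\'])(.*?)\2',
    re.IGNORECASE | re.DOTALL,
)


def absolutize_system_id(document: str, base_uri: str) -> str:
    """Resolve a relative system identifier against the document's URI."""

    def _rewrite(match: re.Match) -> str:
        prefix, quote, system_id = match.group(1), match.group(2), match.group(3)
        # urljoin leaves an already absolute identifier unchanged.
        return f"{prefix}{quote}{urljoin(base_uri, system_id)}{quote}"

    # Only the first doctype declaration is rewritten.
    return DOCTYPE_SI.sub(_rewrite, document, count=1)
```

For example, with the base URI http://example.org/a/b.xml, a
declaration such as <!DOCTYPE doc SYSTEM "dtd/doc.dtd"> would be
rewritten to point at http://example.org/a/dtd/doc.dtd before the
string is handed to the parser. A doctype with only a PUBLIC identifier
is left untouched.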

>> The charset override was broken in the 0.8.0 beta1. It is now fixed.
>
> This would probably explain things, see above.
> Is there a plan to release a beta2?

Absolutely. Fingers crossed, it should be out by the end of the week.
In the meantime, the CVS HEAD version on qa-dev.w3.org should always
have the latest running (or broken, as it happens) code, if you need
to check for recent changes.

Thanks!
-- 
olivier

Received on Thursday, 31 May 2007 01:08:27 UTC