Re: Parsing HTML

On 5/23/07, Norman Walsh <ndw@nwalsh.com> wrote:
> / Alex Milowski <alex@milowski.org> was heard to say:
> | We already have this issue for p:load and the 'validate' option.  If you don't
> | support validation, you get a dynamic error.  Given your preference, we
> | should change that as well.
>
> Yes, I'm not sure I like that idea. I don't see why the load function
> should not support validation.
>
> |> For file: URIs, I don't think p:load can be relied upon to give you a
> |> media type.
> |
> | Right.  Maybe p:load just fails if it isn't XML and you should use
> | the p:http-request + "parse html" step sequence instead.
>
> How do you get p:http-request to load a file: URI?
>
> I suppose a simple GET would do that...

Right.  That would be the only problem.  You wouldn't be able
to use *http* to get an HTML document off of disk.

I think I'd be OK with allowing the p:http-request step to handle
any protocol it can map to the method requested.  That means
that POST/etc. wouldn't be allowed for the 'file' scheme but GET,
HEAD, and DELETE would be allowed.

We could say that an implementation is required to support "http"
and may support https, file, or other schemes.
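
To make the idea concrete, here is a rough sketch (in Python, purely
for illustration) of how an implementation might decide whether a
method maps onto a scheme.  The names ALLOWED_METHODS and is_allowed
are hypothetical, and the method list for "http" is my assumption;
only the GET/HEAD/DELETE set for "file" comes from the proposal above.

```python
from urllib.parse import urlsplit

# Hypothetical table of which request methods map onto each scheme.
# The "file" entry follows the proposal: GET, HEAD, and DELETE only.
# The "http" entry is an assumed, non-exhaustive list.
ALLOWED_METHODS = {
    "http": {"GET", "HEAD", "POST", "PUT", "DELETE"},
    "file": {"GET", "HEAD", "DELETE"},
}


def is_allowed(method, uri, supported=frozenset({"http", "file"})):
    """Return True if this (hypothetical) implementation would accept
    the given method for the scheme of the given URI.

    `supported` models the schemes one implementation chooses to
    support: "http" would be required, anything else optional.
    """
    scheme = urlsplit(uri).scheme.lower()
    if scheme not in supported:
        return False
    return method.upper() in ALLOWED_METHODS.get(scheme, set())
```

With that table, a GET on file:///tmp/page.html would be accepted and
a POST on the same URI rejected, while an implementation that only
supports "http" would reject any file: URI outright.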

-- 
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics

Received on Wednesday, 23 May 2007 16:03:40 UTC