Re: making our HTML+RDFa queryable

On Tue, Jan 24, 2017 at 4:31 PM, Ruben Verborgh <Ruben.Verborgh@ugent.be> wrote:
> We have good HTML5+RDFa parsers;
> it's not harder or easier than any other option.
>
> To parse XHTML+RDFa with XSLT,
> you still need the right XSLT source code.
> Similarly, parsing HTML5 requires code.
> In the end it just boils down to executing one command.
>

The effort and the level of abstraction are not the same.

With XSLT, the syntax is taken care of, and you simply write a
declarative transformation of one tree into another.
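To make the point about reusing XML tools concrete, here is a minimal sketch in Python. The standard library has no XSLT processor, so ElementTree stands in for it. The idea is the same: the XML parser takes care of the syntax, and extraction reduces to a query over the tree. The sample document and the `extract_properties` helper are hypothetical. Only a tiny subset of RDFa is handled (`about`, `property`, `content`, no prefix expansion), so this is not a conformant RDFa processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed XHTML+RDFa snippet, used only for illustration.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div about="http://example.org/book">
      <span property="dc:title">Semantic Web Primer</span>
      <span property="dc:creator" content="Jane Doe">J. Doe</span>
    </div>
  </body>
</html>"""


def extract_properties(xhtml):
    """Return (subject, property, value) triples for a tiny RDFa subset.

    Assumption: @about sits on an ancestor of the @property elements;
    prefixes such as dc: are kept as plain strings, not expanded.
    """
    root = ET.fromstring(xhtml)  # the generic XML parser handles all syntax
    triples = []
    for subject_el in root.iter():
        subject = subject_el.get("about")
        if subject is None:
            continue
        # ElementTree's limited XPath: every descendant carrying @property
        for el in subject_el.findall(".//*[@property]"):
            # @content overrides the element's text, as in RDFa
            value = el.get("content", "".join(el.itertext()).strip())
            triples.append((subject, el.get("property"), value))
    return triples


print(extract_properties(DOC))
```

Nothing in the extraction step deals with syntax; a malformed document would already have been rejected by `ET.fromstring`.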

With HTML5, you cannot reuse XML tools, so you have to first parse the
syntax, and then code some imperative extraction algorithm, which will
be much more platform-dependent than XSLT.
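For contrast, here is a sketch of the imperative route for HTML5 using Python's `html.parser`, assuming the same hypothetical RDFa subset. Even this toy version must track open elements itself, skip void elements like `<br>` and `<meta>`, and guess where unclosed elements end. The real HTML5 tree-construction algorithm covers far more cases. `RDFaSubsetParser` and `extract` are illustrative names, not an existing library.

```python
from html.parser import HTMLParser

# Elements that never get an end tag in HTML5, so they must not be pushed.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class RDFaSubsetParser(HTMLParser):
    """Hypothetical imperative extractor for a tiny RDFa subset.

    Handles only @about, @property and @content; prefixes are not expanded.
    Not a conformant RDFa processor and not the HTML5 tree-construction
    algorithm: unclosed elements are only closed when an ancestor's end tag
    arrives.
    """

    def __init__(self):
        super().__init__()
        self.stack = []      # open elements as (tag, inherited subject)
        self.pending = None  # (stack depth, subject, property, text chunks)
        self.triples = []    # collected (subject, property, value)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        inherited = self.stack[-1][1] if self.stack else None
        subject = a.get("about") or inherited
        prop = a.get("property")
        if prop and subject:
            if "content" in a:
                self.triples.append((subject, prop, a["content"]))
            elif tag not in VOID_ELEMENTS and self.pending is None:
                # The value is the element's text; collect it until it closes.
                self.pending = (len(self.stack), subject, prop, [])
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, subject))

    def handle_data(self, data):
        if self.pending is not None:
            self.pending[3].append(data)

    def handle_endtag(self, tag):
        if all(t != tag for t, _ in self.stack):
            return  # stray end tag: ignored (a simplification)
        # Pop up to and including the matching element, implicitly closing
        # anything left open inside it (e.g. an unclosed <p>).
        while self.stack.pop()[0] != tag:
            pass
        if self.pending is not None and len(self.stack) <= self.pending[0]:
            _, subject, prop, chunks = self.pending
            self.triples.append((subject, prop, "".join(chunks).strip()))
            self.pending = None


def extract(html):
    parser = RDFaSubsetParser()
    parser.feed(html)
    parser.close()
    return parser.triples


# Hypothetical HTML5 page: unclosed <p>, void <br> and <meta> without slashes.
PAGE = """<!DOCTYPE html>
<div about="http://example.org/book">
  <p>Intro<br>
  <span property="dc:title">Semantic Web Primer</span>
  <meta property="dc:creator" content="Jane Doe">
</div>"""

print(extract(PAGE))
```

Most of this code is bookkeeping about syntax (open elements, void elements, implicit closing) before any RDFa logic happens, which is the extra layer of effort being described.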

That said, it would be possible to combine the two approaches and
transform the HTML5 tree with XSLT.

BTW such stylesheets already exist: http://ns.inria.fr/grddl/rdfa/

> And BTW, forcing XHTML on publishers
> didn't work for publishers, not harvesters.
>

I understand that people published tag soup HTML and still expected it
to work. But with this attitude the web will not get anywhere.

How come nobody expects JavaScript with broken syntax to run
correctly? Yet somehow, because HTML is markup, its syntax is allowed
to be "lax". This is the part I don't get.
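A quick way to see the difference in strictness, assuming Python's standard library: an XML parser rejects an unclosed `<p>` outright, much as a JavaScript engine rejects a syntax error, while `html.parser` still reports the tags and leaves it to the caller to make sense of them. The helper names below are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Tag soup: the <p> is never closed, which XML forbids and HTML tolerates.
TAG_SOUP = "<div><p>Unclosed paragraph<br></div>"


def strict_accepts(text):
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def lenient_tags(text):
    """Return the start tags html.parser reports, even for malformed markup."""
    tags = []

    class Collector(HTMLParser):
        def handle_starttag(self, tag, attrs):
            tags.append(tag)

    parser = Collector()
    parser.feed(text)
    parser.close()
    return tags


print(strict_accepts(TAG_SOUP))  # False: mismatched tag
print(lenient_tags(TAG_SOUP))    # ['div', 'p', 'br']
```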

Without correct syntax there is no layer to build semantics on. So
those who tolerate tag soup should not expect a Semantic Web.


End of rant :) I agree with the rest.

Received on Tuesday, 24 January 2017 15:58:46 UTC