Re: making our HTML+RDFa queryable

Hi Martynas,

> If the documents would be using XHTML instead of HTML, you could parse
> the RDFa metadata using a simple XSLT stylesheet.

We have good HTML5+RDFa parsers;
parsing HTML5 is neither harder nor easier than any other option.

To parse XHTML+RDFa with XSLT,
you still need the right XSLT source code.
Similarly, parsing HTML5 requires code.
In the end it just boils down to executing one command.
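
To illustrate that either route needs some code, here is a rough Python sketch
using only the standard library's html.parser. It is not one of the RDFa parsers
mentioned above and does not implement the RDFa processing rules: it only
collects `property` attributes with their `content` or text value, and it
assumes property-bearing elements contain no nested tags. The names
`PropertyExtractor` and `extract_properties` are made up for this example.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collects (property, value) pairs from elements with a `property` attribute.

    Simplification: this is not a conforming RDFa processor. It ignores
    prefixes, `about`, `resource`, `typeof`, and nested markup.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._pending = None  # property whose text value is still being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # The value is given explicitly, e.g. on a <meta> element.
            self.pairs.append((prop, attrs["content"]))
        else:
            # The value is the element's text, gathered until the next end tag.
            self._pending = prop
            self._text = []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._pending is not None:
            self.pairs.append((self._pending, "".join(self._text).strip()))
            self._pending = None


def extract_properties(html):
    """Returns the list of (property, value) pairs found in an HTML string."""
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs
```

Once such code exists, calling `extract_properties(page)` really is a single step
for the consumer, which is the same situation as with an XSLT stylesheet.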

And BTW, forcing XHTML on the Web
failed because of publishers, not harvesters.

> Interestingly, those are the same companies that are able to harvest
> documents on Web scale

That's indeed a pressing issue:
who is able to query our data?

Far worse problems than parsers in this regard are:
– data being spread across different pages
– heterogeneity in vocabularies

For those problems, the pipeline I propose
is an easy solution for everybody who publishes RDFa.
Consumers no longer need to harvest.

> We are giving away the dog food yet have a hard time eating it
> ourselves. Which brings us back to the need for decentralization.

That's precisely my motivation.
If more people publish their data in similar ways,
we can query across multiple pages easily.
Linked Data is a good start,
but on its own it cannot guarantee complete answers to SPARQL queries.
What I propose is that everybody has a lightweight interface instead,
and we're providing the tools to do exactly that.
This is where decentralized querying starts.
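
As a toy illustration of the idea (a sketch under my own assumptions, not the
actual tools or interface), suppose each publisher exposes only a simple
"give me the triples matching this pattern" operation, and the client combines
the answers itself. The sources below are in-memory lists standing in for
remote sites, and `TriplePatternSource`, `query_all`, and `creator_names` are
hypothetical names.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]


class TriplePatternSource:
    """Stand-in for one publisher's lightweight interface (in-memory here)."""

    def __init__(self, triples: Iterable[Triple]):
        self._triples = list(triples)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yields the triples matching the pattern; None acts as a wildcard."""
        for triple in self._triples:
            if ((s is None or triple[0] == s)
                    and (p is None or triple[1] == p)
                    and (o is None or triple[2] == o)):
                yield triple


def query_all(sources, s=None, p=None, o=None) -> Iterator[Triple]:
    """Asks every source for the same pattern and merges the results."""
    seen = set()
    for source in sources:
        for triple in source.match(s, p, o):
            if triple not in seen:
                seen.add(triple)
                yield triple


def creator_names(sources, document: str) -> List[str]:
    """Client-side join: find the creators of a document, then their names."""
    names = []
    for _, _, creator in query_all(sources, s=document, p="dc:creator"):
        for _, _, name in query_all(sources, s=creator, p="foaf:name"):
            names.append(name)
    return names


# Two hypothetical publishers, each holding part of the data.
site_a = TriplePatternSource([("ex:paper1", "dc:creator", "ex:alice")])
site_b = TriplePatternSource([("ex:alice", "foaf:name", "Alice")])
```

Here `creator_names([site_a, site_b], "ex:paper1")` finds the name even though
no single site holds both triples; the joining work happens on the client.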

Best,

Ruben

Received on Tuesday, 24 January 2017 15:32:04 UTC