Re: From embedded structured data to queryable websites from Ruben Verborgh on 2017-01-27 (public-lod@w3.org from January 2017)

From: Ruben Verborgh <Ruben.Verborgh@UGent.be>
Date: Fri, 27 Jan 2017 13:50:27 +0000
To: William Van Woensel <William.Van.Woensel@Dal.Ca>
CC: "public-lod@w3.org" <public-lod@w3.org>, Sven Casteleyn <sven.casteleyn@uji.es>
Message-ID: <ED40650C-2E9B-43C1-9038-EDED16A4A769@ugent.be>

Hi William,

> After having privately discussed this idea of "queryable websites" (as well as some other related ideas) a while ago with Ruben, mentioning my own (partial) implementation and offering to cooperate on the effort

If I recall correctly, the exchange we had was about analyzing what metadata was there,
and what metadata would be sufficient (and to find the definition of “sufficient”).
That question is indeed not answered yet; you can see I left it open:
– https://ruben.verborgh.org/articles/queryable-research-data/#open-questions-p-2
– https://ruben.verborgh.org/articles/queryable-research-data/#open-questions-p-3
– https://ruben.verborgh.org/articles/queryable-research-data/#open-questions-p-4

> I am quite surprised to see this idea reappear here now.

Your idea was about building cross-website applications;
it's something that we're still very far away from.

I have provided one simple way for my own website
to make itself queryable, i.e., a TPF interface instead of LD documents.
In fact, I've had the https://data.verborgh.org/ruben interface for months,
but it just included my FOAF profile as data (https://ruben.verborgh.org/profile/).
The only thing that I changed is that it now also includes my RDFa data.
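
For illustration, asking such an interface for one triple pattern could look like the sketch below. It is only a sketch: the parameter names subject, predicate, and object and the helper fragment_url are assumptions made here, and a real client should read the actual parameter names from the hypermedia controls in the interface's responses.

```python
from urllib.parse import urlencode


def fragment_url(interface, subject=None, predicate=None, obj=None):
    """Build the URL of one triple pattern fragment; None means 'any value'.

    The query parameter names are assumed for this sketch, not taken
    from the interface itself.
    """
    pattern = {"subject": subject, "predicate": predicate, "object": obj}
    # Only bound components of the pattern end up in the query string.
    bound = {key: value for key, value in pattern.items() if value is not None}
    return interface + ("?" + urlencode(bound) if bound else "")


# All triples with a foaf:name predicate, whatever subject and object.
url = fragment_url(
    "https://data.verborgh.org/ruben",
    predicate="http://xmlns.com/foaf/0.1/name",
)
print(url)
```

A client would answer a full query by requesting several such fragments and joining the results on its own side.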

> Unfortunately, this means that there are now two separate approaches and implementations, likely with a lot of shared code and duplicated work.

I'm afraid you overestimate the complexity of my solution :-)
It's just one 40-line Bash script, half of it dedicated to comments and variables:
https://github.com/RubenVerborgh/WebsiteToRDF/blob/6bcbbe92/extract-website-data

All it does is get the RDFa out of my website
and apply some reasoning to it,
such that you don't have to mark up everything with 5 ontologies.
Just straightforward execution of existing commands.
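
To give an idea of what "reasoning instead of marking up with 5 ontologies" means, here is a minimal sketch of a single rule, the RDFS subPropertyOf rule, applied until nothing new is derived. It is not the script linked above (which only calls existing commands); the function name, the prefixed names, and the schema:author to dc:creator mapping are hypothetical and chosen only for illustration.

```python
SUBPROPERTY = "rdfs:subPropertyOf"


def infer_superproperties(triples):
    """Add (s, q, o) for every (s, p, o) where p is a subproperty of q."""
    closed = set(triples)
    changed = True
    while changed:  # repeat until a full pass adds no new triple
        changed = False
        mappings = {(s, o) for s, p, o in closed if p == SUBPROPERTY}
        for s, p, o in list(closed):
            for sub, sup in mappings:
                if p == sub and (s, sup, o) not in closed:
                    closed.add((s, sup, o))
                    changed = True
    return closed


# Hypothetical data: one marked-up triple plus one ontology mapping.
data = {
    ("ex:paper", "schema:author", "ex:ruben"),
    ("schema:author", SUBPROPERTY, "dc:creator"),
}
result = infer_superproperties(data)
print(sorted(result))
```

The page is marked up with one vocabulary only, yet a query phrased with the other vocabulary still finds a match in the derived triples.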

> We're currently in the process of writing a journal paper on this work.

And I just wrote an LDOW2017 paper that details
what the effects of the 40-line Bash script are,
and precisely how they affect querying of 1 website.

> Regardless, in the same line of research, another major issue is to what extent *useful* embedded structured data are actually present in websites for 3rd party scenarios.

Yes, that was your initial mail to me I believe;
I still think we should pursue this;
I do not have any solution to that
except for trying to cover as much as possible through reasoning.

> Consequently, a first useful step would be to study the scope of the available embedded structured data, and for what kind of third-party scenarios they could be useful. The Web Data Commons initiative recently released a new corpus - right on time for this kind of effort :)

+1

Best,

Ruben

Received on Friday, 27 January 2017 13:51:04 UTC