
Re: Are many (X)HTML documents also RDF/XML documents?

From: Wouter Beek <wouter@triply.cc>
Date: Tue, 7 May 2019 07:03:10 +0200
Message-ID: <CAEh2WcPtvH8O=c2vbCeWJFh8E4J3KQBT0wVX8WpGUfH=yVC4sg@mail.gmail.com>
To: "Charles 'chaals' (McCathie) Nevile" <chaals@yandex.ru>
Cc: "semantic-web@w3.org" <semantic-web@w3.org>
Dear Charles,

I'm not referring to RDFa, JSON-LD, or Microformats in my question.
I'm specifically interested in which (X)HTML-only documents are also
RDF/XML documents.

If you say that most (X)HTML documents are not RDF/XML documents, do
you mean that there is a specific criterion that most (X)HTML
documents fail to meet that makes them not RDF/XML documents?  I am
interested in making such a criterion explicit.
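
As a rough illustration of what such a criterion might look like, here is a minimal Python sketch. It assumes the relevant test is the node/property "striping" of the RDF/XML grammar: a node element may contain only property elements, and a property element may contain either text or a single node element. The function names and the sample string are hypothetical. Attribute rules, rdf:parseType, and reserved names such as rdf:li are deliberately ignored, so passing this check does not by itself make a document valid RDF/XML.

```python
import xml.etree.ElementTree as ET

RDF_RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"


def _local(tag):
    """Strip ElementTree's '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def _has_text(s):
    """True if s contains any non-whitespace character."""
    return bool(s and s.strip())


def check_node(elem, path=""):
    """Check an element in node position; return a list of problems."""
    here = f"{path}/{_local(elem.tag)}"
    problems = []
    # Assumption: a node element may hold only property elements and whitespace.
    if _has_text(elem.text) or any(_has_text(c.tail) for c in elem):
        problems.append(f"{here}: non-whitespace text inside a node element")
    for child in elem:
        problems.extend(check_property(child, here))
    return problems


def check_property(elem, path):
    """Check an element in property position; return a list of problems."""
    here = f"{path}/{_local(elem.tag)}"
    children = list(elem)
    if not children:
        return []  # text-only or empty: fine as a literal or empty property
    problems = []
    if len(children) > 1:
        problems.append(f"{here}: more than one child element inside a property element")
    if _has_text(elem.text) or any(_has_text(c.tail) for c in children):
        problems.append(f"{here}: text mixed with a child element inside a property element")
    for child in children:
        problems.extend(check_node(child, here))
    return problems


def striping_problems(xml_text):
    """Return striping violations; an empty list means none were found."""
    root = ET.fromstring(xml_text)
    # With an rdf:RDF wrapper the node elements are its children;
    # without it, the root itself is the single top-level node element.
    nodes = list(root) if root.tag == RDF_RDF else [root]
    problems = []
    for node in nodes:
        problems.extend(check_node(node))
    return problems


# The XHTML example from earlier in the thread, minus the XML declaration and DOCTYPE.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>
  </title>
</head>
<body>
  <table>
    <tr>
      <td>some col 1</td>
    </tr>
  </table>
</body>
</html>"""

for problem in striping_problems(SAMPLE):
    print(problem)
```

Under these assumptions the sample is flagged only for the text inside td, which sits in node position. A more typical page, with several children under head or body, would additionally fail the one-node-per-property check.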

---
Best,
Wouter.

Email: wouter@triply.cc
WWW: https://triply.cc
Tel: +31647674624

On Tue, May 7, 2019 at 1:02 AM Charles 'chaals' (McCathie) Nevile
<chaals@yandex.ru> wrote:
>
> Then the answer is clear: No, for the most part they are not.
>
> The exception is data (mostly schema.org) encoded as RDFa or JSON-LD.
>
> There is also a sense in which a reasonable amount of microformats and
> microdata (the latter has been most of the entire included data in the
> wild, but I think JSON-LD might catch it one day), is reasonably
> straightforwardly RDF.
>
> Collectively all of that is not uncommon - reasonable claims suggest
> double-digit percentages of modern web content and *maybe* as much as a
> quarter or more.
>
> For example schema.org's *model* for the data is RDF, whatever the
> encoding. On the other hand microdata was specifically designed as an
> anti-RDF, so a certain amount of it isn't RDF by any stretch, and in any
> event you have to process it so I am not sure how that counts in what you
> are looking for (you have to process JSON-LD and RDFa, both of which are
> explicitly RDF, to match one to another...)
>
> cheers
>
> Chaals
>
> On Tue, 07 May 2019 00:46:49 +0200, Wouter Beek <wouter@triply.cc> wrote:
>
> > Dear Martynas,
> >
> > I am not interested in generating RDF/XML from non-RDF input.  I'm
> > asking whether all/most/some regular HTML documents are also RDF
> > documents (without applying additional transformations).
> >
> > ---
> > Best,
> > Wouter.
> >
> > Email: wouter@triply.cc
> > WWW: https://triply.cc
> > Tel: +31647674624
> >
> > On Tue, May 7, 2019 at 12:03 AM Martynas Jusevičius
> > <martynas@atomgraph.com> wrote:
> >>
> >> You could generate the desired RDF/XML output with XSLT quite easily.
> >> This is what GRDDL is about:
> >> https://www.w3.org/TR/grddl/#grddl-xhtml
> >>
> >> On Mon, May 6, 2019 at 10:02 PM Wouter Beek <wouter@triply.cc> wrote:
> >> >
> >> > Dear SW community,
> >> >
> >> > The RDF/XML 1.1 specification contains the following two phrases:
> >> >
> >> >     When there is only one top-level node element inside rdf:RDF, the
> >> > rdf:RDF can be omitted although any XML namespaces must still be
> >> > declared.
> >> >
> >> >     The XML specification also permits an XML declaration at the top
> >> > of the document with the XML version and possibly the XML content
> >> > encoding. This is optional but recommended.
> >> >
> >> > Does this mean that many/all (X)HTML documents are also RDF/XML
> >> > documents?  If so, there is much more RDF out there than I had
> >> > previously thought.  In fact, RDF would be at least as popular as HTML
> >> > (contrary to common complaints from the SW community about RDF's
> >> > popularity).
> >> >
> >> > Specifically, does the above mean that the following document should
> >> > be parsed by a standards-compliant RDF/XML parser:
> >> >
> >> > ```xml
> >> > <?xml version="1.0" encoding="utf-8"?>
> >> > <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
> >> > "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
> >> > <html xmlns="http://www.w3.org/1999/xhtml">
> >> > <head>
> >> >   <title>
> >> >   </title>
> >> > </head>
> >> > <body>
> >> >   <table>
> >> >     <tr>
> >> >       <td>some col 1</td>
> >> >     </tr>
> >> >   </table>
> >> > </body>
> >> > </html>
> >> > ```
> >> >
> >> > , resulting in the following RDF triples (serialized in N-Triples):
> >> >
> >> > ```
> >> > _:genid1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/xhtmlhtml> .
> >> > _:genid2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/xhtmltitle> .
> >> > _:genid1 <http://www.w3.org/1999/xhtmlhead> _:genid2 .
> >> > _:genid3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/xhtmltable> .
> >> > _:genid4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/xhtmltd> .
> >> > _:genid3 <http://www.w3.org/1999/xhtmltr> _:genid4 .
> >> > _:genid1 <http://www.w3.org/1999/xhtmlbody> _:genid3 .
> >> > ```
> >> >
> >> > ---
> >> > Best regards,
> >> > Wouter Beek.
> >> >
> >> > Email: wouter@triply.cc
> >> > WWW: https://triply.cc
> >> > Tel: +31647674624
> >> >
> >
>
>
> --
> Using Opera's mail client: http://www.opera.com/mail/
>
Received on Tuesday, 7 May 2019 05:04:11 UTC
