Re: Are many (X)HTML documents also RDF/XML documents?

From: Charles 'chaals' (McCathie) Nevile <chaals@yandex.ru>
Date: Tue, 07 May 2019 00:56:57 +0200
To: semantic-web@w3.org
Message-ID: <op.z1d4o7qkag6dn2@chaalss-macbook-pro.local>
Then the answer is clear: No, for the most part they are not.

The exception is data (mostly schema.org) encoded as RDFa or JSON-LD.

There is also a sense in which a reasonable amount of microformats and  
microdata (the latter has accounted for most of the embedded data in the  
wild, but I think JSON-LD might catch up with it one day) is reasonably  
straightforwardly RDF.

Collectively all of that is not uncommon - reasonable claims suggest  
double-digit percentages of modern web content and *maybe* as much as a  
quarter or more.

For example schema.org's *model* for the data is RDF, whatever the  
encoding. On the other hand microdata was specifically designed as an  
anti-RDF, so a certain amount of it isn't RDF by any stretch, and in any  
event you have to process it so I am not sure how that counts in what you  
are looking for (you have to process JSON-LD and RDFa, both of which are  
explicitly RDF, to match one to another...)
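
To make the "you have to process it" point concrete, here is a minimal
sketch in Python (standard library only) that pulls embedded JSON-LD
blocks out of an HTML page. The class name, the function name and the
sample page are made up for illustration. Note that this only recovers
the JSON-LD data; getting actual triples out of it would still need a
JSON-LD processor, which is not shown here.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect parsed <script type="application/ld+json"> blocks (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # A valueless attribute is reported as None, hence the "or ''".
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append(json.loads("".join(self._buffer)))


def extract_jsonld(html_text):
    """Return a list with one parsed JSON value per embedded JSON-LD block."""
    parser = JsonLdExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.blocks


# Hypothetical page, not taken from any real site.
SAMPLE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Person", "name": "Example Person"}
</script>
</head><body><p>Hello</p></body></html>"""

if __name__ == "__main__":
    print(extract_jsonld(SAMPLE))
```

A page without such a script element simply yields an empty list, which
is the case for the plain XHTML table document in the original question.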

cheers

Chaals

On Tue, 07 May 2019 00:46:49 +0200, Wouter Beek <wouter@triply.cc> wrote:

> Dear Martynas,
>
> I am not interested in generating RDF/XML from non-RDF input.  I'm
> asking whether all/most/some regular HTML documents are also RDF
> documents (without applying additional transformations).
>
> ---
> Best,
> Wouter.
>
> Email: wouter@triply.cc
> WWW: https://triply.cc
> Tel: +31647674624
>
> On Tue, May 7, 2019 at 12:03 AM Martynas Jusevičius
> <martynas@atomgraph.com> wrote:
>>
>> You could generate the desired RDF/XML output with XSLT quite easily.
>> This is what GRDDL is about:
>> https://www.w3.org/TR/grddl/#grddl-xhtml
>>
>> On Mon, May 6, 2019 at 10:02 PM Wouter Beek <wouter@triply.cc> wrote:
>> >
>> > Dear SW community,
>> >
>> > The RDF/XML 1.1 specification contains the following two phrases:
>> >
>> >     When there is only one top-level node element inside rdf:RDF, the
>> > rdf:RDF can be omitted although any XML namespaces must still be
>> > declared.
>> >
>> >     The XML specification also permits an XML declaration at the top
>> > of the document with the XML version and possibly the XML content
>> > encoding. This is optional but recommended.
>> >
>> > Does this mean that many/all (X)HTML documents are also RDF/XML
>> > documents?  If so, there is much more RDF out there than I had
>> > previously thought.  In fact, RDF would be at least as popular as HTML
>> > (contrary to common complaints from the SW community about RDF's
>> > popularity).
>> >
>> > Specifically, does the above mean that the following document should
>> > be parsed by a standards-compliant RDF/XML parser:
>> >
>> > ```xml
>> > <?xml version="1.0" encoding="utf-8"?>
>> > <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
>> > "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
>> > <html xmlns="http://www.w3.org/1999/xhtml">
>> > <head>
>> >   <title>
>> >   </title>
>> > </head>
>> > <body>
>> >   <table>
>> >     <tr>
>> >       <td>some col 1</td>
>> >     </tr>
>> >   </table>
>> > </body>
>> > </html>
>> > ```
>> >
>> > , resulting in the following RDF triples (serialized in N-Triples):
>> >
>> > ```
>> > _:genid1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>> > <http://www.w3.org/1999/xhtmlhtml> .
>> > _:genid2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>> > <http://www.w3.org/1999/xhtmltitle> .
>> > _:genid1 <http://www.w3.org/1999/xhtmlhead> _:genid2 .
>> > _:genid3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>> > <http://www.w3.org/1999/xhtmltable> .
>> > _:genid4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>> > <http://www.w3.org/1999/xhtmltd> .
>> > _:genid3 <http://www.w3.org/1999/xhtmltr> _:genid4 .
>> > _:genid1 <http://www.w3.org/1999/xhtmlbody> _:genid3 .
>> > ```
>> >
>> > ---
>> > Best regards,
>> > Wouter Beek.
>> >
>> > Email: wouter@triply.cc
>> > WWW: https://triply.cc
>> > Tel: +31647674624
>> >
>


-- 
Using Opera's mail client: http://www.opera.com/mail/
Received on Monday, 6 May 2019 22:57:27 UTC
