Re: Are many (X)HTML documents also RDF/XML documents?

From: Wouter Beek <wouter@triply.cc>
Date: Tue, 7 May 2019 07:20:19 +0200
Message-ID: <CAEh2WcM9jnakeRofz9VG=wCjX6FmNm3PAd8Pe+U_2Erb6bKvjw@mail.gmail.com>
To: KANZAKI Masahide <mkanzaki@gmail.com>
Cc: SW-forum Web <semantic-web@w3.org>
Dear Kanzaki,

Thank you for your information about the change in the W3C RDF/XML
validator.  IIUC the absence of the `rdf:RDF' root is valid, as long
as there is one parent node (which is usually the case in HTML: the
`<html>' tag).  So that would mean that the W3C validator is not so
useful ATM, until the "Extended interface" is added back.

And also thank you for pointing towards a criterion that may determine
whether or not an (X)HTML document is also an RDF/XML document:

> RDF/XML requires "striping"
> (elements should represent a node -- property -- node pattern).

I understand your hypothesis as saying that (X)HTML documents with 3
levels of nesting are RDF/XML, whereas (X)HTML documents with more or
fewer levels of nesting are not.  I tested this hypothesis by adding
more nesting to my example XHTML document:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>
    </title>
  </head>
  <body>
    <table>
      <tbody>
        <tr>
          <td>some <em>col</em> 1</td>
        </tr>
      </tbody>
    </table>
  </body>
</html>
```

But this still parses as 100% correct RDF/XML with Rapper: it adds
more blank nodes in order to tie the additional levels of nesting
together.
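
The striping constraint quoted above can be made concrete with a small
sketch.  It is not Rapper's algorithm and not a full reading of the
RDF/XML grammar: it assumes that the root is a node element, that
levels simply alternate between node and property elements, and it
ignores attributes, `rdf:parseType' and every other detail.  The
function name `striping_problems' is made up for this illustration.

```python
import xml.etree.ElementTree as ET


def local_name(elem):
    # ElementTree stores qualified names as "{namespace}local".
    return elem.tag.rsplit("}", 1)[-1]


def has_text(s):
    # True for character data that is not just whitespace.
    return bool(s and s.strip())


def inner_text(elem):
    # True if non-whitespace character data sits directly inside elem.
    return has_text(elem.text) or any(has_text(c.tail) for c in elem)


def striping_problems(xml_text):
    """Hypothetical helper: list violations of a simplified striping rule."""
    problems = []

    def visit_node(elem):
        # Assumption: a node element may only contain property elements.
        if inner_text(elem):
            problems.append("text in node element <%s>" % local_name(elem))
        for prop in elem:
            visit_property(prop)

    def visit_property(elem):
        # Assumption: a property element holds either text only,
        # or exactly one node element, or nothing.
        children = list(elem)
        if children and inner_text(elem):
            problems.append(
                "mixed content in property element <%s>" % local_name(elem))
        if len(children) > 1:
            problems.append(
                "several nodes in property element <%s>" % local_name(elem))
        for child in children:
            visit_node(child)

    visit_node(ET.fromstring(xml_text))
    return problems


# The two <td> variants discussed in this thread, in a minimal XHTML shell.
XHTML = ('<html xmlns="http://www.w3.org/1999/xhtml"><body><table><tr>'
         '%s</tr></table></body></html>')
print(striping_problems(XHTML % "<td>some col 1</td>"))
print(striping_problems(XHTML % "<td><span>some col 1</span></td>"))
```

Under these assumptions the first call reports text inside the node
element `<td>', and the second reports nothing, which is in line with
what Kanzaki observed in the W3C validator.  A parser that is more
lenient about character data would of course behave differently.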

I'm looking for the specific criterion in the RDF/XML 1.1
specification that is implemented incorrectly by Rapper and that
causes it to parse this XHTML document as RDF/XML.  If there is no
such criterion, then RDF is a far more popular language than many
might have previously believed.

---
Best,
Wouter.

Email: wouter@triply.cc
WWW: https://triply.cc
Tel: +31647674624

On Tue, May 7, 2019 at 5:21 AM KANZAKI Masahide <mkanzaki@gmail.com> wrote:
>
> Hello Wouter,
>
> Some XHTML code can be parsed as RDF/XML while other code causes parse
> errors, because RDF/XML requires "striping"
> (elements should represent a node -- property -- node pattern).
>
> The W3C RDF Validator can parse your example XHTML if you wrap it in
> <rdf:RDF..> and </rdf:RDF>. It used to have an "Extended interface"
> (available until 2013, but now missing) that offered the option "RDF is
> NOT enclosed in <RDF>...</RDF> tags (optional since 2004, eg. see many
> DOAP files)".
>
> Note that the result shows a parse error claiming 'String data "some
> col 1" not allowed.' (Other common parse errors include mixed content
> and an unexpected attribute on a property element.)
>
> If the <td> element is something like
>
> <td><span>some col 1</span></td>
>
> then the validator accepts it and returns the triples you expected.
>
>
> Ignoring some parse errors (good or bad), it would be possible to
> interpret arbitrary XHTML as RDF/XML. My RDF visualizer ignores them
> and seems to be able to handle most XHTML as RDF/XML [1] (check
> XML/RDF option, otherwise it would be interpreted as Microdata HTML).
>
> cheers,
>
> [1] https://www.kanzaki.com/works/2009/pub/graph-draw
>
> On Tue, May 7, 2019 at 5:03 Wouter Beek <wouter@triply.cc> wrote:
> >
> > Dear SW community,
> >
> > The RDF/XML 1.1 specification contains the following two phrases:
> >
> >     When there is only one top-level node element inside rdf:RDF, the
> > rdf:RDF can be omitted although any XML namespaces must still be
> > declared.
> >
> >     The XML specification also permits an XML declaration at the top
> > of the document with the XML version and possibly the XML content
> > encoding. This is optional but recommended.
> >
> > Does this mean that many/all (X)HTML documents are also RDF/XML
> > documents?  If so, there is much more RDF out there than I had
> > previously thought.  In fact, RDF would be at least as popular as HTML
> > (contrary to common complaints from the SW community about RDF's
> > popularity).
> >
> > Specifically, does the above mean that the following document should
> > be parsed by a standards-compliant RDF/XML parser:
> >
> > ```xml
> > <?xml version="1.0" encoding="utf-8"?>
> > <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
> > "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
> > <html xmlns="http://www.w3.org/1999/xhtml">
> > <head>
> >   <title>
> >   </title>
> > </head>
> > <body>
> >   <table>
> >     <tr>
> >       <td>some col 1</td>
> >     </tr>
> >   </table>
> > </body>
> > </html>
> > ```
> >
> > , resulting in the following RDF triples (serialized in N-Triples):
> >
> > ```
> > _:genid1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/xhtmlhtml> .
> > _:genid2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/xhtmltitle> .
> > _:genid1 <http://www.w3.org/1999/xhtmlhead> _:genid2 .
> > _:genid3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/xhtmltable> .
> > _:genid4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/xhtmltd> .
> > _:genid3 <http://www.w3.org/1999/xhtmltr> _:genid4 .
> > _:genid1 <http://www.w3.org/1999/xhtmlbody> _:genid3 .
> > ```
> >
> > ---
> > Best regards,
> > Wouter Beek.
> >
> > Email: wouter@triply.cc
> > WWW: https://triply.cc
> > Tel: +31647674624
> >
>
>
> --
> @prefix : <http://www.kanzaki.com/ns/sig#> . <> :from [:name
> "KANZAKI Masahide"; :nick "masaka"; :email "mkanzaki@gmail.com"].
Received on Tuesday, 7 May 2019 05:21:21 UTC
