Re: Are many (X)HTML documents also RDF/XML documents?

From: KANZAKI Masahide <mkanzaki@gmail.com>
Date: Tue, 7 May 2019 12:20:52 +0900
Message-ID: <CAHQ1n3Athijh9Ye+XX-EmmJ+quFXT1f4ZJCSpkWfhc+DYWnAbQ@mail.gmail.com>
To: Wouter Beek <wouter@triply.cc>
Cc: SW-forum Web <semantic-web@w3.org>
Hello Wouter,

Some XHTML documents can be parsed as RDF/XML while others cause parse
errors, because RDF/XML requires a "striped" structure
(elements must alternate in a node -- property -- node pattern).
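To make the striping concrete, here is a minimal Python sketch that labels each element of your example by nesting depth: even depths are read as node elements, odd depths as property elements. The function name stripe_roles is only illustrative, and this is a simplification of the RDF/XML grammar (it ignores attributes, rdf:parseType and so on), not a real parser.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the XHTML example from the original question.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title></head>
<body><table><tr><td>some col 1</td></tr></table></body>
</html>"""


def stripe_roles(elem, depth=0):
    """Yield (local name, role) pairs, alternating node/property by depth."""
    role = "node" if depth % 2 == 0 else "property"
    # ElementTree tags look like "{namespace}local"; keep the local part only.
    local_name = elem.tag.split("}", 1)[-1]
    yield local_name, role
    for child in elem:
        yield from stripe_roles(child, depth + 1)


roles = list(stripe_roles(ET.fromstring(XHTML)))
for name, role in roles:
    print(f"{name}: {role}")
```

Under this reading html, title, table and td come out as nodes, while head, body and tr come out as properties, which matches the shape of the triples in your message.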

The W3C RDF Validator can parse your example XHTML if you wrap it in
<rdf:RDF ...> and </rdf:RDF>. In fact, it used to have an "Extended
interface" (it existed until 2013 but is now gone) that offered the
option "RDF is NOT enclosed in <RDF>...</RDF> tags (optional since 2004,
eg. see many DOAP files)".

Note that the result shows a parse error: 'String data "some col 1"
not allowed.' (Other common parse errors include mixed content and
unexpected attributes on property elements.)

If the <td> element is something like

<td><span>some col 1</span></td>

then the validator accepts it and returns triples as you expected.
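The difference between the two variants can be sketched in Python as follows. This is a rough approximation of the one rule behind the "String data not allowed" error (character data directly inside a node element), assuming the same even/odd depth reading of the stripes; the function name text_in_node_elements is hypothetical and the check is not what the validator actually runs.

```python
import xml.etree.ElementTree as ET


def text_in_node_elements(xml_text):
    """Return character data found directly inside node elements (even depth)."""
    found = []

    def visit(elem, depth):
        if depth % 2 == 0:
            # Node element: only property elements are expected as children,
            # so any non-whitespace text here breaks the striping.
            chunks = [elem.text] + [child.tail for child in elem]
            found.extend(c.strip() for c in chunks if c and c.strip())
        for child in elem:
            visit(child, depth + 1)

    visit(ET.fromstring(xml_text), 0)
    return found


NS = 'xmlns="http://www.w3.org/1999/xhtml"'
ORIGINAL = f"<table {NS}><tr><td>some col 1</td></tr></table>"
WITH_SPAN = f"<table {NS}><tr><td><span>some col 1</span></td></tr></table>"

print(text_in_node_elements(ORIGINAL))   # ['some col 1']
print(text_in_node_elements(WITH_SPAN))  # []
```

In the second variant the text sits inside <span>, which falls on a property stripe, so it can be read as a literal value instead of being rejected.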


If some parse errors are ignored (for better or worse), it is possible
to interpret arbitrary XHTML as RDF/XML. My RDF visualizer ignores them
and seems to be able to handle most XHTML as RDF/XML [1] (check the
XML/RDF option; otherwise the input is interpreted as HTML with Microdata).

cheers,

[1] https://www.kanzaki.com/works/2009/pub/graph-draw

On Tue, 7 May 2019 at 5:03, Wouter Beek <wouter@triply.cc> wrote:
>
> Dear SW community,
>
> The RDF/XML 1.1 specification contains the following two phrases:
>
>     When there is only one top-level node element inside rdf:RDF, the
> rdf:RDF can be omitted although any XML namespaces must still be
> declared.
>
>     The XML specification also permits an XML declaration at the top
> of the document with the XML version and possibly the XML content
> encoding. This is optional but recommended.
>
> Does this mean that many/all (X)HTML documents are also RDF/XML
> documents?  If so, there is much more RDF out there than I had
> previously thought.  In fact, RDF would be at least as popular as HTML
> (contrary to common complaints from the SW community about RDF's
> popularity).
>
> Specifically, does the above mean that the following document should
> be parsed by a standards-compliant RDF/XML parser:
>
> ```xml
> <?xml version="1.0" encoding="utf-8"?>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
> "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml">
> <head>
>   <title>
>   </title>
> </head>
> <body>
>   <table>
>     <tr>
>       <td>some col 1</td>
>     </tr>
>   </table>
> </body>
> </html>
> ```
>
> , resulting in the following RDF triples (serialized in N-Triples):
>
> ```
> _:genid1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
> <http://www.w3.org/1999/xhtmlhtml> .
> _:genid2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
> <http://www.w3.org/1999/xhtmltitle> .
> _:genid1 <http://www.w3.org/1999/xhtmlhead> _:genid2 .
> _:genid3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
> <http://www.w3.org/1999/xhtmltable> .
> _:genid4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
> <http://www.w3.org/1999/xhtmltd> .
> _:genid3 <http://www.w3.org/1999/xhtmltr> _:genid4 .
> _:genid1 <http://www.w3.org/1999/xhtmlbody> _:genid3 .
> ```
>
> ---
> Best regards,
> Wouter Beek.
>
> Email: wouter@triply.cc
> WWW: https://triply.cc
> Tel: +31647674624
>


-- 
@prefix : <http://www.kanzaki.com/ns/sig#> . <> :from [:name
"KANZAKI Masahide"; :nick "masaka"; :email "mkanzaki@gmail.com"].
Received on Tuesday, 7 May 2019 03:21:36 UTC