Help/feedback needed on the RDF Literal Direction Working Group charter!

Dear all,

About a month ago I forwarded a mail[1] to this list about the fact that W3C is discussing the possible creation of a W3C Working Group to make a minor extension to the core RDF model. This extension aims at taking care of a very old problem (some would call it a bug), namely that while it is possible to express the natural language of text literals in RDF, it is not possible to also express the base direction of the same text (in contrast to what, e.g., HTML can do in this respect).

There is a separate document[2] that describes the issue in more detail and also outlines some solutions that have been discussed in the community (the issue list[3] on the corresponding GitHub repository reflects the discussions themselves). As a result of this discussion a draft Working Group charter has been created[4]. However, we need feedback from the Semantic Web community to be able to choose the best way forward. While the problem has a relatively easy solution in theory for RDF, the major issue is how exactly to deploy that solution without extensively disturbing the overall RDF environment. Hence it is absolutely important to get feedback from... you, the Semantic Web community! (I know it is summer time, not an ideal period for such discussions.)

So... if you have comments on the charter[4], please bring them on! (Preferably by raising issues in [3], but if you prefer an email thread, that is fine; I will possibly convert those to issues.) Actually, as you can see in the charter text, the charter itself is unfinished in the sense that we will have to choose between two general approaches to deployment, and we have not (could not) really made this choice yet.

Thanks for your time,

Cheers,

Ivan

P.S. You might think: why is this an RDF issue in the first place? Isn't it true that Unicode takes care of all this? 

There have been lots of discussions around this in [3], but let me give a very simple example. Let us imagine we have a client-side program running in an HTML page that displays data it receives as RDF. Let us also say we have, in the data, a statement like:

[] ex:num "0031 64 1044 100" .

The literal is, in fact, a phone number that is grouped as phone numbers are usually grouped (i.e., it is _not_ a number). If it is put into HTML like this:

<p>A <span>0031 64 1044 100</span></p>

then things are of course fine. However, if *the same* literal data is put into, say, a Hebrew text like this:

<p dir="rtl">א <span>0031 64 1044 100</span></p>

things will go wrong. Indeed, the phone number will be displayed as: "100 1044 64 0031" which is, as a phone number, wrong (try it out, I have copied the full HTML below!). One has to do something like:

<p dir="rtl">א <span dir="ltr">0031 64 1044 100</span></p>

to get the phone number semantically correct on the screen.
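
The reordering follows from the bidirectional character classes that Unicode assigns to each character. As a rough illustration (not part of the proposal itself), Python's standard unicodedata module can show these classes for the characters in the example; the helper name bidi_classes is made up for this sketch.

```python
import unicodedata

def bidi_classes(text):
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]

# Digits are "EN" (European number) and spaces are "WS" (whitespace):
# neither carries a strong direction of its own.
print(bidi_classes("0031 64"))

# The Latin letter is strongly left-to-right ("L"),
# the Hebrew letter alef (U+05D0) is strongly right-to-left ("R").
print(unicodedata.bidirectional("A"), unicodedata.bidirectional("\u05d0"))
```

Because the digit groups contain no strong character, the spaces between them roughly take on the direction of the surrounding paragraph, which is why the groups end up laid out from right to left inside the dir="rtl" paragraph.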

If the program _knows_, out-of-band, that the value of ex:num is a phone number and it therefore has a special logic for this, then of course the app's output will be fine. But relying on such out-of-band knowledge is always error-prone. The right way is to add this fact into the data somehow, by making it clear that the literal value "0031 64 1044 100" _must_ be understood as left-to-right, no matter which environment it is displayed in. There are other, similar examples, and also explanations of the underlying official algorithms, in [5,6,7].
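
To make the idea concrete, here is a minimal sketch of what a client could do if the base direction travelled with the literal. How the direction would actually be attached to an RDF literal is exactly what the charter leaves open, so the DirectedLiteral class and the render_span function below are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional

@dataclass
class DirectedLiteral:
    """Hypothetical container: a literal value plus an optional base direction."""
    value: str
    direction: Optional[str] = None  # "ltr", "rtl", or None if unknown

def render_span(lit: DirectedLiteral) -> str:
    """Wrap the literal in a <span>, adding dir= only when a direction is known."""
    if lit.direction not in (None, "ltr", "rtl"):
        raise ValueError(f"unexpected direction: {lit.direction!r}")
    attr = f' dir="{lit.direction}"' if lit.direction else ""
    return f"<span{attr}>{escape(lit.value)}</span>"

# The phone number from the example, marked as left-to-right in the data.
phone = DirectedLiteral("0031 64 1044 100", "ltr")
print(render_span(phone))
```

With the direction carried in the data, the program no longer needs to know that ex:num holds a phone number in order to display it correctly.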


[1] https://lists.w3.org/Archives/Public/semantic-web/2019Jul/0003.html
[2] https://w3c.github.io/rdf-dir-literal/
[3] https://github.com/w3c/rdf-dir-literal/issues/
[4] https://w3c.github.io/rdf-dir-literal/draft-charter.html
[5] https://www.w3.org/TR/string-meta/
[6] https://www.w3.org/International/articles/uba-basics/
[7] https://www.w3.org/International/articles/inline-bidi-markup/

----

<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <title>Phone number and text direction</title>
</head>
<body>
    <p>A <span>0031 64 1044 100</span></p> 
    
    <p dir="rtl">א <span>0031 64 1044 100</span></p>

    <p dir="rtl">א <span dir="ltr">0031 64 1044 100</span></p>
</body>
</html>


----
Ivan Herman, W3C 
Publishing@W3C Technical Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: https://orcid.org/0000-0003-0782-2704

Received on Tuesday, 30 July 2019 06:58:28 UTC