- From: Sandro Hawke <sandro@w3.org>
- Date: Sun, 09 Feb 2014 21:30:31 -0500
- To: RDF WG <public-rdf-wg@w3.org>
I've been working on what to put at the rdf: and rdfs: namespaces. This
is a long email, with three sections.
1. goals for the project
2. what my software will look for in RDF data
3. specific issues with the two RDF namespaces
Perhaps the most interesting/novel bit is about HTML and language tags.
Plus my idea for handling rdf:_100. :-)
This is progress on ACTION-98, my last remaining W3C action item (!).
=== PART 1: Goals ===
1. If someone puts the IRI of a term in the rdf: or rdfs: namespaces
into their browser, they'll get some nice documentation on that term.
The URL field will continue to show that term (it won't redirect).
2. That documentation will link to other terms. When it does so,
clicking will repeat the experience as above: the IRI of the term will
be in the URL field of the browser, and the user will see decent
documentation.
3. The documentation will be available in multiple languages. We don't
need this on day one, but we currently have the dcat schema in English,
Spanish, Arabic, Greek, French, and Japanese (thanks to Phil Archer's
pushing on that). I'm still learning how to do a multi-lingual webapp:
there's an early version at
http://www.w3.org/2013/vocabspec/examples/dcat.html -- in that version,
you use the gear in the upper-right to change the language. I'm in the
process of changing it to use the browser's language setting as a
starting point, then allow a simpler selection control.
4. The software will be available so other people can do this with their
namespace documents easily enough.
5. The documentation will be entirely driven by RDF triples that one
gets by dereferencing the namespace documents while asking for
text/turtle (via the Accept header). In the dcat example above, view source shows
the .html file is just a shell; the content is generated at browse-time
from the turtle at the dcat namespace.
6. In the real deployment, there will also be content-negotiated static
versions at the namespace URL so that search engines and non-javascript
browsers can see the content as well. Folks hosting namespace
documents, if they want this, will have to run a node.js program to
re-generate the static files whenever the turtle is changed.
7. Over time, I want to evolve the code to include social features like
crowdsourced translations, stars (aka "like", "+1", "endorsement",
"bookmarks"), and links to code and public data sources that use the
term. Obviously that will have to be done carefully to avoid detracting
from the official documentation. (This part is a research project.)
I think that's it.
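
A minimal sketch of the request behind goal 5, written in Python for illustration rather than in the node.js/browser code the project actually uses. It assumes the namespace server honors an Accept header of text/turtle; `turtle_request` is a hypothetical helper name, and nothing is sent over the network until the request is handed to `urllib.request.urlopen`.

```python
from urllib.parse import urldefrag
from urllib.request import Request

RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

def turtle_request(term_iri):
    """Build (but do not send) a request for the namespace document of a term."""
    # The fragment (e.g. "#label") is never sent to the server, so drop it.
    doc_url, _fragment = urldefrag(term_iri)
    # Content negotiation: ask for Turtle through the Accept header.
    return Request(doc_url, headers={"Accept": "text/turtle"})

req = turtle_request(RDFS_NS + "label")
print(req.full_url, req.get_header("Accept"))
```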
=== PART 2: What my software will look for in RDF data ===
One challenge is that for expressing the documentation in RDF, I don't
know of any consensus around a vocabulary or how to use it. Here's my
best guess, but I'm making some of this up. Feel free to correct me
(but soon, please).
The basic predicates:
* rdfs:label - a name, usually one or two words; the English version
will usually be the same as the end part of the IRI.
* rdfs:comment - a descriptive phrase, usually 5-10 words, might be in
rdf:HTML, especially if it needs to link to other terms in the vocabulary
* dc:description - a longer, definitional description, usually 1-5
paragraphs (using rdf:HTML for formatting). For rdf and rdfs, I plan
to copy the HTML out of RDF Schema 1.1. For example, for rdf:type it'll
be the stuff at
https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-schema/index.html#ch_type
* vann:usageNote - arbitrary length, more practical and less
definitional than dc:description. (I don't plan to use this for rdf:
and rdfs:, but it's used by dcat and others.)
* dc:title - for the title of the namespace document; ignored on terms
It'll also show the rdfs:domain, rdfs:range, rdf:type, rdfs:subClassOf,
rdfs:subPropertyOf, and whatever other bits I can figure out how to
include without making things too complicated. Also, rdfs:isDefinedBy,
linking to the right section of the spec.
On language/HTML handling:
I think this is how to do it:
<some term> dc:description
'description of that term in plaintext English',
'description of that term in plaintext English'@en,
'<div lang="en">description of that term in HTML English</div>'^^rdf:HTML,
'description of that term in plaintext French'@fr,
'<div lang="fr">description of that term in HTML French</div>'^^rdf:HTML,
... etc
The xs:string is there for non-multilingual apps, and to use as the
fallback (with a warning) if no matching languages are found.
This approach implies that predicates with natural language expressions
as their range MUST be conceptually single-valued. You can't do
this: { <s> rdfs:comment "some comment"@en, "some other comment"@en.
} I expect I'll have my software display a warning if this kind of
thing (two values with the same language) occurs in the data. See
[1] for some more discussion of this.
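
A rough sketch of the duplicate-language warning described above, in Python for illustration. The tuple layout `(subject, predicate, text, lang, is_html)` and the function name are hypothetical. Treating a plaintext value and an HTML value in the same language as compatible is an assumption drawn from the dc:description example, where both appear side by side.

```python
from collections import defaultdict

def duplicate_language_warnings(literals):
    """literals: iterable of (subject, predicate, text, lang, is_html).

    lang is None for untagged values. Returns one warning string per
    (subject, predicate, language, kind) group holding more than one text.
    """
    groups = defaultdict(set)
    for subject, predicate, text, lang, is_html in literals:
        # Language tags compare case-insensitively.
        key = (subject, predicate, lang.lower() if lang else None, is_html)
        groups[key].add(text)
    return [
        f"warning: {s} {p} has {len(texts)} values for language {lang!r}"
        for (s, p, lang, _is_html), texts in groups.items()
        if len(texts) > 1
    ]

data = [
    ("<s>", "rdfs:comment", "some comment", "en", False),
    ("<s>", "rdfs:comment", "some other comment", "en", False),
    ("<s>", "rdfs:comment", "un commentaire", "fr", False),
]
for line in duplicate_language_warnings(data):
    print(line)
```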
I guess I'll treat an HTML literal without a lang attr on the first
element like the xs:string literal -- a fallback for when no
available values lang-match the user's preferences.
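
The selection and fallback rules above might look roughly like this; a sketch in Python under stated assumptions, not the actual implementation. The names `html_lang`, `literal`, and `pick_value` are hypothetical, language matching is a plain case-insensitive comparison (no range matching such as "en" against "en-GB"), and preferring HTML over plaintext within a language is an assumption the text does not settle.

```python
import re

def html_lang(html):
    """Return the lang attribute of the first element, or None if it has none."""
    m = re.match(r'\s*<[A-Za-z][^>]*?\slang="([^"]*)"', html)
    return m.group(1) if m else None

def literal(text, lang=None, is_html=False):
    """Normalize a value to (text, lang, is_html); HTML takes its lang from markup."""
    return (text, html_lang(text) if is_html else lang, is_html)

def pick_value(values, preferred_langs):
    """Return (text, used_fallback) for the user's ordered language preferences."""
    for want in preferred_langs:
        for prefer_html in (True, False):  # assumption: HTML wins over plaintext
            for text, lang, is_html in values:
                if lang and lang.lower() == want.lower() and is_html == prefer_html:
                    return text, False
    # No language matched: fall back to an untagged value (plain or HTML).
    for text, lang, _is_html in values:
        if lang is None:
            return text, True
    return None, True

values = [
    literal("description in plaintext"),
    literal("description in plaintext English", lang="en"),
    literal('<div lang="fr">description in HTML French</div>', is_html=True),
]
print(pick_value(values, ["fr", "en"]))
print(pick_value(values, ["ja"]))
```

A caller would show the warning mentioned above whenever `used_fallback` comes back true.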
I plan to ignore triples giving domain, range, or subClassOf as
rdfs:Resource, since they're meaningless.
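
Dropping those triples could be as simple as the following Python sketch. It assumes triples are plain `(subject, predicate, object)` tuples of full IRIs; the function name `informative` is hypothetical.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Predicates for which an object of rdfs:Resource adds no information.
SKIP_WHEN_RESOURCE = {RDFS + "domain", RDFS + "range", RDFS + "subClassOf"}

def informative(triples):
    """Keep every triple except domain/range/subClassOf rdfs:Resource."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if not (p in SKIP_WHEN_RESOURCE and o == RDFS + "Resource")
    ]

triples = [
    (RDFS + "label", RDFS + "domain", RDFS + "Resource"),
    (RDFS + "label", RDFS + "range", RDFS + "Literal"),
]
print(informative(triples))
```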
=== PART 3: Specific issues with the two RDF namespaces ===
* Should we include any dct:creator or dct:contributor triples? It's
hard to make that helpful and fair given all the people who've been
involved with these namespaces over the years.
* Should we leave out the meaningless triples giving domain, range, or
subClassOf as rdfs:Resource? There's some pretty odd stuff there now.
* What should we do about rdf:_1, etc? I'd think having the first few
in the namespace document would make sense, maybe rdf:_1 through
rdf:_20. I *could* put in special javascript for arbitrary ones, but
that seems kind of goofy.
* Can we say *anything* about how investing in implementing reification
systems might not be your best bet? Pretty, pretty please? Or do we
have to let that wait for the commenting mechanism?
* What's the title of the rdf: namespace document? I propose, "The Core
RDF Vocabulary"
* What formats do we serve the schemas in? They've been just RDF/XML so
far. Left to myself, I'd just do Turtle. I'm okay with including any
other format for which there's a serializer available for node.js, so I
can generate them out of the same system. If someone wants json-ld,
could they please write a @context that makes the schema look nice in json?
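
On the rdf:_1 question above, recognizing arbitrary container membership properties is mostly a pattern match. A hedged Python sketch follows, with `membership_index` as a hypothetical name. It assumes the RDF Schema rule that the suffix is a decimal integer greater than zero with no leading zeros.

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# rdf:_n, where n is a positive decimal integer without leading zeros.
_MEMBER = re.compile(re.escape(RDF) + r"_([1-9][0-9]*)")

def membership_index(iri):
    """Return n for an rdf:_n IRI, or None for anything else."""
    m = _MEMBER.fullmatch(iri)
    return int(m.group(1)) if m else None

print(membership_index(RDF + "_100"))
```

Documentation for a term such as rdf:_100 could then be synthesized on request instead of being listed in the namespace document.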
So... thoughts?
-- Sandro
[1] http://lists.w3.org/Archives/Public/public-rdf-wg/2013Dec/0151.html
Received on Monday, 10 February 2014 02:30:41 UTC