[uriMediaType-9] Identifying Content Classes in HTTP Using rdf:type

The relevant TAG issue: "Why does the Web use mime types and not URIs?
Media types are not first-class objects on the Web, or are they?"

Using MIME types to identify content types is problematic because it
centralizes registration at the IANA - URIs should identify all important
things on the Web [1], and content types are no exception. In fact,
the content type idiom is isomorphic with that of the "Class" class in
RDF Schema [2]. The rdf:type property [3] identifies the class to
which a particular subject resource belongs. There are real-life
situations where an RDF type association with some generic document in
HTTP space is of use: for example, the Notation3 format [4] uses the
following URI reference as the class of all Notation3 documents:-


OTOH, going through the IANA registration process does not seem to be
an option for the author of the language, although I am not sure why.
The fact that common XML RDF (an alternate XML-based serialization of
the semantics behind N3) does not yet have a MIME type definition that
can be agreed upon is probably a factor.

Use of URIs as content type (class) identifiers follows the "URI only"
schema level as discussed in the "Evolution" DesignIssues document
[5]. This level of language identification is useful where
identification-without-interpretation is required, as is the case with
Notation3 (the syntax [and occasionally semantics] of which actually
changes with the seasons).

The options for identifying content types in URI space were discussed
heavily on the #rdfig IRC channel on 2001-10-25 - the logs for which
are available [6] - between TimBL, Mark Nottingham, Sandro Hawke,
myself, and Aaron Swartz. From what I can gather, the options that
were raised are as follows:-

* Use the regular HTTP Content-Type header, since there isn't a
problem with it - force people to go through the IANA registration
* Use a new HTTP header identifying a type that uses a URI, e.g.
Sandro's Formal-Language-Definition, which actually (as I understand
it) points to a Blindfold grammar (a type of BNF)
* Use some new header that gives the rdf:type of the HTTP entity-body
[the approach that I favour, and that I am pitching in this message],
e.g. "RDF-Type"
* Use a new header for linking to an RDF file, and give the
serialization type (this sets up a vicious circle)
* Use a new HTTP header for just adding triples to the entity header
(e.g. "N3")
* Use a magic file type - Sandro proposed to use an emacsy string
* In XML only: the namespace, or the conjunction of all namespaces

Clearly, this is not a restrictive choice - people will almost always
have to use heuristics at some level to deduce the content type - but
we can make it easier on them by picking a few standardized ways. The
Content-Type header has been in use for many years now, and is clearly
of great utility. However, I feel that adding the aforementioned
"RDF-Type" header would be useful too, for the reasons discussed
above. If the content type can be identified at some point as being
XML, then you defer to the namespace (and I understand that
discussions about what constitutes a namespace in XML are ongoing on
TAG [7]).
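
The fallback order described above could be sketched roughly as
follows. This is only an illustration under stated assumptions: the
"RDF-Type" header is the unregistered header proposed in this message,
the precedence (RDF-Type first, then the XML namespace, then the plain
Content-Type) is one possible reading of the paragraph, and the
function name and example URIs are hypothetical.

```python
from email.parser import Parser
import xml.etree.ElementTree as ET

# Hypothetical header name from the proposal; not a registered HTTP header.
RDF_TYPE_HEADER = "RDF-Type"


def identify_content_class(raw_headers, body):
    """Return a (source, identifier) pair describing the content class.

    The precedence used here is an assumption, not part of any spec.
    """
    headers = Parser().parsestr(raw_headers, headersonly=True)

    # 1. Prefer an explicit class URI if the server sent one.
    rdf_type = headers.get(RDF_TYPE_HEADER)
    if rdf_type:
        return ("rdf-type", rdf_type.strip())

    # 2. Otherwise look at the media type (parameters are dropped).
    media_type = headers.get_content_type()

    # 3. If the content is XML, defer to the root element's namespace.
    if media_type.endswith(("/xml", "+xml")):
        try:
            root = ET.fromstring(body)
        except ET.ParseError:
            return ("content-type", media_type)
        if root.tag.startswith("{"):
            # ElementTree spells qualified names as "{namespace}local".
            return ("xml-namespace", root.tag[1:].split("}", 1)[0])

    return ("content-type", media_type)
```

A client would call identify_content_class() with the raw header block
and the entity-body; heuristics beyond these three steps (magic
strings, file extensions) are deliberately left out of the sketch.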

Footnote about classes of document: my own personal opinion at the
moment is that a language is a class is an XML namespace. I agree with
Patrick Stickler that namespaces in RDF are just "punctuation" (as he
puts it), but I think that in XML in the wider case, a language and a
namespace are pretty much inseparably bound. But that says nothing
about what the definition of a language is, or of language mixing, and
so really I'm just dodging the issue.

In any case the use of schemata to define content class constraints is
noted by the XML Schema primer:-

[[[
The purpose of a schema is to define a class of XML documents, and so
the term "instance document" is often used to describe an XML document
that conforms to a particular schema.
]]] - http://www.w3.org/TR/xmlschema-0/#POSchema

This is not necessarily constrained to syntactic schemata either - I
have been investigating the overlap between syntactic and semantic
schemata and the impact on accessibility in my draft "Sands: Syntax
and Semantics for the XML Accessibility Guidelines" [8].

In conclusion, I believe that:-

* A specification detailing an "RDF-Type" HTTP header should be
registered as an informational RFC
* The IANA should provide a mapping from the currently registered
content types to a list of URIs (there are some obvious choices for
this mapping)
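
As a rough illustration of the second point, a mechanical mapping from
registered content types to URIs might look like the sketch below. The
base URI is purely hypothetical (this message does not say which URIs
the IANA would mint), and the function name is invented for the
example.

```python
# Hypothetical base URI; the actual choice would be up to the IANA.
MEDIA_TYPE_BASE = "http://example.org/media-types/"


def media_type_to_uri(content_type):
    """Map a Content-Type header value to a class URI.

    Parameters such as charset are ignored, and the type/subtype pair
    is lowercased because media type names are case-insensitive.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    top, sep, sub = media_type.partition("/")
    if not (top and sep and sub):
        raise ValueError("not a type/subtype pair: %r" % content_type)
    return MEDIA_TYPE_BASE + media_type
```

The resulting URI could then be used as the object of an rdf:type
statement about the document, or as the value of the proposed
"RDF-Type" header.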


[1] "Any resource of significance should be given a URI" -
[2] http://www.w3.org/TR/2000/CR-rdf-schema-20000327/#s2.2.3
[3] http://www.w3.org/TR/2000/CR-rdf-schema-20000327/#s2.3.1
[4] http://www.w3.org/DesignIssues/Notation3
[5] http://www.w3.org/DesignIssues/Evolution
[6] http://ilrt.org/discovery/chatlogs/rdfig/2001-10-25.txt
[7] http://www.w3.org/2001/tag/ilist#nsMediaType-3

Kindest Regards,
Sean B. Palmer
@prefix : <http://purl.org/net/swn#> .
:Sean :homepage <http://purl.org/net/sbp/> .

Received on Saturday, 9 February 2002 18:28:15 UTC