Re: do XML Datatypes work for RDF?

This text is about how literals can be handled as a type of resource.

I was thinking about language handling, prompted by TBL's text:

I would like this to be considered as something that should be
included in future RDF and RDFS specs.

 The content of literals or other resources

An RDF model consists of triples {predicate, subject, object}. The
object can be a Resource or a Literal. A literal can consist of any
data: an image, a sound, an algorithm, text, etc. Each of these
literal types is an object which can have internal metadata. An mp3
object has the title of the song embedded in its own format. In the
same way, the language of a text is considered to be a part of the
text object. Section 6 of the M&S spec states:

    The xml:lang attribute may be used as defined by [XML] to
    associate a language with the property value. There is no specific
    data model representation for xml:lang (i.e., it adds no triples
    to the data model); the language of a literal is considered by RDF
    to be a part of the literal.

 The type of literals or other resources

The RDF Schema spec lets you describe the range of properties, like

  <rdf:Property ID="age">
      <rdfs:domain rdf:resource="#Person"/>

That means that the object should be of that type, regardless of
whether it is a literal or another type of resource.

Age as a literal:

    <type resource=""/>

Age as a resource:

<Description about="">
    <type resource=""/>

I will later argue for that should be a subClassOf

 Referring to properties in literals or other resources

In an application based on RDF, you want to represent everything about
a specific object as triples.

Let's say that you want to know the number of colours in a gif
image. To get that information, the application has to know

 (1) that the data is in gif format and

 (2) how to get the information out of the gif data.

The gif data could be stored somewhere on the web, retrievable by a
URL, or placed as a literal in the model.  RDF has the rdf:type
property for describing the type of an object. This property can
be applied universally to describe the type of data in RDF. For gif
images and related formats, the mime-type can be used.

Gif as a literal:

    <type resource=""/>

Gif as a resource:

<Description about="">
    <type resource=""/>

 Saving the extracted information from a literal or other resource

Provided that the application knows both that the data is an image/gif
and how to extract data from that gif, it can act as a proxy for other
applications / methods / algorithms that want to consider / reason
about / find / present data based on the properties of the gif.  The
image has to have a URI to be represented in the RDF model.
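
As a rough illustration of the proxy idea, here is a minimal Python sketch. It assumes that "number of colours" means the size of the global colour table declared in the gif header, and it writes triples in the {predicate, subject, object} order used in this text. The names "mime:image/gif", "ex:colours" and "ex:img1" are made up for the example; they are not part of any spec.

```python
def gif_colour_count(data: bytes) -> int:
    """Return the number of entries in the GIF global colour table (0 if none)."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not GIF data")
    if len(data) < 13:
        raise ValueError("truncated GIF header")
    packed = data[10]        # packed fields of the logical screen descriptor
    if not packed & 0x80:    # bit 7: global colour table flag
        return 0
    return 2 ** ((packed & 0x07) + 1)   # bits 0-2 encode the table size


# Hypothetical registry: type URI -> {property URI: extractor function}
EXTRACTORS = {"mime:image/gif": {"ex:colours": gif_colour_count}}


def extract_triples(uri, type_uri, data):
    """Turn embedded metadata into (predicate, subject, object) triples."""
    extractors = EXTRACTORS.get(type_uri, {})
    return [(pred, uri, extract(data)) for pred, extract in extractors.items()]


# A 13-byte header declaring a 256-entry global colour table.
header = b"GIF89a" + bytes([1, 0, 1, 0, 0x87, 0, 0])
print(extract_triples("ex:img1", "mime:image/gif", header))
```

An application that does not know the type simply gets no extra triples, which matches the point that both the type and the extraction method have to be known.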

Metadata about gif, the literal:

<Description ID="obj_01">
    <type resource=""/>

Metadata about gif, the resource:

<Description about="">
    <type resource=""/>

 Handling of language of literals or other resources

The language of a string is taken to be a part of the literal.  This
can work for dedicated RDF applications.  But a generalized RDF
application wants to represent the language internally in the same
manner as all other data.

If the language of an object is represented as a property, that
statement could be reified in the same way as all other statements.
And the language could be used in RDF reasoning or querying or

The xml representation can still use xml:lang, but we would like to
also represent that information in the model.

As with images, text is just another data format. The application
parses the text and extracts the wanted information - in this case the
language information. 
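
A minimal Python sketch of that extraction step, assuming the literal arrives as a small XML fragment carrying xml:lang. The property name "ex:language" and the literal URI are placeholders chosen for the example; "sv" is the ISO 639 code for Swedish. Triples are written as (predicate, subject, object).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def literal_triples(literal_uri, xml_fragment):
    """Represent an inline literal, and its language, as triples."""
    elem = ET.fromstring(xml_fragment)
    triples = [("rdf:value", literal_uri, elem.text or "")]
    lang = elem.get(XML_LANG)
    if lang is not None:
        # "ex:language" is a hypothetical property, not defined by RDF.
        triples.append(("ex:language", literal_uri, lang))
    return triples


for triple in literal_triples("#obj_02", '<value xml:lang="sv">Hej alla barn</value>'):
    print(triple)
```

Once the language is an ordinary triple, it can be queried or reified like any other statement.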

Language of literal:  (The type declaration here is redundant)

<Description ID="obj_02">
    <value xml:lang="sv">Hej alla barn</value>

Language of resource:  (language info will come from http header)

    <type resource=""/>

 Updating the literal/resource or updating the statement

When you represent a literal as above, it gets its own resource URI.
Every statement you make is about that particular resource. The
resource does not represent every literal or object with the same
content. It represents only the specific literal/resource at hand.

Any RDF document could give an ID to its described literal or
resource. That means that you could have multiple resources with the
same content.

Let's say we have an application that lets you update a short
description of your interests. This description could be a literal
or a resource.  What should the application do if you update that
description?  Should it

 (a) Create a new text object and change the description statement to
 point to the new object

 (b) Change the value of the text object

The answer to this is up to the application.  Some applications want
to handle version history. Other applications want to make sure the
value doesn't change. But many applications probably wouldn't care and
would like to just change the text value, without having to create a
whole new object.
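
The two options can be sketched in Python as follows. This is only an illustration under the assumption that a literal is a resource with an rdf:value property; the Store class, its method names and the "ex:" URIs are invented for the example. Triples are (predicate, subject, object).

```python
import itertools


class Store:
    """Toy triple store in which every literal is a resource with a value."""

    def __init__(self):
        self.triples = set()
        self._ids = itertools.count(1)

    def new_literal(self, text):
        uri = f"#lit_{next(self._ids)}"
        self.triples.add(("rdf:value", uri, text))
        return uri

    def value(self, uri):
        for pred, subj, obj in self.triples:
            if pred == "rdf:value" and subj == uri:
                return obj
        return None

    def replace_object(self, pred, subj, text):
        """Option (a): create a new text object and repoint the statement."""
        self.triples = {t for t in self.triples if t[:2] != (pred, subj)}
        new_uri = self.new_literal(text)
        self.triples.add((pred, subj, new_uri))
        return new_uri   # the old literal resource is left untouched

    def change_value(self, uri, text):
        """Option (b): change the value of the existing text object."""
        self.triples = {t for t in self.triples if t[:2] != ("rdf:value", uri)}
        self.triples.add(("rdf:value", uri, text))


store = Store()
lit = store.new_literal("I like RDF")
store.triples.add(("ex:description", "ex:me", lit))
store.change_value(lit, "I like RDF and XML")                     # option (b)
new = store.replace_object("ex:description", "ex:me", "I like metadata")  # option (a)
```

With option (a) the old object survives, so version history comes for free; with option (b) nothing else in the model has to change.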

 Literals as resources

I will try to explain why literals should be regarded as resources and
how this will work with RDF M&S and RDF Schema. 

Literals are data. Most of the time it's textual data. If it is a short
piece of textual data, you would like to inline it in an xml
representation. But if it's a large text, you will often prefer to
refer to the text as an external object.

Why should the model distinguish between data inlined in the xml syntax and
data stored apart from the xml document?

If all literals were modeled as resources having a value property,
we wouldn't have to make any distinction between literals and other
resources.  The application could regard all literals as resources.

This would mean that the application would be able to access the value
of any resource. It wouldn't matter if the resource is an image, text
or XML.  It wouldn't matter if the value originally came inlined in the
XML representation of an RDF model, or if it is lying at a URL on
the web, sitting in a database, hiding inside an LDAP system, or
anything else.
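
A small Python sketch of such uniform access, assuming values are looked up first among the inlined literals and otherwise fetched by URI scheme. The ValueResolver class and the stand-in fetchers are hypothetical; real fetchers would talk to the web, a database or an LDAP server.

```python
class ValueResolver:
    """Give every resource a value, wherever that value is stored."""

    def __init__(self, fetchers):
        self.inline = {}           # URI -> value that came inlined in the XML
        self.fetchers = fetchers   # URI scheme -> function(uri) returning the value

    def add_inline(self, uri, value):
        self.inline[uri] = value

    def value(self, uri):
        if uri in self.inline:
            return self.inline[uri]
        scheme = uri.split(":", 1)[0]
        fetch = self.fetchers.get(scheme)
        if fetch is None:
            raise LookupError(f"no way to fetch {uri}")
        return fetch(uri)


# Stand-in fetchers for the example; they only pretend to retrieve data.
resolver = ValueResolver({
    "http": lambda uri: f"<data fetched from {uri}>",
    "ldap": lambda uri: f"<entry looked up at {uri}>",
})
resolver.add_inline("#L_001", "Ora Lassila")

print(resolver.value("#L_001"))
print(resolver.value("http://example.org/long-text"))
```

The caller never needs to know whether the value was inlined or stored elsewhere.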

 How does the application know which resources represent data?

The application will have to decide on how it will handle each of the
known resources. When presenting a resource, should it
only list the known attributes and relations to other resources, or
should it also present the content of the resource?

If the resource is an image, a representation in html could be <img
src="my_image">. A text resource could be represented as <pre>Hej
alla barn</pre>.

How the application decides what resources to "resolve" and what
resources represent "unfetchable" objects, is its own problem.  It
could use all sorts of heuristics to make the decision.

But I think that the obvious way would be to use the rdf:type
property. If its a or or, it should be

Most literals would of course already have been fetched, and are
waiting in the cache.
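
One possible way to base that decision on the type, sketched in Python. The "mime:" prefix for type URIs is an assumption made for the example, as is the choice to return None for resources that should not be resolved.

```python
from html import escape


def render(uri, type_uri, value=None):
    """Return an html rendering of a data resource, or None for other objects."""
    if type_uri.startswith("mime:image/"):
        # Images are referenced by URI rather than embedded.
        return f'<img src="{escape(uri, quote=True)}">'
    if type_uri.startswith("mime:text/"):
        return f"<pre>{escape(value or '')}</pre>"
    return None   # not a data type known to this application


print(render("my_image", "mime:image/gif"))
print(render("#obj_02", "mime:text/plain", "Hej alla barn"))
print(render("ex:somebody", "ex:Person"))
```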

 What about resources of unknown types?

This would be most useful if there was a standard way to know what
types represent values and what types represent other objects.

I suggest that this should be done by considering all data objects as
of type Literal. That means that Text/plain, Image/gif, etc would all
be subClassOf MimeType. And Integer, Float, Boolean and even MimeType
would be a subClassOf Literal.

When an application comes across a resource of an unknown type, it can
still tell if it is a value by parsing the schema and checking whether
the type is a subClassOf Literal.
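
The check amounts to following subClassOf links upwards until Literal is reached. A minimal Python sketch, assuming the schema has already been parsed into a mapping from each class to its direct superclasses; the class names below are illustrative only.

```python
def is_literal_type(type_uri, sub_class_of, root="rdfs:Literal"):
    """True if type_uri is root or a direct or indirect subclass of it."""
    seen = set()
    stack = [type_uri]
    while stack:
        current = stack.pop()
        if current == root:
            return True
        if current in seen:
            continue   # guards against cycles in the schema
        seen.add(current)
        stack.extend(sub_class_of.get(current, ()))
    return False


# Hypothetical schema following the suggestion above.
schema = {
    "mime:text/plain": ["ex:MimeType"],
    "mime:image/gif": ["ex:MimeType"],
    "ex:MimeType": ["rdfs:Literal"],
    "ex:Integer": ["rdfs:Literal"],
    "ex:Person": ["rdfs:Resource"],
}

print(is_literal_type("mime:image/gif", schema))
print(is_literal_type("ex:Person", schema))
```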

 Handling of classic inline literals

An RDF document parser will read the literals inlined in the
document. It will return triples with the object marked as a literal
or resource.  The parser could convey the same thing by instead
generating an extra triple stating that this resource is indeed a Literal.

The current RDF/XML syntax permits undefined URIs for subjects.  These
URIs must be found or generated by the parser. The same would go
for all the inline literals: the parser would have to give them one
URI each.
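
A sketch in Python of what such a parser post-processing step might look like. It assumes the parser output is a list of (predicate, subject, object, object_is_literal) tuples; the generated URI pattern "#genlit_N" and the subject "ex:page" are arbitrary choices for the example.

```python
import itertools


def literals_to_resources(parsed, prefix="#genlit_"):
    """Replace each inline literal with a generated resource of type Literal."""
    counter = itertools.count(1)
    triples = []
    for pred, subj, obj, is_literal in parsed:
        if not is_literal:
            triples.append((pred, subj, obj))
            continue
        uri = f"{prefix}{next(counter)}"   # one fresh URI per inline literal
        triples.append((pred, subj, uri))
        triples.append(("rdf:type", uri, "rdfs:Literal"))
        triples.append(("rdf:value", uri, obj))
    return triples


parsed = [
    ("s:Creator", "ex:page", "Ora Lassila", True),
    ("s:Related", "ex:page", "ex:other", False),
]
for triple in literals_to_resources(parsed):
    print(triple)
```

Two literals with the same text get different URIs, in line with the point that a resource represents only the specific literal at hand.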

 Suggested additions to the RDF XML syntax

The syntax has a way to reference a resource with a known URI: "about"

It can give a URI to a new resource: "ID"

It can even give a URI for a bag of statements: "bagID"

But there is no way to give a URI to the resource representing an
actual statement.  That should be fixed. This is an example from the
RDF M&S spec, section 4.1:

    <rdf:Description about=""
      <s:Creator>Ora Lassila</s:Creator>
      <s:Title>Ora's Home Page</s:Title>

  BagID specifies the identification of the container resource whose
  members are the reified statements about another resource.

I suggest that the syntax should make it possible to state all the
URIs involved. This could be done by a statementID, for statements,
and a literalID for the literals. Like this:

    <rdf:Description about=""
      <s:Creator rdf:statementID="S_001" rdf:literalID="L_001">Ora
      <s:Title rdf:statementID="S_002" rdf:literalID="L_002">Ora's Home

That would complete the equivalence between literals and other
resources (including statements).
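
To make the intent concrete, here is a Python sketch of the triples a parser could produce for one property element carrying both of the suggested attributes. statementID and literalID are the proposals from this text, not existing RDF syntax, and the subject "ex:page" is a placeholder. The reification uses the rdf:Statement vocabulary from M&S; triples are (predicate, subject, object).

```python
def reify(statement_id, literal_id, pred, subj, text):
    """Triples for one statement with a named statement and a named literal."""
    return [
        (pred, subj, literal_id),                    # the statement itself
        ("rdf:type", statement_id, "rdf:Statement"),
        ("rdf:predicate", statement_id, pred),
        ("rdf:subject", statement_id, subj),
        ("rdf:object", statement_id, literal_id),
        ("rdf:type", literal_id, "rdfs:Literal"),    # the literal as a resource
        ("rdf:value", literal_id, text),
    ]


for triple in reify("#S_001", "#L_001", "s:Creator", "ex:page", "Ora Lassila"):
    print(triple)
```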

 On the nature of the reified statement

Considering the example in section 4.2 of the RDF M&S spec, I realize
that every statement should indeed have a unique URI. There could exist
several statements with identical {p,s,o}, but with different URIs.

They must be treated as individual statements, because:

 (1) The attributes of the statements must not be mixed up.
 (2) Changing or deleting a statement from one source will not affect
a statement from another source.
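
Both points can be illustrated with a small Python sketch in which statements are keyed by their own URI instead of by their {p,s,o} content. The StatementStore class, the "source" attribute and the URIs are invented for the example.

```python
class StatementStore:
    """Statements are individuals: keyed by URI, not by (p, s, o)."""

    def __init__(self):
        self.statements = {}   # statement URI -> (predicate, subject, object)
        self.attributes = {}   # statement URI -> attributes of that statement

    def add(self, uri, triple, **attrs):
        if uri in self.statements:
            raise ValueError(f"statement URI already used: {uri}")
        self.statements[uri] = triple
        self.attributes[uri] = dict(attrs)

    def remove(self, uri):
        del self.statements[uri]
        del self.attributes[uri]

    def holds(self, triple):
        return triple in self.statements.values()


store = StatementStore()
triple = ("s:Creator", "ex:page", "#L_001")
store.add("#S_A", triple, source="ex:doc1")
store.add("#S_B", triple, source="ex:doc2")   # same {p,s,o}, different URI

store.remove("#S_A")         # doc1 withdraws its statement
print(store.holds(triple))   # doc2's statement is unaffected
```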

I have an example of the first point, described under "The nature of
the statement" in this message:

/ Jonas  -

Received on Sunday, 12 March 2000 15:23:24 UTC