RE: URI for language identifiers from Patrick.Stickler@nokia.com on 2003-04-01 (www-rdf-interest@w3.org from April 2003)

From: <Patrick.Stickler@nokia.com>
Date: Tue, 1 Apr 2003 14:56:14 +0300
To: <algermissen@acm.org>
Cc: <www-rdf-interest@w3.org>
Message-ID: <A03E60B17132A84F9B4BB5EEDE57957B5FBB5A@trebe006.europe.nokia.com>
> -----Original Message-----
> From: ext Jan Algermissen [mailto:algermissen@acm.org]
> Sent: 01 April, 2003 13:44
> To: Stickler Patrick (NMP/Tampere)
> Cc: www-rdf-interest@w3.org
> Subject: Re: URI for language identifiers
> 
> 
> Patrick.Stickler@nokia.com wrote:
> 
> > > Example: If the predicate is from Dublin Core, the subject is
> > > always the webpage, never an abstract concept.
> > 
> > Well, actually, DC doesn't say that, neither explicitly by any
> > domain assertion nor in any of the prose descriptions/comments of
> > the subject property. So I don't see where you're getting that.
> 
> Well, I take it as implicit in the semantics of Dublin Core.
> In other words: If I want to make sense of an RDF statement that
> has a DC predicate I need to know those semantics.

Yes, the predicate bears meaning. And yes, some predicates
have domains asserted for them which allow one to infer
some characteristic about the subject -- but that assertion
is not (necessarily) authoritative.

I.e. just because you use a given property with a given subject
does not *force* that subject to actually conform to the
presumptions about subjects for that property.

E.g. I can say

   #Bob rdfs:range xsd:integer .

which would imply that 

   #Bob rdf:type rdf:Property .

since it is the case that

   rdfs:range rdfs:domain rdf:Property .

but just because I expressed a statement using the rdfs:range
property with #Bob as its subject does not necessarily mean
that #Bob is actually a property. I could simply be spouting
nonsense.
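
As a rough illustration of how mechanical that inference is, here is a
minimal Python sketch of the rdfs:domain rule alone (not a full RDFS
reasoner). The function name and the tuple representation of triples are
assumptions made for this example, nothing more.

```python
# Minimal sketch of the rdfs:domain inference rule (not a full RDFS reasoner).
# Triples are plain (subject, predicate, object) tuples of strings.

RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def infer_types_from_domain(triples):
    """Return the rdf:type triples implied by rdfs:domain assertions."""
    # Collect the declared domain(s) of each property.
    domains = {}
    for s, p, o in triples:
        if p == RDFS_DOMAIN:
            domains.setdefault(s, set()).add(o)

    # Any subject used with a property is inferred to belong to that
    # property's domain, whether or not the statement made sense.
    inferred = set()
    for s, p, o in triples:
        for cls in domains.get(p, ()):
            inferred.add((s, RDF_TYPE, cls))
    return inferred


graph = {
    ("rdfs:range", "rdfs:domain", "rdf:Property"),
    ("#Bob", "rdfs:range", "xsd:integer"),
}

inferred = infer_types_from_domain(graph)
```

The rule fires on usage alone; nothing in it consults whoever is
authoritative for #Bob.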

*How* or *where* a given URI is used does not affect its 
authoritative meaning. Usage can only reflect the presumptions
of the user, but that does not usurp the authority of the owner,
and may very well result in disagreement or ambiguity.


> But for the sake
> > of argument, let's just presume that there is something akin to
> > 
> >    dc:subject rdfs:domain ex:WebPage .
> > 
> > > I know, that is not RDF-ish thinking ;-)
> > 
> > Well, actually, it is.
> > 
> > If a predicate has a domain defined for it, then it is quite OK to
> > infer a type characteristic of any subject used with that predicate.
> > I.e., the use of the subject with that predicate is an implicit
> > assertion that the subject is of the particular type per the
> > specified domain.
> 
> So, it would make sense for the DC folks to make these things
> explicit, to publish them as an RDF document?

Well, if they are in fact intending that subjects have a domain
only of web pages (and I don't think they are) then yes, it's 
optimal if that knowledge is expressed explicitly (whether or not
in RDF is another matter).

However, I don't think that the domain of dc:subject is restricted
to web pages. I think you're reading a lot into the spec there.

> > *HOWEVER*, even if one may infer that implicit assertion based on
> > the domain of the predicate, that doesn't necessarily mean it is
> > correct. The authority/owner of the subject may in fact not agree
> > with such an assertion and there may very well be authoritative
> > information about the subject which conflicts with the assertions
> > inferred from its use with a given predicate.
> > 
> > No *use* of the subject with a given predicate counts as any
> > authoritative assertion about the nature of that subject.
> > 
> > If http://www.w3.org/Consortium in fact denotes a web page, no
> > amount of usage with the property http://foo/directory is going to
> > change that and make it denote an organization -- insofar as the
> > authoritative definition of the resource is concerned.
> 
> Now, this seems complicated to me... oh well ;-)

Well, it's like if one has some URI ex:Bob that denotes some
guy named 'Bob'

   ex:Bob ex:name "Bob" .
   ex:Bob rdf:type ex:Man .

and given a property defined as follows

   ex:dressSize rdfs:range xsd:integer .
   ex:dressSize rdfs:domain ex:Woman .

you say

   ex:Bob ex:dressSize "10"^^xsd:integer .

whereby you can infer 

   ex:Bob rdf:type ex:Woman .

Clearly, someone is wrong somewhere, since Bob can't be both a
man and a woman.

It all boils down to which assertions are authoritative.

If the resource denoted by ex:Bob actually is a man, then it is
a semantic error to use it as the subject of a statement having
the predicate ex:dressSize.

The use of a given predicate is thus dependent on the *actual*
nature of the subject and object resources, and any implicit
assertions made about the subject or object based on any
range or domain of the property are simply a means of 
testing agreement with existing knowledge about those
resources -- not as a primary means of creating authoritative
knowledge about those resources.
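
To make the "testing agreement" idea concrete, here is a small Python
sketch that treats a domain-derived type as a claim to be checked against
authoritative statements rather than as a new fact. The function name, the
split into "authoritative" and "usage" statements, and the explicit
disjointness set are all assumptions of this example; the only thing taken
from the above is that Bob can't be both an ex:Man and an ex:Woman.

```python
# Sketch: check domain-derived types against authoritative knowledge.
# Triples are (subject, predicate, object) tuples of strings.


def domain_conflicts(authoritative, usage, disjoint):
    """Return (subject, known_type, inferred_type) for every usage statement
    whose domain-derived type contradicts an authoritative rdf:type."""
    # Simplification: at most one rdfs:domain per property.
    domains = {s: o for s, p, o in authoritative if p == "rdfs:domain"}

    # Authoritative types of each subject.
    types = {}
    for s, p, o in authoritative:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)

    conflicts = []
    for s, p, o in usage:
        inferred = domains.get(p)
        if inferred is None:
            continue  # no domain declared, nothing to test
        for known in sorted(types.get(s, ())):
            if frozenset((known, inferred)) in disjoint:
                conflicts.append((s, known, inferred))
    return conflicts


authoritative = [
    ("ex:Bob", "ex:name", "Bob"),
    ("ex:Bob", "rdf:type", "ex:Man"),
    ("ex:dressSize", "rdfs:range", "xsd:integer"),
    ("ex:dressSize", "rdfs:domain", "ex:Woman"),
]
usage = [("ex:Bob", "ex:dressSize", "10")]

# Assumed for this example: ex:Man and ex:Woman are disjoint classes.
disjoint = {frozenset(("ex:Man", "ex:Woman"))}

conflicts = domain_conflicts(authoritative, usage, disjoint)
```

The check reports the disagreement instead of adding ex:Woman to what is
known about ex:Bob, which leaves the authoritative typing untouched.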

> >
> > > Huh...does that mean that 'proper' use of RDF does not allow me
> > > to use addresses of existing web pages to refer to abstract
> > > concepts?
> > 
> > If those URIs denote web pages, then no, you certainly should not
> > use those URIs to denote any other resource, abstract or otherwise.
> > 
> > > That seems
> > > like a severe limitation to me? Of what use is an identifier
> > > if I cannot
> > > use for example HTTP GET to 'see/read' what it means?
> > 
> > If a URI denotes an abstract concept, you may be able to GET a
> > representation of that resource. Why not.
> 
> This is a thing I just don't get about RDF. 

This has nothing to do with RDF specifically. This is the way the
REST (Web) architecture works. If XTM is to operate on the Web, then
it must also do so in a way that is compatible with the web architecture,
and that includes the relationship between URIs, resources, and 
representations.

> I find it VERY strange that a document can be a representation
> of a dog.

It's a slippery slope, and there is no clearly drawn line. There
are many who would agree with you. There are many who wouldn't.

At present, a representation can be anything. And there is no clear
definition of how a representation must relate to the resource itself.
All that is stated is that, given a URI denoting some resource, an
HTTP GET can return a representation of that resource.

I simply consider a representation as some form of content which in
some way reflects the nature of the resource in some useful manner.

Some representations will be able to reflect resources much more
precisely -- and for digital resources, representations may even
be bit-equal copies.

For abstract or otherwise non-digital resources, representations
(which *will* themselves be digital resources) will reflect the
nature of the resource less precisely.

And not all resources will have representations available.

So if I have a URI that denotes a dog, I may HTTP GET a representation
of that dog which is in fact a digital image or perhaps a video stream,
or maybe an encoding of its DNA. Whatever. I may have dozens of different
representations I can choose from.

And *each* of those representations is a resource in its own right, which
may (IMO should) be denoted by a URI that is distinct from that denoting
the resource of which it is a representation -- and servers returning
a representation may (IMO should) specify the identity of the representation
in the response.

Still, you *never* can GET a resource itself, even a digital resource.
You always get a representation. And that representation (unless a bit-equal
copy) is then a distinct resource.
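
Here is a rough Python sketch of what such a server-side lookup might look
like. It assumes that the standard HTTP Content-Location header is the way
the server names the representation it returns; the example.org URIs, the
lookup table, and the get() function are invented for illustration and do
not describe any particular server.

```python
# Sketch of a server-side lookup: resource URI -> representation.
# All URIs below are hypothetical.

REPRESENTATIONS = {
    # resource URI: (representation URI, media type)
    "http://example.org/dogs/rex": (
        "http://example.org/dogs/rex.jpg", "image/jpeg"),
    # A digital resource can be served as a bit-equal copy of itself.
    "http://example.org/dogs/rex.jpg": (
        "http://example.org/dogs/rex.jpg", "image/jpeg"),
}


def get(uri):
    """Return (status, headers) for a GET on uri; the body is omitted."""
    entry = REPRESENTATIONS.get(uri)
    if entry is None:
        # Not every resource has a representation available.
        return 404, {}
    representation_uri, media_type = entry
    headers = {
        "Content-Type": media_type,
        # Names the representation actually returned, which may differ
        # from the URI that was requested.
        "Content-Location": representation_uri,
    }
    return 200, headers


status, headers = get("http://example.org/dogs/rex")
```

The requested URI still denotes the dog; only the Content-Location value
denotes the image that came back.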

> But I guess that is just
> something to accept as part of the (re-)definition of resource if I
> want to use RDF.

Again. This has nothing to do with RDF. This is the web architecture.

As far as RDF is concerned, one need never dereference any URI and never
get any representation. URIs denote resources and one may use RDF to 
make statements about resources. Representations are entirely outside
the scope of RDF proper.

However, where RDF and the web architecture agree is on the fundamental
principle that URIs should have globally consistent, unambiguous, and
immutable meaning.

A URI always denotes the same thing, no matter where you encounter it, and
no matter how many representations might be associated with it, etc.

URIs are the global constants, the atomic elements, of the web and semantic
web.

> Furthermore as it prohibits an author to use
> "http://www.w3.org/Consortium/" as an identifier for the W3C
> (since it is a Web page).

Is it? How do you know? Because you did an HTTP GET and got back
an HTML instance?

That does not prove that it denotes a web page. The representation
you got from the server may very well be a web page. Without
authoritative knowledge about the resource itself, you can't know
for sure.

Getting such authoritative knowledge about a resource from the web
authority, based on the URI denoting that resource, is what I've been
working on for some time. I am in the final stages of completing some
open source software that does this in a global, scalable fashion.

(and, yes, RDF is at the heart of it ;-)

> > > No, in TM land, a URI always is the address of 'the web page',
> > > a URI *never* addresses an abstract concept.
> > 
> > Well, that wasn't my understanding. But if that's true, then TMs and
> > RDF are even farther apart than I thought.
> 
> Yes. 
> > 
> > > Then in TMs URIs can be used as subject indicators, referring to
> > > arbitrary subjects.
> > 
> > And how do you then make statements about the 'web page' versus
> > the subject? If you are using the same URI?
> > 
> > > A key concept is that when the URI of a subject indicator is
> > > dereferenced and the retrieved information resource is rendered
> > > for human perception it should be clear what subject the URI
> > > indicates.
> > 
> > But how do you differentiate between dereferencing the URI as
> > a subject indicator versus dereferencing the URI as a web page,
> > and is there any logical relationship between the web page
> > denoted by the URI versus the subject indicator denoted by the
> > same URI?
> 
> > Having this ambiguity seems to make the core machinery a lot more
> > complicated.
> 
> Here is how we "see the world":
> 
> There are subjects (anything you want to talk about). 

OK, so TM subject = RDF resource

> Subjects are
> represented as topics (the topics are the nodes of the graph that is
> 'produced' from a topic map). Topics have properties that say what
> the subject of the topic is.
>
> Topic Maps are not tied to the Web or URIs conceptually, but it is
> the most known application of them at the moment. So, when applying
> topic maps to the Web world, there are two properties that handle
> the use of URIs: SubjectIndicators and SubjectAddress.
> 
> The value (if any) of the SubjectAddress property is a URI and if a
> given topic exhibits a value for this property, then the topic is a
> surrogate for the subject that is the resource (in the sense of Web
> page, never abstract concept).
> 
> The value (if any) of the SubjectIndicators property is a list of
> URIs, and each Web resource (again: in the sense of Web page)
> addressed by the URIs is called a subject indicator (or "subject
> indicating resource") for the subject that the topic represents.

topic  == subject    ?
topic  -> subject    ?
topic  -> subject+   ?
topic+ -> subject    ?

> So, the core machinery is actually as simple as "nodes with properties
> that 'say' what the node represents".
> 
> This is not the whole story of course, but I hope you get the idea.

Well, I found the original TM spec pretty simple and straightforward,
but XTM has left me continually confused, and I've read through it
numerous times. 

How about a simple example. Here are two URIs, the first denotes
the person John Doe and the second denotes an image of the
person John Doe.

   ex:John     rdf:type ex:Person .
   ex:John.jpg rdf:type ex:Image .

Then, I set my web server up so that if one does an HTTP GET on
ex:John, it returns a copy of ex:John.jpg as a representation
of ex:John. If one does an HTTP GET on ex:John.jpg, it returns 
a copy of ex:John.jpg as a (bit-equal) representation of
ex:John.jpg.

Now, I can differentiate between the person John Doe and the image
of the person in various statements I make:

   ex:John ex:firstName "John" .
   ex:John ex:lastName "Doe" .
   ex:John dc:created "1966-03-31"^^xsd:date .
   ex:John ex:hasRepresentation ex:John.jpg .

   ex:John.jpg dc:title "Image of John Doe" .
   ex:John.jpg dc:format "image/jpeg" .
   ex:John.jpg dc:created "2003-03-10"^^xsd:date .
   ex:John.jpg ex:representationOf ex:John .

The fact that ex:John is not a "web page" in no way prevents me from
getting a representation of the resource (person). The URI ex:John
does not denote both a person and a web page. It only denotes a person.
The fact that HTTP GET on ex:John returns a web document in no way
changes the denotation of ex:John from being the person.
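
A short Python sketch of why the two never get mixed up: a subset of the
statements above is held as plain tuples and looked up by subject. The
describe() helper and the tuple representation are assumptions made for
this illustration only.

```python
# Sketch: some of the statements about the person and about the image,
# as (subject, predicate, object) tuples. Datatypes are dropped for brevity.

triples = [
    ("ex:John", "ex:firstName", "John"),
    ("ex:John", "ex:lastName", "Doe"),
    ("ex:John", "dc:created", "1966-03-31"),
    ("ex:John", "ex:hasRepresentation", "ex:John.jpg"),
    ("ex:John.jpg", "dc:title", "Image of John Doe"),
    ("ex:John.jpg", "dc:created", "2003-03-10"),
    ("ex:John.jpg", "ex:representationOf", "ex:John"),
]


def describe(subject, graph):
    """Collect the predicate/object pairs stated about one subject."""
    return {p: o for s, p, o in graph if s == subject}


person = describe("ex:John", triples)
image = describe("ex:John.jpg", triples)
```

Because each URI denotes exactly one thing, the two dc:created values sit
in separate descriptions: person["dc:created"] belongs to John Doe and
image["dc:created"] belongs to the image.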

Thus, there is no restriction against a URI denoting a non-web-accessible
resource. And it seems to me that XTM presumes that such a restriction
exists, and that is the motivation for having the subject/address dichotomy
and introducing ambiguity into the denotation of URIs.

And I say XTM, not TM, because this dichotomy is an XTM invention not
present in the original TM model. XTM first did the right thing by
adopting URIs, and then broke everything by not preserving globally
consistent, unambiguous, and immutable denotation.

Thus, there is no *need* to make any distinction between the resource
denoted by a URI and some "subject" which that resource ambiguously
also denotes. If you want to talk about a subject, give it a URI and
just talk about the subject. Simple.

Patrick
Received on Tuesday, 1 April 2003 06:56:23 UTC