Re: bbc-programmes.dyndns.org from Richard Cyganiak on 2008-06-22 (public-lod@w3.org from June 2008)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Sun, 22 Jun 2008 11:31:54 +0100
To: "Peter Ansell" <ansell.peter@gmail.com>
Cc: "Alan Ruttenberg" <alanruttenberg@gmail.com>, "Nicholas Humfrey" <Nicholas.Humfrey@bbc.co.uk>, public-lod@w3.org
Message-Id: <C538456B-CA31-431B-8528-4A0129322DB5@cyganiak.de>
Peter,

On 22 Jun 2008, at 07:40, Peter Ansell wrote:
>> However, if you are saying that RDF and OWL talk about more than web
>> structures, you are absolutely right. That means the domain of the
>> Semantic Web is a set that *subsumes* web pages, not a set *disjoint*
>> from them.
>
> I think disjoint is more accurate, and that is how I approach the
> Semantic Web so far, and have had no issues in using non-semantic
> URIs with xsd:anyURI as a typed literal to make the disjointness
> clear.

Have you heard of RDFa [1]?

Is the URI of a web page with embedded RDFa a “semantic URI” or a
“non-semantic URI”?

The distinction, as made by you, is pointless. RDF is just a data  
model. It can be used to describe things. The things are called  
“resources”. Of course, as a data model, RDF is designed to be  
generic. You can use it to describe whatever you like.

If I understand you correctly, you want a data model that is  
artificially restrained to prevent it from being used to talk about  
web pages. That doesn't make sense. Web pages are just another kind of  
thing.
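
To make the two modelling choices concrete, here is a minimal sketch in plain Python (standard library only, not a real RDF toolkit). It shows the same foaf:page statement once with the web page as a resource and once with the page's address as an xsd:anyURI typed literal. The episode URI and the usesStylesheet predicate are assumptions made up for illustration; the page and stylesheet URIs are the ones quoted further down in this thread.

```python
from dataclasses import dataclass
from typing import Optional, Union

FOAF_PAGE = "http://xmlns.com/foaf/0.1/page"
XSD_ANYURI = "http://www.w3.org/2001/XMLSchema#anyURI"


@dataclass(frozen=True)
class URIRef:
    """Names a resource: the node denotes the thing, not the string."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """A value given by a lexical form and an optional datatype URI."""
    lexical: str
    datatype: Optional[str] = None


@dataclass(frozen=True)
class Triple:
    subject: URIRef
    predicate: URIRef
    obj: Union[URIRef, Literal]


def can_be_subject(node) -> bool:
    """In RDF a literal cannot appear in the subject position.

    Blank nodes are left out of this sketch, so only URIRef qualifies.
    """
    return isinstance(node, URIRef)


def term(node) -> str:
    """Serialize one node in N-Triples style."""
    if isinstance(node, URIRef):
        return f"<{node.uri}>"
    escaped = node.lexical.replace("\\", "\\\\").replace('"', '\\"')
    if node.datatype is None:
        return f'"{escaped}"'
    return f'"{escaped}"^^<{node.datatype}>'


def to_ntriples(t: Triple) -> str:
    return f"{term(t.subject)} {term(t.predicate)} {term(t.obj)} ."


# Hypothetical URI for the episode itself (an assumption, not from the BBC data).
episode = URIRef("http://example.org/episode")
page = URIRef("http://www.bbc.co.uk/programmes/b00b07kw.html")
page_literal = Literal(page.uri, XSD_ANYURI)

as_resource = Triple(episode, URIRef(FOAF_PAGE), page)
as_literal = Triple(episode, URIRef(FOAF_PAGE), page_literal)

# Hypothetical predicate: only the resource form lets us go on to describe the page.
uses_stylesheet = URIRef("http://example.org/vocab#usesStylesheet")
about_page = Triple(
    page,
    uses_stylesheet,
    URIRef("http://www.bbc.co.uk/programmes/r/23870/stylesheets/decor.css"),
)

print(to_ntriples(as_resource))
print(to_ntriples(as_literal))
print(to_ntriples(about_page))
```

With the resource form, the page can itself become the subject of further statements, as in the third triple. With the literal form, can_be_subject(page_literal) is False, so nothing more can be said about the page without inverting the relation or introducing another node.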

Anyway, the entire discussion is pointless. RDF has treated web pages
as just another kind of resource since its earliest days, there are
millions of existing RDF documents following this approach, and Peter,
your ranting against it is not gonna change that. Please let's focus
on getting things done.

Best,
Richard

[1] http://www.w3.org/TR/xhtml-rdfa-primer/


>>>>> The semantic web doesn't gain anything from the result of that
>>>>> page, which clearly has an alternative semantic representation
>>>>> available that you are already looking at when you see the
>>>>> foaf:page (or whatever predicate allows literals) statement.
>>>>
>>>> It isn't about the result of what you fetch so much as it is
>>>> speaking clearly, as I said earlier. The domain of foaf:page is a
>>>> document. Neither a string nor an xsd:anyURI is a document. End of
>>>> story.
>>>
>>> It is clear to me what the string means. And saying it is a
>>> foaf:Document doesn't help with that at all. foaf:page having a
>>> domain of rdf:Resource doesn't have any more practical benefit than
>>> if it didn't say what its domain was.
>>
>> To you perhaps. To others it does. For one thing it can be used to
>> do some basic checking for nonsense statements. (such as the one you
>> were about to make ;-)
>
> Nonsense as in getting HTML after resolving an identifier where every
> other identifier previously revealed RDF representations when
> resolved.  At least they could expect based on the vocabulary that a
> typed literal needed to be dealt with differently on a
> non-semantically extended HTML page. The set of RDF statements that
> would otherwise be generated from the HTML page would be an empty set,
> ie, completely unuseful, whereas if they were aware that HTML elements
> on their own had value to them they wouldn't bother parsing for RDF
> and could find the HTML page.
>
>>>>> If you accept that the ontology you are using puts xsd:anyURI
>>>>> typed literals into a given field it is perfectly meaningful to
>>>>> use the string as you do any other URI string,
>>>>
>>>> If you use another ontology than foaf, with a different relation
>>>> whose domain is an xsd:anyURI, and that relation is documented in
>>>> such a way as to make sense, then sure. I don't happen to see what
>>>> is gained by doing that.
>>>
>>> The ability to have a string as you say which won't be presumed to
>>> be a semantic resource identifier on its own which people can look
>>> at and resolve themselves.
>>
>> And?
>> -
>> What is a "semantic resource identifier"?
>
> Semantic resource identifier (n.) : Anything that can be used in the
> subject position of an RDF Statement.
>
>> I'm still failing to see harm in <http://....>. One can examine an
>> RDF representation, read that, and resolve that manually as well.
>
> A computer must presume that for RDF identifiers RDF is the preferred
> format. Wouldn't it be easier just to acknowledge that HTML exists and
> identify it using typed literals consistently so people recognise the
> difference?
>
>>>>> just in a context which won't be interfered with, or interfere
>>>>> itself with, the logic based semantic web rules.
>>>>
>>>> I don't know what you mean by "interfered with" or what
>>>> connection you are making between this particular choice and logic
>>>> based semantic web rules. It seems to me that the main benefit of
>>>> using foaf:page here is that a lot of people know what it is
>>>> supposed to mean.
>>>
>>> Do they really gain the benefit specifically from its use as an
>>> rdf:Resource though?
>>
>> The instances of rdf:Resource are defined to be *everything*. I'm
>> not sure what you mean by "benefit specifically from its use as an
>> rdf:Resource", but I don't need to because by definition everything
>> is a rdf:Resource.
>>
>> It's like saying: Do I gain specifically from being composed of
>> matter? That I am is a matter of fact. The question might be of
>> metaphysical interest, but not practical interest.
>
> It is of practical interest if people ever acknowledge that
> non-semantic resources do and will always exist outside of the RDF
> universe and should be as easily accessible as any other resource in
> order for people to mix semantics with non-semantic representations.
>
>>> Or do they really do a non-semantic retrieval of
>>> the resource? Should they only expect to be able to retrieve machine
>>> readable representations if they resolve this resource?
>>
>> Who are they?
>
> Consumers (n.) : Those people who will eventually penetrate the
> academic jungle that is the semantic web community and utilise
> resources without days of special instruction from the "experts".
>
>> What is a machine readable representation?
>
> One of the many (confusingly) diverse RDF representations.
>
>>> How do you actually say that a specific rdf resource doesn't
>>> actually direct to an rdf representation as an identifier itself.
>>
>> I'm having trouble parsing this sentence. Could you rephrase it?
>
> See below with respect to knowing to not ask for RDF if it isn't an
> identifier but still a typed anyURI literal.
>
>>>>>> The web page is
>>>>>>
>>>>>> <http://www.bbc.co.uk/programmes/b00b07kw.html> (the thing that
>>>>>> the URI denotes)
>>>>>
>>>>> It isn't an RDF Resource any more than my street and suburb
>>>>> address though, it is a simple human based locator which doesn't
>>>>> really have a need or want to be an RDF Resource IMO.
>>>>
>>>> In both the case of the house, and the case of the web page,
>>>> there is the resource - the house and the web page - and there is
>>>> the address of the house and of the web page (also resources, but
>>>> different ones). In discussion, one says different things about
>>>> the address and the thing. For instance,
>>>>
>>>> "http://www.bbc.co.uk/programmes/b00b07kw.html" has 45 characters.
>>>> or <http://www.bbc.co.uk/programmes/b00b07kw.html> uses the  
>>>> stylesheet
>>>> <http://www.bbc.co.uk/programmes/r/23870/stylesheets/decor.css>
>>>> or "http://www.bbc.co.uk/programmes/b00b07kw.html" is a name for
>>>> <http://www.bbc.co.uk/programmes/b00b07kw.html>
>>>
>>> I don't see why your convention of not dealing with URIs as strings
>>> themselves really helps.
>>
>> You keep thinking that I am arguing that some convention is useful.
>> The only thing I am arguing is useful is speaking clearly. There is a
>> difference between the string and the thing it names (when the
>> string actually names something). If you use the string for both
>> cases one can't tell, in general, which it is that is your subject
>> of discourse. Nor can you infer that it even is to be used as a
>> name. Ambiguous statements work (to a certain extent) with people.
>> They work to a lesser extent with machines, at least for the moment.
>
> A typed literal is not a string IMO even if it is represented as a set
> of unicode characters. I find no issues with getting a machine to
> understand <semanticURI> foaf:page "http://..blah.html"^^xsd:anyURI
> and being able to for instance compile these html representations and
> perform textual searches on the results. If it was only ever a
> semantic URI the machine would natively search for RDF instead of
> freetext and hence would be confused when the result was delivered as
> an HTML page with no semantic extensions. Far easier to just say it
> is a URI and by convention you would know this string was a resolvable
> URL that they could parse in a different way. They would know not to
> expect and therefore not ask for, RDF representations at this stage
> also, which is better than you could do for the other case where you
> really should assume that RDF Resource identifiers represent RDF pages
> and hence you should ask for RDF if you ever want to resolve them.
>
> It simplifies a lot to have an explicit difference at the ontology and
> instance level.
>
>>>> "32 vassar avenue, cambridge, ma, usa" has 36 characters or
>>>> <the MIT Stata Center> foaf:depiction
>>>> <http://en.wikipedia.org/wiki/Image:Wfm_stata_center.jpg>
>>>> or "32 vassar avenue, cambridge, ma, usa" entered into google
>>>> maps, will locate <the MIT Stata Center>
>>>
>>> And I am trying to say your last statement exactly. When entered
>>> into a web browser the .html version will produce something they
>>> can look at... Why is it different for addresses?
>>
>> It's not. There are a great many things one can say. foaf:page
>> doesn't say this. Invent a relation that means what you want it to,
>> document it well, and use it. David Booth calls this relation hasURI
>> (http://esw.w3.org/topic/AwwswDboothsRules)
>
> That document blurs a lot of rules that RDF/XML has in comparison to
> N3, and hence it is not really useful for a fully compatible RDF
> system, but the ideas in relation to hasURI are useful. What is
> stopping the author of this scheme using it explicitly? It doesn't
> have to be assumed, particularly if you have no other use for the
> resource identifier yourself and don't see anyone else needing to
> extend your description of the HTML page as anything else.
>
>>>>> It is a coincidence IMO that it is defined in the same way that
>>>>> RDF Resources are, and it isn't useful to mix everything up by
>>>>> presuming that URLs of web pages are useful as RDF Resources any
>>>>> more than arbitrary string literals.
>>>>
>>>> First, in the RDF world, everything is an rdf:Resource, including
>>>> rdf:Literals. So they are "mixed up" already. While there were
>>>> perhaps mistakes made in RDF, that web pages are considered
>>>> resources is most certainly not one of them. Finally, I'll point
>>>> out once again that the issue here isn't what is or is not a
>>>> "good" resource. The issue is speaking clearly. If you want to
>>>> talk about the literal, by all means do so, and if you want to
>>>> talk about the web page, likewise. But don't confuse one with the
>>>> other.
>>>
>>> I have never quite understood the reason for putting Literals inside
>>> of "Resources" when you can't say anything about Literals as a
>>> subject except in reverse as the object of a statement and by
>>> common-sense you should be able to state properties of Resources
>>> directly rather than indirectly as RDF provides for the Literal
>>> subset.
>>
>> Me either. Perhaps because they just didn't think that people would
>> want to say that many things about literals. Don't know. I've heard
>> it mumbled that if RDF goes through another edit, this might get
>> fixed. Mostly it's not a problem, unless you want to say something
>> where both the subject and object are literals, since in the other
>> case you can invert the relation. In the literal p literal case I've
>> seen people use the idiom:
>>
>> _:foo hasLength "45"^^xsd:integer
>> _:foo owl:sameAs "http://www.bbc.co.uk/programmes/b00b07kw.html"
>>
>> (not that i'd particularly recommend it)
>>
>> or
>>
>> _:foo hasLength "45"^^xsd:integer
>> _:foo rdf:value "http://www.bbc.co.uk/programmes/b00b07kw.html"
>>
>> (lesser of evils)
>
> Please don't put that in an introduction on how to reference current
> html pages within the semantic web. I avoid blank nodes like the
> plague, even if the plague is rare in my world, blank nodes are even
> rarer.
>
>>> I personally think it's a bad idea to smudge the differences by
>>> saying all web pages are semantic resources
>>
>> All web pages are rdf:Resources. What is this "semantic resource"
>> you speak of?
>
> I disagree, majorly.
>
>>> , as they aren't... Many have no
>>> inherent RDF semantics whatsoever and hence can't be reasonably used
>>> as the subject of statements.
>>
>> Umm...
>> <http://www.bbc.co.uk/programmes/b00b07kw.html> is about an episode
>> of the television programme Dr. Who.
>>
>> "http://www.bbc.co.uk/programmes/b00b07kw.html" is a string of
>> length 45.
>
> Not if it was typed as an XML Schema anyURI... In that scenario you
> can't say it is a string, it is simply being represented as a string
> for current purposes.
>
>>> It would be much better if by default they were thought of as
>>> Literals and kept as objects of statements in semantic terms.
>>
>> Well, I can see that you are making this assertion, but I can't
>> understand the reasoning behind it.
>
> It goes against your (and others') assumption that it is valuable to
> consider all current web URLs as useful references for use as
> identifiers for rdf:Resources. I don't see the usefulness for the vast
> majority of current web pages and think it would be more reasonable to
> make the difference distinct and require them to make up identifiers
> which locate information resources with semantic information. I
> particularly don't like the hit and miss approach of content
> negotiation without redirection in this respect as there is no way to
> say that if you attempt to negotiate with a particular content type
> you will actually get it, as opposed to the .html and .rdf versions
> that are well done in this bbc project.
>
>
> Hopefully I restricted myself to repeating myself 5 or 10 times. Feel
> free to respond to 0 or more of my statements.
>
> :)
>
> Cheers,
>
> Peter
>
Received on Sunday, 22 June 2008 10:32:34 UTC