Re: information resource note #573

On Sun, 2011-05-15 at 10:55 -0400, Jonathan Rees wrote:
> On Fri, May 13, 2011 at 5:29 PM, David Booth <david@dbooth.org> wrote:
> > On Fri, 2011-05-13 at 08:59 -0400, Jonathan Rees wrote:
> >> OK, I've dealt with your comments as best I could.
> >
> > Here are my comments on the latest draft at
> > http://www.w3.org/2001/tag/awwsw/ir/latest/
> >
> >
> > 1. Kudos!  I think this is getting *very* good, and I would definitely
> > endorse it.  I think you've done a very good job of tying in existing
> > AWWW terminology, per my previous comment.  I do have a few more
> > editorial suggestions though.
> >
> >
> > 2. This is a bit confusing:
> > [[
> > It will be useful to have a term to apply in the situation where
> > metadata does not explicitly specify a particular subject, so define a
> > "metadata predicate" to be metadata of this sort.
> > ]]
> > because a predicate *always* specifies a particular subject.  For
> > example, if I write ":s :p :o ." then the subject is the particular
> > subject :s, regardless of what :s is.
> 
> :s :p :o is a statement, not a predicate.  A predicate by definition
> does *not* specify a particular subject.

Well, the original text above talks about metadata that "does not
explicitly specify a particular subject".   That doesn't make sense,
because metadata is written in statements using predicates, and a
statement *always* specifies a particular subject.  Thus, it is not a
matter of specifying a particular subject or not specifying a particular
subject.  Rather, (AFAICT) the whole point is to generalize the
predicate to apply to more *kinds* of subjects.  

Furthermore, the original text says 'so define a "metadata predicate" to
be metadata of this sort', and that doesn't make sense either, because
it doesn't make sense to define a predicate to be metadata.  Rather, the
metadata is written *using* the predicate.  That's why I suggested the
following rewording:

> 
> > I suggest rephrasing this
> > sentence as:
> > [[
> > It will be useful to have a generic predicate for the situation where
> > the same metadata may apply to more than one kind of data, so define a
> > "generic metadata predicate" to be metadata of this sort.
> > ]]

I believe this wording would be clearer.

> >
> > 3. Use "generic metadata predicate" throughout, where you previously
> > used "metadata predicate".
> 
> disagree.  redundant.

The point is that there may be predicates that would apply to only
"specific information entities" and they may be used to express metadata
-- "information about information".  Logically, in English one would
therefore be tempted to refer to such predicates as "metadata
predicates".  But at present, the document is defining the term
"metadata predicate" to be a particular kind of metadata predicate: the
kind that can be applied both to "specific information entities" *and*
to "generic information entities".  This is why I think it would be
clearer to use the term "generic metadata predicate" to refer to the kind
that can be applied to both kinds of entities.

Does that make sense?

> 
> If I were being OWL-y I would call them "classes" (which by definition
> are 1-place predicates) but I figured that would confuse more people
> than "predicate".

I think the word "predicate" is fine.

> 
> > 4. s/true of documents have/true of documents that have/
> ok
> >
> > 5.  s/"is an HTML document"/"is-an-HTML-document"/
> > or: s/"is an HTML document"/"is_an_HTML_document"/
> > or: s/"is an HTML document"/":isAnHtmlDocument"/
> > or something similar.
> 
> disagree, would require more explanation. the string in quotes *is* a
> predicate, in English grammar.

Well, I guess different people may read it differently.  But the problem
that I had was that in English, quotes are used for several purposes --
not only to mean "this is a predicate".  Perhaps I am biased by my
programming background, but I prefer (and expect) to see predicates,
functions, operators, relations and the like written as single words.  I
know when I read it I had to back up and re-parse before I figured out
that you were using quotes to join the words into a single predicate.

> 
> > 6. s/true of one and/true of one format but/
> ok
> >
> > 7. Again, for this:
> > [[
> > Put more formally, if M[] is a metadata predicate, then for any G and S,
> >
> >   1. if G generalizes S, and M[G], then M[S].
> >   2. if M[S] for all S such that G generalizes S, then M[G].
> > ]]
> > I still think it would be helpful to the reader to say what G and S are.
> > E.g., is G a galaxy and S a star?  AFAICT they are not currently
> > excluded, but it would be helpful to lead the reader's thinking in the
> > right direction.  Also, it would be helpful to say a little more about
> > the "generalizes" relation and (IMO) it would be stylistically nicer to
> > define M using an "iff".
> 
> The idea is already given in the prose, and people who don't like
> formalism can skip it
> 
> There is nothing more to be said about "generalizes" that I can figure out.

Right, but the problem at present is that the term is being used without
being defined at all.  At present it just looks like we forgot to
include a definition for "generalizes".

> 
> > Therefore, I suggest merging in something like this:
> > [[
> > Given a set of specific information entities, we can hypothesize a
> > generic information entity G that is said to "generalize" all of the
> > specific information entities in this set.  We can write the relation
> > between G and a member S of this set as "G generalizes S".
> > ]]
> 
> This is already there, see "we take it as axiomatic"

No, the "we take it as axiomatic" text is later in the document.  The
term "generalize" should be defined before (or where) it is used --  not
later.

> 
> There is no need for the double quotes.

The reason for the double quotes is that the term "generalize" is being
introduced/defined.

> 
> > and changing the formal definition to:
> > [[
> > Put more formally, if M[] is a metadata predicate, then for any generic
> > information entity G,
> >
> >  M[G] iff (for all S such that G generalizes S, M[S]).
> > ]]
> 
> ok. I've gone back and forth on this. The problem is that both
> directions of the 'if and only if' are equally important, so I had the
> idea that separating them would give them more force. But it doesn't
> much matter.
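
To make the "iff" concrete, here is a minimal Python sketch over a finite
toy model.  Everything in it is an assumption for illustration only: the
entity names, the GENERALIZES table and the two sample predicates are
hypothetical and do not come from the draft.

```python
from typing import Callable, Dict, FrozenSet

Predicate = Callable[[str], bool]

# Hypothetical toy data: one generic information entity and the specific
# information entities it generalizes (names invented for illustration).
GENERALIZES: Dict[str, FrozenSet[str]] = {
    "hen-page": frozenset({"hen.html", "hen.pdf"}),
}


def lift(m: Predicate, generalizes: Dict[str, FrozenSet[str]]) -> Predicate:
    """Extend a predicate on specific entities to generic ones using
    M[G] iff (for all S such that G generalizes S, M[S])."""
    def m_generic(g: str) -> bool:
        # all() over an empty collection is vacuously True, so this sketch
        # assumes each G generalizes at least one S.
        return all(m(s) for s in generalizes[g])
    return m_generic


def mentions_hen(s: str) -> bool:
    # Hypothetical predicate that holds of every specialization above.
    return s.startswith("hen")


def is_html(s: str) -> bool:
    # Hypothetical predicate that holds of only one specialization.
    return s.endswith(".html")


print(lift(mentions_hen, GENERALIZES)("hen-page"))
print(lift(is_html, GENERALIZES)("hen-page"))
```

In this toy model a predicate that holds of every specialization carries
up to G, while one that holds of only one format does not.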
> 
> > 8. Regarding:
> > [[
> > We can say that "information resource" (the conventional term in Web
> > architecture) is a near-synonym for "generic information entity" as
> > above, with the possibility understood that in some cases an information
> > resource will have only one specialization.
> > ]]
> > I think we should add: "and with the exception that the class of generic
> > information entities has not been defined as disjoint with any other
> > class."  The reason for this exception is to avoid perpetuating debates
> > about what is or is not a generic information entity, as we had with IR
> > vs. non-IR.  This makes it clear that there is no a priori disjointness.
> 
> This is covered in end note 9. Bringing attention to the question in
> the main part of the text would only encourage those who are obsessed
> with this red herring.

Do you mean the following end note text?
[[
This definition of "information resource" is not the same as the rather
impenetrable one found in [webarch]. Nor is it necessarily the best
match to practice, as it is probably better for information resources
to have more structure than this: We'd like to allow for two information
resources that generalize the same representations and yet differ in
interesting ways, such as the circumstances (request parameters and
other variables) under which particular representations are authorized.
Elaborating the present theory to account for such distinctions would
not be difficult, but would distract from the current discussion.
]]

That note does not say anything about the fact that the class of generic
information entities has not been defined as disjoint with any other
class.  My point is that it would be good to state this *explicitly*,
because: (a) the AWWW definition of IR indicates disjointness; (b)
clearly the notion of "generic information entity" is very similar to
the AWWW notion of IR; and (c) the text in section 3 says that IR is a
near-synonym for "generic information entity".  

I think it is far better to address questions like this explicitly,
rather than being silent and thus forcing readers to guess and perhaps
guess differently from each other depending on their backgrounds.

> 
> > 9. s/for any nonempty class of representations/for any nonempty set of
> > representations/
> 
> Class and set are very different mathematically. Classes belong to
> logic, which is prior to mathematics, while sets belong to set theory,
> which is just one application of logic.

Okay, I'll take your word for it regarding the history of the terms
"set" and "class".  But according to my understanding of the intent in
this context, AFAICT we are talking about a *set* -- not a class.  Are
we not?  Given an arbitrary *set* of representations (which may have
*nothing* in common except that they are members of this set), we can
hypothesize an IR that generalizes exactly those members.  Correct?

Perhaps I am misunderstanding the term "class", 
http://mathworld.wolfram.com/Class.html 
but to my mind the members of a class must have some property in common,
and I don't see any such requirement in this context:
[[
We take as axiomatic that for any nonempty class of representations
there is an information resource that generalizes those and only those
representations.
]]

Have I misunderstood something?

> 
> > 10. I think it would be good to add a formal definition for onWebAt,
> > like this:
> > [[
> > For any information resource (a/k/a generic information entity) G, and
> > any URI U,
> >
> >  (G onWebAt U) iff (for all S, S isAuthorizedFor U iff
> >  G generalizes S).
> > ]]
> 
> That adds nothing to the prose, which says exactly the same thing in
> almost exactly the same way

The same could be said of all of the other definitions, but the others
were all provided both in prose and formally.  I don't think it makes
sense to suddenly omit the formal definition of this one.  I, for one,
*like* the clarity and conciseness of the formal definitions.
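
As a rough illustration of what the formal definition buys, here is a
small Python sketch of it over a finite toy model.  The tables and entity
names are hypothetical; only the shape of the definition is taken from
the suggested text above.  Over finite sets, "for all S, S
isAuthorizedFor U iff G generalizes S" reduces to set equality.

```python
from typing import Dict, FrozenSet

# Hypothetical toy data (not from the draft): which specific entities each
# generic entity generalizes, and which are authorized for each URI.
GENERALIZES: Dict[str, FrozenSet[str]] = {
    "hen-page": frozenset({"hen.html", "hen.pdf"}),
    "html-only": frozenset({"hen.html"}),
}
AUTHORIZED_FOR: Dict[str, FrozenSet[str]] = {
    "http://example/hen": frozenset({"hen.html", "hen.pdf"}),
}


def on_web_at(g: str, u: str,
              generalizes: Dict[str, FrozenSet[str]],
              authorized_for: Dict[str, FrozenSet[str]]) -> bool:
    """(G onWebAt U) iff (for all S, S isAuthorizedFor U iff G generalizes S).

    In a finite model this is equality of the two sets of S."""
    return generalizes[g] == authorized_for[u]


print(on_web_at("hen-page", "http://example/hen", GENERALIZES, AUTHORIZED_FOR))
print(on_web_at("html-only", "http://example/hen", GENERALIZES, AUTHORIZED_FOR))
```

Note that "html-only" fails the test even though everything it
generalizes is authorized for the URI, because the inner "iff" requires
the match in both directions.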

> 
> > 11. Regarding the diagram:
> > a. I think "(for dereference)" can be deleted, as I don't think it adds
> > anything.
> ok
> > b. I suggest changing '("has")' to '("has representation")'.  You are
> > already using the term "representation" (in the AWWW sense) -- and I
> > think that is good -- so I think it will help tie the diagram to the
> > prose.
> 
> If a has representation b, then the relationship between a and b is
> "has", right? It would be redundant to say "has representation b" if
> you already know b is a representation.

Yes, it is a little redundant.  But the problem is that "has" is too
vague without it.  There are a zillion different notions of the term
"has".

> 
> I will just delete it as I don't want to stoke webarch fetishes.

Please don't delete it!  It is *helpful* to have the diagram connect the
AWWW and non-AWWW terminology.

> 
> > c. One of the pages on the diagram indicates its type: "(information
> > resource)".  I suggest changing that to "(information resource a/k/a
> > generic information entity)", and adding similar type labels to the
> > representations if it doesn't look too crowded by doing so.
> 
> The types are really not interesting, all the action is in the relationships.
> I'll just put "generic" as a sort of hint pointing at the discussion

Okay.

> 
> > 12. s/Those who don't care about talking about the Web/Those who don't
> > care about talking about entities on the Web/
> 
> added "in this way"
> 
> > 13. s/in question to what for them is a better use/in question to other
> > uses/
> 
> I want to drive home that this is a turf war. I see the objection,
> have replaced with
> "uses better suited to their applications"

Sounds good.

> 
> > 14. Change:
> > [[
> > In Turtle, this could be a different URI, or a blank node such as
> > [ir:onWebAt "http://example/hen"]:
> > ]]
> > to:
> > [[
> > In Turtle, this could be a blank node such as [ir:onWebAt
> > "http://example/hen"] or a different URI:
> > ]]
> ok
> 
> >
> > 15. Down at the end, I think it would be good to say something
> > explicitly about how one can determine whether an IR is ir:onWebAt a
> > URI.  (This is where the httpRange-14 resolution comes in.)  In
> > particular, the usual practice is to dereference the URI, and see if you
> > get an HTTP 200 response code.  If so, then by the httpRange-14 rule,
> > you conclude that the IR is ir:onWebAt that URI.
> >
> > Pseudo-n3 rule (more like a macro) that expresses this rule:
> >
> >  { "?u" ir:yieldsHttpResponseCode 200 . } =>
> >    { <?u> ir:onWebAt "?u" . }
> >
> > or an actual n3 rule, using log:uri :
> >
> >  @prefix log: <http://www.w3.org/2000/10/swap/log#> .
> >  { ?r log:uri ?u .  ?u ir:yieldsHttpResponseCode 200 . } =>
> >    { ?r ir:onWebAt ?u . }
> >
> > http://www.w3.org/2000/10/swap/doc/Reach.html#More
> 
> I think you have missed the point. There is no way to ascertain - or
> logically conclude in a sound way - that something is on the web at a
> URI, just as there is no way to say that there is no life on Mars. See
> end note 8.

Hold on.  I agree that the httpRange-14 resolution did not explicitly
say *which* IR a URI u identifies if dereferencing u yields a 200
response, and it is fair to point out that flaw.  But I think it is
fairly clear that people use the resolution as though it did not have
that flaw.

Sound or not, a rule such as the above is what people use.  (I.e.,
whether it is sound or not is a separate question.)  I think it is
useful to state it explicitly, because it is this rule that causes the
clashes that we are trying to discuss.

> 
> Anyhow I can't tell what these rules are supposed to mean. In
> particular I have never understood log:uri. 

Since the rule is specifically talking about the binding between a URI
and an IR, it is necessary to step up to the meta level (or syntactic
level) to talk about it.  That's why the pseudo-n3 rule that I gave is
really more like a macro: it is not a well-formed n3 rule.  And the only
reason that the second rule is a well-formed n3 rule is because the
dirty syntactic magic is hidden in the log:uri predicate.

Personally, I think the pseudo-n3 rule would be clearest, perhaps aided
by a little explanation about why it is not an actual n3 rule.  It would
be too messy to try to explain what log:uri means.
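
To show why the pseudo-n3 rule behaves like a macro, here is a minimal
Python sketch of it.  It makes no HTTP requests: the response codes are a
hypothetical table of observations, and the second URI is invented for
contrast.  It only mechanizes the rule as people apply it and says
nothing about whether the conclusion is sound.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical observations; no network access is performed.
OBSERVED: Dict[str, int] = {
    "http://example/hen": 200,
    "http://example/rooster": 303,
}


def conclude_on_web_at(response_codes: Dict[str, int]) -> Set[Triple]:
    """Mechanize { "?u" ir:yieldsHttpResponseCode 200 } => { <?u> ir:onWebAt "?u" }.

    The same string u is used twice: once as a name (<u>) and once as a
    literal ("u").  That step from string to name is the syntactic part
    that an ordinary rule cannot express."""
    return {
        (f"<{u}>", "ir:onWebAt", f'"{u}"')
        for u, code in response_codes.items()
        if code == 200
    }


for triple in sorted(conclude_on_web_at(OBSERVED)):
    print(" ".join(triple), ".")
```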

> If what you are saying is
> that if an absolute URI is dereferenceable then it names what's on the
> web at that URI - well you have not said anything to justify that, and
> it may actually be false if the URI is being used in some other way.

Agreed, but that is what the rule states, and that's how people use it.
Whether or not it holds is a different question.

> 
> All I'm saying is that people do it, it's common practice, it's
> useful, it's coherent with the *spirit* of the (non-rec) httpRange-14
> rule, and that doing otherwise would create a mess. Nowhere do I say
> that this naming practice is *true* (as if we could even express what
> that meant) or that anything like the above rules are implied. This
> hands-off view of the problem is essential in order to remain in
> dialog with Harry and those he speaks for.

Yes, I agree.  I just think it would be helpful to state it explicitly,
rather than leaving it vague and unstated.  It would also be good to
state explicitly why the rule may not hold.

> 
> > 16. Up front under the document title, after listing yourself
> > (separately) as Editor -- since you did the writing work -- I suggest
> > listing the other active members of AWWSW as "Contributors", since the
> > document is the result of quite a prolonged effort by the group.
> 
> Style sheet doesn't permit. Currently this is in end note 1. The best
> I can do is to follow AWWW's example and create an 'acknowledgments'
> section.

That sounds odd.  Are you sure?  It's been done in many other W3C
publications.  Here is a recent example:
http://www.w3.org/TR/2010/NOTE-WCAG20-TECHS-20101014/


17. BTW, I just noticed this in end note 8:
[[
Nevertheless the wording has led to an unfortunate focus on the
irrelevant question of whether something is an information resource, as
opposed to the consequential question of which resource (of whatever
kind) is named.
]]
That sounds a bit overstated to me.  I don't think the fact that the
httpRange-14 resolution doesn't say *which* IR is identified is the
reason for the "irrelevant question of whether something is an
information resource".  I think both the httpRange-14 resolution and the
AWWW definition of IR are to blame for the "irrelevant question of
whether something is an information resource".

Thanks!



-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.

Received on Monday, 16 May 2011 20:21:47 UTC