Re: JSON-LD onsite examples: are @context values missing a trailing slash?

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Thu, 19 Feb 2015 13:42:05 -0500
Message-ID: <54E62E7D.2010300@openlinksw.com>
To: Dan Brickley <danbri@google.com>, Gregg Kellogg <gregg@greggkellogg.net>
CC: W3C Web Schemas Task Force <public-vocabs@w3.org>
On 2/19/15 9:39 AM, Dan Brickley wrote:
> On 17 February 2015 at 23:30, Gregg Kellogg <gregg@greggkellogg.net> wrote:
>
>> This isn't a JSON-LD heuristic, but a general web server mechanism to handle
>> ill-formed URLs. As it happens, although it's common practice,
>> http://schema.org is not a valid URL, as it doesn't have a path component.
> Which URL spec are we going by here? Lots of URLs lack path components.
>
> e.g. section 3.3 of https://tools.ietf.org/html/rfc3986 says
> "If a URI contains an authority component, then the path component
>     must either be empty or begin with a slash ("/") character."
>
> Or HTTP URIs per RFC-7230 "Hypertext Transfer Protocol (HTTP/1.1):
> Message Syntax and Routing" which largely defers to that work:
>
> http://tools.ietf.org/html/rfc7230#section-2.7
>
> """
> 2.7.  Uniform Resource Identifiers
>
>     Uniform Resource Identifiers (URIs) [RFC3986] are used throughout
>     HTTP as the means for identifying resources (Section 2 of [RFC7231]).
>     URI references are used to target requests, indicate redirects, and
>     define relationships.
>
>     The definitions of "URI-reference", "absolute-URI", "relative-part",
>     "scheme", "authority", "port", "host", "path-abempty", "segment",
>     "query", and "fragment" are adopted from the URI generic syntax.  An
>     "absolute-path" rule is defined for protocol elements that can
>     contain a non-empty path component.  (This rule differs slightly from
>     the path-abempty rule of RFC 3986, which allows for an empty path to
>     be used in references, and path-absolute rule, which does not allow
>     paths that begin with "//".)  A "partial-URI" rule is defined for
>     protocol elements that can contain a relative URI but not a fragment
>     component.
>
>       URI-reference = <URI-reference, see [RFC3986], Section 4.1>
>       absolute-URI  = <absolute-URI, see [RFC3986], Section 4.3>
>       relative-part = <relative-part, see [RFC3986], Section 4.2>
>       scheme        = <scheme, see [RFC3986], Section 3.1>
>       authority     = <authority, see [RFC3986], Section 3.2>
>       uri-host      = <host, see [RFC3986], Section 3.2.2>
>       port          = <port, see [RFC3986], Section 3.2.3>
>       path-abempty  = <path-abempty, see [RFC3986], Section 3.3>
>       segment       = <segment, see [RFC3986], Section 3.3>
>       query         = <query, see [RFC3986], Section 3.4>
>       fragment      = <fragment, see [RFC3986], Section 3.5>
>
>       absolute-path = 1*( "/" segment )
>       partial-URI   = relative-part [ "?" query ]
>
>     Each protocol element in HTTP that allows a URI reference will
>     indicate in its ABNF production whether the element allows any form
>     of reference (URI-reference), only a URI in absolute form
>     (absolute-URI), only the path and optional query components, or some
>     combination of the above.  Unless otherwise indicated, URI references
>     are parsed relative to the effective request URI (Section 5.5).
> """
>
> As far as I can see the absolute-path construction is only used in
> non-URL settings i.e. protocol headers.
>
> My reading is that in JSON-LD 'http://schema.org' serves to identify
> a URL from which a context can be acquired. We have wired up the
> relevant server-side voodoo such that this works, e.g. via:
> curl -H "Accept: application/ld+json" http://schema.org
>
> ... where is it written that http://schema.org is a bad http URL?
> (genuine question, not rhetorical :)
>
> An equal counter question: where is it written that such a URL would
> be dereferenced by requesting '/'? Or is this just a convention?
>
> Dan

Dan,

With regard to my concern, I am going to use the statements below to
demonstrate my point:

{
  <#this>
   a schema:WebPage;
   rdfs:label "Name Ambiguity & Referent Description Determination Test" ;
   schema:about <http://schema.org>, <http://schema.org/> ;
   schema:url <> .

}

# Describing an entity/thing identified by the HTTP URI: http://schema.org

{

<http://schema.org>
a owl:Thing ;
rdfs:label "Schema.org" ;
schema:sameAs <http://schema.org> ;
schema:url <http://schema.org> ;

}

## Versus

# Describing an entity/thing identified by the HTTP URI: http://schema.org/

{
<http://schema.org/>
a owl:Thing ;
rdfs:label "Schema.org/" ;
schema:sameAs <http://schema.org/> ;
schema:url <http://schema.org/#this> .
}


I embedded the statements above in a G+ post so that anyone can simply
view the results:

[1] http://linkeddata.uriburner.com/c/8OM32D -- About Schema.org/
[2] http://linkeddata.uriburner.com/c/9C7L4KYY -- About Schema.org

Issues:

When you have existing data in a data space, the issue of unambiguous
naming becomes extremely important. This (again) has nothing to do with
de-reference and lookups. It has everything to do with documents
comprised of relations that collectively describe things, using a
variety of notations (JSON-LD, Turtle, or whatever).
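
To make the distinction between naming and lookup concrete, here is a
minimal sketch in Python. It assumes only the standard library's
urllib.parse; the helper names same_name and http_normalize are
hypothetical, introduced for illustration. RDF compares IRIs as plain
strings, so the two forms are different names, while the scheme-based
normalization described in RFC 3986 section 6.2.3 treats an empty http
path as equivalent to "/" for access purposes.

```python
from urllib.parse import urlsplit, urlunsplit


def same_name(a: str, b: str) -> bool:
    """RDF-style comparison: two IRIs denote the same name only if
    they are identical, character for character."""
    return a == b


def http_normalize(uri: str) -> str:
    """Apply one scheme-based normalization from RFC 3986 section 6.2.3:
    for http(s) URIs with an authority, an empty path becomes "/"."""
    parts = urlsplit(uri)
    if parts.scheme in ("http", "https") and parts.netloc and parts.path == "":
        parts = parts._replace(path="/")
    return urlunsplit(parts)


no_slash = "http://schema.org"
with_slash = "http://schema.org/"

# As names in a data space, the two strings differ.
print(same_name(no_slash, with_slash))

# As lookup targets, both normalize to the same URI.
print(http_normalize(no_slash) == http_normalize(with_slash))
```

A store that keys entity descriptions on the raw string will therefore
hold two separate descriptions, even though a web client would fetch the
same document for both.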

I haven't even enabled the inference and reasoning options in these pages,
which would simply magnify the problem exponentially.

This isn't about publishers versus consumers. It's about publishers,
consumers, and curators of entity descriptions using their preferred
notations to encode and decode information, via the medium provided by
the World Wide Web (Web).

As I've stated in my posts about publishing profiles [1], reviews [2], and
generic descriptions [3] (for everyone), the technology behind search
engines MUST already handle name disambiguation. So why should content
creators be encouraged to produce incomprehensible content, only for
cleansing and indexing to be charged back to them, unknowingly, as some
kind of value-added service?

By "unknowingly" I mean this sequential flow:

1. Users are encouraged to curate poor entity descriptions
2. They can't find anything or build better descriptions from what exists
3. They are then left to the results pages of search engines for some
variant of #2
4. Step 3 produces HTML documents where disambiguated entity names are
out of scope for humans and/or machines (e.g. user agents).

How do we fix this problem? By providing instructions that avert the
mess, i.e., by simply helping users (consumers, publishers, curators)
understand how to name things that exist, unambiguously, for effective
use on the Web or any other HTTP network [4][5].
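
One naming pattern already used in the statements earlier in this
message (<#this> and <http://schema.org/#this>) is the fragment
identifier: the name of the thing and the address of the document that
describes it differ, yet the name still leads to the document because a
client strips the fragment before making an HTTP request. A minimal
sketch in Python, assuming the standard library's urllib.parse; the
helper name split_name is hypothetical:

```python
from typing import Tuple
from urllib.parse import urldefrag


def split_name(uri: str) -> Tuple[str, str]:
    """Split a hash-based name into the document address that a client
    would request and the fragment that stays on the client side."""
    document, fragment = urldefrag(uri)
    return document, fragment


thing = "http://schema.org/#this"
document, fragment = split_name(thing)

print(document)           # address of the describing document
print(fragment)           # local part that distinguishes the thing
print(thing != document)  # the thing and its document have different names
```

The point of the pattern is that descriptions of the thing and
descriptions of the document never collide on the same name.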

Users are impatient (rightly so) and love convenience (rightly so),
but none of that (in my eyes) amounts to them being incapable of
figuring out how to name things once they understand the what, why, and
how of important topics such as unambiguous entity names and their
impact on entity descriptions published to the Web.

Links:

[1] http://kidehen.blogspot.com/2015/01/social-networking-profiles-for-everyone.html -- Profile Publishing
[2] http://kidehen.blogspot.com/2015/01/review-publishing-for-everyone.html -- Review Publishing
[3] http://kidehen.blogspot.com/2014/07/nanotation.html -- Nanotation
[4] http://www.w3.org/2005/Talks/1110-iswc-tbl/#(7) -- Fragment Identifiers & Global Identifiers
[5] https://www.pinterest.com/pin/389561436491723060/ -- Naming things that exist, for use on the Web

-- 
Regards,

Kingsley Idehen	
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this




Received on Thursday, 19 February 2015 18:42:27 UTC