- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Thu, 19 Feb 2015 13:42:05 -0500
- To: Dan Brickley <danbri@google.com>, Gregg Kellogg <gregg@greggkellogg.net>
- CC: W3C Web Schemas Task Force <public-vocabs@w3.org>
- Message-ID: <54E62E7D.2010300@openlinksw.com>
On 2/19/15 9:39 AM, Dan Brickley wrote:
> On 17 February 2015 at 23:30, Gregg Kellogg <gregg@greggkellogg.net> wrote:
>
>> This isn't a JSON-LD heuristic, but a general web server mechanism to handle
>> ill-formed URLs. As it happens, although it's common practice,
>> http://schema.org is not a valid URL, as it doesn't have a path component.
>
> Which URL spec are we going by here? Lots of URLs lack path components.
>
> e.g. 3.3 of https://tools.ietf.org/html/rfc3986 says
> "If a URI contains an authority component, then the path component
> must either be empty or begin with a slash ("/") character."
>
> Or HTTP URIs per RFC-7230 "Hypertext Transfer Protocol (HTTP/1.1):
> Message Syntax and Routing" which largely defers to that work:
>
> http://tools.ietf.org/html/rfc7230#section-2.7
>
> """
> 2.7. Uniform Resource Identifiers
>
>    Uniform Resource Identifiers (URIs) [RFC3986] are used throughout
>    HTTP as the means for identifying resources (Section 2 of [RFC7231]).
>    URI references are used to target requests, indicate redirects, and
>    define relationships.
>
>    The definitions of "URI-reference", "absolute-URI", "relative-part",
>    "scheme", "authority", "port", "host", "path-abempty", "segment",
>    "query", and "fragment" are adopted from the URI generic syntax. An
>    "absolute-path" rule is defined for protocol elements that can
>    contain a non-empty path component. (This rule differs slightly from
>    the path-abempty rule of RFC 3986, which allows for an empty path to
>    be used in references, and path-absolute rule, which does not allow
>    paths that begin with "//".) A "partial-URI" rule is defined for
>    protocol elements that can contain a relative URI but not a fragment
>    component.
>
>      URI-reference = <URI-reference, see [RFC3986], Section 4.1>
>      absolute-URI  = <absolute-URI, see [RFC3986], Section 4.3>
>      relative-part = <relative-part, see [RFC3986], Section 4.2>
>      scheme        = <scheme, see [RFC3986], Section 3.1>
>      authority     = <authority, see [RFC3986], Section 3.2>
>      uri-host      = <host, see [RFC3986], Section 3.2.2>
>      port          = <port, see [RFC3986], Section 3.2.3>
>      path-abempty  = <path-abempty, see [RFC3986], Section 3.3>
>      segment       = <segment, see [RFC3986], Section 3.3>
>      query         = <query, see [RFC3986], Section 3.4>
>      fragment      = <fragment, see [RFC3986], Section 3.5>
>
>      absolute-path = 1*( "/" segment )
>      partial-URI   = relative-part [ "?" query ]
>
>    Each protocol element in HTTP that allows a URI reference will
>    indicate in its ABNF production whether the element allows any form
>    of reference (URI-reference), only a URI in absolute form
>    (absolute-URI), only the path and optional query components, or some
>    combination of the above. Unless otherwise indicated, URI references
>    are parsed relative to the effective request URI (Section 5.5).
> """
>
> As far as I can see the absolute-path construction is only used in
> non-URL settings i.e. protocol headers.
>
> My reading is that in JSON-LD 'http://schema.org' serves to identify
> an URL from which a context can be acquired. We have wired up the
> relevant server-side voodoo such that this works e.g. via:
> curl -H "Accept: application/ld+json" http://schema.org
>
> ... where is it written that http://schema.org is a bad http URL?
> (genuine question not rhetorical:)
>
> An equal counter question: where is it written that such an url would
> be dereferenced by requesting '/' ? Or is this just a convention?
>
> Dan

Dan,

Regarding my issue of concern, I am going to use the statements below to
demonstrate my point:

{
<#this> a schema:WebPage;
    rdfs:label "Name Ambiguity & Referent Description Determination Test" ;
    schema:about <http://schema.org>, <http://schema.org> ;
    schema:url <> .
}

# Describing an entity/thing identified by the HTTP URI: http://schema.org

{
<http://schema.org> a owl:Thing ;
    rdfs:label "Schema.org" ;
    schema:sameAs <http://schema.org> ;
    schema:url <http://schema.org> ;
}

## Versus

# Describing an entity/thing identified by the HTTP URI: http://schema.org/

{
<http://schema.org/> a owl:Thing ;
    rdfs:label "Schema.org/" ;
    schema:sameAs <http://schema.org/> ;
    schema:url <http://schema.org/#this> .
}

The result of embedding the statements above in a G+ Post, so that anyone
can simply view the results:

[1] http://linkeddata.uriburner.com/c/8OM32D -- About Schema.org/
[2] http://linkeddata.uriburner.com/c/9C7L4KYY -- About Schema.org .

Issues:

When you have existing data in a data space, the issue of unambiguous
naming becomes extremely important. This (again) has nothing to do with
de-reference and lookups. It has everything to do with documents comprised
of relations that collectively describe things, using a variety of
notations (JSON-LD, TURTLE, or whatever).

I haven't even enabled inference and reasoning options in these pages,
which would simply magnify the problem exponentially.

This isn't about publishers versus consumers. It's about publishers,
consumers, and curators of entity descriptions (using their preferred
notations) to encode and decode information, via the medium provided by
the World Wide Web (Web).

As I've stated in my posts about publishing profiles [1], reviews [2], and
generic descriptions [3] (for everyone), the technology behind search
engines MUST already handle name disambiguation. Why, then, should content
creators be encouraged to produce incomprehensible content where cleansing
and indexing is charged back to them, unknowingly, as some kind of
value-added service?

By "unknowingly" I mean this sequential flow:

1. Users are encouraged to curate poor entity descriptions
2. They can find anything or build better descriptions from what exists
3. They are then left to the results pages of search engines for some
   variant of #2
4. Step 3 produces HTML documents in which disambiguated entity names are
   out of scope for humans and/or machines (e.g. user agents).

How do we fix this problem?

By providing instructions that avert the mess, i.e., simply helping users
(consumers, publishers, curators) understand how to name things that
exist, unambiguously, for effective use on the Web or any other HTTP
network [4][5].

Users are impatient (rightly so), they love convenience (rightly so), but
none of that (in my eyes) amounts to them being incapable of figuring out
how to name things, once they understand the what, why, and how of
important topics such as unambiguous entity names and their impact on
entity descriptions published to the Web.

Links:

[1] http://kidehen.blogspot.com/2015/01/social-networking-profiles-for-everyone.html -- Profile Publishing
[2] http://kidehen.blogspot.com/2015/01/review-publishing-for-everyone.html -- Review Publishing
[3] http://kidehen.blogspot.com/2014/07/nanotation.html -- Nanotation
[4] http://www.w3.org/2005/Talks/1110-iswc-tbl/#(7) -- Fragment Identifiers & Global Identifiers
[5] https://www.pinterest.com/pin/389561436491723060/ -- Naming things that exist, for use on the Web .

--
Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
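
Editorial illustration (not part of the original message): the naming-versus-lookup distinction argued above can be sketched in a few lines of Python. The sketch assumes two things that are not stated in the message itself: that RDF tools compare IRIs as plain strings, and that an HTTP client applies the scheme-based normalization of RFC 3986 section 6.2.3, under which an empty path after an authority is treated as "/". The helper name `http_normalize` is hypothetical and exists only for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def http_normalize(uri: str) -> str:
    """Hypothetical helper: treat an empty path after an authority as "/"
    for http(s) URIs, following RFC 3986 section 6.2.3 (assumed here)."""
    parts = urlsplit(uri)
    if parts.scheme in ("http", "https") and parts.netloc and parts.path == "":
        parts = parts._replace(path="/")
    return urlunsplit(parts)


without_slash = "http://schema.org"
with_slash = "http://schema.org/"

# Naming: as identifiers in a graph, the two strings are different names,
# so descriptions attached to one are not attached to the other.
same_name = without_slash == with_slash

# Lookup: once normalized for HTTP, both point at the same request target.
same_lookup = http_normalize(without_slash) == http_normalize(with_slash)

print("same name:  ", same_name)
print("same lookup:", same_lookup)
```

Under these assumptions the two URIs dereference identically yet remain two distinct names in a data space, which is why the two descriptions in the message end up as separate entities.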
Received on Thursday, 19 February 2015 18:42:27 UTC