Re: Namespace persistence etc from Rob Atkinson on 2016-08-25 (public-sdw-wg@w3.org from August 2016)

From: Rob Atkinson <rob@metalinkage.com.au>
Date: Thu, 25 Aug 2016 00:34:57 +0000
To: Dan Brickley <danbri@google.com>, Phil Archer <phila@w3.org>
Cc: SDW WG Public List <public-sdw-wg@w3.org>, Scott Simmons <ssimmons@opengeospatial.org>
Message-ID: <CACfF9Lws-11eLy4hJ1hHiKPZO2HMnoVieZ4KnvZNCFBB-n7Www@mail.gmail.com>
Hi Dan,

Apropos of this versioning problem, I'm looking into the definition of data
dimensions and measures (specifically looking at applying RDF-QB as a
component of the metadata describing "coverage" data).

Of course there are lots of similarities and differences (different
coordinate reference systems, grid cell shapes, units of measure, etc.).

My conclusions are:
1) there is enough complexity that if we don't propose a BP or standard,
we'll never be able to cope with everyone's best (or no) efforts.
2) there probably needs to be an ontology to support the logic about
similarity, and hence about how these can be operated on.
3) the definitions can't be bundled in a single file resource, because many
communities will need to define their own specialisations.
4) it needs to support simple, parameterised use of community-defined
dimensions.

So the conclusion is that a registry of such definitions is more useful
than a static file, with its concomitant versioning nightmares.

In the case of terms managed in a registry, a single namespace is an easy
convention.

What about the ontology that can be derived from the definitions, to
support reasoning? My hope is that it will be OK to provide an HTTP
GET-based API that provides the relevant subset (inheritance hierarchy) for
any given term, and not have to version or define new namespaces for the
artefacts generated.
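A minimal sketch of the per-term lookup proposed above, assuming the registry records each term's immediate parent terms. The `ex:` terms, `REGISTRY` and `hierarchy_subset` are hypothetical; a real service would return this over HTTP GET, serialised as RDF, rather than as Python tuples.

```python
from collections import deque

# Hypothetical registry content: each term maps to its immediate parents
# (e.g. rdfs:subClassOf links). The ex: terms are illustrative only.
REGISTRY = {
    "ex:Dimension": [],
    "ex:SpatialDimension": ["ex:Dimension"],
    "ex:GridAxis": ["ex:SpatialDimension"],
    "ex:LatitudeAxis": ["ex:GridAxis"],
    "ex:TemporalDimension": ["ex:Dimension"],
}


def hierarchy_subset(term, registry=REGISTRY):
    """Return the (child, parent) links reachable upwards from `term`.

    This is the slice of the registry a client needs in order to reason
    about `term`; it is computed per request rather than published as a
    separately versioned ontology file.
    """
    if term not in registry:
        raise KeyError(f"unknown term: {term}")
    links = []
    seen = {term}
    queue = deque([term])
    while queue:  # breadth-first walk up the hierarchy
        current = queue.popleft()
        for parent in registry.get(current, []):
            links.append((current, parent))
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return links


print(hierarchy_subset("ex:LatitudeAxis"))
# [('ex:LatitudeAxis', 'ex:GridAxis'), ('ex:GridAxis', 'ex:SpatialDimension'),
#  ('ex:SpatialDimension', 'ex:Dimension')]
```

Because the subset is derived from the registry's current content, no separate ontology artefact needs its own namespace or version; the trade-off is that two requests made at different times may return different hierarchies.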

Is this a general pattern for avoiding versioning, and the proliferation of
overlaps between extensions, when multiple stakeholders require possibly
quite similar extensions/constraints to a simple base schema?

So - throwing the idea out there for a sanity check :-)

Rob Atkinson

On Wed, 24 Aug 2016 at 21:52 Dan Brickley <danbri@google.com> wrote:

> On 24 August 2016 at 12:17, Phil Archer <phila@w3.org> wrote:
> > Hey Dan, pls see inline below.
> >
> >
> > On 24/08/2016 10:14, Dan Brickley wrote:
> >>
> >> (excuse the belatedness of this reply, I thought I had responded but
> >> don't see it in the thread)
> >>
> >> On 13 July 2016 at 06:12, Phil Archer <phila@w3.org> wrote:
> >>>
> >>> @Scott - please chime in with any variance to this from an OGC
> >>> perspective.
> >>>
> >>> Dear all,
> >>>
> >>> I must begin by apologising for not being on the SSN call today/last
> >>> night.
> >>> I could make up some convoluted reason but the truth is that I forgot.
> >>>
> >>> I know one of the topics discussed was the issue around vocabulary term
> >>> persistence so I should set out a few things about that.
> >>>
> >>> The principle is, I think, straightforward: any change made to a
> >>> vocabulary shouldn't break existing implementations. Since we don't
> >>> know who has an implementation, we can't write to everyone and ask
> >>> "if we change this will your thing break?" Therefore we have to be
> >>> cautious.
> >>
> >>
> >> We discussed this a bit further f2f last time. If you want to be this
> >> strict you will literally only be allowing yourself meaningless
> >> changes to a term's definition. For example, if you change the case,
> >> spelling, indentation, punctuation, phrasing order or other minor
> >> aspects of the rdfs:comment of a type or property, you're not
> >> affecting 1.) for a type, the things that are in it 2.) for a
> >> property, the pairs of things that it relates. As soon as you start
> >> tweaking the text to clarify meaning, you affect 1.) or 2.), and these
> >> can always potentially create breakage. The notion that some changes
> >> are broadening and some are restricting does not affect whether those
> >> changes might break things; all that is needed for potential breakage
> >> is any change from previous conditions. Software and applications can
> >> be very fragile, and embody all kinds of assumptions.
> >>
> >> Consider the example of Course markup, and a CourseInstance type with
> >> a courseMode property. Imagine version one of the definition gave
> >> "face-to-face" as a (text or URL-based) value option for that
> >> property. A later revision might want to clarify whether Skype
> >> sessions (or VR or whatever) counted as face-to-face. Prior to that
> >> clarification applications could have assumed it did, or that it
> >> didn't; there's always the risk of breakage even with modest
> >> improvements. This is not a radical change in meaning, but can make
> >> the difference between something working as intended and not. It is
> >> also not a theoretical example but comes from Google's review of the
> >> draft Courses schema,
> >>
> >> https://www.w3.org/community/schema-course-extend/wiki/Mode_of_study_or_delivery
> >
> >
> > I guess it's a question of balance, then. It is only search engines and,
> > I think, even amongst those, only Google, that have access to this kind
> > of view of the real world. So you're able to look at how terms are
> > actually used and make an assessment. The rest of the world works
> > without such access and so I tend to err on the side of
> > caution/conservatism. If there is clear evidence, wherever it comes
> > from, that a term's definition should be amended to match the ground
> > truth then, OK, it seems right to do so. But that evidence needs to be
> > available, I think; otherwise, a new term should probably be minted.
>
> Actually it is surprisingly hard to find out how things are used even
> within (a fast moving complex company like) Google, never mind our Web
> search competitors or other consumers of Web data.
>
> My perspective is more that for schemas that aspire to global adoption
> - and I'll count Dublin Core, FOAF and Schema.org in that direction -
> it is very easy to allow the metaphorical "concrete to set around your
> feet" and to be paralyzed into inaction through fear of breaking
> things via schema changes / improvements. And that this can have huge
> cost for adoption. Dublin Core became prematurely conservative about
> change in the late '90s, then even in the much more informal FOAF
> effort we also worried (imho) too early about breaking things if we
> changed the schema.
>
> The insight from doing this at Google is *not* really that we can
> assess precisely how the data is used everywhere. We do have some
> insight into how it is *published* (like webdatacommons but scaled
> up). Usage in sense of data consumption is another matter. The "view
> from Google" in my experience is much more about appreciating how
> schema definition nuances are often of less impact than other
> pragmatic considerations. Many publishers don't read the definitions
> anyway, but work from examples, tutorials and other supporting
> materials that are not within a formal versioning system. They are
> also often tool-guided, for better or for worse. Published data often
> has syntactic, formatting or other errors. For all the sites that are
> worrying carefully whether "face-to-face" includes Skype, there are
> 100s of relevant sites that aren't yet adopting, or whose adoption
> could be improved. It is important to respect early adopters
> (publishers and consumers), but also important to keep a focus on
> simplicity, usability and future users; such a focus can be
> difficult to reconcile with formally releasing a new version for
> every non-trivial change. The choice at schema.org was for data
> consumers to carry more of the burden for handling changes,
> improvements and smoothing out bugs in the data. The robustness
> principle (https://en.wikipedia.org/wiki/Robustness_principle) remains
> reasonable guidance.
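To make "consumers carry more of the burden" concrete, here is a minimal sketch of a tolerant consumer for the courseMode example, assuming values may arrive as free text or as URLs with inconsistent case and punctuation. `KNOWN_MODES`, its canonical labels and `normalise_course_mode` are hypothetical and not part of any published schema.

```python
import re

# Hypothetical canonical labels for courseMode; not from any published schema.
KNOWN_MODES = {
    "facetoface": "face-to-face",
    "online": "online",
    "blended": "blended",
}


def normalise_course_mode(raw):
    """Map a published courseMode value to a canonical label, or None.

    Liberal in what it accepts: tolerates URL-style values, case,
    spaces and punctuation. Conservative in what it concludes: anything
    unrecognised (e.g. "Skype") yields None rather than a guess or an error.
    """
    if not isinstance(raw, str) or not raw.strip():
        return None
    # For URL-style values keep only the last path or fragment segment.
    value = re.split(r"[/#]", raw.strip().rstrip("/#"))[-1]
    key = re.sub(r"[^a-z]", "", value.lower())
    return KNOWN_MODES.get(key)


print(normalise_course_mode("Face to face"))                          # face-to-face
print(normalise_course_mode("https://example.org/modes/FaceToFace"))  # face-to-face
print(normalise_course_mode("Skype"))                                 # None
```

Whether Skype counts as face-to-face is exactly the judgement this sketch declines to make; returning `None` leaves that decision to the application instead of failing on data published under an older reading of the definition.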
>
>
> > Then we get into how long something has to be published before it's
> > locked. If I publish a new term today and think better of it tomorrow,
> > am I required to keep it as it is in case someone somewhere used my
> > original? In 24 hours, no. In a week, almost certainly not. A month?
> > Six months? A year? There's no right answer to that.
>
>
> Yep - there are no hard and fast rules. At schema.org we changed
> http://schema.org/Language recently after several years of it
> including "computer languages", for another example. It now says,
> "Natural languages such as Spanish, Tamil, Hindi, English, etc. Formal
> language code tags expressed in BCP 47 can be used via the
> alternateName property. The Language type previously also covered
> programming languages such as Scheme and Lisp, which are now best
> represented using ComputerLanguage.". This fairly soft nudge towards a
> new idiom hopefully is reasonably respectful of its previous wider
> definition, and better than introducing /v2/Language into the
> namespace as another fiddly thing for publishers to have to try to
> understand and remember.
>
>
> >>> That's what leads to W3C saying that vocabulary terms may not be
> >>> deleted or their semantics changed radically. But it only applies at
> >>> the namespace level. If you have a new namespace, you can do what you
> >>> like since nothing will break. *However* it's going to be really
> >>> confusing if some terms in the old and new namespaces are the same but
> >>> with radically different semantics.
> >>> So my interpretation is:
> >>>
> >>> Same namespace:
> >>> ===============
> >>> No deletions.
> >>> No changing or tightening of semantics (i.e. don't add a new domain or
> >>> range
> >>
> >>
> >> FWIW the approach we took at schema.org was to use weaker domain-like
> >> and range-like properties that give us more wiggle-room
> >> (domainIncludes, rangeIncludes). It is a kind of promise that things
> >> might continue evolving.
> >
> >
> > Yes and if you'd done that when you were editing RDF Schema it might have
> > been a good idea, but, well, the RDF WG wrote it as it is.
>
> The RDFS design was pretty much complete in 1998 :) Let's not get into
> the horrors of versioning schema languages...
>
> We aren't obliged to use all the formal machinery from the schema
> languages. It is tempting to use this stuff just because it is there
> and it seems neat and tidy to write these things down for machines.
> But sometimes mechanical simplicity is outweighed by other
> considerations. OWL and RDFS have no idea about the passage of time.
> OWL can't distinguish a property with "at most one value (at any point
> in time)" from "at most one value (ever)". Depending on your
> preferences this can either be an argument for standardizing ever more
> powerful ontology/schema formalisms, or an argument that human-facing
> definitions deserve as much attention as the axioms.
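The distinction between "at most one value (at any point in time)" and "at most one value (ever)" can be made concrete with a minimal sketch, assuming each property value carries a validity interval. The `(value, start, end)` representation, both function names and the `ex:` values are hypothetical; neither OWL nor RDFS defines them.

```python
from itertools import combinations

# Each entry is (value, start, end) with half-open validity [start, end).
# end=None means "still valid". Years are plain integers here.


def at_most_one_ever(entries):
    """True if the property never had more than one distinct value."""
    return len({value for value, _start, _end in entries}) <= 1


def _overlaps(a_start, a_end, b_start, b_end):
    a_end = float("inf") if a_end is None else a_end
    b_end = float("inf") if b_end is None else b_end
    return a_start < b_end and b_start < a_end


def at_most_one_at_any_time(entries):
    """True if no two *different* values are valid at the same moment."""
    for (v1, s1, e1), (v2, s2, e2) in combinations(entries, 2):
        if v1 != v2 and _overlaps(s1, e1, s2, e2):
            return False
    return True


# A hypothetical organisation that moved its head office in 2010.
head_office = [("ex:Sydney", 1998, 2010), ("ex:Canberra", 2010, None)]
print(at_most_one_ever(head_office))         # False
print(at_most_one_at_any_time(head_office))  # True
```

An `owl:FunctionalProperty` axiom has no interval to attach to, so a reasoner given the two untimed triples would conclude that both values denote the same individual (or report an inconsistency if they are declared different); that is why this kind of constraint often ends up living in the human-readable definition.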
>
>
> >> You mention tightening and (later) weakening. What about clarifying?
> >> Realizing that definitions were not as tight as originally hoped is a
> >> hugely important class of schema edit.
> >>
> >>> - make a sub class|property and put the new restrictions on that)
> >>> Deprecation is OK.
> >>> Loosening semantics is OK (so you *can* remove a domain or range
> >>> restriction
> >>
> >>
> >> This can also cause breakage, if downstream clients expect the data to
> >> already embody those restrictions. There are also restrictions that
> >> are not embodied in domain/range but are carried in the textual
> >> definitions.
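Why removing (or never stating) a domain matters to clients can be shown with a minimal sketch of the RDFS domain rule (rdfs2: if `P rdfs:domain C` and `x P y`, then `x rdf:type C`), contrasted with schema.org's `domainIncludes`, which licenses no such inference. Triples are plain tuples and the `ex:` terms are hypothetical; this is a single pass of one rule, not a full RDFS reasoner.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
SCHEMA_DOMAIN_INCLUDES = "schema:domainIncludes"


def rdfs_domain_closure(triples):
    """Add the rdf:type triples entailed by rdfs:domain (rule rdfs2).

    schema:domainIncludes triples are deliberately ignored: they document
    expected usage but entail nothing about the subject's type.
    """
    domains = {}
    for s, p, o in triples:
        if p == RDFS_DOMAIN:
            domains.setdefault(s, set()).add(o)
    inferred = set(triples)
    for s, p, _o in triples:
        for cls in domains.get(p, ()):
            inferred.add((s, RDF_TYPE, cls))
    return inferred


data = {("ex:offering1", "ex:courseMode", "face-to-face")}
strict = data | {("ex:courseMode", RDFS_DOMAIN, "ex:CourseInstance")}
loose = data | {("ex:courseMode", SCHEMA_DOMAIN_INCLUDES, "ex:CourseInstance")}

typed = ("ex:offering1", RDF_TYPE, "ex:CourseInstance")
print(typed in rdfs_domain_closure(strict))  # True
print(typed in rdfs_domain_closure(loose))   # False
```

A client that relied on the first result breaks if the `rdfs:domain` triple is later removed; with `domainIncludes` there was never an inference to lose.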
> >
> > OK, so I'm tending towards conservative.
>
> Even if the cost is dozens of extra namespace URIs in play? And a
> built-in bias towards fragmentation and stagnation, since larger
> vocabularies will suffer from version proliferation more than tiny
> ones. If you have a vocabulary with > 100 terms, will you really want
> to bother releasing a new version of the whole thing just to improve a
> single term's definition? It will be tempting to leave the formal
> definition in a stale state and simply update "best practice",
> tutorials, examples and tools with hints instead. At which point it
> may be cleaner and fairer to say "sorry, we changed our mind slightly"
> and update the core specification too (while leaving a record of the
> changes).
>
>
> >>> since it is extremely unlikely that doing so will break anyone's
> >>> existing implementation).
> >>> Adding new terms is fine.
> >>> Clarifying existing definitions is OK.
> >>> Adding new translations of labels is expressly encouraged.
> >>
> >> +1
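The same-namespace rules above can be read as a checklist over two releases of a vocabulary. A minimal sketch, assuming each release is a dict of term to `{domain, range, deprecated}`; the representation, `classify_changes`, `entry` and the `ex:` terms are hypothetical, and clarified definitions or new label translations are not modelled because they cannot be detected from domain/range alone.

```python
def classify_changes(old, new):
    """Sort the differences between two releases of one namespace.

    Each release maps term -> {"domain": set, "range": set, "deprecated": bool}.
    Returns (problems, ok): problems would call for a new term or namespace
    (deletions, added domain/range); ok covers additions, deprecation and
    removed domain/range, per the same-namespace interpretation above.
    """
    problems, ok = [], []
    for term in sorted(old.keys() - new.keys()):
        problems.append(f"{term}: deleted")
    for term in sorted(new.keys() - old.keys()):
        ok.append(f"{term}: added")
    for term in sorted(old.keys() & new.keys()):
        before, after = old[term], new[term]
        for facet in ("domain", "range"):
            # In RDFS an extra domain/range triple can only narrow a property.
            added = after[facet] - before[facet]
            removed = before[facet] - after[facet]
            if added:
                problems.append(f"{term}: {facet} tightened by {sorted(added)}")
            if removed:
                ok.append(f"{term}: {facet} loosened by {sorted(removed)}")
        if after["deprecated"] and not before["deprecated"]:
            ok.append(f"{term}: deprecated")
    return problems, ok


def entry(domain=(), range_=(), deprecated=False):
    """Hypothetical helper building one term description."""
    return {"domain": set(domain), "range": set(range_), "deprecated": deprecated}


old = {"ex:mode": entry(domain=["ex:CourseInstance"]),
       "ex:format": entry(), "ex:legacy": entry()}
new = {"ex:mode": entry(), "ex:format": entry(range_=["ex:Format"]),
       "ex:legacy": entry(deprecated=True), "ex:delivery": entry()}
problems, ok = classify_changes(old, new)
print(problems)  # ["ex:format: range tightened by ['ex:Format']"]
print(ok)
```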
> >>>
> >>>
> >>> Different namespace
> >>> ===================
> >>> We can be a little more relaxed here. Recall that documents on w3.org
> >>> are persistent so the original documentation will always be there (at
> >>> the original URI or redirected from it).
> >>>
> >>> No need to replicate the whole of the old vocabulary, so no need to
> >>> include deprecated terms - they are deprecated by not being included
> >>> in the new namespace.
> >>>
> >>> Assuming the vocabulary has the same name, then terms that appear in
> >>> both old and new should broadly be the same, although semantics can
> >>> change a little. It's a matter of judgement.
> >>>
> >>> The case I keep in mind is Dublin Core/DC Terms. dc:creator took
> >>> either text or a URI as a value - which was confusing.
> >>> dcterms:creator should take a URI.
> >>
> >>
> >> Minor nitpick: DC doesn't say quite that. See
> >>
> >> http://dublincore.org/documents/2012/06/14/dcmi-terms/?v=terms#terms-creator
> >> It says that the value of a dcterms:creator property will be a
> >> dcterms:Agent. For which see
> >> http://dublincore.org/documents/2012/06/14/dcmi-terms/?v=terms#Agent
> >> "A resource that acts or has the power to act." and "Examples of
> >> Agent include person, organization, and software agent."
> >>
> >> So (in JSON-LD) you could have something like:
> >>
> >>  {
> >>    "...": "......",
> >>    "dcterms:creator":
> >>     {
> >>       "@type": "dcterms:Agent",
> >>       "foo": "bar", ....
> >>     }
> >>   }
> >>
> >> There are those in the Linked Data community who take the view that
> >> every time you mention an entity you should give a URI for it, but
> >> that viewpoint is not currently baked into DC Terms. All that DC Terms
> >> says is that a creator is something that can act, which is pretty
> >> broad. But it does, as you point out, discourage us from using names
> >> of those things as values for the property.
> >
> >
> > Understood. But please bear in mind that not everyone has several
> > hollowed out mountains full of servers to interpret fuzziness.
>
> I wasn't saying it was good or bad to use bnodes, only that
> dcterms:creator is agnostic on this point. Requiring publishers to
> know a Linked Data URI for every entity also comes with some cost...
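The three shapes discussed for a `dcterms:creator` value (a plain name, a node with a URI, and a blank node describing an Agent) can be told apart mechanically in compacted JSON-LD. A minimal sketch, assuming the document is already compacted with the `dcterms:` prefix and parsed into Python dicts; `creator_value_kind` and `creator_kinds` are hypothetical helpers, not part of any JSON-LD library, and they ignore any `@context` that coerces the property to `@id`.

```python
import json


def creator_value_kind(value):
    """Classify one dcterms:creator value from compacted JSON-LD.

    "literal": a bare string or {"@value": ...}, i.e. a name; discouraged,
               since the range is dcterms:Agent.
    "uri":     a node object with an IRI "@id", i.e. an identified Agent.
    "blank":   a node object without an IRI (bnode), which dcterms:creator
               also permits.
    Assumes no @context coerces the property to @id.
    """
    if isinstance(value, str):
        return "literal"
    if isinstance(value, dict):
        if "@value" in value:
            return "literal"
        if "@id" in value and not value["@id"].startswith("_:"):
            return "uri"
        return "blank"
    raise TypeError(f"unexpected value: {value!r}")


def creator_kinds(doc):
    values = doc.get("dcterms:creator", [])
    if not isinstance(values, list):
        values = [values]
    return [creator_value_kind(v) for v in values]


doc = json.loads("""{
  "dcterms:creator": [
    "Jane Doe",
    {"@id": "https://example.org/people/jdoe"},
    {"@type": "dcterms:Agent", "foaf:name": "Jane Doe"}
  ]
}""")
print(creator_kinds(doc))  # ['literal', 'uri', 'blank']
```

Treating "blank" as acceptable rather than as an error reflects the point above: the property is agnostic about bnodes, and requiring a URI is a separate, stricter policy.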
>
> cheers,
>
> Dan
>
>
Received on Thursday, 25 August 2016 00:35:52 UTC