- From: Michael Andrews <nextcontent01@gmail.com>
- Date: Fri, 28 Jun 2019 07:52:22 +0530
- To: Guha <guha@google.com>
- Cc: "schema.org Mailing List" <public-schemaorg@w3.org>
- Message-ID: <CAF9ZrJ2LqPHH2V55W2F+VTrbR7cpn67CqZ6FQ6vGdUW8iscUew@mail.gmail.com>
I am a little confused about the scope of this proposal as it is presented. The proposal addresses "aggregate statistical data". The examples cover specific kinds of statistical data: those that relate to human populations. I am trying to understand how the proposal will accommodate statistical information more broadly, beyond demographic data.

The naming of a type as StatisticalPopulation makes sense when talking about people (or possibly other living things) with respect to a location -- the focus of DataCommons.org. There are of course many other kinds of statistical data in the public domain that are not focused on human characteristics, relating to manufacturing production, air quality, road haulage, etc. When reporting on non-living phenomena, the term StatisticalPopulation doesn't sound right. Statistical agencies report factory orders and automobile production. It would be helpful to see examples of how those would be represented.

I would also like to see more about alternatives to Observation as a type for indicating the provenance of data. Data could be self-reported, randomly sampled, or, increasingly, autogenerated.

On Tue, Jun 25, 2019 at 12:43 AM Guha <guha@google.com> wrote:
> This document can be accessed here.
> <https://docs.google.com/document/d/139jXakeQk4ChwCkGjqq5wJfCPMDnwIV94oCH-JzJrhM/edit?usp=sharing>
>
> Look forward to feedback.
>
> Guha
>
> Representing aggregate statistics
>
> Examples of aggregate statistical reports include those from Census
> Organizations (e.g., American Community Survey), Health Organizations
> (e.g., CDC Wonder) and many others. This is a schema, currently in use on
> DataCommons.org, for representing facts stated in these reports. This
> document describes certain general mechanisms for representing statistical
> populations and associated observations.
> This document will be followed
> later by a companion proposal suggesting some basic common vocabulary
> useful for representing the kind of data released by the US Census, CDC,
> etc.
>
> Our interest is not in describing a data set or mapping columns in csv
> files, but in representing the actual data itself. Other efforts have
> focused on characterizing data cubes in terms of dimensions, etc. While we
> draw upon their work, our goals are different.
>
> Examples of the kind of statistics we would like to represent include:
>
> 1. In 2016, there were 1213 people in East Podunk, California, who were
>    male, married, with a median age of 22.
> 2. In 2017, there were 20 deaths in Falooda County where the cause of
>    death was XYZ.
>
> We will refer to ‘number of people who are male, hispanic’, ‘number of
> deaths where cause of death was XYZ’, etc. as variables. Since the number
> of possible variables increases combinatorially, we clearly can’t have a
> property for each variable (or worse, a property for each variable x
> year). We need a compositional way of constructing variable
> references. We use the concept of a StatisticalPopulation to do this
> construction.
>
> A StatisticalPopulation is a set of instances of a certain given type that
> satisfy some set of constraints. The property populationType is used to
> specify the type. Any property that can be used on instances of that type
> can appear on the statistical population. An instance of
> StatisticalPopulation whose populationType is C1, which has the properties
> p1, p2, … with values v1, v2, …, corresponds to the set of objects of type
> C1 that have the property p1 with value v1, property p2 with value v2, etc.
> The properties numConstraints and constrainingProperties are used to
> specify which of the population’s properties are used to specify the
> population.
> In the two examples above:
>
> Node: SP1
> type: StatisticalPopulation
> populationType: Person
> location: EastPodunkCalifornia
> gender: Male
> maritalStatus: Married
> numConstraints: 3
> constrainingProperties: location, gender, maritalStatus
>
> Node: SP2
> type: StatisticalPopulation
> populationType: MortalityEvent
> location: FaloodaCounty
> causeOfDeath: XYZ
> numConstraints: 2
> constrainingProperties: location, causeOfDeath
>
> SP1 is an abstract set in the sense that it does not correspond to a
> particular set of people who satisfy that constraint at a certain point in
> time, but rather, to an abstract specification, about which we can make
> observations that are grounded at a particular point in time. We now turn
> our attention to the representation of these observations.
>
> Instances of the class Observation are used to specify observations about
> an entity (which may or may not be an instance of a StatisticalPopulation),
> at a particular time. The principal properties of an Observation are
> observedNode, measuredProperty, measuredValue (or median, etc.) and
> observationDate. (measuredProperty can, but need not always, be a W3C RDF
> Data Cube "measure property", as in the lifeExpectancy example here:
> https://www.w3.org/TR/vocab-data-cube/#dsd-example.) In the two examples
> above:
>
> Node: Obs1
> type: Observation
> observedNode: SP1
> measuredProperty: age
> median: “23 years”
> observationDate: “2016”
>
> Node: Obs2
> type: Observation
> observedNode: SP1
> measuredProperty: count
> measuredValue: 1213
> observationDate: “2016”
>
> Node: Obs3
> type: Observation
> observedNode: SP2
> measuredProperty: count
> measuredValue: 20
> observationDate: “2017”
>
> Observations can also have properties related to the measurement
> technique, margin of error, etc.
> To elaborate on Obs2 above, we can have:
>
> Node: Obs2
> type: Observation
> observedNode: SP1
> measuredProperty: count
> measuredValue: 1213
> observationDate: “2016”
> marginOfError: 22
> measurementMethod: CensusACS5yrSurvey
>
> Notes:
> 1. Care needs to be exercised when querying StatisticalPopulations, to
>    make sure that the query specifies all the constraining properties.
> 2. We do not yet have a way of using properties which are named in the
>    opposite direction, e.g. we handle "alumniOf" (relating a person to an
>    org), but could not if the only existing property were "alumni"
>    (relating an org to a person).
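
To make Note 1 in the quoted proposal concrete, here is a minimal Python sketch of the population/observation model and of a lookup that only matches when the query names every constraining property. It is an illustration under stated assumptions, not the DataCommons.org implementation: the class layout, the find_population helper, and the choice to store constraints in a dict are all hypothetical, and only the node names and values are taken from the SP1/SP2 and Obs2/Obs3 examples.

```python
from dataclasses import dataclass


@dataclass
class StatisticalPopulation:
    """Abstract set of instances of population_type satisfying constraints."""
    node_id: str
    population_type: str
    constraints: dict  # constraining property name -> required value

    @property
    def num_constraints(self):
        # Mirrors numConstraints: derived from the constraint set.
        return len(self.constraints)

    @property
    def constraining_properties(self):
        # Mirrors constrainingProperties, sorted for a stable order.
        return sorted(self.constraints)


@dataclass
class Observation:
    """A measurement about a node, grounded at a particular date."""
    observed_node: str
    measured_property: str
    measured_value: float
    observation_date: str


def find_population(populations, population_type, **constraints):
    """Hypothetical helper: match only on the complete constraint set.

    A query that omits a constraining property denotes a different
    (broader) population, so it must not match.
    """
    for pop in populations:
        if pop.population_type == population_type and pop.constraints == constraints:
            return pop
    return None


populations = [
    StatisticalPopulation("SP1", "Person", {
        "location": "EastPodunkCalifornia",
        "gender": "Male",
        "maritalStatus": "Married",
    }),
    StatisticalPopulation("SP2", "MortalityEvent", {
        "location": "FaloodaCounty",
        "causeOfDeath": "XYZ",
    }),
]

observations = [
    Observation("SP1", "count", 1213, "2016"),
    Observation("SP2", "count", 20, "2017"),
]

# Fully specified query: finds SP1.
full = find_population(populations, "Person",
                       location="EastPodunkCalifornia",
                       gender="Male", maritalStatus="Married")

# Under-specified query (maritalStatus omitted): no match in this data.
partial = find_population(populations, "Person",
                          location="EastPodunkCalifornia", gender="Male")
```

Matching on the whole constraint dict, rather than checking that the queried properties are a subset, is the design choice that keeps "married males in East Podunk" from being returned for a query about all males in East Podunk.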
Received on Friday, 28 June 2019 02:22:58 UTC