Re: Trip report on Dagstuhl seminar Big Graph Data Processing

Hi Dave,

Thanks for the comments. Chiming in inline.


On Sat, Dec 14, 2019 at 7:27 AM Dave Raggett <dsr@w3.org> wrote:

> [...]
> > • How do we create mappings between different data models?
> > • Or should we create a dragon data model that rules them all, such that
> all data models can be mapped to the dragon data model? If so, what are all
> the abstract features that a data model should support?
>
> This corresponds to the pros and cons of using upper ontologies vs peer to
> peer mappings. The answer which is best depends on the context and which
> approach proves to be cheaper, more robust etc.
>

True, and one more step removed, it corresponds to the pros and cons of
"hub" datasets like DBpedia, "star" tables in data warehouses, etc. The
analogies go on and on. For data models, a star pattern can be valuable for
facilitating composability. If you can compose data models in an
associative fashion, you can more easily decouple data, queries, and
processes from any one model, and carry them across models. In the graph
community, we have been developing pairwise mappings between data models
for a long time. Unfortunately, it is usually the case that even if you
have a mapping between models A and B, and another mapping between B and C,
you don't thereby have a mapping between A and C, because the two mappings
formalize B in different ways. Property graphs are a frequent "B" because
there has been no single agreed-upon property graph formalism.
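
To make the composability problem concrete, here is a rough Python sketch.
Everything in it (the mapping names, the representations) is invented for
illustration, not taken from any particular tool. Both mappings claim to go
through "property graphs", but one encodes each property as a single value
while the other expects a list of values, so chaining them naively produces
garbage rather than a usable A-to-C mapping:

# Hypothetical mapping A -> B: fold RDF-style triples into a property graph
# in which each property key holds a single value (last write wins).
def rdf_to_pg(triples):
    nodes = {}
    for subj, pred, obj in triples:
        nodes.setdefault(subj, {})[pred] = obj
    return nodes

# Hypothetical mapping B -> C: flatten a property graph into relational rows,
# but it assumes a formalism where every property key holds a *list* of values.
def pg_to_relational(nodes):
    rows = []
    for node_id, props in nodes.items():
        for key, values in props.items():
            for value in values:  # expects a list; a bare string iterates per character
                rows.append((node_id, key, value))
    return rows

triples = [("n1", "name", "Alice"), ("n1", "knows", "n2")]
pg = rdf_to_pg(triples)
print(pg_to_relational(pg))
# Prints one row per *character*, e.g. ("n1", "name", "A"), rather than one
# row per value: both mappings target "a property graph", but they formalize
# B differently, so A -> B and B -> C do not compose without an explicit
# adapter or a shared formalism for B.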



> > • What is the formalism to represent mappings? Logic? Algebra? Category
> Theory?
>
> Do we really need such formalisms?  An alternative is to see this as
> figuring out how to define mappings between graphs


Well, now we're back at the schema or instance level. I would say that no
single ontology will save you from defining pairwise mappings, because the
set of terms we might want to align is unbounded. Less so for data models,
which deal with syntax and the most basic semantics.



> based upon the statistics of a set of training examples, e.g. as used by
> Google translate to map text in one human language to text in another
> language. Rather than manually developing mapping rules, we would instead
> focus on curation of examples and counter examples, and scoring mappings on
> a scale of good to bad.  Is this blend of graph+statistics in scope for the
> Semantic Web?
>

I don't see statistical approaches to schema or dataset alignment as
being at odds with abstractions for data models and mappings. We need both,
although in my experience, we need the abstractions *more urgently*, at
least in the context of enterprise data integration. For some definition of
"we".



> > • What are the properties that mappings should have? Information, Query
> and Semantics preserving, composability, etc.
>
> I would emphasize machine learnability!
>

Again at the schema or instance level, I agree that automated mappings
could be extremely useful in some scenarios, saving massive amounts of
developer time. Some practical reasons you may see machine-learned mappings
less often than human-defined ones are the added expense of data analysis
and the complexity of combining data from multiple datasets in a single
processing workflow.
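
As a side note, the "scoring mappings on a scale of good to bad" idea quoted
above can be prototyped cheaply. Here is a rough sketch (every name in it is
made up for illustration): treat a candidate mapping as a function and score
it by how many curated examples it reproduces and how many counterexamples it
avoids:

# Hypothetical: score a candidate mapping against curated examples
# (input, expected output) and counterexamples (input, forbidden output).
def score_mapping(mapping, examples, counterexamples):
    good = sum(1 for src, expected in examples if mapping(src) == expected)
    avoided = sum(1 for src, forbidden in counterexamples if mapping(src) != forbidden)
    total = len(examples) + len(counterexamples)
    return (good + avoided) / total if total else 0.0

# Toy candidate mapping: align property names by lowercasing them.
candidate = lambda name: name.lower()
examples = [("FirstName", "firstname"), ("AGE", "age")]
counterexamples = [("birthDate", "birth_date")]  # snake_case would be a bad alignment here
print(score_mapping(candidate, examples, counterexamples))  # -> 1.0

Such a score is obviously crude, but it is the kind of objective a
statistically learned mapping could be trained and evaluated against.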

Josh



>
> Dave Raggett <dsr@w3.org>
> http://www.w3.org/People/Raggett
> W3C Data Activity Lead & W3C champion for the Web of things
>
