Minutes from Jan 7 PropertyGraphs CG Telcon

[12:00] <Ashok> Meeting:Property Graphs CG
[12:01] <zwu2> Happy New Year, folks!
[12:02] <@TallTed> I'm on the phone. :-)
[12:02] <AndyS> Hi all - Happy New Year.
[12:05] <AndyS> participants -- http://www.w3.org/community/propertygraphs/participants
[12:05] <Ashok> present:Gregg, Patrick, Zhe, Ashok, Ted, Sridhar Sarnobat, Andy
[12:05] <gkellogg> scribe: gkellogg
[12:06] <Ashok> present+: Michael
[12:07] <gkellogg> topic: approve minutes from Dec 17 meeting
[12:07] <Ashok> http://lists.w3.org/Archives/Public/public-propertygraphs/2013Dec/0019.html
[12:07] <gkellogg> RESOLVED: approve minutes from 17-Dec-2013
[12:08] <gkellogg> ashok: we will have a telecon next week, but I'm away the following week. We could cancel it, or if someone else would like to chair, I'm fine with that.
[12:08] <gkellogg> ... Membership has changed, we now have TallTed from OpenLink joining us.
[12:09] <gkellogg> TallTed: I've been with OpenLink Software for about 13 years, and in several W3C groups.
[12:09] <gkellogg> ... We're interested in open standards. My personal focus is on everything but programming.
[12:09] <gkellogg> ashok: is your company putting out a property-graph implementation?
[12:10] <gkellogg> TallTed: I expect we will, but of course there's no spec yet.
[12:10] <Ashok> present+: Kelvin
[12:11] <gkellogg> ashok: I think our membership is pretty complete now, and I don't expect many new members from industry, at this point.
[12:11] <gkellogg> ... I don't think we will get Neo or TinkerPop. They aren't interested in being part of a W3C standards effort.
[12:12] <gkellogg> ... I think we have a reasonable cross-section of the industry and can go forward, but that's up to the W3C.
[12:13] <AndyS> Ivan has a new role in W3C in Digital Publishing Activity.
[12:13] <gkellogg> topic: Use Cases
[12:13] <gkellogg> ashok: any activity on this?
[12:13] <gkellogg> ... I think we have enough use cases to start thinking about what we should recommend.
[12:13] <AndyS> What about Phil Archer? "Open Data on the web activity"
[12:14] <gkellogg> ... We've been asked to spell out a rough charter for the working group, and that's what we should start talking about
[12:15] <gkellogg> ... On the wiki, I wrote that the group should do two things: standardize on the data model, and write an API standard.
[12:15] <gkellogg> ... Andy isn't happy with that, and thought we should really just focus on the data model and serializations.
[12:17] <gkellogg> andys: I've been focusing on the critical points: if you want a 2-year WG, it's a matter of putting as little into the charter of the WG as possible. Then, other groups could start on that when it would arrive. If there's too much in the WG, it will take longer before standards are established.
[12:17] <gkellogg> ... Judging by comments on the list, I think there's a lot to do. For example, your comments on the API pre-judge where the group would like to go.
[12:18] <gkellogg> ... Also, is this effectively another data stack in W3C, or is it some smaller-scale activity? W3C already has a number of data stacks around; why would they need another stack?
[12:18] <gkellogg> ... Also, will the WG get critical mass, or would it be in danger of fizzling out?
[12:18] <gkellogg> ashok: when you speak of another data-stack, are you thinking of RDF?
[12:19] <gkellogg> andys: If you look at XML, there is XSLT, XPath, ... That's a large stack. RDF is another stack, and is the de facto model around JSON.
[12:20] <gkellogg> ... When you started talking about a query language and other things, it starts to seem like another data-stack with a similarly broad scope.
[12:20] <gkellogg> kelvin: I don't think we've talked enough about the gaps a WG would work on. I've disagreed strongly on some of the points discussed on the mailing list.
[12:21] <gkellogg> ... There may be some existing formats that we could adopt and help standardize on, rather than going in a new direction.
[12:22] <gkellogg> ... The WG grew from the SF workshop, and there were definitely some gaps there, but not serialization. For example, if we're doing a social graph, can we agree on a schema for common types? We talked about schema.org and JSON-LD.
[12:22] <gkellogg> ... I think there's a large group around best-practices, which a community group could tackle, not needing a WG.
[12:23] <gkellogg> ... There also seems to be a gap in a declarative query language.
[12:23] <gkellogg> ... For example, in Gremlin you'd talk about how to navigate through a graph, rather than a more abstract definition of what is wanted.
[12:23] <gkellogg> ... There's also a gap on a REST API; the industry has tried this, and it didn't work well.
[12:24] <gkellogg> ... I don't want W3C to produce a spec and have key vendors ignore it.
[12:24] <gkellogg> ashok: Can you elaborate on the REST API issue?
[12:24] <AndyS> q+ to ask about graph query language possibilities
[12:24] * Zakim sees AndyS on the speaker queue
[12:24] <Sridhar_Sarnobat> I'd like to support this point that a declarative query language worked better than a REST API. The latter has limited functionality with respect to filtering, projection, back-referencing etc.
[12:25] <gkellogg> kelvin: I've poked around on lists, and they've found problems on scaling and have gone out to dig further and look at other work themselves.
[12:25] <patrickDurusau> +1 on best practices and declarative query language
[12:25] <Sridhar_Sarnobat> (I felt this way when using Neo4j which offers Cypher over REST)
[12:25] <gkellogg> ... Clearly, it's a valuable area, but it might be premature.
[12:26] <gkellogg> AndyS: I think there's a difference between best-practices and a query language. When you get to a query language, there's a lot of detail. The different graph specs are vague enough that this generates an issue.
[12:26] <zwu2> q+ to ask about opinions on the difference between RDF graph model and Property graph data model
[12:26] * Zakim sees AndyS, zwu on the speaker queue
[12:26] <Ashok> ack next
[12:26] * Zakim sees AndyS at the head of the speaker queue
[12:26] <gkellogg> ... For example, integer representation, if you think that a declarative language must be interoperable.
[12:26] <Zakim> AndyS, you wanted to ask about graph query language possibilities
[12:26] * Zakim sees zwu on the speaker queue
[12:27] <gkellogg> ... I think there are different strands around these different needs.
[12:27] <gkellogg> kelvin: I saw a group in our company take a large RDF graph and turn it into a Property Graph. The first time they did it, the data model produced far too many nodes or edges. The thought was that if they had thought of it differently they would have come up with something better.
[12:28] <gkellogg> AndyS: I was wondering: if the focus is on different serialization formats, there may be small things that don't matter, but when you look at a standard query language, these details matter.
[12:29] <gkellogg> zwu2: I wanted to ask the CG's opinion on the difference between RDF and PG data models.
[12:31] <gkellogg> ... RDF has interpretation semantics. In a way, it's pretty heavy-duty; before you can use RDF you need to know something about semantics, IRIs, reasoning, ...
[12:31] <gkellogg> ... However, for a regular user who just wants to do some simple things, RDF may be too much.
[12:31] <gkellogg> ... PGs are similar, but they're really just key-value pairs, and are somewhat simpler.
[12:32] <gkellogg> ... But, the lack of unique resource identification limits its applicability.
[12:32] <gkellogg> ... IMO, PGs are simpler and easier to use, but don't address the semantics RDF is designed to do.
[12:35] <gkellogg> zwu: I disagree that the models can be easily used. You can say how strong relationships can be expressed.
[12:36] <gkellogg> kelvin: PGs tend to be node-centric. If you find the starting node, a lot of information is close by. RDF tends to be more edge-centric.
[12:37] <gkellogg> ... The arbitrary nature of PGs is also appealing. People seem to like working with them.
[12:37] <AndyS> There are node-centric RDF engines e.g. Haystacks.
[12:37] <gkellogg> ... It's not really A vs B, but PGs are getting a lot of attention in the industry.
[12:39] <gkellogg> ashok: I think we'll need to face up to some questions: should we start a WG to standardize the data model and serialization only?
[12:39] <gkellogg> ... Whether this comes out as an extension to RDF or as something different, that's another issue.
[12:39] <zwu2> +1 to standardize data model
[12:39] ==Nick [~Nick@public.cloak] has joined #propertygraphs
[12:39] <gkellogg> ... Secondly, if that's a reasonable thing for us to do, is that enough? Or, do we also need to add an API?
[12:40] <gkellogg> ... We could start with the data model and do that quickly, and have a better idea of where to go with the API.
[12:41] <gkellogg> kelvin: I'm worried that we're talking about things...
[12:42] <gkellogg> ashok: certain kinds of data, e.g. from social networks, require a certain kind of data model. RDF doesn't provide it that well; we'd like to create a model which is much more suitable for that kind of data.
[12:42] <AndyS> q+
[12:42] * Zakim sees zwu, AndyS on the speaker queue
[12:42] <gkellogg> kelvin: is that a standard or best-practice?
[12:42] <gkellogg> ... my problem is that a PG is inherently simple, similar to XML.
[12:43] <gkellogg> ashok: a relational data model had a large impact because it was inherently simple.
[12:43] <gkellogg> kelvin: I don't know what it means to build a data model for a PG. When I think of this, it's just a bunch of nodes and edges with properties on those.
[12:44] <gkellogg> zwu2: But there's no standard around this. It doesn't go down into details that allow for interoperability.
[12:44] <gkellogg> ashok: it would be helpful if some graph could be identified using URIs. Then you could exchange this data with other kinds of data and do other interesting integration things.
[12:45] <gkellogg> kelvin: I think we're back to the XML schema thing. You can exchange using, e.g., GraphML. What we're debating here is what XML schema did for XML.
[12:45] <gkellogg> AndyS: I think there's evidence to support things from where Neo is going. They're putting labels on nodes and this is much like what RDF does.
[12:46] <gkellogg> ... If there's a deficiency in RDF, why isn't it just a matter of fixing RDF, rather than going to something new and different?
[12:46] <gkellogg> ashok: good question. Technically, I don't know. Politically, that's another issue.
[12:47] <gkellogg> zwu2: IMO RDF's learning curve is steeper.
[12:47] <gkellogg> kelvin: politically, RDF is a poisonous word. Ivan said "don't go there".

[12:48] <gkellogg> ... I think the reality is that this would just take us down a rat-hole. If you accept that PGs are here to stay, that makes it a separate topic of discussion.
[12:48] <gkellogg> AndyS: If you want to add schema information to property graphs, you're in much the same territory as RDF.
[12:48] <gkellogg> zwu2: there is no semantics, entailment or URIs.
[12:49] <gkellogg> AndyS: When you're using linked-data at scale, it's not about that. It's about data integration and linking between datasets. Talking about entailment, it's just one subset of what RDF is used for.
[12:49] <zwu2> but a fundamental thing linked data uses is URIs
[12:49] <gkellogg> ... We don't think about entailment too much.
[12:51] <gkellogg> zwu2: when we're talking about linked data, URIs are fundamental things.
[12:51] <gkellogg> ... But, with PGs, all you have is key-value pairs. If you have two separate graphs, you have two separate graphs.
[12:52] <gkellogg> ashok: I was thinking if we get started on this, that's an aspect we have to address.
[12:52] <gkellogg> ... If we get started on this, we are going to face up to how to integrate data and how to use URIs.
[12:53] <gkellogg> ... We're trying to build a PG data model within the W3C umbrella, which means URI/protocol/serialization techniques. I think that's useful on its own.
[12:53] <gkellogg> zwu2: we need to make the technology work well with RDF.
[12:53] <gkellogg> ... If we see simplicity in PGs, then that's what we need to work on.
[12:54] <gkellogg> ... Once you have a PG data model defined, you may be able to make it work with RDF technologies too. The most important thing is provenance and simplicity.
[12:54] <gkellogg> kelvin: I think there are scenarios with different technologies which suit one or the other. Technology's usually not the deciding factor.
[12:55] <gkellogg> ... I see people using big data going towards property graphs.
[12:55] <gkellogg> ... most social networks are set up as property graphs.
[12:56] <gkellogg> AndyS: A lot of big systems don't really use PGs; they're special-purpose, built for a particular use case.
[12:56] <gkellogg> ashok: and it's usually just one particular use case.
[12:56] <gkellogg> AndyS: It's not clear what a WG should really do. I think we need more time in the CG to define the charter.
[12:58] <gkellogg> ... If I were in W3C's shoes, I'd say there isn't a clear-cut charter that's sufficiently bounded to get people to do real work.
[12:58] <gkellogg> ashok: the idea was to get us started thinking about this.
[12:58] <gkellogg> ... I'm also glad that we have a broad set of opinions so we can focus on what we should be doing.
[12:59] <gkellogg> ... Let's continue on the call next week at the same time.
[12:59] <gkellogg> ... We'll likely cancel the call the week after next.
-- 
All the best, Ashok

Received on Tuesday, 7 January 2014 18:11:42 UTC