- From: Matt Womer <mdw@w3.org>
- Date: Wed, 14 Sep 2011 22:59:11 -0400
- To: public-poiwg W3C <public-poiwg@w3.org>
- Cc: Henning Schulzrinne <hgs@cs.columbia.edu>
Hi all,
For last week's meeting we had a special presentation from Henning Schulzrinne. The slides are available here:
http://www.cs.columbia.edu/~hgs/papers/2011/poi-urn.pptx
and the minutes are available at:
http://www.w3.org/2011/09/08-poiwg-minutes
and as text below.
-Matt
--
- DRAFT -
Points of Interest Working Group Teleconference
08 Sep 2011
See also: [2]IRC log
[2] http://www.w3.org/2011/09/08-poiwg-irc
Attendees
Present
Regrets
Chair
SV_MEETING_CHAIR
Scribe
Matt
Contents
* Topics
1. Identifying and Categorizing POIs, presented by Henning Schulzrinne
* Summary of Action Items
_________________________________________________________
<trackbot> Date: 08 September 2011
<robman> hey matt
<robman> great thanks...you?
<robman> sure...lets talk later
<cperey> 848.aaaa is Christine
<cperey> called BuildAR
<scribe> Scribe: Matt
Identifying and Categorizing POIs, presented by Henning Schulzrinne
-> [6]http://lists.w3.org/Archives/Member/member-poiwg/2011Sep/att-0005/poi-urn.pdf Slides
[6] http://lists.w3.org/Archives/Member/member-poiwg/2011Sep/att-0005/poi-urn.pdf
Henning: I pulled some use cases out of the draft.
... The ones I pulled out seemed to be about finding categories of
things.
... How can we divide the millions of POIs into manageable
categories for searching?
... Some problems we won't be solving:
... Properties of POIs vs category, e.g. "restaurant that takes
credit cards" isn't a category.
<cperey> restaurant a favorite category :-)
Henning: The distinction between what is or isn't a category is
somewhat arbitrary. You can claim that "Italian Restaurant" could be
a category, or it could be a cuisine attribute of a restaurant.
... It's fuzzy, and one has to make pragmatic choices about what
users typically expect.
... The two characteristics I see defining categories are: 1. ?? and
2. that categories are not interchangeable.
... A "gas station" and a "restaurant" are not interchangeable, but
even this gets tricky, e.g. "synagogue" and "Christian church".
... The North American Industry Classification System (NAICS) is one
standard, the only one I've noticed. It's based on the Census.
... Very much comes out of industry, designed for classical
manufacturing type industries, i.e. "this establishment produces
cutlery"
... It struggles today with identifying services.
... While it's a fine example of categories, it isn't what we'd want
to use though. Restaurants for instance have just two
classifications: full-service and limited service.
... That may be okay, but is somewhat limiting, and not really what
I'd expect users to care about.
... Great from a statistical perspective though.
... Many of the things you'd want to look up aren't in the system at
all.
... I tried some things that are common from GPS POIs and they don't
appear at all in NAICS, e.g. ATMs, wifi hotspots, monuments, etc.
... One alternative is to say we've got Google, just use free-text.
That works, and is probably better than many alternatives, but free
text is also hard to translate into other languages, and the same
service has many names.
... e.g. ATM, cash machine, automated teller machine, etc
... Then there are also things like distinguishing between a diner,
a café, coffee shop, etc. McDonalds calls itself a restaurant, but
most of us would think of it as another term.
<cperey> McDonalds =? Restaurant?
Henning: So you might get lots of things that you wouldn't expect
showing up on a list.
... There are also properties, such as "public" or "university" for a
library.
... and hierarchy is missing too: French vs French Restaurant.
... Another option is map overlay labels. These are usually used to
label topographical features and not services, like restaurants,
ATMs.
... GPS POIs are much more consumer applicable.
... As far as I can tell, there isn't any standardization for POI labels;
at best they're informally standardized. I haven't confirmed with every
vendor though.
... Some of the categories are a bit odd too, sometimes very broad:
all government services labeled as one, even if it's everything from
libraries to prisons, but sometimes libraries are separate etc. It's
inconsistent and not clear if it's official or just made up.
... And lastly, the Yellow Pages labels.
... I've heard every region does its own thing, often picking labels in
a way that lets businesses appear in as many places as possible,
rather than where users might expect them.
... Yellow Pages wouldn't contain bathrooms and ATMs or other things
without a phone number.
... Coming from another direction, we had a similar problem when we
started redesigning the emergency calling system in the US.
... One of the big problems is that every country pretty much has,
for historical reasons, used a different digit pattern.
... It's a confusing scenario: in some cases one number is used
generically for all services, e.g. 911 (excepting poison control),
but other countries have different numbers for police, fire,
ambulance, etc. So there was no hope of a standard numeric
identifier.
... We started on something different, RFC 5031: a URN for
services, e.g. urn:service:sos.
... They're extensible via IANA.
-> [7]http://tools.ietf.org/html/rfc5031 RFC 5031
[7] http://tools.ietf.org/html/rfc5031
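As a rough illustration of how hierarchical service URNs in the style of RFC 5031 can be handled in software, here is a minimal Python sketch. It follows our reading of the RFC 5031 ABNF (a top-level label of at most 27 letters, digits or interior hyphens, followed by dot-separated sub-services) and lowercases labels on the assumption that they compare case-insensitively. The ServiceURN class and its is_within helper are hypothetical names for illustration, not part of any specification.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Character classes follow our reading of the RFC 5031 ABNF:
#   service     = top-level *("." sub-service)
#   top-level   = let-dig [ *25let-dig-hyp let-dig ]   (at most 27 characters)
#   sub-service = let-dig [ *let-dig-hyp let-dig ]
_TOP = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,25}[A-Za-z0-9])?"
_SUB = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
# re.IGNORECASE lets both "URN:service:" and "urn:service:" match.
_SERVICE_URN = re.compile(rf"urn:service:({_TOP}(?:\.{_SUB})*)", re.IGNORECASE)


@dataclass(frozen=True)
class ServiceURN:
    labels: Tuple[str, ...]  # e.g. ("sos", "police")

    @classmethod
    def parse(cls, text: str) -> "ServiceURN":
        match = _SERVICE_URN.fullmatch(text)
        if match is None:
            raise ValueError(f"not a service URN: {text!r}")
        # Lowercase on the assumption that labels compare case-insensitively.
        return cls(tuple(match.group(1).lower().split(".")))

    def is_within(self, other: "ServiceURN") -> bool:
        # Hypothetical helper: True if self equals other or is one of its sub-services.
        return self.labels[: len(other.labels)] == other.labels

    def __str__(self) -> str:
        return "urn:service:" + ".".join(self.labels)
```

With this, urn:service:sos.police (defined in RFC 5031) falls under urn:service:sos. The same dotted-prefix test would apply to POI-style labels such as transportation.airport, if those were carried in this URN form (an assumption, not something the proposal has settled).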
Henning: These are internal to the system, used for call routing.
... Allows devices to have an entry that gets to the right people.
... NG911 and NG112 have been moving in this direction. For 911 it seems
to be accepted and is getting deployed.
... We determined that this was extensible to other services too.
... N11 services: reserved 3-digit numbers whose first digit is not 0 or
1 and that end in 11, e.g. 211, 311.
<ahill2> 611 is used by the cell company for their information
Henning: In that same spirit, we explored extending it to
non-communication services.
... Designed for consumer use, things that we can label, e.g.
"food", "fuel", "business", "communication" (wifi hotspots, internet
cafés). Designed not to have thousands of categories, but still similar
to a GPS POI finder.
... 13 top level categories that then have further detail within.
... e.g. transportation.airport
... So, remaining issues and to-dos:
... Are there systems out there already? Can we extend it without
breaking it? Can it be maintained in a way that keeps the labeling
coherent? If not, and we go forward with the URN model, would we
register these things? With IANA?
... IANA would be just a database, would we sub-delegate that?
... How would maintenance be done on that?
... How is it maintained? I'm partial to something like the Olson
time zone database model.
... It's not an official group or a government thing. It used to be just
one person, but now there's a mailing list with a consensus process.
... It will be maintained on a long-term basis through IANA.
-> [8]http://tools.ietf.org/html/draft-lear-iana-timezone-database-02.html IANA Procedures for Maintaining the Timezone Database
[8] http://tools.ietf.org/html/draft-lear-iana-timezone-database-02.html
Henning: So that's my brief summary of what I'm trying to do.
... So two things: is there obviously related work that someone else
has done? Or if not is there a group of people that might be
interested in forming a nucleus of an organization to do this?
... If people take it up, yes, great, if not the harm is fairly
limited.
robman: Two questions: how would URNs support non-English?
Henning: Are you familiar with IETF i18n document?
... This falls in the category of protocol labels, not meant for
human consumption. Each label would be called something different in
each language, but the guidance for protocol labels used for routing
and querying is that you should stick to one language.
... At the IETF, it's been English. There's nothing inherently in
the labels to prevent other languages, modulo the i18n URN problems.
robman: Because this is much more about meaning than normal labels are,
there would be labels that mean things only in certain cultures.
Henning: Right, if there was a category that doesn't have a good
label in English, it's still plausible. It's not intended to
preclude anything, but with it being a protocol label there's less
concern over it at the moment.
robman: This is very top down. What if people could create random
labels? And if they're not used, they'll die off or be redundant
anyway?
Henning: There's likely not going to be a way to solve the language
labeling problem. I'm not opposed to the free text model. Because of
the need for i18n, and because it's not used as a search term, I have
some doubt that free text will be successful.
[[I was just thinking of schema.org given this conversation, and now
I see: [9]http://www.schema.org/Place]]
[9] http://www.schema.org/Place
Henning: I'm not thinking the urn would be typed into a search for
instance.
... Right now we don't have network databases, but these static ones
that vendors maintain.
robman: I think the international bridging point you made is quite a
good argument for it.
ahill2: How do you see this translation between labels and URNs
happening?
... I can imagine a world where Google says "these are the
categorizations" and translate them to URNs, but for an ordinary
individual, what services are available to them to translate their
terms and searches into these categories?
Henning: I imagine the software built by, say, a tourist app or a GPS
vendor would, depending on the appropriate interface, build some
subset of these labels into their system and translate and expose
them appropriately.
ahill2: You think this translation between common labels and URNs
happens ad-hoc? No central database of any sort?
Henning: I hadn't thought that far ahead, but if we were more
ambitious, there would be a description for each term, geared
towards localization.
... Nothing prevents that if we get enough people together.
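The ad-hoc translation Henning describes could look roughly like the following Python sketch: a per-application vocabulary maps several free-text terms in several languages (e.g. "ATM", "cash machine", "automated teller machine") onto one language-neutral protocol label, and maps that label back to a display term for the user. The vocabulary and function names are hypothetical; in particular, urn:service:finance.atm is an invented example, not a registered or proposed value.

```python
import unicodedata
from typing import Dict, List, Optional, Tuple

# Hypothetical vocabulary for illustration only: urn:service:finance.atm is NOT
# a registered service URN. urn:service:sos.police is defined in RFC 5031.
VOCABULARY: Dict[str, Dict[str, List[str]]] = {
    "urn:service:finance.atm": {
        "en": ["ATM", "cash machine", "automated teller machine"],
        "de": ["Geldautomat", "Bankautomat"],
    },
    "urn:service:sos.police": {
        "en": ["police"],
        "de": ["Polizei"],
    },
}


def normalize(term: str) -> str:
    # Unicode-normalize and case-fold so "Cash Machine" and "cash machine" match.
    return unicodedata.normalize("NFKC", term).casefold().strip()


def build_index(vocab: Dict[str, Dict[str, List[str]]]) -> Dict[Tuple[str, str], str]:
    # Invert the vocabulary: (language, normalized term) -> URN.
    index: Dict[Tuple[str, str], str] = {}
    for urn, by_lang in vocab.items():
        for lang, terms in by_lang.items():
            for term in terms:
                index[(lang, normalize(term))] = urn
    return index


def to_urn(index: Dict[Tuple[str, str], str], lang: str, term: str) -> Optional[str]:
    # Free text in the user's language -> protocol label (None if unknown).
    return index.get((lang, normalize(term)))


def to_display(vocab: Dict[str, Dict[str, List[str]]], urn: str, lang: str) -> Optional[str]:
    # Protocol label -> preferred (first-listed) term in the user's language.
    terms = vocab.get(urn, {}).get(lang)
    return terms[0] if terms else None


INDEX = build_index(VOCABULARY)
```

An English query for "Cash Machine" and a German query for "geldautomat" both resolve to the same label, which is the kind of international bridging robman points to; unknown terms simply return None rather than guessing.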
ahill2: Isn't that what is happening today with Google? Isn't it the
translator today?
... If this knowledge is being crowd sourced somewhere, Google or
OSM, we should use that.
Henning: I was unimpressed by the OSM process.
ahill2: I'm envisioning a search where these urns are available,
e.g. every result has a category. That would be neat, because then
the search would be the translator between free text and labels and
urns.
Henning: If someone could do that I'd be delighted.
<robman> [10]http://en.wikipedia.org/wiki/Web_Ontology_Language
[10] http://en.wikipedia.org/wiki/Web_Ontology_Language
robman: What about the taxonomy communities that are working in
their own domains, like OWL?
... It's not just locations we're talking about, but we're cross
domain here.
Henning: Yes, that's part of why I was asking for pointers to these
communities. Once we get into the ontology side more, that would be
helpful.
... I'm unsure whether that type of work (property attributes,
etc.) is directly applicable, or whether sub-pieces of it can be
pulled out. From a POI perspective, we don't want a complete
ontology that crosses categories and properties.
... If there are communities we should know about, please let me
know. We looked into this about a year ago, building a system that
could combine ontologies, e.g. find a specific movie and dinner with
a cuisine type. We didn't find anything then, but we might have
looked in the wrong place.
ahill2: Can we remind ourselves of some of the other categorization
efforts that we discussed? I believe the Library of Congress was
discussed, and a number of them had URLs involved.
Henning: Any pointers you have, please pass along. One difference
between identifying specific objects and categorization is
one-to-one vs one-to-many.
Raj: Geonames
ahill2: Does that do categorization?
rsingh2: Yes, but they're just categorizing places, not business
classifications.
... They started with the USGS classification system, but theirs is a
much smaller problem than ours.
Henning: Looking at geonames, they've got postal codes as the lowest
level I see.
... School, post office, cemetery, etc. Not sure how many features
they have.
rsingh2: That's right out of USGS.
<karls> hi
ahill2: What did we propose to use?
Henning: All I know is what's in the doc: NAICS.
rsingh2: I think coming up with a single-country classification
scheme is easy, but what's harder is a POI system like AT&T's,
where they want you to search for, say, where to get phone cards.
... That's one type of business in the USA, but another type in
other countries.
... Reconciling that between countries is very difficult.
karls: There's a ton of work on brand binding and chain binding to
help that work.
... That side-steps classification though.
... NAICS is mostly used for information exchange; most of the time
these are hand-tuned by the app developers. Many schemes are app
specific.
... The low-level standards are just used for hand-off so people can
do mappings.
Henning: That's what I've seen as well.
karls: It's useful to carry around NAICS codes in terms of the spec,
as our spec is about exchanging information, but in terms of
customer facing stuff, it's pretty open ended. Our model should be
we'll support the structure, but you make it up.
rsingh2: My instinct is similar, we're not ready to tackle that in
version 1.
... We might be overstepping the bounds of what innovative
developers would build.
karls: Typically these systems were done for handhelds, or
constrained environments. I think search trumps all though.
... The conversation at Microsoft/Nokia/NavTEQ is do we care about
categorization anymore?
Henning: The search experience is good from large providers, but it
requires a fair amount of user skill to get what you want. For
restaurants, you have general services like Google Maps, but also
specific ones like Urban Spoon.
... There are more relevant hits in the latter.
karls: Here's what I see: one end there are proprietary category
systems, on the other there's web page crawling for open ended
search.
... In between, you've got a lot of POI gazetteers who are doing
meta tagging, as it facilitates parametric search.
... The middle ground is the tagging. I thought the spec addressed
that capability to open endedly do the metatagging.
ahill2: Can you elucidate on that a bit more karls?
karls: Take a service like Open Table, where they have restaurant
categories and sub-categories.
<robman> +1 to link based structured data 8)
karls: You're not going to get that information out of scraping a
web page. That information is best consumed by an application like OT's
if a POI has a preset, open-ended list of terms that describe it
well. It's tantamount to the meta tag on HTML pages.
... Gazetteers are doing field ops, web scraping, crowd sourcing,
etc, to distill down to ten or twenty keywords that are the most
descriptive to put in the POI.
<rsingh2> parametric search = faceted search
<rsingh2> [11]http://en.wikipedia.org/wiki/Faceted_search
[11] http://en.wikipedia.org/wiki/Faceted_search
karls: Typically the app tier puts a parametric search on top of
that: hours, beer, etc.
ahill2: We're talking about somewhere between category only search
and free text.
karls: You could argue that it's all categories or parameters, e.g.
24 hour restaurant could be a category or a property.
rsingh2: The popular term would be faceted search.
Henning: Close, but not quite, you might have things like types of
credit cards accepted, and it might be labels drawn from a set, or
specific information that isn't categorized: e.g. open hours.
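To make the distinction concrete, here is a minimal Python sketch of a faceted (parametric) search layered on top of categories: the category match uses dotted labels, accepted cards are a facet whose values come from a fixed set, and opening hours are a specific property that is compared rather than labeled. The labels, field names and sample data are hypothetical, and hours are simplified to whole hours within a single day (no overnight opening).

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class POI:
    name: str
    categories: FrozenSet[str]               # what kind of place (hypothetical dotted labels)
    cards: FrozenSet[str] = frozenset()      # facet: values drawn from a known set
    hours: Optional[Tuple[int, int]] = None  # property: (open, close) in whole hours, 0-24


def in_category(poi: POI, category: str) -> bool:
    # A POI matches its exact category or any more specific sub-label of it.
    return any(c == category or c.startswith(category + ".") for c in poi.categories)


def search(pois: Iterable[POI], category: str,
           card: Optional[str] = None, open_at: Optional[int] = None) -> List[POI]:
    results = []
    for poi in pois:
        if not in_category(poi, category):
            continue
        if card is not None and card not in poi.cards:
            continue
        if open_at is not None:
            # POIs without known hours are excluded when an hour filter is set.
            if poi.hours is None or not (poi.hours[0] <= open_at < poi.hours[1]):
                continue
        results.append(poi)
    return results


# Hypothetical sample data.
POIS = [
    POI("Diner A", frozenset({"food.restaurant"}), frozenset({"visa"}), (6, 22)),
    POI("Cafe B", frozenset({"food.cafe"}), frozenset({"visa", "amex"}), (7, 18)),
    POI("Station C", frozenset({"fuel"}), frozenset({"visa"}), (0, 24)),
]
```

Matching on whole dotted labels rather than raw string prefixes means a query for "foo" does not pick up "food.restaurant", which keeps the category hierarchy meaningful while the facets stay independent of it.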
<rsingh2> I'm late for another call. Bye all.
robman: That's why we were thinking open ended links, because it is
so closely tied to the users' mind space when they search.
... If we approach it as a categorization problem we have to
approach it differently.
Henning: I think I differ on that. If you look at OT, they do do
categorization, they do much better than just crowd source tagging.
karls: I think what we want to do is to be able to have OT exchange
their POIs outside their business sphere.
... So, we want to make sure the spec can support rich and
proprietary tagging, without defining the facets ourselves.
Henning: Why not some of the facets? I think I've demonstrated that
some are viable.
ahill2: One of the things we've been careful about is making sure
that there are multiple categorizations that could apply to a POI.
Henning: It could have multiple category schemes too.
ahill2: In your proposal, are you open to the idea that NAICS ends
up adding some of these categorizations that are facets as opposed
to routing to a specific business?
... That is: if there were a number of different categorizations that
a business has, would NAICS be the appropriate place to build up a
category?
Henning: I'm not part of NAICS, but given that they're part of the
census, I imagine they wouldn't be looking at these properties. I
can't say what they should do, but my perception is that their
mission is industry classification statistics.
... e.g. how many people work in fast food restaurants, rather than,
say, what credit cards they take.
karls: They're also missing juicy POIs, like golf courses, transit
stops, etc.
Henning: Yes, so far it seems outside their mission of what they're
doing.
ahill2: Sorry, I think I asked the question wrong. In your URN
proposal, would you see those categories, which are facets outside
of a category being appropriate, e.g. hours, or all the way down to
the kind of information from crowd sourcing.
... Where do you draw the line?
Henning: To my mind, a URN is not as suitable for these
non-categorization models. You've identified some binary things, but
many are not easily represented in the same fashion. That said,
separately (I didn't talk about it here as it's preliminary), the
system we built has the ability to retrieve an XML-type document with
suitable tags that carry that information.
... We could envision that being useful for us to agree on labeling
to enable exchange.
<ahill2> thanks, that answers my question
Henning: There's an opportunity there, didn't discuss it here, and
it's to some extent orthogonal, but there's a need for that as well,
maybe industry specific bodies, which might be in a position to do
that more appropriately.
... I look forward to the mailing list conversation.
cperey: As for next steps, Matt will publish the minutes of the
meeting. It's almost a transcript.
... He publishes that as a URL, and it becomes part of the group's archives.
That gets it out to a larger audience, but after that it's kind of
up to this group. We're having our F2F in two weeks.
... We should work on this at the F2F and followup with actions from
that.
Henning: There's no dependency here, so that's fine.
... Right now, I don't even see it as appropriate to include it in
the doc, as it's not specific to this effort. But, I would like to
look for a community of interest to take it to the next level of
specificity.
... I'm not asking the WG to take on this particular task, it's
probably outside the immediate scope.
matt: Could be a CG perhaps? POI WG decided not to do this.
... Thank you!
Henning: Thanks, and thanks to Christine for arranging this.
Summary of Action Items
[End of minutes]
_________________________________________________________
Minutes formatted by David Booth's [12]scribe.perl version 1.136
([13]CVS log)
$Date: 2011/09/08 15:29:05 $
_________________________________________________________
[12] http://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm
[13] http://dev.w3.org/cvsweb/2002/scribe/
Received on Thursday, 15 September 2011 02:59:21 UTC