- From: Aaron Gray <aaronngray@gmail.com>
- Date: Sat, 1 Apr 2023 00:24:03 +0100
- To: Ryan B <w3c@ryanb.org>
- Cc: Bob Wyman <bob@wyman.us>, public-swicg@w3.org
- Message-ID: <CAKXmGHB+kMbX+M9kCkMf9PH0ZPUijtqaVwfFOJAsMS59R0CXXQ@mail.gmail.com>
On Sat, 1 Apr 2023 at 00:12, Ryan B <w3c@ryanb.org> wrote:

> Individual situations vary, but in general, real world HTML markup tends
> to be much more brittle and change more frequently than JSON data. I've
> spent a lot of time on tools that both scrape HTML in the wild and handle
> medium to large JSON objects, and I've had better luck keeping the two
> fairly separate. I'd suggest a pre-processing pass to scrape the HTML into
> a consistent JSON schema, and only then handle that and your native JSON
> together, however you like.
>

Yes, extracting the HTML and processing separately is probably the easiest
option :)

>
> On Fri, Mar 31, 2023 at 3:06 PM Bob Wyman <bob@wyman.us> wrote:
>
>> In a Mastodon.social post
>> <https://mastodon.social/@bobwyman/110120087223817037>, I asked:
>>
>> XPath and JSONPath are similar, but different. (See JSONPath spec
>> <https://goessner.net/articles/JsonPath/>) This presents a problem for
>> me since I'm building a system to query ActivityStreams objects that can
>> include HTML wrapped in JSON.
>>
>> Should I:
>>
>> - Use XPath syntax for both JSON and HTML?
>> - Use JSONPath syntax for both JSON and HTML? (If so, is there a
>> reasonable extension to JSONPath to support selecting on HTML attributes?)
>> - Switch between JSONPath and XPath depending on the underlying
>> datatype? (e.g. Embedding XPath in JSONPath.)
>>
>> If you were writing a query, would you accept needing to know both
>> syntaxes?
>>
>> I would appreciate any advice you might be able to provide. Also, I would
>> be interested to hear if anyone else has already been faced with and
>> addressed this issue.
>>
>> bob wyman
>>
>
>
> --
> https://snarfed.org/
>

Aaron
--
Aaron Gray - @AaronNGray@fosstodon.org
Independent Open Source Software Engineer, Computer Language Researcher,
Information Theorist, and Computer Scientist.
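
For illustration, the pre-processing pass suggested in the quoted message might look roughly like the sketch below. It is only one possible shape, not anything proposed in the thread: it assumes Python's standard-library html.parser, and the tree schema (the "element", "attrs" and "children" keys), the "contentTree" key, and the function names are all hypothetical. The HTML in an object's "content" is parsed into plain dicts and lists, after which the same walker can query both the native JSON and the former HTML.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _TreeBuilder(HTMLParser):
    """Builds a dict/list tree (hypothetical schema) from an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = {"element": "#fragment", "attrs": {}, "children": []}
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"element": tag, "attrs": dict(attrs), "children": []}
        self._stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element; ignore stray closers.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i]["element"] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text nodes
            self._stack[-1]["children"].append(data)


def html_to_tree(html):
    """Parse an HTML fragment into JSON-serialisable dicts, lists and strings."""
    builder = _TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def preprocess(activity):
    """Return a copy of the object with its HTML `content` also parsed."""
    out = dict(activity)
    if isinstance(out.get("content"), str):
        out["contentTree"] = html_to_tree(out["content"])  # hypothetical key
    return out


def find_all(node, element):
    """Yield every parsed element with the given name, depth first."""
    if isinstance(node, dict):
        if node.get("element") == element:
            yield node
        for value in node.values():
            yield from find_all(value, element)
    elif isinstance(node, list):
        for item in node:
            yield from find_all(item, element)


# Made-up example object, not taken from the thread.
note = {
    "type": "Note",
    "content": '<p>Hello <a href="https://example.com/" class="mention">'
               'world</a><br>bye</p>',
}
doc = preprocess(note)
links = [el["attrs"]["href"] for el in find_all(doc, "a")]
```

Once the HTML lives in the same dict/list form as the rest of the object, a single query syntax (JSONPath, for instance) could in principle address attributes such as "href" without a separate XPath step.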
Received on Friday, 31 March 2023 23:24:21 UTC