- From: Norm Tovey-Walsh <ndw@nwalsh.com>
- Date: Fri, 19 Aug 2022 09:39:00 +0100
- To: xmlschema-dev@w3.org
- Message-ID: <m2zgg0shq7.fsf@nwalsh.com>
Greg Hunt <greg@firmansyah.com> writes:
> From the comments in the W3C blog it sounds like Xerces in Java 11
> does not support this.

I bet it does if you tell Xerces to follow redirects.

(time passes)

Okay. I gave myself 30 minutes to see if I could figure this out. It
took closer to 40, but you know, that’s not bad for programming.

1. I couldn’t find any way to tell the Xerces parser bundled with JDK11
   to follow redirects natively. I’m not saying it can’t be done, I just
   couldn’t figure it out in ~30 minutes.

2. The escape hatch that the parser does give you is the entity
   resolver. If you wrote your own entity resolver that followed
   redirects, that would work.

3. You don’t have to write your own, because XML Resolver exists.
   (Shameless plug because I wrote it.)

Here is the boilerplate schema validation code that I cut and pasted out
of the Oracle JDK11 docs. I’ve added exactly two lines to it:

  // THIS LINE CREATES THE RESOLVER
  org.xmlresolver.Resolver resolver = new org.xmlresolver.Resolver();

  // parse an XML document into a DOM tree
  DocumentBuilder parser = DocumentBuilderFactory.newInstance().newDocumentBuilder();
  Document document = parser.parse(new File("instance.xml"));

  // create a SchemaFactory capable of understanding WXS schemas
  SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

  // THIS LINE USES THE RESOLVER
  factory.setResourceResolver(resolver);

  // load a WXS schema, represented by a Schema instance
  Source schemaFile = new SAXSource(new InputSource("https://www.w3.org/2001/XMLSchema.xsd"));
  Schema schema = factory.newSchema(schemaFile);

  // create a Validator instance, which can be used to validate an instance document
  Validator validator = schema.newValidator();

  // validate the DOM tree
  try {
    validator.validate(new DOMSource(document));
  } catch (SAXException e) {
    // instance document is invalid!
  }

With the addition of those two lines of code, it will follow redirects
and happily validate.
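
For anyone taking the route in point 2 instead, a hand-written resolver
that follows redirects might look roughly like the sketch below. This is
an illustration only, not how XML Resolver works internally (it does much
more, catalogs included). The class name RedirectFollowingResolver, the
helpers absolutize and isRedirect, and the cap of five hops are all
assumptions made for the example, and it has not been run against the
W3C servers. It turns off the JDK's automatic redirect handling and
follows the Location header by hand, because HttpURLConnection will not
switch between http: and https: on its own.

```java
import java.net.HttpURLConnection;
import java.net.URI;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;

/** Hypothetical resolver that follows HTTP redirects (including http -> https) by hand. */
class RedirectFollowingResolver implements LSResourceResolver {
    private static final int MAX_REDIRECTS = 5; // arbitrary cap, an assumption

    /** Resolve a possibly relative systemId against the base URI, if one was given. */
    public static URI absolutize(String systemId, String baseURI) {
        URI uri = URI.create(systemId);
        return (baseURI == null || uri.isAbsolute()) ? uri : URI.create(baseURI).resolve(uri);
    }

    /** True for the status codes treated as redirects in this sketch. */
    public static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
            || status == 307 || status == 308;
    }

    @Override
    public LSInput resolveResource(String type, String namespaceURI, String publicId,
                                   String systemId, String baseURI) {
        if (systemId == null) {
            return null; // nothing to fetch; fall back to the parser's default behaviour
        }
        try {
            URI uri = absolutize(systemId, baseURI);
            String scheme = uri.getScheme();
            if (!"http".equals(scheme) && !"https".equals(scheme)) {
                return null; // only HTTP(S) resources are handled here
            }
            for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
                HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
                conn.setInstanceFollowRedirects(false); // redirects are followed manually below
                conn.setConnectTimeout(10_000);
                conn.setReadTimeout(10_000);
                int status = conn.getResponseCode();
                if (isRedirect(status)) {
                    String location = conn.getHeaderField("Location");
                    conn.disconnect();
                    if (location == null) {
                        return null;
                    }
                    uri = uri.resolve(location); // Location may be relative
                    continue;
                }
                if (status != HttpURLConnection.HTTP_OK) {
                    conn.disconnect();
                    return null;
                }
                // Ask the JDK's DOM implementation for an empty LSInput and fill it in.
                DOMImplementationLS ls = (DOMImplementationLS)
                    DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
                LSInput input = ls.createLSInput();
                input.setByteStream(conn.getInputStream());
                input.setPublicId(publicId);
                input.setSystemId(uri.toString());
                input.setBaseURI(uri.toString());
                return input;
            }
            return null; // too many redirects
        } catch (Exception e) {
            return null; // any failure: let the parser try its default resolution
        }
    }
}
```

It would be plugged in the same way as the resolver above, with
factory.setResourceResolver(new RedirectFollowingResolver()). Returning
null on every failure is a deliberate choice in this sketch so that the
parser's own behaviour is never made worse; a real resolver would
probably want logging and caching as well.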

(For the record, I am not asserting that this is a simple and
straightforward thing for every user to do. Lots of folks using JDK11 to
validate have probably never heard of entity resolvers. Changing
software is hard, especially if it’s considered a legacy application. In
some environments, adding a new third party library may be very hard or
impossible. I just wanted to work out what would be required to fix it,
if you wanted to fix it. The answer is “Write or obtain an entity
resolver that will follow redirects for you and use it.” That’s not hard
in principle, even if it is hard in practice.)

> Break the validation, even momentarily, and all you have is a legacy
> technology that is harder to argue for.
>
> I am with Michael on this, publishing stable URIs, (and I am inclined
> to factor in the frankly rather vague statements about dereferencing
> URLs), constituted a promise to not change things, a promise that you
> cannot evade by saying people ought to be reading the W3C blog and
> updating their software.

I think those are very reasonable and valid points.

On the other hand, configuring software so that it dereferences
www.w3.org to do validation of some local resource was probably not an
explicit decision, it’s probably an accident. The application is going
to fail when www.w3.org falls off the internet, which I’m sure it does
periodically when maintenance is performed, or when someone borks DNS on
purpose or by mistake.

We know that http: URIs are insecure and subject to various kinds of
attacks. If someone constructs an attack vector that uses a hacked
schema injected into an insecure HTTP stream to get software to accept
an otherwise invalid document with some downstream consequence that the
black hats can exploit, that’s bad too. If a bit…unlikely.

                                        Be seeing you,
                                          norm

--
Norman Tovey-Walsh <ndw@nwalsh.com>
https://nwalsh.com/

> We think in generalities, but we live in detail.--Alfred North
> Whitehead
Received on Friday, 19 August 2022 09:30:58 UTC