XML is ubiquitous in modern computing, used in everything from webpages to remoting protocols, from GUI definitions to mathematical notation, from sheet music to graphics formats and much more besides.
I’d like to think I have a realistic view of XML: I recognize that it is not a panacea, that it has technical shortcomings, and that there may be better tools for solving a given problem.
In spite of this somewhat lukewarm stance, I sometimes find myself defending XML and its use against unreasonable attacks from those who insist that XML has no merit whatsoever and who dismiss a technology out of hand because XML is involved in any way at all.
XML – The Bad
So why do people detest XML so much? These are what I think are the main reasons:
- Redundancy – when people argue against XML they don’t tend to put it like this; they prefer to say “it’s not human-readable” or some such. Redundancy is often what they really mean, though, and their issue is with XML closing tags repeating the tag name. This doesn’t make XML less human-readable, and arguably it makes it clearer where a particular section has ended (just as people sometimes put a comment after a closing brace in a programming language to say which block the brace closes; XML simply builds that in for you). It might make XML less human-writeable, but that is mitigated by an XML-aware editor such as Visual Studio that adds the closing tags automatically.
- Complexity – even at the basic level of the markup, there is more to learn than there strictly needs to be, because XML has both attributes and elements rather than a single way of declaring content. Add the idiosyncrasies of text content in XML, XML Schemas, DTDs, namespaces and all the other technologies built on top, and it can look quite daunting and nebulous. Just as with the redundancy issue, people offer a vague human-readability argument when claiming that JSON is a much better alternative, but here I believe it’s the minimalism of JSON that they find appealing, not better readability of the markup itself. Non-essential functionality in a language has a cost in a steeper learning curve, but that cost is offset by the savings from not having to roll your own solution, and by the consequent improvement in maintainability and in intelligibility to new engineers on a project.
- Strictness – in some people’s minds, XML is equated with a strict schema (defined by a DTD or, more likely, W3C XML Schema), and a strict schema is, moreover, assumed to be a bad thing. For starters, one isn’t forced to use any kind of schema language with XML, and even if one does use something like W3C XML Schema there are ways of allowing for loose structure and ad-hoc extensibility (e.g. allowing a free-for-all of elements and attributes in a foreign namespace). People who take issue with strictness tend to have future compatibility in mind; they want to be able to add new definitions and have them ignored by older applications that don’t understand them, rather than have the data rejected as invalid. There’s a degree of subjectivity in that line of reasoning, and there are arguments for both lax and strict strategies with file formats or wire protocols, just as there are arguments for and against different type systems in programming languages. Even so, I acknowledge that formats idiomatically defined in W3C XML Schema tend to force elements into a particular sequence more than an application necessarily needs. This is because complex type inheritance in W3C XML Schema is always done with sequences of the elements at each level of the inheritance hierarchy, and because of the Unique Particle Attribution (UPA) rule.
- Bad Apples – bad experiences with a technology may sour people on it, even if the technology itself was not to blame. I suspect that C++ is often a victim of this tendency too. If an engineer finds a particular XML technology problematic, e.g. XSLT (which can be as frustrating as it is useful), then XML as a whole may be tarred with the same brush. Similarly, the use of XML in a given application may have been unsuitable (e.g. XML used to store or transfer large amounts of unstructured data where storage space or bandwidth was at a premium), or the application itself may have been a failure, and though XML was not at fault, it is found guilty by association. Given how widely XML is used, it inevitably gets a lot of ire directed at it too.
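To make the Strictness point concrete, here is a minimal sketch in Python of a reader that uses no schema at all and simply ignores anything it doesn’t recognize. The `order` format, its fields and the `urn:example:ext` namespace are all invented for this illustration; only the standard library’s `xml.etree.ElementTree` module is assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the <order> format and the urn:example:ext
# namespace are made up for this example. The ext:giftWrap element stands
# in for something added by a newer version of the format.
DOC = """<order xmlns:ext="urn:example:ext" id="42">
  <item>Widget</item>
  <ext:giftWrap colour="red"/>
  <quantity>3</quantity>
</order>"""


def read_order(xml_text):
    """Read only the fields an 'older' application knows about.

    Elements in a foreign namespace (or any other unknown element) are
    skipped rather than treated as an error, because nothing here
    validates the document against a schema.
    """
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),                        # attribute content
        "item": root.findtext("item"),               # element content
        "quantity": int(root.findtext("quantity")),
    }


print(read_order(DOC))
```

Whether silently skipping unknown elements is desirable is exactly the lax-versus-strict judgement call described above; the sketch only shows that XML itself doesn’t force the choice.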
XML – The Good
If XML were really as bad as is sometimes made out, then it wouldn’t be used in applications that actually work. Here are some reasons why XML might be an appropriate choice for your application.
- Support – You’ll find XML parsing and writing libraries for any modern general-purpose programming language. Chances are there’ll be library support for other XML technologies, such as schema validation, in your chosen programming language too. Your favourite programming text editor will recognize XML markup. In general, you will find off-the-shelf tools and technologies, many available for free, that you can use to get the job done with little bespoke coding. Even JSON, whilst growing in popularity, doesn’t yet come close to XML’s level of support.
- Maintainability – XML will be understood by any engineer who has to maintain your software later, even if that engineer hates XML. XML has been around for the better part of two decades, and there’s little chance of it disappearing from the engineer’s toolkit any time soon, even if other technologies do rise in popularity. Maintainability is also an argument against a bespoke domain-specific markup solution, even if it might be technically superior in other respects.
- Flexibility – Minimalism comes with a cost too. Arguments about the complexity of XML can be turned around into arguments for why it is flexible enough to meet many more needs than a more limited markup language could. JSON doesn’t support comments natively. Nor does it support mixed content, which a document format like XHTML needs in order to mix text and tags inline, or multi-line text strings (newlines have to be escaped). This isn’t a criticism of JSON, merely an observation that XML is more suitable, and indeed more human-readable, when applied to certain problems.
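The differences listed under Flexibility are easy to check for yourself. The following is a rough sketch using only Python’s standard `json` and `xml.etree.ElementTree` modules; the sample strings and the helper names `flatten` and `json_accepts` are invented for the illustration.

```python
import json
import xml.etree.ElementTree as ET

# Mixed content: text and inline tags interleaved, as in XHTML. The text
# also spans two lines without any escaping.
PARA = "<p>XML can mix <em>inline</em> tags\nwith text across lines.</p>"


def flatten(elem):
    """Return the text of an element in document order, with tags removed."""
    return "".join(elem.itertext())


def json_accepts(text):
    """Return True if the standard json module parses the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(flatten(ET.fromstring(PARA)))

# XML comments are legal markup; ElementTree's default parser drops them.
commented = ET.fromstring("<config><!-- tune this later --><size>10</size></config>")
print(commented.findtext("size"))

print(json_accepts('{"a": 1} // a comment'))    # comment after the value
print(json_accepts('{"a": "line1\nline2"}'))    # raw newline inside a string
print(json_accepts('{"a": "line1\\nline2"}'))   # newline escaped as \n
```

Some JSON parsers and dialects do tolerate comments as an extension, which is why the text above says “natively”; the check here covers only Python’s default parser.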