Computer Hardware Reviews at Computer Power User Magazine. Your source for overclocking software guides, building your own computer, pc cooling and computer modding.


Other Approaches To Schemas For XML

Caught In The Web
February 2005 • Vol.5 Issue 2
Page(s) 84-85 in print issue

Coder’s Corner
During the past seven months, we have described XML Schema, the W3C's (World Wide Web Consortium) official schema language for XML, in great detail. Along the way, we hinted that there are other approaches to defining schemas, some of which are actually better and easier to use than XML Schema. These rumors are true, and in the next few articles we will look into these interesting alternatives.

The best place to start, however, is with some history that shows how XML Schema and these other approaches originated. This will give you some context for why XML Schema is the way it is and explain why the other approaches took hold outside the W3C.

A Little XML History

The XML Schema specification was made official in May 2001, although the first draft was introduced in May 1999. Even before that first draft, several proposals for an official XML schema language had already been fed into the W3C's Schema specification process. These proposals reflected a lot of thought about how XML Schema should work and represented a wide variety of viewpoints. Some of these approaches reflected existing schema products, most of which are still in use today.

The roots of XML Schema largely lie in a proposal called XML-Data. This was published as a W3C Note in January 1998 and represented the collaborative effort of individuals from industry (Microsoft, ArborText, DataChannel, and Inso) and academia (the University of Edinburgh). The goal was to create a language for defining and documenting object classes, so there was a clear object-oriented bias in their approach. The scope of XML-Data was actually far broader than just defining schemas for XML documents; the proposal covered both syntactic schemas (which describe XML document instances) and conceptual schemas (which describe relationships among conceptual parts, as RDF [Resource Description Framework] or a database schema would do today). This model also included a type system, the precursor of the types that XML Schema defines.

XML-Data was published before W3C work began on the XML Information Set, a formal, abstract data set that provides a consistent set of definitions for use in specifications that need to refer to the information in a well-formed XML document. Reading between the lines of XML-Data (and some of the other proposals) suggests that issues identified in XML-Data led to development of the Information Set. XML-Data was largely a set of thought experiments, and there were few (if any) implementations of XML-Data processors.

XDR (XML-Data Reduced), published as a W3C Note in July 1998, was a stripped-down version of XML-Data (syntactic schemas only), such that an implementation could be quickly developed and deployed. In late 1998, Microsoft implemented XDR as a production-level tool in its XML parsers and BizTalk framework. This support remains, although XML Schema is now the recommended choice.

Growing out of XML-Data, DCD (Document Content Description) for XML was also published as a W3C Note in July 1998 as a joint submission from Microsoft, IBM, and Textuality (Tim Bray). This was an RDF vocabulary for describing constraints on the structure and content (including data types) of XML documents. This work helped clarify some of the concepts behind XML-Data, but it didn't result in any commercial tools.

DDML (Document Definition Markup Language) was published as a W3C Note in January 1999. This approach focused on the logical structure of documents and didn't address types. This grew out of analysis and reconsideration of ideas in XML-Data, and it was largely a community effort with little formal industry involvement. The approach mostly served as a thought experiment on schema design and had some impact on the design of XML Schema and the alternatives we'll discuss later.

Finally, SOX (Schema for Object-Oriented XML) was published as a W3C Note in July 1999 and came out of work at Commerce One. This approach was strongly object oriented, including concepts such as inheritance, and it also supported an extensible type library. Commerce One products still support SOX but, like XDR, the approach never gained acceptance outside the original vendor. Some of the object-oriented concepts in SOX, however, do appear in XML Schema.

Approaches Outside The W3C

As the XML Schema juggernaut rolled on, some XML experts became leery of the direction it was taking, with particular concern over the following:

• Strong coupling of the specification to the XML Information Set

• Tight coupling between specifications for structure and primitive data types

• Limitation to purely deterministic content models

• Overall complexity of the specification



This illustration shows the main relationships between current and historical schema languages for XML, although there was much cross-fertilization of ideas between the people involved in creating the languages.

By late 1999 it was obvious that these concerns wouldn't be resolved in XML Schema, so the various experts chose instead to tackle the issues by designing their own schema languages. Away from the constraints of the W3C process, they tried quite different approaches. The first was a language called DSD (Document Structure Definition). In contrast to XML Schema, this approach defines a schema as a set of rules (such as if element name = X, then content is "this") rather than declarative patterns. This approach wasn't widely embraced, and work ended sometime in 2002 or 2003.

A more important approach came with Schematron, introduced in late 1999 and developed largely by Rick Jelliffe. Unlike the grammar-based approach of XML Schema (where the schema defines allowed structural patterns), Schematron defines a schema as a set of required tree patterns written as XPath expressions. Once you are comfortable with a recursive, tree pattern-based way of thinking, this proves to be a powerful approach to schema specification. Indeed, Schematron has become modestly popular and is being adopted as an ISO standard.
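
To make the idea of "required tree patterns" concrete, here is a minimal sketch in Python. It is not Schematron itself, only an illustration of the rule style: each rule names a context path and a test path that must match at least once beneath every context node. The sample document, the element names, and the check() helper are all invented for this example, and the paths use only the limited XPath subset built into Python's standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are invented for illustration.
DOC = """
<order>
  <customer>Ann</customer>
  <item><sku>A1</sku><qty>2</qty></item>
  <item><sku>B7</sku></item>
</order>
"""

# Each rule is (context path, test path, message). The test path is evaluated
# relative to every node the context path selects and must match at least once.
RULES = [
    (".", "customer", "an order must name a customer"),
    (".//item", "sku", "an item must have a sku"),
    (".//item", "qty", "an item must have a qty"),
]

def check(root, rules):
    """Return the message of every rule that fails, once per failing node."""
    failures = []
    for context, test, message in rules:
        for node in root.findall(context):
            if not node.findall(test):
                failures.append(message)
    return failures

failures = check(ET.fromstring(DOC), RULES)
print(failures)
```

The second item has no qty element, so only that rule is reported. Nothing here describes the document's overall grammar; the schema is simply the list of patterns that must hold.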

Introduced in early 2000, RELAX (Regular Language description for XML) is a grammar-based schema approach Murata Makoto developed. Unlike XML Schema, RELAX doesn't demand deterministic content models and has a much cleaner separation between grammatical rules and data types. RELAX simply incorporates the primitive types defined by XML Schema and doesn't define its own. Importantly, the RELAX design is based on a formal mathematical foundation (called hedge automata theory) that ensures all schemas defined in RELAX are provably valid. This also lets RELAX describe structures that XML Schema (with its deterministic content model constraint) can't.
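
A small example shows what a nondeterministic content model looks like. Suppose a section may contain either a title followed by a para, or a title followed by a list. Both branches begin with title, so a validator that must commit to a branch on seeing the first child can't decide; XML Schema's determinism rule rejects a model written this way, while a regular-language approach like the one RELAX takes has no trouble with it. The sketch below is only an analogy: it flattens the child element names into a string and uses Python's regular expression engine as a stand-in for a schema validator. The element names and the children_match() helper are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Content model: (title, para) | (title, list).
# Both alternatives start with "title", which is what makes the model
# nondeterministic for a matcher that cannot look ahead.
CONTENT_MODEL = re.compile(r"title para|title list")

def children_match(xml_text):
    """Check the sequence of child element names against the content model."""
    root = ET.fromstring(xml_text)
    names = " ".join(child.tag for child in root)
    return CONTENT_MODEL.fullmatch(names) is not None

ok_para = children_match("<section><title/><para/></section>")
ok_list = children_match("<section><title/><list/></section>")
bad = children_match("<section><title/><title/></section>")
print(ok_para, ok_list, bad)
```

In XML Schema, the usual workaround is to factor the model by hand into title followed by a choice of para or list. That is easy here but becomes awkward, and sometimes impossible, for more complex models.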

TREX (Tree Regular Expressions for XML), developed by James Clark, followed shortly on the heels of RELAX. TREX uses simple, easy-to-understand regular expressions to define grammatical constraints and rules. This language is based on an approach taken in XDuce, a programming language developed in early 2000 and designed specifically for processing XML. Roughly put, Clark took the type system in XDuce, expressed it in XML syntax, and added the features necessary for a minimal schema language. TREX focused on structure and ignored primitive types. On the structural side, however, TREX was similar to RELAX.

In late 2001, TREX and RELAX were merged under the auspices of OASIS to form a new language called RELAX-NG, or RNG for short, which was later standardized by the ISO. This synthesized language retained the benefits of both parents: a simple-to-understand syntax; clean separation between structural patterns and primitive types; mathematically rigorous underpinnings; and the ability to model structures that simply aren't possible with XML Schema. Moreover, the simpler underlying model means that element and attribute models are treated almost identically, which greatly simplifies writing and understanding schemas. RNG has a wide following of avid supporters and is relatively well supported by modern XML schema tools.

Last but not least is Examplotron. Developed by Eric van der Vlist, it uses XML instance documents to provide examples of the allowed structure. In its purest form, Examplotron is limited to modeling documents that are like the provided example. However, Examplotron also lets designers include RELAX-NG expressions to add more complex rules to an Examplotron schema. Thus, it is often best to think of Examplotron as a shortcut for defining simple RNG schemas.
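
The schema-by-example idea can be sketched in a few lines of Python. This is a deliberately crude illustration and not how Examplotron is implemented: it reduces the example document to its "shape" (element names, attribute names, and nesting, with text and attribute values ignored) and accepts any candidate with an identical shape. The shape() and conforms() helpers and the book example are made up for this sketch.

```python
import xml.etree.ElementTree as ET

def shape(elem):
    """Reduce an element to (tag, sorted attribute names, child shapes)."""
    return (elem.tag, tuple(sorted(elem.attrib)), tuple(shape(c) for c in elem))

def conforms(example_xml, candidate_xml):
    """True if the candidate has the same structure as the example."""
    return shape(ET.fromstring(example_xml)) == shape(ET.fromstring(candidate_xml))

# The example instance acts as the schema; its text content is irrelevant.
EXAMPLE = "<book id='b1'><title>Any title</title><author>Any name</author></book>"

same = conforms(EXAMPLE, "<book id='x9'><title>XML</title><author>Ian</author></book>")
different = conforms(EXAMPLE, "<book id='x9'><title>XML</title></book>")
print(same, different)
```

A comparison this strict can't express "one or more" or "optional," which is exactly the kind of rule the embedded RELAX-NG expressions are there to supply.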

Next month, we will begin to look at some of these other schema approaches in far more detail. Look for it on CPUmag.com.

by Ian Graham


(You can find a document that provides links to online information related to this article at www.utoronto.ca/ian/articles/feb05.)





Copyright © 2010 Sandhills Publishing Company U.S.A. All rights reserved.