Introduction

Welcome to Beginning XML, Fourth Edition, the book I wish I'd had when I was first learning the language!

When we wrote the first edition of this book, XML was a relatively new language but already gaining ground fast and becoming more and more widely used in a vast range of applications. By the time we started the second edition, XML had already proven itself to be more than a passing fad, and was in fact being used throughout the industry for an incredibly wide range of uses. As we began the third edition, it was clear that XML was a mature technology, but more important, it became evident that the XML landscape was dividing into several areas of expertise. In this edition, we needed to categorize the increasing number of specifications surrounding XML, which either use XML or provide functionality in addition to the XML core specification.

So what is XML? It's a markup language, used to describe the structure of data in meaningful ways. Anywhere that data is input, output, stored, or transmitted from one place to another is a potential fit for XML's capabilities. Perhaps the best-known applications are web-related (especially with the latest developments in handheld web access, for which some of the technology is XML-based). However, XML is also useful in many non-web-based applications: for example, as a replacement for (or a complement to) traditional databases, or for the transfer of financial information between businesses. News organizations, along with individuals, have also been using XML to distribute syndicated news stories and blog entries.

This book aims to teach you all you need to know about XML—what it is, how it works, what technologies surround it, and how it can best be used in a variety of situations, from simple data transfer to using XML in your web pages. It answers the fundamental questions:

  • What is XML?

  • How do you use XML?

  • How does it work?

  • What can you use it for, anyway?

0.1. Who Is This Book For?

This book is for people who know that it would be a pretty good idea to learn XML but aren't 100 percent sure why. You've heard the hype but haven't seen enough substance to figure out what XML is and what it can do. You may be using development tools that try to hide the XML behind user interfaces and scripts, but you want to know what is really happening behind the scenes. You may already be somehow involved in web development and probably even know the basics of HTML, although neither of these qualifications is absolutely necessary for this book.

What you don't need is knowledge of markup languages in general. This book assumes that you're new to the concept of markup languages, and we have structured it in a way that should make sense to the beginner and yet quickly bring you to XML expert status.

The word "Beginning" in the title refers to the style of the book, rather than the reader's experience level. There are two types of beginner for whom this book is ideal:

  • Programmers who are already familiar with some web programming or data exchange techniques. If you are in this category, you will already understand some of the concepts discussed here, and you will learn how to incorporate XML technologies to enhance the solutions you currently develop.

  • Those working in a programming environment but with no substantial knowledge or experience of web development or data exchange applications. In addition to learning how XML technologies can be applied to such applications, you will be introduced to some new concepts to help you understand how such systems work.

0.2. How This Book Is Organized

We've arranged the subjects covered in this book to take you from novice to expert in as logical a manner as we could. In this Fourth Edition, we have structured the book in sections that are based on various areas of XML expertise. Unless you are already using XML, you should start by reading the introduction to XML in Part I. From there, you can quickly jump into specific areas of expertise, or, if you prefer, you can read through the book in order. Keep in mind that there is quite a lot of overlap in XML, and that some of the sections make use of techniques described elsewhere in the book.

  • We begin by explaining what exactly XML is and why the industry felt that a language like this was needed.

  • After covering the why, the next logical step is the how, so we show you how to create well-formed XML.

  • Once you understand the whys and hows of XML, you'll go on to some more advanced things you can do when creating your XML documents, to make them not only well formed, but valid. (And you'll learn what "valid" really means.)

  • After you're comfortable with XML and have seen it in action, we unleash the programmer within and look at an XML-based programming language that you can use to transform XML documents from one format to another.

  • Eventually, you will need to store and retrieve XML information from databases. At this point, you will learn not only the state of the art for XML and databases, but also how to query XML information using an SQL-like syntax called XQuery.

  • XML wouldn't really be useful unless you could write programs to read the data in XML documents and create new XML documents, so we'll get back to programming and look at a couple of ways that you can do that.

  • Understanding how to program and use XML within your own business is one thing, but sending that information to a business partner or publishing it to the Internet is another. You'll learn about technologies that use XML that enable you to send messages across the Internet, publish information, and discover services that provide information.

  • Since you have all of this data in XML format, it would be great if you could easily display it to people, and it turns out you can. We'll show you an XML version of HTML called XHTML. You'll also look at a technology you may already be using in conjunction with HTML documents called CSS. CSS enables you to add visual styles to your XML documents. In addition, you'll learn how to design stunning graphics and make interactive forms using XML.

  • Finally, we end with a case study, which should help to give you ideas about how XML can be used in real-life situations, and which could be used in your own applications.

0.3. What's Covered in This Book

This book builds on the strengths of the earlier editions, and provides new material to reflect the changes in the XML landscape, notably XQuery, RSS and Atom, and Ajax. Updates have been made to reflect the most recent versions of specifications and best practices throughout the book. In addition to the many changes, each chapter has a set of exercise questions to test your understanding of the material. Possible solutions to these questions appear in Appendix A.

0.3.1. Part I: Introduction

The introduction is where most readers should begin. The first three chapters introduce some of the goals of XML as well as the specific rules for constructing XML. Once you have read this part you should be able to read and create your own XML documents.

0.3.1.1. Chapter 1: What Is XML?

Here we cover some basic concepts, introducing the fact that XML is a markup language (a bit like HTML) whereby you can define your own elements, tags, and attributes (known as a vocabulary). You'll see that tags have no presentation meaning—they're just a way to describe the structure of the data.

0.3.1.2. Chapter 2: Well-Formed XML

In addition to explaining what well-formed XML is, we offer a look at the rules that exist (the XML 1.0 and 1.1 Recommendations) for naming and structuring elements—you need to comply with these rules in order to produce well-formed XML.
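As a rough illustration of what "well formed" means in practice, the following Python sketch asks a standard-library parser whether a string parses at all. The helper name `is_well_formed` and the sample documents are assumptions made for this example; they do not come from the book, and the parser only checks well-formedness, not validity.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents.
GOOD = "<order><item sku='A1'>Widget</item></order>"
BAD = "<order><item>Widget</order>"  # <item> is never closed

print(is_well_formed(GOOD))  # True
print(is_well_formed(BAD))   # False
```

A parser rejects the second document outright because its tags are not properly nested, which is the kind of rule Chapter 2 covers.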

0.3.1.3. Chapter 3: XML Namespaces

Because tags can be made up, you need to avoid name conflicts when sharing documents. Namespaces provide a way to uniquely identify a group of tags, using a URI. This chapter explains how to use namespaces.
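To make the name-conflict problem concrete, here is a small Python sketch in which two vocabularies both define a `table` element. The namespace URIs and the `bk` and `fn` prefixes are invented for illustration only.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies that both use a <table> element.
DOC = """\
<catalog xmlns:bk="http://example.com/ns/books"
         xmlns:fn="http://example.com/ns/furniture">
  <bk:table>Table of contents</bk:table>
  <fn:table>Oak dining table</fn:table>
</catalog>"""

# Prefix-to-URI mapping used only for the lookups below.
NS = {
    "bk": "http://example.com/ns/books",
    "fn": "http://example.com/ns/furniture",
}

root = ET.fromstring(DOC)
book_table = root.find("bk:table", NS)
furniture_table = root.find("fn:table", NS)

# ElementTree stores the namespace URI as part of the tag name.
print(book_table.tag)  # {http://example.com/ns/books}table
print(furniture_table.text)
```

The prefix is only shorthand; it is the URI that distinguishes one `table` from the other.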

0.3.2. Part II: Validation

In addition to the well-formedness rules you learn in Part I, you will most likely want to learn how to create and use different XML vocabularies. This Part introduces you to DTDs, XML Schemas, and RELAX NG: three languages that define custom XML vocabularies. We also show you how to utilize these definitions to validate your XML documents.

0.3.2.1. Chapter 4: Document Type Definitions

You can specify how an XML document should be structured, and even provide default values, using Document Type Definitions (DTDs). If XML conforms to the associated DTD, it is known as valid XML. This chapter covers the basics of using DTDs.

0.3.2.2. Chapter 5: XML Schemas

XML Schemas, like DTDs, enable you to define how a document should be structured. In addition to defining document structure, they enable you to specify the individual datatypes of attribute values and element content. They are a more powerful alternative to DTDs.

0.3.2.3. Chapter 6: RELAX NG

RELAX NG is a third technology used to define the structure of documents. In addition to a new syntax and new features, it takes the best from XML Schemas and DTDs, and is therefore very simple and very powerful. RELAX NG has two syntaxes; both the full syntax and compact syntax are discussed.

0.3.3. Part III: Processing

In addition to defining and creating XML documents, you need to know how to work with documents to extract information and convert it to other formats. In fact, easily extracting information and converting it to other formats is what makes XML so powerful.

0.3.3.1. Chapter 7: XPath

The XPath language is used to locate sections and data in the XML document, and it's important in many other XML technologies.
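For a first taste of locating data with path expressions, the sketch below uses Python's `xml.etree.ElementTree`, which supports only a limited subset of XPath. The sample document is hypothetical, and a full XPath processor offers far more than is shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = """\
<library>
  <book year="2007"><title>Beginning XML</title></book>
  <book year="1999"><title>Older Title</title></book>
</library>"""

root = ET.fromstring(DOC)

# Select every <title> that is a child of a <book> under the root.
all_titles = [t.text for t in root.findall("./book/title")]

# Use a predicate to keep only books whose year attribute is 2007.
titles_2007 = [t.text for t in root.findall("./book[@year='2007']/title")]

print(all_titles)
print(titles_2007)
```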

0.3.3.2. Chapter 8: XSLT

XML can be transformed into other XML documents, HTML, and other formats using XSLT stylesheets, which are introduced in this chapter.

0.3.4. Part IV: Databases

Creating and processing XML documents is good, but eventually you will want to store those documents. This section describes strategies for storing and retrieving XML documents and document fragments from different databases.

0.3.4.1. Chapter 9: XQuery, the XML Query Language

Very often, you will need to retrieve information from within a database. XQuery, which is built on XPath (in particular XPath 2.0), enables you to do this in an elegant way.

0.3.4.2. Chapter 10: XML and Databases

XML is perfect for structuring data, and some traditional databases are beginning to offer support for XML. This chapter discusses these, and provides a general overview of how XML can be used in an n-tier architecture. In addition, new databases based on XML are introduced.

0.3.5. Part V: Programming

At some point in your XML career, you will need to work with an XML document from within a custom application. The two most popular methodologies, the Document Object Model (DOM) and the Simple API for XML (SAX), are explained in this part.

0.3.5.1. Chapter 11: The Document Object Model (DOM)

Programmers can use a variety of programming languages to manipulate XML using the Document Object Model's objects, interfaces, methods, and properties, which are described in this chapter.
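As one possible illustration of the DOM's objects, methods, and properties, this Python sketch uses the standard library's `xml.dom.minidom`, a lightweight DOM implementation. The document content is made up for the example, and other languages expose the same interfaces with their own naming conventions.

```python
from xml.dom.minidom import parseString

# Parse a small, hypothetical document into a DOM tree.
doc = parseString("<greeting lang='en'>Hello</greeting>")
root = doc.documentElement

# Read data through DOM properties and methods.
print(root.tagName, root.getAttribute("lang"), root.firstChild.data)

# Create a new element with a text child and attach it to the tree.
note = doc.createElement("note")
note.appendChild(doc.createTextNode("added via DOM"))
root.appendChild(note)

serialized = root.toxml()
print(serialized)
```

The whole document is held in memory as a tree, which is what makes it possible to modify it in place before serializing it again.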

0.3.5.2. Chapter 12: Simple API for XML (SAX)

An alternative to the DOM for programmatically manipulating XML data is to use the Simple API for XML (SAX) as an interface. This chapter shows how to use SAX and utilizes examples from the Java SAX API.
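The chapter itself works with the Java SAX API; purely to show the event-driven style, the sketch below swaps in Python's `xml.sax` module instead. The `TitleCollector` class and the sample document are assumptions made for this illustration.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collect the text of every <title> element as parse events arrive."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


# Hypothetical sample document.
DOC = (
    b"<library>"
    b"<book><title>Beginning XML</title></book>"
    b"<book><title>Another Book</title></book>"
    b"</library>"
)

handler = TitleCollector()
xml.sax.parseString(DOC, handler)
print(handler.titles)
```

Unlike the DOM, nothing here builds a tree: the handler sees each start tag, run of text, and end tag once, in document order, and keeps only what it needs.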

0.3.6. Part VI: Communication

Sending and receiving data from one computer to another is often difficult, but several technologies have been created to make communication with XML much easier. In this part we discuss RSS and content syndication, as well as web services and SOAP. This edition includes a new chapter on Ajax techniques.

0.3.6.1. Chapter 13: RSS, Atom, and Content Syndication

RSS is an actively evolving technology that is used to publish syndicated news stories and website summaries on the Internet. This chapter not only discusses how to use the different versions of RSS and Atom, it also covers the future direction of the technology. In addition, we demonstrate how to create a simple newsreader application that works with any of the currently published versions.
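To give a sense of what a newsreader has to do, here is a minimal Python sketch that pulls headlines out of an RSS 2.0 document. The feed content, the URLs, and the `read_headlines` helper are all hypothetical; this is not the application built in the chapter, and it handles neither Atom nor the other RSS versions.

```python
import xml.etree.ElementTree as ET

# A hypothetical RSS 2.0 feed, embedded as a string for illustration.
FEED = """\
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.com/</link>
    <description>Hypothetical feed used for illustration</description>
    <item><title>First story</title><link>http://example.com/1</link></item>
    <item><title>Second story</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""


def read_headlines(feed_text):
    """Return (title, link) pairs for each <item> in an RSS 2.0 channel."""
    channel = ET.fromstring(feed_text).find("channel")
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]


headlines = read_headlines(FEED)
for title, link in headlines:
    print(title, link)
```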

0.3.6.2. Chapter 14: Web Services

Web services enable you to perform cross-computer communications. This chapter describes web services and introduces you to using remote procedure calls in XML (using XML-RPC and REST), as well as giving you a brief look at major topics such as SOAP. Finally, it breaks down the assortment of specifications designed to work in conjunction with web services.

0.3.6.3. Chapter 15: SOAP and WSDL

Fundamental to XML web services, the Simple Object Access Protocol (SOAP) is one of the most popular specifications for allowing cross-computer communications. Using SOAP, you can package up XML documents and send them across the Internet to be processed. This chapter explains SOAP and the Web Services Description Language (WSDL) that is used to publish your service.

0.3.6.4. Chapter 16: Ajax

Ajax enables you to utilize JavaScript with web services and SOAP, or REST communications. Additionally, Ajax patterns can be used within web pages to communicate with the web server without refreshing. This chapter is new to the Fourth Edition.

0.3.7. Part VII: Display

Several XML technologies are devoted to displaying the data stored inside of an XML document. Some of these technologies are web-based, and some are designed for applications and mobile devices. In this part we discuss the primary display strategies and formats used today.

0.3.7.1. Chapter 17: Cascading Style Sheets (CSS)

Website designers have long been using Cascading Style Sheets (CSS) with their HTML to easily make changes to a website's presentation without having to touch the underlying HTML documents. This power is also available for XML, enabling you to display XML documents right in the browser. Or, if you need a bit more flexibility with your presentation, you can use XSLT to transform your XML to HTML or XHTML and then use CSS to style these documents.

0.3.7.2. Chapter 18: XHTML

XHTML is a new version of HTML that follows the rules of XML. In this chapter we discuss the differences between HTML and XHTML, and show you how XHTML can help make your sites available to a wider variety of browsers, from legacy browsers to the latest browsers on mobile phones.

0.3.7.3. Chapter 19: Scalable Vector Graphics (SVG)

Do you want to produce a custom graphic using XML? SVG enables you to describe a graphic using XML-based vector commands. In this chapter we teach you the basics of SVG and then dive into a more complex SVG-based application that can be published to the Internet.
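Because SVG is itself XML, a graphic can be generated with any XML toolkit. The following Python sketch builds a single circle; the dimensions and fill color are arbitrary choices made for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize the SVG namespace as the default namespace (no prefix).
ET.register_namespace("", SVG_NS)

# Root <svg> element with an arbitrary canvas size.
svg = ET.Element(f"{{{SVG_NS}}}svg", {"width": "120", "height": "120"})

# A circle described by its center (cx, cy) and radius r.
ET.SubElement(
    svg,
    f"{{{SVG_NS}}}circle",
    {"cx": "60", "cy": "60", "r": "50", "fill": "steelblue"},
)

markup = ET.tostring(svg, encoding="unicode")
print(markup)
```

Saving the printed markup to a file with an `.svg` extension should let an SVG-capable browser render it.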

0.3.7.4. Chapter 20: XForms

XForms are XML-based forms that can be used to design desktop applications, paper-based forms, and of course XHTML-based forms. In this chapter we demonstrate both the basics and some of the more interesting uses of XForms.

0.3.8. Part VIII: Case Study

Throughout the book you'll gain an understanding of how XML is used in web, business-to-business (B2B), data storage, and many other applications. The case study covers an example application and shows how the theory can be put into practice in real-life situations. The case study is new to this edition.

0.3.8.1. Chapter 21: Case Study: Payment Calculator

This case study explores some of the possibilities and strategies for using XML in your website. It includes an example that demonstrates a loan payment calculator by creating a web page using XHTML and CSS, communicating with a local web service using Ajax, utilizing an XML Schema to build data structures in .NET, and ultimately using the Document Object Model to display the results in SVG. An online version of this case study on the book's website covers the same material using Ruby on Rails instead of .NET.

0.3.9. Appendixes

Appendix A provides answers to the exercise questions that appear throughout the book. The remaining appendixes provide reference material that you may find useful as you begin to apply the knowledge gained throughout the book in your own applications.

The appendixes consist of the following:

Appendixes A, B, and C are included within the book; Appendixes D through G are available on the book's website.

0.4. What You Need to Use This Book

Because XML is a text-based technology, all you really need to create XML documents is Notepad or an equivalent text editor. However, to truly appreciate some of these samples in action, you might want to have a current Internet browser that can natively read XML documents, and even provide error messages if something is wrong. In any case, screenshots are provided throughout the book so that you can see what things should look like. Additionally, note the following:

  • If you do have Internet Explorer, you also have an implementation of the DOM, which you may find useful in the chapters on that subject.

  • Some of the examples and the case studies require access to a web server, such as Microsoft's IIS (or PWS) or Apache.

  • Throughout the book, other (freely available) XML tools are used, and we give instructions for obtaining these.

Within the validation section of the book we provide instructions on how to use Codeplot (http://codeplot.com). Codeplot is an online collaborative code editor with support for a wide assortment of XML technologies. Because many validation tools require programming experience or large downloads, the examples in this section instead use Codeplot. Codeplot can also be used to check the well-formedness of your XML documents, to transform XML documents using XSLT, and to assist you in coding XHTML, CSS, and SVG. The editor is free and was built using many of the techniques described in this book.

0.4.1. Programming Languages

We have tried to demonstrate the ubiquity of XML throughout the book. Some of the examples are specific to Windows, but most of the examples include information on working with other platforms, such as Linux. Many of the samples were rewritten in this edition to enable you to use any operating system or web browser.

Additionally, we have attempted to show the use of XML in a variety of programming languages, including Java, JavaScript, PHP, Python, Visual Basic, ASP, C#, and Ruby on Rails. Therefore, while there is a good chance that you will see an example written in your favorite programming language, there is also a good chance you will encounter an example in a language you have never used. Whenever a new language is introduced, we include information on downloading and installing the necessary tools to use it. Because our focus is XML, regardless of which programming language is used in an example, the core XML concept is explained in detail.

0.5. Conventions

To help you get the most from the text and keep track of what's happening, we've used several conventions throughout the book.

Try It Out

The Try It Out is an exercise you should work through, following the text in the book.

  1. They usually consist of a set of steps.

  2. Each step has a number.

  3. Follow the steps with your copy of the database.


0.5.1. How It Works

After each Try It Out, the code is explained in detail.

NOTE

Boxes like this one hold important, not-to-be-forgotten information that is directly relevant to the surrounding text.

Tips, hints, tricks, and asides to the current discussion are offset and placed in italics like this.

As for styles in the text:

  • We highlight new terms and important words when we introduce them.

  • We show filenames, URLs, and code within the text like so: persistence.properties.

  • We present code in two different ways:

    In code examples we highlight new and important code with a gray background.
    The gray highlighting is not used for code that's less important in the present context, or has been shown before.

0.6. Source Code

As you work through the examples in this book, you may choose either to type in all the code manually or to use the source code files that accompany the book. All of the source code used in this book is available for download at www.wrox.com. Once at the site, simply locate the book's title (either by using the Search box or by using one of the title lists) and click the Download Code link on the book's detail page to obtain all the source code for the book.

Because many books have similar titles, you may find it easiest to search by ISBN; this book's ISBN is 978-0-470-11487-2.

Once you download the code, just decompress it with your favorite compression tool. Alternatively, you can go to the main Wrox code download page at www.wrox.com/dynamic/books/download.aspx to see the code available for this book and all other Wrox books.

0.7. Errata

We make every effort to ensure that there are no errors in the text or in the code. However, no one is perfect, and mistakes do occur. If you find an error in one of our books, such as a spelling mistake or a faulty piece of code, we would be very grateful for your feedback. By sending in errata you may save another reader hours of frustration, and at the same time you will be helping us provide even higher quality information.

To find the errata page for this book, go to www.wrox.com and locate the title using the Search box or one of the title lists. Then, on the book details page, click the Book Errata link. On this page you can view all errata that have been submitted for this book and posted by Wrox editors. A complete book list, including links to each book's errata, is also available at www.wrox.com/misc-pages/booklist.shtml.

If you don't spot "your" error on the Book Errata page, go to www.wrox.com/contact/techsupport.shtml and complete the form there to send us the error you have found. We'll check the information and, if appropriate, post a message to the book's errata page and fix the problem in subsequent editions of the book.

0.8. p2p.wrox.com

For author and peer discussion, join the P2P forums at p2p.wrox.com. The forums are a web-based system for you to post messages relating to Wrox books and related technologies and interact with other readers and technology users. The forums offer a subscription feature to e-mail you topics of interest of your choosing when new posts are made to the forums. Wrox authors, editors, other industry experts, and your fellow readers are present on these forums.

At http://p2p.wrox.com you will find a number of different forums that will help you not only as you read this book, but also as you develop your own applications. To join the forums, just follow these steps:

  1. Go to p2p.wrox.com and click the Register link.

  2. Read the terms of use and click Agree.

  3. Complete the required information to join as well as any optional information you wish to provide and click Submit.

  4. You will receive an e-mail with information describing how to verify your account and complete the joining process.

You can read messages in the forums without joining P2P, but in order to post your own messages, you must join.

Once you join, you can post new messages and respond to messages other users post. You can read messages at any time on the Web. If you would like to have new messages from a particular forum e-mailed to you, click the Subscribe to this Forum icon by the forum name in the forum listing.

For more information about how to use the Wrox P2P, be sure to read the P2P FAQs for answers to questions about how the forum software works as well as many common questions specific to P2P and Wrox books. To read the FAQs, click the FAQ link on any P2P page.