1.4. XML Terminology

We have discussed the origins, need, and relevance of XML. Now, let us dig a bit deeper into the XML terminology that we need to be familiar with. The simplest way to do this is to actually take a look at an XML document and then study its various parts. We will use the XML file shown in Figure 1.22 for our discussion.

Figure 1.22. Sample XML document
          <?xml version="1.0"?>
          <?xml-stylesheet type="text/xsl" href="books_list.xsl"?>

          <BOOKS>
               <BOOK pubyear="1929">
                    <BOOK_TITLE>Look Homeward, Angel</BOOK_TITLE>
                    <AUTHOR>Wolfe, Thomas</AUTHOR>
               </BOOK>
               <BOOK pubyear="1973">
                    <BOOK_TITLE>Gravity's Rainbow</BOOK_TITLE>
                    <AUTHOR>Pynchon, Thomas</AUTHOR>
               </BOOK>
               <BOOK pubyear="1977">
                    <BOOK_TITLE>Cards as Weapons</BOOK_TITLE>
                    <AUTHOR>Jay, Ricky</AUTHOR>
               </BOOK>
               <BOOK pubyear="2001">
                    <BOOK_TITLE>Computer Networks</BOOK_TITLE>
                    <AUTHOR>Tanenbaum, Andrew</AUTHOR>
               </BOOK>
          </BOOKS>

By convention, an XML file has the extension .xml. Let us call the above file books.xml. As we can see, the file contains information organised in a hierarchical manner, along with some strange-looking symbols. Let us work through this example step by step. In the process, we will start getting familiar with XML syntax and terminology.
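
To make the idea of a hierarchy concrete, the following minimal sketch reads a shortened copy of books.xml with Python's standard xml.etree.ElementTree module and prints one indented line per element. The helper name outline is our own choice for this illustration, and the sample assumes the complete form of the file, with the BOOKS element enclosing every BOOK.

```python
import xml.etree.ElementTree as ET

# Assumed complete form of books.xml (only two of the four books, for brevity).
BOOKS_XML = """<?xml version="1.0"?>
<BOOKS>
    <BOOK pubyear="1929">
        <BOOK_TITLE>Look Homeward, Angel</BOOK_TITLE>
        <AUTHOR>Wolfe, Thomas</AUTHOR>
    </BOOK>
    <BOOK pubyear="1973">
        <BOOK_TITLE>Gravity's Rainbow</BOOK_TITLE>
        <AUTHOR>Pynchon, Thomas</AUTHOR>
    </BOOK>
</BOOKS>
"""


def outline(element, depth=0):
    """Return one indented line per element, mirroring the nesting."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(BOOKS_XML)
print("\n".join(outline(root)))
```

The deeper an element sits in the file, the further its name is indented in the printed outline, which is exactly the hierarchy we see in the XML itself.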

Figure 1.23 shows a short pictorial explanation of this XML document. A detailed explanation is provided in Table 1.1.

Figure 1.23. Terminology in XML – High-level overview

Table 1.1. XML example described
Contents of the XML file        Description
<?xml version="1.0"?>
 This line, the XML declaration, identifies the file as an XML document. The declaration is optional in XML 1.0, but when it is present it must be the very first thing in the document, and including it is good practice. Note that the text is delimited by the opening symbol <? and the closing symbol ?>. We shall soon see that, in general, XML markup is delimited by the symbol pair < and >. However, some special constructs, including the XML declaration shown here, use a slightly different symbol pair (i.e. <? and ?>). Regardless, every piece of markup in an XML file has an opening symbol and a closing symbol.
<?xml-stylesheet type="text/xsl" href="books_list.xsl"?>
 Note that this line also uses the symbol pair <? and ?>. This is a stylesheet declaration, which we shall ignore for the moment. It has no direct relevance to the content of the XML document, and we will discuss the concept later in the book. For now, the point is that apart from the XML declaration, an XML file can also contain other declarations, such as the one shown here.
<BOOKS>
 This line marks the start of the actual contents of the XML file. Note that the word BOOKS is delimited by the symbols < and >. In XML, this whole text (i.e. <BOOKS>) is called an element or a tag.
 Thus, an element or a tag in XML consists of the following parts:
 < is the start indicator for an element
 BOOKS is the name of the element (BOOKS is just an example)
 > is the end indicator for an element
 Similarly, some of the other elements in this file are <BOOK>, <BOOK_TITLE>, and <AUTHOR>.
 Also, the first element in an XML file is called the root element or the root tag. Thus, <BOOKS> is the root element of this XML file. Every XML file must have exactly one root element.
<BOOK pubyear="1929">
 We should now be able to recognise this as another element, named BOOK. Like the previous element, it has a start indicator (<), followed by an element name (BOOK), followed by some other text (pubyear="1929"), ending with the end indicator (>).
 The other text, i.e. pubyear="1929", is called an attribute in XML. An attribute provides more information about an element. For example, here, the attribute tells us that the book being described was published in 1929. An attribute consists of two portions, the attribute name and the attribute value. In this case, we have:
 pubyear as the attribute name, and
 1929 as the attribute value
<BOOK_TITLE>Look Homeward, Angel</BOOK_TITLE>
 This is another element declaration. The name of the element is BOOK_TITLE, enclosed, as before, inside the start indicator (<) and the end indicator (>). However, this declaration of <BOOK_TITLE> is followed by some other text, namely Look Homeward, Angel</BOOK_TITLE>. What is this text about?
 Look Homeward, Angel is the element value
 </BOOK_TITLE> indicates the end of the element declaration
 This may sound confusing, and it raises the following questions:
  1. Why did we not have the end of the element declaration for the previous elements (i.e. for <BOOKS> and <BOOK>)? Well, every element in an XML file must have an end element declaration. That is, the <BOOKS> and <BOOK> elements also have their corresponding end element declarations. Look for the </BOOKS> and </BOOK> elements in the XML document. The only question that remains is why they do not immediately follow the element declarations, i.e. why there are a number of other things between <BOOKS> and </BOOKS>, and between <BOOK> and </BOOK>. This is exactly where the point of arranging information in a hierarchical manner comes into the picture. We wish to include all our book details inside the <BOOKS> and </BOOKS> tags. Within this, we want each individual book to be described under its own <BOOK> and </BOOK> tags. This is a hierarchy of information, and it is expressed by placing everything inside the <BOOKS> and </BOOKS> tags, and each individual book inside its own <BOOK> and </BOOK> tags.

  2. Why did the previous element (i.e. <BOOK>) not have an element value, whereas this one does? Well, an element may or may not have an element value. The previous two elements (<BOOKS> and <BOOK>) contain other elements rather than a value of their own, but this one has a value.

  3. What about attributes? The previous element (i.e. <BOOK>) had an attribute called pubyear with an attribute value of 1929. Well, like element values, attributes (and therefore attribute values) are optional. The previous element had an attribute, but the current element does not. This is perfectly acceptable.

<AUTHOR>Wolfe, Thomas</AUTHOR>
 This element should need no further explanation. It is simply the second sub-element under the first <BOOK> element. It has an element value, but no attribute. There is nothing special about this declaration.
</BOOK>
 This declaration indicates the end of the first <BOOK> element. Thus, whatever follows is no longer a part of this <BOOK> element. Instead, it is a part of the <BOOKS> element.

Incidentally, what would be a part of the <BOOK> element? Quite clearly, whatever falls within the range of the <BOOK> and </BOOK> elements. That is, in this case, it would consist of the two tags shown below:

<BOOK_TITLE>Look Homeward, Angel</BOOK_TITLE>

<AUTHOR>Wolfe, Thomas</AUTHOR>
Remaining tags
 We will not describe the remaining tags and elements, since they are quite similar to what we have discussed here.
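
The terms described in Table 1.1 can also be observed programmatically. The sketch below is only an illustration, not a part of the XML file itself: it parses the first book of our sample with Python's standard xml.etree.ElementTree module and collects one example of each term. The dictionary name terms and its keys are our own labels.

```python
import xml.etree.ElementTree as ET

# First book from the sample, wrapped in the BOOKS root element.
SAMPLE = """<BOOKS>
    <BOOK pubyear="1929">
        <BOOK_TITLE>Look Homeward, Angel</BOOK_TITLE>
        <AUTHOR>Wolfe, Thomas</AUTHOR>
    </BOOK>
</BOOKS>"""

root = ET.fromstring(SAMPLE)       # the root element
book = root[0]                     # first element inside the root
title = book.find("BOOK_TITLE")    # a sub-element of BOOK

terms = {
    "root element name": root.tag,
    "element name": book.tag,
    "attribute name": next(iter(book.attrib)),
    "attribute value": book.get("pubyear"),
    "element value": title.text,
    "sub-elements of BOOK": [child.tag for child in book],
    "attributes of BOOK_TITLE": dict(title.attrib),
}

for term, value in terms.items():
    print(f"{term}: {value!r}")
```

Note that BOOK_TITLE reports an empty set of attributes, which matches the observation that attributes are optional.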

As we can notice, some of the key terms that have been introduced here are: XML tag, element (composed of element name and element value), attribute (composed of attribute name and attribute value), and root element. Some of the other terms are start element indicator and end element indicator. Let us now understand their meanings.

At this stage, we should be familiar with the basic XML terminology. In case we are not, it is suggested that we reread the example and its description until it is clear. This is because the rest of the discussion will assume that we have a good understanding of these terms.

We will now have some exercises to refresh what we have learnt so far.

Exercise 1: Create an XML document template to describe the result of students in an examination. The description should include the student’s roll number, name, three subject names and marks, total marks, percentage, and result.

Solution 1 (a): This can be done in more than one way. The following is one possible way.

   <?xml version="1.0"?>
   <exam_result>
      <roll_number> … </roll_number>
      <student_name> … </student_name>
      <subject_1>
             <subject_1_name> … </subject_1_name>
             <subject_1_marks> … </subject_1_marks>
      </subject_1>
      <subject_2>
             <subject_2_name> … </subject_2_name>
             <subject_2_marks> … </subject_2_marks>
      </subject_2>
      <subject_3>
             <subject_3_name> … </subject_3_name>
             <subject_3_marks> … </subject_3_marks>
      </subject_3>
      <total_marks> … </total_marks>
      <percentage> … </percentage>
      <result> … </result>
   </exam_result>

Note that Solution 1(a) provides an elegant way of providing a template (i.e., structure) for constructing an XML message to store examination results. This could have been done in another manner, as shown in Solution 1(b).

Solution 1 (b): This solution offers another way to describe the XML message for examination results. It does not break down the hierarchy to the lowest possible level. That is, the subject names and marks sit at the same level as the other details of the student, which is not a great approach.

   <?xml version="1.0"?>
   <exam_result>
      <roll_number> … </roll_number>
      <student_name> … </student_name>
      <subject_1_name> … </subject_1_name>
      <subject_1_marks> … </subject_1_marks>
      <subject_2_name> … </subject_2_name>
      <subject_2_marks> … </subject_2_marks>
      <subject_3_name> … </subject_3_name>
      <subject_3_marks> … </subject_3_marks>
      <total_marks> … </total_marks>
      <percentage> … </percentage>
      <result> … </result>
   </exam_result>

Notice that we have got rid of the elements that start and end the description of a particular subject, i.e. tags such as <subject_1> and </subject_1>. This is generally not advisable, because nothing in the structure then ties a subject's name to its marks.
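
One way to see the practical difference between the two solutions is to read the subjects back out of each. The sketch below, in Python with the standard xml.etree.ElementTree module, is only an illustration under stated assumptions: the subject names, marks, and roll number are made-up sample values, and the function names subjects_nested and subjects_flat are our own.

```python
import xml.etree.ElementTree as ET

# Made-up sample values; only the element names come from Solution 1.
NESTED = """<exam_result>
    <roll_number>101</roll_number>
    <subject_1>
        <subject_1_name>Physics</subject_1_name>
        <subject_1_marks>70</subject_1_marks>
    </subject_1>
    <subject_2>
        <subject_2_name>Chemistry</subject_2_name>
        <subject_2_marks>80</subject_2_marks>
    </subject_2>
    <subject_3>
        <subject_3_name>Mathematics</subject_3_name>
        <subject_3_marks>90</subject_3_marks>
    </subject_3>
</exam_result>"""

FLAT = """<exam_result>
    <roll_number>101</roll_number>
    <subject_1_name>Physics</subject_1_name>
    <subject_1_marks>70</subject_1_marks>
    <subject_2_name>Chemistry</subject_2_name>
    <subject_2_marks>80</subject_2_marks>
    <subject_3_name>Mathematics</subject_3_name>
    <subject_3_marks>90</subject_3_marks>
</exam_result>"""


def subjects_nested(root):
    """Each subject is one element that holds its own name and marks."""
    pairs = []
    for child in root:
        if len(child) == 2:  # a subject wrapper with two sub-elements
            pairs.append((child[0].text, int(child[1].text)))
    return pairs


def subjects_flat(root, count=3):
    """Name and marks are loose siblings, so we rebuild each tag name."""
    pairs = []
    for i in range(1, count + 1):
        name = root.find(f"subject_{i}_name")
        marks = root.find(f"subject_{i}_marks")
        pairs.append((name.text, int(marks.text)))
    return pairs


nested_pairs = subjects_nested(ET.fromstring(NESTED))
flat_pairs = subjects_flat(ET.fromstring(FLAT))
print(nested_pairs == flat_pairs, sum(marks for _, marks in nested_pairs))
```

Both functions recover the same data, but the reader of the flat form must be told how many subjects there are and must reconstruct every tag name, whereas the reader of the nested form simply walks the structure.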

Let us now have an exercise to recap the XML terminology that we studied earlier.

Exercise 2: With reference to Solution 1 (a), describe the various XML terms found there.

Solution 2: The XML terminology with reference to Solution 1 (a) is as follows.

Sr No   XML term                 Example
1       XML document indicator   <?xml version="1.0"?>
2       Root element             <exam_result>
3       Element                  <roll_number> … </roll_number>
4       Element name             roll_number
5       Element end indicator    </roll_number>

Note that our example does not have any attributes.

To better understand the concepts learned so far, let us consider a few more XML examples, as shown in the following exercises.

Exercise 3: Suppose we want to store information regarding employees in the following format in XML. Show such a file with one example:

   Employee ID           Numeric             5 positions
   Employee Name         Alphanumeric        30 positions
   Employee Department   Alphanumeric        2 positions
   Role                  Alphanumeric        20 positions
   Manager               Alphanumeric        30 positions

Solution 3:

   <?xml version="1.0"?>
   <EMPLOYEE>
            <EMP_ID> … </EMP_ID>
            <EMP_NAME>Atul Kahate</EMP_NAME>
            <EMP_DEPT> … </EMP_DEPT>
            <ROLE>Project Manager</ROLE>
            <MANAGER>S Ketharaman</MANAGER>
   </EMPLOYEE>
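
The exercise also states a type and a size for every field, which XML by itself does not enforce. As a rough sketch of how an application might check these limits before producing the file, consider the following Python code, which uses the standard xml.etree.ElementTree module. Several things here are assumptions made purely for illustration: the root element name EMPLOYEE, the element names EMP_ID and EMP_DEPT, the sample ID and department code, and the reading of "positions" as a maximum length.

```python
import xml.etree.ElementTree as ET

# (element name, kind, maximum positions), following the exercise table.
# EMPLOYEE, EMP_ID and EMP_DEPT are assumed names for this sketch.
FIELDS = [
    ("EMP_ID", "numeric", 5),
    ("EMP_NAME", "alphanumeric", 30),
    ("EMP_DEPT", "alphanumeric", 2),
    ("ROLE", "alphanumeric", 20),
    ("MANAGER", "alphanumeric", 30),
]


def build_employee(values):
    """Build an EMPLOYEE element after checking each value against FIELDS."""
    root = ET.Element("EMPLOYEE")
    for name, kind, width in FIELDS:
        value = values[name]
        if len(value) > width:
            raise ValueError(f"{name} exceeds {width} positions")
        if kind == "numeric" and not value.isdigit():
            raise ValueError(f"{name} must be numeric")
        ET.SubElement(root, name).text = value
    return root


employee = build_employee({
    "EMP_ID": "12345",          # hypothetical employee ID
    "EMP_NAME": "Atul Kahate",
    "EMP_DEPT": "IT",           # hypothetical department code
    "ROLE": "Project Manager",
    "MANAGER": "S Ketharaman",
})
print(ET.tostring(employee, encoding="unicode"))
```

A value that breaks a rule, such as a six-digit employee ID, is rejected with a ValueError before any XML is written.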

Exercise 4: Suppose our banking application allows the user to perform an online funds transfer. This application generates an XML message, which needs to be sent to the database for actual updates. Create such a sample message, containing the following details:

   Transaction reference number    Numeric    10 positions
   From account                    Numeric    12 positions
   To account                      Numeric    12 positions
   Amount                          Numeric    5 positions (No
                                              fractions are allowed)
   Date and time                   Numeric    Timestamp field

Solution 4:

   <?xml version="1.0"?>
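
As a rough illustration of how a banking application might generate such a message, the following Python sketch builds one with the standard xml.etree.ElementTree module. Everything specific in it is an assumption made for this example and not a prescribed format: the element names (FUNDS_TRANSFER, REFERENCE_NUMBER, FROM_ACCOUNT, TO_ACCOUNT, AMOUNT, TIMESTAMP), the function name build_transfer, the sample account numbers, and the reading of "positions" as a maximum length.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def build_transfer(reference, from_account, to_account, amount, when):
    """Return a funds-transfer message as an XML string (hypothetical layout)."""
    checks = [
        ("reference", reference, 10),
        ("from account", from_account, 12),
        ("to account", to_account, 12),
        ("amount", str(amount), 5),  # digits only, so no fractions or signs
    ]
    for label, value, width in checks:
        if not value.isdigit() or len(value) > width:
            raise ValueError(
                f"{label} must be numeric, at most {width} positions"
            )

    root = ET.Element("FUNDS_TRANSFER")
    ET.SubElement(root, "REFERENCE_NUMBER").text = reference
    ET.SubElement(root, "FROM_ACCOUNT").text = from_account
    ET.SubElement(root, "TO_ACCOUNT").text = to_account
    ET.SubElement(root, "AMOUNT").text = str(amount)
    ET.SubElement(root, "TIMESTAMP").text = when.isoformat()
    return ET.tostring(root, encoding="unicode")


# Made-up sample values.
message = build_transfer(
    "1000000001",
    "111122223333",
    "444455556666",
    1500,
    datetime(2024, 1, 15, 10, 30, 0),
)
print(message)
```

Because the amount is checked as a string of digits, a fractional amount such as 15.5 is rejected, in line with the requirement that no fractions are allowed.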

As we can see, XML can be used in a variety of situations to represent any kind of data. It need not be restricted to a particular domain, technology, or application. It can be used universally.

We will study a lot more about the various aspects of XML and its terminology later.
