4.3. RDF Serialization Formats

While the data model that RDF uses is very simple, the serialized representation tends to get complicated when an RDF graph is saved to a file or sent over a network because of the various methods used to compact the data while still leaving it readable. These compaction mechanisms generally take the form of shortcuts that identify multiple references to a graph node using a shared but complex structure.

The good news is that you really don’t have to worry about the complexities of the serialization formats, as there are open source RDF libraries for just about every modern programming language that handle them for you. (We believe the corollary is also true: if you are thinking about the serialization format and you aren’t in the business of writing an RDF library, then you should probably find a better library.) Because of this we won’t go into too much detail, but it is important to understand the basics so you can use the most appropriate format and debug the output.

We’ll be covering four serialization formats here: N-Triples, the simplest of notations; N3, a compaction of the N-Triple format; RDF/XML, one of the most frequently used serialization formats; and finally, “RDF in attributes” (known as RDFa), which can be embedded in other serialization formats such as XHTML.

NOTE

There are many books and online resources that cover these output formats in great detail. If you are interested in reading further about them, you can look at the complete RDF specification at http://www.w3.org/RDF/ or in the O’Reilly book Practical RDF by Shelley Powers.

4.3.1. A Graph of Friends

In order to compare the different serialization formats, let’s first build a simple graph that we can use throughout the examples to observe how the various serializations fold relationships together.

For this example graph, we’ll model a small part of Toby’s social sphere—in particular, how he knows the other authors of this book. In our graph we will not only include information about the people Toby knows, but we’ll also describe other relationships that can be used to uniquely identify Toby. This will include things like the home page of his blog, his email address, his interests, and any other names he might use.

These clues about Toby’s identity are important to help differentiate “our Toby” from the numerous other Tobys in the world. Human names are hardly unique, but by providing a collection of attributes about the person, hopefully we can pinpoint the individual of interest and obtain a strong identifier (URI) that we can use for future interaction.

As you might have discerned, the network of social relationships that people have with one another naturally lends itself to a graphical representation. So it is probably no surprise that machine-readable graphs of friends have coevolved with RDF, making social graphs one of the most widely available RDF datasets on the public Internet. Over time, the relationships expressed in these social graphs have settled into a collection of well-known predicates, forming a vocabulary of expression known as “Friend of a Friend” or simply FOAF.

Not surprisingly, the core FOAF vocabulary—the set of predicates—has been adopted and extended to describe a number of common “things” available on the Internet. For instance, FOAF provides a predicate for identifying photographs that portray the subject of the statement. While FOAF deals primarily with people, the formal definition of FOAF states that the foaf:depiction predicate can be used for graphics portraying any resource (or “thing”) in the world.

Figure 4-4 represents the small slice of Toby’s social world that we will concern ourselves with over the next few examples. With this graph in mind, let’s look at how this knowledge can be represented using different notations.

Figure 4-4. Toby’s FOAF graph


4.3.2. N-Triples

N-Triple notation is a very simple but verbose serialization, similar to what we have been using in our triple data files up to this point. Because of their simplicity, N-Triples were used by the W3C RDF Core Working Group to unambiguously express various RDF test-case data models while developing the updated RDF specification. This simplicity also makes the N-Triple format useful when hand-crafting datasets for application testing and debugging.

Each line of output in N-Triple format represents a single statement containing a subject, predicate, and object followed by a dot. Except for blank nodes and literals, subjects, predicates, and objects are expressed as absolute URIs enclosed in angle brackets. Subjects and objects representing anonymous nodes are represented as _:name, where name is an alphanumeric node name that starts with a letter. Object literals are double-quoted strings that use the backslash to escape double-quotes, tabs, newlines, and the backslash character itself. String literals in N-Triple notation can optionally specify their language when followed by @lang, where lang is an ISO 639 language code. Literals can also provide information about their datatype when followed by ^^type, where type is commonly an XSD (XML Schema Definition) datatype.
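To make these rules concrete, here is a minimal sketch of an N-Triples writer in Python. It is not taken from any RDF library: the (kind, value) tuples used to describe terms and the function names are our own assumptions, and it handles only the escapes listed above.

```python
def escape_literal(text):
    """Backslash-escape the characters N-Triples literals must escape."""
    return (text.replace("\\", "\\\\")   # backslash first, so later escapes survive
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\t", "\\t"))


def format_term(term):
    """Render one term. The (kind, value, ...) tuple layout is an assumption of this sketch."""
    kind = term[0]
    if kind == "uri":
        return "<%s>" % term[1]
    if kind == "bnode":
        return "_:%s" % term[1]
    if kind == "literal":
        # Optional third and fourth items: language code and datatype URI.
        lang = term[2] if len(term) > 2 else None
        datatype = term[3] if len(term) > 3 else None
        out = '"%s"' % escape_literal(term[1])
        if lang:
            out += "@" + lang
        elif datatype:
            out += "^^<%s>" % datatype
        return out
    raise ValueError("unknown term kind: %r" % (kind,))


def to_ntriple(subject, predicate, obj):
    """Return one statement: subject, predicate, object, then a dot."""
    return "%s %s %s ." % (format_term(subject),
                           format_term(predicate),
                           format_term(obj))


FOAF = "http://xmlns.com/foaf/0.1/"
toby = ("uri", "http://kiwitobes.com/toby.rdf#ts")

print(to_ntriple(toby, ("uri", FOAF + "nick"), ("literal", "kiwitobes")))
print(to_ntriple(toby, ("uri", FOAF + "knows"), ("bnode", "jamie")))
```

In practice you would let your RDF library do this work; the sketch is only meant to show how little machinery the format requires.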

The extension .nt is typically used when N-Triples are stored in a file, and when they are transmitted over HTTP, the mime type text/plain is used. The official N-Triple format is documented at http://www.w3.org/TR/rdf-testcases/#ntriples.

Our FOAF graph (as shown in Figure 4-4) can be represented in N-Triple format as:

<http://kiwitobes.com/toby.rdf#ts> <http://xmlns.com/foaf/0.1/homepage> 
    <http://kiwitobes.com/>.
<http://kiwitobes.com/toby.rdf#ts> <http://xmlns.com/foaf/0.1/nick> "kiwitobes".
<http://kiwitobes.com/toby.rdf#ts> <http://xmlns.com/foaf/0.1/name> "Toby Segaran".
<http://kiwitobes.com/toby.rdf#ts> <http://xmlns.com/foaf/0.1/mbox> 
    <mailto:toby@segaran.com>.
<http://kiwitobes.com/toby.rdf#ts> <http://xmlns.com/foaf/0.1/interest> 
    <http://semprog.com>.
<http://kiwitobes.com/toby.rdf#ts> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
    <http://xmlns.com/foaf/0.1/Person>.

<http://kiwitobes.com/toby.rdf#ts> <http://xmlns.com/foaf/0.1/knows> _:jamie .
<http://kiwitobes.com/toby.rdf#ts> <http://xmlns.com/foaf/0.1/knows> 
    <http://semprog.com/people/colin>.

_:jamie <http://xmlns.com/foaf/0.1/name> "Jamie Taylor".
_:jamie <http://xmlns.com/foaf/0.1/mbox> <mailto:jamie@semprog.com>.
_:jamie <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
    <http://xmlns.com/foaf/0.1/Person>.

<http://semprog.com/people/colin> <http://xmlns.com/foaf/0.1/name> "Colin Evans".
<http://semprog.com/people/colin> <http://xmlns.com/foaf/0.1/mbox> 
    <mailto:colin@semprog.com>.
<http://semprog.com/people/colin> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
    <http://xmlns.com/foaf/0.1/Person>.

<http://semprog.com> <http://www.w3.org/2000/01/rdf-schema#label> 
    "Semantic Programming".
<http://semprog.com> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
    <http://xmlns.com/foaf/0.1/Document>.

					  

4.3.3. N3

While N-Triples are conceptually very simple, you may have noticed a lot of repetition in the output. The redundant information takes additional time to transmit and parse. While it’s not a problem when working with small amounts of data, the additional information becomes a liability when working with large amounts of data. By adding a few additional structures, N3 condenses much of the repetition in the N-Triple format.

In an RDF graph, every connection between nodes represents a triple. Since each node may participate in a large number of relationships, we could significantly reduce the number of characters used in N-Triples if we used a short symbol to represent repeated nodes. We could go further, recognizing that many of the URIs used in a specific model share a common base. In much the same way that XML provides a namespace mechanism for generating short Qualified Names (QNames) for nodes, N3 allows us to define a URI prefix and identify entity URIs relative to a set of prefixes declared at the beginning of the document. The statement:

@prefix semperp: <http://semprog.com/people/>.

allows us to shorten the absolute URI for Colin from <http://semprog.com/people/colin> to semperp:colin.
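The shortening step can be sketched in a few lines of Python. The function name shorten and the rule for what counts as a "simple" local name are assumptions of this sketch, chosen to be conservative rather than to match the full N3 grammar.

```python
def shorten(uri, prefixes):
    """Return prefix:local for the longest matching namespace, else <uri>."""
    best = None
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            if best is None or len(namespace) > len(best[1]):
                best = (prefix, namespace)
    if best is None:
        return "<%s>" % uri
    local = uri[len(best[1]):]
    # Only abbreviate simple local names; anything else stays a full URI.
    if local and local.replace("_", "").replace("-", "").isalnum():
        return "%s:%s" % (best[0], local)
    return "<%s>" % uri


PREFIXES = {
    "semperp": "http://semprog.com/people/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

print(shorten("http://semprog.com/people/colin", PREFIXES))
print(shorten("http://kiwitobes.com/", PREFIXES))
```

URIs that match no declared prefix fall back to the angle-bracket form, so the output remains valid even when the prefix table is incomplete.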

Since each node in an RDF graph is a potential subject about which we may have many things to say, it is not uncommon to see the same subject repeat many (many) times in N-Triple output. N3 reduces this repetition by allowing you to combine multiple statements about the same subject by using a semicolon (;) after the first statement, so you only need to state the predicate and object for other statements using the same subject. The following statement says that Colin knows Toby and that Colin’s email address is colin@semprog.com (note how semperp:colin, the subject, is only stated once):

semperp:colin foaf:knows <http://kiwitobes.com/toby.rdf#ts>;  
    foaf:mbox <mailto:colin@semprog.com>.
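As a rough illustration of how a serializer might apply the semicolon shortcut, the following Python sketch groups statements by subject. It assumes the terms have already been rendered as N3 strings, and the function name to_n3_groups is hypothetical.

```python
def to_n3_groups(triples):
    """Emit one N3 block per subject, joining its statements with semicolons."""
    groups = {}
    for subject, predicate, obj in triples:
        groups.setdefault(subject, []).append((predicate, obj))

    blocks = []
    for subject, pairs in groups.items():
        first_predicate, first_object = pairs[0]
        # The subject appears once; later lines carry predicate and object only.
        lines = ["%s %s %s" % (subject, first_predicate, first_object)]
        lines += ["    %s %s" % pair for pair in pairs[1:]]
        blocks.append(" ;\n".join(lines) + " .")
    return "\n\n".join(blocks)


statements = [
    ("semperp:colin", "foaf:knows", "<http://kiwitobes.com/toby.rdf#ts>"),
    ("semperp:colin", "foaf:name", '"Colin Evans"'),
]
print(to_n3_groups(statements))
```

The sketch relies on dictionaries preserving insertion order (Python 3.7 and later), so subjects are written in the order they are first seen.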

N3 also provides a shortcut that allows you to express a group of statements that share a common anonymous subject (blank node) without having to specify an internal name for the blank node. As discussed earlier, mailing addresses are frequently modeled with a blank node to hold all the components of the address together. W3C has defined a vocabulary for representing the data elements of the vCard interchange format that includes predicates for modeling street addresses. For instance, to specify the address of O’Reilly, you could write:

[ <http://www.w3.org/2006/vcard/ns#street-address> "1005 Gravenstein Hwy North" ; 
       <http://www.w3.org/2006/vcard/ns#locality> "Sebastopol, California"
].

					  

Because it is important to explicitly state that an entity is of a certain type, N3 allows you to use the letter a as a predicate to represent the RDF “type” relationship represented by the URI <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>.

Another predicate for which N3 provides a shortcut is <http://www.w3.org/2002/07/owl#sameAs>. OWL (Web Ontology Language) is a vocabulary for defining precise relationships between model elements. We will have more to say about OWL in Chapter 6, but even when models don’t use the precision of OWL, you will frequently see the owl:sameAs predicate to express that two URIs refer to the same entity. The sameAs predicate is used so frequently that the authors of N3 designated the symbol = as shorthand for it.

Because N-Triples are a subset of N3, any library capable of reading N3 will also read N-Triples. The FOAF graph (Figure 4-4) in N3 would read:

@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix semperp: <http://semprog.com/people/>.
@prefix tobes: <http://kiwitobes.com/toby.rdf#>.

 tobes:ts a foaf:Person;
     foaf:homepage <http://kiwitobes.com/>;
     foaf:interest <http://semprog.com>;
     foaf:knows semperp:colin,
         [ a foaf:Person;
             foaf:mbox <mailto:jamie@semprog.com>;
             foaf:name "Jamie Taylor"];
     foaf:mbox <mailto:toby@segaran.com>;
     foaf:name "Toby Segaran";
     foaf:nick "kiwitobes". 

 <http://semprog.com> a foaf:Document;
     rdfs:label "Semantic Programming". 

 semperp:colin a foaf:Person;
     foaf:mbox <mailto:colin@semprog.com>;
     foaf:name "Colin Evans". 

4.3.4. RDF/XML

The original W3C Recommendation on RDF covered both a description of RDF as a data model and XML as an expression of RDF models. Because of this, people sometimes refer to RDF/XML as RDF, but it is important to recognize that it is just one possible representation of an RDF graph. RDF/XML is sometimes criticized for being difficult to read due to all the abbreviated structures it provides; still, it is one of the most frequently used formats, so it’s useful to have some familiarity with its layout.

Conceptually, RDF/XML is built up from a series of smaller descriptions, each of which traces a path through an RDF graph. These paths are described in terms of the nodes (subjects) and the links (predicates) that connect them to other nodes (objects). This sequence of “node, link, node” is repeatable, forming a “striped” structure (think of a candy cane, with nodes being red stripes and predicates being white stripes). Since each node encountered in these descriptions has a strong identifier, it is possible to weave the smaller descriptions together to learn the larger RDF graph structure. See Figure 4-5.

Figure 4-5. A stripe from Toby’s FOAF graph


If there is more than one path described in an RDF/XML document, all the descriptions must be children of a single rdf:RDF element; if there is only one path described, the rdf:RDF element may be omitted. As with other XML documents, the top-level element is frequently used to define other XML namespaces used throughout the document:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>

Paths are always described starting with a graph node, using an rdf:Description element. The URI reference for the node can be specified in the description element with an rdf:about attribute. For blank nodes, a local identifier (valid only within the context of the current document) can be specified using an rdf:nodeID attribute. Predicate links are specified as child elements of the rdf:Description node, which will have their own children representing graph nodes. The simple stripe representing Colin as a friend of Toby (Figure 4-5) would look like:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">

  <rdf:Description rdf:about="http://kiwitobes.com/toby.rdf#ts">
    <foaf:knows>
      <rdf:Description rdf:about="http://semprog.com/people/colin">
         <foaf:name>Colin Evans</foaf:name>
      </rdf:Description>
    </foaf:knows>
  </rdf:Description>

</rdf:RDF>
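To see how the alternating node and link elements turn back into triples, here is a small Python sketch built on the standard library's ElementTree. It is deliberately incomplete: it assumes the root element is rdf:RDF, understands only rdf:about, rdf:nodeID, rdf:resource, nested nodes, and plain text literals, and the function names are our own.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def tag_to_uri(tag):
    """ElementTree writes tags as {namespace}local; join the two parts."""
    namespace, local = tag[1:].split("}")
    return namespace + local


def walk_node(node, triples, counter):
    """Process a node stripe, append its triples, and return its identifier."""
    subject = node.get("{%s}about" % RDF_NS)
    if subject is None:
        node_id = node.get("{%s}nodeID" % RDF_NS)
        if node_id is None:
            counter[0] += 1               # invent a label for an unnamed blank node
            node_id = "b%d" % counter[0]
        subject = "_:" + node_id

    for link in node:                     # each child element is a predicate stripe
        predicate = tag_to_uri(link.tag)
        resource = link.get("{%s}resource" % RDF_NS)
        children = list(link)
        if children:                      # a nested node stripe is the object
            obj = walk_node(children[0], triples, counter)
        elif resource is not None:
            obj = resource
        else:                             # otherwise the element text is a literal
            obj = '"%s"' % (link.text or "")
        triples.append((subject, predicate, obj))
    return subject


def stripes_to_triples(xml_text):
    root = ET.fromstring(xml_text)        # assumed to be the rdf:RDF element
    triples, counter = [], [0]
    for node in root:
        walk_node(node, triples, counter)
    return triples


SAMPLE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://kiwitobes.com/toby.rdf#ts">
    <foaf:knows>
      <rdf:Description rdf:about="http://semprog.com/people/colin">
        <foaf:name>Colin Evans</foaf:name>
      </rdf:Description>
    </foaf:knows>
  </rdf:Description>
</rdf:RDF>
"""

for triple in stripes_to_triples(SAMPLE):
    print(triple)
```

Because the walker recurses into the object before recording the link that led to it, statements about Colin are emitted before the statement that Toby knows him; the order of triples in a graph carries no meaning.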

Literal objects can be specified as the text of an element, or as an attribute on the rdf:Description element. Let’s expand the example, adding more information about Colin and about another friend of Toby’s:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">

  <rdf:Description rdf:about="http://kiwitobes.com/toby.rdf#ts">
    <foaf:knows>
      <rdf:Description rdf:about="http://semprog.com/people/colin">
        <foaf:name>Colin Evans</foaf:name>
        <foaf:mbox>colin@semprog.com</foaf:mbox>
      </rdf:Description>
    </foaf:knows>

    <foaf:knows>
      <rdf:Description foaf:mbox="jamie@semprog.com"/>
    </foaf:knows>
  </rdf:Description>

</rdf:RDF>

While this is a perfectly reasonable description of Toby’s relationship to Colin and Jamie, we are still missing the rdf:type information that states that Toby, Colin, and Jamie are people. As in the other RDF serializations we have looked at, RDF/XML provides a shortcut for this very common statement, allowing you to replace the rdf:Description element with an element representing the rdf:type for the node. Thus the sequence of elements:

<rdf:Description rdf:about="http://kiwitobes.com/toby.rdf#ts">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>

is compacted into a single typed element of the form:

<foaf:Person rdf:about="http://kiwitobes.com/toby.rdf#ts">
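Reading the shortcut in the other direction, a parser can recover the rdf:type statement from the element name. The Python sketch below shows the idea using the standard library; it only looks at elements that carry an rdf:about attribute (so typed blank nodes are skipped), and the function name type_triples is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def type_triples(xml_text):
    """Return the rdf:type statements implied by typed node elements."""
    triples = []
    for node in ET.fromstring(xml_text).iter():
        about = node.get("{%s}about" % RDF_NS)
        # Plain rdf:Description elements imply no type; elements without
        # rdf:about are predicate links or blank nodes and are skipped here.
        if about is None or node.tag == "{%s}Description" % RDF_NS:
            continue
        namespace, local = node.tag[1:].split("}")
        triples.append((about, RDF_NS + "type", namespace + local))
    return triples


TYPED = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://kiwitobes.com/toby.rdf#ts">
    <foaf:knows>
      <foaf:Person rdf:about="http://semprog.com/people/colin"/>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>
"""

for triple in type_triples(TYPED):
    print(triple)
```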

The FOAF graph we represented in N-Triples and N3 can now be represented in RDF/XML as:

<rdf:RDF
  xmlns:foaf='http://xmlns.com/foaf/0.1/'
  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
  xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'>

  <foaf:Person rdf:about="http://kiwitobes.com/toby.rdf#ts">
    <foaf:name>Toby Segaran</foaf:name>
    <foaf:homepage rdf:resource="http://kiwitobes.com/"/>
    <foaf:nick>kiwitobes</foaf:nick>
    <foaf:mbox rdf:resource="mailto:toby@segaran.com"/>

    <foaf:interest>
      <foaf:Document rdf:about="http://semprog.com">
        <rdfs:label>Semantic Programming</rdfs:label>
      </foaf:Document>
    </foaf:interest>

    <foaf:knows>
      <foaf:Person rdf:about="http://semprog.com/people/colin">
        <foaf:name>Colin Evans</foaf:name>
        <foaf:mbox rdf:resource="mailto:colin@semprog.com"/>
      </foaf:Person>

    </foaf:knows>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Jamie Taylor</foaf:name>
        <foaf:mbox rdf:resource="mailto:jamie@semprog.com"/>
      </foaf:Person>
    </foaf:knows>
    
  </foaf:Person>
</rdf:RDF>

					  

These aren’t the only abbreviated structures RDF/XML provides, but this should be enough to let you read most RDF/XML files.

4.3.5. RDFa

RDFa isn’t a pure serialization format for RDF, but rather a way of annotating XHTML web pages with RDF data. The idea behind RDFa is that you only have to publish your content once, mixing the human-readable and machine-readable content together. This is a similar philosophy to that of Microformats, a simpler, more ad-hoc approach to adding rich semantic annotations to XHTML content.

RDFa uses a small set of XML attributes that are added to existing XHTML content tags in order to specify the semantics behind the information that is displayed. These attributes make the semantic meaning of existing XHTML content clear. The basic processing model is that the subject of a triple is the subject URI identified in a higher-level XHTML element in the DOM tree, and the predicate and object of a statement are lower down on the tree, children of the subject.

Instead of using URIs to describe subjects, predicates, and objects, many RDFa attributes use Compact URIs (or CURIEs) to reduce the amount of markup. CURIEs work just like XML Qualified Names (in fact, QNames are a subset of CURIEs), so everything you know about XML QNames (such as that foaf:nick actually means http://xmlns.com/foaf/0.1/nick) applies to CURIEs. But CURIEs are a bit more accommodating in what the localpart of the prefix:localpart expression can contain.

QName construction forbids slashes (/) in the localpart, thus requiring a separate XML namespace declaration for every QName using a different part of the path hierarchy. CURIEs relax this constraint, allowing statements like example:cow and example:places/barn to use one xmlns declaration (such as http://example.org/farm/) to generate the full URIs http://example.org/farm/cow and http://example.org/farm/places/barn, respectively.

CURIEs also allow for localparts that start with a number. This means that you could define an XML namespace as:

xmlns:amazonisbn="http://www.amazon.com/exec/obidos/ASIN/"

and then refer to the book Programming the Semantic Web by its ISBN with the CURIE:

amazonisbn:0596153813

CURIEs are great when you are working with predicates because you can make one xmlns declaration for each vocabulary you are using and quickly construct CURIEs for any property in the vocabulary. However, they can be frustrating when you want to talk about a wide range of subjects or objects (since you have to make an xmlns declaration for each unique base URI). To alleviate this problem, RDFa allows you to use full URIs for several of the subject and object attributes. But because full URIs use a colon to separate the protocol scheme from the hierarchical part of the URI, parsers could become confused when they see the http:—did you mean http: as in http://example.org/cow or were you writing a CURIE where http is a prefix for some namespace?

To avoid this confusion, RDFa defines a “safe CURIE” that makes it clear when a colon-separated statement is being used as a CURIE versus as a protocol identifier in a URI. To construct a safe CURIE, simply place your CURIE in square brackets, as in:

[example:places/barn]
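The expansion rules are simple enough to sketch in Python. The function names expand_curie and resolve are our own, and the sketch assumes every prefix it meets has been declared; a real RDFa parser would collect the prefix table from the xmlns declarations in scope.

```python
def expand_curie(curie, prefixes):
    """Expand prefix:localpart by concatenating the namespace and the localpart."""
    prefix, local = curie.split(":", 1)   # split on the first colon only
    return prefixes[prefix] + local


def resolve(value, prefixes):
    """Resolve an attribute value that may be a full URI or a safe CURIE."""
    if value.startswith("[") and value.endswith("]"):
        return expand_curie(value[1:-1], prefixes)
    return value                          # no brackets, so treat it as a URI


PREFIXES = {
    "example": "http://example.org/farm/",
    "amazonisbn": "http://www.amazon.com/exec/obidos/ASIN/",
}

print(expand_curie("example:places/barn", PREFIXES))
print(expand_curie("amazonisbn:0596153813", PREFIXES))
print(resolve("[example:cow]", PREFIXES))
print(resolve("http://example.org/cow", PREFIXES))
```

The square brackets are what let resolve avoid guessing: a bracketed value is always a CURIE, and anything else is passed through untouched as a URI.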

Let’s look at the list of attributes used by RDFa, grouping them by the part of the RDF statement that they declare.

This is the attribute to set an RDF subject:


about

A URI (or safe CURIE) used as a subject in an RDF triple. By default, the base URI for the page is the root URI for all statements. Using an about attribute allows statements to be made where the base URI isn’t the subject.

These are the attributes to set an RDF predicate:


rel

CURIEs expressing relationships between two resources


property

CURIEs expressing relationships between a resource and a literal


rev

CURIEs expressing a reverse relationship between two resources

These are the attributes to set an RDF object:


content

A string, representing a literal RDF object


href

A URI resource expressing an RDF object (as an inline clickable link)


src

A URI resource expressing an RDF object (as an inline embedded item)


resource

A URI (or safe CURIE) expressing an RDF object when the object isn’t visible on the page

RDFa also provides special attributes for specifying datatypes and making rdf:type statements:


datatype

The datatype of a literal


typeof

The type of a subject

It is important to note that the attribute you use to set the predicate depends on the type of object in the RDF statement. If the object is a literal, then the predicate is specified with the property attribute. If, however, the object is a resource, the rel or rev attribute is used. Because XHTML is markup for producing human-readable displays, it may not be convenient to display data in the subject-predicate-object order of an RDF statement. To handle these situations, RDFa provides the rev attribute for setting the predicate, which also indicates that the order of the statement has been reversed (object, predicate, subject).

Minimally, we can specify RDF triples in a single markup element. In the following example, the object is a literal, so we use the attribute property to state the predicate:

<span xmlns:foaf="http://xmlns.com/foaf/0.1/"
      about="http://kiwitobes.com/toby.rdf#ts" 
      property="foaf:nick"
      content="kiwitobes" />

When the statement’s object is a resource, we use the attribute rel to state the predicate:

<span xmlns:foaf="http://xmlns.com/foaf/0.1/"
      about="http://kiwitobes.com/toby.rdf#ts" 
      rel="foaf:homepage"
      href="http://kiwitobes.com/" />

But we can also use the XHTML element to display parts of the structure. For instance, we can make the following statement about Toby’s nickname when displaying the string “kiwitobes”:

Toby's nickname is: <span xmlns:foaf="http://xmlns.com/foaf/0.1/"
      about="http://kiwitobes.com/toby.rdf#ts" 
      property="foaf:nick">kiwitobes</span>
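For single-element cases like these, the choice between property, rel, and rev can be sketched in Python with the standard library's ElementTree. This is far from a conforming RDFa processor: it ignores inherited subjects, typeof, datatype, and chaining, it takes the prefix table as an argument instead of reading the xmlns declarations, and the function names are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET


def expand_curie(curie, prefixes):
    prefix, local = curie.split(":", 1)
    return prefixes[prefix] + local


def element_to_triple(xhtml, prefixes, base_uri=""):
    """Extract one triple from a single element annotated with RDFa attributes."""
    element = ET.fromstring(xhtml)
    subject = element.get("about", base_uri)   # default subject is the page itself

    if element.get("property"):
        # Literal object: use the content attribute, or fall back to the element text.
        predicate = expand_curie(element.get("property"), prefixes)
        literal = element.get("content")
        if literal is None:
            literal = "".join(element.itertext())
        return (subject, predicate, literal)

    for attribute in ("rel", "rev"):
        if element.get(attribute):
            predicate = expand_curie(element.get(attribute), prefixes)
            target = (element.get("href") or element.get("src")
                      or element.get("resource"))
            if attribute == "rev":              # rev swaps subject and object
                return (target, predicate, subject)
            return (subject, predicate, target)
    return None


PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/"}

NICK = ('<span xmlns:foaf="http://xmlns.com/foaf/0.1/" '
        'about="http://kiwitobes.com/toby.rdf#ts" '
        'property="foaf:nick">kiwitobes</span>')
HOME = ('<span xmlns:foaf="http://xmlns.com/foaf/0.1/" '
        'about="http://kiwitobes.com/toby.rdf#ts" '
        'rel="foaf:homepage" href="http://kiwitobes.com/" />')

print(element_to_triple(NICK, PREFIXES))
print(element_to_triple(HOME, PREFIXES))
```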

Here’s an example of Toby’s FOAF record as a fragment of XHTML annotated with RDFa. The XML attributes and text that an RDFa parser would glean from this XHTML are in bold. The annotations can be made in any tags and are meant to reuse existing XHTML attributes and text that are also being used in the human-readable XHTML content. It can be tricky to figure out how to weave the attributes into existing documents, but the benefit is that all of the data is made available in one place and the concepts being discussed are unambiguously described using strong URIrefs:

<div xmlns:foaf="http://xmlns.com/foaf/0.1/"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        about="http://kiwitobes.com/toby.rdf#ts" typeof="foaf:Person"> 
        
  Name: <span property="foaf:name">Toby Segaran</span><br/> 
  Nickname: <span property="foaf:nick">kiwitobes</span><br/> 
  Interests: <a rel="foaf:interest" href="http://semprog.com">
                  <span property="rdfs:label">Semantic Programming</span></a><br/>
  Homepage: <a rel="foaf:homepage" href="http://kiwitobes.com/">KiwiTobes</a><p/> 

            
  Friends:<br/> 
  <ul rel="foaf:knows"> 
    <li about="http://semprog.com/people/colin" 
        typeof="foaf:Person" property="foaf:name">Colin Evans</li>
         
    <li typeof="foaf:Person"> 
        <span property="foaf:name">Jamie Taylor</span><br/> 
        Email: <a rel="foaf:mbox" href="mailto:jamie@semprog.com">
            jamie@semprog.com</a><br/>
    </li>
    
  </ul> 
</div> 


					  

In documents with copious amounts of human markup, it can be challenging to read RDFa. One way to work your way through the jungle of markup is to scan for rel, rev, and property attributes in a markup tag. Once you have found one of these attributes, you know you have found the predicate of a statement. Then, search backward up the DOM tree to find the subject of the statement (remember, if you don’t find one, the document itself is the subject). Then search down the DOM tree to find the next item that can serve as an object for the statement. Keep in mind that if the predicate was specified using a rev attribute, the order of the statement will be reversed.


Because XHTML was designed to be extensible and allows new attributes to be added to markup elements, RDFa was specified as annotations on XHTML. In practice, RDFa works perfectly well on HTML, though the markup will not validate against HTML 4.

Because it is easy to wrap templated HTML output with RDFa, and given the prevalence of database-driven websites, the amount of RDFa available on the Web is growing rapidly. A number of prominent sites, including MySpace and popular authoring tools, now have RDFa output capabilities, though it may not be obvious because RDFa doesn’t alter the HTML rendering.

In later chapters we will show how Yahoo! is extracting semantic data from sites using RDFa to enhance search, and we will revisit RDFa as we build more sophisticated semantic applications that not only consume semantic data, but republish their output for use by other semantic services.

RDFa is now a W3C Recommendation, but there have been several other attempts to embed RDF in HTML. One effort was eRDF (“embeddable RDF”), which predates RDFa. eRDF never reached a critical mass, and like other RDF microformats, it generally isn’t supported by RDF tools; still, it’s possible that you may run into it. You can read more about eRDF at http://research.talis.com/2005/erdf/wiki/Main/RdfInHtml.

Not the Last Word on Serialization

N-Triples, N3, RDF/XML, and RDFa are not the only RDF serializations you will find in the wild. Turtle is another popular and fairly simple serialization with its own compaction tricks. Turtle output is typically associated with the mime type application/x-turtle and the file extension .ttl. While you can learn more about it at http://www.dajobe.org/2004/01/turtle/, we believe that you need only be aware of its existence and know how to tell your favorite RDF library to read it.

If you run into a serialization that you find difficult to read or debug, try reading the data into your RDF library and then asking the library to serialize the graph back into a format you are comfortable with.