Chapter 8. Structured Text

In Chapter 6, we took a very brief look at the csv module that is used to read and write lines of tab- or comma-separated values, with each line corresponding to one item in the file. We’ve also looked at a variety of ways to scan files looking for certain patterns of data, including using str methods and regular expressions. Files that are in tab- or comma-separated values format, FASTA files, GenBank files, and many other file formats encountered in bioinformatics work are called flat files.[43] What is “flat” about them is that they are just text files: the data has no explicit structure beyond agreed-on conventions regarding special characters, blank lines, whitespace, etc. They can have introductory material before the data, other material after the data, several sets of data in one file, and so on.

[43] In computer science the term “flat file” usually has a stricter meaning, referring only to text files with one item per line, each having fields designated by separators (commas, tabs, vertical bars, spaces, etc.) or conforming to some specified number of characters. Files in formats such as FASTA and GenBank would be considered “free form,” even though they have some regularity.
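As a reminder of what line-oriented processing of a flat file looks like, here is a minimal sketch using the csv module. The tab-separated data, the field names, and the helper name read_flat are invented for illustration and are not taken from any real file.

```python
import csv
import io

# Hypothetical flat data: one item per line, fields separated by tabs.
# The names and numbers are made up for this example.
FLAT_TEXT = "name\tlength\ngeneA\t1200\ngeneB\t850\n"

def read_flat(text):
    """Return one dict per data line, keyed by the header line's field names."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]

records = read_flat(FLAT_TEXT)
for record in records:
    print(record["name"], record["length"])
```

Every value comes back as a string, and nothing in the file itself says that "length" is a number or that each line is a gene: that knowledge lives only in the conventions the reader and writer agree on.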

The opposite of “flat” in this context is structured. A structured text file contains elements, each of which can have attributes and/or “sub” or child elements. There can be different kinds of elements, and in general there are rules specifying what attributes and children each kind of element can have. The linear approaches for processing text files that we’ve seen so far are inadequate for structured files, essentially because the files are two-dimensional: elements nest inside other elements, so what a piece of text means depends on where it sits in that hierarchy as well as on where it falls in the sequence of lines. This chapter describes some ways to process structured files.
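To make the idea of elements, attributes, and child elements concrete, here is a small sketch that assumes XML as the structured format and uses the standard library's xml.etree.ElementTree module. The element names (genes, gene, exon), the attribute values, and the helper name exon_counts are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical structured data: "gene" elements carry a "name" attribute
# and contain "exon" child elements. All names and values are invented.
STRUCTURED_TEXT = """
<genes>
  <gene name="geneA">
    <exon start="1" end="100"/>
    <exon start="201" end="350"/>
  </gene>
  <gene name="geneB">
    <exon start="10" end="90"/>
  </gene>
</genes>
""".strip()

def exon_counts(text):
    """Map each gene's name attribute to its number of exon children."""
    root = ET.fromstring(text)
    return {
        gene.get("name"): len(gene.findall("exon"))
        for gene in root.findall("gene")
    }

counts = exon_counts(STRUCTURED_TEXT)
print(counts)
```

The code never looks at line boundaries. It walks the hierarchy instead, asking each gene element for its exon children, which is the kind of question a line-by-line scan cannot answer without tracking the nesting itself.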


  
