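The two uses of hash values covered in this section, checking that a file has not changed and finding duplicate files, can be sketched in a few lines of Python. This is only an illustration using the standard `hashlib` module, not the method of any particular forensic tool. The function names `hash_bytes` and `group_duplicates` and the sample filenames and contents are hypothetical.

```python
import hashlib
from collections import defaultdict


def hash_bytes(data: bytes, algorithm: str = "md5") -> str:
    """Return the hash of the data as uppercase hexadecimal characters."""
    return hashlib.new(algorithm, data).hexdigest().upper()


def group_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group filenames whose contents produce the same MD5 hash.

    Only the contents are hashed, so the filename plays no part in the result.
    """
    by_hash = defaultdict(list)
    for name, content in files.items():
        by_hash[hash_bytes(content)].append(name)
    return [sorted(names) for names in by_hash.values() if len(names) > 1]


# Hypothetical sample data: two files with identical contents, one revised.
files = {
    "report.doc": b"quarterly figures",
    "copy of report.doc": b"quarterly figures",
    "notes.txt": b"quarterly figures, revised",
}

md5_value = hash_bytes(files["report.doc"], "md5")
sha1_value = hash_bytes(files["report.doc"], "sha1")

print(len(md5_value), len(sha1_value))  # 32 and 40 hexadecimal characters
print(group_duplicates(files))
```

To validate a file, we would record its hash when it is collected and compare it with a freshly computed hash later. Any change to the contents, however small, produces a different value, while renaming the file does not.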

Chapter 7 · Data Structures and the Anatomy of a File

MD5 was developed by Ron Rivest, a professor at the Massachusetts Institute of Technology. This hash algorithm produces a 128-bit value that is effectively unique to the data. The value is displayed as 32 hexadecimal characters. As with all information presented in hex, only the numbers 0-9 and the letters a-f are used. An example of an MD5 hash is 1CFC968CAAB8084683B688BFEA357F91.

The other algorithm is called SHA1. This stands for Secure Hash Algorithm, and it was developed by the National Security Agency. This hash algorithm produces a 160-bit value, which is displayed as 40 hexadecimal characters. Many people believe SHA1 will replace MD5 at some point.

As with anything else, there is a slight possibility of two different files producing the same hash. Further discussion of hash collisions is beyond the scope of this book.

In the forensic and e-discovery context, a hash value has many purposes. For one, it is used to validate that a file has not been changed. A variety of forensic and e-discovery applications have built-in hash verification.

The hash value is also used to eliminate duplicate files. You may be surprised to find how many duplicate files exist on your computer. These files can even reside in the same folder, as long as their filenames differ. The hash value has nothing to do with the filename, but it has everything to do with the contents of the file. As we mentioned earlier in this chapter, much of a file's information resides outside the actual file. A file can also be changed without the user making any intentional changes.

Because hashing supports both the unique identification of files and the removal of duplicates, it is important to our profession.

Compression