Let’s look at some data on how a real-world program changed over time. There are hundreds of files in this particular program, but the details for each file won’t fit on this page, so four files have been chosen as examples. Details on these files are given in Table 4-1.
Table 4-1. Changes in files over time
| | File 1 | File 2 | File 3 | File 4 |
|---|---|---|---|---|
| Period analyzed | 5 years, 2 months | 8 years, 3 months | 13 years, 3 months | 13 years, 4 months |
| Lines originally | 423 | 192 | 227 | 309 |
| Unchanged lines | 271 | 101 | 4 | 8 |
| Lines now | 664 | 948 | 388 | 414 |
| Grew by | 241 | 756 | 161 | 105 |
| Times changed | 47 | 99 | 194 | 459 |
| Lines added | 396 | 1,026 | 913 | 3,828 |
| Lines deleted | 155 | 270 | 752 | 3,723 |
| Lines modified | 124 | 413 | 1,382 | 3,556 |
| Total changes | 675 | 1,709 | 3,047 | 11,107 |
| Change ratio | 1.6x | 8.9x | 13x | 36x |
In this table:
Period analyzed: The time period over which the file existed.
Lines originally: How many lines were in the file when it was originally written.
Unchanged lines: How many lines are the same now as they were when the file was originally written.
Lines now: How many lines there are in the file now, at the end of the analysis period.
Grew by: The difference between “Lines now” and “Lines originally.”
Times changed: The total number of times a programmer made some set of changes to the file (where one set of changes involves changes to many lines). Usually one set of changes will represent one bug fix, one new feature, etc.
Lines added: How many times, over the history of the file, a new line was added.
Lines deleted: How many times, over the history of the file, an existing line was deleted.
Lines modified: How many times, over the history of the file, an existing line was changed (but not newly added or deleted).
Total changes: The sum of the “Lines added,” “Lines deleted,” and “Lines modified” counts for that file.
Change ratio: How much larger “Total changes” is than “Lines originally.”
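The three derived rows (“Grew by,” “Total changes,” and “Change ratio”) follow directly from the raw counts, so they can be recomputed as a check. The following is a minimal sketch using the figures from Table 4-1; the class and function names (`FileHistory`, `summarize`) are invented for illustration and are not part of the original analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHistory:
    """Raw counts for one file, copied from Table 4-1."""
    name: str
    lines_originally: int
    lines_now: int
    lines_added: int
    lines_deleted: int
    lines_modified: int

    @property
    def grew_by(self) -> int:
        # "Lines now" minus "Lines originally"
        return self.lines_now - self.lines_originally

    @property
    def total_changes(self) -> int:
        # Sum of lines added, deleted, and modified
        return self.lines_added + self.lines_deleted + self.lines_modified

    @property
    def change_ratio(self) -> float:
        # How many times larger "Total changes" is than "Lines originally"
        return self.total_changes / self.lines_originally


FILES = [
    FileHistory("File 1", 423, 664, 396, 155, 124),
    FileHistory("File 2", 192, 948, 1026, 270, 413),
    FileHistory("File 3", 227, 388, 913, 752, 1382),
    FileHistory("File 4", 309, 414, 3828, 3723, 3556),
]


def summarize(files):
    """Return (name, grew_by, total_changes, ratio rounded to one decimal)."""
    return [
        (f.name, f.grew_by, f.total_changes, round(f.change_ratio, 1))
        for f in files
    ]


if __name__ == "__main__":
    for row in summarize(FILES):
        print(row)
```

The recomputed ratios for files 3 and 4 come out as 13.4 and 35.9, which the table rounds to 13x and 36x.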
When we refer to “lines” in the above descriptions, that includes every line in the files: code, comments, documentation, and empty lines. If you were to do the analysis without counting comments, documentation, and empty lines, one major difference you would see is that the “Unchanged lines” count would become much smaller in proportion to the other numbers. (In other words, the unchanged lines are nearly always comments, documentation, or empty lines.)
The most important thing to realize from this table is that a lot of change happens in a software project. It becomes more and more likely that any particular line of code will change as time goes on, but you can’t predict exactly what is going to change, when it’s going to change, or how much it will have to change. Each of these four files changed in very different ways (you can see this even just looking at the numbers), but they all changed a significant amount.
There are a few other interesting things about the numbers, as well:
Looking at the change ratio, we see that more work was put into changing each file than writing it originally. Obviously, line counts aren’t a perfect estimate of how much work was actually done, but they do give us a general idea. Sometimes the ratio is huge—for example, file 4 had 36 times as many total changes as it did original lines.
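The number of unchanged lines in each file is smaller than its “Lines originally” count, and smaller still compared to its “Lines now” count. In files 3 and 4, which have the longest histories, fewer than 3% of the original lines survive (4 of 227 and 8 of 309).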
A lot of change can happen to a file even if it only gets a little bit bigger over time. For example, file 3 grew by only 161 lines over 13 years, but during that time the total changes count reached 3,047 lines.
In all four files, the total changes count is larger than the “Lines now” count. In other words, once a file has been around for long enough, more line changes have been made to it than it currently has lines.
In file 3, the number of lines modified is larger than the number of lines in the original file plus the number of lines added. That file’s lines have been modified more often than new lines have been added. In other words, some lines of that file have changed over and over. This is common on projects with a long lifetime.
The above points aren’t all that could be learned here—there is a lot more interesting analysis that could be done on these numbers. You’re encouraged to dig into this data (or work out similar numbers for your own project) and see what else you can learn.
Another good learning experience is looking over the history of changes made to one particular file. If you have a record of every change made to files in your program, and you have one file that’s been around for a long time, try looking at each change made over its lifetime. Think about whether you could have predicted that change when the file was originally written, and consider whether the file could have been written differently to make the change simpler. Generally, try to understand each change and see if you can learn anything new about software development from doing so.
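If your record of changes happens to live in Git (an assumption; the text does not say which tool produced the numbers in Table 4-1), a rough version of the “Times changed,” “Lines added,” and “Lines deleted” counts can be tallied from the output of `git log --follow --numstat --format=%h -- path/to/file`. The sketch below only parses that text; the function name `tally_numstat` and the sample data are hypothetical. Git reports a modified line as one deletion plus one addition, so this approach cannot produce a separate “Lines modified” count.

```python
def tally_numstat(log_text: str) -> dict:
    """Tally change sets and line counts from `git log --numstat` output.

    Each numstat line has the form "<added>\t<deleted>\t<path>".
    Commit-hash lines and blank lines are skipped.
    """
    times_changed = 0
    lines_added = 0
    lines_deleted = 0
    for line in log_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # commit header or blank line
        added, deleted, _path = parts
        if not (added.isdigit() and deleted.isdigit()):
            continue  # binary files are reported as "-" and are skipped here
        times_changed += 1
        lines_added += int(added)
        lines_deleted += int(deleted)
    return {
        "times_changed": times_changed,
        "lines_added": lines_added,
        "lines_deleted": lines_deleted,
    }


# Made-up sample output for illustration only: two change sets to one file.
SAMPLE = "a1b2c3d\n\n10\t2\tsrc/example.py\nd4e5f6a\n\n3\t7\tsrc/example.py\n"

if __name__ == "__main__":
    print(tally_numstat(SAMPLE))
```

The totals are only a starting point. Reading the individual changes is still what tells you whether each one could have been predicted.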