Code Complete: A Practical Handbook of Software
Construction. Redmond, Wa.: Microsoft Press, 880 pages,
1993. Retail price: $35. ISBN: 1-55615-484-4.
5.1 Valid Reasons to Create a Routine
Aside from the invention of the computer, the routine is the single greatest invention
in computer science. It makes programs easier to read and understand than any other
feature of any programming language. It's a crime to abuse the senior statesman of
computer science with code like that shown above.
Routines are also the greatest technique ever invented for saving space and improving
performance. Imagine how much larger your code would be if you had to repeat the code for
every call to a routine instead of branching to the routine. Imagine how hard it would be
to make performance improvements in the same code used in a dozen places rather than
making them all in one routine. Routines make modern programming possible.
"OK," you say, "I already know that routines are great, and I program
with them all the time. This seems kind of remedial, so what do you want me to do about
it?"
I want you to understand that there are many valid reasons to create a routine
and that there are right ways and wrong ways to do so. As an undergraduate computer
science student, I thought that the main reason to create a routine was to avoid duplicate
code. The introductory textbook I used said that routines were good because they made a
program easier to develop, debug, document, and maintain. Period. Aside from syntactic
details about how to use parameters and local variables, that was the complete explanation
of the theory and practice of routines. It was not a good or complete explanation.
Here's a list of valid reasons to create a routine. The reasons overlap somewhat, and
they're not intended to make an orthogonal set.
Reducing complexity. The single most important reason to create a routine is to
reduce a program's complexity. Create a routine to hide information so that you won't need
to think about it. Sure, you need to think about it when you write the routine. But after
it's written, you should be able to forget the details and use the routine without any
knowledge of its internal workings. Other reasons to create routines (minimizing code size,
improving maintainability, and improving correctness) are also good reasons, but without
the abstractive power of routines, complex programs would be impossible to manage
intellectually.
One indication that a routine needs to be broken out of another routine is deep nesting
of an inner loop or conditional. Reduce the routine's complexity by pulling the nested
part out and putting it in its own routine.
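As a minimal sketch of pulling a nested part into its own routine, the Python below moves an inner loop and conditional into a helper so that the outer routine reads at a single level. The invoice data and the names count_overdue and overdue_report are hypothetical, invented only for illustration.

```python
# Hypothetical example: the inner loop and test used to sit inside the
# outer loop; here they live in their own routine.

def count_overdue(invoices, today):
    """Return how many invoices are unpaid and past their due day."""
    overdue = 0
    for invoice in invoices:
        if not invoice["paid"] and invoice["due_day"] < today:
            overdue += 1
    return overdue


def overdue_report(customers, today):
    """Map each customer name to its number of overdue invoices."""
    return {
        name: count_overdue(invoices, today)
        for name, invoices in customers.items()
    }
```

The outer routine no longer needs to show how an invoice is judged overdue; it only names the operation.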
Avoiding duplicate code. Undoubtedly the most popular reason for creating a
routine is to avoid duplicate code. Indeed, creation of similar code in two routines
implies an error in decomposition. Split one or both routines, then let both call the part
that was split off. With code in one place, you save the space of duplicated code. Future
modifications are easier because you need to modify the code in only one location. The
code is more reliable because you have only one place in which to convince yourself that
the code is right. Modifications are more reliable because you avoid making slightly
different modifications that are supposed to be identical.
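A small Python sketch of the "split off the shared part" advice follows. The two formatting routines and the helper normalize_name are hypothetical; the point is only that the similar code lives in one place and both callers use it.

```python
def normalize_name(raw):
    """Shared part split off from both callers: collapse whitespace, title-case."""
    return " ".join(raw.split()).title()


def format_mailing_label(raw_name, city):
    """First caller: builds a one-line mailing label."""
    return f"{normalize_name(raw_name)}, {city}"


def format_greeting(raw_name):
    """Second caller: builds a letter greeting."""
    return f"Dear {normalize_name(raw_name)},"
```

If the normalization rule changes, only normalize_name needs to be modified, so the two callers cannot drift apart.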
Limiting effects of changes. Isolate areas that are likely to change so that
effects of changes are limited to a single routine or, at most, a few routines. Design so
that areas that are most likely to change are the easiest to change. Areas likely to
change include hardware dependencies, input/output, complex data structures, and business
rules.
Hiding sequences. Hide the order in which events happen to be processed. For
example, if the program typically gets data from the user and then gets auxiliary data from a
file, neither the routine that gets the user data nor the routine that gets the file data
should depend on the other routine being performed first. Design the system so that either
could be performed first, then create a routine to hide the information about which happens
to be performed first. Similarly, if you commonly have two lines of code that read the top of
a stack and decrement a StackTop variable, put them in a PopStack() routine.
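The stack case can be sketched in Python as follows. This is an illustrative assumption rather than the author's code: the class name Stack, the fixed capacity, and the "next free slot" convention for the top index are all invented here. Callers see only pop(), so they never depend on the order of the read and the decrement.

```python
class Stack:
    """Fixed-capacity stack that keeps its own top index."""

    def __init__(self, capacity):
        self._items = [None] * capacity
        self._top = 0  # index of the next free slot (assumed convention)

    def push(self, item):
        if self._top == len(self._items):
            raise OverflowError("stack is full")
        self._items[self._top] = item
        self._top += 1

    def pop(self):
        """Adjust the top index and read the top item as one hidden sequence."""
        if self._top == 0:
            raise IndexError("pop from empty stack")
        self._top -= 1
        return self._items[self._top]
```

With this convention the decrement happens before the read; with a "top points at the last item" convention it would be the other way around. Either way the sequence is hidden inside pop().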
Improving performance. You can optimize the code in one place instead of several
places. Having code in one place means that a single optimization benefits all the
routines that use that routine, whether they use it directly or indirectly. It makes it
practical to recode the routine with a more efficient algorithm or a faster, more
difficult language, like assembler.
In the early days of computer programming, performance penalties for using routines on
some machines were prohibitive. A call to a routine meant that the operating system had to
swap out the program, swap in a directory of routines, swap in the routine, execute the
routine, swap out the routine, and swap the calling routine back in. All this swapping
chewed up resources and made the program slow. Modern machines, however, and
"modern" means any machine you're ever likely to work on, have virtually no
penalty for calling a routine. Moreover, because of the increased code size resulting from
putting code inline and the extra swapping that's associated with increased code size on
demand-paged virtual-memory machines, a recent study found a slight timing penalty
for using inline code rather than routines (Davidson and Holler 1992).
Making central points of control. Keep control in one place. Control assumes
many forms. Knowledge of the number of entries in a table is one form. Control of hardware
devices (disks, tapes, printers, plotters, and so on) is another. Using one routine to read
from a file and one routine to write to it is a form of centralized control. This is
especially useful because, if the file needs to be converted to an in-memory
data structure, the changes affect only the access routines.
Reading and modifying the contents of internal data structures with specialized
routines is another form of centralized control.
The idea of centralized control is not really distinct from the other ideas presented.
It's especially similar to information hiding, but it has unique heuristic power that
makes it worth adding to your programming toolbox.
Hiding data structures. You can hide the implementation details of a data
structure so that most of the program doesn't need to worry about the messy details of
manipulating computer-science structures, but can deal with the data in terms of how it's
used in the problem domain. Routines that hide implementation details provide a valuable
level of abstraction that reduces a program's complexity. They centralize data structure
operations in one place and reduce the chance of errors working with the data structure.
They make it easy to change the structure without changing most of the program.
When hiding a data structure, refer to it independently of the medium it's stored on. If
you have an insurance rates table, for example, that's so big that it's always stored on
disk, you might be tempted to refer to it as a "rate file." When you
refer to it as a file, however, you're exposing more information about the data than you
need to. If you ever change the program so that the table is in memory instead of on disk,
the code that refers to it as a file will be incorrect, misleading, and confusing. Try,
instead, to make names of access routines independent of how the data is stored, and refer
to the abstract data type, insurance rates table, instead.
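A minimal Python sketch of such access routines is shown below. The class name InsuranceRatesTable, the method names, and the age-band keys are hypothetical; nothing in the interface says whether the rates came from disk or from memory.

```python
class InsuranceRatesTable:
    """Abstract insurance rates table; callers never learn where rates live."""

    def __init__(self, rows):
        # rows: iterable of (age_band, rate) pairs, however they were loaded.
        self._rate_by_band = dict(rows)

    def lookup_rate(self, age_band):
        """Return the rate for one age band."""
        return self._rate_by_band[age_band]

    def group_rate(self, age_bands):
        """Return the total rate for a group, one age band per member."""
        return sum(self.lookup_rate(band) for band in age_bands)
```

Because the names describe the table rather than a file, moving the data from disk to memory would change only how rows is produced, not the callers.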
Hiding global data. If you need to use global data, you can hide its
implementation details as just described. Working with global data through access routines
provides several benefits. You can change the structure of the data without changing your
program. It allows you to monitor accesses to the data. The discipline of using access
routines also encourages you to think clearly about whether the data is really global; it
might be more accurate to treat it as data that's local to several routines in a single
module or as the part of an abstract data type that's visible to the rest of the program.
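The following Python sketch shows one way access routines can wrap global data and monitor accesses to it. The setting name max_connections, the access log, and the validation rule are assumptions made up for the example.

```python
# Module-level data, private by convention (leading underscore).
_settings = {"max_connections": 10}
_access_log = []  # records every read and write for monitoring


def get_max_connections():
    """Access routine for reads; also records the access."""
    _access_log.append("read")
    return _settings["max_connections"]


def set_max_connections(value):
    """Access routine for writes; validates before storing."""
    if value < 1:
        raise ValueError("max_connections must be at least 1")
    _access_log.append("write")
    _settings["max_connections"] = value


def access_count():
    """Number of monitored accesses so far."""
    return len(_access_log)
```

The rest of the program calls the routines and never touches _settings directly, so the storage could later become a file or a per-module value without changing callers.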
Hiding pointer operations. Pointer operations tend to be hard to read and
error-prone. By isolating them in functions, you can concentrate on the intent of the operation
rather than the mechanics of pointer manipulation. Also, if the operations are done in
only one place, you can be more certain that the code is correct. If you find a better data
structure than pointers, you can change the program without traumatizing the routines that
use the pointers.
Promoting code reuse. Code put into modular routines can be reused in other
programs more easily than the same code embedded in a larger routine. Even if a section of
code is called from only one place in the program and is understandable as part of a
larger routine, it makes sense to put it into its own routine if that piece of code can be
used in another program.
Planning for a family of programs. If you expect a program to be modified,
isolate the parts that you expect to change in their own routines. You can then modify the
routines without affecting the rest of the program or you can put in completely new
routines instead. For example, several years ago I managed a team which wrote a series of
programs used by our clients to sell insurance. We had to tailor each program to the
specific client's insurance rates, quote-report format, and so on. But many parts of the
programs were similar: the routines that input information about potential clients, that
stored information in a client database, that looked up rates and computed total rates for
a group, and so on. The team modularized the program so that each part that varied from
client to client was in its own module. The initial programming might have taken three
months or so, but when we got a new client, we merely wrote a handful of new modules for
the new client and dropped them into the rest of the code. Two or three days' work, and voila!
Custom software!
Making a section of code readable. Putting a section of code into a well-named
routine is one of the best ways to document its function. Instead of reading a series of
statements like
    if ( Node <> NULL )
        while ( Node.Next <> NULL ) do
            Node = Node.Next
        LeafName = Node.Name
    else
        LeafName = ""
you can read a statement like
    LeafName = GetLeafName( Node )
The new routine is so short that all it needs for documentation is a good name. Using a
function call instead of six lines of code makes the routine that originally contained the
code less complex and documents it automatically.
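For illustration, here is one way the body of such a routine might look in Python. The Node class and its field names are assumptions; the excerpt shows only the call to GetLeafName, not its implementation.

```python
class Node:
    """Hypothetical singly linked node with a name and a link to the next node."""

    def __init__(self, name, next_node=None):
        self.name = name
        self.next = next_node


def get_leaf_name(node):
    """Return the name of the last node in the chain, or "" for an empty chain."""
    if node is None:
        return ""
    while node.next is not None:
        node = node.next
    return node.name
```

A caller then writes leaf_name = get_leaf_name(node) and does not need to read the loop at all.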
Improving portability. Isolate use of nonportable capabilities to explicitly
identify and isolate future portability work. Nonportable capabilities include nonstandard
language features, hardware dependencies, operating system dependencies, and so on.
Isolating complex operations. Complex operations are prone to errors. Complex
operations include complicated algorithms, communications protocols, tricky boolean tests,
operations on complex data, and so on. If an error does occur, the error is easier to find
because it's not spread through the code, but is contained in a routine. The error does
not affect other code because only one routine has to be fixed; other code is not touched.
If you find a better, simpler, or more reliable algorithm, it's easier to replace the old
algorithm if it's isolated in a routine. During development, it's easier to try several
designs and use the one that works best.
Isolating use of nonstandard language functions. Most languages contain handy,
nonstandard extensions. Using them is a double-edged sword because they might not be
available in a different environment, whether the different environment is different
hardware, a different vendor's implementation of the same language, or a new version of
the language from the same vendor. If you use them, build routines of your own that act as
gateways to them. You can then replace the vendor's nonstandard routines with
custom-written ones if needed.
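A loose Python analogy of a gateway routine is sketched below. It wraps str.removeprefix, which is a standard method but exists only in Python 3.9 and later, so it stands in here for a feature that may be missing in a different environment. The gateway name strip_prefix is hypothetical.

```python
def strip_prefix(text, prefix):
    """Gateway: use str.removeprefix where available, else a portable fallback."""
    if hasattr(str, "removeprefix"):
        return text.removeprefix(prefix)
    # Custom-written replacement for runtimes that lack the method.
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text
```

The rest of the program calls only strip_prefix, so the environment-specific decision is made in exactly one routine.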
Simplifying complicated boolean tests. Detailed understanding of complicated
boolean tests is rarely necessary for understanding program flow. Putting the test in a
function makes the code more readable because (1) the details of the test are out of the
way and (2) a descriptive function name summarizes the purpose of the test.
Giving the test a function of its own emphasizes its significance. It encourages extra
effort to make the details of the test readable inside its function. The result is that
both the main flow of the code and the test itself become clearer.
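As a hedged illustration, the Python sketch below puts a compound test behind a descriptive name. The discount rule, its thresholds, and the function name are invented for the example and do not come from the original text.

```python
def is_eligible_for_discount(age, years_as_customer, has_open_claim):
    """Hypothetical rule: seniors or long-term customers with no open claim."""
    is_senior = age >= 65
    is_long_term_customer = years_as_customer >= 5
    return (is_senior or is_long_term_customer) and not has_open_claim
```

The calling code can then read if is_eligible_for_discount(...) without spelling out the three conditions, and the test itself is easier to read because each part has a name.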
For the sake of modularization? Absolutely not. With so many good reasons to put
something in a routine, this one is unnecessary. In addition, some functions are performed
better in a single large routine. (The best length for a routine is discussed in Section
5.5, "How Long Can a Routine Be?")
Operations That Seem Too Simple to Put Into Routines
One of the strongest barriers to creating effective routines is a reluctance to make a
routine for a simple purpose only because it just seems too simple to deserve its own
routine. This is a strong mental block, and it takes experience to realize how helpful a
good, small routine is.
Small routines have several advantages. One is that they improve readability. For
example, I had the following single line of code in about a dozen places in a program:
    Points := DeviceUnits * ( POINTS_PER_INCH / DeviceUnitsPerInch() );
This is not the most complicated line of code you'll ever read. Most people would
eventually figure out that it converts a measurement in device units to a measurement in
points. They would see that the dozen lines did the same thing. It could have been
clearer, however, so I created a routine to do it in one place:
    FUNCTION DeviceUnitsToPoints( DeviceUnits: Integer ): Integer;
    begin
        DeviceUnitsToPoints := DeviceUnits *
            ( POINTS_PER_INCH / DeviceUnitsPerInch() )
    end;
When the routine was used, the dozen lines of code all looked more or less like this:
    Points := DeviceUnitsToPoints( DeviceUnits );
which was more readable, even approaching self-documenting.
This hints at another reason to put small operations into functions: small operations
tend to turn into slightly larger operations. I didn't know it when I wrote the routine
above, but under certain conditions and when certain devices were active, DeviceUnitsPerInch()
returned zero. That meant I had to account for division by zero, which took three more
lines of code, like this:
    FUNCTION DeviceUnitsToPoints( DeviceUnits: Integer ): Integer;
    begin
        { Guard against devices that report zero units per inch. }
        if ( DeviceUnitsPerInch() <> 0 ) then
            DeviceUnitsToPoints := DeviceUnits *
                ( POINTS_PER_INCH / DeviceUnitsPerInch() )
        else
            DeviceUnitsToPoints := 0
    end;
If that original line of code had still been in a dozen places, the test would have been
repeated a dozen times, for a total of 36 new lines of code. A simple routine reduced the
36 new lines to three.
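The same idea can be sketched in Python. Two things here are assumptions: the value 72 for POINTS_PER_INCH, which the excerpt never defines, and passing the device resolution as a parameter instead of calling a DeviceUnitsPerInch() routine.

```python
POINTS_PER_INCH = 72  # assumed value; the excerpt does not define the constant


def device_units_to_points(device_units, device_units_per_inch):
    """Convert device units to points, returning 0 when the resolution is zero."""
    if device_units_per_inch == 0:
        # The zero check lives here once, not at every call site.
        return 0
    return device_units * POINTS_PER_INCH / device_units_per_inch
```

Every caller picks up the zero check automatically, which is the saving the paragraph above describes.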
Summary of Valid Reasons to Create a Routine
- Reducing complexity
- Avoiding duplicate code
- Limiting effects of changes
- Hiding sequences
- Improving performance
- Making central points of control
- Hiding data structures
- Hiding global data
- Hiding pointer operations
- Promoting code reuse
- Planning for a family of programs
- Making a section of code readable
- Improving portability
- Isolating complex operations
- Isolating use of nonstandard language functions
- Simplifying complicated boolean tests
This material is Copyright © 1993 by Steven C. McConnell. All Rights Reserved.