Code Complete: A Practical Handbook of Software Construction. Redmond, Wa.: Microsoft Press, 880 pages, 1993. Retail price: $35. ISBN: 1-55615-484-4. 



5.1 Valid Reasons to Create a Routine  

Aside from the invention of the computer, the routine is the single greatest invention in computer science. It makes programs easier to read and understand than any other feature of any programming language. It's a crime to abuse the senior statesman of computer science with code like that shown above.

Routines are also the greatest technique ever invented for saving space and improving performance. Imagine how much larger your code would be if you had to repeat the code for every call to a routine instead of branching to the routine. Imagine how hard it would be to make performance improvements in the same code used in a dozen places rather than making them all in one routine. Routines make modern programming possible.

"OK," you say, "I already know that routines are great, and I program with them all the time. This seems kind of remedial, so what do you want me to do about it?"

I want you to understand that there are many valid reasons to create a routine and that there are right ways and wrong ways to do so. As an undergraduate computer science student, I thought that the main reason to create a routine was to avoid duplicate code. The introductory textbook I used said that routines were good because they made a program easier to develop, debug, document, and maintain. Period. Aside from syntactic details about how to use parameters and local variables, that was the complete explanation of the theory and practice of routines. It was not a good or complete explanation.

Here's a list of valid reasons to create a routine. The reasons overlap somewhat, and they're not intended to make an orthogonal set.

Reducing complexity. The single most important reason to create a routine is to reduce a program's complexity. Create a routine to hide information so that you won't need to think about it. Sure, you need to think about it when you write the routine. But after it's written, you should be able to forget the details and use the routine without any knowledge of its internal workings. Other reasons to create routines (minimizing code size, improving maintainability, and improving correctness) are also good reasons, but without the abstractive power of routines, complex programs would be impossible to manage intellectually.

One indication that a routine needs to be broken out of another routine is deep nesting of an inner loop or conditional. Reduce the routine's complexity by pulling the nested part out and putting it in its own routine.
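Here is a minimal sketch of that kind of extraction, written in Python. The customer and invoice records and all the routine names are hypothetical, invented only to show the shape of the change.

```python
# Hypothetical data layout: each customer is a dict with an "active" flag and
# a list of invoices; each invoice has "paid" and "due_day" fields.

def count_overdue_nested(customers, today):
    """Before: the inner loop and conditional sit three levels deep."""
    total = 0
    for customer in customers:
        if customer["active"]:
            for invoice in customer["invoices"]:
                if not invoice["paid"] and invoice["due_day"] < today:
                    total += 1
    return total


def count_overdue_invoices(invoices, today):
    """The nested part, pulled out into its own routine."""
    return sum(
        1 for invoice in invoices
        if not invoice["paid"] and invoice["due_day"] < today
    )


def count_overdue(customers, today):
    """After: the outer routine reads as a single level of logic."""
    return sum(
        count_overdue_invoices(customer["invoices"], today)
        for customer in customers
        if customer["active"]
    )
```

Both versions return the same count; the second simply lets you read the outer loop without holding the invoice test in your head.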

Avoiding duplicate code. Undoubtedly the most popular reason for creating a routine is to avoid duplicate code. Indeed, creation of similar code in two routines implies an error in decomposition. Split one or both routines, then let both call the part that was split off. With code in one place, you save the space of duplicated code. Future modifications are easier because you need to modify the code in only one location. The code is more reliable because you have only one place in which to convince yourself that the code is right. Modifications are more reliable because you avoid making slightly different modifications that are supposed to be identical.
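As a small illustration, suppose two routines had each formatted money in its own slightly different way. The sketch below, in Python with made-up names, splits the shared part off so that both routines call it.

```python
# Hypothetical example: format_money() is the part that was split off from
# two routines that used to format amounts separately.

def format_money(amount):
    """Single place that decides how a dollar amount is displayed."""
    return f"${amount:,.2f}"


def invoice_line(item, price):
    return f"{item}: {format_money(price)}"


def receipt_total(prices):
    return f"Total: {format_money(sum(prices))}"
```

If the display format ever changes, only format_money() needs to be modified, and the two callers cannot drift apart.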

Limiting effects of changes. Isolate areas that are likely to change so that effects of changes are limited to a single routine or, at most, a few routines. Design so that areas that are most likely to change are the easiest to change. Areas likely to change include hardware dependencies, input/output, complex data structures, and business rules.

Hiding sequences. Hide the order in which events happen to be processed. For example, if the program typically gets data from the user and then gets auxiliary data from a file, neither the routine that gets the user data nor the routine that gets the file data should depend on the other routine being performed first. Design the system so that either could be performed first, then create a routine to hide the information about which happens to be performed first. Similarly, if you commonly have two lines of code that read the top of a stack and decrement a StackTop variable, put them in a PopStack() routine.
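The PopStack() case might look something like the following Python sketch. The Stack class, its fields, and the error handling are assumptions made for illustration; the point is only that the read and the decrement always happen together, in one place.

```python
class Stack:
    """Hypothetical stack that keeps an explicit top index, like StackTop."""

    def __init__(self):
        self._items = []
        self._top = -1  # index of the top element; -1 means the stack is empty

    def push(self, item):
        self._top += 1
        if self._top == len(self._items):
            self._items.append(item)
        else:
            self._items[self._top] = item  # reuse a slot freed by an earlier pop

    def pop(self):
        """Read the top of the stack and decrement the top index together."""
        if self._top < 0:
            raise IndexError("pop from empty stack")
        item = self._items[self._top]
        self._top -= 1
        return item
```

Callers never see the two-step sequence, so they cannot perform one step and forget the other.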

Improving performance. You can optimize the code in one place instead of several places. Having code in one place means that a single optimization benefits all the routines that use that routine, whether they use it directly or indirectly. It makes it practical to recode the routine with a more efficient algorithm or a faster, more difficult language, like assembler.

In the early days of computer programming, performance penalties for using routines on some machines were prohibitive. A call to a routine meant that the operating system had to swap out the program, swap in a directory of routines, swap in the routine, execute the routine, swap out the routine, and swap the calling routine back in. All this swapping chewed up resources and made the program slow. Modern machines, however, and "modern" means any machine you're ever likely to work on, have virtually no penalty for calling a routine. Moreover, because of the increased code size resulting from putting code inline and the extra swapping that's associated with increased code size on demand-paged virtual-memory machines, a recent study found a slight timing penalty for using inline code rather than routines (Davidson and Holler 1992).

Making central points of control. Keep control in one place. Control assumes many forms. Knowledge of the number of entries in a table is one form. Control of hardware devices (disks, tapes, printers, plotters, and so on) is another. Using one routine to read from a file and one routine to write to it is a form of centralized control. This is especially useful because, if the file needs to be converted to an in-memory data structure, the changes affect only the access routines.

Reading and modifying the contents of internal data structures with specialized routines is another form of centralized control.

The idea of centralized control is not really distinct from the other ideas presented. It's especially similar to information hiding, but it has unique heuristic power that makes it worth adding to your programming toolbox.

Hiding data structures. You can hide the implementation details of a data structure so that most of the program doesn't need to worry about the messy details of manipulating computer-science structures, but can deal with the data in terms of how it's used in the problem domain. Routines that hide implementation details provide a valuable level of abstraction that reduces a program's complexity. They centralize data structure operations in one place and reduce the chance of errors working with the data structure. They make it easy to change the structure without changing most of the program.

When hiding a data structure, refer to it independently of the medium it's stored on. If you have an insurance rates table, for example, that's so big that it's always stored on disk, you might be tempted to refer to it as a "rate file." When you refer to it as a file, however, you're exposing more information about the data than you need to. If you ever change the program so that the table is in memory instead of on disk, the code that refers to it as a file will be incorrect, misleading, and confusing. Try, instead, to make names of access routines independent of how the data is stored, and refer to the abstract data type, insurance rates table, instead.
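A rough Python sketch of that naming advice follows. The class name, the "age_band,rate" row format, and the sample rates are all hypothetical. For brevity the sketch reads the rows into a dictionary, which a truly disk-sized table would not do, but the access routine's name would stay the same either way.

```python
import csv
import io


class InsuranceRateTable:
    """Hypothetical access routines named for the table, not for its storage."""

    def __init__(self, source):
        # `source` is any text stream of "age_band,rate" rows (an assumed
        # format). Callers can't tell whether it came from disk or memory.
        self._rates = {band: float(rate) for band, rate in csv.reader(source)}

    def rate_for(self, age_band):
        return self._rates[age_band]


# The same calling code works for an open disk file or an in-memory stream.
table = InsuranceRateTable(io.StringIO("18-29,1.25\n30-44,1.10\n"))
rate = table.rate_for("30-44")
```

Nothing in the calling code says "file," so moving the table between disk and memory leaves the callers accurate.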

Hiding global data. If you need to use global data, you can hide its implementation details as just described. Working with global data through access routines provides several benefits. You can change the structure of the data without changing your program. It allows you to monitor accesses to the data. The discipline of using access routines also encourages you to think clearly about whether the data is really global; it might be more accurate to treat it as data that's local to several routines in a single module or as the part of an abstract data type that's visible to the rest of the program.
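In Python, one way to sketch this is with module-level data that the rest of the program reaches only through access routines. The variable and routine names below are invented for illustration, and the access counter stands in for whatever monitoring you might want.

```python
# Hypothetical module-level ("global") data, private by naming convention.
_current_user = None
_access_count = 0


def get_current_user():
    """Access routine: the one place where reads can be monitored."""
    global _access_count
    _access_count += 1
    return _current_user


def set_current_user(name):
    """Access routine: the one place where writes can be validated."""
    global _current_user
    if not name:
        raise ValueError("user name must not be empty")
    _current_user = name


def current_user_access_count():
    return _access_count
```

Because callers never touch _current_user directly, its representation could change without changing any of them.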

Hiding pointer operations. Pointer operations tend to be hard to read and error prone. By isolating them in functions, you can concentrate on the intent of the operation rather than the mechanics of pointer manipulation. Also, if the operations are done in only one place, you are more certain that the code is correct. If you find a better data structure than pointers, you can change the program without traumatizing the routines that use the pointers.

Promoting code reuse. Code put into modular routines can be reused in other programs more easily than the same code embedded in a larger routine. Even if a section of code is called from only one place in the program and is understandable as part of a larger routine, it makes sense to put it into its own routine if that piece of code can be used in another program.

Planning for a family of programs. If you expect a program to be modified, isolate the parts that you expect to change in their own routines. You can then modify the routines without affecting the rest of the program or you can put in completely new routines instead. For example, several years ago I managed a team that wrote a series of programs used by our clients to sell insurance. We had to tailor each program to the specific client's insurance rates, quote-report format, and so on. But many parts of the programs were similar: the routines that input information about potential clients, that stored information in a client database, that looked up rates and computed total rates for a group, and so on. The team modularized the program so that each part that varied from client to client was in its own module. The initial programming might have taken three months or so, but when we got a new client, we merely wrote a handful of new modules for the new client and dropped them into the rest of the code. Two or three days' work, and voila! Custom software!
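The sketch below shows one way that arrangement could look in Python. It is not the design the team actually used; the routine names, the rates, and the report formats are invented. The shared routine stays fixed, and each client supplies only the parts that vary.

```python
def build_quote(ages, rate_for_age, format_report):
    """Shared part: the same for every client."""
    total = sum(rate_for_age(age) for age in ages)
    return format_report(total)


# Hypothetical client-specific routines; a new client means writing new ones.
def acme_rate(age):
    return 100.0 if age < 40 else 150.0


def acme_report(total):
    return f"ACME QUOTE: {total:.2f}"


def zenith_rate(age):
    return 120.0


def zenith_report(total):
    return f"Zenith group premium = {total:.0f}"
```

A call such as build_quote([30, 45], acme_rate, acme_report) uses the first client's rules, and swapping in the second client's routines requires no change to build_quote().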

Making a section of code readable. Putting a section of code into a well-named routine is one of the best ways to document its function. Instead of reading a series of statements like

if ( Node <> NULL )
   while ( Node.Next <> NULL ) do
      Node = Node.Next
   LeafName = Node.Name
else
   LeafName = ""

you can read a statement like

LeafName = GetLeafName( Node )

The new routine is so short that all it needs for documentation is a good name. Using a function call instead of six lines of code makes the routine that originally contained the code less complex and documents it automatically.
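For comparison, here is roughly what the extracted routine could look like in Python. The Node class is an assumption, since the fragment above doesn't show how nodes are declared.

```python
class Node:
    """Hypothetical singly linked node with a name and a link to the next node."""

    def __init__(self, name, next=None):
        self.name = name
        self.next = next


def get_leaf_name(node):
    """Return the name of the last node in the chain, or "" if there is none."""
    if node is None:
        return ""
    while node.next is not None:
        node = node.next
    return node.name
```

The calling code then shrinks to a single line, leaf_name = get_leaf_name(node).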

Improving portability. Isolate use of nonportable capabilities to explicitly identify and isolate future portability work. Nonportable capabilities include nonstandard language features, hardware dependencies, operating system dependencies, and so on.

Isolating complex operations. Complex operations are prone to errors. Complex operations include complicated algorithms, communications protocols, tricky boolean tests, operations on complex data, and so on. If an error does occur, the error is easier to find because it's not spread through the code, but is contained in a routine. The error does not affect other code because only one routine has to be fixed; other code is not touched. If you find a better, simpler, or more reliable algorithm, it's easier to replace the old algorithm if it's isolated in a routine. During development, it's easier to try several designs and use the one that works best.

Isolating use of nonstandard language functions. Most languages contain handy, nonstandard extensions. Using them is a double-edged sword because they might not be available in a different environment, whether the different environment is different hardware, a different vendor's implementation of the same language, or a new version of the language from the same vendor. If you use them, build routines of your own that act as gateways to them. You can then replace the vendor's nonstandard routines with custom-written ones if needed.
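A gateway routine can be quite small. The Python sketch below uses os.sched_getaffinity, which Python documents as available only on some Unix platforms, as a stand-in for a nonstandard capability. The routine name usable_cpu_count and the choice of fallback are my own assumptions.

```python
import os


def usable_cpu_count():
    """Gateway: the only routine that touches the platform-specific call."""
    if hasattr(os, "sched_getaffinity"):
        # Platform-specific path: CPUs this process is allowed to run on.
        return len(os.sched_getaffinity(0))
    # Portable fallback; os.cpu_count() may return None, so default to 1.
    return os.cpu_count() or 1
```

The rest of the program calls usable_cpu_count() and never learns which branch ran, so porting work is confined to this one routine.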

Simplifying complicated boolean tests. Detailed understanding of complicated boolean tests is rarely necessary for understanding program flow. Putting the test in a function makes the code more readable because (1) the details of the test are out of the way; and (2) a descriptive function name summarizes the purpose of the test.

Giving the test a function of its own emphasizes its significance. It encourages extra effort to make the details of the test readable inside its function. The result is that both the main flow of the code and the test itself become clearer.
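As an illustration, the following Python sketch moves a made-up eligibility test into its own function. The business rule, the field names, and the function names are all hypothetical.

```python
def qualifies_for_group_rate(applicant):
    """Hypothetical rule, with each part of the test given a readable name."""
    is_working_age = 18 <= applicant["age"] <= 65
    is_established = (
        applicant["employees"] >= 10 or applicant["years_insured"] >= 5
    )
    return is_working_age and is_established and not applicant["policy_lapsed"]


def quote_label(applicant):
    # The main flow reads as one decision; the details are out of the way.
    if qualifies_for_group_rate(applicant):
        return "group rate"
    return "individual rate"
```

Inside the function there is room to name the intermediate conditions, which is the extra effort the separate function encourages.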

For the sake of modularization? Absolutely not. With so many good reasons to put something in a routine, this one is unnecessary. In addition, some functions are performed better in a single large routine. (The best length for a routine is discussed in Section 5.5, "How Long Can a Routine Be?")

Operations That Seem Too Simple to Put Into Routines

One of the strongest barriers to creating effective routines is a reluctance to make a routine for a simple purpose only because it just seems too simple to deserve its own routine. This is a strong mental block, and it takes experience to realize how helpful a good, small routine is.

Small routines have several advantages. One is that they improve readability. For example, I had the following single line of code in about a dozen places in a program:

Points := DeviceUnits * ( POINTS_PER_INCH / DeviceUnitsPerInch() );

This is not the most complicated line of code you'll ever read. Most people would eventually figure out that it converts a measurement in device units to a measurement in points. They would see that the dozen lines did the same thing. It could have been clearer, however, so I created a routine to do it in one place,

FUNCTION DeviceUnitsToPoints( DeviceUnits: Integer ): Integer;
begin
   DeviceUnitsToPoints := DeviceUnits *
      ( POINTS_PER_INCH / DeviceUnitsPerInch() )
end;

Once the routine was in use, the dozen lines of code all looked more or less like,

Points := DeviceUnitsToPoints( DeviceUnits );

which was more readable, even approaching self-documenting.

This hints at another reason to put small operations into functions: small operations tend to turn into slightly larger operations. I didn't know it when I wrote the routine above, but under certain conditions and when certain devices were active, DeviceUnitsPerInch() returned zero. That meant I had to account for division by zero, which took three more lines of code, like this,

FUNCTION DeviceUnitsToPoints( DeviceUnits: Integer ): Integer;
begin
   if ( DeviceUnitsPerInch() <> 0 ) then
      DeviceUnitsToPoints := DeviceUnits *
         (POINTS_PER_INCH / DeviceUnitsPerInch())
   else
      DeviceUnitsToPoints := 0
end;

If that original line of code had still been in a dozen places, the test would have been repeated a dozen times, for a total of 36 new lines of code. A simple routine reduced the 36 new lines to three.

Summary of Valid Reasons to Create a Routine

  • Reducing complexity
  • Avoiding duplicate code
  • Limiting effects of changes
  • Hiding sequences
  • Improving performance
  • Making central points of control
  • Hiding data structures
  • Hiding global data
  • Hiding pointer operations
  • Promoting code reuse
  • Planning for a family of programs
  • Making a section of code readable
  • Improving portability
  • Isolating complex operations
  • Isolating use of nonstandard language functions
  • Simplifying complicated boolean tests

This material is Copyright 1993 by Steven C. McConnell. All Rights Reserved.

 
