The Big-Oh notation

When we compare algorithms in order to select one to use, we often need an understanding of their performance and space characteristics. Performance is important because, well, we’re always interested in raw speed; and space is important because we are always on the lookout for algorithms that don’t waste memory. Of course, there are other considerations too. For example, we might want to know how easy it is to implement algorithm X or algorithm Y. Yet most of the time we are primarily interested in performance and space characteristics.

We’ll talk about space considerations in a later article; for now, we’ll consider how to compare the performance of algorithms.

When comparing performance, we need a compact notation to express an algorithm’s characteristics. For instance, it is awkward to say "the performance of algorithm X is proportional to the number of items it processes, cubed," or something equally verbose. Fortunately Computer Science has a solution to this problem; it’s called the big-Oh notation.

We begin by running a series of profiling experiments to analyze the performance characteristics of the algorithm in which we’re interested. (If we’re Don Knuth, we can also try to derive the characteristics mathematically from first principles.) If we are lucky, the results of these profiling runs allow us to work out the mathematical function of n, the number of items, to which the time taken by the algorithm is proportional, and then say that the algorithm is an O(f(n)) algorithm, where f(n) is the mathematical function we determined. We read this as "big-Oh of f(n)", or, less rigorously, as "proportional to f(n)."

For example, if we timed experiments on a sequential search through an array for different numbers of items in the array, we would find that it is an O(n) algorithm. Binary search, on the other hand, we’d find to be an O(log(n)) algorithm. Since log(n) < n for all positive n, we could say that binary search is always faster than sequential search since the time taken would always be smaller. (However, in a moment, I shall be dishing out a couple of warnings about taking conclusions from the big-Oh notation too far. Be warned.)
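To make the comparison concrete, here is a minimal sketch of the two searches in Python (the original article doesn’t give code; the function names are mine):

    def sequential_search(items, target):
        # Look at every element in turn: the work grows in direct
        # proportion to the number of items, i.e. O(n).
        for index, value in enumerate(items):
            if value == target:
                return index
        return -1

    def binary_search(sorted_items, target):
        # Halve the search range at every step: O(log(n)) comparisons.
        low, high = 0, len(sorted_items) - 1
        while low <= high:
            mid = (low + high) // 2
            if sorted_items[mid] == target:
                return mid
            if sorted_items[mid] < target:
                low = mid + 1
            else:
                high = mid - 1
        return -1

Note that binary search requires the array to be sorted, which is part of the reason the comparison isn’t quite as clear-cut as the big-Oh values alone suggest.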

Suppose that by experimentation we work out that Algorithm X is O(n² + n), in other words, the time it takes to run is proportional to n² + n. By "proportional to" we mean that we can find a constant k such that the following equation holds:

TimeTaken = k * (n² + n)
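To see what "proportional to" buys us in practice, here is a hedged sketch: given one measured running time for a known n, we can estimate k and then predict the time for other values of n. The measurement of 0.5 seconds for 1,000 items is invented purely for illustration:

    def estimate_k(measured_time, n):
        # Solve measured_time = k * (n**2 + n) for k.
        return measured_time / (n**2 + n)

    # Hypothetical measurement: 0.5 seconds for 1,000 items.
    k = estimate_k(0.5, 1000)

    # Predicted time for 10,000 items under the same model.
    predicted = k * (10000**2 + 10000)
    print(predicted)   # roughly 50 seconds: ten times the items, about 100 times the time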

Now, in general, the value of k doesn’t really affect our intuition of the performance of Algorithm X. Yes, higher values of k result in slower performance, but the important bits are within the parentheses, the n squared and the n. Increasing n doesn’t affect k; it’s constant, remember. In fact, knowing this, we can see that multiplying the mathematical function inside the big-Oh parentheses by a constant value has no effect on the big-Oh value. For example, O(3 * f(n)) is equal to O(f(n)); we can just take the ‘3’ out of the big-Oh notation and multiply it into the outside proportionality constant, the one we can conveniently ignore.

(The same goes for adding a constant inside the big-Oh parentheses; for large n, O(n + 42) is the same as O(n).)

If the value of n is large enough when we test Algorithm X, we can safely say that the effects of the "+ n" term are going to be swallowed up by the n² term. In other words, providing n is large enough, O(n² + n) is equal to O(n²). And that goes for any additional term in n: we can safely ignore it if, for sufficiently large n, its effects are swallowed by another term in n. So, for example, a term in n² will be swallowed up by a term in n³; a term in log(n) will be swallowed up by a term in n; and so on. Note that this only applies when we’re adding or subtracting terms; we can’t ignore multiplying or dividing terms in the same manner (unless the term is constant, as we’ve shown).
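A quick way to convince yourself of the "swallowed up" claim is a throwaway sketch like the following (the values of n are arbitrary):

    for n in (10, 100, 1000, 1000000):
        ratio = (n**2 + n) / n**2
        print(n, ratio)
    # 10      1.1
    # 100     1.01
    # 1000    1.001
    # 1000000 1.000001  -- the "+ n" term contributes almost nothing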

This shows that arithmetic with the big-Oh notation is very easy. Let’s, for argument’s sake, suppose that we have an algorithm that performs several different tasks. The first task, taken on its own, is O(n), the second is O(n²), the third is O(log(n)). What is the overall big-Oh value for the performance of the algorithm? The answer is O(n²), since that is the dominant part of the algorithm, by far.
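As a hedged illustration of how the parts combine, here is a made-up routine whose three phases have exactly those costs; for large n the middle phase dictates the overall shape of the running time:

    def three_phase(items):
        # Phase 1: O(n) -- one pass over the data.
        total = sum(items)

        # Phase 2: O(n^2) -- compare every item with every other item.
        duplicate_pairs = 0
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                if items[i] == items[j]:
                    duplicate_pairs += 1

        # Phase 3: O(log(n)) -- repeatedly halve n, standing in for a
        # logarithmic step such as a binary search.
        halvings, n = 0, len(items)
        while n > 1:
            n //= 2
            halvings += 1

        # For large n the O(n^2) phase dwarfs the other two, so the
        # routine as a whole behaves as O(n^2).
        return total, duplicate_pairs, halvings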

But, having said that, here comes the warning I was about to give you before about drawing conclusions from big-Oh values. Big-Oh values are representative of what happens with large values of n. For small values of n, the notation breaks down completely; other factors start to come into play and swamp the general results. For example, suppose we time two algorithms in an experiment. We manage to work out the two performance functions from our statistics:

Time taken for first = k1 * (n + 100000)
Time taken for second = k2 * n²

The two constants k1 and k2 are of the same magnitude. Which algorithm would you use? If we went with the big-Oh notation, we’d always choose the first algorithm because it’s O(n). However, if we actually found that in our applications n was never greater than 100, it would make more sense for us to use the second algorithm.
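A back-of-the-envelope check makes the point. Assuming for simplicity that k1 = k2 = 1 (any equal values give the same comparison), we can just evaluate the two expressions for a few sizes:

    def first(n):      # O(n), but with a large constant amount of work
        return n + 100000

    def second(n):     # O(n^2)
        return n * n

    for n in (100, 316, 317, 1000):
        print(n, first(n), second(n))
    # n=100:  100100 vs 10000    -> second wins comfortably
    # n=316:  100316 vs 99856    -> second still (just) wins
    # n=317:  100317 vs 100489   -> first pulls ahead
    # n=1000: 101000 vs 1000000  -> first wins easily

The crossover sits at a little over 300 items; below that, the "worse" O(n²) algorithm is the faster one.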

So, when you need to select an algorithm for some purpose, you must take into account not only the big-Oh value of the algorithm, but also its characteristics for the average number of items (or, if you like, the environment) for which you will be using the algorithm. Again, the only way you’ll ever know you’ve selected the right algorithm is by measuring its speed in your application, for your data, with a profiler. Don’t take anything on trust from an author like me; you should measure, time, and test.
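If you want an easy way to do that timing, something along these lines works; this is only a sketch using Python’s standard timeit and bisect modules, and the list size and target value are arbitrary:

    import bisect
    import random
    import timeit

    data = list(range(100000))
    random.shuffle(data)
    sorted_data = sorted(data)      # sort once, outside the timed code
    target = 99999

    # Time each candidate on your own data and sizes, not on a textbook's.
    seq_time = timeit.timeit(lambda: data.index(target), number=100)
    bin_time = timeit.timeit(lambda: bisect.bisect_left(sorted_data, target), number=100)
    print(seq_time, bin_time)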

There’s another issue we need to consider as well. The big-Oh notation generally refers to an average case scenario. In our sequential versus binary search thought experiment, if the item for which we were looking was always the first item in the array, we’d find that sequential search would always be faster than binary search — we would succeed in finding the element we wanted after only one test. This is known as a best case scenario and is O(1). (Big-Oh of 1 means that it takes a constant time, no matter how many items there are.)

If the item we wanted was always the last item in the array, sequential search would be a pretty bad algorithm. This is a worst case scenario and would be O(n), just like the average case.

Although binary search has a similar best case scenario (the item we want is bang in the middle of the array and is found at the first shot), its worst case scenario is still much better than that for sequential search.
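To see the worst cases side by side, a rough sketch that counts comparisons rather than wall-clock time (which keeps the machine out of the picture; the "+ 1" is an approximation for the final probe):

    import math

    for n in (10, 1000, 1000000):
        worst_sequential = n                        # look at every item
        worst_binary = math.ceil(math.log2(n)) + 1  # one probe per halving
        print(n, worst_sequential, worst_binary)
    # 10      -> 10 vs 5
    # 1000    -> 1000 vs 11
    # 1000000 -> 1000000 vs 21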

In general, we should look at the big-Oh value for an algorithm’s average and worst cases. Best cases are usually not too interesting — we are generally more concerned with what happens "at the limit," since that is how our applications will be judged.

In conclusion, we have seen that the big-Oh notation is a valuable tool for us to characterize various algorithms that do similar jobs. We have also discussed that the big-Oh notation is generally valid only for large n; for small n we are advised to take each algorithm and time it. Also, the only way for us to truly know how an algorithm will perform in our application is to time it. Don’t guess, use a profiler.

In the second part of this article, you will learn about space and memory considerations and how those factors affect the selection of algorithms.
