Cogman
What?
In the A is for Algorithms thread I wrote a little about Big Oh. However, I thought the subject deserved a little more attention. If you are looking to get a programming job, many companies that interview will want to know whether you know your Big Oh.
A brief recap: Big Oh describes how an algorithm's operation count grows as its input grows. It defines general classes of algorithm growth. It is about the overall shape of the growth curve, not the specifics of that curve.
For example, let's say we have a function f(x) that does 3 * x^5 + 2 * x^2 + 43 operations. You could say that it is O(3 * x^5 + 2 * x^2 + 43); however, Big Oh is about classification, so we focus on the term that dominates growth. In this case, it should be plain that x^5 is the overriding factor in how the curve grows and changes. Don't believe me? Try removing terms and graphing the result (a graphing calculator or Wolfram Alpha will be your friend here). You can approximate the growth fairly closely (especially for large x) with O(x^5).
The more formal definition of these classifications requires calculus. Given two functions f(x) and g(x), take the limit of |f(x)/g(x)| as x approaches infinity. If the result is infinity, f(x) grows faster than g(x). If the result is a nonzero constant, f(x) and g(x) have the same growth. If the result is 0, g(x) grows faster than f(x). To take that limit you often have to apply L'Hôpital's rule.
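You can watch that limit settle numerically. A quick Python sketch (my own, for illustration) using the example above, f(x) = 3x^5 + 2x^2 + 43 against g(x) = x^5: the ratio approaches the constant 3, so the two are in the same growth class.

```python
def f(x):
    return 3 * x**5 + 2 * x**2 + 43

def g(x):
    return x**5

# The ratio f(x)/g(x) approaches the constant 3 as x grows, which is
# why O(3x^5 + 2x^2 + 43) collapses to the classification O(x^5).
for x in (10, 1_000, 100_000):
    print(x, f(x) / g(x))
```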
Why does this matter?
Learning the order of common algorithms is an easy skill, and slower-growth algorithms are usually preferred over faster-growth ones. In your day-to-day programming, you should be able to recognize something that will probably be slow just from the structure of the code.
However, before swapping out something simple for something more complex to satisfy a performance itch, you should measure extensively. Big Oh talks only about general growth. It says nothing about small inputs, and it does not take into account things like the cache.
Take, for example, matrix multiplication. The naive approach has an O(n^3) runtime. Strassen's algorithm is about O(n^2.8). The Coppersmith–Winograd algorithm is about O(n^2.37).
Which would you think is used? Naively, you might say the Coppersmith–Winograd algorithm is the right choice for high performance. That would be a mistake. Unfortunately, it only becomes faster than Strassen's algorithm on matrices too big to be held on modern computers. As a result, Coppersmith–Winograd is theoretically interesting but practically useless.
In fact, libraries that implement matrix multiplication often take a mixed approach. Strassen works by breaking the matrix into smaller matrices (divide and conquer) until it gets down to single numbers. However, for smaller matrices the naive approach can often beat Strassen. So a sophisticated implementation will use Strassen for larger matrices and then switch to the naive approach inside the recursion once the matrices shrink below some threshold.
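Here is a toy Python sketch of that mixed approach (the names and the threshold value are my own; real libraries are far more sophisticated): recurse with Strassen's seven-product trick on large matrices, and fall back to the naive triple loop below a cutoff.

```python
def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def naive_mul(A, B):
    # The O(n^3) triple loop: fastest in practice for small matrices.
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def quarter(M):
    # Split an n x n matrix (n even) into four n/2 x n/2 blocks.
    h = len(M) // 2
    return ([row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
            [row[:h] for row in M[h:]], [row[h:] for row in M[h:]])

def strassen(A, B, threshold=32):
    n = len(A)
    if n <= threshold or n % 2:   # small (or odd-sized): use the naive approach
        return naive_mul(A, B)
    A11, A12, A21, A22 = quarter(A)
    B11, B12, B21, B22 = quarter(B)
    # Strassen's seven products instead of the obvious eight.
    M1 = strassen(mat_add(A11, A22), mat_add(B11, B22), threshold)
    M2 = strassen(mat_add(A21, A22), B11, threshold)
    M3 = strassen(A11, mat_sub(B12, B22), threshold)
    M4 = strassen(A22, mat_sub(B21, B11), threshold)
    M5 = strassen(mat_add(A11, A12), B22, threshold)
    M6 = strassen(mat_sub(A21, A11), mat_add(B11, B12), threshold)
    M7 = strassen(mat_sub(A12, A22), mat_add(B21, B22), threshold)
    C11 = mat_add(mat_sub(mat_add(M1, M4), M5), M7)
    C12 = mat_add(M3, M5)
    C21 = mat_add(M2, M4)
    C22 = mat_add(mat_sub(mat_add(M1, M3), M2), M6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom
```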
This sort of optimization is common: higher-growth Big Oh algorithms frequently perform better on small inputs.
Another example of this is timsort, which is a hybrid of merge sort and insertion sort (along with other goodness).
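The same threshold trick in miniature, sketched in Python (a far simpler cousin of real timsort, with my own names): merge sort overall, insertion sort once the pieces get small.

```python
def insertion_sort(xs):
    # O(n^2) in general, but very fast on small lists.
    for i in range(1, len(xs)):
        x = xs[i]
        j = i - 1
        while j >= 0 and xs[j] > x:
            xs[j + 1] = xs[j]
            j -= 1
        xs[j + 1] = x
    return xs

def merge(a, b):
    # Standard O(n) merge of two sorted lists.
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def hybrid_sort(xs, threshold=16):
    # O(n log n) merge sort overall; insertion sort below the threshold.
    if len(xs) <= threshold:
        return insertion_sort(list(xs))
    mid = len(xs) // 2
    return merge(hybrid_sort(xs[:mid], threshold),
                 hybrid_sort(xs[mid:], threshold))
```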
All this is to say: exercise caution when talking about Big Oh. It is a useful but blunt tool.
Classifications?
With all that said, here are some common classifications, a few bits of example code, and where you will see them. A fuller list can be found just about anywhere you google Big Oh. You can find more names and full descriptions in the Wikipedia time complexity article.
From smallest growth to largest growth, here are common complexities.
Constant O(1) - These are algorithms that care about neither the size of your data structures nor the input values. The simplest constant time algorithms are ones doing straightforward computations. For example:
Code:
if (a == b) return c;
Assuming the equality comparison is constant time (it might not be if a and b are collections of some sort), this bit of logic is constant time. You can think of constant time operations as simple mathematical or logical statements whose cost doesn't grow with the input.
Another common example you'll run across is hash based algorithms (hash sets and maps). In these, a hash is generated for the input value and the lookup is done based on that hash. Because the amount of searching does not (ideally) increase with the size of the set/map, it is considered a constant time algorithm.
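In Python terms, the built-in dict is a hash map, and the lookup cost (ideally) stays flat no matter how many entries it holds:

```python
ages = {"alice": 30, "bob": 25, "carol": 41}

# Hash "bob" and jump straight to its bucket; no scan over the other
# entries. The cost stays (ideally) O(1) whether the dict holds three
# entries or three million.
print(ages["bob"])  # prints 25
```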
Logarithmic O(log(n)) - These algorithms are often associated with things like trees. Generally, they occur when each step of the algorithm cuts the search area by some factor. For example, assuming you have a tree based map:
Code:
let map = someTreeBasedMap;
map.find(q);
find(q) will be an O(log(n)) operation. I would suggest reading up on binary trees to prove this out. Another common O(log(n)) operation is binary search.
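Binary search makes the log easy to see: every comparison cuts the remaining search area in half, so n elements take about log2(n) steps. A quick Python sketch (hand-rolled for illustration; the standard bisect module covers this in practice):

```python
def binary_search(xs, target):
    # Each pass halves the search area, so a sorted list of n elements
    # takes at most about log2(n) iterations: O(log n).
    lo, hi = 0, len(xs) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if xs[mid] == target:
            return mid
        elif xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1  # not found
```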
Linear O(n) - These are algorithms that traverse every element of a collection. Often they take the form of:
Code:
for (value : values) {
    doStuff(value);
}
Assuming doStuff is a constant time operation, this is O(n).
Loglinear O(n log n) - Unlike the other classes, these are not commonly referred to by their name ("loglinear"). Rather, you will hear people say "that is an n log n algorithm" or something similar.
These algorithms are often associated with divide and conquer methods, and you will commonly see them referenced in terms of sorting. However, if you write one yourself, it will probably look something like this:
Code:
let map = someTreeBasedMap;
for (value : values) {
    map.insert(value);
}
Because insert is an O(log(n)) operation and you do it once per element of the collection (an O(n) loop), the complexities multiply: this is an O(n log n) operation.
As a side note, it does not register with many programmers that inserting n values into a tree is an O(n log n) operation. They tend to think only of the insertion cost of a single element (log n). Consequently, frequently building these data structures can create performance bottlenecks. Unless you need the results to stay sorted, I suggest steering away from tree based data structures.
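A Python sketch of the alternative: if you only need sorted output at the end, appending to a plain list and sorting once is the same O(n log n) class, usually with much better constants (and Python's built-in sort is itself timsort):

```python
values = [5, 1, 4, 2, 3]

# Appending n items is O(n); one sort at the end is O(n log n).
# Same overall class as n tree inserts, but with far less per-element
# overhead, and no tree to maintain afterwards.
collected = []
for v in values:
    collected.append(v)
collected.sort()
```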
Polynomial O(n^x) - These algorithms are easy to make by accident (or on purpose). They often look like this:
Code:
for (i : is) {
    for (j : is) {
        doStuff(i, j);
    }
}
The "x" is the nesting depth of the for loops. The above is O(n^2); to make an O(n^3) algorithm, you would add another level of nesting. These often sneak in by accident, usually because of program complexity or inexperience. Occasionally they are written on purpose because the faster algorithm takes more memory or time (see timsort again).
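A concrete example of the accidental version, sketched in Python (my own names): checking a list for duplicates with a nested loop is O(n^2), while the O(n) rewrite with a hash set trades extra memory for the speedup, the same memory-versus-time trade-off just mentioned.

```python
def has_duplicates_quadratic(xs):
    # Nested loop over the same collection: O(n^2) comparisons.
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[i] == xs[j]:
                return True
    return False

def has_duplicates_linear(xs):
    # One pass with a hash set: O(n) time, but O(n) extra memory.
    seen = set()
    for x in xs:
        if x in seen:
            return True
        seen.add(x)
    return False
```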
Factorial O(n!) - These are the brute force algorithms. You don't often see them in the wild; they are wildly inefficient and don't generally last long.
Usually, these are algorithms that generate every combination of a possible answer and then check each one to see if it is right (for example, cracking a password). They are used when generating a candidate is easy and checking it is doable, but directly calculating the answer is hard, if not impossible.
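One classic illustration (my own Python sketch, not from the original post): brute-forcing the traveling salesman problem by scoring every possible ordering of the cities. Generating and checking a candidate tour is easy; there are just factorially many of them.

```python
from itertools import permutations

def shortest_tour(dist):
    # dist[i][j] is the travel cost from city i to city j.
    # Try every ordering of the cities after city 0: (n-1)! candidates,
    # so this is factorial time and hopeless beyond a dozen or so cities.
    n = len(dist)
    best = None
    for order in permutations(range(1, n)):
        tour = (0,) + order + (0,)
        cost = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if best is None or cost < best[0]:
            best = (cost, tour)
    return best
```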
These are your algorithms of last resort. Often, when you are stuck with one, "close enough" is good enough, so instead of computing the exact answer, an approximation is better. Sometimes, however, a better algorithm simply exists and just isn't being used.