One of the first things any programmer learns is sorting. Most of us begin with something basic and digestible like Bubble Sort or Insertion Sort. Then, as we grow familiar with more complicated algorithms and tackle problems that involve more than sorting an array of 10 numbers, we pick up new sorting methods like Quick Sort or Merge Sort, or trendier ones such as Radix Sort and Counting Sort.
However, this new wave of knowledge comes with more than just bragging rights. Knowing when and where to use which sorting algorithm is almost as important as knowing the algorithm itself. I mean, who would want to be the person who uses Heap Sort to sort characters in a word?
Various Classifications of a Sorting Algorithm
Sorting algorithms can be classified into different categories based on the following parameters:
- Number of Swaps: This is the number of times the elements of the input are swapped or moved while sorting. Selection Sort requires the least amount of swapping, at most n − 1 swaps, which makes it attractive when writes are expensive. (Insertion Sort, by contrast, is the one that shines on nearly sorted inputs, because it performs very few comparisons and shifts there.)
- Number of Comparisons: This is the number of times the elements of the input are compared to one another while sorting. No comparison-based sorting algorithm can beat O(n log n) comparisons in the worst case; the simpler quadratic algorithms, such as Bubble Sort, make O(n²) comparisons. Only non-comparison sorts, covered later, can sidestep this bound, and only under special conditions.
- Use of Recursion: Recursion is the process in which a function calls itself, either directly or indirectly. Depending on the input, sorting algorithms that use recursion can be extremely efficient or inefficient compared to algorithms that rely on iteration. Sorting algorithms like Quick Sort use recursive techniques, while Selection Sort and Insertion Sort, among others, use iterative techniques. Additionally, algorithms like Merge Sort use both: the splitting is typically recursive while the merging is iterative.
- Stability: A sorting algorithm is said to be stable if equal (repeated) elements appear in the same relative order after sorting as in the input. Stable algorithms include Insertion Sort, Merge Sort, and others.
- Space Complexity: A sorting algorithm is said to be in-place if the array is sorted within itself, i.e., without using additional memory beyond the input. Such algorithms have O(1) space complexity; examples include Insertion Sort and Heap Sort. (Quick Sort is often grouped with them, though it uses O(log n) stack space for recursion.) Merge Sort is an example of an out-of-place sort, as it requires O(n) extra memory for its merge operations.
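To make the stability property concrete, here is a small Python sketch with made-up record data. Python's built-in `sorted()` is stable (it uses Timsort, a merge-sort-based hybrid), so records that tie on the sort key keep their original order:

```python
# Records that tie on the sort key (age); names reveal the original order.
people = [("Bob", 25), ("Alice", 30), ("Carol", 25), ("Dave", 30)]

# A stable sort keeps Bob before Carol (both 25) and Alice before Dave (both 30).
by_age = sorted(people, key=lambda p: p[1])
print(by_age)  # [('Bob', 25), ('Carol', 25), ('Alice', 30), ('Dave', 30)]
```

An unstable sort such as Heap Sort gives no such guarantee: the two 25-year-olds could come out in either order.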
Choosing a Sorting Algorithm:
Insertion sort: When n is guaranteed to be small, including as the base case of a quick sort or merge sort. While it is O(n²) in the worst case, it is a stable, in-place sort, and it runs in O(n) time on nearly sorted input.
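As a reference point for the rest of the list, here is a minimal insertion sort sketch in Python:

```python
def insertion_sort(a):
    """Sort list a in place; stable, O(1) extra space, O(n^2) worst case."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Shift larger elements one slot right to make room for key.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```

On a nearly sorted list the inner `while` loop barely runs, which is where the O(n) best case comes from.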
Bubble sort, selection sort: When you want to refrain from using the standard library’s sorting algorithm, and want to perform simple and basic sorting. The only advantage these have over insertion sort is being slightly easier to implement.
Quick sort: When you don’t need a stable sort and average case performance matters more than worst case performance. A quick sort is O(n log n) on average, O(n²) in the worst case. A good implementation uses O(log n) auxiliary storage in the form of stack space for recursion.
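One way to sketch this in Python, using a randomized pivot so that no fixed input reliably triggers the O(n²) worst case:

```python
import random

def quick_sort(a, lo=0, hi=None):
    """In-place quick sort: O(n log n) average, O(n^2) worst case, not stable."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return a
    # Random pivot choice makes the quadratic worst case unlikely.
    pivot = a[random.randint(lo, hi)]
    i, j = lo, hi
    while i <= j:
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    # Recurse on both partitions; recursion depth gives the O(log n) stack use.
    quick_sort(a, lo, j)
    quick_sort(a, i, hi)
    return a
```

Production implementations usually go further (recursing on the smaller partition first, falling back to insertion sort for tiny ranges), but this captures the core partition-and-recurse idea.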
Merge sort: When you need a stable, O(n log n) sort. The only downsides to it are that it uses O(n) auxiliary space and has a slightly larger constant than a quick sort.
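A minimal top-down merge sort sketch in Python; the `merged` list is the O(n) auxiliary space mentioned above:

```python
def merge_sort(a):
    """Stable O(n log n) sort; uses O(n) auxiliary space for the merge."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        # <= keeps equal elements in their original order (stability).
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]
```

Note that stability hinges on the `<=` in the merge: taking from the left half on ties preserves input order.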
Shell sort: When you have a moderately large input (say n < 5000) and want an easily implemented sorting technique. With a good gap sequence, such as Knuth's, Shell sort requires O(n^(3/2)) operations in the worst case, which means it can be used quite effectively even for moderately large inputs.
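A short sketch using Knuth's gap sequence (1, 4, 13, 40, ...), which is one common choice; each pass is just an insertion sort over elements `gap` apart:

```python
def shell_sort(a):
    """Shell sort with Knuth's gap sequence; roughly O(n^1.5) worst case."""
    gap = 1
    while gap < len(a) // 3:
        gap = 3 * gap + 1  # 1, 4, 13, 40, ...
    while gap >= 1:
        # Gapped insertion sort: each gap-spaced subsequence gets sorted.
        for i in range(gap, len(a)):
            key = a[i]
            j = i
            while j >= gap and a[j - gap] > key:
                a[j] = a[j - gap]
                j -= gap
            a[j] = key
        gap //= 3
    return a
```

The early large-gap passes move elements long distances cheaply, so the final gap-1 pass (plain insertion sort) runs on nearly sorted data.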
Heap sort: When you don’t need a stable sort and you care more about worst case performance than average case performance. It’s guaranteed to be O(n log n), and uses O(1) auxiliary space, meaning that you won’t unexpectedly run out of heap or stack space on very large inputs.
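A compact in-place heap sort sketch: build a max-heap inside the array, then repeatedly swap the maximum to the end and restore the heap.

```python
def heap_sort(a):
    """In-place heap sort: O(n log n) worst case, O(1) auxiliary space."""
    n = len(a)

    def sift_down(root, end):
        # Push a[root] down until the max-heap property holds in a[:end].
        while 2 * root + 1 < end:
            child = 2 * root + 1
            if child + 1 < end and a[child + 1] > a[child]:
                child += 1  # pick the larger child
            if a[root] >= a[child]:
                return
            a[root], a[child] = a[child], a[root]
            root = child

    # Heapify bottom-up, then repeatedly move the max to its final slot.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n)
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
    return a
```

Everything happens inside the input array and the loop variables, which is where the O(1) auxiliary-space guarantee comes from.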
Non-comparison sorts:
Under some fairly limited conditions it’s possible to break the O(n log n) barrier and sort in O(n). Here are some cases where that’s worth a try:
Counting sort: When you are sorting integers with a limited range.
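A minimal sketch for non-negative integers with a known maximum value; the running time is O(n + k), where k is the size of the value range:

```python
def counting_sort(a, max_value):
    """O(n + k) sort for non-negative integers in [0, max_value]."""
    counts = [0] * (max_value + 1)
    for x in a:
        counts[x] += 1  # tally each value
    out = []
    for value, count in enumerate(counts):
        out.extend([value] * count)  # emit each value count times
    return out
```

The `counts` array is why the range must be limited: sorting values up to 10⁹ this way would need a billion-entry table.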
Radix sort: When log(n) is significantly larger than k, where k is the number of radix digits. Radix sort runs in O(nk), so it beats O(n log n) precisely when the keys have few digits relative to the input size.
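A sketch of least-significant-digit (LSD) radix sort in base 10 for non-negative integers; each pass is a stable bucket pass on one digit:

```python
def radix_sort(a):
    """LSD radix sort for non-negative integers; O(n * k) for k digits."""
    if not a:
        return a
    base, exp = 10, 1
    while max(a) // exp > 0:
        # Stable distribution on the current digit; appending to lists
        # preserves the order established by earlier passes.
        buckets = [[] for _ in range(base)]
        for x in a:
            buckets[(x // exp) % base].append(x)
        a = [x for bucket in buckets for x in bucket]
        exp *= base
    return a
```

The per-digit pass must be stable, which is why the buckets are read back in order rather than re-sorted.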
Bucket sort: When you can guarantee that your input is approximately uniformly distributed.
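A sketch for the classic case of floats uniformly distributed in [0, 1): scatter into n buckets, sort each tiny bucket, and concatenate.

```python
def bucket_sort(a):
    """Bucket sort for floats uniformly distributed in [0, 1)."""
    n = len(a)
    if n == 0:
        return a
    buckets = [[] for _ in range(n)]
    for x in a:
        buckets[int(x * n)].append(x)  # uniform input -> ~1 item per bucket
    out = []
    for bucket in buckets:
        out.extend(sorted(bucket))  # buckets are tiny; any simple sort works
    return out
```

The O(n) average case rests entirely on the uniformity assumption: if the input clumps into a few buckets, the per-bucket sorts dominate and the advantage disappears.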