Replacement Selection Sort

This algorithm was described in 1965 by Donald Knuth, so it's designed to work in a very different computing environment than the one we're currently used to. Computers have very little memory (usually, some fixed number of registers), but have access to large external drives. It's common to build algorithms that would load some values into registers, process them there, then flush them directly back to the external storage. (Interestingly, this is similar to how processors work right now, except with main memory instead of external storage).

Let's assume that we have enough space in memory to hold two arrays: a first array Values of size n that can hold a bunch of values, and a second array Active of size n that holds boolean values. We're going to try to take a large input stream of unsorted values and do our best job to sort it, given that we only have enough room in memory to hold the Active and Values arrays, plus a few extra variables for storage space.

The idea behind the algorithm is as follows. First, load n values from the external source containing the unsorted sequence directly into the Values array. Then, set all of the Active values to true. For example, if n = 4, we might have this setup:

Values:    4    1    0    3
Active:    Yes  Yes  Yes  Yes

The replacement selection sort algorithm works by repeatedly looking for the lowest value in the Values array and writing it out to the output stream. In this case, we start off by finding the 0 value and writing it to the stream. This gives

Values:    4    1         3
Active:    Yes  Yes  Yes  Yes

Output:    0

Now, we have a blank spot in our Values array, so we can pull another value in from the external source. Let's suppose that we get 2. In that case, we have this setup:

Values:    4    1    2    3
Active:    Yes  Yes  Yes  Yes

Output:    0

Notice that since 2 > 0 and 0 was the smallest element here, we're guaranteed that when we wrote out the 0 to the output, the 2 shouldn't have come before it. That's good. Therefore, we go on to the next step of the algorithm and again find the smallest element here. That's the 1, so we send it to the output device:

Values:    4         2    3
Active:    Yes  Yes  Yes  Yes

Output:    0  1

Now, read in another value from the external source:

Values:    4    -1   2    3
Active:    Yes  Yes  Yes  Yes

Output:    0  1

Now we're in trouble. This new value (-1) is lower than the 1, meaning that if we truly wanted this value to go into the output in sorted order, it should come before the 1. However, we don't have enough memory to reread from the output device and fix it up. Instead, we're going to do the following. For now, let's keep the -1 in memory. We're going to do our best to sort the remaining elements, but when we're done doing so, we'll do a second iteration of generating a sorted sequence, and will put the -1 into that sequence. In other words, instead of producing one sorted sequence, we're going to produce two.

To indicate in memory that we don't want to write out the -1 yet, we're going to mark the slot holding -1 as inactive. This is shown here:

Values:    4    -1   2    3
Active:    Yes  NO   Yes  Yes

Output:    0  1

From now on, we'll just pretend that the -1 isn't here.

Let's continue onward. We now find the smallest value in memory that is still active (2), and write it out to the device:

Values:    4    -1        3
Active:    Yes  NO   Yes  Yes

Output:    0  1  2

We now pull the next value from the input device. Let's suppose it's 7:

Values:    4    -1   7    3
Active:    Yes  NO   Yes  Yes

Output:    0  1  2

Since 7 > 2, it comes after the 2 in the output, so we don't do anything.

On the next iteration, we find the lowest active value (3) and write it out:

Values:    4    -1   7    
Active:    Yes  NO   Yes  Yes

Output:    0  1  2  3

We pull the next value from the input device. Let's suppose it's also 3. In this case, we know that since 3 is the smallest value, we can just write it directly to the output stream, since 3 is the smallest of all the values here and we can save ourselves an iteration:

Values:    4    -1   7    
Active:    Yes  NO   Yes  Yes

Output:    0  1  2  3  3

We now pull the next value from the input device. Suppose it's 2. In that case, as before, we know that the 2 should come before the 3. Just like the earlier -1, this means that we need to hold the 2 in memory for now; we'll write it out later on. Now our setup looks like this:

Values:    4    -1   7    2
Active:    Yes  NO   Yes  NO

Output:    0  1  2  3  3

Now, we find the smallest active value (4) and write it to the output device:

Values:         -1   7    2
Active:    Yes  NO   Yes  NO

Output:    0  1  2  3  3  4

Suppose we now read in a 1 as our next input. We therefore put it into Values, but mark it inactive:

Values:    1    -1   7    2
Active:    NO   NO   Yes  NO

Output:    0  1  2  3  3  4

There's only one active value, which is 7, so we write it out:

Values:    1    -1        2
Active:    NO   NO   Yes  NO

Output:    0  1  2  3  3  4  7

Let's suppose we now read a 5. In that case, as before, we store it but mark the slot inactive:

Values:    1    -1   5    2
Active:    NO   NO   NO   NO

Output:    0  1  2  3  3  4  7

Notice that all the values are now inactive. This means that we've flushed out from memory all the values that can go into the current output run. Now, we need to go and write out all of the values we were holding for later. To do this, we mark all the values as active, then repeat as before:

Values:    1    -1   5    2
Active:    Yes  Yes  Yes  Yes

Output:    0  1  2  3  3  4  7

-1 is the smallest value, so we output it:

Values:    1         5    2
Active:    Yes  Yes  Yes  Yes

Output:    0  1  2  3  3  4  7  -1

Suppose that we read a 3. -1 < 3, so we load it into the Values array.

Values:    1    3    5    2
Active:    Yes  Yes  Yes  Yes

Output:    0  1  2  3  3  4  7 -1

1 is the smallest value here, so we remove it:

Values:         3    5    2
Active:    Yes  Yes  Yes  Yes

Output:    0  1  2  3  3  4  7 -1  1

Let's suppose that we're now out of input values. We'll mark this slot as done:

Values:    ---  3    5    2
Active:    Yes  Yes  Yes  Yes

Output:    0  1  2  3  3  4  7 -1  1

2 comes next:

Values:    ---  3    5    ---
Active:    Yes  Yes  Yes  Yes

Output:    0  1  2  3  3  4  7 -1  1  2

Then 3:

Values:    ---  ---  5    ---
Active:    Yes  Yes  Yes  Yes

Output:    0  1  2  3  3  4  7 -1  1  2  3

Finally, 5:

Values:    ---  ---  ---  ---
Active:    Yes  Yes  Yes  Yes

Output:    0  1  2  3  3  4  7 -1  1  2  3  5

And we're done! Note that the resulting sequence is not sorted, but it's a lot better than before. It now consists of two strands that are in sorted order. Merging them together (in the same way we'd do a merge for mergesort) will sort the resulting array. This algorithm could potentially have produced far more strands, but since our sample input was small, we only had two.


So how fast is this? Well, each iteration of the loop does a total of at most n comparisons (in memory), one read, and one write. Thus if there are N total values in the stream, the algorithm does O(nN) comparisons and O(N) memory operations. If memory operations are expensive, this isn't too bad, though you'll need a second pass at the end to merge everything back together.

In pseudocode, the algorithm looks like this:

Make Values an array of n elements.
Make Active an array of n booleans, all initially true.

Read n values from memory into Values.
Until no values are left to process:
    Find the smallest value that is still active.
    Write it to the output device.
    Read from the input device into the slot where the old element was.
    If it was smaller than the old element, mark the old slot inactive.
    If all slots are inactive, mark them all active.

I would be shocked if there was any reason to code this algorithm up now. This made sense many decades ago when memory was really, really small. Nowadays, there are far better external sorting algorithms available, and they'd almost certainly outperform this algorithm.

Hope this helps!

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章