Chapter 1 Arrays and Strings - 1.3

The statement of problem 1.3 is:

Design an algorithm and write code to remove the duplicate characters in a string without using any additional buffer. NOTE: One or two additional variables are fine. An extra copy of the array is not.
FOLLOW UP
Write the test cases for this method.

To be honest, the problem sucks. It doesn't make clear which occurrence should be kept when a character appears several times. Let's assume that we should keep the first occurrence, that is, the first one we meet when iterating from left to right.

The brute-force solution takes O(n^2) running time.

Recall problem 1.1 and you will find that this problem is similar. We can simply create a hash table of the characters seen so far, and it will run in O(n). However, the smart method used in problem 1.1 has a space complexity of O(k), where k is the number of characters in the character set, so it arguably counts as an additional buffer.
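For comparison, here is a minimal sketch of the hash-based idea in Python. The name remove_duplicates_with_set is made up for illustration, and the set it uses is exactly the kind of extra storage the problem seems to forbid, so it only shows the O(n) time versus O(k) space trade-off.

```python
def remove_duplicates_with_set(chars):
    """Remove duplicates from a list of characters in place, keeping first occurrences.

    Hypothetical helper: the set holds up to k entries, so this does NOT
    satisfy the "no additional buffer" rule of the problem.
    """
    seen = set()
    tail = 0  # next position to write a kept character
    for c in chars:
        if c not in seen:
            seen.add(c)
            chars[tail] = c  # tail never runs ahead of the read position
            tail += 1
    del chars[tail:]  # drop the leftover slots after the kept prefix


chars = list("aaaaaabaaaaa")
remove_duplicates_with_set(chars)
print(chars)
```

Each character is looked up once in the set, which is where the O(n) running time comes from.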

Another solution that flashed into my mind is to sort the string in O(n lg n) and then use one extra variable to eliminate the duplicated characters. Nevertheless, sorting changes the order of the characters, and I am not sure whether that is acceptable.
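A rough sketch of the sorting idea, under a few assumptions: the name remove_duplicates_sorted is hypothetical, the input is a list of characters, and Python's list.sort() stands in for a truly in-place sort (it may allocate temporary memory internally, so a heap sort would be closer to the spirit of the problem).

```python
def remove_duplicates_sorted(chars):
    """Sort a list of characters, then drop adjacent duplicates in place.

    Hypothetical helper; the original order of the characters is lost.
    """
    chars.sort()  # O(n lg n); equal characters become neighbours
    if not chars:
        return
    tail = 1  # chars[0:tail] holds the distinct characters kept so far
    for i in range(1, len(chars)):
        if chars[i] != chars[tail - 1]:
            chars[tail] = chars[i]
            tail += 1
    del chars[tail:]


chars = list("bab")
remove_duplicates_sorted(chars)
print(chars)
```

After sorting, a duplicate can only sit right next to its twin, so comparing with the last kept character is enough.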

Test cases for the program are given below:

1) empty string

2) "a"

3) "aa"

4) "aba"

5) "bab"

6) "abcd"
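The cases above can be written down as a small table-driven check. This is only a sketch: the expected values assume that the first occurrence is kept, and check and reference are hypothetical names. reference is a stand-in that uses a dict (an extra buffer), so it is useful only for exercising the test table, not as a valid answer.

```python
# (input, expected output) pairs, assuming the first occurrence is kept
CASES = [
    ("", ""),
    ("a", "a"),
    ("aa", "a"),
    ("aba", "ab"),
    ("bab", "ba"),
    ("abcd", "abcd"),
]


def check(remove_fn):
    """Run remove_fn, which edits a list of characters in place, on every case."""
    for text, expected in CASES:
        chars = list(text)
        remove_fn(chars)
        assert "".join(chars) == expected, (text, chars)


def reference(chars):
    # Stand-in implementation: dict keys keep insertion order (Python 3.7+),
    # so this keeps the first occurrence of each character.
    chars[:] = dict.fromkeys(chars)


check(reference)
```

Any in-place implementation that takes a list of characters can be passed to check in place of reference.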

It seems that I cannot find an ideal solution whose running time is better than O(n^2). OK, let's turn to the answer page...

Well, the standard answer is the O(n^2) one, and the author suggests that we had better ask what the interviewer means by an additional buffer: can we use an additional array of constant size?

I implemented the brute-force solution as follows:

def removeDuplicates(chars):
    # chars is a list of characters, modified in place;
    # the first occurrence of each character is kept
    if len(chars) < 2:
        return
    i = 0
    while i < len(chars) - 1:
        j = i + 1
        while j < len(chars):
            if chars[i] == chars[j]:
                del chars[j]    # the next element shifts into position j
                j = j - 1   # --j is wrong
            j = j + 1       # ++j is wrong
        i = i + 1           # ++i is wrong


if __name__ == '__main__':
    s = 'aaaaaabaaaaa'
    # One cannot change a string,
    # so we convert it to a list for convenience
    chars = list(s)
    removeDuplicates(chars)
    print(chars)

I ran into an endless loop when I first tried to implement the algorithm above. The wrong code is in the comments: Python has no ++ or -- operator. ++j is parsed as two unary plus operators, +(+j), and --j as two unary minus operators, -(-j), so neither of them changes j.
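A tiny experiment makes the pitfall visible (the variable names are only for illustration):

```python
i = 5
j = ++i   # parsed as +(+i): two unary plus operators, i is not incremented
k = --i   # parsed as -(-i): double negation, i is not decremented
print(i, j, k)  # 5 5 5

i += 1    # the usual way to increment in Python
print(i)  # 6
```

Since ++i is a legal expression, Python raises no error, which is why the loop counter silently never moves.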

Thanks to Python's delete operation, I can delete the duplicated elements directly. However, some languages have no such powerful tool at hand. For languages without a delete operation, the answer page gives a smart implementation:

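public static void removeDuplicates(char[] str) {
	if (str == null) return;
	int len = str.length;
	if (len < 2) return;

	// str[0..tail) holds the distinct characters found so far
	int tail = 1;

	for (int i = 1; i < len; ++i) {
		int j;
		// look for str[i] among the characters already kept
		for (j = 0; j < tail; ++j) {
			if (str[i] == str[j]) break;
		}
		// not found: append it to the kept prefix
		if (j == tail) {
			str[tail] = str[i];
			++tail;
		}
	}
	// mark the end of the result; guard added here, because with no
	// duplicates tail == len and str[tail] would be out of bounds
	if (tail < len) str[tail] = 0;
}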

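In my view, the algorithm above is similar to the partition step of quick sort to some extent, for both of them keep track of the end of a particular segment of the array (here, tail marks the end of the distinct characters kept so far). This kind of strategy reuses the memory that won't be needed further: the slots before i have already been scanned, so they can be overwritten safely.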
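The same tail idea can be tried in Python as well. This is just a sketch, not the answer page's code: the name remove_duplicates_tail is hypothetical, and truncating the list at the end (instead of writing a terminating 0) is my own choice, since a Python list knows its length.

```python
def remove_duplicates_tail(chars):
    """Keep the first occurrence of each character, using only index variables."""
    if len(chars) < 2:
        return
    tail = 1  # chars[0:tail] holds the distinct characters found so far
    for i in range(1, len(chars)):
        # look for chars[i] among the characters already kept
        j = 0
        while j < tail and chars[j] != chars[i]:
            j += 1
        if j == tail:  # not found: append it to the kept prefix
            chars[tail] = chars[i]
            tail += 1
    del chars[tail:]  # cut off the leftover slots


chars = list("aaaaaabaaaaa")
remove_duplicates_tail(chars)
print(chars)
```

Only i, j and tail are used besides the list itself, and the running time is still O(n^2) in the worst case.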