187. Repeated DNA Sequences [leetcode]

題目:

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"]. 
方法一:

hashmap + string操作。

時間:O(N)

class Solution {public: vector<string> findRepeatedDnaSequences(string s) { vector<string> res; if (s.size() < 10) return res; unordered_map<string, int> strs; for (int i = 0; i <= s.size() - 10; i++) { strs[s.substr(i, 10)]++; } for (unordered_map<string, int>:: iterator it = strs.begin(); it != strs.end(); it++) { if (it->second > 1) res.push_back(it->first); } return res; }};


該方法的缺點是由於把string用作key,空間佔用過大


接下來的方法參考的是討論區大神們的思路:

方法二:介個bit manipulation

把10位的string用一個int來表示。
A:00
C:01
G:10
T:11
因此一個DNA片段只要20個bit即可表示,範圍在INT內。

code:

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        vector<string> res;
        if (s.size() < 10) return res;
        unordered_map<char,int> change;
        change['A'] = 0;
        change['C'] = 1;
        change['G'] = 2;
        change['T'] = 3;
        unordered_map<int, int> strs;
        int num = 0;
        for (int i = 0; i < s.size(); i++) {
             num = num << 2 | change[s[i]];
            if (i < 9) continue;
            num &= 0xfffff;
            if (strs.count(num) && strs[num] == 1) {
                res.push_back(s.substr(i - 9, 10));
            }
            strs[num]++;
        }

        return res;
    }
};






發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章