2020 Youzu Cup (遊族杯) Problem F. Find / -type f -or -type d: Trie

2020 ECNU Campus Online Invitational Contest Problem F

1. Problem Statement

1.1. Limit

Time Limit: 1.5 sec

Memory Limit: 1024 MB

1.2. Problem Description

Cuber QQ wants to know the number of files with extension .eoj in his computer. This can be easily done with a command find / -type f | grep '\.eoj$' | wc -l. However, Cuber QQ is a grep-hater, who would rather write his own program than use grep. So he decides to take a detour: what if the command starts with something else? Is it still possible to recover the results?

If you are not familiar with command usages in Linux, all you need to know is that ls and find are two easy-to-use commands to inspect files and subdirectories a directory contains. For example, when you are trying to get a list of everything in your computer, you might try to use: find / -type f -or -type d, which will give you a list like:

Figure 1. Example

To make the problem even more interesting, Cuber QQ adds another shuf after find, so that the list is shuffled into a random order and his secrets will stay covered. Cuber QQ is wondering whether it’s possible to write a program cuber-qq-grep that filters out all the files with extension .eoj from the given shuffled list, which is his initial intention. Still, instead of giving the filtered list directly, Cuber QQ wants to know the length of this list, i.e., the number of files found. In other words, the following two commands will be almost equivalent:

  • find / -type f | grep '\.eoj$' | wc -l

  • find / -type f -or -type d | shuf | cuber-qq-grep

Well, there can be some subtle differences in input/output formats, but that’s not essential.

One more thing: on your file system, a directory is only a logical concept. This means a directory is created only when a file that relies on it is created, and a directory cannot exist without files.

TL;DR, given the randomly shuffled list of all directories and files on a computer, count the number of files that end with .eoj.

1.3. Input

The input starts with a line of one number $n$ ($1 \le n \le 10^5$), which is the length of the following list.

In the following n lines, each line contains one string, which is an absolute path to a file or a directory.

The path starts with /, and is composed of multiple tokens (file names and directory names) concatenated with /. The tokens always start with a lowercase letter, followed by no more than 9 lowercase letters or dots. The root folder alone will not be included in this list.

It is guaranteed that the total length of n lines will be no longer than $10^6$.

1.4. Output

Output the number of files satisfying the above-mentioned condition, in one line.

1.5. Sample Input 1

3
/secret/eoj
/secret
/secret.eoj

1.6. Sample Output 1

1

1.7. Sample Input 2

8
/i/am/an/ecnu/student
/i/am/an/ecnu
/i
/i/am/a
/i/am/an/idiot
/i/am/an
/i/am/a/genious
/i/am

1.8. Sample Output 2

0

1.9. Sample Input 3

2
/cuber.eoj/qq.eoj
/cuber.eoj

1.10. Sample Output 3

1

1.11. Source

2020 ECNU Campus Online Invitational Contest Problem F. Find / -type f -or -type d

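2. Analysis

The task is to count the files whose names end in .eoj. However, if a path such as xxx.eoj/www exists, then xxx.eoj is a directory and must be excluded from the answer.

Store every path in a trie, and collect every path that ends in .eoj in a set. Finally, iterate over the set: if the trie shows that other paths lie below set[i], then set[i] is a directory and is removed from the answer.

Take the following input as an example.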

/a.eoj
/fg
/bcd.e

The resulting trie is shown in Figure 2.

Figure 2. Trie

3. Code

#include <iostream>
#include <set>
#include <string>
using namespace std;

const int NUM = 1e6 + 1;

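// Trie stored as an array: trie[p][c] is the child of node p for symbol c (0 = none)
int trie[NUM][28];
// num[p]: number of inserted strings that pass through node p
int num[NUM] = { 0 };
// next free node index
int pos = 1;
// paths that end in ".eoj"
set<string> st;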

// Insert a string into the trie
void trieInsert(string str)
{
    int p = 0;
    for (int i = 0; str[i]; i++) {
        int n;
        if (str[i] == '/') {
            n = 26;
        } else if (str[i] == '.') {
            n = 27;
        } else {
            n = str[i] - 'a';
        }
        if (trie[p][n] == 0) {
            // no child for this symbol yet: allocate the next free node
            trie[p][n] = pos++;
        }
        p = trie[p][n];
        num[p]++;
    }
}

// Return the number of inserted strings that have str as a prefix
int trieFind(string str)
{
    int p = 0;
    for (int i = 0; str[i]; i++) {
        int n;
        if (str[i] == '/') {
            n = 26;
        } else if (str[i] == '.') {
            n = 27;
        } else {
            n = str[i] - 'a';
        }
        if (trie[p][n] == 0) {
            return 0;
        }
        p = trie[p][n];
    }
    return num[p];
}

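// Insert every path into the trie, then check each path ending in ".eoj":
// if any path lies below it (path + "/"), it is a directory and is not counted
int main()
{
    int t, length, ans = 0;
    cin >> t;
    string str;
    while (t--) {
        cin >> str;
        // insert into the trie
        trieInsert(str);
        if ((length = str.length()) >= 4 && str.compare(length - 4, 4, ".eoj") == 0) {
            st.insert(str);
            ans++;
        }
    }
    // discount directories
    for (auto it = st.begin(); it != st.end(); it++) {
        // Look up path + "/" rather than the bare path: "/a.eoj" is also a
        // string prefix of the file "/a.eoj.eoj", which does not make it a directory
        if (trieFind(*it + "/") > 0) {
            // st.erase(it) would invalidate `it`, so the loop's it++ would be
            // undefined behavior; `it = st.erase(it)` would be needed instead.
            // Only the count matters here, so just decrement it.
            ans--;
        }
    }
    cout << ans << endl;

    return 0;
}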


Contact email: [email protected]

CSDN: https://me.csdn.net/qq_41729780

Zhihu: https://zhuanlan.zhihu.com/c_1225417532351741952

WeChat official account: 複雜網絡與機器學習

Feel free to follow or repost. Questions are welcome by email.
