File Compression


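  Improving the compression ratio of files has long been a goal. In recent years an algorithm has been proposed that merely rearranges a file and does not compress it by itself, yet in most cases a file rearranged this way compresses better than the original.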

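  The algorithm works as follows. For a string S of length n, first construct n strings, where the i-th string is obtained by moving the first i-1 characters of S to the end. Then sort these n strings in ascending order of their first character; if two strings have the same first character, order them by their position in S, smallest first. The last characters of the sorted strings form a new string S', which also has length n and contains every character of S. Finally, output S' and p, the position in S' of the first character of S. For example: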

  S: example

  1. Construct the n strings

  example
  xamplee
  ampleex
  mpleexa
  pleexam
  leexamp
  eexampl

  2. Sort the strings

  ampleex
  example
  eexampl
  leexamp
  mpleexa
  pleexam
  xamplee

  3. Output
  xelpame S'
  7    p
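
The three steps above can be sketched directly in code. The task itself asks for Pascal programs; Python is used here only for illustration, and the function name `transform_by_rotations` is an assumption of this sketch, not part of the problem. It builds every rotated string explicitly, so it is a literal reading of the statement rather than an efficient solution.

```python
def transform_by_rotations(s):
    """Follow the three steps literally (assumes s is non-empty)."""
    n = len(s)
    # Step 1: the i-th string (1-based) moves the first i-1 characters to the end.
    rotations = [s[i:] + s[:i] for i in range(n)]
    # Step 2: sort by first character only. sorted() is stable, so strings with
    # the same first character keep their order of position in S.
    ordered = sorted(range(n), key=lambda i: rotations[i][0])
    # Step 3: the last characters of the sorted strings form S'.
    s_prime = "".join(rotations[i][-1] for i in ordered)
    # The first character of S is the last character of the 2nd string
    # (of the only string when n == 1); p is that string's 1-based rank.
    p = ordered.index(1 % n) + 1
    return s_prime, p


print(transform_by_rotations("example"))  # ('xelpame', 7)
```

Storing n strings of length n costs on the order of n*n characters, which is about 10^8 at the upper limit n = 10000, so a real submission would probably avoid materialising them.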

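  Because of the way English words are formed, certain letter pairs occur very frequently, so identical letters are very likely to end up next to each other in S', which improves the compression ratio of S'. Although the algorithm exploits a property of English words, in practice it has been found to help with the compression of almost all files.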

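  Task 1: zip1.pas (zip1.exe)
  Read the string S and output S' and p.
  The input file zip1.in contains two lines: line 1 is an integer n (1 <= n <= 10000), the length of S, and line 2 is the string S.
  The output file zip1.out contains two lines: line 1 is S' and line 2 is the integer p.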

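  Task 2: zip2.pas (zip2.exe)

  Read S' and p and output the string S.
  The input file zip2.in contains three lines: line 1 is an integer n (1 <= n <= 10000), the length of S', line 2 is the string S', and line 3 is the integer p.
  The output file zip2.out contains a single line, S.

  Sample input 1: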
  7
  example

  Sample output 1:
  xelpame
  7

  Sample input 2:
  7
  xelpame
  7

  Sample output 2:
  example

Solution:
1. S --> S'
The main process for getting S' from S is shown in Figure 1.

                  -------             -------
                 |example|           |ampleex|
                 |xamplee|           |example|
            (1)  |ampleex|    (2)    |eexampl|   (3)
example(S) ====> |mpleexa|  =======> |leexamp| ======> xelpame (S')
                 |pleexam|           |mpleexa|
                 |leexamp|           |pleexam|
                 |eexampl|           |xamplee|
                  -------             -------
                    <A>                 <B>

                          Figure 1. S-S' Process
                       
If you look at list <A> carefully, you will find that the first characters of its words, read top to bottom, spell exactly S, i.e., 'example'. Likewise, the first characters of the words in list <B> spell the sorted form of S, i.e., 'example' -----sort------> 'aeelmpx'.

Now look carefully at the first and last characters of each word in list <B>, referring to Figure 2 below. Do you notice anything interesting? The secret is that, for each word in list <B>, the last character of the word is exactly the character that precedes the word's first character in the circle of Figure 2, and that last character is part of the result S'.
Take the first word in list <B>, 'ampleex', as an example. Its first character is 'a'; in Figure 2 the character before 'a' is 'x', so the last character of 'ampleex' is 'x', and 'x' becomes the first character of S'. In the same way, for the second word 'example', the character before its first 'e' is the final 'e' of S (the circle wraps around), so the second character of S' is 'e'.

          |--------->---------------->-----------|
          |--[e]->[x]->[a]->[m]->[p]->[l]->[e]<--|
                         
                          Figure 2. Circle

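With this observation, the process can be simplified as shown in Figure 3. How, then, do we get P, the position in S' of the first character of S? Because the first character of S is the one that precedes the second character, P equals the position in <C> of the second character of S, where equal characters in <C> keep the order they have in S. See Figure 3.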

             (1) sort each char          (2) get previous
                     in S                    of each char
example(S) =====================> aeelmpx ================> xelpame (S')
                                    <C>

                          Figure 3. Simple S-S' Process
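
A minimal sketch of the simplified process in Figure 3, assuming a non-empty input string. The function name `transform` and the variable names are choices made for this illustration only. Instead of building the rotated words, it sorts the positions of S by character (ties by position) and reads off the cyclic predecessor of each one.

```python
def transform(s):
    """Sketch of the simplified S -> S' process (assumes s is non-empty)."""
    n = len(s)
    # <C>: positions of S ordered by character, ties broken by position in S.
    order = sorted(range(n), key=lambda i: (s[i], i))
    # Each character of S' is the cyclic predecessor of the matching entry
    # of <C>; s[i - 1] with i == 0 wraps around to the last character of S.
    s_prime = "".join(s[i - 1] for i in order)
    # P is the 1-based position in <C> of the second character of S
    # (of the only character when n == 1).
    p = order.index(1 % n) + 1
    return s_prime, p


print(transform("example"))  # ('xelpame', 7)
```

This version needs only one sort of n positions and O(n) extra memory.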

2. S' --> S

How do we get S from S' and P? See Figure 4 below. We know that P is 7, so it points to the last 'e' in S'. We can then recover S as follows. First write down that 'e' from <D>, then look up its counterpart in <E>, which is 'x', and append 'x' to 'e'; the result is now 'ex'. Next, find 'x' in <D> and look up its counterpart in <E>, which is 'a'; appending it gives 'exa'. Continue in the same way. Figure 5 demonstrates the process; the characters in square brackets, read top to bottom, form S.

                   ---         ---
                  | x |       | a |
                  | e |       | e |
                  | l | sort  | e |
                  | p | ====> | l |
                  | a |       | m |
 start point      | m |       | p |
P --------------> | e |       | x |
                   ---         ---
                   <D>         <E>

                Figure 4. S' - S process


            [e] ---- x --|
             |-----<-----|
             V
            [x] ---- a --|
             |-----<-----|
             V
            [a] ---- m --|
             |-----<-----|
             V
            [m] ---- p --|
             |-----<-----|
             V
            [p] ---- l --|
             |-----<-----|
             V
            [l] ---- e --|
             |-----<-----|
             V
            [e] ---- e

        Figure 5. S' - S process
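
The walk above does not say which occurrence to follow when a character appears several times in <D>. For example, for S = 'banana' the transform gives S' = 'bnnaaa' and P = 1, and always picking the first unused match leads to the wrong string. The sketch below therefore uses a different route, named plainly: it rebuilds S from back to front. It relies only on the tie-break rule of the problem: rows of <E> holding the same character are in order of position in S, so the last unused row for a character belongs to its latest unread occurrence, and the <D> entry of that row is the character just before it. The function name `inverse_transform` and all variable names are assumptions of this sketch.

```python
def inverse_transform(s_prime, p):
    """Recover S from S' and the 1-based position p (assumes a valid, non-empty input)."""
    n = len(s_prime)
    # Column <E> is S' sorted; row k pairs E[k] with D[k] = s_prime[k].
    # Group the row numbers by their <E> character, in increasing row order.
    rows_by_char = {}
    for row, ch in enumerate(sorted(s_prime)):
        rows_by_char.setdefault(ch, []).append(row)

    s = [""] * n
    # p points at the first character of S inside S'.
    s[0] = s_prime[p - 1]
    if n >= 2:
        # The word starting at position 1 is the first row of its group,
        # and its <D> entry is the last character of S.
        s[n - 1] = s_prime[rows_by_char[s[0]][0]]
        # Walk backwards: the last unused row of the current character's
        # group holds, in <D>, the character that precedes it in S.
        for i in range(n - 1, 1, -1):
            row = rows_by_char[s[i]].pop()
            s[i - 1] = s_prime[row]
    return "".join(s)


print(inverse_transform("xelpame", 7))  # example
```

Sorting dominates the cost, so the sketch runs in O(n log n) time for n up to 10000; invalid inputs are not checked.
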
3. Code
