Best way to convert text files between character sets?

Question:

What is the fastest, easiest tool or method to convert text files between character sets?

Specifically, I need to convert from UTF-8 to ISO-8859-15 and vice versa.

Anything goes: one-liners in your favorite scripting language, command-line tools or other utilities for the OS, web sites, etc.

Best solutions so far:

On Linux/UNIX/OS X/cygwin:

  • GNU iconv, suggested by Troels Arvin, is best used as a filter. It seems to be universally available. Example:

    $ iconv -f UTF-8 -t ISO-8859-15 in.txt > out.txt

    As pointed out by Ben, there is an online converter using iconv.

  • GNU recode (manual), suggested by Cheekysoft, will convert one or several files in place. Example:

    $ recode UTF8..ISO-8859-15 in.txt

    This one uses shorter aliases:

    $ recode utf8..l9 in.txt

    Recode also supports surfaces, which can be used to convert between different line-ending types and encodings.

    Convert newlines from LF (Unix) to CR-LF (DOS):

    $ recode ../CR-LF in.txt

    Base64-encode a file:

    $ recode ../Base64 in.txt

    You can also combine them.

    Convert a Base64-encoded UTF-8 file with Unix line endings to a Base64-encoded Latin-1 file with DOS line endings:

    $ recode utf8/Base64..l1/CR-LF/Base64 file.txt
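Where neither iconv nor recode is installed, the same conversions can be approximated in a scripting language. The following is a minimal Python sketch, not taken from the original answers: `convert_file`, `recode_base64` and `demo` are hypothetical helper names, and the charset names are the aliases that Python's built-in codecs accept. Note that `base64.b64encode` emits a single unwrapped line, which may differ from the line-wrapped Base64 that other tools produce.

```python
import base64
import tempfile
from pathlib import Path


def convert_file(src, dst, from_enc, to_enc, crlf=False):
    """Re-encode the text file src into dst, one line at a time."""
    # newline="" leaves line endings untouched on both sides. With crlf=True,
    # reading normalizes every ending to "\n" and writing emits "\r\n".
    read_newline = None if crlf else ""
    write_newline = "\r\n" if crlf else ""
    with open(src, "r", encoding=from_enc, newline=read_newline) as fin, \
            open(dst, "w", encoding=to_enc, newline=write_newline) as fout:
        for line in fin:
            fout.write(line)


def recode_base64(data, from_enc, to_enc):
    """Base64 in, Base64 out: change the charset and switch to CR-LF endings."""
    text = base64.b64decode(data).decode(from_enc)
    text = text.replace("\r\n", "\n").replace("\n", "\r\n")
    return base64.b64encode(text.encode(to_enc))


def demo():
    """Convert a small UTF-8 file to ISO-8859-15 and return the raw bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "in.txt", Path(tmp) / "out.txt"
        # U+20AC is the euro sign, which ISO-8859-15 can represent.
        src.write_bytes("Prix: 5 \u20ac\n".encode("utf-8"))
        convert_file(src, dst, "utf-8", "iso-8859-15")
        return dst.read_bytes()


print(demo())
```

With the default strict error handling, a character that the target charset cannot represent raises `UnicodeEncodeError` instead of being silently dropped.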

On Windows with PowerShell (Jay Bazuzi):

  • PS C:\> gc -en utf8 in.txt | Out-File -en ascii out.txt

    (No ISO-8859-15 support though; it says that supported charsets are unicode, utf7, utf8, utf32, ascii, bigendianunicode, default, and oem.)

Edit

Do you mean ISO-8859-1 support? Using "String" does this, e.g. for the reverse direction:

gc -en string in.txt | Out-File -en utf8 out.txt

Note: The possible enumeration values are "Unknown, String, Unicode, Byte, BigEndianUnicode, UTF8, UTF7, Ascii".
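
The distinction raised in the edit matters because ISO-8859-1 and ISO-8859-15 are not interchangeable: a handful of byte values decode to different characters, the euro sign among them. As a rough illustration (the function name `charset_differences` is made up for this sketch), Python's built-in codecs can list the byte values on which the two charsets disagree:

```python
import unicodedata


def charset_differences(enc_a="iso-8859-1", enc_b="iso-8859-15"):
    """Map each byte value to the pair of characters it decodes to, where they differ."""
    differences = {}
    for value in range(256):
        raw = bytes([value])
        char_a, char_b = raw.decode(enc_a), raw.decode(enc_b)
        if char_a != char_b:
            differences[value] = (char_a, char_b)
    return differences


# Print Unicode character names so the output stays plain ASCII.
for value, (latin1, latin9) in charset_differences().items():
    print(f"0x{value:02X}: {unicodedata.name(latin1)} -> {unicodedata.name(latin9)}")
```

Text that contains none of the listed characters produces the same bytes under either charset.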


Solution:

Reference 1: https://stackoom.com/question/Gs8
Reference 2: Best way to convert text files between character sets?