How to batch-download CALIPSO data

Introduction

CALIPSO data distribution has been upgraded from the original FTP service to HTTPS, which is probably the general trend. Downloading over HTTPS is less convenient than FTP, so here we use the wget program to do it.

Ordering CALIPSO data: a brief overview

A brief overview can be found in an earlier post, which I recommend reading before this one:
Previous post: CALIPSO download method.

Downloading with wget

Installing wget for Windows

GNU Wget is a simple yet powerful free-software download tool, and is itself part of the GNU Project. Its name combines "World Wide Web" and "Get", which also hints at its main function. It currently supports downloads over the three most common TCP/IP protocols: HTTP, HTTPS, and FTP.

Downloading and installing wget

1. Download wget from: https://eternallybored.org/misc/wget/
2. Extract the downloaded archive and place wget.exe in the C:\Windows\System32 directory.
3. Open cmd and run the wget command.
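
Since C:\Windows\System32 is already on the Windows PATH, no further configuration is needed once wget.exe is copied there. A quick sanity check:

```shell
# Prints the version banner; an error such as "'wget' is not recognized"
# means wget.exe is not on the PATH yet.
wget --version
```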

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.
H:\>wget --help
GNU Wget 1.20.3, a non-interactive network retriever.
Usage: wget [OPTION]... [URL]...

Mandatory arguments to long options are mandatory for short options too.

Startup:
  -V,  --version                   display the version of Wget and exit
  -h,  --help                      print this help
  -b,  --background                go to background after startup
  -e,  --execute=COMMAND           execute a `.wgetrc'-style command

Logging and input file:
  -o,  --output-file=FILE          log messages to FILE
  -a,  --append-output=FILE        append messages to FILE
  -d,  --debug                     print lots of debugging information
  -q,  --quiet                     quiet (no output)
  -v,  --verbose                   be verbose (this is the default)
  -nv, --no-verbose                turn off verboseness, without being quiet
       --report-speed=TYPE         output bandwidth as TYPE.  TYPE can be bits
  -i,  --input-file=FILE           download URLs found in local or external FILE

       --input-metalink=FILE       download files covered in local Metalink FILE

  -F,  --force-html                treat input file as HTML
  -B,  --base=URL                  resolves HTML input-file links (-i -F)
                                     relative to URL
       --config=FILE               specify config file to use
       --no-config                 do not read any config file
       --rejected-log=FILE         log reasons for URL rejection to FILE

Download:
  -t,  --tries=NUMBER              set number of retries to NUMBER (0 unlimits)
       --retry-connrefused         retry even if connection is refused
       --retry-on-http-error=ERRORS    comma-separated list of HTTP errors to retry
  -O,  --output-document=FILE      write documents to FILE
  -nc, --no-clobber                skip downloads that would download to
                                     existing files (overwriting them)
       --no-netrc                  don't try to obtain credentials from .netrc
  -c,  --continue                  resume getting a partially-downloaded file
       --start-pos=OFFSET          start downloading from zero-based position OFFSET
       --progress=TYPE             select progress gauge type
       --show-progress             display the progress bar in any verbosity mode
  -N,  --timestamping              don't re-retrieve files unless newer than
                                     local
       --no-if-modified-since      don't use conditional if-modified-since get
                                     requests in timestamping mode
       --no-use-server-timestamps  don't set the local file's timestamp by
                                     the one on the server
  -S,  --server-response           print server response
       --spider                    don't download anything
  -T,  --timeout=SECONDS           set all timeout values to SECONDS
       --dns-timeout=SECS          set the DNS lookup timeout to SECS
       --connect-timeout=SECS      set the connect timeout to SECS
       --read-timeout=SECS         set the read timeout to SECS
  -w,  --wait=SECONDS              wait SECONDS between retrievals
       --waitretry=SECONDS         wait 1..SECONDS between retries of a retrieval
       --random-wait               wait from 0.5*WAIT...1.5*WAIT secs between retrievals
       --no-proxy                  explicitly turn off proxy
  -Q,  --quota=NUMBER              set retrieval quota to NUMBER
       --bind-address=ADDRESS      bind to ADDRESS (hostname or IP) on local host
       --limit-rate=RATE           limit download rate to RATE
       --no-dns-cache              disable caching DNS lookups
       --restrict-file-names=OS    restrict chars in file names to ones OS allows
       --ignore-case               ignore case when matching files/directories
  -4,  --inet4-only                connect only to IPv4 addresses
  -6,  --inet6-only                connect only to IPv6 addresses
       --prefer-family=FAMILY      connect first to addresses of specified family,
                                     one of IPv6, IPv4, or none
       --user=USER                 set both ftp and http user to USER
       --password=PASS             set both ftp and http password to PASS
       --ask-password              prompt for passwords
       --use-askpass=COMMAND       specify credential handler for requesting
                                     username and password.  If no COMMAND is
                                     specified the WGET_ASKPASS or the SSH_ASKPASS
                                     environment variable is used.
       --no-iri                    turn off IRI support
       --local-encoding=ENC        use ENC as the local encoding for IRIs
       --remote-encoding=ENC       use ENC as the default remote encoding
       --unlink                    remove file before clobber
       --keep-badhash              keep files with checksum mismatch (append .badhash)
       --metalink-index=NUMBER     Metalink application/metalink4+xml metaurl ordinal NUMBER
       --metalink-over-http        use Metalink metadata from HTTP response headers
       --preferred-location        preferred location for Metalink resources

Directories:
  -nd, --no-directories            don't create directories
  -x,  --force-directories         force creation of directories
  -nH, --no-host-directories       don't create host directories
       --protocol-directories      use protocol name in directories
  -P,  --directory-prefix=PREFIX   save files to PREFIX/..
       --cut-dirs=NUMBER           ignore NUMBER remote directory components

HTTP options:
       --http-user=USER            set http user to USER
       --http-password=PASS        set http password to PASS
       --no-cache                  disallow server-cached data
       --default-page=NAME         change the default page name (normally
                                     this is 'index.html'.)
  -E,  --adjust-extension          save HTML/CSS documents with proper extensions
       --ignore-length             ignore 'Content-Length' header field
       --header=STRING             insert STRING among the headers
       --compression=TYPE          choose compression, one of auto, gzip and none. (default: none)
       --max-redirect              maximum redirections allowed per page
       --proxy-user=USER           set USER as proxy username
       --proxy-password=PASS       set PASS as proxy password
       --referer=URL               include 'Referer: URL' header in HTTP request
       --save-headers              save the HTTP headers to file
  -U,  --user-agent=AGENT          identify as AGENT instead of Wget/VERSION
       --no-http-keep-alive        disable HTTP keep-alive (persistent connections)
       --no-cookies                don't use cookies
       --load-cookies=FILE         load cookies from FILE before session
       --save-cookies=FILE         save cookies to FILE after session
       --keep-session-cookies      load and save session (non-permanent) cookies
       --post-data=STRING          use the POST method; send STRING as the data
       --post-file=FILE            use the POST method; send contents of FILE
       --method=HTTPMethod         use method "HTTPMethod" in the request
       --body-data=STRING          send STRING as data. --method MUST be set
       --body-file=FILE            send contents of FILE. --method MUST be set
       --content-disposition       honor the Content-Disposition header when
                                     choosing local file names (EXPERIMENTAL)
       --content-on-error          output the received content on server errors
       --auth-no-challenge         send Basic HTTP authentication information
                                     without first waiting for the server's
                                     challenge

HTTPS (SSL/TLS) options:
       --secure-protocol=PR        choose secure protocol, one of auto, SSLv2,
                                     SSLv3, TLSv1, TLSv1_1, TLSv1_2 and PFS
       --https-only                only follow secure HTTPS links
       --no-check-certificate      don't validate the server's certificate
       --certificate=FILE          client certificate file
       --certificate-type=TYPE     client certificate type, PEM or DER
       --private-key=FILE          private key file
       --private-key-type=TYPE     private key type, PEM or DER
       --ca-certificate=FILE       file with the bundle of CAs
       --ca-directory=DIR          directory where hash list of CAs is stored
       --crl-file=FILE             file with bundle of CRLs
       --pinnedpubkey=FILE/HASHES  Public key (PEM/DER) file, or any number
                                   of base64 encoded sha256 hashes preceded by
                                   'sha256//' and separated by ';', to verify
                                   peer against
       --random-file=FILE          file with random data for seeding the SSL PRNG

       --ciphers=STR               Set the priority string (GnuTLS) or cipher list string (OpenSSL) directly.
                                     Use with care. This option overrides --secure-protocol.
                                     The format and syntax of this string depend on the specific SSL/TLS engine.

HSTS options:
       --no-hsts                   disable HSTS
       --hsts-file                 path of HSTS database (will override default)


FTP options:
       --ftp-user=USER             set ftp user to USER
       --ftp-password=PASS         set ftp password to PASS
       --no-remove-listing         don't remove '.listing' files
       --no-glob                   turn off FTP file name globbing
       --no-passive-ftp            disable the "passive" transfer mode
       --preserve-permissions      preserve remote file permissions
       --retr-symlinks             when recursing, get linked-to files (not dir)


FTPS options:
       --ftps-implicit                 use implicit FTPS (default port is 990)
       --ftps-resume-ssl               resume the SSL/TLS session started in the control connection when
                                         opening a data connection
       --ftps-clear-data-connection    cipher the control channel only; all the data will be in plaintext
       --ftps-fallback-to-ftp          fall back to FTP if FTPS is not supported in the target server

WARC options:
       --warc-file=FILENAME        save request/response data to a .warc.gz file
       --warc-header=STRING        insert STRING into the warcinfo record
       --warc-max-size=NUMBER      set maximum size of WARC files to NUMBER
       --warc-cdx                  write CDX index files
       --warc-dedup=FILENAME       do not store records listed in this CDX file
       --no-warc-compression       do not compress WARC files with GZIP
       --no-warc-digests           do not calculate SHA1 digests
       --no-warc-keep-log          do not store the log file in a WARC record
       --warc-tempdir=DIRECTORY    location for temporary files created by the
                                     WARC writer

Recursive download:
  -r,  --recursive                 specify recursive download
  -l,  --level=NUMBER              maximum recursion depth (inf or 0 for infinite)
       --delete-after              delete files locally after downloading them
  -k,  --convert-links             make links in downloaded HTML or CSS point to
                                     local files
       --convert-file-only         convert the file part of the URLs only (usually known as the basename)
       --backups=N                 before writing file X, rotate up to N backup files
  -K,  --backup-converted          before converting file X, back up as X.orig
  -m,  --mirror                    shortcut for -N -r -l inf --no-remove-listing
  -p,  --page-requisites           get all images, etc. needed to display HTML page
       --strict-comments           turn on strict (SGML) handling of HTML comments

Recursive accept/reject:
  -A,  --accept=LIST               comma-separated list of accepted extensions
  -R,  --reject=LIST               comma-separated list of rejected extensions
       --accept-regex=REGEX        regex matching accepted URLs
       --reject-regex=REGEX        regex matching rejected URLs
       --regex-type=TYPE           regex type (posix|pcre)
  -D,  --domains=LIST              comma-separated list of accepted domains
       --exclude-domains=LIST      comma-separated list of rejected domains
       --follow-ftp                follow FTP links from HTML documents
       --follow-tags=LIST          comma-separated list of followed HTML tags
       --ignore-tags=LIST          comma-separated list of ignored HTML tags
  -H,  --span-hosts                go to foreign hosts when recursive
  -L,  --relative                  follow relative links only
  -I,  --include-directories=LIST  list of allowed directories
       --trust-server-names        use the name specified by the redirection
                                     URL's last component
  -X,  --exclude-directories=LIST  list of excluded directories
  -np, --no-parent                 don't ascend to the parent directory

Email bug reports, questions, discussions to <[email protected]>
and/or open issues at https://savannah.gnu.org/bugs/?func=additem&group=wget.                                                                     

Downloading the CALIPSO data

Shortly after submitting an order, you will receive a reply email from Langley.
[Screenshot: the notification email from Langley containing the download URL]

wget options explained

Note: this reference describes an older wget release, so a few flags (e.g. --cookies=off, --passive-ftp) differ from the 1.20.3 help shown above.

* Startup
  -V,  --version           display the version of wget and exit
  -h,  --help              print this help
  -b,  --background        go to background after startup
  -e,  --execute=COMMAND   execute a `.wgetrc'-style command (for the format, see /etc/wgetrc or ~/.wgetrc)
* Logging and input files
  -o,  --output-file=FILE     write log messages to FILE
  -a,  --append-output=FILE   append log messages to FILE
  -d,  --debug                print debugging output
  -q,  --quiet                quiet mode (no output)
  -v,  --verbose              verbose mode (the default)
  -nv, --non-verbose          turn off verbose mode, without being quiet
  -i,  --input-file=FILE      download the URLs listed in FILE
  -F,  --force-html           treat the input file as HTML
  -B,  --base=URL             prepend URL to the relative links in the file given with -F -i
       --sslcertfile=FILE     optional client certificate
       --sslcertkey=KEYFILE   key file for the optional client certificate
       --egd-file=FILE        file name of the EGD socket
* Download
       --bind-address=ADDRESS   bind to ADDRESS (host name or IP; useful when the local machine has several)
  -t,  --tries=NUMBER           maximum number of connection attempts (0 means unlimited)
  -O,  --output-document=FILE   write documents to FILE
  -nc, --no-clobber             do not overwrite existing files or use .# prefixes
  -c,  --continue               resume partially downloaded files
       --progress=TYPE          select the progress-bar style
  -N,  --timestamping           do not re-download files unless newer than the local copy
  -S,  --server-response        print the server's response
       --spider                 do not download anything
  -T,  --timeout=SECONDS        set the response timeout in seconds
  -w,  --wait=SECONDS           wait SECONDS between retrievals
       --waitretry=SECONDS      wait 1...SECONDS between retries
       --random-wait            wait 0...2*WAIT seconds between downloads
  -Y,  --proxy=on/off           turn proxy use on or off
  -Q,  --quota=NUMBER           set a limit on the total download volume
       --limit-rate=RATE        limit the download rate
* Directories
  -nd, --no-directories            do not create directories
  -x,  --force-directories         force creation of directories
  -nH, --no-host-directories       do not create host directories
  -P,  --directory-prefix=PREFIX   save files under PREFIX/...
       --cut-dirs=NUMBER           ignore NUMBER levels of remote directories
* HTTP options
       --http-user=USER      set the HTTP user name to USER
       --http-passwd=PASS    set the HTTP password to PASS
  -C,  --cache=on/off        allow/disallow server-side data caching (normally allowed)
  -E,  --html-extension      save all text/html documents with an .html extension
       --ignore-length       ignore the `Content-Length' header field
       --header=STRING       insert STRING among the request headers
       --proxy-user=USER     set the proxy user name to USER
       --proxy-passwd=PASS   set the proxy password to PASS
       --referer=URL         include a `Referer: URL' header in HTTP requests
  -s,  --save-headers        save the HTTP headers to the file
  -U,  --user-agent=AGENT    identify as AGENT instead of Wget/VERSION
       --no-http-keep-alive  disable HTTP keep-alive (persistent connections)
       --cookies=off         do not use cookies
       --load-cookies=FILE   load cookies from FILE before the session
       --save-cookies=FILE   save cookies to FILE after the session
* FTP options
  -nr, --dont-remove-listing   do not remove `.listing' files
  -g,  --glob=on/off           turn file-name globbing on or off
       --passive-ftp           use passive transfer mode (the default)
       --active-ftp            use active transfer mode
       --retr-symlinks         when recursing, retrieve the files links point to (not directories)
* Recursive download
  -r,  --recursive          recursive download; use with caution!
  -l,  --level=NUMBER       maximum recursion depth (inf or 0 for infinite)
       --delete-after       delete files locally after the download finishes
  -k,  --convert-links      convert non-relative links to relative ones
  -K,  --backup-converted   before converting file X, back it up as X.orig
  -m,  --mirror             equivalent to -r -N -l inf -nr
  -p,  --page-requisites    download all images needed to display the HTML page
* Recursive accept/reject
  -A,  --accept=LIST                comma-separated list of accepted extensions
  -R,  --reject=LIST                comma-separated list of rejected extensions
  -D,  --domains=LIST               comma-separated list of accepted domains
       --exclude-domains=LIST       comma-separated list of rejected domains
       --follow-ftp                 follow FTP links in HTML documents
       --follow-tags=LIST           comma-separated list of HTML tags to follow
  -G,  --ignore-tags=LIST           comma-separated list of HTML tags to ignore
  -H,  --span-hosts                 go to foreign hosts when recursing
  -L,  --relative                   follow relative links only
  -I,  --include-directories=LIST   list of allowed directories
  -X,  --exclude-directories=LIST   list of excluded directories
  -np, --no-parent                  do not ascend to the parent directory

For a fuller introduction to wget, see

https://blog.csdn.net/huyuan7494/article/details/77933136

Download procedure

Open cmd. Here we batch-download every file in the 2020180233820_50640 directory from the email above to drive H:.
In cmd, enter:

cd /d H:\

Then copy the command from the email into cmd:

wget -r -np -nH -w1 --cut-dirs=2 --reject="index.html*" https://xfr139.larc.nasa.gov/sflops/Distribution/2020180233820_50640/
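
The -nH and --cut-dirs=2 flags control where files land locally: the host name and the first two remote directory components (sflops/Distribution) are stripped. A minimal sketch of that mapping in plain shell string stripping (no download; the .hdf file name below is a made-up example, not a real CALIPSO granule name):

```shell
# How -nH --cut-dirs=2 maps a remote URL onto a local path.
url="https://xfr139.larc.nasa.gov/sflops/Distribution/2020180233820_50640/CAL_LID_L2_sample.hdf"
path="${url#https://*/}"   # -nH: drop the scheme and host name
path="${path#*/}"          # --cut-dirs=2: cut 1st component (sflops/)
path="${path#*/}"          #               cut 2nd component (Distribution/)
echo "$path"               # -> 2020180233820_50640/CAL_LID_L2_sample.hdf
```

So the files end up under H:\2020180233820_50640\ rather than under a deep mirror of the server's directory tree, and --reject="index.html*" discards the auto-generated listing pages wget fetches while recursing.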

While running:
[Screenshot: wget retrieving the files in the cmd window]

Download result:
[Screenshot: the downloaded files on drive H:]

Summary

I am not sure why, but shortly after starting a wget download my computer would shut itself down, and after several retries it even blue-screened. Whether wget is the cause is still unclear. Since wget proved hard to use here, I have switched to Firefox with the DownThemAll extension, which is convenient in both speed and operation, and I recommend it for these downloads.
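
For readers who still want to use wget despite the instability described above, interrupted runs need not start over. This is only a sketch of the same command with -c added (I have not verified that it avoids the crashes described here):

```shell
# Re-running with -c after a crash or network drop resumes partially
# downloaded files instead of fetching them again from scratch.
wget -c -r -np -nH -w1 --cut-dirs=2 --reject="index.html*" \
     https://xfr139.larc.nasa.gov/sflops/Distribution/2020180233820_50640/
```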
