earthdata.nasa: batch-downloading Earth observation and reanalysis data with Python (code included)

Original post

2020-06-27 00:34

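NASA's Earthdata site publishes a wide variety of datasets that the public can download for free
(https://search.earthdata.nasa.gov/search)
Once you need data in bulk, though, downloading by hand stops being practical.
Each dataset offers its own batch-retrieval method: some need a Linux system, some ship a C++ download script, and so on.
For those of us doing remote sensing, those options are, well, not exactly friendly...
Below is a simple Python crawler. With nothing more than the requests library, it can fetch any dataset on the site that gives you a list of download URLs.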

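Generally, once a download request on this site succeeds, it gives you a list of download URLs, roughly like the figure below, though it does not always look exactly like that.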

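What you download is a txt file (or similar) with one download link per line.
Note, by the way, that downloading data from this site requires a registered account.
After that, you just loop over the links in Python and fetch them one by one.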
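As a small illustration of the URL list described above, here is a minimal sketch that turns the text of such a file into (url, filename) pairs. The function name `parse_url_list` and the example.com links are made up for illustration, and the sketch assumes each link ends in the file's real name, which is not guaranteed for every dataset.

```python
import os
from urllib.parse import unquote, urlparse


def parse_url_list(text):
    """Return (url, filename) pairs from the text of a URL list file."""
    pairs = []
    seen = set()
    for line in text.splitlines():
        url = line.strip()
        # Skip blank lines, comment lines, and duplicate links.
        if not url or url.startswith('#') or url in seen:
            continue
        seen.add(url)
        # Assumption: the last path component is the file name.
        # Any query string (e.g. ?token=...) is not part of the path.
        filename = os.path.basename(unquote(urlparse(url).path))
        pairs.append((url, filename))
    return pairs


# Hypothetical list content, standing in for the downloaded txt file.
sample = """\
https://example.com/data/2020/FILE_A.hdf

https://example.com/data/2020/FILE_B.hdf?token=abc
https://example.com/data/2020/FILE_A.hdf
"""
for url, filename in parse_url_list(sample):
    print(filename)
```

Deriving the name from the link keeps each file's own extension, so nothing has to be renamed by hand afterwards.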

# -*- coding: utf-8 -*-
import os
from urllib.parse import urlparse

import requests  # get the requests library from https://github.com/requests/requests


# Override requests.Session.rebuild_auth to maintain headers when redirected.
# A small custom class that helps with the download.
class SessionWithHeaderRedirection(requests.Session):

    AUTH_HOST = 'urs.earthdata.nasa.gov'

    def __init__(self, username, password):
        super().__init__()
        self.auth = (username, password)

    # Overrides from the library to keep headers when redirected to or from
    # the NASA auth host.
    def rebuild_auth(self, prepared_request, response):
        headers = prepared_request.headers
        url = prepared_request.url

        if 'Authorization' in headers:
            original_parsed = requests.utils.urlparse(response.request.url)
            redirect_parsed = requests.utils.urlparse(url)

            # Drop the credentials only when the redirect changes host and
            # neither side of the redirect is the NASA auth host.
            if (original_parsed.hostname != redirect_parsed.hostname
                    and redirect_parsed.hostname != self.AUTH_HOST
                    and original_parsed.hostname != self.AUTH_HOST):
                del headers['Authorization']

        return


# Create a session with the user credentials that will be used to authenticate access to the data.
# Fill in the account details you registered with Earthdata.
username = "xxx"
password = "xxx"
# This is roughly the equivalent of logging in.
session = SessionWithHeaderRedirection(username, password)

# Open the URL list file you obtained and read one link per line, skipping blank lines.
with open('xxx.txt', 'r') as f:
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    # Extract the filename from the URL to be used when saving the file.
    # If a link does not end in a usable name, set filename by hand and
    # include the correct extension, e.g. .hdf.
    filename = os.path.basename(urlparse(url).path)

    # Submit the request using the session, then stream the body to disk.
    response = session.get(url, stream=True)
    # A status code of 200 means the request succeeded.
    print(response.status_code, '\t', filename)
    if response.status_code != 200:
        # Skip failed requests so an error page is not saved as data.
        continue
    with open(filename, 'wb') as fd:
        for chunk in response.iter_content(chunk_size=1024 * 1024):
            fd.write(chunk)
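A long batch can break off halfway, so it may help to skip files that are already on disk when the script is run again. The sketch below is one possible way to do that, not part of the original script: `pending_downloads` and `save_chunks` are hypothetical helper names, and the approach assumes a file at its final path is complete because data is first written to a `.part` file and renamed only at the end.

```python
import os
from urllib.parse import urlparse


def pending_downloads(urls, dest_dir):
    """Return (url, path) pairs whose target file does not exist yet."""
    todo = []
    for url in urls:
        # Assumption: the last path component of the link is the file name.
        filename = os.path.basename(urlparse(url).path)
        path = os.path.join(dest_dir, filename)
        if not os.path.exists(path):
            todo.append((url, path))
    return todo


def save_chunks(chunks, path):
    """Write an iterable of byte chunks to path, renaming only when finished."""
    part_path = path + '.part'
    with open(part_path, 'wb') as fd:
        for chunk in chunks:
            fd.write(chunk)
    # The final name appears only after every chunk has been written,
    # so an interrupted run leaves a .part file rather than a truncated file.
    os.replace(part_path, path)


# Possible use inside the download loop (session and urls as defined above):
# for url, path in pending_downloads(urls, '.'):
#     response = session.get(url, stream=True)
#     if response.status_code == 200:
#         save_chunks(response.iter_content(chunk_size=1024 * 1024), path)
```

Because `save_chunks` only needs an iterable of bytes, it works with `response.iter_content(...)` and can be tried out without any network access.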

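In fact, many datasets published online, including those on sites that research groups host themselves, are tedious to download by hand.
Some of the more considerate sites will still hand you a URL list.
If there really is none, you have to dig the URLs out of the page source, then either crawl those links or generate them yourself from the file naming convention.
For sites that do not require a login, you can usually paste the links straight into Xunlei (Thunder), create a download task, and download without being throttled.
Or write Python to request them; if I remember, I may go into that in more detail later.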
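Both fallbacks mentioned above can be sketched with the standard library alone. This is only an illustration under assumptions: `extract_links` and `daily_urls` are hypothetical names, the HTML snippet and the example.com URL template are invented, and a real site will have its own page layout and naming convention that you need to check first.

```python
from datetime import date, timedelta
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that end with a given suffix."""

    def __init__(self, base_url, suffix):
        super().__init__()
        self.base_url = base_url
        self.suffix = suffix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href')
        if href and href.endswith(self.suffix):
            # Resolve relative links against the URL of the page.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url, suffix='.hdf'):
    """Return absolute URLs of all links in html that end with suffix."""
    collector = LinkCollector(base_url, suffix)
    collector.feed(html)
    return collector.links


def daily_urls(template, start, end):
    """Fill a URL template with every date from start to end inclusive."""
    days = (end - start).days + 1
    return [template.format(d=start + timedelta(days=i)) for i in range(days)]


# Hypothetical page source and naming convention, for illustration only.
page = '<a href="A.hdf">A</a> <a href="/other/B.hdf">B</a> <a href="readme.txt">r</a>'
print(extract_links(page, 'https://example.com/data/'))
print(daily_urls('https://example.com/data/{d:%Y}/PRODUCT_{d:%Y%m%d}.hdf',
                 date(2020, 6, 26), date(2020, 6, 27)))
```

Either function produces a plain list of URLs, which can be written to a txt file and fed to the same download loop as a list provided by the site.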

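I have been busy for the better part of a year without updating this blog. Today, at home, I suddenly felt like writing again, and the reason for that probably starts with a bat.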
