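Python Web Scraping Primer, Part 1: HTTP, Web Page Basics, requests, APIs, and JS

Datawhale Web Scraping Practicum, Task04: HTTP, Web Page Basics, requests, APIs, and JS

1. What You Will Learn

  1. The Internet and HTTP
  2. Web page basics
  3. requests
  4. Using APIs
  5. Introduction to JS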

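2. The Internet and HTTP

2.1 The Internet

The Internet is a vast network formed by linking networks to one another. These networks are connected through a standard suite of network protocols, link billions of devices worldwide, and form what is logically a single, huge international network. It is made up of millions of private, academic, business, and government networks, ranging from local to global in scope, and it is carried over a broad array of electronic, wireless, and optical technologies. The practice of connecting computer networks to each other is called "internetworking", and the worldwide network of interconnected networks that grew out of it is called the "Internet".

Note:
The Internet is not the same as the World Wide Web (WWW). The Web is only a global system of interlinked hypertext documents, and it is just one of the services the Internet provides. The Internet carries a wide range of information resources and services: the interlinked hypertext documents and applications of the Web, the infrastructure that supports email, peer-to-peer networks, file sharing, and IP telephony.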

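2.2 HTTP

HTTP is a standard for requests and responses between a client (the user) and a server (the website). Using a web browser, a web crawler, or some other tool, the client sends an HTTP request to a specified port on the server (port 80 by default). This client is called the user agent. The responding server stores resources such as HTML files and images, and is called the origin server. Between the user agent and the origin server there may be several intermediaries, such as proxies, gateways, or tunnels. Although TCP/IP is the most popular protocol suite on the Internet, HTTP does not require it or the layers it supports.

In fact, HTTP can be implemented on top of any Internet protocol or on other networks. HTTP only assumes that the underlying protocol provides reliable transport, so any protocol that offers that guarantee can be used. In the TCP/IP suite, TCP serves as the transport layer. Typically an HTTP client initiates a request by opening a TCP connection to a specified port on the server (port 80 by default), where the HTTP server is listening for client requests. Once it receives a request, the server returns a status line (for example "HTTP/1.1 200 OK") followed by the response content, such as the requested file or an error message.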
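The exchange described above can be sketched with nothing but a TCP socket. This is a minimal illustration, not how production clients are written: `HelloHandler`, `fetch_raw`, and `demo` are hypothetical names, and the "origin server" is a throwaway one started on the local machine with Python's standard `http.server` so the example does not depend on any real site.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical origin server that answers every GET with a short text body."""

    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_raw(host, port, path="/"):
    """Open a TCP connection, send one GET request, return the raw response bytes."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def demo():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        raw = fetch_raw("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()
    # Headers and body are separated by an empty line.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n")[0].decode("ascii")
    return status_line, body


if __name__ == "__main__":
    print(demo())
```

The first line of the response is the status line, and everything after the blank line is the body. Libraries such as requests hide these details, but this is roughly what travels over the connection.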

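HTTP defines many request methods. The main ones are:

  • GET: Requests that the specified resource be "displayed". GET should only be used to read data and should not be used for operations with side effects (for example, state-changing actions in a web application). One reason is that GET requests may be issued freely by web spiders and the like.

  • HEAD: Like GET, asks the server for the specified resource, except that the server does not send back the body. The benefit is that information about the resource (meta-information, or metadata) can be obtained without transferring the content.

  • POST: Submits data to the specified resource and asks the server to process it (for example, submitting a form or uploading a file). The data is carried in the request body. The request may create a new resource, modify an existing one, or both.

  • PUT: Uploads the latest content to the specified resource location.

  • DELETE: Asks the server to delete the resource identified by the Request-URI.

  • TRACE: Echoes back the request the server received; used mainly for testing or diagnostics.

  • OPTIONS: Asks the server to return all the HTTP request methods the resource supports. Sending an OPTIONS request with "*" in place of a resource name tests whether the server is functioning properly.

  • CONNECT: Reserved in HTTP/1.1 for proxies that can switch the connection into a tunnel. It is typically used for connections to SSL-encrypted servers through an unencrypted HTTP proxy.

Method names are case-sensitive. When the resource targeted by a request does not support the request method, the server should return status code 405 (Method Not Allowed); when the server does not recognize or does not implement the method, it should return status code 501 (Not Implemented).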
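To see how a few of these methods behave, here is a small sketch that sends GET, HEAD, and DELETE to a local test server. `PageHandler` and `probe` are hypothetical names, and the server is Python's standard `http.server`, which answers 501 for any method the handler does not implement; other servers may respond differently.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<h1>Hello</h1>"


class PageHandler(BaseHTTPRequestHandler):
    """Hypothetical server that implements only GET and HEAD."""

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)

    def do_HEAD(self):
        self._send_headers()  # same headers as GET, but no body

    def log_message(self, format, *args):
        pass  # keep the example quiet


def probe(methods=("GET", "HEAD", "DELETE")):
    """Return {method: (status code, body length)} for each method tried."""
    server = HTTPServer(("127.0.0.1", 0), PageHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    results = {}
    try:
        for method in methods:
            conn = http.client.HTTPConnection(
                "127.0.0.1", server.server_address[1], timeout=5
            )
            conn.request(method, "/")
            response = conn.getresponse()
            results[method] = (response.status, len(response.read()))
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return results


if __name__ == "__main__":
    for method, (status, size) in probe().items():
        print(method, status, size)
```

GET returns the page, HEAD returns the same status with an empty body, and DELETE is rejected with 501 because this handler never defines it.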

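3. Web Page Basics

3.1 What a Web Page Is Made Of

Our data comes from web pages, so before we actually scrape anything it is worth understanding how a page is put together.

A web page is made up of HTML, CSS, and JavaScript.

HTML builds the skeleton of the page. CSS makes the page look good: the colors we see and the size and position of each block are all controlled by CSS. JavaScript makes the page "come alive", in two senses. One is dynamic data interaction; the other is literal motion, such as the animations seen on many pages, which are usually produced by JavaScript working together with CSS.

Open Chrome, visit any website, and press F12 to open the developer tools. You will see:

[Figure: Chrome developer tools]

The Elements tab shows the page's source code; what is displayed here is HTML.

Different kinds of content are represented by different tags: images by <img>, video by <video>, and paragraphs by <p>. Their layout is usually built by nesting <div> layout tags. Arranging and nesting tags in different ways is what forms the framework of a page.

The Styles tab on the right shows the CSS (Cascading Style Sheets) rules applied to the currently selected HTML tag. "Cascading" means that when the HTML references several style sheets and their styles conflict, the browser resolves the conflict according to the cascade order. "Style" refers to formatting such as text size, color, element spacing, and arrangement.

JavaScript, for its part, is usually wrapped in a <script> tag in the HTML. It can be written directly in the HTML page or pulled in as an external file.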

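3.2 Web Page Structure

Let's write a simple HTML page by hand to get a feel for it.
First create a text file and change its extension to .html, for example demo.html, then put the following in it: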

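<!DOCTYPE html>
<html>
    <head>
        <meta charset="UTF-8">
        <title>Demo</title>
    </head>
    <body>
        <div id="container">
            <div class="wrapper">
                <h1>Hello World</h1>
                <p>Hello Python.</p>
            </div>
        </div>
    </body>
</html>
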
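First, the document opens with DOCTYPE, which here declares the document type as html. The outermost tag of the document is <html>, and it is closed at the end by </html>.

A brief aside: when browsers parse HTML they do not strictly require every tag to have a closing tag, but for semantic clarity it is best to close every tag. Try deleting some of the closing tags; the browser will usually still parse the page.

An HTML document is generally split into two parts, head and body. In the head we usually declare the encoding as UTF-8 and use title to define the page title, which is shown on the browser tab.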

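The body generally holds the main content of the document. Headings are written with the six tags <h1> through <h6>, with the font size decreasing from largest to smallest. The line-break tag is <br>. Links are created with <a>, whose href attribute contains the link's URL, for example <a href="http://www.baidu.com">A link to Baidu</a>.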

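Most element attributes come as name-value pairs joined by "=" and written inside the start tag after the element name. The value is usually enclosed in single or double quotes. Values made up only of certain characters may be left unquoted in HTML, but unquoted attribute values are considered unsafe. Note that many elements share some common attributes:

  • The id attribute gives an element an identifier that is unique within the whole document. It is used to identify the element so that style sheets can change its appearance and scripts can change, show, or delete its content or formatting. Appended to the page's URL, it provides a globally unique identifier for the element, typically a sub-section of the page.

  • The class attribute provides a way to group similar elements, and is commonly used for semantic or presentational purposes. For example, an HTML document might use class="notation" to indicate that all elements with this class value are subordinate to the document's main text. When presented, such elements might be gathered together and shown as footnotes on the page instead of appearing where they occur in the HTML source. An element can also carry several class values: class="notation important" puts the element in both the "notation" and "important" classes.

  • The style attribute assigns presentational properties to one particular element. Selecting the element from a style sheet through its id or class attribute is generally considered better practice than using style.

  • The title attribute attaches an additional explanation to an element. In most browsers it is displayed as a tooltip.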
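A crawler usually reads these attributes rather than writing them. As a rough sketch, the standard-library `html.parser` module can collect the attributes of every tag; the `SAMPLE` markup and the `AttributeCollector` class are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical snippet using the common attributes described above.
SAMPLE = """
<div id="intro" class="note important" title="A short hint">
  <a href="http://www.baidu.com">A link to Baidu</a>
  <p class="note" style="color: red">Hello</p>
</div>
"""


class AttributeCollector(HTMLParser):
    """Record (tag, attributes) for every start tag that is parsed."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.elements.append((tag, dict(attrs)))


def collect(markup):
    parser = AttributeCollector()
    parser.feed(markup)
    return parser.elements


elements = collect(SAMPLE)

# href values of all links
links = [attrs["href"] for tag, attrs in elements if tag == "a" and "href" in attrs]

# A class value may list several classes separated by whitespace.
notes = [tag for tag, attrs in elements if "note" in attrs.get("class", "").split()]

# id values are unique, so they map naturally to a single element
ids = {attrs["id"]: tag for tag, attrs in elements if "id" in attrs}

if __name__ == "__main__":
    print(links, notes, ids)
```

Splitting the class value on whitespace is what lets the div match "note" even though its class attribute is "note important".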

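Open the .html file created above in a browser, and the page is displayed as follows:

[Figure: demo.html rendered in the browser]

HTML DOM

In HTML, everything defined by a tag is a node, and together these nodes form an HTML DOM tree.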

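According to the W3C HTML DOM standard, everything in an HTML document is a node:

  • The whole document is a document node

  • Every HTML element is an element node

  • The text inside an HTML element is a text node

  • Every HTML attribute is an attribute node

  • Comments are comment nodes

The HTML DOM treats an HTML document as a tree structure, called the node tree:

[Figure: HTML DOM node tree]

Through the HTML DOM, every node in the tree can be accessed from JavaScript. All HTML elements (nodes) can be modified, and nodes can be created or deleted.

The nodes in the node tree have hierarchical relationships to one another.

The terms parent, child, and sibling describe these relationships. A parent node has child nodes, and child nodes on the same level are called siblings.

  • In the node tree, the top node is called the root

  • Every node has a parent, except the root (which has none)

  • A node can have any number of children

  • Siblings are nodes that share the same parent

The figure below shows part of a node tree and the relationships between its nodes:

[Figure: part of a node tree and the relationships between its nodes]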
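The parent, child, and sibling relationships can also be explored from Python. The sketch below uses `xml.etree.ElementTree` as a stand-in for a DOM: it is an XML parser, so it only accepts well-formed markup, and it models element nodes only. The `DOC` string and the `siblings` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup; every tag is explicitly closed.
DOC = """
<html>
  <head><title>Demo</title></head>
  <body>
    <h1>Hello World</h1>
    <p>Hello Python.</p>
  </body>
</html>
"""

root = ET.fromstring(DOC.strip())

# ElementTree stores children only, so build a child -> parent lookup.
parent_of = {child: parent for parent in root.iter() for child in parent}


def siblings(node):
    """Return the other children of node's parent (empty for the root)."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    return [child for child in parent if child is not node]


h1 = root.find("./body/h1")
children_of_root = [child.tag for child in root]

if __name__ == "__main__":
    print("root:", root.tag)
    print("children of root:", children_of_root)
    print("parent of h1:", parent_of[h1].tag)
    print("siblings of h1:", [node.tag for node in siblings(h1)])
```

The root is the only element missing from `parent_of`, which mirrors the rule that every node except the root has a parent.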

3.3 CSS

As mentioned earlier, CSS can be used to style a page, so let's add a little CSS to change how our page is displayed.

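<!DOCTYPE html>
<html>
    <head>
        <meta charset="UTF-8">
        <title>Demo</title>
        <style type="text/css">
            .wrapper {
                text-align: center;
            }
        </style>
    </head>
    <body>
        <div id="container">
            <div class="wrapper">
                <h1>Hello World</h1>
                <p>Hello Python.</p>
            </div>
        </div>
    </body>
</html>
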
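We added a style tag in the head and declared that its content is to be parsed as CSS. The rule inside centers the text. Here is the page after adding the CSS:

[Figure: demo.html with the text centered]

As you can see, the text that was left-aligned is now centered.

So how does CSS indicate which part of the document structure it is styling? That is where CSS selectors come in.

In CSS we use selectors to locate nodes. For example, in the page above the div node's id is container, so it can be written as #container: the leading # means select by id, followed immediately by the id's name. Likewise, to select nodes whose class is wrapper we can use .wrapper: the leading dot . means select by class, followed immediately by the class name.

CSS selectors also support nested selection. Separating selectors with a space expresses nesting: #container .wrapper p first selects the node whose id is container, then the nodes with class wrapper inside it, and then the p nodes inside those. Writing selectors without a space means all the conditions apply to the same node: div#container .wrapper p.text first selects the div node whose id is container, then the nodes with class wrapper inside it, and then the p nodes with class text inside those. That is the CSS selector, and its filtering ability is very powerful.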
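Python's standard library has no CSS selector engine, but the two selectors above can be approximated with the limited XPath subset of `xml.etree.ElementTree`. Treat this as a sketch under stated assumptions: the `PAGE` markup is hypothetical, the XPath strings are hand-written translations rather than an official mapping, and `[@class='wrapper']` compares the whole attribute string, whereas a real CSS selector matches any one of several class names. Scraping libraries such as Beautiful Soup accept CSS selectors directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical page matching the ids and classes used in the text.
PAGE = """
<html>
  <body>
    <div id="container">
      <div class="wrapper">
        <p class="text">inside wrapper</p>
        <p>plain paragraph</p>
      </div>
      <p class="text">outside wrapper</p>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE.strip())

# CSS: #container .wrapper p
# any element with id "container", then any descendant with class "wrapper",
# then any descendant p
nested = root.findall(".//*[@id='container']//*[@class='wrapper']//p")

# CSS: div#container .wrapper p.text
# the tag name and the attribute test apply to the same node
compound = root.findall(
    ".//div[@id='container']//*[@class='wrapper']//p[@class='text']"
)

nested_text = [p.text for p in nested]
compound_text = [p.text for p in compound]

if __name__ == "__main__":
    print(nested_text)
    print(compound_text)
```

The paragraph outside the wrapper is skipped by both queries, and the second query additionally drops the paragraph that has no text class.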

If you have time, or want to go deeper into HTML and CSS, you can continue with the tutorials at 菜鳥教程 (Runoob).

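3.4 Inspecting a Page with the Developer Tools

If you want to write a crawler that scrapes page content, the most important preparation before you start coding is probably inspecting the target page. Let's use Chrome to see how the developer tools work, taking "The Zen of Python" on the official Python site as the example. First open https://www.python.org/dev/peps/pep-0020/ in Chrome. You can then choose "More tools" → "Developer tools" from the menu, right-click in the page and choose "Inspect", or press F12. The result is shown in the figure below.

[Figure: the PEP 20 page with the developer tools open]

Chrome's developer tools give the user the following groups of tools.

  • Elements: Lets the user look at the page from the browser's point of view, showing the HTML, CSS, and DOM (Document Object Model) objects Chrome needs to render the page.

  • Network: Shows which resources the page requested from the server, their sizes, and information about how they were loaded. You can also inspect HTTP request headers, response content, and so on.

  • Sources: The source code panel, used mainly for debugging JavaScript.

  • Console: The console panel, which shows warnings and errors. During development you can log diagnostic information to the console, or use it as a shell to interact with the JavaScript on the page.

  • Performance: Records and displays what happens over the life of the page, which helps improve its runtime performance.

  • Memory: Provides more detailed information than Performance, for example for tracking down memory leaks.

  • Application: Inspects all the resources that have been loaded.

  • Security: The security panel, which can be used to deal with certificate problems and the like.

In addition, toggling device mode lets you see how the page looks on different devices. The shortcut is Ctrl + Shift + M (Cmd + Shift + M on a Mac), as shown below.

[Figure: device mode]

In the Elements panel, developers can inspect and edit the page's HTML and CSS. Select an element and double-click it to edit it. To remove the word "python", for example, right-click that element and choose "Delete element". The result is shown below:

[Figure: the page after deleting the element]

Right-clicking offers many other operations too. One worth mentioning is the "Copy XPath" option in the context menu. XPath is a powerful tool for parsing web pages, so this Chrome feature is very handy when writing crawlers.

The Network tool gives a clear view of how the page loads its network resources, along with related information. Each requested resource appears as one row in the Network table, and for any given request you can drill into the request headers, the response headers, and the content already returned. For pages that require filling in and submitting a form (logging in, for example, with Baidu Tieba as the example), tick the "Preserve log" checkbox in the Network panel before logging in. The HTTP POST is then recorded and you can inspect the details of the submitted form. With the developer tools open on the Tieba home page, logging in produces the information shown in the figure below, where "Form Data" contains the details of the form sent to the server.

[Figure: Form Data of the login request in the Network panel]
The "Preview" tab in Network is also commonly used, for previewing the returned data. Chrome's developer tools include plenty of more advanced features that we will not cover here; you can learn them when you need them.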
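The "Form Data" shown in the Network panel is simply the body of the POST request, encoded as key=value pairs. The sketch below builds such a request with the standard library without sending it. The URL and the field names are hypothetical placeholders, not Tieba's real login form.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

# Hypothetical login endpoint and field names, for illustration only.
LOGIN_URL = "https://example.com/login"
form = {"username": "alice", "password": "s3cret!"}

# A browser encodes form fields as application/x-www-form-urlencoded text.
body = urlencode(form)

# Build (but do not send) the POST request that would carry this body.
request = Request(
    LOGIN_URL,
    data=body.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

# Decoding the body recovers the individual fields.
decoded = parse_qs(body)

if __name__ == "__main__":
    print(request.get_method(), request.full_url)
    print(body)
    print(decoded)
```

Comparing a body like this with what the Network panel records is a practical way to work out which fields a crawler has to send.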

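4. requests

4.1 requests.get

Now let's look at the basic use of the requests library, starting with requests.get.

Our goal is to scrape The Zen of Python.
[Figure: The Zen of Python on the PEP 20 page]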

The most common flow of a web crawler is:

  1. Visit the site;
  2. Locate the information you need;
  3. Retrieve and process that information.
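Steps 2 and 3 can be tried offline before touching the network. The sketch below assumes the aphorisms sit inside a <pre class="literal-block"> element, as they do in the PEP 20 markup; `SAMPLE_HTML` is a trimmed, hand-written stand-in for the real page, `extract_literal_blocks` is a hypothetical helper, and a regular expression is used only because the structure is so simple. A real crawler would more often use an HTML parser.

```python
import html
import re

# Trimmed-down sample shaped like the PEP 20 page; the real page is much longer.
SAMPLE_HTML = """
<div class="section" id="the-zen-of-python">
<h1>The Zen of Python</h1>
<pre class="literal-block">
Beautiful is better than ugly.
Explicit is better than implicit.
Special cases aren't special enough to break the rules.
</pre>
</div>
<div class="section" id="easter-egg">
<pre class="literal-block">
&gt;&gt;&gt; import this
</pre>
</div>
"""


def extract_literal_blocks(page):
    """Step 2: locate every <pre class="literal-block"> and return its text."""
    pattern = re.compile(r'<pre class="literal-block">(.*?)</pre>', re.DOTALL)
    # Step 3: decode HTML entities and trim surrounding whitespace.
    return [html.unescape(block).strip() for block in pattern.findall(page)]


blocks = extract_literal_blocks(SAMPLE_HTML)
zen_lines = blocks[0].splitlines()

if __name__ == "__main__":
    for line in zen_lines:
        print(line)
```

The same helper can be applied to the text of a response fetched from the live page, provided the page still uses this markup.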
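import requests

# The PEP 20 page that holds "The Zen of Python"
url = 'https://www.python.org/dev/peps/pep-0020/'
res = requests.get(url)  # send an HTTP GET request to the page
text = res.text  # the response body decoded to a str
text  # in an interactive session this displays the raw HTML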
'<!doctype html>\n<!--[if lt IE 7]>   <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9">   <![endif]-->\n<!--[if IE 7]>      <html class="no-js ie7 lt-ie8 lt-ie9">          <![endif]-->\n<!--[if IE 8]>      <html class="no-js ie8 lt-ie9">                 <![endif]-->\n<!--[if gt IE 8]><!--><html class="no-js" lang="en" dir="ltr">  <!--<![endif]-->\n\n<head>\n    <meta charset="utf-8">\n    <meta http-equiv="X-UA-Compatible" content="IE=edge">\n\n    <link rel="prefetch" href="//ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js">\n\n    <meta name="application-name" content="Python.org">\n    <meta name="msapplication-tooltip" content="The official home of the Python Programming Language">\n    <meta name="apple-mobile-web-app-title" content="Python.org">\n    <meta name="apple-mobile-web-app-capable" content="yes">\n    <meta name="apple-mobile-web-app-status-bar-style" content="black">\n\n    <meta name="viewport" content="width=device-width, initial-scale=1.0">\n    <meta name="HandheldFriendly" content="True">\n    <meta name="format-detection" content="telephone=no">\n    <meta http-equiv="cleartype" content="on">\n    <meta http-equiv="imagetoolbar" content="false">\n\n    <script src="/static/js/libs/modernizr.js"></script>\n\n    <link href="/static/stylesheets/style.67f4b30f7483.css" rel="stylesheet" type="text/css" title="default" />\n    <link href="/static/stylesheets/mq.3ae8e02ece5b.css" rel="stylesheet" type="text/css" media="not print, braille, embossed, speech, tty" />\n    \n\n    <!--[if (lte IE 8)&(!IEMobile)]>\n    <link href="/static/stylesheets/no-mq.fcf414dc68a3.css" rel="stylesheet" type="text/css" media="screen" />\n    \n    \n    <![endif]-->\n\n    \n    <link rel="icon" type="image/x-icon" href="/static/favicon.ico">\n    <link rel="apple-touch-icon-precomposed" sizes="144x144" href="/static/apple-touch-icon-144x144-precomposed.png">\n    <link rel="apple-touch-icon-precomposed" sizes="114x114" 
href="/static/apple-touch-icon-114x114-precomposed.png">\n    <link rel="apple-touch-icon-precomposed" sizes="72x72" href="/static/apple-touch-icon-72x72-precomposed.png">\n    <link rel="apple-touch-icon-precomposed" href="/static/apple-touch-icon-precomposed.png">\n    <link rel="apple-touch-icon" href="/static/apple-touch-icon-precomposed.png">\n\n    \n    <meta name="msapplication-TileImage" content="/static/metro-icon-144x144-precomposed.png"><!-- white shape -->\n    <meta name="msapplication-TileColor" content="#3673a5"><!-- python blue -->\n    <meta name="msapplication-navbutton-color" content="#3673a5">\n\n    <title>PEP 20 -- The Zen of Python | Python.org</title>\n\n    <meta name="description" content="The official home of the Python Programming Language">\n    <meta name="keywords" content="Python programming language object oriented web free open source software license documentation download community">\n\n    \n    <meta property="og:type" content="website">\n    <meta property="og:site_name" content="Python.org">\n    <meta property="og:title" content="PEP 20 -- The Zen of Python">\n    <meta property="og:description" content="The official home of the Python Programming Language">\n    \n    <meta property="og:image" content="https://www.python.org/static/opengraph-icon-200x200.png">\n    <meta property="og:image:secure_url" content="https://www.python.org/static/opengraph-icon-200x200.png">\n    \n    <meta property="og:url" content="https://www.python.org/dev/peps/pep-0020/">\n\n    <link rel="author" href="/static/humans.txt">\n\n    <link rel="alternate" type="application/rss+xml" title="Python Enhancement Proposals"\n          href="https://www.python.org/dev/peps/peps.rss/">\n    <link rel="alternate" type="application/rss+xml" title="Python Job Opportunities"\n          href="https://www.python.org/jobs/feed/rss/">\n    <link rel="alternate" type="application/rss+xml" title="Python Software Foundation News"\n          
href="https://feeds.feedburner.com/PythonSoftwareFoundationNews">\n    <link rel="alternate" type="application/rss+xml" title="Python Insider"\n          href="https://feeds.feedburner.com/PythonInsider">\n\n    \n\n    \n    <script type="application/ld+json">\n     {\n       "@context": "https://schema.org",\n       "@type": "WebSite",\n       "url": "https://www.python.org/",\n       "potentialAction": {\n         "@type": "SearchAction",\n         "target": "https://www.python.org/search/?q={search_term_string}",\n         "query-input": "required name=search_term_string"\n       }\n     }\n    </script>\n\n    \n    <script type="text/javascript">\n    var _gaq = _gaq || [];\n    _gaq.push([\'_setAccount\', \'UA-39055973-1\']);\n    _gaq.push([\'_trackPageview\']);\n\n    (function() {\n        var ga = document.createElement(\'script\'); ga.type = \'text/javascript\'; ga.async = true;\n        ga.src = (\'https:\' == document.location.protocol ? \'https://ssl\' : \'http://www\') + \'.google-analytics.com/ga.js\';\n        var s = document.getElementsByTagName(\'script\')[0]; s.parentNode.insertBefore(ga, s);\n    })();\n    </script>\n    \n</head>\n\n<body class="python pages pep-page">\n\n    <div id="touchnav-wrapper">\n\n        <div id="nojs" class="do-not-print">\n            <p><strong>Notice:</strong> While Javascript is not essential for this website, your interaction with the content will be limited. Please turn Javascript on for the full experience. </p>\n        </div>\n\n        <!--[if lte IE 8]>\n        <div id="oldie-warning" class="do-not-print">\n            <p>\n                <strong>Notice:</strong> Your browser is <em>ancient</em>. 
Please\n                <a href="http://browsehappy.com/">upgrade to a different browser</a> to experience a better web.\n            </p>\n        </div>\n        <![endif]-->\n\n        <!-- Sister Site Links -->\n        <div id="top" class="top-bar do-not-print">\n\n            <nav class="meta-navigation container" role="navigation">\n\n                \n                <div class="skip-link screen-reader-text">\n                    <a href="#content" title="Skip to content">Skip to content</a>\n                </div>\n\n                \n                <a id="close-python-network" class="jump-link" href="#python-network" aria-hidden="true">\n                    <span aria-hidden="true" class="icon-arrow-down"><span>&#9660;</span></span> Close\n                </a>\n\n                \n\n<ul class="menu" role="tree">\n    \n    <li class="python-meta ">\n        <a href="/" title="The Python Programming Language" >Python</a>\n    </li>\n    \n    <li class="psf-meta ">\n        <a href="/psf-landing/" title="The Python Software Foundation" >PSF</a>\n    </li>\n    \n    <li class="docs-meta ">\n        <a href="https://docs.python.org" title="Python Documentation" >Docs</a>\n    </li>\n    \n    <li class="pypi-meta ">\n        <a href="https://pypi.python.org/" title="Python Package Index" >PyPI</a>\n    </li>\n    \n    <li class="jobs-meta ">\n        <a href="/jobs/" title="Python Job Board" >Jobs</a>\n    </li>\n    \n    <li class="shop-meta ">\n        <a href="/community/" title="Python Community" >Community</a>\n    </li>\n    \n</ul>\n\n\n                <a id="python-network" class="jump-link" href="#top" aria-hidden="true">\n                    <span aria-hidden="true" class="icon-arrow-up"><span>&#9650;</span></span> The Python Network\n                </a>\n\n            </nav>\n\n        </div>\n\n        <!-- Header elements -->\n        <header class="main-header" role="banner">\n            <div class="container">\n\n                <h1 
class="site-headline">\n                    <a href="/"><img class="python-logo" src="/static/img/python-logo.png" alt="python&trade;"></a>\n                </h1>\n\n                <div class="options-bar-container do-not-print">\n                    <a href="https://psfmember.org/civicrm/contribute/transact?reset=1&id=2" class="donate-button">Donate</a>\n                    <div class="options-bar">\n                        \n                        <a id="site-map-link" class="jump-to-menu" href="#site-map"><span class="menu-icon">&equiv;</span> Menu</a><form class="search-the-site" action="/search/" method="get">\n                            <fieldset title="Search Python.org">\n\n                                <span aria-hidden="true" class="icon-search"></span>\n\n                                <label class="screen-reader-text" for="id-search-field">Search This Site</label>\n                                <input id="id-search-field" name="q" type="search" role="textbox" class="search-field" placeholder="Search" value="" tabindex="1">\n\n                                <button type="submit" name="submit" id="submit" class="search-button" title="Submit this Search" tabindex="3">\n                                    GO\n                                </button>\n\n                                \n                                <!--[if IE]><input type="text" style="display: none;" disabled="disabled" size="1" tabindex="4"><![endif]-->\n\n                            </fieldset>\n                        </form><span class="breaker"></span><div class="adjust-font-size" aria-hidden="true">\n                            <ul class="navigation menu" aria-label="Adjust Text Size on Page">\n                                <li class="tier-1 last" aria-haspopup="true">\n                                    <a href="#" class="action-trigger"><strong><small>A</small> A</strong></a>\n                                    <ul class="subnav menu">\n                              
          <li class="tier-2 element-1" role="treeitem"><a class="text-shrink" title="Make Text Smaller" href="javascript:;">Smaller</a></li>\n                                        <li class="tier-2 element-2" role="treeitem"><a class="text-grow" title="Make Text Larger" href="javascript:;">Larger</a></li>\n                                        <li class="tier-2 element-3" role="treeitem"><a class="text-reset" title="Reset any font size changes I have made" href="javascript:;">Reset</a></li>\n                                    </ul>\n                                </li>\n                            </ul>\n                        </div><div class="winkwink-nudgenudge">\n                            <ul class="navigation menu" aria-label="Social Media Navigation">\n                                <li class="tier-1 last" aria-haspopup="true">\n                                    <a href="#" class="action-trigger">Socialize</a>\n                                    <ul class="subnav menu">\n                                        <li class="tier-2 element-1" role="treeitem"><a href="https://www.facebook.com/pythonlang?fref=ts"><span aria-hidden="true" class="icon-facebook"></span>Facebook</a></li>\n                                        <li class="tier-2 element-2" role="treeitem"><a href="https://twitter.com/ThePSF"><span aria-hidden="true" class="icon-twitter"></span>Twitter</a></li>\n                                        <li class="tier-2 element-3" role="treeitem"><a href="/community/irc/"><span aria-hidden="true" class="icon-freenode"></span>Chat on IRC</a></li>\n                                    </ul>\n                                </li>\n                            </ul>\n                        </div>\n                        <span data-html-include="/authenticated"></span>\n                    </div><!-- end options-bar -->\n                </div>\n\n                <nav id="mainnav" class="python-navigation main-navigation do-not-print" 
role="navigation">\n                    \n                        \n<ul class="navigation menu" role="menubar" aria-label="Main Navigation">\n  \n    \n    \n    <li id="about" class="tier-1 element-1  " aria-haspopup="true">\n        <a href="/about/" title="" class="">About</a>\n        \n            \n\n<ul class="subnav menu" role="menu" aria-hidden="true">\n    \n        <li class="tier-2 element-1" role="treeitem"><a href="/about/apps/" title="">Applications</a></li>\n    \n        <li class="tier-2 element-2" role="treeitem"><a href="/about/quotes/" title="">Quotes</a></li>\n    \n        <li class="tier-2 element-3" role="treeitem"><a href="/about/gettingstarted/" title="">Getting Started</a></li>\n    \n        <li class="tier-2 element-4" role="treeitem"><a href="/about/help/" title="">Help</a></li>\n    \n        <li class="tier-2 element-5" role="treeitem"><a href="http://brochure.getpython.info/" title="">Python Brochure</a></li>\n    \n</ul>\n\n        \n    </li>\n    \n    \n    \n    <li id="downloads" class="tier-1 element-2  " aria-haspopup="true">\n        <a href="/downloads/" title="" class="">Downloads</a>\n        \n            \n\n<ul class="subnav menu" role="menu" aria-hidden="true">\n    \n        <li class="tier-2 element-1" role="treeitem"><a href="/downloads/" title="">All releases</a></li>\n    \n        <li class="tier-2 element-2" role="treeitem"><a href="/downloads/source/" title="">Source code</a></li>\n    \n        <li class="tier-2 element-3" role="treeitem"><a href="/downloads/windows/" title="">Windows</a></li>\n    \n        <li class="tier-2 element-4" role="treeitem"><a href="/downloads/mac-osx/" title="">Mac OS X</a></li>\n    \n        <li class="tier-2 element-5" role="treeitem"><a href="/download/other/" title="">Other Platforms</a></li>\n    \n        <li class="tier-2 element-6" role="treeitem"><a href="https://docs.python.org/3/license.html" title="">License</a></li>\n    \n        <li class="tier-2 element-7" 
role="treeitem"><a href="/download/alternatives" title="">Alternative Implementations</a></li>\n    \n</ul>\n\n        \n    </li>\n    \n    \n    \n    <li id="documentation" class="tier-1 element-3  " aria-haspopup="true">\n        <a href="/doc/" title="" class="">Documentation</a>\n        \n            \n\n<ul class="subnav menu" role="menu" aria-hidden="true">\n    \n        <li class="tier-2 element-1" role="treeitem"><a href="/doc/" title="">Docs</a></li>\n    \n        <li class="tier-2 element-2" role="treeitem"><a href="/doc/av" title="">Audio/Visual Talks</a></li>\n    \n        <li class="tier-2 element-3" role="treeitem"><a href="https://wiki.python.org/moin/BeginnersGuide" title="">Beginner&#39;s Guide</a></li>\n    \n        <li class="tier-2 element-4" role="treeitem"><a href="https://devguide.python.org/" title="">Developer&#39;s Guide</a></li>\n    \n        <li class="tier-2 element-5" role="treeitem"><a href="https://docs.python.org/faq/" title="">FAQ</a></li>\n    \n        <li class="tier-2 element-6" role="treeitem"><a href="http://wiki.python.org/moin/Languages" title="">Non-English Docs</a></li>\n    \n        <li class="tier-2 element-7" role="treeitem"><a href="http://python.org/dev/peps/" title="">PEP Index</a></li>\n    \n        <li class="tier-2 element-8" role="treeitem"><a href="https://wiki.python.org/moin/PythonBooks" title="">Python Books</a></li>\n    \n        <li class="tier-2 element-9" role="treeitem"><a href="/doc/essays/" title="">Python Essays</a></li>\n    \n</ul>\n\n        \n    </li>\n    \n    \n    \n    <li id="community" class="tier-1 element-4  " aria-haspopup="true">\n        <a href="/community/" title="" class="">Community</a>\n        \n            \n\n<ul class="subnav menu" role="menu" aria-hidden="true">\n    \n        <li class="tier-2 element-1" role="treeitem"><a href="/community/survey" title="">Community Survey</a></li>\n    \n        <li class="tier-2 element-2" role="treeitem"><a 
href="/community/diversity/" title="">Diversity</a></li>\n    \n        <li class="tier-2 element-3" role="treeitem"><a href="/community/lists/" title="">Mailing Lists</a></li>\n    \n        <li class="tier-2 element-4" role="treeitem"><a href="/community/irc/" title="">IRC</a></li>\n    \n        <li class="tier-2 element-5" role="treeitem"><a href="/community/forums/" title="">Forums</a></li>\n    \n        <li class="tier-2 element-6" role="treeitem"><a href="/psf/annual-report/2019/" title="">PSF Annual Impact Report</a></li>\n    \n        <li class="tier-2 element-7" role="treeitem"><a href="/community/workshops/" title="">Python Conferences</a></li>\n    \n        <li class="tier-2 element-8" role="treeitem"><a href="/community/sigs/" title="">Special Interest Groups</a></li>\n    \n        <li class="tier-2 element-9" role="treeitem"><a href="/community/logos/" title="">Python Logo</a></li>\n    \n        <li class="tier-2 element-10" role="treeitem"><a href="https://wiki.python.org/moin/" title="">Python Wiki</a></li>\n    \n        <li class="tier-2 element-11" role="treeitem"><a href="/community/merchandise/" title="">Merchandise</a></li>\n    \n        <li class="tier-2 element-12" role="treeitem"><a href="/community/awards" title="">Community Awards</a></li>\n    \n        <li class="tier-2 element-13" role="treeitem"><a href="/psf/conduct/" title="">Code of Conduct</a></li>\n    \n</ul>\n\n        \n    </li>\n    \n    \n    \n    <li id="success-stories" class="tier-1 element-5  " aria-haspopup="true">\n        <a href="/success-stories/" title="success-stories" class="">Success Stories</a>\n        \n            \n\n<ul class="subnav menu" role="menu" aria-hidden="true">\n    \n        <li class="tier-2 element-1" role="treeitem"><a href="/success-stories/category/arts/" title="">Arts</a></li>\n    \n        <li class="tier-2 element-2" role="treeitem"><a href="/success-stories/category/business/" title="">Business</a></li>\n    \n        <li 
class="tier-2 element-3" role="treeitem"><a href="/success-stories/category/education/" title="">Education</a></li>\n    \n        <li class="tier-2 element-4" role="treeitem"><a href="/success-stories/category/engineering/" title="">Engineering</a></li>\n    \n        <li class="tier-2 element-5" role="treeitem"><a href="/success-stories/category/government/" title="">Government</a></li>\n    \n        <li class="tier-2 element-6" role="treeitem"><a href="/success-stories/category/scientific/" title="">Scientific</a></li>\n    \n        <li class="tier-2 element-7" role="treeitem"><a href="/success-stories/category/software-development/" title="">Software Development</a></li>\n    \n</ul>\n\n        \n    </li>\n    \n    \n    \n    <li id="news" class="tier-1 element-6  " aria-haspopup="true">\n        <a href="/blogs/" title="News from around the Python world" class="">News</a>\n        \n            \n\n<ul class="subnav menu" role="menu" aria-hidden="true">\n    \n        <li class="tier-2 element-1" role="treeitem"><a href="/blogs/" title="Python Insider Blog Posts">Python News</a></li>\n    \n        <li class="tier-2 element-2" role="treeitem"><a href="/psf/newsletter/" title="Python Software Foundation Newsletter">PSF Newsletter</a></li>\n    \n        <li class="tier-2 element-3" role="treeitem"><a href="http://planetpython.org/" title="Planet Python">Community News</a></li>\n    \n        <li class="tier-2 element-4" role="treeitem"><a href="http://pyfound.blogspot.com/" title="PSF Blog">PSF News</a></li>\n    \n        <li class="tier-2 element-5" role="treeitem"><a href="http://pycon.blogspot.com/" title="PyCon Blog">PyCon News</a></li>\n    \n</ul>\n\n        \n    </li>\n    \n    \n    \n    <li id="events" class="tier-1 element-7  " aria-haspopup="true">\n        <a href="/events/" title="" class="">Events</a>\n        \n            \n\n<ul class="subnav menu" role="menu" aria-hidden="true">\n    \n        <li class="tier-2 element-1" 
role="treeitem"><a href="/events/python-events" title="">Python Events</a></li>\n    \n        <li class="tier-2 element-2" role="treeitem"><a href="/events/python-user-group/" title="">User Group Events</a></li>\n    \n        <li class="tier-2 element-3" role="treeitem"><a href="/events/python-events/past/" title="">Python Events Archive</a></li>\n    \n        <li class="tier-2 element-4" role="treeitem"><a href="/events/python-user-group/past/" title="">User Group Events Archive</a></li>\n    \n        <li class="tier-2 element-5" role="treeitem"><a href="https://wiki.python.org/moin/PythonEventsCalendar#Submitting_an_Event" title="">Submit an Event</a></li>\n    \n</ul>\n\n        \n    </li>\n    \n    \n    \n    \n  \n</ul>\n\n                    \n                </nav>\n\n                <div class="header-banner "> <!-- for optional "do-not-print" class -->\n                    \n                    \n                </div>\n\n                \n                \n\n             </div><!-- end .container -->\n        </header>\n\n        <div id="content" class="content-wrapper">\n            <!-- Main Content Column -->\n            <div class="container">\n\n                <section class="main-content with-left-sidebar" role="main">\n\n                    \n<ul class="breadcrumbs menu">\n    <li>\n        <a href="/" title="The Python Programming Language">Python</a><span class="prompt">&gt;&gt;&gt;</span>\n    </li>\n    <li>\n        <a href="/dev/">Python Developer\'s Guide</a><span class="prompt">&gt;&gt;&gt;</span>\n    </li>\n    <li>\n        <a href="/dev/peps/">PEP Index</a><span class="prompt">&gt;&gt;&gt;</span>\n    </li>\n    <li>PEP 20 -- The Zen of Python</li>\n</ul>\n\n\n                    \n\n                    \n<style>\n   .pep-page pre {\n        padding: .5em;\n        background: inherit;\n        border-left: 0px;\n        -webkit-box-shadow: 0 0 0 0;\n        -moz-box-shadow: 0 0 0 0;\n        box-shadow: 0 0 0 0;\n   }\n   
.pep-page pre.literal-block {\n       background-color: #e6e8ea;\n       border: 1px solid #ddd;\n       padding: 1em;\n       -webkit-box-shadow: 0 0 1em rgba( 0, 0, 0, 0.2 );\n       -moz-box-shadow: 0 0 1em rgba( 0, 0, 0, 0.2 );\n       box-shadow: 0 0 1em rgba( 0, 0, 0, 0.2 );\n   }\n</style>\n\n    \n    <article class="text">\n        \n\n        <header class="article-header">\n            <h1 class="page-title">PEP 20 -- The Zen of Python</h1>\n        </header>\n\n        <!--\nThis HTML is auto-generated.  DO NOT EDIT THIS FILE!  If you are writing a new\nPEP, see http://www.python.org/dev/peps/pep-0001 for instructions and links\nto templates.  DO NOT USE THIS HTML FILE AS YOUR TEMPLATE!\n--><table class="rfc2822 docutils field-list" frame="void" rules="none">\n<col class="field-name"/>\n<col class="field-body"/>\n<tbody valign="top">\n<tr class="field"><th class="field-name">PEP:</th><td class="field-body">20</td>\n</tr>\n<tr class="field"><th class="field-name">Title:</th><td class="field-body">The Zen of Python</td>\n</tr>\n<tr class="field"><th class="field-name">Author:</th><td class="field-body">tim.peters at gmail.com (Tim Peters)</td>\n</tr>\n<tr class="field"><th class="field-name">Status:</th><td class="field-body">Active</td>\n</tr>\n<tr class="field"><th class="field-name">Type:</th><td class="field-body">Informational</td>\n</tr>\n<tr class="field"><th class="field-name">Created:</th><td class="field-body">19-Aug-2004</td>\n</tr>\n<tr class="field"><th class="field-name">Post-History:</th><td class="field-body">22-Aug-2004</td>\n</tr>\n</tbody>\n</table>\n<hr/>\n<div class="contents topic" id="contents">\n<p class="topic-title">Contents</p>\n<ul class="simple">\n<li><a class="reference internal" href="#abstract" id="id1">Abstract</a></li>\n<li><a class="reference internal" href="#the-zen-of-python" id="id2">The Zen of Python</a></li>\n<li><a class="reference internal" href="#easter-egg" id="id3">Easter Egg</a></li>\n<li><a class="reference 
internal" href="#references" id="id4">References</a></li>\n<li><a class="reference internal" href="#copyright" id="id5">Copyright</a></li>\n</ul>\n</div>\n<div class="section" id="abstract">\n<h1><a class="toc-backref" href="#id1">Abstract</a></h1>\n<p>Long time Pythoneer Tim Peters succinctly channels the BDFL\'s guiding\nprinciples for Python\'s design into 20 aphorisms, only 19 of which\nhave been written down.</p>\n</div>\n<div class="section" id="the-zen-of-python">\n<h1><a class="toc-backref" href="#id2">The Zen of Python</a></h1>\n<pre class="literal-block">\nBeautiful is better than ugly.\nExplicit is better than implicit.\nSimple is better than complex.\nComplex is better than complicated.\nFlat is better than nested.\nSparse is better than dense.\nReadability counts.\nSpecial cases aren\'t special enough to break the rules.\nAlthough practicality beats purity.\nErrors should never pass silently.\nUnless explicitly silenced.\nIn the face of ambiguity, refuse the temptation to guess.\nThere should be one-- and preferably only one --obvious way to do it.\nAlthough that way may not be obvious at first unless you\'re Dutch.\nNow is better than never.\nAlthough never is often better than *right* now.\nIf the implementation is hard to explain, it\'s a bad idea.\nIf the implementation is easy to explain, it may be a good idea.\nNamespaces are one honking great idea -- let\'s do more of those!\n</pre>\n</div>\n<div class="section" id="easter-egg">\n<h1><a class="toc-backref" href="#id3">Easter Egg</a></h1>\n<pre class="literal-block">\n&gt;&gt;&gt; import this\n</pre>\n</div>\n<div class="section" id="references">\n<h1><a class="toc-backref" href="#id4">References</a></h1>\n<ul class="simple">\n<li>Originally posted to <a class="reference external" href="mailto:comp.lang.python/[email protected]">comp.lang.python/[email protected]</a> under a\nthread called "The Way of Python"\n<a class="reference external" 
href="https://groups.google.com/d/msg/comp.lang.python/B_VxeTBClM0/L8W9KlsiriUJ">https://groups.google.com/d/msg/comp.lang.python/B_VxeTBClM0/L8W9KlsiriUJ</a></li>\n</ul>\n</div>\n<div class="section" id="copyright">\n<h1><a class="toc-backref" href="#id5">Copyright</a></h1>\n<p>This document has been placed in the public domain.</p>\n<!-- Local Variables:\nmode: indented-text\nindent-tabs-mode: nil\nsentence-end-double-space: t\nfill-column: 70\nEnd: -->\n</div>\nSource: <a href="https://github.com/python/peps/blob/master/pep-0020.txt">https://github.com/python/peps/blob/master/pep-0020.txt</a>\n\n    </article>\n\n\n                </section>\n\n                \n\n\n<aside class="left-sidebar" role="secondary">\n\n    \n\n        <div class="twitter-widget sidebar-widget">\n        <a class="twitter-timeline" data-dnt="true" href="https://twitter.com/ThePSF" data-widget-id="434113224703610882">Tweets by @ThePSF</a>\n        <script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?\'http\':\'https\';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>\n    </div>\n\n    <div class="psf-sidebar-widget sidebar-widget">\n        <h3 class="widget-title">The PSF</h3>\r\n<p>The Python Software Foundation is the organization behind Python. Become a member of the PSF and help advance the software and our mission. 
</p>\n\n    </div>\n\n</aside>\n\n\n\n                \n                \n\n\n            </div><!-- end .container -->\n        </div><!-- end #content .content-wrapper -->\n\n        <!-- Footer and social media list -->\n        <footer id="site-map" class="main-footer" role="contentinfo">\n            <div class="main-footer-links">\n                <div class="container">\n\n                    \n                    <a id="back-to-top-1" class="jump-link" href="#python-network"><span aria-hidden="true" class="icon-arrow-up"><span>&#9650;</span></span> Back to Top</a>\n\n                    \n\n<ul class="sitemap navigation menu do-not-print" role="tree" id="container">\n    \n    <li class="tier-1 element-1">\n        <a href="/about/" >About</a>\n        \n            \n\n<ul class="subnav menu">\n    \n        <li class="tier-2 element-1" role="treeitem"><a href="/about/apps/" title="">Applications</a></li>\n    \n        <li class="tier-2 element-2" role="treeitem"><a href="/about/quotes/" title="">Quotes</a></li>\n    \n        <li class="tier-2 element-3" role="treeitem"><a href="/about/gettingstarted/" title="">Getting Started</a></li>\n    \n        <li class="tier-2 element-4" role="treeitem"><a href="/about/help/" title="">Help</a></li>\n    \n        <li class="tier-2 element-5" role="treeitem"><a href="http://brochure.getpython.info/" title="">Python Brochure</a></li>\n    \n</ul>\n\n        \n    </li>\n    \n    <li class="tier-1 element-2">\n        <a href="/downloads/" >Downloads</a>\n        \n            \n\n<ul class="subnav menu">\n    \n        <li class="tier-2 element-1" role="treeitem"><a href="/downloads/" title="">All releases</a></li>\n    \n        <li class="tier-2 element-2" role="treeitem"><a href="/downloads/source/" title="">Source code</a></li>\n    \n        <li class="tier-2 element-3" role="treeitem"><a href="/downloads/windows/" title="">Windows</a></li>\n    \n        <li class="tier-2 element-4" role="treeitem"><a 
href="/downloads/mac-osx/" title="">Mac OS X</a></li>\n    \n        <li class="tier-2 element-5" role="treeitem"><a href="/download/other/" title="">Other Platforms</a></li>\n    \n        <li class="tier-2 element-6" role="treeitem"><a href="https://docs.python.org/3/license.html" title="">License</a></li>\n    \n        <li class="tier-2 element-7" role="treeitem"><a href="/download/alternatives" title="">Alternative Implementations</a></li>\n    \n</ul>\n\n        \n    </li>\n    \n    <li class="tier-1 element-3">\n        <a href="/doc/" >Documentation</a>\n        \n            \n\n<ul class="subnav menu">\n    \n        <li class="tier-2 element-1" role="treeitem"><a href="/doc/" title="">Docs</a></li>\n    \n        <li class="tier-2 element-2" role="treeitem"><a href="/doc/av" title="">Audio/Visual Talks</a></li>\n    \n        <li class="tier-2 element-3" role="treeitem"><a href="https://wiki.python.org/moin/BeginnersGuide" title="">Beginner&#39;s Guide</a></li>\n    \n        <li class="tier-2 element-4" role="treeitem"><a href="https://devguide.python.org/" title="">Developer&#39;s Guide</a></li>\n    \n        <li class="tier-2 element-5" role="treeitem"><a href="https://docs.python.org/faq/" title="">FAQ</a></li>\n    \n        <li class="tier-2 element-6" role="treeitem"><a href="http://wiki.python.org/moin/Languages" title="">Non-English Docs</a></li>\n    \n        <li class="tier-2 element-7" role="treeitem"><a href="http://python.org/dev/peps/" title="">PEP Index</a></li>\n    \n        <li class="tier-2 element-8" role="treeitem"><a href="https://wiki.python.org/moin/PythonBooks" title="">Python Books</a></li>\n    \n        <li class="tier-2 element-9" role="treeitem"><a href="/doc/essays/" title="">Python Essays</a></li>\n    \n</ul>\n\n        \n    </li>\n    \n    <li class="tier-1 element-4">\n        <a href="/community/" >Community</a>\n        \n            \n\n<ul class="subnav menu">\n    \n        <li class="tier-2 element-1" 
role="treeitem"><a href="/community/survey" title="">Community Survey</a></li>\n    \n        <li class="tier-2 element-2" role="treeitem"><a href="/community/diversity/" title="">Diversity</a></li>\n    \n        <li class="tier-2 element-3" role="treeitem"><a href="/community/lists/" title="">Mailing Lists</a></li>\n    \n        <li class="tier-2 element-4" role="treeitem"><a href="/community/irc/" title="">IRC</a></li>\n    \n        <li class="tier-2 element-5" role="treeitem"><a href="/community/forums/" title="">Forums</a></li>\n    \n        <li class="tier-2 element-6" role="treeitem"><a href="/psf/annual-report/2019/" title="">PSF Annual Impact Report</a></li>\n    \n        <li class="tier-2 element-7" role="treeitem"><a href="/community/workshops/" title="">Python Conferences</a></li>\n    \n        <li class="tier-2 element-8" role="treeitem"><a href="/community/sigs/" title="">Special Interest Groups</a></li>\n    \n        <li class="tier-2 element-9" role="treeitem"><a href="/community/logos/" title="">Python Logo</a></li>\n    \n        <li class="tier-2 element-10" role="treeitem"><a href="https://wiki.python.org/moin/" title="">Python Wiki</a></li>\n    \n        <li class="tier-2 element-11" role="treeitem"><a href="/community/merchandise/" title="">Merchandise</a></li>\n    \n        <li class="tier-2 element-12" role="treeitem"><a href="/community/awards" title="">Community Awards</a></li>\n    \n        <li class="tier-2 element-13" role="treeitem"><a href="/psf/conduct/" title="">Code of Conduct</a></li>\n    \n</ul>\n\n        \n    </li>\n    \n    <li class="tier-1 element-5">\n        <a href="/success-stories/" title="success-stories">Success Stories</a>\n        \n            \n\n<ul class="subnav menu">\n    \n        <li class="tier-2 element-1" role="treeitem"><a href="/success-stories/category/arts/" title="">Arts</a></li>\n    \n        <li class="tier-2 element-2" role="treeitem"><a href="/success-stories/category/business/" 
title="">Business</a></li>\n    \n        <li class="tier-2 element-3" role="treeitem"><a href="/success-stories/category/education/" title="">Education</a></li>\n    \n        <li class="tier-2 element-4" role="treeitem"><a href="/success-stories/category/engineering/" title="">Engineering</a></li>\n    \n        <li class="tier-2 element-5" role="treeitem"><a href="/success-stories/category/government/" title="">Government</a></li>\n    \n        <li class="tier-2 element-6" role="treeitem"><a href="/success-stories/category/scientific/" title="">Scientific</a></li>\n    \n        <li class="tier-2 element-7" role="treeitem"><a href="/success-stories/category/software-development/" title="">Software Development</a></li>\n    \n</ul>\n\n        \n    </li>\n    \n    <li class="tier-1 element-6">\n        <a href="/blogs/" title="News from around the Python world">News</a>\n        \n            \n\n<ul class="subnav menu">\n    \n        <li class="tier-2 element-1" role="treeitem"><a href="/blogs/" title="Python Insider Blog Posts">Python News</a></li>\n    \n        <li class="tier-2 element-2" role="treeitem"><a href="/psf/newsletter/" title="Python Software Foundation Newsletter">PSF Newsletter</a></li>\n    \n        <li class="tier-2 element-3" role="treeitem"><a href="http://planetpython.org/" title="Planet Python">Community News</a></li>\n    \n        <li class="tier-2 element-4" role="treeitem"><a href="http://pyfound.blogspot.com/" title="PSF Blog">PSF News</a></li>\n    \n        <li class="tier-2 element-5" role="treeitem"><a href="http://pycon.blogspot.com/" title="PyCon Blog">PyCon News</a></li>\n    \n</ul>\n\n        \n    </li>\n    \n    <li class="tier-1 element-7">\n        <a href="/events/" >Events</a>\n        \n            \n\n<ul class="subnav menu">\n    \n        <li class="tier-2 element-1" role="treeitem"><a href="/events/python-events" title="">Python Events</a></li>\n    \n        <li class="tier-2 element-2" role="treeitem"><a 
href="/events/python-user-group/" title="">User Group Events</a></li>\n    \n        <li class="tier-2 element-3" role="treeitem"><a href="/events/python-events/past/" title="">Python Events Archive</a></li>\n    \n        <li class="tier-2 element-4" role="treeitem"><a href="/events/python-user-group/past/" title="">User Group Events Archive</a></li>\n    \n        <li class="tier-2 element-5" role="treeitem"><a href="https://wiki.python.org/moin/PythonEventsCalendar#Submitting_an_Event" title="">Submit an Event</a></li>\n    \n</ul>\n\n        \n    </li>\n    \n    <li class="tier-1 element-8">\n        <a href="/dev/" >Contributing</a>\n        \n            \n\n<ul class="subnav menu">\n    \n        <li class="tier-2 element-1" role="treeitem"><a href="https://devguide.python.org/" title="">Developer&#39;s Guide</a></li>\n    \n        <li class="tier-2 element-2" role="treeitem"><a href="https://bugs.python.org/" title="">Issue Tracker</a></li>\n    \n        <li class="tier-2 element-3" role="treeitem"><a href="https://mail.python.org/mailman/listinfo/python-dev" title="">python-dev list</a></li>\n    \n        <li class="tier-2 element-4" role="treeitem"><a href="/dev/core-mentorship/" title="">Core Mentorship</a></li>\n    \n        <li class="tier-2 element-5" role="treeitem"><a href="/dev/security/" title="">Report a Security Issue</a></li>\n    \n</ul>\n\n        \n    </li>\n    \n</ul>\n\n\n                    <a id="back-to-top-2" class="jump-link" href="#python-network"><span aria-hidden="true" class="icon-arrow-up"><span>&#9650;</span></span> Back to Top</a>\n                    \n\n                </div><!-- end .container -->\n            </div> <!-- end .main-footer-links -->\n\n            <div class="site-base">\n                <div class="container">\n                    \n                    <ul class="footer-links navigation menu do-not-print" role="tree">\n                        <li class="tier-1 element-1"><a href="/about/help/">Help 
&amp; <span class="say-no-more">General</span> Contact</a></li>\n                        <li class="tier-1 element-2"><a href="/community/diversity/">Diversity <span class="say-no-more">Initiatives</span></a></li>\n                        <li class="tier-1 element-3"><a href="https://github.com/python/pythondotorg/issues">Submit Website Bug</a></li>\n                        <li class="tier-1 element-4">\n                            <a href="https://status.python.org/">Status <span class="python-status-indicator-default" id="python-status-indicator"></span></a>\n                        </li>\n                    </ul>\n\n                    <div class="copyright">\n                        <p><small>\n                            <span class="pre">Copyright &copy;2001-2020.</span>\n                            &nbsp;<span class="pre"><a href="/psf-landing/">Python Software Foundation</a></span>\n                            &nbsp;<span class="pre"><a href="/about/legal/">Legal Statements</a></span>\n                            &nbsp;<span class="pre"><a href="/privacy/">Privacy Policy</a></span>\n                            &nbsp;<span class="pre"><a href="/psf/sponsorship/sponsors/#heroku">Powered by Heroku</a></span>\n                        </small></p>\n                    </div>\n\n                </div><!-- end .container -->\n            </div><!-- end .site-base -->\n\n        </footer>\n\n    </div><!-- end #touchnav-wrapper -->\n\n    \n    <script src="//ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js"></script>\n    <script>window.jQuery || document.write(\'<script src="/static/js/libs/jquery-1.8.2.min.js"><\\/script>\')</script>\n\n    <script src="/static/js/libs/masonry.pkgd.min.js"></script>\n    <script src="/static/js/libs/html-includes.js"></script>\n\n    <script type="text/javascript" src="/static/js/main-min.037d9037f112.js" charset="utf-8"></script>\n    \n\n    <!--[if lte IE 7]>\n    <script type="text/javascript" 
src="/static/js/plugins/IE8-min.6dc39b5a0bdb.js" charset="utf-8"></script>\n    \n    \n    <![endif]-->\n\n    <!--[if lte IE 8]>\n    <script type="text/javascript" src="/static/js/plugins/getComputedStyle-min.c3860be1d290.js" charset="utf-8"></script>\n    \n    \n    <![endif]-->\n\n    \n\n    \n    \n\n</body>\n</html>\n'

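As you can see, what comes back is essentially the content shown under Elements in the developer tools, only as a string. Next, we use Python's built-in string method find to locate the index of the Zen of Python and slice it out of this string.

By inspecting the site (the shortcut Ctrl+Shift+C jumps straight to the element), we can see that this passage sits in a special container: it is enclosed in a pre tag. We can therefore rely on this feature and use find to pull out the exact content.

The <pre> tag defines preformatted text.
Text enclosed in a <pre> element usually preserves spaces and line breaks, and is rendered in a monospaced font.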
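
The find-based slicing can be sketched on a small sample string without hard-coded offsets. This is only an illustration: the helper name `between` and the sample HTML are assumptions made for this example, not part of the original code.

```python
def between(text, open_prefix, close_tag):
    """Return the text between the first tag starting with open_prefix and close_tag."""
    start = text.find(open_prefix)
    if start == -1:
        return None
    gt = text.find('>', start)          # end of the opening tag
    if gt == -1:
        return None
    end = text.find(close_tag, gt + 1)  # start of the closing tag
    if end == -1:
        return None
    return text[gt + 1:end].strip('\n')

# Hypothetical sample that mimics the structure of the PEP 20 page
sample = (
    '<div><pre class="literal-block">\n'
    'Beautiful is better than ugly.\n'
    'Explicit is better than implicit.\n'
    '</pre></div>'
)
zen = between(sample, '<pre', '</pre>')
print(zen)
```

Searching for the closing `>` of the opening tag avoids counting its length by hand, so the slice still works if the tag's attributes change.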

## Scrape the Zen of Python and save it to a txt file

with open('zon_of_python.txt', 'w') as f:
    f.write(text[text.find('<pre')+28:text.find('</pre>')-1])
print(text[text.find('<pre')+28:text.find('</pre>')-1])
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

The same operation using urllib, which ships with Python:

import urllib.request
url = 'https://www.python.org/dev/peps/pep-0020/'
res = urllib.request.urlopen(url).read().decode('utf-8')
print(res[res.find('<pre')+28:res.find('</pre>')-1])
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

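urllib is part of the Python 3 standard library and covers many basic features, such as requesting data over the network, handling cookies, and setting custom request headers. In terms of the amount of code, though, urllib generally takes more work than Requests, and the result looks less concise.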
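
As a rough illustration of the custom-header feature mentioned above, the sketch below builds a urllib request object with a User-Agent header and inspects it without sending anything. The header value is a made-up placeholder.

```python
import urllib.request

url = 'https://www.python.org/dev/peps/pep-0020/'

# Build the request object; nothing is sent over the network yet.
req = urllib.request.Request(
    url,
    headers={'User-Agent': 'Mozilla/5.0 (example)'},  # placeholder value
)

print(req.get_method())              # a request without a body defaults to GET
print(req.full_url)
# urllib stores header names capitalized as 'User-agent'
print(req.get_header('User-agent'))

# To actually fetch the page one would call:
# html = urllib.request.urlopen(req).read().decode('utf-8')
```

With Requests the same header is passed as `headers=` to `requests.get`, which is part of why the Requests version tends to be shorter.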

4.2 requests.post

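We start with iCIBA (金山詞霸) as the example. Youdao Translate, Baidu Translate, and Google Translate all encrypt their requests; you can try those yourself later.

First, go to the iCIBA home page, http://www.iciba.com/

Then open the “Network” tab in the developer tools and translate a passage, for example the first line we just scraped, “Beautiful is better than ugly.”

After clicking translate, a new entry whose request method is POST appears under Name. Click Preview and you will find the translation result we want in the data.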

For now we need two pieces of information: the User-Agent in Request Headers, and the Form Data.
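
To make the Form Data part concrete, here is a minimal sketch of how a dictionary of form fields is turned into a POST body, using only the standard library. The field names `f`, `t` and `w` are the ones used in the translate code in this section; what each field means to the server is an assumption.

```python
from urllib.parse import urlencode, parse_qs

# Form fields as they would appear under "Form Data" in the Network tab
form = {
    'f': 'auto',   # assumed: source language
    't': 'auto',   # assumed: target language
    'w': 'Beautiful is better than ugly.',
}

# application/x-www-form-urlencoded encoding of the fields
body = urlencode(form)
print(body)

# Decoding the body gives the original fields back
print(parse_qs(body)['w'][0])
```

When a dict is passed as `data=` to `requests.post`, Requests form-encodes it in the same way before sending it.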

Next, we use iCIBA to translate the Zen of Python we just scraped.

import requests
def translate(word):
    url="http://fy.iciba.com/ajax.php?a=fy"

    data={
        'f': 'auto',
        't': 'auto',
        'w': word,
    }
    
    headers={
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
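    }  # The User-Agent tells the web server which tool is making the request; a request that looks like a crawler is often refused, while a regular browser gets a response.
    response = requests.post(url,data=data,headers=headers)     # send the request
    json_data=response.json()   # parse the JSON response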
    #print(json_data)
    return json_data
    
def run(word):    
    result = translate(word)['content']['out']   
    print(result)
    return result

def main():
    with open('zon_of_python.txt') as f:
        zh = [run(word) for word in f]

    with open('zon_of_python_zh-CN.txt', 'w') as g:
        for i in zh:
            g.write(i + '\n')
            
if __name__ == '__main__':
    main()
 美麗勝過醜陋。
 外顯優於內隱..
簡單勝於複雜。
 複雜勝於複雜。
 平比嵌套好..
 疏而不密..
 可讀性計數。
 特殊情況不足以打破規則。
 儘管實用性勝過純度。
 錯誤永遠不應該悄悄地過去。
 除非有明確的沉默。
 面對曖昧,拒絕猜測的誘惑..
 應該有一種----最好只有一種----明顯的辦法來做到這一點。
雖然這種方式一開始可能不明顯,除非你是荷蘭人。
 現在總比永遠好。
 雖然從來沒有比現在更好。
 如果實施很難解釋,那是個壞主意。
 如果實現很容易解釋,這可能是個好主意。
 命名空間是一個偉大的想法-讓我們做更多的這些!
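
The lookup `translate(word)['content']['out']` raises an exception as soon as the response has a different shape, for example when the service returns an error. A minimal defensive sketch follows; the helper name `extract_translation` and both sample payloads are made up for illustration, and the real response may contain other fields.

```python
def extract_translation(payload):
    """Return payload['content']['out'] stripped of whitespace, or None if the shape differs."""
    if not isinstance(payload, dict):
        return None
    content = payload.get('content')
    if not isinstance(content, dict):
        return None
    out = content.get('out')
    return out.strip() if isinstance(out, str) else None

# Hypothetical payloads, not captured from the real service
ok = {'status': 1, 'content': {'out': ' 美麗勝過醜陋。'}}
bad = {'status': 0, 'content': 'error'}

print(extract_translation(ok))
print(extract_translation(bad))
```

Stripping the result also removes the stray leading spaces visible in the output above.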

4.3 requests.get in depth: scraping Douban movies

import requests
import os

if not os.path.exists('image'):
     os.mkdir('image')

def parse_html(url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"}
    res = requests.get(url, headers=headers)
    text = res.text
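    item = []
    # Each Top 250 page lists 25 movies; the code assumes every poster
    # tag has the form alt="name" src="url".
    for i in range(25):
        text = text[text.find('alt')+3:]  # drop everything up to the next 'alt'
        item.append(extract(text))
    return item
       
def extract(text):
    # After 'alt', splitting on '"' puts the name at index 1
    # and the image URL at index 3.
    text = text.split('"')
    name = text[1]
    image = text[3]
    return name, image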

def write_movies_file(item, stars):
    print(item)
    with open('douban_film.txt','a',encoding='utf-8') as f:
        f.write('排名:%d\t電影名:%s\n' % (stars, item[0]))
    r = requests.get(item[1])
    with open('image/' + str(item[0]) + '.jpg', 'wb') as f:
        f.write(r.content)
        
def main():
    stars = 1
    for offset in range(0, 250, 25):
        url = 'https://movie.douban.com/top250?start=' + str(offset) +'&filter='
        for item in parse_html(url):
            write_movies_file(item, stars)
            stars += 1

if __name__ == '__main__':
    main()
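
The `extract` helper above depends on the exact position of quotes after `alt`. As an alternative sketch, a regular expression can pull the same (name, image URL) pairs out of the `img` tags, and a small helper can make movie names safe to use as file names. The pattern, the helper names and the sample HTML are assumptions for illustration; the pattern only matches tags where `alt` comes before `src`.

```python
import re

# Matches <img ... alt="NAME" ... src="URL" ...> within a single tag
IMG_RE = re.compile(r'<img[^>]*\balt="([^"]*)"[^>]*\bsrc="([^"]*)"')

def parse_posters(html):
    """Return a list of (name, image_url) tuples found in html."""
    return IMG_RE.findall(html)

def safe_filename(name):
    """Replace characters that are not allowed in file names on common systems."""
    return re.sub(r'[\\/:*?"<>|]', '_', name)

# Hypothetical markup, loosely modelled on a movie list page
sample = (
    '<img width="100" alt="Movie A" src="https://example.com/a.jpg" class="">'
    '<img width="100" alt="Movie B: Part 2" src="https://example.com/b.jpg" class="">'
)

posters = parse_posters(sample)
print(posters)
print([safe_filename(name) for name, _ in posters])
```

Sanitizing the name matters because a title containing `/` or `:` would otherwise break the `open('image/' + ... + '.jpg', 'wb')` call on some systems.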

('肖申克的救贖', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p480747492.jpg')
('霸王別姬', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2561716440.jpg')
('阿甘正傳', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1484728154.jpg')
('這個殺手不太冷', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p511118051.jpg')
('美麗人生', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2578474613.jpg')
('泰坦尼克號', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p457760035.jpg')
('千與千尋', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2557573348.jpg')
('辛德勒的名單', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p492406163.jpg')
('盜夢空間', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p513344864.jpg')
('忠犬八公的故事', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p524964016.jpg')
('海上鋼琴師', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2574551676.jpg')
('楚門的世界', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p479682972.jpg')
('三傻大鬧寶萊塢', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p579729551.jpg')
('機器人總動員', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1461851991.jpg')
('放牛班的春天', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1910824951.jpg')
('星際穿越', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2206088801.jpg')
('大話西遊之大聖娶親', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2455050536.jpg')
('熔爐', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1363250216.jpg')
('瘋狂動物城', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2315672647.jpg')
('無間道', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2564556863.jpg')
('龍貓', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2540924496.jpg')
('教父', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p616779645.jpg')
('當幸福來敲門', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1312700628.jpg')
('怦然心動', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p501177648.jpg')
('觸不可及', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1454261925.jpg')
('蝙蝠俠:黑暗騎士', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p462657443.jpg')
('控方證人', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1505392928.jpg')
('活着', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2513253791.jpg')
('亂世佳人', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1963126880.jpg')
('尋夢環遊記', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2503997609.jpg')
('末代皇帝', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p452089833.jpg')
('摔跤吧!爸爸', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2457983084.jpg')
('指環王3:王者無敵', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1910825503.jpg')
('少年派的奇幻漂流', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1784592701.jpg')
('何以爲家', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2555295759.jpg')
('飛屋環遊記', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p485887754.jpg')
('十二怒漢', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2173577632.jpg')
('鬼子來了', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2553104888.jpg')
('天空之城', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1446261379.jpg')
('大話西遊之月光寶盒', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2561721372.jpg')
('哈爾的移動城堡', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2174346180.jpg')
('素媛', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2118532944.jpg')
('天堂電影院', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2559577569.jpg')
('羅馬假日', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2189265085.jpg')
('聞香識女人', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2550757929.jpg')
('辯護人', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2158166535.jpg')
('搏擊俱樂部', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1910926158.jpg')
('哈利·波特與魔法石', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2591591494.jpg')
('我不是藥神', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2561305376.jpg')
('死亡詩社', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2575465690.jpg')
('教父2', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2194138787.jpg')
('獅子王', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2277799019.jpg')
('指環王2:雙塔奇兵', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p909265336.jpg')
('竊聽風暴', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1808872109.jpg')
('大鬧天宮', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2184505167.jpg')
('指環王1:魔戒再現', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1354436051.jpg')
('兩杆大煙槍', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p792443418.jpg')
('美麗心靈', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1665997400.jpg')
('飲食男女', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1910899751.jpg')
('飛越瘋人院', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p792238287.jpg')
('貓鼠遊戲', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p453924541.jpg')
('黑客帝國', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p451926968.jpg')
('V字仇殺隊', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1465235231.jpg')
('鋼琴家', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p792376093.jpg')
('本傑明·巴頓奇事', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2192535722.jpg')
('看不見的客人', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2498971355.jpg')
('讓子彈飛', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1512562287.jpg')
('西西里的美麗傳說', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2441988159.jpg')
('海豚灣', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2559579779.jpg')
('小鞋子', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2173580536.jpg')
('拯救大兵瑞恩', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1014542496.jpg')
('情書', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p449897379.jpg')
('穿條紋睡衣的男孩', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1473670352.jpg')
('音樂之聲', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2189265302.jpg')
('美國往事', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p477229647.jpg')
('綠皮書', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2549177902.jpg')
('致命魔術', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p480383375.jpg')
('海蒂和爺爺', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2554525534.jpg')
('低俗小說', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1910902213.jpg')
('七宗罪', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2219586434.jpg')
('沉默的羔羊', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1593414327.jpg')
('蝴蝶效應', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2209066019.jpg')
('春光乍泄', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p465939041.jpg')
('被嫌棄的松子的一生', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p884763596.jpg')
('禁閉島', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1832875827.jpg')
('心靈捕手', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p480965695.jpg')
('布達佩斯大飯店', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2178872593.jpg')
('阿凡達', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2180085848.jpg')
('剪刀手愛德華', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p480956937.jpg')
('勇敢的心', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1374546770.jpg')
('摩登時代', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2263408369.jpg')
('天使愛美麗', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2447590313.jpg')
('喜劇之王', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2579932167.jpg')
('加勒比海盜', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1596085504.jpg')
('致命ID', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2558364386.jpg')
('斷背山', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2154212680.jpg')
('殺人回憶', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2326071698.jpg')
('幽靈公主', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1613191025.jpg')
('狩獵', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1546987967.jpg')
('陽光燦爛的日子', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2564685215.jpg')
('請以你的名字呼喚我', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2505525050.jpg')
('入殮師', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p594972928.jpg')
('哈利·波特與死亡聖器(下)', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p917846733.jpg')
('重慶森林', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p792381411.jpg')
('第六感', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2220184425.jpg')
('小森林 夏秋篇', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2564498893.jpg')
('7號房的禮物', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1816276065.jpg')
('消失的愛人', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2221768894.jpg')
('紅辣椒', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p456825720.jpg')
('愛在黎明破曉前', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2555762374.jpg')
('小森林 冬春篇', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2258078370.jpg')
('瑪麗和馬克思', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2162822165.jpg')
('側耳傾聽', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p456692072.jpg')
('一一', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2567845803.jpg')
('告白', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p689520756.jpg')
('唐伯虎點秋香', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2357915564.jpg')
('大魚', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p692813374.jpg')
('蝙蝠俠:黑暗騎士崛起', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1706428744.jpg')
('陽光姐妹淘', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1374786017.jpg')
('倩女幽魂', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2414157745.jpg')
('超脫', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1305562621.jpg')
('射鵰英雄傳之東成西就', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2535922598.jpg')
('甜蜜蜜', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2223011274.jpg')
('馴龍高手', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2210954024.jpg')
('螢火之森', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1675053073.jpg')
('超能陸戰隊', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2224568669.jpg')
('無人知曉', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p661160053.jpg')
('幸福終點站', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p854757687.jpg')
('菊次郎的夏天', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p751835224.jpg')
('恐怖直播', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2016930906.jpg')
('借東西的小人阿莉埃蒂', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p617533616.jpg')
('愛在日落黃昏時', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1910924221.jpg')
('神偷奶爸', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p792776858.jpg')
('完美的世界', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p792403691.jpg')
('怪獸電力公司', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2513247938.jpg')
('玩具總動員3', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1283675359.jpg')
('風之谷', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1917567652.jpg')
('血戰鋼鋸嶺', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2398141939.jpg')
('傲慢與偏見', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p452005185.jpg')
('上帝之城', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p455677490.jpg')
('功夫', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2219011938.jpg')
('時空戀旅人', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2070153774.jpg')
('教父3', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2169664351.jpg')
('電鋸驚魂', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2565332644.jpg')
('喜宴', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2173713676.jpg')
('諜影重重3', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p792223507.jpg')
('英雄本色', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2504997087.jpg')
('天書奇譚', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2515539487.jpg')
('人生果實', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2544912792.jpg')
('歲月神偷', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p456666151.jpg')
('被解救的姜戈', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1959232369.jpg')
('七武士', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2565471701.jpg')
('哪吒鬧海', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2516566783.jpg')
('我是山姆', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p652417775.jpg')
('瘋狂原始人', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1867084027.jpg')
('縱橫四海', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2272146906.jpg')
('三塊廣告牌', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2510081688.jpg')
('頭號玩家', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2516578307.jpg')
('心迷宮', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2276780256.jpg')
('螢火蟲之墓', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1157334208.jpg')
('真愛至上', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p475600770.jpg')
('達拉斯買傢俱樂部', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2166160837.jpg')
('釜山行', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2360940399.jpg')
('荒蠻故事', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2231250054.jpg')
('東邪西毒', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1982176012.jpg')
('貧民窟的百萬富翁', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2434249040.jpg')
('記憶碎片', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p641688453.jpg')
('爆裂鼓手', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2220776342.jpg')
('黑天鵝', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2549648344.jpg')
('你的名字。', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2395733377.jpg')
('花樣年華', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1910828286.jpg')
('盧旺達飯店', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p470419493.jpg')
('忠犬八公物語', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1576418852.jpg')
('哈利·波特與阿茲卡班的囚徒', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1910812549.jpg')
('頭腦特工隊', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2266293606.jpg')
('黑客帝國3:矩陣革命', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p443461818.jpg')
('模仿遊戲', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2255040492.jpg')
('一個叫歐維的男人決定去死', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2406624993.jpg')
('雨人', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p942376281.jpg')
('你看起來好像很好喫', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p709670262.jpg')
('無敵破壞王', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1735642656.jpg')
('未麻的部屋', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1351050722.jpg')
('戀戀筆記本', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p483604864.jpg')
('冰川時代', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1910895719.jpg')
('哈利·波特與密室', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1082651990.jpg')
('海街日記', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2232247487.jpg')
('新世界', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1903379979.jpg')
('海邊的曼徹斯特', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2421855655.jpg')
('二十二', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2457609817.jpg')
('虎口脫險', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2399597512.jpg')
('房間', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2259715855.jpg')
('恐怖遊輪', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p462470694.jpg')
('驚魂記', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1021883305.jpg')
('人工智能', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p792257137.jpg')
('雨中曲', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1612355875.jpg')
('魔女宅急便', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p456676352.jpg')
('奇蹟男孩', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2507709428.jpg')
('瘋狂的石頭', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p712241453.jpg')
('羅生門', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2564689879.jpg')
('海洋', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2559581324.jpg')
('愛在午夜降臨前', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2074715729.jpg')
('終結者2:審判日', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1910909085.jpg')
('燃情歲月', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1023654037.jpg')
('魂斷藍橋', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2351134499.jpg')
('小偷家族', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2530599636.jpg')
('初戀這件小事', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p767451487.jpg')
('穿越時空的少女', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2079334286.jpg')
('可可西里', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2414771522.jpg')
('綠裏奇蹟', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p767586451.jpg')
('2001太空漫遊', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2560717825.jpg')
('完美陌生人', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2522331945.jpg')
('牯嶺街少年殺人事件', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p848381236.jpg')
('無恥混蛋', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2575043939.jpg')
('阿飛正傳', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2525770523.jpg')
('城市之光', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2170238828.jpg')
('新龍門客棧', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1421018669.jpg')
('源代碼', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p988260245.jpg')
('香水', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2441127736.jpg')
('諜影重重2', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p667644866.jpg')
('青蛇', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p584021784.jpg')
('諜影重重', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1597183981.jpg')
('地球上的星星', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1973489335.jpg')
('戰爭之王', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p792282381.jpg')
('猜火車', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p513567548.jpg')
('血鑽', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1244017073.jpg')
('色,戒', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p453716305.jpg')
('遺願清單', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p708613284.jpg')
('大佛普拉斯', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2505928032.jpg')
('朗讀者', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1140984198.jpg')
('浪潮', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1344888983.jpg')
('步履不停', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2561539680.jpg')
('彗星來的那一夜', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2187896734.jpg')
('瘋狂的麥克斯4:狂暴之路', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2236181653.jpg')
('小蘿莉的猴神大叔', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2510956726.jpg')
('再次出發之紐約遇見你', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2250287733.jpg')
('聚焦', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2263822658.jpg')
('驢得水', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2393044761.jpg')
('東京物語', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1925331564.jpg')
('追隨', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2561545031.jpg')
('一次別離', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2189835254.jpg')
('我愛你', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1075591188.jpg')
('千鈞一髮', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2195672555.jpg')
('黑鷹墜落', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1910900710.jpg')
('九品芝麻官', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p648370300.jpg')
('四個春天', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2540578887.jpg')
('發條橙', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p529908155.jpg')
('網絡謎蹤', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2542848758.jpg')
('E.T. 外星人', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p984732992.jpg')
('哈利·波特與火焰杯', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2220723219.jpg')
('撞車', 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2075132390.jpg')

5. Using APIs

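Collecting data from the web does not necessarily mean scraping it from web pages, and this is where an API (Application Programming Interface) comes in: an API gives developers a convenient, friendly interface, so that developers working in different languages can all retrieve the same data. Today APIs generally return the server response in XML (Extensible Markup Language) or JSON (JavaScript Object Notation) format. JSON is becoming increasingly popular, and later lessons in this course cover the JSON format in detail.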
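As a rough illustration of why JSON is convenient to work with, the sketch below parses a JSON response body with Python's standard json module. The sample_body string is hand-written for illustration: it only mimics the status/result/location shape that the geocoding example in section 5.1 reads, and it is not a recorded server response.

```python
import json

# Hand-written sample (an assumption for illustration, not real server output).
sample_body = '{"status": 0, "result": {"location": {"lng": 116.33, "lat": 40.01}}}'

# json.loads turns the JSON text into ordinary Python dicts, lists, numbers and strings.
data = json.loads(sample_body)

location = data["result"]["location"]
print(type(data).__name__)               # dict
print(location["lat"], location["lng"])  # 40.01 116.33
```

Because the parsed result is made of plain dicts and lists, no API-specific parser is needed; an XML response would instead need a parser such as xml.etree.ElementTree.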

5.1 API usage example

We take the API provided by Baidu Maps as an example. First, open http://lbsyun.baidu.com/apiconsole/key and fill in your information.

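After activating your account by email, click to apply for a key and fill in the information shown in the figure below. Note that the application type should be set to browser-side.

[Figure: key application form]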

After you click submit, you can view your AK (access key). The two small examples below show what the AK is used for:

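First, create an HTML file, for example test.html, copy in the map-display code, and enter your own AK after ak= in that code.

Then open the file in Chrome; the result looks like the figure below:

[Figure: the map rendered in Chrome]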

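Of course, the Baidu Maps API can do much more. Its official documentation at http://lbsyun.baidu.com/index.php?title=jspopular3.0 covers the other features clearly and in detail, but calling the more complex features requires some web-development background. Next, we show how to implement geocoding: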

import requests

def getUrl(*address):
    '''Yield one geocoding request URL per address.'''
    ak = ''  # fill in your API key
    for add in address:
        url = 'http://api.map.baidu.com/geocoding/v3/?address={0}&output=json&ak={1}'.format(add, ak)
        yield url


def getPosition(url):
    '''Return (latitude, longitude) for a geocoding request URL.'''
    res = requests.get(url)
    # Parse the body as JSON; calling eval() on remote text is unsafe
    # and fails on JSON literals such as true, false and null.
    json_data = res.json()

    if json_data['status'] != 0:
        # A non-zero status means the API reported an error.
        raise RuntimeError('Error output! status = {0}'.format(json_data['status']))
    lat = json_data['result']['location']['lat']  # latitude
    lng = json_data['result']['location']['lng']  # longitude
    return lat, lng

if __name__ == "__main__":
    address = ['北京市清華大學','北京市北京大學','保定市華北電力大學','上海市復旦大學','武漢市武漢大學']
    for add in address:
        add_url = list(getUrl(add))[0]
        print('url:', add_url)
        try:
            lat, lng = getPosition(add_url)
            # 經度 = longitude, 緯度 = latitude
            print("{0}|經度:{1}|緯度:{2}.".format(add, lng, lat))
        except Exception as e:
            print(e)

Output:
url: http://api.map.baidu.com/geocoding/v3/?address=北京市清華大學&output=json&ak=ARmmTpkvjEEwVDPgO4w8cddONQnQWY6w
北京市清華大學|經度:116.33337396094367|緯度:40.009645090734296.
url: http://api.map.baidu.com/geocoding/v3/?address=北京市北京大學&output=json&ak=ARmmTpkvjEEwVDPgO4w8cddONQnQWY6w
北京市北京大學|經度:116.31683256328296|緯度:39.99887680537622.
url: http://api.map.baidu.com/geocoding/v3/?address=保定市華北電力大學&output=json&ak=ARmmTpkvjEEwVDPgO4w8cddONQnQWY6w
保定市華北電力大學|經度:115.52130317483764|緯度:38.89477430426888.
url: http://api.map.baidu.com/geocoding/v3/?address=上海市復旦大學&output=json&ak=ARmmTpkvjEEwVDPgO4w8cddONQnQWY6w
上海市復旦大學|經度:121.74295536914276|緯度:31.06665792321301.
url: http://api.map.baidu.com/geocoding/v3/?address=武漢市武漢大學&output=json&ak=ARmmTpkvjEEwVDPgO4w8cddONQnQWY6w
武漢市武漢大學|經度:114.37292090919235|緯度:30.543803317143624.

For more parameters, see the official documentation: http://lbsyun.baidu.com/index.php?title=webapi/guide/webservice-geocoding
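The example above builds the URL with plain string formatting. That works for these addresses, but an address containing a character such as & or # would change the meaning of the query string. The sketch below shows one way to percent-encode the query with the standard library. The helper name build_geocoding_url and the placeholder key 'YOUR_AK' are made up for illustration; the endpoint is the one used in the example above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint copied from the example above; 'YOUR_AK' is a placeholder, not a real key.
GEOCODING_ENDPOINT = 'http://api.map.baidu.com/geocoding/v3/'

def build_geocoding_url(address, ak):
    """Return the request URL with every query value percent-encoded."""
    query = urlencode({'address': address, 'output': 'json', 'ak': ak})
    return GEOCODING_ENDPOINT + '?' + query

# The '&' in this made-up address must not be read as a parameter separator.
url = build_geocoding_url('A&B 大學', 'YOUR_AK')
print(url)

# Decoding the query again recovers the original values.
decoded = parse_qs(urlsplit(url).query)
print(decoded['address'])  # ['A&B 大學']
```

With the requests library, a similar effect can be had by passing the same dict through the params argument of requests.get, which encodes the values when it builds the URL.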

6. JavaScript and AJAX


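  If you use the Requests library and BeautifulSoup to scrape pages from large e-commerce sites, you may notice a puzzling phenomenon: for the same URL and the same page, the content you fetch differs from what you see in the browser. For example, you may look for a particular <div> element only to have your Python program raise an exception, and inspecting the response from requests.get() shows no trace of the element you want. This points to a key issue in web scraping: the HTTP response your program receives is the raw HTML, whereas the page in the browser is the result of JavaScript further processing that HTML. Product reviews on Taobao, for instance, are fetched as JSON by JavaScript and then "embedded" into the original HTML before being shown to the user. Pages that use JavaScript this way would have been almost unimaginable on the web of the 1990s, but today web development that combines JavaScript, CSS, and HTML, typified by AJAX (Asynchronous JavaScript and XML), has become the absolute mainstream.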
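The mismatch described above can be reproduced without any network access. In the sketch below, RAW_HTML is a hand-written stand-in for the raw HTML a server might return (it is not taken from any real site): the comment container is empty, and only a browser running the script would fill it. Parsing that text with the standard library's HTMLParser therefore finds nothing inside the container.

```python
from html.parser import HTMLParser

# Hand-written stand-in for a server response: the comment container is empty,
# and the script would fill it in only when executed inside a browser.
RAW_HTML = """
<html><body>
  <h1>Product page</h1>
  <div id="comments"></div>
  <script>
    document.getElementById('comments').innerHTML = '<p>Great product!</p>';
  </script>
</body></html>
"""

class CommentDivParser(HTMLParser):
    """Collect the text found inside <div id="comments">."""

    def __init__(self):
        super().__init__()
        self.inside = False
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == 'div' and dict(attrs).get('id') == 'comments':
            self.inside = True

    def handle_endtag(self, tag):
        if tag == 'div':
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.text.append(data)

parser = CommentDivParser()
parser.feed(RAW_HTML)
comment_text = ''.join(parser.text).strip()
print(repr(comment_text))  # '' because the script was never executed
```

The review text is present in the response only as part of a script, so a crawler that reads the raw HTML sees an empty container where the browser shows a comment.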

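  To avoid preparing a separate HTML document for every piece of content to be displayed, website developers began to rethink how pages are rendered. Google's Gmail was one of the first products to load page data with JavaScript on a large scale. Before that, to see the next page of results a user had to visit a new address and reload the entire page. Gmail offered a more elegant solution: the user only has to click a "next page" button, and the page (in fact, the browser) loads the next page of data in response, without refreshing the whole page (HTML). In other words, JavaScript lets a page flexibly load just part of its data. Later, as this design caught on, "AJAX" became an established term, and Gmail, as one of the first commercial sites to use this pattern at scale, helped lead the trend known as "Web 2.0".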

6.1 The JavaScript language

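  JavaScript is generally described as an "object-oriented, dynamically typed, interpreted language". It was originally developed by Netscape for the Navigator browser, as the scripting language for a new generation of browsers. In other words, unlike PHP or ASP.NET, JavaScript was designed not for the web server but for the user's browser. From a client-server perspective JavaScript is clearly a client-side language, but thanks to its strong reception in industry and among users, and to an active developer community, it has been moving in a more general-purpose direction. With the arrival of the V8 engine (which speeds up JavaScript execution) and Node.js, JavaScript has even moved onto the server side. In the TIOBE index (a ranking of programming-language popularity), JavaScript sits firmly in the top 10, on a par with PHP, Python, C#, and others.
  It is often said that for any production web page today, HTML determines the basic content, CSS (Cascading Style Sheets) describes the styling and layout, and JavaScript controls the user's interaction with the page.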

6.2 Characteristics of JavaScript

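  • Dynamic language

    A dynamic language is one whose program structure can change at run time: new functions can be introduced, existing functions can be removed, and so on. JavaScript is such a language, as are Ruby and Python, whereas C and C++ are not. In JavaScript, for example, you can add properties and methods to an object after it has been defined.

  • Scripting language

    A scripting language is a programming language created to shorten the traditional edit-compile-link-run cycle; code is interpreted or compiled only when it is invoked, and then executed. The name comes from the script of a screenplay, whose dialogue is repeated word for word at every performance. Early scripting languages were often called batch languages or job control languages.

  • Weak typing

    Weak versus strong typing refers to how strictly a language's type system checks types. In a weakly typed language the runtime converts between data types implicitly, so values of different types can be combined directly in an operation, whereas a strongly typed language does not allow this. JavaScript also does not require a variable's type to be fixed when it is declared, which strictly speaking is dynamic typing rather than weak typing.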
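Since the list above names Python as another dynamic language, the two properties can be tried out there. The sketch below is only an illustration in Python, not JavaScript: the Animal class and its attributes are made up for the example. It shows structure being changed at run time, and then shows that Python, unlike JavaScript, refuses to mix an integer and a string implicitly.

```python
class Animal:
    def __init__(self, name):
        self.name = name

bob = Animal("Bob")

# Dynamic: attributes and methods can be attached after the class is defined.
bob.species = "Tiger"
Animal.describe = lambda self: "{0} the {1}".format(self.name, self.species)
description = bob.describe()
print(description)  # Bob the Tiger

# Python is dynamically but strongly typed: mixing int and str is an error,
# whereas JavaScript would coerce 1 + "1" to the string "11".
try:
    result = 1 + "1"
except TypeError:
    result = "TypeError"
print(result)  # TypeError
```

So "dynamic" and "weak" are separate properties: Python and JavaScript are both dynamic, but only JavaScript performs the implicit conversion.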

6.2.1 How to use JavaScript

1. Write the code directly inside <script></script>

The result is as follows:

[Figure: result rendered in the browser]

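2. Link an external JS file with <script src='URL of the target file'></script>

The external file contains:

document.write('hello');

The result is as follows:

[Figure: result rendered in the browser]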

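3. As an event attribute or href attribute value

Use the code as the value of an element's event attribute or of a hyperlink's href attribute:

<body>
    <a href="javascript:confirm('Need more people?')">Sign up</a>
    <p onclick="alert('hello world')">
        click me
    </p>
</body>

The result is as follows:

[Figure: result rendered in the browser]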

6.2.2 Basic JavaScript syntax

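  • Execution order: scripts run in the order in which they appear in the HTML file

  • Case sensitivity: JavaScript is strictly case-sensitive

  • Whitespace and line breaks are ignored

  • Statement separator: end each statement with ; . Several statements can be written on one line, and the semicolon after the last statement may be omitted, though it is better not to omit it. Use {} to group statements into a block

  • Use \ to break a long string literal across lines:

    document.write('hello\
    world');

  • Comments: // for a single-line comment, /* comment text */ for a multi-line comment

  • Reserved words in JavaScript: abstract,else,instanceof,super,boolean,enum,int,switch,break,export,interface,synchronized,byte,extends,let,this,case,false,long,throw,catch,final,native,throws,char,finally,new,transient,class,float,null,true,const,for,package,try,continue,function,private,typeof,debugger,goto,protected,var,default,if,public,void,delete,implements,return,volatile,do,import,short,while,double,in,static,with

  • Use document.write() to write content to the document

  • Use console.log() to write content to the console

  • Syntax errors: debug them through the console

  • Logic errors: debug them with alert()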

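Note: The name JavaScript leads many people to associate it with Java and to assume it is some kind of derivative of Java. In fact, JavaScript's design was influenced more by Scheme (a functional programming language) and C; apart from details such as variable types and naming conventions, JavaScript has little to do with Java. Netscape originally named it "LiveScript", but the company was working with Sun at the time and Java was enjoying enormous success, so to ride that wave the name was changed to "JavaScript". JavaScript was well received by the industry after its release, and support for it became a basic requirement for the modern browsers that appeared in the 21st century. Other browser-side scripting languages include ActionScript, used for Flash animation.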

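  To use JavaScript in a web page, developers generally put the script inside the HTML <script> tag. In HTML, the <script> tag defines a client-side script; to reference an external script file, set its address in the src attribute, as shown in the figure below.

[Figure: a <script> tag in an HTML page]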

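  JavaScript's syntax resembles that of object-oriented languages such as C++, so its loops and conditionals look quite different from Python's, but its dynamic typing, with no type declarations on variables, will feel familiar to Python developers.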

  A simple JavaScript program that computes a+b and a*b looks like this:
// JavaScript example: compute a+b and a*b.
function add(a, b)
{
    var sum = a + b;
    console.log('%d + %d equals to %d', a, b, sum);
}
function mut(a, b)
{
    var prod = a * b;
    console.log('%d * %d equals to %d', a, b, prod);
}
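  Using the "Console" tool in Chrome's developer mode, enter and run these functions to see the corresponding output in the Console, as in the figure below.

[Figure: output of add and mut in the Chrome Console]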

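  The following example demonstrates JavaScript's basic concepts and syntax.
// JavaScript program demonstrating the basics of the language.

var a = 1; // variables are declared here with the var keyword
var myFunction = function(arg1){ // note this assignment: in JavaScript a function is a value like any other
    arg1 += 1;
    return arg1;
}

var myAnotherFunction = function(f, a){ // a function can also be passed as an argument to another function
    return f(a);
}
console.log(myAnotherFunction(myFunction, 2)) // prints 3

// conditional statement
if (a > 0){
    a -= 1;
}else if (a == 0){
    a -= 2;
}else{
    a += 2;
}

// array
var arr = [1, 2, 3];
console.log(arr[1]); // prints 2

// object
var myAnimal = {
    name: "Bob",
    species: "Tiger",
    gender: "Male",
    isAlive: true,
    isMammal: true,
}
console.log(myAnimal.gender); // access a property of the object

// anonymous function
var myFunctionOp = function(f, a){
    return f(a);
}

var res = myFunctionOp( // write a function directly in the argument position
    function(a){
        return a * 2;
    },
    4)
// this is comparable to a lambda expression
console.log(res); // the result is 8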
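For readers coming from Python, the same ideas can be written in Python for comparison. This is only a rough side-by-side sketch: the snake_case names are Python renderings of the names in the JavaScript listing, and a dict stands in for the JavaScript object literal.

```python
def my_function(arg1):
    arg1 += 1
    return arg1

def my_another_function(f, a):
    # Functions are ordinary values, so one can be passed to another.
    return f(a)

first = my_another_function(my_function, 2)
print(first)  # 3

# A dict plays the role of the JavaScript object literal.
my_animal = {"name": "Bob", "species": "Tiger", "gender": "Male",
             "isAlive": True, "isMammal": True}
print(my_animal["gender"])  # Male

# A lambda is the closest Python analogue of the anonymous function.
res = my_another_function(lambda a: a * 2, 4)
print(res)  # 8
```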
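  Beyond knowing JavaScript's syntax, analysing and scraping web pages well also calls for a basic familiarity with the popular third-party JavaScript libraries. Libraries such as jQuery, Prototype, and React generally provide a rich set of functions and well-designed usage patterns.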

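  To use jQuery, visit http://jquery.com/download/ , download the jQuery source locally, and then reference the downloaded file from your HTML through the src attribute of a <script> tag.

jQuery tutorial: https://www.runoob.com/jquery/jquery-tutorial.html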

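  Alternatively, you can avoid keeping the .js file locally by loading it from a CDN. Large internet companies such as Google, Baidu, and Sina offer CDN copies of common JavaScript libraries. When a page uses a CDN, a user's request for the file is answered by the CDN server closest to that user, which can improve loading speed to some extent.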

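Note: Anyone who has written web pages is probably familiar with the term CDN. CDN stands for Content Delivery Network, and is commonly used to host code that people share. Google's API servers, for example, provide a CDN hosting JavaScript libraries such as jQuery. This is a fairly narrow sense of the term; CDNs are used for much more than serving JavaScript files.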

6.3 AJAX

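  AJAX is less a "technology" than an "approach". As described above, any process in which a page uses JavaScript to load data into the page can be regarded as AJAX. AJAX changed the old model in which each request corresponded to one page: it lets the browser fetch data through asynchronous requests, so a single page can present and hold more content, and therefore offer more functionality. As long as users have a mainstream browser and allow it to run JavaScript, they can use the AJAX content in a page.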

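  As AJAX grew popular it also drew criticism. Because JavaScript runs in the browser as a client-side scripting language, browser compatibility is a problem that cannot be ignored. In addition, because JavaScript moves part of the business logic out of the server (where it used to be implemented in one place), code maintenance can become less efficient. Overall, though, AJAX has become a mainstay of modern web technology and is widely welcomed. It is now used so widely that ordinary users often do not even notice that a page is using it.
  Take Zhihu's home-page feed as an example. The main interaction is that the user scrolls down the page (with the mouse wheel, by dragging the scroll bar, and so on) to see more activity. Once one batch of activity (on Zhihu this includes upvotes and answers from followed users) has been shown, a loading animation appears and the next batch is displayed. The animation is really just a diversion: in the meantime a JavaScript script has asked the server to send the relevant data and has loaded it into the page. The page is clearly not fully reloaded; only part of it is refreshed. Fetching and displaying new content through this kind of asynchronous loading is a typical application of AJAX.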

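  The awkward part is that a crawler generally cannot run the JavaScript embedded in a page, including functions such as "load more content" or "go to the next page". As noted at the start of this section, a crawler fetches the site's raw HTML, and since it cannot execute JavaScript the way a browser does, it does not run the page's scripts. As a result, what the crawler retrieves differs from what the browser displays, and often the key information cannot be obtained directly. To get out of this bind, a Python crawler can be improved in two ways. One is to analyse the AJAX traffic (which requires manual observation and experiment by the developer), note the request target, request content, request parameters, and so on, and then write a program that imitates those JavaScript requests to obtain the data (a process that can also be called "reverse engineering"). The other is more of a shortcut: simulate a browser environment directly, so that the program, via a browser automation tool, obtains the information from the page after the browser has rendered it. Which to choose depends on how JavaScript is actually used in the page.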
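The first approach can be sketched roughly as follows. Everything site-specific here is hypothetical: the endpoint https://example.com/api/feed, the offset and limit parameters, and the {"data": [{"title": ...}]} response shape are invented for illustration, and on a real site they would have to be read from the Network panel of the browser's developer tools. The sketch builds the request without sending it and parses a hand-written sample body, so it runs offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and parameter names, for illustration only.
FEED_ENDPOINT = "https://example.com/api/feed"

def build_feed_request(offset, limit=10):
    """Build (but do not send) the request the page's JavaScript would issue."""
    url = FEED_ENDPOINT + "?" + urlencode({"offset": offset, "limit": limit})
    headers = {
        "User-Agent": "Mozilla/5.0",
        # Many AJAX calls carry this header; whether a site checks it is an assumption.
        "X-Requested-With": "XMLHttpRequest",
    }
    return Request(url, headers=headers)

def parse_feed(body):
    """Extract item titles from a JSON body shaped like {"data": [{"title": ...}]}."""
    return [item["title"] for item in json.loads(body)["data"]]

request = build_feed_request(offset=20)
print(request.full_url)  # https://example.com/api/feed?offset=20&limit=10

# Hand-written sample body standing in for the server's JSON response.
sample_body = '{"data": [{"title": "first"}, {"title": "second"}]}'
titles = parse_feed(sample_body)
print(titles)  # ['first', 'second']
```

To fetch real data, the request object could be passed to urllib.request.urlopen, or the same URL and headers given to requests.get. Increasing offset on each call imitates what the page does when the user scrolls down.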
