Day 1

Python Basics

Day 1 covers introductory Python: data types, loops, and an overview of artificial intelligence.

1. Basic operations

age = 20                # declare a variable age that stores the number 20
1 + 1                   # basic addition
print('Hello World!')   # print Hello World!

2. Conditionals: if

if 1 == 2:  # the condition after if is false, so the if body is skipped and Python looks for else
    print("1 == 2 is true")
else:       # else is found, so its body runs
    print("1 == 2 is false")

3. Loops: for

for i in range(5):
    print(i)

3. Loops: while

total = 0           # named total so the built-in sum() stays usable later
n = 99
while n > 0:
    total = total + n
    n = n - 1
print(total)

4. break, continue, pass

The break statement exits the enclosing for or while loop.

n = 1
while n <= 100:
    if n > 10:
        break
    print(n)
    n += 1

The continue statement skips the rest of the current iteration and moves straight on to the next one.

n = 1
while n < 10:
    n = n + 1
    if n % 2 == 0:
        continue
    print(n)

pass is an empty statement that does nothing; it is usually used as a placeholder.

for letter in 'Room':
    if letter == 'o':
        pass
        print('pass')
    print(letter)

5. Data types: Number

Python supports three numeric types: int, float, and complex.

a = 3
b = 3.14
c = 3 + 4j
print(type(a), type(b), type(c))

5. Data types: String

Strings support concatenation, slicing, and many other operations.

a = "Hello"
b = "Python"
print("a + b outputs:", a + b)
print("a[1:4] outputs:", a[1:4])

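5. Data types: Tuple

A tuple is similar to a list, except that its elements cannot be reassigned. Tuples are written in parentheses with the elements separated by commas. The tuple itself is immutable, but it may contain mutable objects such as a list.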

t1 = ('abcd', 786 , 2.23, 'runoob', 70.2)
t2 = (1, )
t3 = ('a', 'b', ['A', 'B'])
t3[2][0] = 'X'
print(t3)

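5. Data types: dict

A dictionary stores key-value pairs and offers very fast lookups by key. Since Python 3.7 a dict preserves insertion order, but its items are accessed by key, not by position.
Keys must be of an immutable type.
Keys must be unique within one dictionary.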

d = {'Michael': 95, 'Bob': 75, 'Tracy': 85}
print(d['Michael'])

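5. Data types: set

A set is similar to a dict: it is a collection of keys, but it stores no values. Because keys cannot repeat, a set contains no duplicates.
A set is unordered, and duplicate elements are filtered out automatically.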

s = set([1, 1, 2, 2, 3, 3])
print(s)
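
As a small illustration of the points above, the sketch below removes duplicates from a list and tries the usual set operations. The variable names are made up for this example.

```python
# Illustrative sketch; the variable names are made up for this example.
scores = [1, 1, 2, 2, 3, 3]
unique = set(scores)          # duplicates are dropped
print(unique)

a = {1, 2, 3}
b = {3, 4}
print(a | b)                  # union
print(a & b)                  # intersection
print(a - b)                  # difference
print(2 in a)                 # membership test

a.add(3)                      # adding an existing element changes nothing
print(len(a))
```

Because a set is unordered, do not rely on the order in which its elements are printed.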

Day 1 Homework

Answer to Homework 1 (the 9×9 multiplication table):

for x in range(1,10):
    for y in range(1,x+1):
        print("%s*%s=%s" % (y,x,x*y),end=" ")
    print("")  # print ends with a newline by default; without this statement every row would end up on one line

Output

1*1=1 
1*2=2 2*2=4 
1*3=3 2*3=6 3*3=9 
1*4=4 2*4=8 3*4=12 4*4=16 
1*5=5 2*5=10 3*5=15 4*5=20 5*5=25 
1*6=6 2*6=12 3*6=18 4*6=24 5*6=30 6*6=36 
1*7=7 2*7=14 3*7=21 4*7=28 5*7=35 6*7=42 7*7=49 
1*8=8 2*8=16 3*8=24 4*8=32 5*8=40 6*8=48 7*8=56 8*8=64 
1*9=9 2*9=18 3*9=27 4*9=36 5*9=45 6*9=54 7*9=63 8*9=72 9*9=81

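Homework 2: find files with a specific name

Walk through the files under the "Day1-homework" directory;

find the files whose names contain "2020";

save the file names in the list result;

print the index and file name of each match on its own line.

Note: the submitted homework must include the output of running the code.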

# import the os module
import os

# directory to search
path = "Day1-homework"
# substring to look for in file names
filename = "2020"
# list that stores the results
result = []


def findfiles(path):
    # walk the directory recursively and record every file whose name contains `filename`
    for entry in os.listdir(path):
        full_path = os.path.join(path, entry)
        if os.path.isdir(full_path):
            findfiles(full_path)
        elif os.path.isfile(full_path) and filename in entry:
            result.append(full_path)


if __name__ == '__main__':
    findfiles(path)
    # print the index and file name, one match per line
    for index, name in enumerate(result, start=1):
        print([index, name])

Output

[1, 'Day1-homework/4/22/04:22:2020.txt']
[2, 'Day1-homework/26/26/new2020.txt']
[3, 'Day1-homework/18/182020.doc']

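Day 2

Advanced Python

1. Python data structures

Numbers

The Python Number types store numeric values: integers (int), floating-point numbers (float), and complex numbers (complex). Python 3 has no separate long type; int has arbitrary precision.

(1) The math module: most of the common mathematical functions in Python live in the math module.

(2) Random numbers: import random first, then call random.random() to get a random float in the range [0, 1).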
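
A minimal sketch of the two modules mentioned above. The fixed seed is an assumption added only so that the run is repeatable; it is not part of the original notes.

```python
import math
import random

print(math.sqrt(16))        # square root, returned as a float
print(math.floor(3.7))      # round down
print(math.ceil(3.2))       # round up
print(math.pi)

random.seed(0)              # fixed seed (an assumption for this sketch) so the run is repeatable
r = random.random()         # float in [0, 1)
n = random.randint(1, 20)   # integer in [1, 20], both ends included
print(r, n)
```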

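Strings

String concatenation: +
String repetition: *
Get a character by index: []
Test whether a string contains a given substring: in, not in
String slicing: [:] Remember that the start index is included and the end index is excluded.
join(): uses the string it is called on as a separator and joins all elements of a sequence into one new string.
Triple quotes save you from the tangle of quotes and escape characters: the text keeps its layout exactly as written, in what is known as WYSIWYG (what you see is what you get) form.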
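
The operations listed above can be tried in one short sketch. It reuses the strings a and b from Day 1; the variable text is made up for this example.

```python
a = "Hello"
b = "Python"

print(a + b)              # concatenation: HelloPython
print(a * 2)              # repetition: HelloHello
print(a[1])               # indexing: e
print("H" in a)           # membership: True
print("z" not in a)       # True
print(a[1:4])             # slice from index 1 up to, but not including, 4: ell
print("-".join([a, b]))   # join with "-" as the separator: Hello-Python

text = """line one
line two"""               # triple quotes keep the line break as written
print(text)
```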

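Lists

Purpose: similar to arrays in other languages

Declare a list

names = ['jack','tom','tonney','superman','jay']

Access elements by index

print(names[0])
print(names[1])

Get the last element

print(names[-1])
print(names[len(names)-1])

Get the first element

print(names[-5])

Iterate over the list and read each element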

for name in names:
    print(name)

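Check whether names contains superman

for name in names:
    if name == 'superman':
        print('superman found')
        break
else:
    # the else of a for loop runs only when the loop finishes without break
    print('superman not found')

A simpler way to check whether names contains superman

if 'superman' in names:
    print('superman found')
else:
    print('superman not found')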

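Adding list elements

Declare an empty list

girls = []

append(): add to the end

girls.append('楊超越')
print(girls)

extend(): add several elements at once by appending another list, i.e. merging the lists.

models = ['劉雯','奚夢瑤']
girls.extend(models)
#girls = girls + models
print(girls)

insert(): add at a given position

girls.insert(1,'虞書欣')
print(girls)

Modifying list elements: find the element by index, then assign with =

fruits = ['apple','pear','香蕉','pineapple','草莓']
print(fruits)
fruits[-1] = 'strawberry'
print(fruits)

Replace '香蕉' in the fruits list with 'banana'

# This does not work: fruit is only a loop variable, so assigning to it leaves the list unchanged
for fruit in fruits:
    if '香蕉' in fruit:
        fruit = 'banana'
print(fruits)

# This works: assign through the index
for i in range(len(fruits)):
    if '香蕉' in fruits[i]:
        fruits[i] = 'banana'
        break
print(fruits)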

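Removing list elements

words = ['cat','hello','pen','pencil','ruler']
del words[1]            # delete by index
print(words)
words = ['cat','hello','pen','pencil','ruler']
words.remove('cat')     # delete the first matching value
print(words)
words = ['cat','hello','pen','pencil','ruler']
words.pop(1)            # delete by index and return the removed element
print(words)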

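List slicing

Working with part of a list in Python is called slicing.

To create a slice, give the index of the first element and the index at which to stop. Note: the start index is included and the stop index is excluded.

The result of a slice is stored in a new list, so slicing a list returns a list.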

animals = ['cat','dog','tiger','snake','mouse','bird']
print(animals[2:5])
print(animals[-1:])
print(animals[-3:-1])
print(animals[-5:-1:2])
print(animals[::2])
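
The sketch below spells out two details of slicing that are easy to miss: the stop index is excluded, and a slice is a new list that is independent of the original. The variable part is made up for this example.

```python
animals = ['cat', 'dog', 'tiger', 'snake', 'mouse', 'bird']

part = animals[2:5]       # indices 2, 3, 4; index 5 is excluded
print(part)               # ['tiger', 'snake', 'mouse']

part[0] = 'lion'          # changing the slice does not touch the original list
print(animals[2])         # tiger

print(animals[4:100])     # an out-of-range stop is clamped: ['mouse', 'bird']
print(animals[::-1])      # a negative step walks the list backwards
```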

Sorting lists

Generate 10 distinct random integers and store them in a list

import  random
random_list = []
for i in range(10):
    ran = random.randint(1,20)
    if ran not in  random_list:
        random_list.append(ran)
print(random_list)

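Is there a problem with the code above? Whenever randint returns a number that is already in the list, that iteration adds nothing, so the list can end up with fewer than 10 elements. The version below counts only successful insertions: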

import  random

random_list = []
i = 0
while i < 10:
    ran = random.randint(1,20)
    if ran not in  random_list:
        random_list.append(ran)
        i+=1
print(random_list)

Ascending order is the default

new_list = sorted(random_list)
print(new_list)

Descending order

new_list = sorted(random_list,reverse =True)
print(new_list)
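
As an alternative to the loops above, the standard library can draw distinct values in one call. This is a sketch using random.sample, which the original notes do not use; it also shows list.sort(), which sorts in place instead of returning a new list like sorted().

```python
import random

# 10 distinct integers drawn from 1..20 (range(1, 21) excludes 21)
random_list = random.sample(range(1, 21), 10)
print(random_list)

random_list.sort()                 # sorts in place, ascending
print(random_list)

random_list.sort(reverse=True)     # sorts in place, descending
print(random_list)
```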

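Tuples

Similar to lists, except that the contents of a tuple cannot be modified

tuple1 = ()
print(type(tuple1))
tuple2 = ('hello')
print(type(tuple2))    # <class 'str'>: parentheses alone do not make a tuple

Note: a tuple with a single element needs a trailing comma!

tuple3 = ('hello',)
print(type(tuple3))

A tuple cannot be modified, so there is no way to add elements to one.
How, then, do we put elements into a tuple used as a container? Build a list first and convert it with tuple().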

import random
random_list = []
for i in range(10):
    ran = random.randint(1,20)
    random_list.append(ran)
print(random_list)
random_tuple = tuple(random_list)
print(random_tuple)

Accessing tuples

print(random_tuple)
print(random_tuple[0])
print(random_tuple[-1])
print(random_tuple[1:-3])
print(random_tuple[::-1])

"Modifying" tuples: a tuple cannot be changed in place, but + and * build new tuples.

t1 = (1,2,3)+(4,5)
print(t1)
t2 = (1,2) * 2
print(t2)

Some tuple functions:

print(max(random_tuple))
print(min(random_tuple))
print(sum(random_tuple))
print(len(random_tuple))

Count how many times 4 appears in the tuple

print(random_tuple.count(4))

Index of 4 in the tuple; raises ValueError if 4 is not present

print(random_tuple.index(4))

Check whether 4 is in the tuple

print(4 in random_tuple)

Return the index of 4 in the tuple without raising an error

if 4 in random_tuple:
    print(random_tuple.index(4))

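Unpacking and packing tuples

Define a tuple

t3 = (1,2,3)

Assign the tuple to the variables a, b, c

a,b,c = t3

Print a, b, c

print(a,b,c)

When the number of elements differs from the number of variables

Define a tuple with 5 elements

t4 = (1,2,3,4,5)

Assign t4[0] and t4[1] to a and b; the remaining elements are packed into a list and assigned to c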

a,b,*c = t4
print(a,b,c)
print(c)
print(*c)

Dictionaries

# define an empty dictionary
dict1 = {}
dict2 = {'name':'楊超越','weight':45,'age':25}
print(dict2['name'])
# a list can be converted to a dictionary, provided every element is a (key, value) pair
dict3 = dict([('name','楊超越'),('weight',45)])
print(dict3)
dict4 = {}
dict4['name'] = '虞書欣'
dict4['weight'] = 43
print(dict4)
dict4['weight'] = 44
print(dict4)
# dictionary methods: items(), keys(), values()
dict5 = {'楊超越':165,'虞書欣':166,'上官喜愛':164}
print(dict5.items())
for key,value in dict5.items():
    if value > 165:
        print(key)
# values() returns a view of all the values in the dictionary
results = dict5.values()
print(results)
# compute the average height
heights = dict5.values()
print(heights)
total = sum(heights)
avg = total/len(heights)
print(avg)
names = dict5.keys()
print(names)
#print(dict5['趙小棠'])       # raises KeyError because the key does not exist
print(dict5.get('趙小棠'))     # get() returns None for a missing key
print(dict5.get('趙小棠',170)) # returns the stored value if the key exists, otherwise the default 170
dict6 = {'楊超越':165,'虞書欣':166,'上官喜愛':164}
del dict6['楊超越']            # delete by key
print(dict6)
result = dict6.pop('虞書欣')   # delete by key and return the removed value
print(result)
print(dict6)

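Object-oriented Python

Define a class Animal:

(1) __init__() defines the constructor. Unlike many other object-oriented languages, Python explicitly passes self, the instance itself, as the first parameter.

(2) Create an instance cat; __init__() receives the arguments.

(3) Use the dot . to access the object's attributes.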

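class Animal:

    def __init__(self,name):
        self.name = name
        print('Animal instance created')
    def eat(self):
        print(self.name +' is going to eat!')
    def drink(self):
        print(self.name +' is going to drink!')
cat =  Animal('miaomiao')
print(cat.name)
cat.eat()
cat.drink()
class Person:
    def __init__(self,name):
        self.name = name
        print('parent constructor called')

    def eat(self):
        print('parent method called')
class Student(Person):  # define a subclass
    # Student overrides __init__ without calling super().__init__(),
    # so the parent constructor does not run and self.name is never set
    def __init__(self):
        print('child constructor called')

    def study(self):
        print('child method called')
s = Student()          # instantiate the subclass
s.study()              # call the subclass method
s.eat()                # call the inherited parent method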
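
In the example above the subclass never runs the parent constructor, so a Student has no name attribute. A minimal sketch of the usual pattern, calling super().__init__() from the subclass, is shown below. The school attribute and the sample values are hypothetical and only serve the illustration.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def eat(self):
        return self.name + ' is eating'


class Student(Person):
    def __init__(self, name, school):
        super().__init__(name)     # run the parent constructor so self.name is set
        self.school = school       # hypothetical extra attribute

    def study(self):
        return self.name + ' studies at ' + self.school


s = Student('Tom', 'Example School')   # hypothetical values
print(s.eat())                 # inherited method
print(s.study())               # method defined on the subclass
print(isinstance(s, Person))   # a Student is also a Person
```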

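Python JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for people to read and write.

json.dumps encodes a Python object as a JSON string.

import json
data = [ { 'b' : 2, 'd' : 4, 'a' : 1, 'c' : 3, 'e' : 5 } ]
json_str = json.dumps(data)   # not named json, so the module is not shadowed
print(json_str)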

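To improve readability, dumps accepts several optional parameters.

sort_keys=True sorts the output by dictionary key (a to z).
indent sets the number of spaces used for indentation.
separators=(',', ':') removes the spaces after , and :, which keeps the data compact for transmission.

import json
data = [ { 'b' : 2, 'd' : 4, 'a' : 1, 'c' : 3, 'e' : 5 } ]
json_str = json.dumps(data, sort_keys=True, indent=4,separators=(',', ':'))
print(json_str)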

json.loads decodes JSON data. It returns the corresponding Python data type.

import json
jsonData = '{"a":1,"b":2,"c":3,"d":4,"e":5}'
text = json.loads(jsonData)  # convert the string to a dict
print(text)
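
Putting dumps and loads together, the sketch below does a round trip. The sample dictionary is made up for this example; ensure_ascii=False is a real json.dumps parameter that keeps non-ASCII characters readable instead of escaping them.

```python
import json

data = {'b': 2, 'a': 1, 'name': '楊超越'}    # sample data made up for this sketch

# compact form: sorted keys, no spaces after the separators
compact = json.dumps(data, sort_keys=True, separators=(',', ':'))
print(compact)

# readable form: indented, non-ASCII characters kept as they are
readable = json.dumps(data, sort_keys=True, indent=4, ensure_ascii=False)
print(readable)

restored = json.loads(compact)    # back to a Python dict
print(restored == data)
```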

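Python exception handling

When a Python script raises an exception we need to catch and handle it; otherwise the program stops running.

Exceptions are caught with the try/except statement.

try/except watches the try block for errors so that the except block can catch the exception and handle it.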

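try:
    fh = open("/home/aistudio/data/testfile01.txt", "w")
    fh.write("This is a test file for testing exceptions!!")
except IOError:
    print('Error: could not open or write the file')
else:
    print('Content written to the file successfully')
    fh.close()

The code in finally always runs when the try statement exits.

# open the file before try, so that f is always defined when finally runs
f = open("/home/aistudio/data/testfile02.txt", "w")
try:
    f.write("This is a test file for testing exceptions!!")
finally:
    print('Closing the file')
    f.close()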
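
A with statement closes the file automatically, so it is often combined with try/except instead of an explicit finally. The sketch below is one way to write that; it uses a temporary directory and a hypothetical file name so that it runs on any machine, not only under /home/aistudio.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory, so the sketch runs anywhere.
tmp_dir = tempfile.mkdtemp()
file_path = os.path.join(tmp_dir, 'testfile03.txt')

try:
    with open(file_path, 'w', encoding='utf-8') as f:   # closed automatically on exit
        f.write('This is a test file.')
except OSError as e:
    print('Error: could not write the file:', e)
else:
    print('File written successfully')      # runs only when no exception was raised
finally:
    print('Leaving the try statement')      # always runs

print(f.closed)

# Reading a file that does not exist raises FileNotFoundError
caught = False
missing = os.path.join(tmp_dir, 'no_such_file.txt')
try:
    with open(missing, 'r', encoding='utf-8') as g:
        g.read()
except FileNotFoundError:
    caught = True
    print('Error: file not found')
```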

Common Linux commands

!ls /home
!ls ./
!ls -l
!pwd
cp: copy files or directories
!cp test.txt ./test_copy.txt
mv: move files and directories, or rename them
!mv /home/aistudio/work/test_copy.txt /home/aistudio/data/
rm: remove files or directories
!rm /home/aistudio/data/test_copy.txt

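Large files and datasets usually need to be archived and compressed before they are uploaded to or downloaded from a server, so it is worth knowing the common compression and decompression commands.

Common suffixes for compressed files are .tar.gz, .gz, and .zip. The sections below show how to compress and decompress each of them on Linux.

gzip:

.gz is the most common suffix for compressed files on Linux; gzip is the command that compresses and decompresses .gz files.

Common options:

-d or --decompress or --uncompress: decompress a file;
-r or --recursive: recursively compress the files under a directory (each file in the directory becomes its own .gz file);
-v or --verbose: show what the command is doing.
Note: gzip compresses individual files only; it cannot pack a directory into a single file (that is the difference from an archiving command).
# Compresses the file into test.txt.gz and removes the original; decompressing likewise removes the .gz file

!gzip /home/aistudio/work/test.txt
!gzip -d /home/aistudio/test.gz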

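tar:

tar is an archiving command that packs and unpacks files with the .tar suffix. With extra options it can archive and compress in one step.

Common options:

-c or --create: create a new archive;
-x or --extract or --get: restore files from an archive;
-v: show what the command is doing;
-f or --file: name the archive file;
-C: set the target directory;
-z: filter the archive through gzip;
-j: filter the archive through bzip2.
The most common usage combines tar with gzip to archive a directory and compress it in one go:

!tar -zcvf /home/aistudio/work/test.tar.gz /home/aistudio/work/test.txt
!tar -zxvf /home/aistudio/work/test.tar.gz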
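
The same archive-then-compress step can also be done from Python with the standard tarfile module. The sketch below is an illustration only, not part of the original course material: it works in a temporary directory with made-up file names, where mode 'w:gz' roughly corresponds to tar -zcvf and 'r:gz' to tar -zxvf.

```python
import os
import tarfile
import tempfile

work = tempfile.mkdtemp()                      # hypothetical working directory
src = os.path.join(work, 'test.txt')
with open(src, 'w', encoding='utf-8') as f:
    f.write('hello')

archive = os.path.join(work, 'test.tar.gz')
with tarfile.open(archive, 'w:gz') as tar:     # create a gzip-compressed archive
    tar.add(src, arcname='test.txt')           # arcname sets the name stored in the archive

out_dir = os.path.join(work, 'out')
os.makedirs(out_dir)
with tarfile.open(archive, 'r:gz') as tar:     # open the archive for reading
    names = tar.getnames()                     # list the members
    tar.extractall(out_dir)                    # extract into out_dir, like -C

print(names)
print(os.listdir(out_dir))
```

Only extract archives from sources you trust, since member paths inside an archive are chosen by whoever created it.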

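zip and unzip
The zip and unzip commands handle .zip files on Linux.

zip:

-v: show what the command is doing;
-m: do not keep the original files;
-r: process directories recursively.
unzip:

-v: show what the command is doing;
-d: extract into the given directory.

!zip -r /home/aistudio/work/test.zip /home/aistudio/work/test.txt
!unzip  /home/aistudio/work/test.zip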

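Homework
The requests module:

requests is a simple, easy-to-use HTTP library for Python. Official site: http://cn.python-requests.org/zh_CN/latest/

requests.get(url) sends an HTTP GET request and returns the server's response.

The BeautifulSoup library:

BeautifulSoup is a Python library for extracting data from HTML or XML files. Site: https://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/

BeautifulSoup supports the HTML parser in the Python standard library as well as several third-party parsers, one of which is lxml.

Use BeautifulSoup(markup, "html.parser") or BeautifulSoup(markup, "lxml"); lxml is the recommended parser because it is faster.

I. Crawl the information on all contestants of 《青春有你2》 from Baidu Baike and return the page data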

import json
import re
import requests
import datetime
from bs4 import BeautifulSoup
import os

# get today's date, formatted like 20200420, for naming files later
today = datetime.date.today().strftime('%Y%m%d')    

def crawl_wiki_data():
    """
    Crawl the contestant information for 《青春有你2》 from Baidu Baike and return the html
    """
    headers = { 
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
    }
    url='https://baike.baidu.com/item/青春有你第二季'                         

    try:
        response = requests.get(url,headers=headers)
        print(response.status_code)

        # pass a document (here a string) to the BeautifulSoup constructor to get a document object
        soup = BeautifulSoup(response.text,'lxml')
        
        # returns every <table> tag whose class is "table-view log-set-param"
        tables = soup.find_all('table',{'class':'table-view log-set-param'})

        crawl_table_title = "參賽學員"

        for table in  tables:           
            # search the tags and strings that come before the current node
            table_titles = table.find_previous('div').find_all('h3')
            for title in table_titles:
                if(crawl_table_title in title):
                    return table       
    except Exception as e:
        print(e)

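II. Parse the crawled page data and save it as a JSON file

def parse_wiki_data(table_html):
    '''
    Parse the contestant information out of the html returned by Baidu Baike and save it as a JSON file, named after today's date, in the work directory
    '''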
    bs = BeautifulSoup(str(table_html),'lxml')
    all_trs = bs.find_all('tr')

    error_list = ['\'','\"']

    stars = []

    for tr in all_trs[1:]:
         all_tds = tr.find_all('td')

         star = {}

         # name
         star["name"]=all_tds[0].text
         # link to the contestant's Baidu Baike page
         star["link"]= 'https://baike.baidu.com' + all_tds[0].find('a').get('href')
         # place of origin
         star["zone"]=all_tds[1].text
         # zodiac sign
         star["constellation"]=all_tds[2].text
         # height
         star["height"]=all_tds[3].text
         # weight
         star["weight"]= all_tds[4].text

         # flower message, with single and double quotes removed
         flower_word = all_tds[5].text
         for c in flower_word:
             if  c in error_list:
                 flower_word=flower_word.replace(c,'')
         star["flower_word"]=flower_word 
         
         # company
         if all_tds[6].find('a') is not None:
             star["company"]= all_tds[6].find('a').text
         else:
             star["company"]= all_tds[6].text  

         stars.append(star)

    # stars is a list of plain dicts, so it can be written out as JSON directly
    with open('work/' + today + '.json', 'w', encoding='UTF-8') as f:
        json.dump(stars, f, ensure_ascii=False)

III. Crawl each contestant's Baidu Baike pictures and save them

def crawl_pic_urls():
    '''
    Crawl each contestant's Baidu Baike pictures and save them
    ''' 
    with open('work/'+ today + '.json', 'r', encoding='UTF-8') as file:
         json_array = json.loads(file.read())
    headers = { 
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36' 
     }
    for star in json_array:

        name = star['name']
        link = star['link']
        # collect every picture URL for this contestant in the list pic_urls
        response = requests.get(link,headers=headers)
        bs = BeautifulSoup(response.text,'lxml')   # parse the contestant's page
        pic_list_url = bs.select(".summary-pic a")[0].get('href')   # link to the picture album
        pic_list_url = "https://baike.baidu.com"+pic_list_url
        response1 = requests.get(pic_list_url,headers=headers)
        bs1 = BeautifulSoup(response1.text,'lxml')  # parse the album page
        pic_htmls = bs1.select('.pic-list img')

        # download every picture in pic_urls into a folder named after the contestant
        pic_urls= []
        for pic_html in pic_htmls:
            pic_url = pic_html.get('src')
            pic_urls.append(pic_url)
        down_pic(name,pic_urls)

def down_pic(name,pic_urls):
    '''
    Download every picture in the list pic_urls and save it in a folder named after name
    '''
    path = 'work/'+'pics/'+name+'/'

    if not os.path.exists(path):
      os.makedirs(path)

    for i, pic_url in enumerate(pic_urls):
        try:
            pic = requests.get(pic_url, timeout=15)
            string = str(i + 1) + '.jpg'
            with open(path+string, 'wb') as f:
                f.write(pic.content)
                print('Downloaded picture %s: %s' % (str(i + 1), str(pic_url)))
        except Exception as e:
            print('Failed to download picture %s: %s' % (str(i + 1), str(pic_url)))
            print(e)
            continue

IV. Print the paths of all crawled pictures

def show_pic_path(path):
    '''
    Walk through every crawled picture and print its absolute path
    '''
    pic_num = 0
    for (dirpath,dirnames,filenames) in os.walk(path):
        for filename in filenames:
           pic_num += 1
           print("Picture %d: %s" % (pic_num,os.path.join(dirpath,filename)))           
    print("Crawled %d pictures of 《青春有你2》 contestants in total" % pic_num)

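if __name__ == '__main__':

     # crawl the contestant information for 《青春有你2》 from Baidu Baike and return the html
     html = crawl_wiki_data()

     # parse the html, extract the contestant information, and save it as a json file
     parse_wiki_data(html)

     # crawl the pictures from each contestant's Baidu Baike page and save them
     crawl_pic_urls()

     # print the paths of the crawled contestant pictures
     show_pic_path('/home/aistudio/work/pics/')

     print("All information has been crawled!")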