If this post helps you, please give it a like.
First, some background. The list elements contain backslashes, so the output came out garbled; my first attempt was to strip the backslashes with a string replacement, since the real goal is to use the list elements as keys into a dictionary and look up their values. But when I tried to assemble the result in a for loop, the pieces would not concatenate properly: only the last element ever made it into the output. Digging back through my earlier posts on strings, I found that the join method connects the elements of a list into one string.
The principle:

lst = ['2', '0', '2', '0', '-', '4', '-', '2', '8']  # renamed from "list" to avoid shadowing the built-in
date = "".join(lst)
print(date)

Output: 2020-4-28
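A side note on how join works: the string you call join on becomes the separator placed between the list elements, so the empty string above glues the pieces together with nothing in between. A quick sketch:

```python
parts = ['2020', '4', '28']

# "".join glues the pieces together with nothing in between
print("".join(parts))   # 2020428

# a non-empty string is inserted between every pair of elements
print("-".join(parts))  # 2020-4-28
```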
Next, let's see this principle in action by scraping the prices from a web page:
Original code: this is the Python code for grabbing the prices on one page. It builds a price dictionary, but the keys in that dictionary change frequently because the page keeps updating its font mapping, presumably as an anti-scraping measure. It uses XPath; if you want to learn how XPath works, see another blogger's post on the topic.
# -*- coding: utf-8 -*-
import scrapy
import requests
import random
from bs4 import BeautifulSoup
from lxml import etree
from fake_useragent import UserAgent

user_agent = UserAgent().random
headers = {'User-Agent': user_agent}
url = 'http://www.dianping.com/beijing/ch65/g180'
res = requests.get(url, headers=headers)
resource = res.text
soup = BeautifulSoup(resource, 'lxml')
book = etree.HTML(str(soup))
lis = book.xpath("//div[@id='shop-all-list']//li")
price_dict = {
    "1": "1",
    "uf2f6": "2",
    "uf48e": "3",
    "ue0f0": "4",
    "uf1ba": "5",
    "uec71": "6",
    "ue208": "7",
    "ue0fb": "8",
    "uf5b0": "9",
    "uf0f3": "0",
}
for li in lis:
    price = li.xpath(".//a[@class='mean-price']/b//text()")
    #price = str(price).encode('ISO-8859-1').decode('gbk')
    #print(len(price))
    #price_list = []
    if price:
        #print(price)
        if len(price) == 1:
            print(price)
        else:
            for i in range(1, len(price)):
                text = str(price[i].encode('raw_unicode_escape').replace(b'\\', b''), 'utf-8')
                if text in price_dict.keys():
                    text = price_dict[text]
                newPrice = "".join(price_list)
            print(price[0] + newPrice)
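Before looking at the fix, it helps to see the pitfall in isolation: when the joined variable is reassigned on every pass of the loop instead of being accumulated into a list first, only the final iteration's value survives. A minimal sketch with hypothetical digits, not the real page data:

```python
digits = ["2", "5", "8"]  # hypothetical decoded digits
for d in digits:
    # reassigned on every pass: the previous digits are discarded
    new_price = "".join(d)
print(new_price)  # 8 -- only the last digit survives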
I found that with this code, after connecting the digits with join, I always ended up with only the last digit: the printed price was always incomplete. Even changing it to the following did not work:
# -*- coding: utf-8 -*-
import scrapy
import requests
import random
from bs4 import BeautifulSoup
from lxml import etree
from fake_useragent import UserAgent

user_agent = UserAgent().random
headers = {'User-Agent': user_agent}
url = 'http://www.dianping.com/beijing/ch65/g180'
res = requests.get(url, headers=headers)
resource = res.text
soup = BeautifulSoup(resource, 'lxml')
book = etree.HTML(str(soup))
lis = book.xpath("//div[@id='shop-all-list']//li")
price_dict = {
    "1": "1",
    "uf2f6": "2",
    "uf48e": "3",
    "ue0f0": "4",
    "uf1ba": "5",
    "uec71": "6",
    "ue208": "7",
    "ue0fb": "8",
    "uf5b0": "9",
    "uf0f3": "0",
}
for li in lis:
    newPrice = ""
    price = li.xpath(".//a[@class='mean-price']/b//text()")
    #price = str(price).encode('ISO-8859-1').decode('gbk')
    #print(len(price))
    price_list = []
    if price:
        #print(price)
        if len(price) == 1:
            print(price)
        else:
            for i in range(1, len(price)):
                text = str(price[i].encode('raw_unicode_escape').replace(b'\\', b''), 'utf-8')
                if text in price_dict.keys():
                    text = price_dict[text]
                price_list.append(text)
                newPrice = newPrice.join(price_list)
            print(price[0] + newPrice)
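This version misbehaves for a subtler reason: str.join uses the string it is called on as the separator between elements, and here newPrice acts as both separator and accumulator, so each pass splices the previous result between every pair of digits. A minimal sketch with hypothetical digits:

```python
price_list = []
new_price = ""
for d in ["2", "5", "8"]:  # hypothetical decoded digits
    price_list.append(d)
    # new_price is the separator here, so the previous result
    # is inserted between every pair of digits on each pass
    new_price = new_price.join(price_list)
print(new_price)  # 222552258 instead of 258
```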
Correct code: this version also involves a character-encoding conversion.
# -*- coding: utf-8 -*-
import scrapy
import requests                       # requests: downloads the page
import random                         # random module
from bs4 import BeautifulSoup         # BeautifulSoup: parses the HTML
from lxml import etree                # lxml: XPath support
from fake_useragent import UserAgent  # fake_useragent: random request headers

user_agent = UserAgent().random
headers = {'User-Agent': user_agent}
url = 'http://www.dianping.com/beijing/ch65/g180'
res = requests.get(url, headers=headers)  # request the URL with the headers
resource = res.text
soup = BeautifulSoup(resource, 'lxml')    # parse the text with the lxml parser
book = etree.HTML(str(soup))
lis = book.xpath("//div[@id='shop-all-list']//li")
price_dict = {
    "1": "1",
    "ue7cd": "2",
    "ueb07": "3",
    "uecfc": "4",
    "ue64f": "5",
    "ue314": "6",
    "uf701": "7",
    "uf839": "8",
    "uf2fb": "9",
    "ue9d4": "0",
}
item = {}
for li in lis:
    #newPrice = ""
    price = li.xpath(".//a[@class='mean-price']/b//text()")
    #price = str(price).encode('ISO-8859-1').decode('gbk')
    #print(len(price))
    price_list = []
    if price:
        if len(price) == 1:
            print(price[0])
        else:
            for i in range(1, len(price)):
                text = str(price[i].encode('raw_unicode_escape').replace(b'\\', b''), 'utf-8')
                if text in price_dict.keys():
                    text = price_dict[text]
                price_list.append(text)
            newPrice = "".join(price_list)  # join once, after the loop, with an empty separator
            print(price[0] + newPrice)
Here is a screenshot of the output:
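A note on the decoding step used in all three versions: raw_unicode_escape renders any character outside Latin-1, such as the private-use glyph '\uf2f6', as the literal escape bytes b'\\uf2f6'; stripping the backslash leaves 'uf2f6', which is exactly the form of the dictionary keys. A minimal sketch, using one mapping from the first dictionary above:

```python
glyph = '\uf2f6'  # a private-use glyph standing in for a digit on the page
raw = glyph.encode('raw_unicode_escape')     # b'\\uf2f6'
key = str(raw.replace(b'\\', b''), 'utf-8')  # 'uf2f6'

price_dict = {"uf2f6": "2"}  # one entry from the first mapping above
print(price_dict[key])  # 2
```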