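This project works through the practice exercises provided at https://github.com/Yixiaohan/show-me-the-code. All code here is original; please credit the source when reposting. Thank you.
Problem description: given an HTML file, find the links it contains. The code is as follows: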
# -*- coding: utf-8 -*-
"""
Created on Tue Jan 10 13:18:55 2017
@author: sky
"""
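import io
try:
    from urllib2 import urlopen          # Python 2
except ImportError:
    from urllib.request import urlopen   # Python 3
from bs4 import BeautifulSoup

url = "http://www.baidu.com"
page = urlopen(url)
# Name the parser explicitly so BeautifulSoup does not have to guess one.
soup = BeautifulSoup(page, "html.parser")
page.close()

# io.open accepts an encoding on both Python 2 and 3, so non-ASCII links are written safely.
with io.open('result.txt', 'w', encoding='utf-8') as out:
    # href=True skips <a> tags without an href, which would raise KeyError on link['href'].
    for link in soup.find_all('a', href=True):
        href = link['href']
        out.write(href + u'\n')  # one link per line
        print(href)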
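
The problem statement speaks of an HTML file rather than a live page, so the same idea can also be sketched against a plain HTML string. The version below is only an illustration under a few assumptions: it targets Python 3, it swaps BeautifulSoup for the standard library's html.parser so that it runs without extra packages, and the names LinkCollector, extract_links and sample are hypothetical, not part of the original exercise. It also resolves relative links against a base URL with urljoin, which the script above does not do.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> start tag fed to the parser."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links such as "/news" into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url=""):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample input: the middle anchor has no href and is skipped.
sample = ('<a href="/news">News</a>'
          '<a name="top">Top</a>'
          '<a href="http://example.com/">Example</a>')
print(extract_links(sample, "http://www.baidu.com"))
```

To process a saved page, the contents of the file can be read into a string first and passed to extract_links together with the URL the page was downloaded from.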