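A while ago I wrote a script that scrapes NBA player data from the hupu website. My skills are only average, so the code is simple and crude.
Link to hupu's team/player information
That page lists the teams and their players. First, scrape the links for all of the teams: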
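import re
import urllib2  # Python 2 only
from collections import namedtuple

# teamLink is not defined in this post; a simple (team, link) record is assumed here.
teamLink = namedtuple('teamLink', ['team', 'link'])

teamList = []
response = urllib2.urlopen("http://g.hupu.com/nba/players/")
html = response.read()

def getTeams():
    # Each match looks like: <span class="team_name"><a href="LINK">TEAM</a></span>
    Items = re.findall('<span class="team_name"><a href=".*?</a></span>', html, re.S)
    for item in Items:
        # Strip the leading markup, leaving: LINK">TEAM</a></span>
        link = item.replace('<span class="team_name"><a href="', '')
        team = re.findall('">.*?</a></span>', link, re.S)[0]
        link = 'http://g.hupu.com/' + link.replace(team, '')
        team = team.replace('">', '').replace('</a></span>', '')
        teamList.append(teamLink(team, link))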
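The same extraction can be done in one pass with regex capture groups instead of repeated replace() calls. This is only a rough sketch, written for Python 3 rather than urllib2-era Python 2. The function name extract_teams, the sample markup, and the sample paths are made up for illustration. It also swaps plain string concatenation for urljoin, which copes with both relative and absolute href values.

```python
import re
from urllib.parse import urljoin

# Capture the href and the team name in a single pattern.
TEAM_PATTERN = re.compile(
    r'<span class="team_name"><a href="(.*?)">(.*?)</a></span>', re.S
)

def extract_teams(html, base_url="http://g.hupu.com/"):
    """Return a list of (team, link) tuples found in the page markup."""
    return [(team.strip(), urljoin(base_url, href))
            for href, team in TEAM_PATTERN.findall(html)]

# Hypothetical sample markup; the real page may use different paths.
sample = (
    '<span class="team_name"><a href="/nba/teams/lakers">Lakers</a></span>'
    '<span class="team_name"><a href="/nba/teams/celtics">Celtics</a></span>'
)
print(extract_teams(sample))
```

For anything beyond a quick script, an HTML parser is usually more robust than regular expressions, since small markup changes can silently break the pattern.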
Then scrape each player's detail page:
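getTeams()  # fill teamList first; this call is not shown in the original snippet
for team in teamList:
    getPlayers(team)  # getPlayers is defined elsewhere in the script (not shown here)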
Finally, extract the data and store it in the database.
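The post does not say which database or schema was used, so the following is only a sketch of the storage step under assumptions: it uses SQLite from the Python 3 standard library, and the players table, its columns, and the save_players helper are all hypothetical.

```python
import sqlite3

def save_players(conn, rows):
    """Insert (name, team) rows; the table layout is an assumption."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS players ("
        "name TEXT NOT NULL, team TEXT NOT NULL, PRIMARY KEY (name, team))"
    )
    # INSERT OR REPLACE keeps re-runs of the scraper from duplicating rows.
    conn.executemany(
        "INSERT OR REPLACE INTO players (name, team) VALUES (?, ?)", rows
    )
    conn.commit()

# In-memory database and placeholder rows, for illustration only.
conn = sqlite3.connect(":memory:")
save_players(conn, [("Player A", "Team X"), ("Player B", "Team X")])
save_players(conn, [("Player A", "Team X")])  # second run adds nothing new
count = conn.execute("SELECT COUNT(*) FROM players").fetchone()[0]
print(count)
```

The parameterized query (the ? placeholders) matters here because the scraped names come from an external page and should not be pasted into SQL strings directly.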