Web Scraping in Practice 4: Using Selenium and bs4 Together

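Summary: Use Selenium to simulate browser behavior and log in to the website, parse the resulting HTML with bs4 to extract the required text, then write the text to a CSV file and view it in Excel.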

00 Import the required libraries

import time
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from bs4 import BeautifulSoup

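01 Open the website

# Start Firefox through geckodriver (Selenium 3 style; Selenium 4 passes the driver path via a Service object)
dri = webdriver.Firefox(executable_path=r'D:\geckodriver.exe')
url = 'https://account.jishulink.com/login'
dri.get(url)
time.sleep(1)  # give the login page time to load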

02 Log in

# Fill in the two login fields ('xxxx' is a placeholder for your own credentials), then submit.
# Note: find_element_by_css_selector is the Selenium 3 API; Selenium 4 uses find_element(By.CSS_SELECTOR, ...).
field1 = dri.find_element_by_css_selector(".login-list > li:nth-child(1) > input:nth-child(2)")
field1.clear()
field1.send_keys('xxxx')
field2 = dri.find_element_by_css_selector(".login-list > li:nth-child(2) > input:nth-child(2)")
field2.clear()
field2.send_keys('xxxx')
time.sleep(0.1)
dri.find_element_by_css_selector(".login-btnStyle").click()
time.sleep(1)  # wait for the login to complete

03 Navigate to the target page

# Hover over the avatar image so that the drop-down menu appears
mouse = dri.find_element_by_css_selector('#personParent > img:nth-child(1)')
ActionChains(dri).move_to_element(mouse).perform()
time.sleep(1)
# Click the first link in the drop-down menu
dri.find_element_by_css_selector('.top-p-link01 > a:nth-child(1)').click()
time.sleep(1)
# Switch to the second tab of the content area
dri.find_element_by_css_selector('.myContent-tab > li:nth-child(2) > a:nth-child(1)').click()
time.sleep(0.2)
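
The fixed `time.sleep` pauses above assume that each page finishes loading within a set time. A minimal sketch of an alternative is to poll until a condition holds. `wait_until` and its parameters are hypothetical names introduced here for illustration, not part of Selenium; Selenium itself provides `WebDriverWait` for the same purpose.

```python
import time

def wait_until(condition, timeout=10.0, interval=0.2):
    """Call condition() repeatedly until it returns a truthy value.

    Hypothetical helper for illustration. Returns the truthy value,
    or raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)
```

With the driver from the steps above, a call might look like `wait_until(lambda: dri.find_elements_by_css_selector('.myContent-tab'))`, which returns as soon as the list of matching elements is non-empty.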

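04 Extract text from the current page

html = dri.page_source
soup = BeautifulSoup(html, 'lxml')  # the 'lxml' parser requires the lxml package
# Each title sits in an <h2 ng-if="post.subject"> element
tlist0 = soup.find_all('h2', attrs={'ng-if': 'post.subject'})
tt = []  # collected titles
for tlist1 in tlist0:
    # The title text is inside the <a> element that carries an ng-href attribute
    tlist2 = tlist1.find('a', attrs={'ng-href': True})
    tt.append(tlist2.string.strip())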
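
The extraction above raises an error if a heading has no matching link, or if the link contains nested tags (in that case `.string` is `None`). A minimal sketch of a more defensive version follows. The function name `extract_titles` and the sample markup are made up for illustration, and `html.parser` is used here only so the sketch runs without lxml.

```python
from bs4 import BeautifulSoup

def extract_titles(html):
    """Return the stripped link text of every <h2 ng-if="post.subject"> in html."""
    # 'html.parser' ships with Python; the article itself uses 'lxml'.
    soup = BeautifulSoup(html, 'html.parser')
    titles = []
    for h2 in soup.find_all('h2', attrs={'ng-if': 'post.subject'}):
        link = h2.find('a', attrs={'ng-href': True})
        if link is None:
            continue  # heading without a matching link: skip it
        text = link.get_text().strip()  # works even when the link has nested tags
        if text:
            titles.append(text)
    return titles

# Made-up markup that mimics the structure the selectors expect.
sample = '''
<h2 ng-if="post.subject"><a ng-href="/post/1"> First title </a></h2>
<h2 ng-if="post.subject"><a ng-href="/post/2"><b>Second</b> title</a></h2>
<h2 ng-if="post.subject">No link here</h2>
<h2><a ng-href="/post/3">Wrong heading</a></h2>
'''
print(extract_titles(sample))
```

Only the first two headings match both conditions, so the other two are skipped rather than causing an exception.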

05 Repeat for the remaining 16 pages

for i in range(16):
    # Click the pager link (presumably "next page") and wait for the new page to load
    dri.find_element_by_css_selector('.page > div:nth-child(1) > a:nth-child(12)').click()
    time.sleep(1)
    # Parse the new page exactly as in step 04 and append its titles to tt
    html = dri.page_source
    soup = BeautifulSoup(html, 'lxml')
    tlist0 = soup.find_all('h2', attrs={'ng-if': 'post.subject'})
    for tlist1 in tlist0:
        tlist2 = tlist1.find('a', attrs={'ng-href': True})
        tt.append(tlist2.string.strip())
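
Steps 04 and 05 repeat the same parsing code. One possible way to avoid the duplication is to pass the "get page source", "go to next page", and "parse" actions in as functions. This is only a sketch: `collect_all_pages` and `FakeBrowser` are hypothetical names, and the fake browser stands in for Selenium so that the example runs on its own.

```python
def collect_all_pages(get_source, go_next, parse, extra_pages):
    """Parse the current page, then advance and parse extra_pages more times."""
    results = list(parse(get_source()))
    for _ in range(extra_pages):
        go_next()
        results.extend(parse(get_source()))
    return results

# Stand-in for a browser so the sketch runs without Selenium.
class FakeBrowser:
    def __init__(self, pages):
        self.pages = pages
        self.index = 0

    def source(self):
        return self.pages[self.index]

    def next_page(self):
        self.index += 1

browser = FakeBrowser(["a,b", "c", "d,e"])
titles = collect_all_pages(browser.source, browser.next_page,
                           lambda text: text.split(","), extra_pages=2)
print(titles)
```

With the real driver, `get_source` could be `lambda: dri.page_source`, `go_next` a function that clicks the pager link and waits, and `extra_pages` would be 16.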

06 The output is shown below

Figure 1 (output of the run)

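07 Write to a CSV file

import csv
# GB2312 lets Excel on a Chinese-locale system open the file directly;
# characters outside GB2312 would raise UnicodeEncodeError.
with open('jsl.csv', 'w', newline='', encoding='GB2312') as csvobj:
    csvfile = csv.writer(csvobj)
    csvfile.writerow(tt)  # all titles go into a single row
# No explicit close() is needed: the with block closes the file on exit.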
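
The snippet above puts every title into a single row, so Excel shows them across columns. A minimal sketch of a variant that writes one title per row is given below. The names `write_titles` and `read_titles`, the `title` header row, and the choice of `utf-8-sig` (UTF-8 with a byte order mark, which Excel generally recognizes) are assumptions for illustration, not part of the original article.

```python
import csv

def write_titles(path, titles, encoding='utf-8-sig'):
    """Write a header row followed by one title per row."""
    with open(path, 'w', newline='', encoding=encoding) as f:
        writer = csv.writer(f)
        writer.writerow(['title'])  # header row (an addition, not in the article)
        for title in titles:
            writer.writerow([title])

def read_titles(path, encoding='utf-8-sig'):
    """Read the titles back, skipping the header row."""
    with open(path, newline='', encoding=encoding) as f:
        rows = list(csv.reader(f))
    return [row[0] for row in rows[1:]]
```

Calling `write_titles('jsl.csv', tt)` after the scraping steps would produce a file with one title per line.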

08 View the CSV file in Excel

Figure 2 (the CSV file opened in Excel)
