Scraping Wenshan University (文山學院) Notices (通知公告) with Kotlin + Selenium
0 Environment
IntelliJ IDEA + Kotlin + AdoptOpenJDK 12
1 Create the project
- Create New Project
- Java > JVM/Kotlin > Project Name > [type your project name]
2
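3 Add the Selenium dependency
- Press Ctrl+Alt+Shift+S to open Project Structure
- Modules > Dependencies > + > Library > From Maven
- Search for the following coordinate and click "OK":
org.seleniumhq.selenium:selenium-java:3.141.59
Or add it directly to your module's .iml file: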
<?xml version="1.0" encoding="UTF-8"?>
<module type="JAVA_MODULE" version="4">
  <component name="NewModuleRootManager" inherit-compiler-output="true">
    <exclude-output />
    <content url="file://$MODULE_DIR$">
      <sourceFolder url="file://$MODULE_DIR$/src" isTestSource="false" />
    </content>
    <orderEntry type="inheritedJdk" />
    <orderEntry type="sourceFolder" forTests="false" />
    <orderEntry type="library" name="KotlinJavaRuntime" level="project" />
    <orderEntry type="library" name="org.seleniumhq.selenium:selenium-java:3.141.59" level="project" />
  </component>
</module>
4 My code is here
package hello

import org.openqa.selenium.By
import org.openqa.selenium.chrome.ChromeDriver

fun main() {
    val values = Values()
    /**
     * Requires Chrome plus a matching chromedriver binary on your PATH
     * (or set the "webdriver.chrome.driver" system property).
     * To use another browser, swap in that browser's driver class.
     */
    val browser = ChromeDriver()
    try {
        // Test: load the notice page and print every image URL
        browser.get(values.noticeUrl)
        // println(browser.pageSource)
        for (img in browser.findElements(By.tagName("img"))) {
            println(img.getAttribute("src"))
        }
        // Test finished; it works.

        // Adjust the page range here (set your own last page).
        // Pages 1..15 are rdwz/$i.htm; the 16th iteration loads noticeUrl (rdwz.htm).
        for (i in 1..16) {
            val url = if (i == 16) values.noticeUrl else values.noticeWithPage(i)
            browser.get(url) // previously the computed url was ignored
            for (li in browser.findElements(By.xpath("/html/body/div[5]/div[2]/div[2]/ul/li"))) {
                val a = li.findElement(By.tagName("a"))
                // Text of the <i> element, then the link text, then the link URL
                println(li.findElement(By.tagName("i")).getAttribute("innerText") + a.getAttribute("innerText"))
                println("\t" + a.getAttribute("href"))
            }
        }
    } finally {
        // Close the browser even if scraping throws
        browser.quit()
    }
}
// All values
class Values {
    val url = "http://www.wsu.edu.cn/"
    val noticeUrl = "http://www.wsu.edu.cn/index/rdwz.htm"
    // Lambda that builds the URL of list page `page`
    val noticeWithPage = { page: Int -> "http://www.wsu.edu.cn/index/rdwz/$page.htm" }
}
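Because the page loop and the printing are tied to a live browser, they are hard to check in isolation. Below is a minimal sketch that pulls the URL scheme and the output format out into plain functions that can be tested without Chrome. It assumes the same URL scheme as `Values` (rdwz/$i.htm for the earlier iterations, rdwz.htm for the last one); the names `NOTICE_BASE`, `pageUrls`, `Notice`, and `dedupeByHref` are hypothetical and not part of the original code.

```kotlin
// Hypothetical, browser-free helpers; all names here are assumptions.

// Same prefix that Values.noticeUrl and Values.noticeWithPage are built from.
const val NOTICE_BASE = "http://www.wsu.edu.cn/index/rdwz"

// Mirrors the loop in main(): iterations 1 until lastPage use rdwz/$i.htm,
// and the final iteration uses rdwz.htm (Values.noticeUrl).
fun pageUrls(lastPage: Int): List<String> {
    require(lastPage >= 1) { "lastPage must be at least 1" }
    return (1..lastPage).map { i ->
        if (i == lastPage) "$NOTICE_BASE.htm" else "$NOTICE_BASE/$i.htm"
    }
}

// One scraped list item: the <i> text (printed before the title),
// the <a> text, and the <a> href.
data class Notice(val label: String, val title: String, val href: String) {
    // Same two-line layout that main() prints.
    fun format(): String = "$label$title\n\t$href"
}

// Keeps the first occurrence of each link, in case the same notice is collected twice.
fun dedupeByHref(notices: List<Notice>): List<Notice> = notices.distinctBy { it.href }
```

In `main()` you could then loop over `pageUrls(16)`, add a `Notice(...)` for each `li` to a list instead of printing right away, and print `dedupeByHref(list).map { it.format() }` at the end.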