(Part 1) Python 3 and Selenium 3: learn Selenium from the framework's source code and get twice the result for half the effort

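This article draws on the following documents, with thanks:
Selenium-Python Chinese documentation
Selenium Documentation
WebDriver reference

If you spot an error, please point it out in the comments and the author will correct it promptly.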

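Environment

  • Operating system: Windows 7 SP1, 64-bit
  • Python version: 3.7.7
  • Browser: Google Chrome
  • Browser version: 80.0.3987 (64-bit)
  • Chrome driver: the driver version must match the browser version, and each browser needs its own driver; click to download
  • For Firefox, check the Firefox version and download the driver from the GitHub Firefox driver download page (every release notes which browser versions it supports, so read the notes and pick the matching one; if English is a hurdle, the browser's right-click translation helps)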

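Introduction

Selenium is an umbrella project covering a range of tools and libraries that support the automation of web browsers. When automation runs, the operations are carried out the way a real user would perform them.

Selenium has three major versions: Selenium 1.0, Selenium 2.0, and Selenium 3.0.

Selenium 1.0 worked mainly by injecting JavaScript into the browser. Selenium's original author, Jason Huggins, first built JavaScriptTestRunner as a testing tool and demonstrated it to several colleagues (he is quite an entertaining character, too). As the name suggests, the tool ran its tests through JavaScript. It was the predecessor of Selenium.

Selenium 2.0 operates on browser elements through the API provided by WebDriver. WebDriver can be seen as a testing framework, or as an integrated API library.

Selenium 3.0 extends Selenium 2.0 and differs little from it. This article uses Selenium 3.0 throughout.

The official introduction describes browser support as follows: "Through WebDriver, Selenium supports all major browsers on the market such as Chrom(ium), Firefox, Internet Explorer, Opera, and Safari."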

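Getting started

With the environment installed, use Selenium to open the CSDN home page in the browser.
One thing to watch when configuring the environment: the driver executable must be reachable through the system PATH. Either add its directory to PATH, or drop it into your Python installation directory (which works when that directory is itself on PATH).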
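Before launching a browser it can help to confirm that the driver really is visible on PATH. The following is a minimal sketch using only the standard library; find_driver is a hypothetical helper name, not part of Selenium.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver if it is on PATH, otherwise None."""
    # shutil.which searches the directories listed in the PATH environment
    # variable, which is the same lookup used for a bare command name.
    # On Windows it also tries the extensions in PATHEXT, such as .exe.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    if path is None:
        print("chromedriver was not found on PATH")
    else:
        print("chromedriver found at", path)
```

If the function returns None, Selenium's default executable_path="chromedriver" is unlikely to work either, and the driver location has to be passed explicitly.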

First import webdriver:

from selenium.webdriver import Chrome

Alternatively:

from selenium import webdriver

Which import style to use is a matter of preference; the instance is then created slightly differently in each case.

from selenium.webdriver import Chrome
driver = Chrome()

or

from selenium import webdriver
driver = webdriver.Chrome()

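General Python syntax is not explained further below.
As mentioned earlier, the driver needs to be on the system PATH. If for some reason the driver path cannot be added to PATH, here is a workaround: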

from selenium import webdriver
driver = webdriver.Chrome(executable_path=r'F:\python\dr\chromedriver_win32\chromedriver.exe')

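Here executable_path specifies the driver location; the path shown is where my driver is stored. You can choose any location that suits you and build the path more flexibly; an absolute path is used here only to keep the explanation simple.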
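One way to make the path more flexible is to build it relative to the project directory instead of hard-coding a drive letter. The sketch below assumes a hypothetical layout in which drivers live in a drivers folder under the project; driver_path is an illustrative helper, not a Selenium function.

```python
import sys
from pathlib import Path


def driver_path(base_dir, name="chromedriver", platform=sys.platform):
    """Build the driver path under <base_dir>/drivers (an assumed layout)."""
    # Windows executables carry an .exe suffix; other platforms do not.
    filename = name + ".exe" if platform.startswith("win") else name
    return Path(base_dir) / "drivers" / filename


# In a Selenium 3 script the result could be passed on like this:
#   webdriver.Chrome(executable_path=str(driver_path(Path(__file__).parent)))
print(driver_path(Path.cwd()))
```

The platform parameter exists only so the suffix logic can be exercised on any machine; in normal use the default is enough.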

Firefox:

from selenium import webdriver

driver = webdriver.Firefox()
driver.get("http://www.csdn.net")

Chrome:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("http://www.csdn.net")

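Firefox and Chrome differ only in how the driver is instantiated; all other operations are the same.

Import webdriver at the top of the code, instantiate the browser object, then call get with the URL to open the page you need.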

Implementation walkthrough

Look at the implementation in webdriver.py (selenium.webdriver.chrome.webdriver, the class that webdriver.Chrome refers to):

import warnings

from selenium.webdriver.remote.webdriver import WebDriver as RemoteWebDriver
from .remote_connection import ChromeRemoteConnection
from .service import Service
from .options import Options


class WebDriver(RemoteWebDriver):
    """
    Controls the ChromeDriver and allows you to drive the browser.

    You will need to download the ChromeDriver executable from
    http://chromedriver.storage.googleapis.com/index.html
    """

    def __init__(self, executable_path="chromedriver", port=0,
                 options=None, service_args=None,
                 desired_capabilities=None, service_log_path=None,
                 chrome_options=None, keep_alive=True):
        """
        Creates a new instance of the chrome driver.

        Starts the service and then creates new instance of chrome driver.

        :Args:
         - executable_path - path to the executable. If the default is used it assumes the executable is in the $PATH
         - port - port you would like the service to run, if left as 0, a free port will be found.
         - options - this takes an instance of ChromeOptions
         - service_args - List of args to pass to the driver service
         - desired_capabilities - Dictionary object with non-browser specific
           capabilities only, such as "proxy" or "loggingPref".
         - service_log_path - Where to log information from the driver.
         - chrome_options - Deprecated argument for options
         - keep_alive - Whether to configure ChromeRemoteConnection to use HTTP keep-alive.
        """
        if chrome_options:
            warnings.warn('use options instead of chrome_options',
                          DeprecationWarning, stacklevel=2)
            options = chrome_options

        if options is None:
            # desired_capabilities stays as passed in
            if desired_capabilities is None:
                desired_capabilities = self.create_options().to_capabilities()
        else:
            if desired_capabilities is None:
                desired_capabilities = options.to_capabilities()
            else:
                desired_capabilities.update(options.to_capabilities())

        self.service = Service(
            executable_path,
            port=port,
            service_args=service_args,
            log_path=service_log_path)
        self.service.start()

        try:
            RemoteWebDriver.__init__(
                self,
                command_executor=ChromeRemoteConnection(
                    remote_server_addr=self.service.service_url,
                    keep_alive=keep_alive),
                desired_capabilities=desired_capabilities)
        except Exception:
            self.quit()
            raise
        self._is_remote = False

    def launch_app(self, id):
        """Launches Chrome app specified by id."""
        return self.execute("launchApp", {'id': id})

    def get_network_conditions(self):
        return self.execute("getNetworkConditions")['value']

    def set_network_conditions(self, **network_conditions):
        self.execute("setNetworkConditions", {
            'network_conditions': network_conditions
        })

    def execute_cdp_cmd(self, cmd, cmd_args):
        return self.execute("executeCdpCommand", {'cmd': cmd, 'params': cmd_args})['value']

    def quit(self):
        try:
            RemoteWebDriver.quit(self)
        except Exception:
            # We don't care about the message because something probably has gone wrong
            pass
        finally:
            self.service.stop()

    def create_options(self):
        return Options()

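The docstring states that __init__ "creates a new instance of the chrome driver": it starts the service and then creates the new driver instance.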

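Only the parameter used in this article is listed here:

  • executable_path: path to the executable. If the default is used, the executable is assumed to be on PATH, where PATH is the environment variable listing the directories the system searches for executables.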
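The other branch of __init__ worth a closer look is how options and desired_capabilities are combined. The sketch below models that branch with plain dictionaries; it is not Selenium code. resolve_capabilities and its parameter names are hypothetical: options_caps stands in for options.to_capabilities(), and default_caps stands in for self.create_options().to_capabilities().

```python
def resolve_capabilities(options_caps=None, desired_capabilities=None,
                         default_caps=None):
    """Model the options / desired_capabilities branch of WebDriver.__init__."""
    if options_caps is None:
        # No options object was passed: keep desired_capabilities as given,
        # or fall back to the defaults when it is missing too.
        if desired_capabilities is None:
            desired_capabilities = dict(default_caps or {})
    else:
        if desired_capabilities is None:
            desired_capabilities = dict(options_caps)
        else:
            # Both were passed: entries from options overwrite same-named keys.
            desired_capabilities.update(options_caps)
    return desired_capabilities


caps = resolve_capabilities(
    options_caps={"browserName": "chrome"},
    desired_capabilities={"proxy": {"proxyType": "direct"}},
)
print(caps)
```

The point to take away is the precedence: when both arguments are supplied, values coming from options win over values already present in desired_capabilities.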

Starting the service is a required step in Selenium automation. The __init__ method contains the following code:

self.service = Service(
            executable_path,
            port=port,
            service_args=service_args,
            log_path=service_log_path)
self.service.start()

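This code instantiates the Service class with the relevant arguments and then starts the service. The most important argument here is executable_path, that is, the driver to launch. Look at the Service class (selenium.webdriver.chrome.service):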

from selenium.webdriver.common import service


class Service(service.Service):
    """
    Object that manages the starting and stopping of the ChromeDriver
    """

    def __init__(self, executable_path, port=0, service_args=None,
                 log_path=None, env=None):
        """
        Creates a new instance of the Service

        :Args:
         - executable_path : Path to the ChromeDriver
         - port : Port the service is running on
         - service_args : List of args to pass to the chromedriver service
         - log_path : Path for the chromedriver service to log to"""

        self.service_args = service_args or []
        if log_path:
            self.service_args.append('--log-path=%s' % log_path)

        service.Service.__init__(self, executable_path, port=port, env=env,
                                 start_error_message="Please see https://sites.google.com/a/chromium.org/chromedriver/home")

    def command_line_args(self):
        return ["--port=%d" % self.port] + self.service_args
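Taken together, __init__ and command_line_args determine the command line used to launch the driver. The following sketch reproduces that assembly in one function; build_command is a hypothetical name, and the port is passed in directly (in the real class, per the docstring above, a port of 0 is replaced by a free port).

```python
def build_command(executable_path, port, service_args=None, log_path=None):
    """Assemble the argument list the way the Chrome Service does."""
    args = list(service_args or [])
    if log_path:
        # The log path is appended after any user-supplied service arguments.
        args.append("--log-path=%s" % log_path)
    # The executable comes first, then --port, then the remaining arguments.
    return [executable_path, "--port=%d" % port] + args


print(build_command("chromedriver", 9515, ["--verbose"], "driver.log"))
```

This list is what the base class start method later hands to subprocess.Popen.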

Look at the start method of the base class (the base class is too long to show in full; it lives in selenium.webdriver.common.service):

def start(self):
        """
        Starts the Service.

        :Exceptions:
         - WebDriverException : Raised either when it can't start the service
           or when it can't connect to the service
        """
        try:
            cmd = [self.path]
            cmd.extend(self.command_line_args())
            self.process = subprocess.Popen(cmd, env=self.env,
                                            close_fds=platform.system() != 'Windows',
                                            stdout=self.log_file,
                                            stderr=self.log_file,
                                            stdin=PIPE)
        except TypeError:
            raise
        except OSError as err:
            if err.errno == errno.ENOENT:
                raise WebDriverException(
                    "'%s' executable needs to be in PATH. %s" % (
                        os.path.basename(self.path), self.start_error_message)
                )
            elif err.errno == errno.EACCES:
                raise WebDriverException(
                    "'%s' executable may have wrong permissions. %s" % (
                        os.path.basename(self.path), self.start_error_message)
                )
            else:
                raise
        except Exception as e:
            raise WebDriverException(
                "The executable %s needs to be available in the path. %s\n%s" %
                (os.path.basename(self.path), self.start_error_message, str(e)))
        count = 0
        while True:
            self.assert_process_still_running()
            if self.is_connectable():
                break
            count += 1
            time.sleep(1)
            if count == 30:
                raise WebDriverException("Can not connect to the Service %s" % self.path)

The part of interest is:

        try:
            cmd = [self.path]
            cmd.extend(self.command_line_args())
            self.process = subprocess.Popen(cmd, env=self.env,
                                            close_fds=platform.system() != 'Windows',
                                            stdout=self.log_file,
                                            stderr=self.log_file,
                                            stdin=PIPE)
        except TypeError:
            raise
        except OSError as err:
            if err.errno == errno.ENOENT:
                raise WebDriverException(
                    "'%s' executable needs to be in PATH. %s" % (
                        os.path.basename(self.path), self.start_error_message)
                )
            elif err.errno == errno.EACCES:
                raise WebDriverException(
                    "'%s' executable may have wrong permissions. %s" % (
                        os.path.basename(self.path), self.start_error_message)
                )
            else:
                raise
        except Exception as e:
            raise WebDriverException(
                "The executable %s needs to be available in the path. %s\n%s" %
                (os.path.basename(self.path), self.start_error_message, str(e)))
        count = 0
        while True:
            self.assert_process_still_running()
            if self.is_connectable():
                break
            count += 1
            time.sleep(1)
            if count == 30:
                raise WebDriverException("Can not connect to the Service %s" % self.path)

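This starts the driver as a child process. If launching fails, the exception is caught and re-raised as a WebDriverException with a clearer message. The loop that follows polls once per second, up to 30 times, until the driver accepts connections. With the driver running, the browser itself is opened when the session is created in RemoteWebDriver.__init__.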
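The two halves of start, launching the process and waiting for it to become reachable, can be sketched on their own with the standard library. This is a simplified model, not the Selenium implementation: launch and wait_until are hypothetical names, RuntimeError stands in for WebDriverException, and the readiness check is passed in as a callable instead of being a socket probe.

```python
import errno
import subprocess
import time


def launch(cmd):
    """Start cmd as a child process; report a missing executable clearly."""
    try:
        return subprocess.Popen(cmd, stdin=subprocess.PIPE,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except OSError as err:
        if err.errno == errno.ENOENT:
            # ENOENT means the executable could not be found.
            raise RuntimeError(
                "'%s' executable needs to be in PATH" % cmd[0]) from err
        raise


def wait_until(is_ready, attempts=30, delay=1.0):
    """Call is_ready() up to `attempts` times, sleeping `delay` between tries."""
    for _ in range(attempts):
        if is_ready():
            return True
        time.sleep(delay)
    raise RuntimeError("service did not become connectable in time")
```

A caller would first run launch([...]) and then wait_until(...) with a check that tries to connect to the chosen port, which is the role is_connectable plays in the real method.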

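With the launch and its exception handling covered, we now know how Selenium starts the service. Next, trace how get requests a URL.
Open the WebDriver base class (selenium.webdriver.remote.webdriver) and find the get method: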

def get(self, url):
    """
    Loads a web page in the current browser session.
    """
    self.execute(Command.GET, {'url': url})

def execute(self, driver_command, params=None):
        """
        Sends a command to be executed by a command.CommandExecutor.

        :Args:
         - driver_command: The name of the command to execute as a string.
         - params: A dictionary of named parameters to send with the command.

        :Returns:
          The command's JSON response loaded into a dictionary object.
        """
        if self.session_id is not None:
            if not params:
                params = {'sessionId': self.session_id}
            elif 'sessionId' not in params:
                params['sessionId'] = self.session_id

        params = self._wrap_value(params)
        response = self.command_executor.execute(driver_command, params)
        if response:
            self.error_handler.check_response(response)
            response['value'] = self._unwrap_value(
                response.get('value', None))
            return response
        # If the server doesn't send a response, assume the command was
        # a success
        return {'success': 0, 'value': None, 'sessionId': self.session_id}

The get method calls execute, passing Command.GET and the url.
Command.GET belongs to the Command class (selenium.webdriver.remote.command), which defines constants for the standard WebDriver commands. The GET constant is:

GET = "get"

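Judging from the file, Command is simply the class that names the commands that can be executed.
First, a summary of the flow:

  • start the service → call get

The get call in detail:

  • get calls execute with Command.GET and the url; Command.GET is a standard command constant.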
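The flow summarized above can be made visible with a small stand-in. In the sketch below, MiniDriver copies only the session-id handling from the execute method listed earlier, and RecordingExecutor is a hypothetical replacement for the command executor that records what it receives instead of talking to a browser.

```python
class RecordingExecutor:
    """Stand-in for the command executor; it only records its input."""

    def __init__(self):
        self.calls = []

    def execute(self, command, params):
        self.calls.append((command, dict(params)))
        return {"value": None}


class MiniDriver:
    GET = "get"  # same string value as Command.GET

    def __init__(self, executor, session_id):
        self.command_executor = executor
        self.session_id = session_id

    def execute(self, driver_command, params=None):
        # Attach the session id to the parameters, as the real execute does.
        if self.session_id is not None:
            if not params:
                params = {"sessionId": self.session_id}
            elif "sessionId" not in params:
                params["sessionId"] = self.session_id
        return self.command_executor.execute(driver_command, params)

    def get(self, url):
        self.execute(self.GET, {"url": url})


executor = RecordingExecutor()
MiniDriver(executor, session_id="abc123").get("http://www.csdn.net")
print(executor.calls)
# [('get', {'url': 'http://www.csdn.net', 'sessionId': 'abc123'})]
```

So by the time the command reaches the executor it is just a command name plus a dictionary holding the URL and the session id.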

Within execute, the core code is:

params = self._wrap_value(params)
response = self.command_executor.execute(driver_command, params)
if response:
    self.error_handler.check_response(response)
    response['value'] = self._unwrap_value(
        response.get('value', None))
    return response

The line to focus on is:

self.command_executor.execute(driver_command, params)

command_executor is an instance created during initialization. In the derived WebDriver class (selenium.webdriver.chrome.webdriver), it is instantiated as follows:

RemoteWebDriver.__init__(
                self,
                command_executor=ChromeRemoteConnection(
                    remote_server_addr=self.service.service_url,
                    keep_alive=keep_alive),
                desired_capabilities=desired_capabilities)

Look at the ChromeRemoteConnection class (selenium.webdriver.chrome.remote_connection):

from selenium.webdriver.remote.remote_connection import RemoteConnection


class ChromeRemoteConnection(RemoteConnection):

    def __init__(self, remote_server_addr, keep_alive=True):
        RemoteConnection.__init__(self, remote_server_addr, keep_alive)
        self._commands["launchApp"] = ('POST', '/session/$sessionId/chromium/launch_app')
        self._commands["setNetworkConditions"] = ('POST', '/session/$sessionId/chromium/network_conditions')
        self._commands["getNetworkConditions"] = ('GET', '/session/$sessionId/chromium/network_conditions')
        self._commands['executeCdpCommand'] = ('POST', '/session/$sessionId/goog/cdp/execute')

It calls the base class initializer. In the base class, execute is implemented as follows:

def execute(self, command, params):
        """
        Send a command to the remote server.

        Any path subtitutions required for the URL mapped to the command should be
        included in the command parameters.

        :Args:
         - command - A string specifying the command to execute.
         - params - A dictionary of named parameters to send with the command as
           its JSON payload.
        """
        command_info = self._commands[command]
        assert command_info is not None, 'Unrecognised command %s' % command
        path = string.Template(command_info[1]).substitute(params)
        if hasattr(self, 'w3c') and self.w3c and isinstance(params, dict) and 'sessionId' in params:
            del params['sessionId']
        data = utils.dump_json(params)
        url = '%s%s' % (self._url, path)
        return self._request(command_info[0], url, body=data)

    def _request(self, method, url, body=None):
        """
        Send an HTTP request to the remote server.

        :Args:
         - method - A string for the HTTP method to send the request with.
         - url - A string for the URL to send the request to.
         - body - A string for request body. Ignored unless method is POST or PUT.

        :Returns:
          A dictionary with the server's parsed JSON response.
        """
        LOGGER.debug('%s %s %s' % (method, url, body))

        parsed_url = parse.urlparse(url)
        headers = self.get_remote_connection_headers(parsed_url, self.keep_alive)
        resp = None
        if body and method != 'POST' and method != 'PUT':
            body = None

        if self.keep_alive:
            resp = self._conn.request(method, url, body=body, headers=headers)

            statuscode = resp.status
        else:
            http = urllib3.PoolManager(timeout=self._timeout)
            resp = http.request(method, url, body=body, headers=headers)

            statuscode = resp.status
            if not hasattr(resp, 'getheader'):
                if hasattr(resp.headers, 'getheader'):
                    resp.getheader = lambda x: resp.headers.getheader(x)
                elif hasattr(resp.headers, 'get'):
                    resp.getheader = lambda x: resp.headers.get(x)

        data = resp.data.decode('UTF-8')
        try:
            if 300 <= statuscode < 304:
                return self._request('GET', resp.getheader('location'))
            if 399 < statuscode <= 500:
                return {'status': statuscode, 'value': data}
            content_type = []
            if resp.getheader('Content-Type') is not None:
                content_type = resp.getheader('Content-Type').split(';')
            if not any([x.startswith('image/png') for x in content_type]):

                try:
                    data = utils.load_json(data.strip())
                except ValueError:
                    if 199 < statuscode < 300:
                        status = ErrorCode.SUCCESS
                    else:
                        status = ErrorCode.UNKNOWN_ERROR
                    return {'status': status, 'value': data.strip()}

                # Some of the drivers incorrectly return a response
                # with no 'value' field when they should return null.
                if 'value' not in data:
                    data['value'] = None
                return data
            else:
                data = {'status': 0, 'value': data}
                return data
        finally:
            LOGGER.debug("Finished Request")
            resp.close()

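From this implementation, execute sends a request to the remote server. The _request method it calls sends the HTTP request and returns the parsed result; the browser carries out the command and the response comes back through the driver.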
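How a command name turns into an HTTP request can be sketched without a browser. The example below imitates the lookup and string.Template substitution done in execute. build_request is a hypothetical name, the base URL is an assumed local driver address, and the two COMMANDS entries reflect my understanding of how Selenium 3 maps these commands; check remote_connection.py for the authoritative table.

```python
import json
import string

# command name -> (HTTP method, path template); a two-entry excerpt
COMMANDS = {
    "get": ("POST", "/session/$sessionId/url"),
    "getTitle": ("GET", "/session/$sessionId/title"),
}


def build_request(base_url, command, params, w3c=True):
    """Return (method, url, body) for a command, without sending anything."""
    method, path_template = COMMANDS[command]
    # $sessionId in the template is filled in from the parameters.
    path = string.Template(path_template).substitute(params)
    payload = dict(params)
    if w3c:
        # In W3C mode the session id travels in the URL only.
        payload.pop("sessionId", None)
    return method, base_url + path, json.dumps(payload)


print(build_request("http://127.0.0.1:9515", "get",
                    {"sessionId": "abc123", "url": "http://www.csdn.net"}))
```

Seen this way, driver.get(url) amounts to a POST to the session's url endpoint with the target address in the JSON body.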

The official documentation explains how this communication works:

At its minimum, WebDriver talks to a browser through a driver.
Communication is two way: WebDriver passes commands to the browser through the driver, and receives information back via the same route.
The driver is specific to the browser, such as ChromeDriver for Google’s Chrome/Chromium, GeckoDriver for Mozilla’s Firefox, etc. The driver runs on the same system as the browser. This may, or may not be, the same system where the tests themselves are executing.
This simple example above is direct communication. Communication to the browser may also be remote communication through Selenium Server or RemoteWebDriver. RemoteWebDriver runs on the same system as the driver and the browser.

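In short, we talk to the browser through WebDriver, and the browser responds.

The examples above show that a request sent with execute reaches the browser through the driver, and that sending one of the predefined command constants can return information about the page.

Since the object instantiated in our code is a webdriver instance, look in the WebDriver base class (selenium.webdriver.remote.webdriver) for functions that retrieve such information. The following turn up: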

@property
def title(self):
    """Returns the title of the current page.

    :Usage:
        title = driver.title
    """
    resp = self.execute(Command.GET_TITLE)
    return resp['value'] if resp['value'] is not None else ""

@property
def current_url(self):
    """
    Gets the URL of the current page.

    :Usage:
        driver.current_url
    """
    return self.execute(Command.GET_CURRENT_URL)['value']

@property
def page_source(self):
    """
    Gets the source of the current page.

    :Usage:
        driver.page_source
    """
    return self.execute(Command.GET_PAGE_SOURCE)['value']

This is not the full list. Let's try these out; the usage is given in each docstring. Fetch title (the page title), current_url (the current URL), and page_source (the page source):

from selenium import webdriver
driver = webdriver.Chrome()
driver.get("http://www.csdn.net")
print(driver.title)
print(driver.current_url)
print("Author's blog: https://blog.csdn.net/A757291228")
# Support original work; please include a link to the original when reposting
# print(driver.page_source)

The page title and current URL are retrieved successfully.
Now try page_source:

from selenium import webdriver
driver = webdriver.Chrome()
driver.get("http://www.csdn.net")
print(driver.title)
print(driver.current_url)
print("Author's blog: https://blog.csdn.net/A757291228")
# Support original work; please include a link when reposting
print(driver.page_source)

The page source is retrieved successfully.
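The reason driver.title is read without parentheses is that these accessors are properties wrapping execute. A minimal model of that pattern follows; Page and fake_execute are hypothetical names, and the canned response is made up for illustration.

```python
class Page:
    def __init__(self, execute):
        # execute is any callable standing in for WebDriver.execute
        self._execute = execute

    @property
    def title(self):
        # Reading page.title runs a command behind the scenes.
        resp = self._execute("getTitle")
        return resp["value"] if resp["value"] is not None else ""


def fake_execute(command):
    # Canned responses keyed by command name (illustrative data only).
    responses = {"getTitle": {"value": "Example page"}}
    return responses.get(command, {"value": None})


page = Page(fake_execute)
print(page.title)
```

Each attribute read therefore costs one round trip to the driver, which is worth keeping in mind when reading page_source repeatedly in a loop.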
Original writing takes effort. If you have read this far, a like would be much appreciated. Thanks!
