It would be absurd to claim that any web developer has never heard of a URL. Ask who has actually studied URLs in detail, though, and only a handful remain.

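In his day-to-day work, Lao Zhang has found that some colleagues have never taken the initiative to learn about URLs. URLs have a long history, are used everywhere, come in many forms under loosely defined standards, and are at once familiar and unfamiliar.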

Today Lao Zhang makes URLs the subject of the first installment of the "Web Development Advanced" series and walks through the basics.

URI: URL and URN

The URL (Uniform Resource Locator) we usually talk about is in fact a subset of the URI (Uniform Resource Identifier). Besides URLs, URIs come in one other form: the URN (Uniform Resource Name).

With a URI, a client can specify the internet resource it wants to retrieve, but URLs and URNs differ in nature.

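A URL describes the specific location of a resource on a specific server.
A URN is independent of any particular server; the resource name alone is enough to locate and access the resource.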

Compared with URNs, URLs are far more familiar. Any developer can easily tell that the following three URLs point to different resource locations:

http://www.example.com:8080/1.html
http://www.example.com/1.html
http://www.example.com/2.htm
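One way to see why these three URLs are different locations is to compare their (host, port, path) triples. The sketch below uses Python's standard `urllib.parse`. The helper name `location_key` is made up for illustration, and it assumes the HTTP default port of 80 when a URL does not give one.

```python
from urllib.parse import urlsplit


def location_key(url):
    """Return (host, effective port, path) for an http URL (illustrative helper)."""
    parts = urlsplit(url)
    # Assumption: a missing port means the http default, 80.
    port = parts.port if parts.port is not None else 80
    return parts.hostname, port, parts.path


urls = [
    "http://www.example.com:8080/1.html",
    "http://www.example.com/1.html",
    "http://www.example.com/2.htm",
]
keys = [location_key(u) for u in urls]
for url, key in zip(urls, keys):
    print(url, "->", key)

# Three distinct keys means three distinct locations.
print(len(set(keys)) == 3)
```

The first two URLs differ only in port, and the last two differ only in path, which is enough to make each one a separate location.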

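People who are used to URLs may not even know that URNs exist, and may wonder why a URN needs no server location. In fact, the magnet links used for downloads are a familiar example: a magnet link carries a URN in its xt parameter, and the sample below should make the idea clear: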

magnet:?xt=urn:btih:EP3XFJ7BOAFA6GFJTNZKRQ6CIN7A5AB5

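With nothing more than this cryptic string we can download the corresponding resource, without caring which corner of the internet it actually lives in.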
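To make the structure of the magnet link concrete, here is a small sketch that pulls the URN out of it with Python's standard `urllib.parse`. The variable names are illustrative only; the point is that the link contains no host at all, just a name in the `urn:btih` namespace.

```python
from urllib.parse import urlsplit, parse_qs

magnet = "magnet:?xt=urn:btih:EP3XFJ7BOAFA6GFJTNZKRQ6CIN7A5AB5"

parts = urlsplit(magnet)
# The URN sits in the "xt" query parameter.
xt = parse_qs(parts.query)["xt"][0]
# A URN has the shape urn:<namespace>:<name>.
prefix, namespace, name = xt.split(":", 2)

print(parts.scheme)        # scheme of the magnet link itself
print(parts.netloc == "")  # no server location anywhere in the link
print(prefix, namespace, name)
```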

Common forms of a URL

Besides HTTP, many other protocols (FTP, for example) also use URLs to locate resources. The URLs used by most protocols fit the following general format:

<scheme>://<user>:<password>@<host>:<port>/<path>;<params>?<query>#<frag>

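scheme: the protocol name. It tells the client which protocol to use when accessing the server; common values include HTTP, HTTPS, FTP, mailto and telnet.
user: the user name. Commonly seen with FTP.
password: the password. Used together with user for authentication.
host: the host address. It can be an IP address or a domain name.
port: the port. When omitted, a default is used, and the default differs from protocol to protocol.
path: the path. In general it follows UNIX file path conventions.
params: the parameters. Multiple parameters are likewise separated by ";".
query: the query string. Its form may differ between protocols.
frag: the fragment. Mainly used on the client side.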
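The component list above can be checked against a real parser. The following sketch feeds a made-up URL (the user name, password, host and parameter values are all invented for illustration) to `urlparse` from Python's standard library and prints each component under the name used in this article.

```python
from urllib.parse import urlparse

# Hypothetical URL exercising every component of the general format.
url = "http://user:secret@www.example.com:8080/docs/1.html;v=2?name=zhang3#top"
p = urlparse(url)

components = {
    "scheme": p.scheme,
    "user": p.username,
    "password": p.password,
    "host": p.hostname,
    "port": p.port,
    "path": p.path,
    "params": p.params,
    "query": p.query,
    "frag": p.fragment,
}
for name, value in components.items():
    print(f"{name:>8}: {value!r}")
```

Note that `urlparse` only splits params off the last path segment, which is one of the places where real implementations narrow the general format.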

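That covers the general form of a URL, which captures nearly everything needed to request a resource on the internet. For HTTP specifically, the format is much simpler.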

The URL form used by HTTP

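RFC 1945 ("Hypertext Transfer Protocol -- HTTP/1.0") defines the standard form of http_URL. Written with the query component spelled out explicitly, as the later RFC 2616 (HTTP/1.1) does, it reads: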

http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]

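The port may be omitted, in which case the default value 80 is used. The RFC also allows the host to be given as an IP address, so implementations need to handle that form. Most of us have seen URLs containing an IPv4 address, but have you ever seen one containing an IPv6 address?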
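For readers who have not met one, an IPv6 host in a URL is written inside square brackets so that the colons of the address cannot be confused with the port delimiter. A minimal sketch using Python's standard `urlsplit`, with an address from the 2001:db8::/32 documentation range chosen purely as an example:

```python
from urllib.parse import urlsplit

# Example URL with a bracketed IPv6 literal as the host.
parts = urlsplit("http://[2001:db8::1]:8080/1.html")

print(parts.netloc)    # host and port as written, brackets included
print(parts.hostname)  # host only, brackets stripped
print(parts.port)      # port as an integer
print(parts.path)
```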

Note that although the standard grammar does not mention frag, browsers in practice all support anchors, and some even support user.
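One way to see why the http_URL grammar can leave frag out: the fragment is handled on the client side and is not part of what gets sent to the server. The sketch below uses `urldefrag` from Python's standard library to separate the two parts; the variable names are illustrative.

```python
from urllib.parse import urldefrag

url = "http://www.example.com/2.html#anchor"

# Split the URL into the part used for the request and the client-side anchor.
request_url, frag = urldefrag(url)

print(request_url)
print(frag)
```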

Implementing URL parsing and assembly functions

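Textual URLs are highly extensible, but, like HTML, they are far less machine-friendly than a binary format. On top of that, an RFC is a specification and carries no enforcement of its own, so concrete implementations of http_URL differ from one another.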

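All in all, parsing the everyday URL is not as simple as it looks. Lao Zhang has written a toy URL parser and assembler in Python here. It exists only to help readers follow this article; please do not use it in real projects.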

"""
@author: zhang3
"""
__all__ = ["parse_http_url", "unparse_http_url"]


def unparse_http_url(scheme, host, port=80, path="", query="", frag=""):
    """
    Assemble an http_URL from the given components.
    """
    if scheme.lower() != "http":
        raise ValueError("only support http scheme")
    url = "%s://" % scheme

    if not host:
        raise ValueError("host is needed")
    url += host

    if not port or port in [80, "80"]:
        pass
    else:
        url += ":%s" % port

    if path:
        if not path.startswith("/"):
            raise ValueError("illegal path")
        url += path

    if query:
        url += "?" + query
    if frag:
        url += "#" + frag

    return url


def parse_http_url(url):
    """
    Parse url as http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query [ "#" frag ]]]
    Only the http scheme is supported.

    >>> parse_http_url("http://www.example.com:8080/1.html")
    ('http', 'www.example.com', 8080, '/1.html', '', '')

    >>> parse_http_url("http://www.example.com/1.html?name=zhang3")
    ('http', 'www.example.com', 80, '/1.html', 'name=zhang3', '')

    >>> parse_http_url("http://www.example.com/2.html#anchor")
    ('http', 'www.example.com', 80, '/2.html', '', 'anchor')
    """
    if not url.lower().startswith("http://"):
        raise ValueError("scheme must be http")
    scheme, url = url.split("://", 1)  # split only on the first "://"
    loc, url = _split_loc(url)
    host, port = _split_host_port(loc)
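    query = frag = ""
    if "#" in url:
        # split once: everything after the first "#" is the fragment
        url, frag = url.split("#", 1)

    if "?" in url:
        # split once: the query itself may contain "?"
        url, query = url.split("?", 1)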

    path = url

    return scheme, host, port, path, query, frag



def _split_loc(url):
    delim_index = len(url)
    for delim in "/?#":
        i = url.find(delim)
        if i >= 0:
            delim_index = min(i, delim_index)
    return url[:delim_index], url[delim_index:]


def _split_host_port(loc):
    host, port_ = "", ""
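    if loc.startswith("["):
        # bracketed IPv6 literal, e.g. [::1]:8080
        i = loc.find("]")
        if i < 0:
            raise ValueError("illegal IPV6 host")
        host, rest = loc[:i+1], loc[i+1:]
        # anything after "]" must be ":" followed by the port
        if rest and not rest.startswith(":"):
            raise ValueError("illegal port")
        port_ = rest[1:]
    elif ":" in loc:
        # split on the last ":" so extra colons cannot break the unpacking
        host, port_ = loc.rsplit(":", 1)
    else:
        host = loc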

    if port_:
        if not port_.isdigit():
            raise ValueError("illegal port")
        port = int(port_)
    else:
        port = 80

    return host, port


if __name__ == "__main__":
    import doctest
    doctest.testmod()

Running a few test commands in the interactive interpreter gives the following:

>>> url = "http://www.example.com/1.html?name=zhang3"

>>> parse_http_url(url)
('http', 'www.example.com', 80, '/1.html', 'name=zhang3', '')

>>> unparse_http_url(*parse_http_url(url))
'http://www.example.com/1.html?name=zhang3'
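Since the toy functions are not meant for real development, it may help to see the same round trip done with Python's standard library. This is only a sketch of one reasonable alternative, not part of the author's code; `urlsplit` and `urlunsplit` play the roles of the parse and unparse functions.

```python
from urllib.parse import urlsplit, urlunsplit

url = "http://www.example.com/1.html?name=zhang3"

# Parse into (scheme, netloc, path, query, fragment).
parts = urlsplit(url)
print(tuple(parts))

# Reassemble; for this URL the round trip gives back the original string.
rebuilt = urlunsplit(parts)
print(rebuilt == url)

# The result is a named tuple, so one component can be swapped before rebuilding.
changed = urlunsplit(parts._replace(netloc="www.example.com:8080"))
print(changed)
```

Unlike the toy parser, `urlsplit` leaves the port as `None` when the URL does not specify one, so applying the default of 80 is left to the caller.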

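When preparations are thorough, vigilance slackens; what is seen often arouses no suspicion.

From the Thirty-Six Stratagems, "Deceive the Heavens to Cross the Sea"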

What Is Seen Often Arouses No Suspicion: URL
