Tips for Handling Strings and Text in Python

Original

2020-02-26 12:28

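1. Splitting a String on Multiple Delimiters

The split() method of string objects only suits very simple splitting cases. It does
not handle multiple delimiters or a variable amount of whitespace around the delimiters.
When you need more flexibility, use re.split() instead: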

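First, a quick reference for regular expression syntax
# Regular expressions
Rules:
    Single characters:
            .  : any character except a newline
            [] : any one character from the set
            \d : a digit
            \D : a non-digit
            \w : a word character: letter, digit, or underscore (for str patterns in Python 3 this includes Unicode letters such as Chinese characters)
            \W : a non-word character
            \s : a whitespace character (space, tab, newline, etc.)
            \S : a non-whitespace character
    Quantifiers:
             * : zero or more times
             + : one or more times
             ? : zero or one time (optional); placed after another quantifier, it makes that quantifier non-greedy
           {m} : exactly m times
          {m,} : at least m times
         {m,n} : between m and n times
    Anchors:
             ^ : matches at the start of the string
             $ : matches at the end of the string (or just before a trailing newline)
    Common combinations and functions:
            .* : any character, any number of times, greedy
           .*? : any character, any number of times, non-greedy
           r = re.compile(r'pattern', re.S) :
                              most common usage: compile the pattern once and keep it in a variable for reuse (re.S lets . match newlines too)
           re.match(pattern, string) / re.search(pattern, string) : match only at the start of the string / find the first match anywhere
           re.findall(pattern, string) : return all non-overlapping matches as a list
           re.sub(pattern, replacement, string) : replace every match with replacement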
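A minimal sketch of a few of the rules above in action. The sample strings and variable names are made up for illustration; only the `re` functions themselves come from the list.

```python
import re

text = '<b>one</b> and <b>two</b>'  # illustrative sample string

# Greedy: .* runs to the LAST </b>, so there is a single long match
greedy = re.findall(r'<b>.*</b>', text)

# Non-greedy: .*? stops at the FIRST </b>; the group returns only the inner text
lazy = re.findall(r'<b>(.*?)</b>', text)

# Compile once, reuse many times
digits = re.compile(r'\d+')
found = digits.findall('a1b22c333')
masked = digits.sub('#', 'a1b22c333')

# match() only looks at the start of the string; search() scans the whole string
at_start = re.match(r'\d+', 'abc123')
anywhere = re.search(r'\d+', 'abc123')
```

Here `greedy` holds one match covering the whole string, while `lazy` holds `['one', 'two']`. `at_start` is `None` because the string does not begin with a digit, whereas `anywhere.group()` is `'123'`.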

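Now for splitting a string on multiple delimiters

# Example from the Python Cookbook
import re

line = 'asdf fjdk;  afed, fjek,asdf, foo'

# Delimiter: a semicolon, comma, or whitespace character, followed by any amount of whitespace
re.split(r'[;,\s]\s*', line)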
Out[87]: ['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
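To see why re.split() is needed here, this short sketch runs both approaches on the same line (the variable names `plain` and `fields` are only for illustration):

```python
import re

line = 'asdf fjdk;  afed, fjek,asdf, foo'

# str.split() with no argument splits on runs of whitespace only,
# so the ';' and ',' characters stay attached to the fields
plain = line.split()

# re.split() treats ';', ',' and whitespace as delimiters alike
fields = re.split(r'[;,\s]\s*', line)

print(plain)
print(fields)
```

With str.split(), 'fjek,asdf,' comes back as a single field because no whitespace separates its parts.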

2. Matching Text at the Start or End of a String

A simple way to check the start or end of a string is to use the str.startswith() or
str.endswith() method. For example:


filename = 'spam.txt'     # define a variable

filename.endswith('.txt')  # check the ending
Out[95]: True

filename.startswith('file:')  # check the beginning
Out[96]: False

url = 'http://www.python.org' # define another variable
url.startswith('http:')       # check the beginning
Out[98]: True


# To check against several possible matches, put all the candidates in a tuple and
# pass it to startswith() or endswith():

filenames=[ 'Makefile', 'foo.c', 'bar.py', 'spam.c', 'spam.h' ]

filenames
Out[103]: ['Makefile', 'foo.c', 'bar.py', 'spam.c', 'spam.h']
[name for name in filenames if name.endswith(('.c', '.h'))]
Out[104]: ['foo.c', 'spam.c', 'spam.h']

any(name.endswith('.py') for name in filenames)
Out[105]: True
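One detail worth spelling out: startswith() and endswith() accept a str or a tuple of str, not a list. A small sketch, assuming the candidates happen to live in a list (the names `prefixes` and `accepted_list` are hypothetical):

```python
prefixes = ['http:', 'ftp:']     # hypothetical list of candidate prefixes
url = 'http://www.python.org'

try:
    url.startswith(prefixes)     # a list is rejected with TypeError
    accepted_list = True
except TypeError:
    accepted_list = False

# Converting to a tuple first makes the call valid
matches = url.startswith(tuple(prefixes))
print(accepted_list, matches)
```

So when the candidates come from a list or a set, wrap them in tuple() before the call.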


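# Another example

from urllib.request import urlopen
def read_data(name):
    # Remote resources are fetched with urlopen; anything else is treated as a local file
    if name.startswith(('http:', 'https:', 'ftp:')):
        return urlopen(name).read()   # returns bytes
    else:
        with open(name) as f:
            return f.read()           # returns str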


url = 'http://www.python.org'

read_data(url)
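The same prefix test could also be written with a regular expression, although startswith() is usually simpler to read. A sketch for comparison only; `is_remote` and `is_remote_re` are hypothetical helpers, not part of the example above:

```python
import re

def is_remote(name):
    # Hypothetical helper: prefix check with a tuple of URL schemes
    return name.startswith(('http:', 'https:', 'ftp:'))

def is_remote_re(name):
    # Hypothetical helper: the same check via re.match, which anchors at the start
    return re.match(r'(http|https|ftp):', name) is not None

for candidate in ['http://www.python.org', 'https://example.com', 'spam.txt']:
    print(candidate, is_remote(candidate), is_remote_re(candidate))
```

Both helpers agree on these inputs. The regular expression version only pays off when the prefix pattern is more complex than a fixed list of strings.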
