Python Basics: String Formatting (str.format)

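The previous article covered printf-style string formatting in Python. As noted there, the string formatting approach currently recommended by the official documentation is the format method, so this article goes through str.format formatting in detail, with working code to reinforce each point.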

String formatting with format()


What is str.format?

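str.format() is a method of the string type that performs a string formatting operation.

Since format is a function, we need to look at its definition, how it is called, its inputs, and its output.

The rest of this section reads str.format() from those four angles.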

str.format(*args, **kwargs)

Perform a string formatting operation. The string on which this method is called can contain literal text or replacement fields delimited by braces {}. Each replacement field contains either the numeric index of a positional argument, or the name of a keyword argument. Returns a copy of the string where each replacement field is replaced with the string value of the corresponding argument.

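str.format() performs a string formatting operation.

  1. str is either plain literal text or literal text containing one or more replacement fields;
  2. Replacement field: each pair of curly braces marks one replacement field;
  3. Each replacement field corresponds to either the numeric index of a positional argument or the name of a keyword argument;
  4. Return value: a new string in which every replacement field has been replaced by its formatted argument. If there are no replacement fields, the result has the same content as the original string (whether it is the very same object is an implementation detail, so the id() values printed below may or may not match);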
str1 = "I'm string literal"
str1_new = str1.format()
print('str1 id:{}'.format(id(str1)))
print('str1_new id:{}'.format(id(str1_new)))

# somebody want to eat something
str2 = "{} want to eat {}"
str2_new = str2.format('漁道', '蘋果')
print('str2 id:{}, content:{}'.format(id(str2), str2))
print('str2_new id:{}, content:{}'.format(id(str2_new), str2_new))

str3 = "{1} want to eat {0}"
str3_new = str3.format('漁道', '蘋果')
print('str3 id:{}, content:{}'.format(id(str3), str3))
print('str3_new id:{}, content:{}'.format(id(str3_new), str3_new))

dict1 = {'name':'漁道', 'fruit':'蘋果'}
str4 = "{name} want to eat {fruit}"
str4_new = str4.format(name=dict1['name'], fruit=dict1['fruit'])
print('str4 id:{}, content:{}'.format(id(str4), str4))
print('str4_new id:{}, content:{}'.format(id(str4_new), str4_new))
# print("{name} want to eat {fruit}".format(fruit=dict1['fruit'], name=dict1['name']))
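When the values already live in a dict, the keyword arguments used above can be supplied by unpacking it. The following is a minimal sketch that reuses the dict1 name from the example; the template, unpacked and mapped names are introduced here only for illustration.

```python
dict1 = {'name': '漁道', 'fruit': '蘋果'}
template = "{name} want to eat {fruit}"

# ** unpacks the dict into the keyword arguments name=... and fruit=...
unpacked = template.format(**dict1)

# format_map() takes the mapping directly instead of keyword arguments
mapped = template.format_map(dict1)

print(unpacked)
print(mapped)
```

Both calls produce the same text; format_map is mostly useful when the mapping is a dict subclass with custom lookup behavior.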

Format String Syntax

Format strings contain “replacement fields” surrounded by curly braces {}. Anything that is not contained in braces is considered literal text, which is copied unchanged to the output. If you need to include a brace character in the literal text, it can be escaped by doubling: {{ and }}.

format string = string literals + replacement fields

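A format string is made up of literal text and replacement fields; either part may be absent.

A replacement field is whatever is enclosed in a pair of curly braces;

every character outside a replacement field is treated as literal text;

to put a literal brace character in the text, escape it by doubling it: {{ and }}.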

str4 = '{{}}, {}'
print(str4.format('漁道'))

# nested 
name_width = 10
price_width = 10
nested_fmt = '{{:<{}}}{{:>{}}}'.format(name_width, price_width)
print(nested_fmt)
print(nested_fmt.format("蘋果",5.98))
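The two-step build above can usually be collapsed into one call, because a format_spec may itself contain replacement fields. A small sketch under that assumption; nw and pw are hypothetical keyword names chosen for this example.

```python
name_width = 10
price_width = 10

# The inner {nw} and {pw} are substituted first, giving the specs '<10' and '>10'
row = '{:<{nw}}{:>{pw}}'.format('apple', 5.98, nw=name_width, pw=price_width)
print(repr(row))

# Doubled braces still come out as literal braces
literal = '{{}} and {}'.format('x')
print(literal)
```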

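Earlier we said that a replacement field is the "content" enclosed in a pair of curly braces, without explaining what that content actually is. Its precise definition follows.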

Replacement Fields Syntax

replacement_field ::= "{" [field_name] ["!" conversion] [":" format_spec] "}"
field_name ::= arg_name ("." attribute_name | "[" element_index "]")*
arg_name ::= [identifier | digit+]
attribute_name ::= identifier
element_index ::= digit+ | index_string
index_string ::= <any source character except "]"> +
conversion ::= "r" | "s" | "a"
format_spec ::= <described in the next section>

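From the grammar above, the content of a replacement field consists of three parts: field_name, conversion and format_spec. All three are optional: you can use just one, all three, or none at all.

field_name ties the field to a positional or keyword argument; in the output, the field is replaced by the value of that argument.

conversion selects one of three functions, str(), repr() or ascii(), to turn the value into a string. It must be preceded by an exclamation mark.

format_spec is the format specifier. It must be preceded by a colon.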
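To make the three optional parts concrete, here is a short sketch that applies them one at a time to the same value. The variable names are illustrative only.

```python
name = 'ab'

nothing = '{}'.format(name)          # all three parts omitted
only_field = '{0}'.format(name)      # field_name only
with_conv = '{0!r}'.format(name)     # field_name + conversion (repr adds quotes)
with_spec = '{0:>6}'.format(name)    # field_name + format_spec (right-align in 6 columns)
all_three = '{0!r:>6}'.format(name)  # conversion is applied first, then the format_spec

for text in (nothing, only_field, with_conv, with_spec, all_three):
    print(repr(text))
```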

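Characteristics of replacement fields

  1. field_name itself begins with an arg_name, which is either a number or a keyword. If arg_name is a number (digits), it refers to a positional argument; if it is an identifier, it refers to a keyword argument. If the numeric arg_names in a format string are 0, 1, 2, ... in sequence, they can all be omitted, and the positional arguments of format are inserted in order.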

    print("arg_name is number:{}".format(1)) 
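    # When the index is omitted, fields are numbered automatically from 0, so the {} above means {0}
    print('arg_name is number:{0}'.format(1))
    name = 'keyword'
    # An identifier as arg_name refers to a keyword argument
    print('arg_name is keyword:{name}'.format(name=name))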
    
    print('{0},{1},{2},{3},{4}'.format(1,2,3,4,5))
    print('{},{},{},{},{}'.format(1,2,3,4,5))
    
  2. arg_name is made up of an identifier or digits and is not quoted, so it cannot be an arbitrary dictionary key such as '12' or '=='.

    dict1 = {'name':'漁道', 'fruit':'蘋果'}
    print("{name},{fruit}".format(name=dict1['name'], fruit=dict1['fruit']))
    
    dict2 = {"12":'a', "==":'b'}
    print("{},{}".format(dict2['12'], dict2['==']))
    # print("{'12'},{'=='}".format(dict2)) # fails: arg_name cannot be an arbitrary dictionary key
    # print("{12},{==}".format(dict2)) # fails: {12} is read as positional index 12
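Keys like these cannot serve as arg_name, but according to the grammar above (element_index ::= digit+ | index_string) they can still be reached through an index expression on an argument. A short sketch reusing dict2; the result variable names are made up for this example.

```python
dict2 = {"12": 'a', "==": 'b'}

# An index is written without quotes; a non-numeric index is used as a string key
by_string_key = "{0[==]}".format(dict2)
print(by_string_key)

# An all-digit index is converted to an int, so this looks up dict2[12], not dict2['12']
try:
    "{0[12]}".format(dict2)
    digit_lookup_failed = False
except KeyError:
    digit_lookup_failed = True
print(digit_lookup_failed)
```

So string keys made only of digits remain out of reach from inside the format string; pass the value itself as an argument in that case.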
    
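  3. arg_name may be followed by an index expression or an attribute expression.

    An attribute expression has the form '.' + attribute_name

    An index expression has the form [element_index]

    # arg_name may be followed by any number of index or attribute expressions.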
    fruit1 = ['apple','banana','grape','pear']
    print("{0[0]}, {0[1]}, {0[2]}, {0[3]}".format(fruit1))
    print("{0[1]}, {0[3]}".format(fruit1))
    print("{0[1]}, {0[3]}, {params[0]}, {params[2]}".format(fruit1, params=fruit1))
    
    class Fruit:
        def __init__(self,name,weight,price):
            self.name = name
            self.weight = weight
            self.price = price
     
    apple = Fruit('apple', '0.23', '5.98')
    print("{0.name}'s weight is {0.weight}, {0.name}'s price is {0.price}".format(apple))
    print("{fruit.name}'s weight is {fruit.weight}, {fruit.name}'s price is {fruit.price}".format(fruit=apple))
    
  4. conversion causes a type coercion before formatting, forcing the object into a printable string. Three conversion flags are supported: !s, !r and !a.

    str2 = "漁道"
    tuple1 = (1,2)
    dict1 = {"name":"漁道", "fruit":"蘋果"}
    
    # !s calls str(), which returns a printable string for the object
    print('{0!s}'.format(str2))
    print('{0!s}'.format(tuple1))
    print('{0!s}'.format(tuple))
    print('{0!s}'.format(dict))
    print('{0!s}'.format(dict1))
    
    print("")
    # !r calls repr(): for a str it returns the text wrapped in quotes; for a class it returns an angle-bracketed string describing the class
    print('{0!r}'.format(str2))
    print('{0!r}'.format(tuple1))
    print('{0!r}'.format(tuple))
    print('{0!r}'.format(dict))
    print('{0!r}'.format(dict1))
    
    print("")
    # !a calls ascii(): like repr(), but non-ASCII characters are escaped as \x, \u or \U sequences
    print('{0!a}'.format(str2))
    print('{0!a}'.format(tuple1))
    print('{0!a}'.format(tuple))
    print('{0!a}'.format(dict))
    print('{0!a}'.format(dict1))
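The difference between the three flags is easiest to see on an object whose str() and repr() differ. The sketch below uses a hypothetical Fish class written only for this illustration.

```python
class Fish:
    """Hypothetical class used only to show which function each flag calls."""

    def __str__(self):
        return 'str:魚'

    def __repr__(self):
        return 'repr:魚'


fish = Fish()
as_s = '{!s}'.format(fish)  # calls str(fish)
as_r = '{!r}'.format(fish)  # calls repr(fish)
as_a = '{!a}'.format(fish)  # calls ascii(fish): repr() with non-ASCII characters escaped

print(as_s)
print(as_r)
print(as_a)
```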
    

format_spec is largely the same as the format_spec in printf-style formatting, so we only go through it briefly here.

standard format specifier

format_spec     ::=  [[fill]align][sign][#][0][width][grouping_option][.precision][type]
fill            ::=  <any character>
align           ::=  "<" | ">" | "=" | "^"
sign            ::=  "+" | "-" | " "
width           ::=  digit+
grouping_option ::=  "_" | ","
precision       ::=  digit+
type            ::=  "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "s" | "x" | "X" | "%"

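A format specification is used inside a replacement field of a format string to define how an individual value is presented.

By convention, an empty format specification produces the same result as calling str() on the value. So for ordinary printing, a plain "{}" is enough.

Next, each field of format_spec is explained in turn.

If a valid align value is specified, it can be preceded by a fill character. The fill can be any character and defaults to a space if omitted. A literal '{' or '}' cannot be used as the fill character.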

Alignment (align)

Meaning of the four alignment options:

Option Meaning
‘<’ Forces the field to be left-aligned within the available space (this is the default for most objects).
‘>’ Forces the field to be right-aligned within the available space (this is the default for numbers).
‘=’ Forces the padding to be placed after the sign (if any) but before the digits. This is used for printing fields in the form ‘+000000120’. This alignment option is only valid for numeric types. It becomes the default when ‘0’ immediately precedes the field width.
‘^’ Forces the field to be centered within the available space.

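Note that unless a minimum field width is defined, the field width is always the same size as the data that fills it, so the alignment option has no effect in that case.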

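# Alignment: use <, > and ^ for left, right and center alignment
# Numbers are right-aligned by default
# Strings are left-aligned by default

print("{:10}".format(123))  # right-aligned by default
print("{:<10}".format(123)) # left-aligned
print("{:^10}".format(123)) # centered

print("{:30}".format('helloworld')) # left-aligned by default
print("{:>30}".format('helloworld')) # right-aligned
print("{:^30}".format('helloworld')) # centered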
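The examples above all pad with spaces. The following sketch adds an explicit fill character and shows the '=' alignment, which has no example so far. The variable names are illustrative.

```python
centered = '{:*^10}'.format('hi')  # '*' is the fill character, '^' centers
dotted = '{:.>8}'.format(42)       # '.' is the fill character, '>' right-aligns
signed = '{:0=+8}'.format(120)     # '=' puts the padding between the sign and the digits
zero_pad = '{:+08}'.format(120)    # a '0' before the width implies '0' fill with '=' alignment

print(centered)
print(dotted)
print(signed)
print(zero_pad)
```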

Sign (sign)

Sign options (valid only for numeric types):

Option Meaning
‘+’ indicates that a sign should be used for both positive as well as negative numbers.
‘-’ indicates that a sign should be used only for negative numbers (this is the default behavior).
space indicates that a leading space should be used on positive numbers, and a minus sign on negative numbers.
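# Sign options
print("{},{}".format(123,-123))
print("{:+},{:+}".format(123,-123)) # + shows a sign on both positive and negative numbers
print("{:-},{:-}".format(123,-123)) # - shows a sign on negative numbers only
print("{: },{: }".format(123,-123)) # space shows a leading space on positive numbers and a minus sign on negative numbers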

The '#' option

The ‘#’ option causes the “alternate form” to be used for the conversion. The alternate form is defined differently for different types. This option is only valid for integer, float, complex and Decimal types. For integers, when binary,octal, or hexadecimal output is used, this option adds the prefix respective ‘0b’, ‘0o’, or ‘0x’ to the output value. For floats, complex and Decimal the alternate form causes the result of the conversion to always contain a decimal-point character, even if no digits follow it. Normally, a decimal-point character appears in the result of these conversions only if a digit follows it. In addition, for ‘g’ and ‘G’ conversions, trailing zeros are not removed from the result.

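The '#' option switches the conversion to its "alternate form". It is valid only for integer, float, complex and Decimal types.

For integers, when binary, octal or hexadecimal output is used, '#' adds the prefix '0b', '0o' or '0x' to the output.

For float, complex and Decimal, '#' makes the output always contain a decimal point, even if no digits follow it.

In addition, for 'g' and 'G' conversions, trailing zeros are not removed.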

# The '#' option
# Typically combined with b, o, x and X to mark the base and make the output easier to read
conversion_flag1 = '#b: {:#b}; b: {:b}'
conversion_flag2 = '#o: {:#o}; o: {:o}'
conversion_flag3 = '#x: {:#x}; x: {:x}'
conversion_flag4 = '#f: {:#f}; f: {:f}'
conversion_flag5 = '#e: {:#e}; e: {:e}'
conversion_flag6 = '#g: {:#g}; g: {:g}'
print(conversion_flag1.format(16,16))
print(conversion_flag2.format(16,16))
print(conversion_flag3.format(16,16))
print(conversion_flag4.format(16,16))
print(conversion_flag5.format(16,16))
print(conversion_flag6.format(16,16))
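With the default precision, '#' makes no visible difference for 'f' and 'e', because a decimal point is already present. The sketch below picks values where the alternate form does change the output; the variable names are made up for the example.

```python
hex_plain = '{:x}'.format(255)
hex_alt = '{:#x}'.format(255)    # adds the 0x prefix
hex_upper = '{:#X}'.format(255)  # upper-case type gives an upper-case prefix too

f_plain = '{:.0f}'.format(16.0)
f_alt = '{:#.0f}'.format(16.0)   # keeps the decimal point even with no digits after it

g_plain = '{:g}'.format(16.0)
g_alt = '{:#g}'.format(16.0)     # trailing zeros are kept up to 6 significant digits

print(hex_plain, hex_alt, hex_upper)
print(f_plain, f_alt)
print(g_plain, g_alt)
```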

Thousands separator

The ‘,’ option signals the use of a comma for a thousands separator.

A ',' is used as the thousands separator.

# Thousands separator
print("{:,}".format(12345678))

The ‘_’ option signals the use of an underscore for a thousands separator for floating point presentation types and for integer presentation type ‘d’. For integer presentation types ‘b’, ‘o’, ‘x’, and ‘X’, underscores will be inserted every 4 digits. For other presentation types, specifying this option is an error.

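For the floating-point presentation types and the integer presentation type 'd', '_' is the thousands separator.

For 'b', 'o', 'x' and 'X', '_' is inserted every 4 digits.

Specifying '_' with any other presentation type is an error.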

# Underscore
print("{:_}".format(12345678)) # for integers and floats, '_' is the thousands separator
print("{:_f}".format(1.2345678))
print("{:_f}".format(1234567.8))
print("{:_b}".format(64))   # for 'b', 'o', 'x' and 'X', '_' separates groups of 4 digits
print("{:_o}".format(6400))
print("{:_x}".format(640000))
print("{:_X}".format(640000))
# print("{:_n}".format(123456.78)) # raises ValueError
# print("{:_c}".format(123456.78)) # raises ValueError

Minimum field width (width)

width is a decimal integer defining the minimum total field width, including any prefixes, separators, and other formatting characters. If not specified, then the field width will be determined by the content.

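width is a decimal integer that defines the minimum field width. It covers not just the digits but also any prefixes, separators and other formatting characters. If width is not specified, the field width is determined by the length of the content.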
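A quick sketch of what "including any prefixes and separators" means in practice, and of width being only a minimum. The variable names are illustrative.

```python
padded_hex = '{:#010x}'.format(255)  # the '0x' prefix counts toward the width of 10
grouped = '{:10,}'.format(1234567)   # the two commas count toward the width of 10
too_narrow = '{:3}'.format(1234567)  # width is a minimum; the content is never truncated

print(padded_hex)
print(repr(grouped))
print(too_narrow)
```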

Precision (precision)

The precision is a decimal number indicating how many digits should be displayed after the decimal point for a floating point value formatted with ‘f’ and ‘F’, or before and after the decimal point for a floating point value formatted with ‘g’ or ‘G’. For non-number types the field indicates the maximum field size - in other words, how many characters will be used from the field content. The precision is not allowed for integer values.

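For the 'f' and 'F' presentation types, precision is the number of digits kept after the decimal point.

For the 'g' and 'G' presentation types, precision is the total number of significant digits.

For non-numeric types such as strings, precision is the maximum number of characters taken from the value.

precision is not allowed for integer values, whether they are formatted as binary, decimal or hexadecimal.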

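# Precision
print("{:.2f}".format(1.234567))
print("{:.4g}".format(1.234456))
print("{:.2s}".format("helloworld"))
# print("{:.2d}".format(16)) # raises ValueError: precision is not allowed for integers
# print("{:.2b}".format(16)) # raises ValueError: precision is not allowed for integers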

Presentation type (type)

String presentation types:

Type Meaning
‘s’ String format. This is the default type for strings and may be omitted.
None The same as ‘s’. In other words, a string input is displayed as a string by default.
# Strings
print("{}".format('hello'))
print("{:s}".format('world'))

Integer presentation types:

Type Meaning
‘b’ Binary format. Outputs the number in base 2.
‘c’ Character. Converts the integer to the corresponding unicode character before printing.
‘d’ Decimal Integer. Outputs the number in base 10.
‘o’ Octal format. Outputs the number in base 8.
‘x’ Hex format. Outputs the number in base 16, using lower-case letters for the digits above 9.
‘X’ Hex format. Outputs the number in base 16, using upper-case letters for the digits above 9.
‘n’ Number. This is the same as ‘d’, except that it uses the current locale setting to insert the appropriate number separator characters.
None The same as ‘d’. In other words, an integer input is displayed as a decimal number by default.
# Integers
print("{}".format(123))
print("{:b}".format(16))
print("{:c}".format(96))
print("{:d}".format(96))
print("{:o}".format(96))
print("{:x}".format(196))
print("{:X}".format(196))
print("{:n}".format(196))

Float and Decimal presentation types:

Type Meaning
‘e’ Exponent notation. Prints the number in scientific notation using the letter ‘e’ to indicate the exponent. The default precision is 6.
‘E’ Exponent notation. Same as ‘e’ except it uses an upper case ‘E’ as the separator character.
‘f’ Fixed-point notation. Displays the number as a fixed-point number. The default precision is 6.
‘F’ Fixed-point notation. Same as ‘f’, but converts nan to NAN and inf to INF.
‘g’ General format. For a given precision p >= 1, this rounds the number to p significant digits and then formats the result in either fixed-point format or in scientific notation, depending on its magnitude. The precise rules are as follows: suppose that the result formatted with presentation type ‘e’ and precision p-1 would have exponent exp. Then if -4 <= exp < p, the number is formatted with presentation type ‘f’ and precision p-1-exp. Otherwise, the number is formatted with presentation type ‘e’ and precision p-1. In both cases insignificant trailing zeros are removed from the significand, and the decimal point is also removed if there are no remaining digits following it, unless the ‘#’ option is used. Positive and negative infinity, positive and negative zero, and nans, are formatted as inf, -inf, 0, -0 and nan respectively, regardless of the precision. A precision of 0 is treated as equivalent to a precision of 1. The default precision is 6.
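In short: with precision p, the number is rounded to p significant digits. If its exponent exp satisfies -4 <= exp < p, it is shown in fixed-point notation;
otherwise (exp < -4 or exp >= p) it is shown in exponent notation.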
‘G’ General format. Same as ‘g’ except switches to ‘E’ if the number gets too large. The representations of infinity and NaN are uppercased, too.
‘n’ Number. This is the same as ‘g’, except that it uses the current locale setting to insert the appropriate number separator characters.
‘%’ Percentage. Multiplies the number by 100 and displays in fixed (‘f’) format, followed by a percent sign.
None Similar to ‘g’, except that fixed-point notation, when used, has at least one digit past the decimal point. The default precision is as high as needed to represent the particular value. The overall effect is to match the output of str() as altered by the other format modifiers.
# Floats and Decimals
print("{:e}".format(1234567.89))
print("{:E}".format(1234567.89))
print("{:f}".format(1234567.89))
print("{:F}".format(1234567.89))

print("{:g}".format(0.02))
print("{:.1g}".format(0.002345678)) # precision 1, exponent -3 (>= -4): fixed-point, rounded to 1 significant digit

print("{:g}".format(0.0000012345678)) # exponent below -4: shown in exponent notation
print("{:.3g}".format(0.0000012345678)) # exponent below -4: shown in exponent notation

print("{:.0%}".format(0.35))
print("{:.2%}".format(0.35))
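The 'g' rule quoted in the table above (fixed-point when -4 <= exp < p, exponent notation otherwise) can be checked on both sides of each boundary. This is a small sketch with precision p = 3; the variable names are chosen for the example.

```python
# p = 3 in every case; exp is the decimal exponent of the value

inside_upper = '{:.3g}'.format(123.45)       # exp = 2, 2 < p: fixed-point
outside_upper = '{:.3g}'.format(1234.5)      # exp = 3, not < p: exponent notation
inside_lower = '{:.3g}'.format(0.00012345)   # exp = -4, still >= -4: fixed-point
outside_lower = '{:.3g}'.format(0.000012345) # exp = -5, below -4: exponent notation

print(inside_upper)
print(outside_upper)
print(inside_lower)
print(outside_lower)
```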