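An In-Depth Comparison of String Concatenation Methods in Go

Preface

The main ways to concatenate strings in Go are "+", fmt.Sprintf with %s, and strings.Join. Many people have already compared them by timing them; this article instead looks at how each one is implemented in the source code, and then compares their performance on that basis.

Ways to concatenate strings

`"+"`

"+" is the most direct string concatenation operator that Go supports.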

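// Several "+" operators in one expression.
str := "a" + "b" + "c"

// concat joins every element of list with "+=" in a loop.
func concat(list []string) string {
    r := ""
    for _, v := range list {
        r += v
    }
    return r
}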

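The functions behind "+" can be found in runtime.go. How the call is generated is handled in cmd/compile/internal/gc/walk.go: the operator is OADDSTR and the function that handles it is addstr. When an expression concatenates 5 or fewer strings, the compiler calls the function for that operand count (concatstring2 through concatstring5). These all live in runtime/string.go and in turn call concatstrings. With more than 5 strings, concatstrings is called directly. Interested readers can trace the call path in detail. Here we focus on concatstrings, which does the actual concatenation.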

// The constant is known to the compiler.
// There is no fundamental theory behind this number.
const tmpStringBufSize = 32

type tmpBuf [tmpStringBufSize]byte
// concatstrings implements a Go string concatenation x+y+z+...
// The operands are passed in the slice a.
// If buf != nil, the compiler has determined that the result does not
// escape the calling function, so the string data can be stored in buf
// if small enough.
func concatstrings(buf *tmpBuf, a []string) string {
    idx := 0
    l := 0
    count := 0
    for i, x := range a {
        n := len(x)
        if n == 0 {
            continue
        }
        if l+n < l {
            throw("string concatenation too long")
        }
        l += n
        count++
        idx = i
    }
    if count == 0 {
        return ""
    }

    // If there is just one string and either it is not on the stack
    // or our result does not escape the calling frame (buf != nil),
    // then we can return that string directly.
    if count == 1 && (buf != nil || !stringDataOnStack(a[idx])) {
        return a[idx]
    }
    s, b := rawstringtmp(buf, l)
    for _, x := range a {
        copy(b, x)
        b = b[len(x):]
    }
    return s
}
func rawstringtmp(buf *tmpBuf, l int) (s string, b []byte) {
    if buf != nil && l <= len(buf) {
        b = buf[:l]
        s = slicebytetostringtmp(b)
    } else {
        s, b = rawstring(l)
    }
    return
}
func slicebytetostringtmp(b []byte) string {
    ...
    return *(*string)(unsafe.Pointer(&b))
}
func rawstring(size int) (s string, b []byte) {
    p := mallocgc(uintptr(size), nil, false)

    stringStructOf(&s).str = p
    stringStructOf(&s).len = size

    *(*slice)(unsafe.Pointer(&b)) = slice{p, size, size}

    return
}

As the function's comment says, concatstrings is the function that implements "+". Its parameter a []string is the slice assembled from the strings joined by "+".

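The processing steps are:

Compute the total length l of all the strings, count the non-empty strings, and record the index of the last non-empty one. Throw if the total length overflows.
If there are no non-empty strings, return the empty string "".
If there is exactly one non-empty string, and either the result does not escape the calling function (buf != nil) or that string's data is not on the current goroutine's stack, return that string directly by its index.
Create the string s and the byte slice b that shares its memory, so that writing to b changes the contents of s.

If buf != nil and the total length is at most 32 bytes, take b = buf[:l], which can hold all the data, and let s point at the bytes of b;
otherwise, allocate memory for the total length, create the string there, and let b point at the same memory.

Copy each string into b in turn, then return s.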
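
The steps above can be sketched in ordinary Go. This is only an illustration under stated assumptions: concatOnce is a hypothetical name, the overflow check is left out, and the final string(buf) conversion copies the bytes, whereas the runtime reinterprets the buffer as a string without copying.

```go
package main

import "fmt"

// concatOnce is a hypothetical user-level analogue of runtime.concatstrings:
// one pass to measure, one buffer sized up front, one pass to copy.
func concatOnce(parts ...string) string {
	total, count, last := 0, 0, 0
	for i, p := range parts {
		if len(p) == 0 {
			continue // empty operands contribute nothing
		}
		total += len(p)
		count++
		last = i
	}
	switch count {
	case 0:
		return ""
	case 1:
		return parts[last] // a single non-empty operand needs no copy
	}
	buf := make([]byte, total) // buffer sized for the whole result
	b := buf
	for _, p := range parts {
		copy(b, p)
		b = b[len(p):]
	}
	// The runtime avoids this extra copy; the sketch keeps it for safety.
	return string(buf)
}

func main() {
	fmt.Println(concatOnce("go", "", "lang", "!")) // golang!
}
```

The point of the two passes is that the buffer never has to grow while the data is being copied.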

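Points to note:
When one expression contains several '+' operators, the operands are packed into a slice and concatstrings is called once, rather than once per '+'.
Adjacent constant strings are merged at compile time: str := x + "a" + "b" + "c" is compiled as str := x + "abc".
buf is non-nil only when the result does not escape the calling function, and it is only 32 bytes long, so it can only hold short strings.
concatstrings allocates memory at most once.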
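
One way to observe the "one call per expression" behaviour is to count allocations with testing.AllocsPerRun from the standard library. The following is a rough sketch, not a rigorous measurement: the helper names (allocsSingle, allocsLoop, sampleParts) and the 40-byte operands are assumptions made for illustration, and the exact counts may vary between Go versions.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink is package-level so the results escape and cannot use the 32-byte buf.
var sink string

// sampleParts builds its strings at run time so the compiler cannot
// fold them into a single constant.
func sampleParts() []string {
	return []string{
		strings.Repeat("a", 40), strings.Repeat("b", 40),
		strings.Repeat("c", 40), strings.Repeat("d", 40),
	}
}

// allocsSingle measures one expression containing several "+" operators.
func allocsSingle(a, b, c, d string) float64 {
	return testing.AllocsPerRun(100, func() {
		sink = a + b + c + d
	})
}

// allocsLoop measures "+=" in a loop: one concatenation per iteration.
func allocsLoop(list []string) float64 {
	return testing.AllocsPerRun(100, func() {
		r := ""
		for _, v := range list {
			r += v
		}
		sink = r
	})
}

func main() {
	p := sampleParts()
	fmt.Println("single expression:", allocsSingle(p[0], p[1], p[2], p[3]))
	fmt.Println("loop with +=:     ", allocsLoop(p))
}
```

The single expression should report no more allocations than the loop, since the loop concatenates once per iteration.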

`fmt.Sprintf`

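fmt.Sprintf, from the fmt package, converts data to a string according to format verbs. For concatenating strings the verb is %s.

The relevant source is shown below. This article only looks at the %s path; unrelated code is omitted.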

// Sprintf formats according to a format specifier and returns the resulting string.
func Sprintf(format string, a ...interface{}) string {
    p := newPrinter()
    p.doPrintf(format, a)
    s := string(p.buf)
    p.free()
    return s
}

func (p *pp) doPrintf(format string, a []interface{}) {
    end := len(format)
    argNum := 0         // we process one argument per non-trivial format
    afterIndex := false // previous item in format was an index like [3].
    p.reordered = false
formatLoop:
    for i := 0; i < end; {
        p.goodArgNum = true
        lasti := i
        for i < end && format[i] != '%' {
            i++
        }
        if i > lasti {
            p.buf.writeString(format[lasti:i]) // write the text before '%'
        }
        if i >= end { // reached the end of the format string
            // done processing format string
            break
        }

        // Process one verb
        i++

        // Do we have flags?
        p.fmt.clearflags()
    simpleFormat:
        for ; i < end; i++ {
            c := format[i]
            switch c {
            ...
            default:
                // Fast path for common case of ascii lower case simple verbs
                // without precision or width or argument indices.
                if 'a' <= c && c <= 'z' && argNum < len(a) {
                    if c == 'v' {
                        // Go syntax
                        p.fmt.sharpV = p.fmt.sharp
                        p.fmt.sharp = false
                        // Struct-field syntax
                        p.fmt.plusV = p.fmt.plus
                        p.fmt.plus = false
                    }
                    p.printArg(a[argNum], rune(c))
                    argNum++
                    i++
                    continue formatLoop
                }
                // Format is more complex than simple flags and a verb or is malformed.
                break simpleFormat
            }
        }
    ...
}

func (p *pp) printArg(arg interface{}, verb rune) {
    ...
    case string:
        p.fmtString(f, verb)
    ...
}

func (p *pp) fmtString(v string, verb rune) {
    switch verb {
    ...
    case 's':
        p.fmt.fmtS(v)
    ...
    }
}

func (f *fmt) fmtS(s string) {
    s = f.truncateString(s) // applies the precision, if any; a no-op for a plain %s
    f.padString(s)
}

// padString appends s to f.buf, padded on left (!f.minus) or right (f.minus).
func (f *fmt) padString(s string) {
    if !f.widPresent || f.wid == 0 { // no width given, as with a plain %s: append directly
        f.buf.writeString(s)
        return
    }
    width := f.wid - utf8.RuneCountInString(s) // only reached when a width such as %10s is given
    if !f.minus { // f.minus is set by the '-' flag (left-justify)
        // left padding
        f.writePadding(width)
        f.buf.writeString(s)
    } else {
        // right padding
        f.buf.writeString(s)  // write the string
        f.writePadding(width) // then pad on the right
    }
    }
}

func (b *buffer) writeString(s string) {
    *b = append(*b, s...)
}

// writePadding generates n bytes of padding.
func (f *fmt) writePadding(n int) {
    if n <= 0 { // No padding bytes needed.
        return
    }
    ...
}

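When only concatenating strings, the process is:

Find each '%' in turn and append the text before it to buf.
Dispatch on the verb that follows. For concatenation it is %s, and each %s consumes one string argument.
Append the string to buf (which may have to grow repeatedly).
Convert buf to a string.

Note: fmt.Sprintf does not compute the total length up front. It handles each %s separately, and each one ends in a call to append. append may have to grow the buffer, and with many strings it may grow several times.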
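
The cost of appending without knowing the final size can be illustrated with a plain byte slice. This is a minimal sketch that isolates the append behaviour only; it does not model the rest of fmt, and countGrowths is a hypothetical helper, not a library function.

```go
package main

import "fmt"

// countGrowths appends every part to a byte slice and reports how many
// times append had to move to a larger backing array.
// Hypothetical helper for illustration; it is not part of fmt.
func countGrowths(parts []string, prealloc bool) int {
	var buf []byte
	if prealloc {
		total := 0
		for _, p := range parts {
			total += len(p)
		}
		buf = make([]byte, 0, total) // final size known up front
	}
	growths := 0
	for _, p := range parts {
		before := cap(buf)
		buf = append(buf, p...)
		if cap(buf) != before {
			growths++ // append allocated a new, larger array
		}
	}
	return growths
}

func main() {
	parts := make([]string, 64)
	for i := range parts {
		parts[i] = "0123456789abcdef"
	}
	fmt.Println("growths without preallocation:", countGrowths(parts, false))
	fmt.Println("growths with preallocation:   ", countGrowths(parts, true))
}
```

With the capacity reserved in advance the count is zero; without it, the number of growths depends on the runtime's growth policy.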

`strings.Join`

strings.Join is the strings package function for concatenating a slice of strings, with a separator of your choice placed between the elements.

// Join concatenates the elements of a to create a single string. The separator string
// sep is placed between elements in the resulting string.
func Join(a []string, sep string) string {
    switch len(a) {
    case 0:
        return ""
    case 1:
        return a[0]
    }
    n := len(sep) * (len(a) - 1)
    for i := 0; i < len(a); i++ {
        n += len(a[i])
    }

    var b Builder
    b.Grow(n)
    b.WriteString(a[0])
    for _, s := range a[1:] {
        b.WriteString(sep)
        b.WriteString(s)
    }
    return b.String()
}
// A Builder is used to efficiently build a string using Write methods.
// It minimizes memory copying. The zero value is ready to use.
// Do not copy a non-zero Builder.
type Builder struct {
    addr *Builder // of receiver, to detect copies by value
    buf  []byte
}
// Grow grows b's capacity, if necessary, to guarantee space for
// another n bytes. After Grow(n), at least n bytes can be written to b
// without another allocation. If n is negative, Grow panics.
func (b *Builder) Grow(n int) {
    b.copyCheck()
    if n < 0 {
        panic("strings.Builder.Grow: negative count")
    }
    if cap(b.buf)-len(b.buf) < n {
        b.grow(n)
    }
}
// grow copies the buffer to a new, larger buffer so that there are at least n
// bytes of capacity beyond len(b.buf).
func (b *Builder) grow(n int) {
    buf := make([]byte, len(b.buf), 2*cap(b.buf)+n)
    copy(buf, b.buf)
    b.buf = buf
}
// WriteString appends the contents of s to b's buffer.
// It returns the length of s and a nil error.
func (b *Builder) WriteString(s string) (int, error) {
    b.copyCheck()
    b.buf = append(b.buf, s...)
    return len(s), nil
}

// String returns the accumulated string.
func (b *Builder) String() string {
    return *(*string)(unsafe.Pointer(&b.buf))
}

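How Join works:

Check the number of strings: return the empty string for 0, and the first string for 1.
Compute the total length of the separators, then the total length of the joined result.
If the capacity of buf cannot hold everything, grow it (allocate a new slice of capacity 2*cap(b.buf)+n and copy the old data into it). After this, buf can hold all the data, so the later appends never need to grow it.
Append the elements and separators to buf in order.
Convert buf to a string through a pointer cast.

The buffer therefore grows only once.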
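
The "grows only once" claim can be checked through the public strings.Builder API, which exposes Grow and Cap. A small sketch follows; joinCapacity is a hypothetical helper written for this illustration, not part of the strings package.

```go
package main

import (
	"fmt"
	"strings"
)

// joinCapacity joins elems with sep through a strings.Builder and reports
// the builder's capacity right after Grow and again after all writes.
// Hypothetical helper for illustration only.
func joinCapacity(elems []string, sep string) (s string, capAfterGrow, capAtEnd int) {
	if len(elems) == 0 {
		return "", 0, 0
	}
	n := len(sep) * (len(elems) - 1)
	for _, e := range elems {
		n += len(e)
	}
	var b strings.Builder
	b.Grow(n) // reserve the final size once
	capAfterGrow = b.Cap()
	b.WriteString(elems[0])
	for _, e := range elems[1:] {
		b.WriteString(sep)
		b.WriteString(e)
	}
	return b.String(), capAfterGrow, b.Cap()
}

func main() {
	s, c1, c2 := joinCapacity([]string{"alpha", "beta", "gamma"}, ", ")
	fmt.Println(s)        // alpha, beta, gamma
	fmt.Println(c1 == c2) // true: no growth after the initial Grow
}
```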

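Comparison

The strengths and weaknesses of the three approaches are compared below.

Concatenating with `"+"`

Advantages:

Simple to use.
Fast for short strings: when the result does not escape and the total length is at most 32 bytes, the data fits in the 32-byte buf set aside in advance, so no separate allocation is needed.
Several "+" operators in one expression are still handled in a single call (the operands are packed into a slice and passed to concatstrings).

Disadvantages:

With a lot of data, a long chain of "+" makes the code less tidy.
When the strings can only be joined across several expressions, each expression calls concatstrings again, recomputing the length and allocating new memory. With a lot of data, performance degrades.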

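Concatenating with `fmt.Sprintf`

Advantages:

Widely applicable: it can convert other types to strings.
More readable when the data carries specific meaning, especially with descriptive prefixes.

Disadvantages:

The processing is relatively complex. It dispatches on many types and may even use reflection, which hurts efficiency.
The total length is not computed in advance. Every piece is added with append, and whenever append has to grow the buffer it reallocates memory and copies the data. With a lot of data this noticeably hurts performance.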

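Concatenating with `strings.Join`

Advantages:

The total length is computed once, so memory is allocated once and never reallocated afterwards.
Very convenient when all elements share the same separator.

Disadvantages:

Scattered values have to be assembled into a slice first.
It cannot directly handle different separators between elements.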
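
When the separators differ, one possible workaround is to use strings.Builder directly, the same type Join uses internally, and keep the "compute the length first" idea. A minimal sketch, assuming a hypothetical buildQuery helper that produces "k1=v1&k2=v2" and does no escaping:

```go
package main

import (
	"fmt"
	"strings"
)

// buildQuery joins key/value pairs as "k1=v1&k2=v2", which needs two
// different separators. Hypothetical helper for illustration; no escaping.
func buildQuery(keys, values []string) string {
	if len(keys) != len(values) || len(keys) == 0 {
		return ""
	}
	n := 2*len(keys) - 1 // one "=" per pair, one "&" between pairs
	for i := range keys {
		n += len(keys[i]) + len(values[i])
	}
	var b strings.Builder
	b.Grow(n) // reserve the final size once, as Join does
	for i := range keys {
		if i > 0 {
			b.WriteByte('&')
		}
		b.WriteString(keys[i])
		b.WriteByte('=')
		b.WriteString(values[i])
	}
	return b.String()
}

func main() {
	fmt.Println(buildQuery([]string{"page", "size"}, []string{"2", "50"})) // page=2&size=50
}
```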

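Overall comparison

From the implementation we can draw the following conclusions:

If all the strings can be joined in a single expression with '+', then '+' performs about as well as strings.Join; otherwise it is slower than strings.Join. fmt.Sprintf has to go through type dispatch (possibly reflection) and append, so it is likely the slowest of the three.

The reason: when concatenating, especially many strings or long strings, strings.Join allocates memory only once, '+' allocates once or several times depending on how it is used, and fmt.Sprintf calls append for every %s and may allocate several times. Every reallocation means copying the data again, which costs performance.

Of course, for a small number of strings or very short ones, especially scattered values (which strings.Join would first need assembled into a slice), the three differ little in efficiency, and you can choose whichever suits your needs.

Overall, in terms of performance: strings.Join ~= a single '+' expression >> repeated '+' > fmt.Sprintf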
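
To check this ordering on your own machine, a quick comparison can be run with testing.Benchmark from the standard library. This is a rough sketch rather than a careful benchmark: the function names, the four 64-byte operands, and the output layout are all assumptions for illustration, and the numbers will differ by Go version and hardware.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

var result string // keeps the compiler from discarding the work

func viaSinglePlus(a, b, c, d string) string { return a + b + c + d }

func viaRepeatedPlus(a, b, c, d string) string {
	r := a
	r += b // each statement is a separate concatenation
	r += c
	r += d
	return r
}

func viaSprintf(a, b, c, d string) string { return fmt.Sprintf("%s%s%s%s", a, b, c, d) }

func viaJoin(a, b, c, d string) string { return strings.Join([]string{a, b, c, d}, "") }

func main() {
	testing.Init() // registers the testing flags when running outside "go test"
	a, b := strings.Repeat("a", 64), strings.Repeat("b", 64)
	c, d := strings.Repeat("c", 64), strings.Repeat("d", 64)

	cases := []struct {
		name string
		fn   func(a, b, c, d string) string
	}{
		{"single +", viaSinglePlus},
		{"repeated +", viaRepeatedPlus},
		{"fmt.Sprintf", viaSprintf},
		{"strings.Join", viaJoin},
	}
	for _, tc := range cases {
		res := testing.Benchmark(func(bm *testing.B) {
			bm.ReportAllocs()
			for i := 0; i < bm.N; i++ {
				result = tc.fn(a, b, c, d)
			}
		})
		fmt.Printf("%-14s %s %s\n", tc.name, res, res.MemString())
	}
}
```

The allocation counts in the last column are usually the more stable signal; the timings are only indicative.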

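Summary

This article looked at three common ways to concatenate strings and analyzed their strengths and weaknesses from the implementation side, to help choose a suitable one for each situation.

As recommendations:

For a small number of scattered values, use '+';
For a small amount of data with explanatory prefixes or suffixes, use fmt.Sprintf;
For a lot of data, or data already in a slice, use strings.Join.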
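
As a short illustration of the three recommendations, here is one small function per case. The function names and sample data are made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// greeting: a few scattered values, joined in one "+" expression.
func greeting(name string) string {
	return "Hello, " + name + "!"
}

// describe: a small amount of data with explanatory labels.
func describe(name string, age int) string {
	return fmt.Sprintf("name=%s age=%d", name, age)
}

// csvLine: data that is already in a slice.
func csvLine(fields []string) string {
	return strings.Join(fields, ",")
}

func main() {
	fmt.Println(greeting("gopher"))               // Hello, gopher!
	fmt.Println(describe("gopher", 14))           // name=gopher age=14
	fmt.Println(csvLine([]string{"a", "b", "c"})) // a,b,c
}
```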

