Notes on using Go pprof

This post illustrates issues encountered while debugging and inspecting a web service with pprof. Even if your HTTP API server shows no obvious latency, it is worth digging deeper so as to understand more about your app.

It may help you avoid memory leaks and keep the web service lean, without unnecessary overhead.


Getting started
  • For most web services developed in Go, either register the default pprof handlers:
import _ "net/http/pprof"
...
http.ListenAndServe("localhost:6060", nil)

Or, if you use chi, mount its profiler middleware:
import "github.com/go-chi/chi/middleware"
...
r := chi.NewRouter()
...
r.Mount("/debug", middleware.Profiler())


Then you can access http://ip:port/debug/pprof/ directly in a browser while your app runs. For heap inspection:

Behind the scenes, pprof uses the runtime.MemProfile function, which by default records allocation information once per 512KB of allocated bytes. This rate can be adjusted via runtime.MemProfileRate.


  • [Optional] Ensure that GC will actually run, since GC may not be triggered (or you could disable GC by setting GOGC=off to test some extreme cases, which this post has not covered yet):
c := make(chan os.Signal, 1)
signal.Notify(c, os.Interrupt)
go func() {
	t := time.NewTicker(time.Second * 5)
	defer t.Stop()
	for {
		select {
		case <-c:
			log.Println("GC routine exited")
			return
		case <-t.C:
			runtime.GC()
			log.Println("GC ran")
		}
	}
}()

At the bottom of http://ip:port/debug/pprof/heap, a NumGC field indicates the number of GC cycles performed so far.


Then you can mock some HTTP requests and check:

  • If the goroutine count keeps increasing and does not drop after the mock requests stop (instead of settling back to a stable number such as 30), a goroutine leak may be occurring.
  • If allocs and heap keep increasing, it does not necessarily mean a memory leak. These counters show the number of samples collected so far, so they include garbage-collected objects/bytes. The following sections cover this in more detail.
  • If NumGC also rises dramatically beyond the manual GCs, it indicates frequent GC, which may introduce latency.

There are several aspects to inspect:


Inspect at initialization

While the web service is freshly deployed and initialized, without any incoming request, check:

/debug/pprof/heap

Note that the heap profile's first line shows the heap summary:

heap profile: xxxx(in-use objects): yyyy(in-use bytes) [zzzz(alloc objects): cccc(alloc bytes)] @ heap/vvvv(2*MemProfileRate)

The first four numbers of each of the remaining entries have the same meaning as in the heap summary.
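As a made-up illustration with the default MemProfileRate (so 2*MemProfileRate = 1048576), a summary line such as:

```
heap profile: 7: 5120 [120: 204800] @ heap/1048576
```

would mean 7 live objects currently holding 5120 bytes, while 120 objects totaling 204800 bytes have been allocated since the program started, including ones already garbage collected.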

It is suggested to:

  • Keep the heap allocation at a stable number, contributed by:
    • Imported libraries that run init() functions and may introduce objects/dependencies into your program. Get to know these libraries and check whether they bring inefficiency.

    • Explicitly created dependencies such as *DB, *logger, etc. If such a dependency has an underlying pooling implementation, keep it as a singleton.

/debug/pprof/goroutine

It shows the stack traces of all current goroutines.

It is suggested to:

  • Minimize instances/dependencies such as DB connections, gRPC connections, and so on, which may require watcher goroutines to keep track of their status.

Also, some advice from Dmitry Vyukov's post:

  • Combine objects into larger objects. For example, replace a *bytes.Buffer struct member with bytes.Buffer
    (you can preallocate the buffer for writing by calling bytes.Buffer.Grow later). This will reduce the number of memory allocations (faster) and also reduce pressure on the garbage collector (faster garbage collections).

  • Local variables that escape from their declaration scope get promoted into heap allocations. The compiler generally can't prove that several variables have the same lifetime, so it allocates each such variable separately. So you can apply the above advice to local variables as well. For example, replace:

for k, v := range m {
   k, v := k, v   // copy for capturing by the goroutine
   go func() {
       // use k and v
   }()
}

with:

for k, v := range m {
  x := struct{ k, v string }{k, v}   // copy for capturing by the goroutine
  go func() {
      // use x.k and x.v
  }()
}

This replaces two memory allocations with a single allocation. However, this optimization usually negatively affects code readability, so use it reasonably.

  • A special case of allocation combining is slice array preallocation. If you know a typical size of the slice, you can preallocate a backing array for it as follows:
type X struct {
    buf      []byte
    bufArray [16]byte // Buf usually does not grow beyond 16 bytes.
}

func MakeX() *X {
    x := &X{}
    // Preinitialize buf with the backing array.
    x.buf = x.bufArray[:0]
    return x
}
  • If possible use smaller data types. For example, use int8 instead of int.

  • Objects that do not contain any pointers (note that strings, slices, maps and chans contain implicit pointers) are not scanned by the garbage collector. For example, a 1GB byte slice virtually does not affect garbage collection time. So if you remove pointers from actively used objects, it can positively impact garbage collection time. Some possibilities are: replace pointers with indices, or split an object into two parts, one of which does not contain pointers.

  • Use freelists to reuse transient objects and reduce the number of allocations. The standard library contains the sync.Pool type, which allows reusing the same object several times between garbage collections. However, be aware that, as with any manual memory management scheme, incorrect use of sync.Pool can lead to use-after-free bugs.


Inspect while running

For this scenario, it is suggested to:

  • Mock concurrent requests to your HTTP APIs one by one to find where the issue is.

  • Run go tool pprof -web http://ip:port/debug/pprof/heap to view a visualization of the heap.

  • Try some of the helpful options of go tool pprof -option, then type top at the interactive prompt:

  -inuse_space      Display in-use memory size: amount of memory allocated and not released yet                
  -inuse_objects    Display in-use object counts: amount of objects allocated and not released yet
  -alloc_space      Display allocated memory size: total amount of memory allocated (regardless of released)
  -alloc_objects    Display allocated object counts: total amount of objects allocated (regardless of released)

Each option shows a filtered summary focused on a different aspect; the raw data is actually derived from http://ip:port/debug/pprof/heap.

Furthermore, inuse means active memory: memory the runtime believes is in use by the Go program (i.e. it hasn't been collected by the garbage collector). When the GC does collect memory the profile shrinks, but no memory is returned to the system. Your future allocations will try to use memory from the pool of previously collected objects before asking the system for more. This leads to what alloc means: it is the resident size of your program, the number of bytes of RAM assigned to your program whether it holds in-use Go values or collected ones. Reference here

Useful Links
  • https://rakyll.org/archive/
  • https://jvns.ca/blog/2017/09/24/profiling-go-with-pprof/
  • https://blog.golang.org/profiling-go-programs
  • https://www.freecodecamp.org/news/how-i-investigated-memory-leaks-in-go-using-pprof-on-a-large-codebase-4bec4325e192/