This post illustrates issues encountered while debugging/inspecting a web service with pprof. Even if your HTTP API server has no obvious latency, it is worth digging deeper to understand more about your app: it may help you avoid memory leaks and keep the service lean, without unnecessary overhead.
Getting started
- For most web services written in Go:
```go
import _ "net/http/pprof"
...
http.ListenAndServe("localhost:6060", nil)
```
- For the go-chi web framework:
```go
import "github.com/go-chi/chi/middleware"
...
r := chi.NewRouter()
...
r.Mount("/debug", middleware.Profiler())
```
Then you can access http://ip:port/debug/pprof/ directly in a browser while your app runs. For heap inspection: pprof behind the scenes uses the runtime.MemProfile function, which by default collects allocation information once per 512KB of allocated bytes. This can be adjusted via runtime.MemProfileRate.
- [Optional] Ensure that GC actually runs, since GC may not be triggered on its own (or you could disable GC by setting GOGC=off to test some extreme cases, which this post does not cover):
```go
c := make(chan os.Signal, 1)
signal.Notify(c, os.Interrupt)
go func() {
	t := time.NewTicker(time.Second * 5)
	defer t.Stop()
	for {
		select {
		case <-c:
			log.Println("GC routine exited")
			return
		case <-t.C:
			runtime.GC()
			log.Println("GC ran")
		}
	}
}()
```
At the bottom of http://ip:port/debug/pprof/heap there is a NumGC field indicating how many GC cycles have actually run.
Then you can mock some HTTP requests and check:
- If the goroutine count keeps increasing and does not drop after the mock requests stop (instead of settling back to a stable number like 30), a goroutine leak may be occurring.
- If allocs and heap keep increasing, it does not necessarily mean a memory leak. These counters only show the number of samples collected so far, so they include garbage-collected objects/bytes. The following sections of this post cover this in more detail.
- If NumGC also rises dramatically despite the manual GC runs, GC is being triggered frequently, which may introduce latency.
There are several aspects to inspect:
Inspect in initialization
While the web service is freshly deployed and initialized, with no incoming requests yet, check:
/debug/pprof/heap
Note that the heap profile line at the top shows a heap summary:
heap profile: xxxx (in-use objects): yyyy (in-use bytes) [zzzz (alloc objects): cccc (alloc bytes)] @ heap/vvvv (2*MemProfileRate)
The first four numbers of each remaining entry have the same meaning as in the heap summary.
It is suggested to:
- Keep the heap allocation at a stable number, contributed by:
  - Imported libraries that run an init() func and may introduce objects/dependencies into your program. Get to know these libraries and check whether they bring inefficiency.
  - Explicitly created dependencies like *DB, *logger, etc. If such a dependency has an underlying pooling implementation, keep it running as a singleton.
- /debug/pprof/goroutine
It shows stack traces of all current goroutines.
It is suggested to:
- Minimize instances/dependencies like db connections, grpc connections and so on, each of which may require watch goroutines to keep track of its status.
Also, some advice from Dmitry Vyukov's post:
- Combine objects into larger objects. For example, replace a *bytes.Buffer struct member with bytes.Buffer (you can preallocate the buffer for writing by calling bytes.Buffer.Grow later). This will reduce the number of memory allocations (faster) and also reduce pressure on the garbage collector (faster garbage collections).
- Local variables that escape from their declaration scope get promoted into heap allocations. The compiler generally can't prove that several variables have the same lifetime, so it allocates each such variable separately. So you can use the above advice for local variables as well. For example, replace:
```go
for k, v := range m {
	k, v := k, v // copy for capturing by the goroutine
	go func() {
		// use k and v
	}()
}
```
with:
```go
for k, v := range m {
	x := struct{ k, v string }{k, v} // copy for capturing by the goroutine
	go func() {
		// use x.k and x.v
	}()
}
```
This replaces two memory allocations with a single allocation. However, this optimization usually negatively affects code readability, so use it reasonably.
- A special case of allocation combining is slice array preallocation. If you know a typical size of the slice, you can preallocate a backing array for it as follows:
```go
type X struct {
	buf      []byte
	bufArray [16]byte // buf usually does not grow beyond 16 bytes.
}

func MakeX() *X {
	x := &X{}
	// Preinitialize buf with the backing array.
	x.buf = x.bufArray[:0]
	return x
}
```
- If possible use smaller data types. For example, use int8 instead of int.
- Objects that do not contain any pointers (note that strings, slices, maps and chans contain implicit pointers) are not scanned by the garbage collector. For example, a 1GB byte slice virtually does not affect garbage collection time. So if you remove pointers from actively used objects, it can positively impact garbage collection time. Some possibilities are: replace pointers with indices, or split an object into two parts, one of which does not contain pointers.
- Use freelists to reuse transient objects and reduce the number of allocations. The standard library contains the sync.Pool type, which allows reusing the same object several times between garbage collections. However, be aware that, as with any manual memory management scheme, incorrect use of sync.Pool can lead to use-after-free bugs.
Inspect while running
For this scenario, it is suggested to:
- Mock concurrent requests to your HTTP APIs one by one to find where the issue is.
- Run go tool pprof -web http://ip:port/debug/pprof/heap to view a visualization of the heap.
- Try some helpful options for go tool pprof -option, and type top at the pprof prompt:
-inuse_space Display in-use memory size: amount of memory allocated and not released yet
-inuse_objects Display in-use object counts: amount of objects allocated and not released yet
-alloc_space Display allocated memory size: total amount of memory allocated (regardless of released)
-alloc_objects Display allocated object counts: total amount of objects allocated (regardless of released)
Each shows a filtered summary focused on a different aspect; the raw data is actually derived from http://ip:port/debug/pprof/heap.
Furthermore, inuse means active memory: memory the runtime believes is in use by the Go program (i.e. it has not been collected by the garbage collector). When the GC does collect memory the profile shrinks, but no memory is returned to the system. Future allocations will try to use memory from the pool of previously collected objects before asking the system for more. This leads to what alloc means: roughly the resident size of your program, the number of bytes of RAM assigned to your program whether it is holding in-use Go values or collected ones. Reference here
Useful Links
- https://rakyll.org/archive/
- https://jvns.ca/blog/2017/09/24/profiling-go-with-pprof/
- https://blog.golang.org/profiling-go-programs
- https://www.freecodecamp.org/news/how-i-investigated-memory-leaks-in-go-using-pprof-on-a-large-codebase-4bec4325e192/