Go GC: Prioritizing low latency and simplicity
31 Aug 2015

Richard Hudson
rlh@golang.org

* The Setup

Go is building a garbage collector (GC) not only for 2015 but for 2025 and
beyond: A GC that supports today’s software development and scales along with
new software and hardware throughout the next decade. Such a future has no
place for stop-the-world GC pauses, which have been an impediment to broader
uses of safe and secure languages such as Go.

Go 1.5, the first glimpse of this future, achieves GC latencies well below the
10 millisecond goal we set a year ago. We presented some impressive numbers
in [[https://talks.golang.org/2015/go-gc.pdf][a talk at Gophercon]].
The latency improvements have generated a lot of attention;
Robin Verlangen’s blog post
[[https://medium.com/@robin.verlangen/billions-of-request-per-day-meet-go-1-5-362bfefa0911][_Billions_of_requests_per_day_meet_Go_1.5_]]
validates our direction with end to end results.
We also particularly enjoyed
[[https://twitter.com/inconshreveable/status/620650786662555648][Alan Shreve’s production server graphs]]
and his "Holy 85% reduction" comment.

Today 16 gigabytes of RAM costs $100 and CPUs come with many cores, each with
multiple hardware threads. In a decade this hardware will seem quaint but the
software being built in Go today will need to scale to meet expanding needs and
the next big thing. Given that hardware will provide the power to increase
throughput, Go’s garbage collector is being designed to favor low latency and
tuning via only a single knob. Go 1.5 is the first big step down this path and
these first steps will forever influence Go and the applications it best
supports. This blog post gives a high-level overview of what we have done for
the Go 1.5 collector.

* The Embellishment

To create a garbage collector for the next decade, we turned to an algorithm
from decades ago. Go's new garbage collector is a _concurrent_, _tri-color_,
_mark-sweep_ collector, an idea first proposed by
[[http://dl.acm.org/citation.cfm?id=359655][Dijkstra in 1978]].
This is a deliberate divergence from most "enterprise" grade garbage collectors
of today, and one that we believe is well suited to the properties of modern
hardware and the latency requirements of modern software.

In a tri-color collector, every object is either white, grey, or black and we
view the heap as a graph of connected objects. At the start of a GC cycle all
objects are white. The GC visits all _roots_, which are objects directly
accessible by the application such as globals and things on the stack, and
colors these grey. The GC then chooses a grey object, blackens it, and then
scans it for pointers to other objects. When this scan finds a pointer to a
white object, it turns that object grey. This process repeats until there are
no more grey objects. At this point, white objects are known to be unreachable
and can be reused.

This all happens concurrently with the application, known as the _mutator_,
changing pointers while the collector is running. Hence, the mutator must
maintain the invariant that no black object points to a white object, lest the
garbage collector lose track of an object installed in a part of the heap it
has already visited. Maintaining this invariant is the job of the
_write_barrier_, which is a small function run by the mutator whenever a
pointer in the heap is modified. Go’s write barrier colors the now-reachable
object grey if it is currently white, ensuring that the garbage collector will
eventually scan it for pointers.
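To make the colors and the barrier concrete, here is a deliberately
simplified, sequential sketch of tri-color marking. It is not how the
runtime’s collector is implemented; the names (Object, greyObject,
writeBarrier, markPhase) are invented for illustration, and a real collector
does this work concurrently over the runtime’s own heap structures rather
than over a toy object graph.

	package gcsketch

	// Color is the tri-color state of a heap object.
	type Color int

	const (
		White Color = iota // not yet visited; candidate for reclamation
		Grey               // reachable, but its pointers are not yet scanned
		Black              // reachable, and all of its pointers are scanned
	)

	// Object is a toy heap object holding pointers to other objects.
	type Object struct {
		color    Color
		pointers []*Object
	}

	// greyQueue is the collector's work list of grey objects.
	var greyQueue []*Object

	// greyObject shades a white object grey and queues it for scanning.
	func greyObject(o *Object) {
		if o != nil && o.color == White {
			o.color = Grey
			greyQueue = append(greyQueue, o)
		}
	}

	// writeBarrier is what the mutator conceptually runs on every heap
	// pointer update: shade the newly installed pointee so that a black
	// object can never end up pointing at a white one.
	func writeBarrier(slot **Object, newPtr *Object) {
		greyObject(newPtr)
		*slot = newPtr
	}

	// markPhase greys the roots, then drains the work list: blacken each
	// grey object and shade everything it points to. Whatever is still
	// white afterwards is unreachable and could be swept.
	func markPhase(roots []*Object) {
		for _, r := range roots {
			greyObject(r)
		}
		for len(greyQueue) > 0 {
			o := greyQueue[len(greyQueue)-1]
			greyQueue = greyQueue[:len(greyQueue)-1]
			o.color = Black
			for _, child := range o.pointers {
				greyObject(child)
			}
		}
	}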
Deciding when the job of finding all grey objects is done is subtle and can be
expensive and complicated if we want to avoid blocking the mutators. To keep
things simple Go 1.5 does as much work as it can concurrently and then briefly
stops the world to inspect all potential sources of grey objects. Finding the
sweet spot between the time needed for this final stop-the-world and the total
amount of work that this GC does is a major deliverable for Go 1.6.

Of course the devil is in the details. When do we start a GC cycle? What
metrics do we use to make that decision? How should the GC interact with the Go
scheduler? How do we pause a mutator thread long enough to scan its stack?
How do we represent white, grey, and black so we can efficiently find and scan
grey objects? How do we know where the roots are? How do we know where in an
object pointers are located? How do we minimize memory fragmentation? How do we
deal with cache performance issues? How big should the heap be? And on and on,
some related to allocation, some to finding reachable objects, some related to
scheduling, but many related to performance. Low-level discussions of each of
these areas are beyond the scope of this blog post.

At a higher level, one approach to solving performance problems is to add GC
knobs, one for each performance issue. The programmer can then turn the knobs
in search of appropriate settings for their application. The downside is that
after a decade with one or two new knobs each year you end up with the GC Knobs
Turner Employment Act. Go is not going down that path. Instead we provide a
single knob, called GOGC. This value controls the total size of the heap
relative to the size of reachable objects. The default value of 100 means that
total heap size is now 100% bigger than (i.e., twice) the size of the reachable
objects after the last collection. 200 means total heap size is 200% bigger
than (i.e., three times) the size of the reachable objects. If you want to
lower the total time spent in GC, increase GOGC. If you want to trade more GC
time for less memory, lower GOGC.

More importantly as RAM doubles with the next generation of hardware, simply
doubling GOGC will halve the number of GC cycles. On the other hand since GOGC
is based on reachable object size, doubling the load by doubling the reachable
objects requires no retuning. The application just scales.
Furthermore, unencumbered by ongoing support for dozens of knobs, the runtime
team can focus on improving the runtime based on feedback from real customer
applications.

* The Punchline

Go 1.5’s GC ushers in a future where stop-the-world pauses are no longer a
barrier to moving to a safe and secure language. It is a future where
applications scale effortlessly along with hardware and as hardware becomes
more powerful the GC will not be an impediment to better, more scalable
software. It’s a good place to be for the next decade and beyond.
For more details about the 1.5 GC and how we eliminated latency issues see the
[[https://www.youtube.com/watch?v=aiv1JOfMjm0][Go GC: Latency Problem Solved presentation]]
or [[https://talks.golang.org/2015/go-gc.pdf][the slides]].
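As a closing illustration of the single GOGC knob described above: GOGC is
read from the environment when a program starts, and the same setting can be
changed while the program runs through the standard runtime/debug package. The
particular values below are examples of the trade-off only, not
recommendations.

	package main

	import (
		"fmt"
		"runtime/debug"
	)

	func main() {
		// Equivalent to launching the program with GOGC=200 in the
		// environment: let the heap grow to roughly three times the
		// reachable data before the next cycle, trading memory for
		// fewer GC cycles.
		old := debug.SetGCPercent(200)
		fmt.Printf("GC percent was %d, now 200\n", old)

		// Lowering the value trades more GC time for a smaller heap,
		// e.g. debug.SetGCPercent(50). GOGC=off, or SetGCPercent(-1),
		// disables collection entirely.
	}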