Bounds Check Elimination (BCE) in Golang 1.7

Animats · on Aug 22, 2016

This is an area of optimization that was figured out in the Pascal era, then forgotten during the C era. It's good to see it coming back. Work on Pascal was able to optimize out about 95% of bounds checks. Here's an overview that has a literature survey.[1]

A big win is hoisting bounds checks out of loops. I think the Go compiler already optimized out the check for loops where the loop iteration variable is bounded by the range of the array. But that's a special case. The general case has a problem. Consider

    func copy(src []int, dst []int) {
	for i := range(src) {
            dst[i] = src[i]
        }
    }

This has a subscript out of range error if dst is shorter than src. It's the classic C buffer overflow. For optimization purposes, we can look at this as

    func copy(src []int, dst []int) {
	for i := range(src) {
            assert(i < len(src)) // subscript check
            assert(i < len(dst)) // subscript check 
            dst[i] = src[i]
        }
    }

The FOR statement insures that i < len(src), so a prover would see this as (i < len(src) implies (i < len(src)), which reduces to True, and we can discard that assert. We still have the assert for "dst". For that, it's possible to hoist it out of the loop. We generate the check

    func copy(src []int, dst []int) {
        assert(len(src) <= len(dst)) // generated
	for i := range(src) {
            assert(i < len(src)) // optimized out
            assert(i < len(dst)) // subscript check 
            dst[i] = src[i]
        }
    }

which gives us

    len(src) <= len(dst) and i < len(src) implies (i < len(dst))

which simplifies to True. So assert(i < len(dst) can be optimized out. We have one run-time check remaining, and it's outside the loop, done once. This is hoisting a bounds check out of a loop, and it's a huge win for inner loops. Good test for a compiler: write a matrix multiply and make sure it can do this.

[1] http://www.cri.ensmp.fr/classement/doc/E-235.ps

kevhito · on Aug 22, 2016

If out of bounds errors were undefined behavior, lifting the bounds check out of the loop would be legal. But in go, it isn't an error. It is equivalent to panic(), which can be caught. If the caller recovers, they should see the partially-completed loop, but lifting it out of the loop would change that behavior.

Here is an example, using your copy function:

https://play.golang.org/p/Cc3Oc3le-9

kbenson · on Aug 22, 2016

I think the point is that in the cases where you can prove it's not possible, you can optimize it out. This is functionally the same, because if it's incapable of panicing because of out of bounds, you don't need to check for it. I think what you likely get in practice is something like:

    func copy(src []int, dst []int) {
        if (len(src) <= len(dst) {
            for i := range(src) {
                dst[i] = src[i]
        } else {
            for i := range(src) {
                assert(i < len(dst)) // subscript check 
                dst[i] = src[i]
            }
        }
    }

That way you still get panics in the case where they are applicable, but not in the cases where you can safely abstract the bounds check away (because a panic for that reason should not be possible). Although this is likely at the code generation level, so you can consider the if statement to be a sort of macro branching on what code is actually generated.

Animats · on Aug 22, 2016

Go backed into exceptions. Originally, "panic" was always fatal.[1] Then came "recover".[2] But it was considered bad form to use "recover" for anything other than doing a final cleanup and exit. Now, the library JSON parser can panic and recover due to invalid input, then return an error code. That's exception handling.

Now we have to give up optimizations so that Go's exception handling will work. Every function call, every subscripted access, and every map reference is a potential loop exit. Bleah.

[1] http://dave.cheney.net/2012/01/18/why-go-gets-exceptions-rig... [2] https://groups.google.com/forum/#!topic/golang-nuts/HOXNBQu5...

weberc2 · on Aug 24, 2016

Thanks for the informative post! To clarify, does the Go compiler do this hoisting today? Or is this a "we should be able to do this theoretically?"

iand · on Aug 22, 2016

This doc, written by one of the Go team at Google, enumerates the state of BCE as of a few months ago:

https://docs.google.com/document/d/1vdAEAjYdzjnPA9WDOQ1e4e05...

jjnoakes · on Aug 22, 2016

Is Go not optimizing out the dead code in those examples because of those debug flags? Or does it not do that in general?

Or is it just doing the bounds check elimination (and debug messages about it) before the dead code elimination?

kevhito · on Aug 22, 2016

In C/C++, out-of-bounds memory access is in undefined behavior, so the compiler can assume it is a nop, so it is dead code and eliminate it.

In go, out-of-bounds array/slice access is equivalent to calling panic. So _ = a[i] isn't dead code unless you can be sure that a[i] is in bounds. So yes, I imagine BCE should come before dead code elimination.

e3b0c · on Aug 22, 2016

I don't think bounds checking can be done at compile-time, especially in C/C++.

XMPPwocky · on Aug 22, 2016

Not in the general case.

    char x[4];
    printf("%c", x[4]);

That's pretty easy to detect.

ithkuil · on Aug 22, 2016

here you're talking about compile time bound check _elimination_, i.e. statically determine when you can remove a runtime bound check which would have been added by the compiler. I think e3b0c was talking about completely static bound checks.

codys · on Aug 22, 2016

No.

We can look at the example given above and clearly know/prove that the access is out of bounds. ie: a C compiler could emit a diagnostic (or decide it's undefined behavior and treat it as unreachable).

There aren't any bounds checks above to eliminate.

IshKebab · on Aug 22, 2016

I think he means that the result isn't used.

nothrabannosir · on Aug 22, 2016

Yes, but that doesn't matter: array indexing is not a "pure" function because it can panic. That's why you can't optimise it away, even if the result isn't used.

IshKebab · on Aug 23, 2016

Ah I see.

JamesMcMinn · on Aug 22, 2016

Go handles dead code by not allowing you to compile something with dead code. In the examples the "_ =" is needed to force it to compile and do the work without actually assigning the value to anything.

junke · on Aug 22, 2016

Suppose you "go generate" code which ends up having something along the lines of:

    x := true
    var y = 0
    if x {
        y = 10
    } else {
        y = 20
    }

In that case dead code elimination might require a little more analysis than checking for unassigned expressions and unused variables.

So the question might be: what kind of dead code elimination exist in the compiler, apart from what you describe which has existed since the beginning of Go? (if I understand correctly)

chrisseaton · on Aug 22, 2016

> Go handles dead code by not allowing you to compile something with dead code.

This is obviously not true and is easily disproved.

Here's a program that compiles but has statically provably dead code on line 15 https://play.golang.org/p/_XcxezHV-6.

But even if I couldn't give you an example like that where the dead code is obvious, dead code is produced by the compiler itself as it optimises, and by code generation tools.

fizzbatter · on Aug 22, 2016

Perhaps i misunderstand you, but isn't simply calling a function the same as what you quoted? Eg, `_ := Foo()` is the same as `Foo()`. No assignment needed.

`_` mainly exists to allow ignoring single values, within multiple. Eg, dropping an error: `val, _ := MightError()`

JamesMcMinn · on Aug 22, 2016

Foo() can have side-effects, taking a value from an array cannot, so Foo() is not necessarily dead code, whilst a[0] certainly is.

_ serves multiple purposes, not just for ignoring a return value. It's often used when you care about the side effect but not the return value, such as importation a driver for a database . You'll never use the library, so you import it as _ "db.driver.name/path/".

euyyn · on Aug 22, 2016

Can't taking a value from an array result in a panic? Quite the side effect.

sfrailsdev · on Aug 22, 2016

tapirl, just fyi this references line numbers like line 22, but doesn't display any numbers.

jonesetc · on Aug 22, 2016

There is a comment noting the number on all of the numbers referenced in text.

tapirl · on Aug 22, 2016

? you mean there is not line numbers at the beginning of code lines? yes, my blog server code is too simple to show line numbers.

packetslave · on Aug 22, 2016

you could use something like http://alexgorbatchev.com/SyntaxHighlighter/ to do it in JavaScript.