Canceling I/O in Go Cap'n Proto
This report details an experience I had while writing an RPC system in Go. While Go’s standard I/O libraries make a great many things simple, I found cancellation to be more complex than I would have liked. Parts of this situation have improved in the last couple of Go releases (as I have noted below). I hope this positive trend continues in a way that allows the Go ecosystem to easily propagate cancellation, deadlines, and request values. My intent in this report — as well as the proposal I created back in May 2017 — is to give background and feedback to inform future design decisions. Suggestions for solutions welcome!
(Thanks to Ian Lance Taylor, Damien Neil, Cassandra Salisbury, and Andrew Bonventre for reviewing this report for accuracy and clarity.)
An Overview
For several years, I have been the maintainer of the Go Cap’n Proto library in my spare time. Cap’n Proto specifies both a binary serialization format and an RPC protocol. While this library has shown me a number of places where I think Go can improve (and this may be the first of many experience reports), I’d like to focus on a particular problem that can be explained without any knowledge of the library.
The basic building block of the RPC library is the Conn object. Simplifying a
bit, there are two concurrent tasks that operate on Conn:
- A goroutine that reads messages from the wire, processes them, then sends
back zero or more messages in response. For example, receiving an “RPC
return” message consults some internal Conn state, then sends back an “RPC
return acknowledged” message before reading the next message. I call this
the receive goroutine.
- The application sends RPCs, which translate to messages being sent on the
wire. The responses from the remote peer are later read by the receive
goroutine.
The Closing Act
The problem at hand is how to stop a Conn’s receive loop once it has started. A
naive approach would be:
However, this approach has two problems:
- In the Cap’n Proto RPC specification, implementations are supposed to send
an explicit abort message as the last message before intentionally closing
a connection. Calling Close shuts down both the reading and writing end of
the io.ReadWriteCloser, making it impossible to send the abort message.
- Calling Close concurrently with Read on a generic io.ReadCloser is not
explicitly declared safe. For example, until Go 1.9, calling Close
concurrently with Read on an *os.File (such as with a Unix pipe) would
result in a data race (#7970). However, types that implement net.Conn
explicitly allow calling Close concurrently with Read.
Another approach could be to close the reading half of the connection first
using CloseRead, with the intent to interrupt the Read. This is a bit
unwieldy: CloseRead and CloseWrite are only available on TCPConn and
UnixConn, and the semantics of how they interact with concurrent operations
are not documented as of Go 1.9. More importantly, Read is not the only I/O call in the
receive goroutine. Remember that the receive goroutine doesn’t just read
messages from the wire: it also sends them. When the CloseRead comes in, the
receive goroutine may be in the middle of sending a response to an already
received message. It would be desirable to stop it from sending more messages
while shutting down.
This is a classic example of what Context is supposed to be used for: propagating cancellation down the call stack. Ideally, I would write:
Plumbing the Context through helper functions is tedious but possible. In
cases like ReadFull, I would likely have to reimplement the function. The
crucial part is actually interrupting the I/O operation. io.Reader and
io.Writer do not take in a Context, nor do they provide a simple way to
cancel the operation. So how can I accomplish this in Go 1.9?
Starting Simple: Canceling a Write
In the scope of Cap’n Proto, cancelable writes are easy to graft on top of
Context-unaware io.Writers, the reason being that partial writes corrupt
the stream. A cancel signal should be ignored once bytes have hit the wire.
Therefore, checking for cancellation before calling Write is enough for this
use case. For writers that respect SetWriteDeadline, I can spin up a separate
goroutine that listens for the Done signal and sets an immediate deadline to
interrupt the Write.
Don’t Interrupt Me; I’m Reading
Canceling a read is much more complicated. Often, I want to cancel the read
when there isn’t any data available. io.Readers conventionally return what is
buffered instead of waiting for more, so Read returns quickly in those
circumstances. For readers that implement SetReadDeadline, I can employ the
same technique as for writing, but I’m left in a strange place if the
io.Reader does not implement SetReadDeadline.
One way I can simulate interrupting the Read call is by always calling Read
in another goroutine, and then selecting on Context.Done and a channel that
produces the result of the Read call. The caveat is that at some point, the
goroutine calling Read needs to be waited upon or else resources leak. In the
Cap’n Proto RPC case, canceling the read will likely occur shortly before
Conn closes the io.ReadWriteCloser, so any abandoned Read will not need
to stay around for long. However, I still have a problem: given a generic
io.ReadCloser, I cannot guarantee that it is safe to call Close
concurrently with Read. There is fundamentally no way to address this: such
an io.Reader cannot be stopped safely.
There’s one other wrinkle: Read can’t be wrapped in a single function like
Write. Write’s contract is to block until its input is written, which means
that I can gather up all that I need to write into one byte slice then call the
above function. Read may intentionally return less than requested, so often
multiple Read calls are necessary. This is usually handled by routines like
io.ReadFull, which I don’t want to give up or duplicate to do
context-plumbing. To support my existing io.Reader-based code, as well as to
maintain the state of the abandoned goroutine, I had to create an io.Reader
that curries the Context:
What Works
SetReadDeadline and SetWriteDeadline are flexible enough to allow me to
graft cancellation and deadline awareness onto io.Readers and io.Writers.
However, when I first started looking at this problem (around Go 1.7 and 1.8),
this meant pipes (being *os.File) had to be excluded. Starting with Go 1.9,
it is now possible to interrupt an *os.File.Read call with *os.File.Close
in a safe manner, making the leaky io.Reader fallback safe for any type of
file. In Go 1.10, *os.File gains the SetReadDeadline and SetWriteDeadline
methods (#22114), which will make pipes work with the timeout
interrupt approach. (Go 1.10 also adds os.IsTimeout.)
What Doesn’t Work
My solution works for the narrow problem of propagating Context cancellation
and deadlines to specific io.Reader and io.Writer types that I needed, but
at the cost of an additional goroutine and complexity. The goroutine seems
unnecessary: one could imagine a version of the Go runtime poller that takes in
a Done channel instead of a deadline. It’s also not intuitive that you can
set an immediate deadline to interrupt a concurrent call. In the first draft of
this post, I set deadlines in the future and checked for Context.Done in a
loop (thanks to Ian Lance Taylor for pointing out the more efficient
implementation above). On the complexity side, there’s a large amount of my
custom io.Reader implementation dedicated to handling the abandon-able
goroutine. The check for deadline support is fairly boilerplate, and it would
be nice to eliminate it.
There’s also complexity that must necessarily be pushed onto users of the library: I
have to document that io.Readers passed to the connection must either have a
SetReadDeadline method or be safe to call Read concurrently with Close.
This stomps on one of the key benefits of Go’s I/O interfaces: anything that
implements the interface should just work. Library users now must carefully
inspect the io.ReadWriteClosers they pass into my RPC library. And further,
it makes it harder to compose I/O types. If the user wanted to create a custom
io.ReadWriteCloser that uses *bufio.Reader on another io.Reader, they
have to know to wrap SetReadDeadline as well, and it has to “poke through”
*bufio.Reader to the underlying io.Reader’s SetReadDeadline. If the I/O
operations had a standard way of propagating Context, then this wouldn’t be
necessary.
A Note on Context values
One further observation: the solution I used in Cap’n Proto does not work for
Context values. If I needed to propagate a Context value into the
io.ReadWriteCloser (say for observability purposes), then I would be out of
luck. While I don’t have a concrete need for this within this library, I have
seen other places where this would be useful — notably in Google Cloud
Storage’s storage.Reader and storage.Writer. The GCS package works
around the lack of Context by currying the Context, similar to what this
package does. It would be nice to address this use case, but I haven’t thought
about it in as much depth.