Canceling I/O in Go Cap'n Proto
This report details an experience I had while writing an RPC system in Go. While Go’s standard I/O libraries make a great many things simple, I found cancellation to be more complex than I would have liked. Parts of this situation have improved in the last couple of Go releases (as I have noted below). I hope this positive trend continues in a way that allows the Go ecosystem to easily propagate cancellation, deadlines, and request values. My intent in this report — as well as the proposal I created back in May 2017 — is to give background and feedback to inform future design decisions. Suggestions for solutions welcome!
(Thanks to Ian Lance Taylor, Damien Neil, Cassandra Salisbury, and Andrew Bonventre for reviewing this report for accuracy and clarity.)
An Overview
For several years, I have been the maintainer of the Go Cap’n Proto library in my spare time. Cap’n Proto specifies both a binary serialization format and an RPC protocol. While this library has shown me a number of places where I think Go can improve (and this may be the first of many experience reports), I’d like to focus on a particular problem that can be explained without any knowledge of the library.
The basic building block of the RPC library is the Conn object. Simplifying a bit, there are two concurrent tasks that operate on a Conn:
- A goroutine that reads messages from the wire, processes them, then sends back zero or more messages in response. For example, receiving an “RPC return” message consults some internal Conn state, then sends back an “RPC return acknowledged” message before reading the next message. I call this the receive goroutine.
- The application sends RPCs, which translate to messages being sent on the wire. The responses from the remote peer are later read by the receive goroutine.
The Closing Act
The problem at hand is how to stop a Conn’s receive loop once it has started. A naive approach would be to call Close on the underlying io.ReadWriteCloser and let the resulting Read error end the loop.
However, this approach has two problems:
- In the Cap’n Proto RPC specification, implementations are supposed to send an explicit abort message as the last message before intentionally closing a connection. Calling Close shuts down both the reading and writing ends of the io.ReadWriteCloser, making it impossible to send the abort message.
- Calling Close concurrently with Read on a generic io.ReadCloser is not explicitly declared safe. For example, until Go 1.9, calling Close concurrently with Read on an *os.File (such as with a Unix pipe) would result in a data race (#7970). However, types that implement net.Conn explicitly allow calling Close concurrently with Read.
Another approach could be to close the reading half of the connection first using CloseRead, with the intent to interrupt the Read. This is a bit unwieldy: CloseRead and CloseWrite are only available on TCPConn and UnixConn, and the semantics of how they interact with concurrent operations are not documented as of Go 1.9. More importantly, Read is not the only I/O call in the receive goroutine. Remember that the receive goroutine doesn’t just read messages from the wire: it also sends them. When the CloseRead comes in, the receive goroutine may be in the middle of sending a response to an already received message. It would be desirable to stop it from sending more messages while shutting down.
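This is a classic example of what Context is supposed to be used for: propagating cancellation down the call stack. Ideally, the receive loop would take a Context and pass it to every read and write, so that canceling the Context interrupts whichever I/O call is in flight.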
Plumbing the Context through helper functions is tedious but possible. In cases like ReadFull, I would likely have to reimplement the function. The crucial part is actually interrupting the I/O operation. io.Reader and io.Writer do not take in a Context, nor do they provide a simple way to cancel the operation. So how can I accomplish this in Go 1.9?
Starting Simple: Canceling a Write
In the scope of Cap’n Proto, cancelable writes are easy to graft on top of Context-unaware io.Writers, because partial writes corrupt the stream: a cancel signal should be ignored once bytes have hit the wire. Therefore, checking for cancellation before calling Write is enough for this use case. For writers that respect SetWriteDeadline, I can additionally spin up a separate goroutine that listens for the Done signal and sets an immediate deadline to interrupt a blocked Write.
Don’t Interrupt Me; I’m Reading
Canceling a read is much more complicated. Often, I want to cancel the read when there isn’t any data available. io.Readers conventionally return what is buffered instead of waiting for more, so Read returns quickly when data is available, and only a Read waiting on an idle connection needs to be interrupted. For readers that implement SetReadDeadline, I can employ the same technique as for writing, but I’m left in a strange place if the io.Reader does not implement SetReadDeadline.
One way I can simulate interrupting the Read call is by always calling Read in another goroutine, and then selecting on Context.Done and a channel that produces the result of the Read call. The caveat is that at some point, the goroutine calling Read needs to be waited upon or else resources leak. In the Cap’n Proto RPC case, canceling the read will likely occur shortly before Conn closes the io.ReadWriteCloser, so any abandoned Read will not need to stay around for long. However, I still have a problem: given a generic io.ReadCloser, I cannot guarantee that it is safe to call Close concurrently with Read. There is fundamentally no way to address this: such an io.Reader cannot be stopped safely.
There’s one other wrinkle: Read can’t be wrapped in a single function like Write. Write’s contract is to block until its input is written, which means that I can gather up all that I need to write into one byte slice and then make a single cancelable Write call. Read may intentionally return less than requested, so often multiple Read calls are necessary. This is usually handled by routines like io.ReadFull, which I don’t want to give up or duplicate to do context-plumbing. To support my existing io.Reader-based code, as well as to maintain the state of the abandoned goroutine, I had to create an io.Reader that curries the Context: the Context is supplied once at construction and consulted on every Read.
What Works
SetReadDeadline and SetWriteDeadline are flexible enough to allow me to graft cancellation and deadline awareness onto io.Readers and io.Writers. However, when I first started looking at this problem (around Go 1.7 and 1.8), this meant pipes (being *os.File) had to be excluded. Starting with Go 1.9, it is now possible to interrupt an (*os.File).Read call with (*os.File).Close in a safe manner, making the leaky io.Reader fallback safe for any type of file. In Go 1.10, *os.File gains the SetReadDeadline and SetWriteDeadline methods (#22114), which will make pipes work with the timeout interrupt approach. (Go 1.10 also adds os.IsTimeout.)
What Doesn’t Work
My solution works for the narrow problem of propagating Context cancellation and deadlines to specific io.Reader and io.Writer types that I needed, but at the cost of an additional goroutine and complexity. The goroutine seems unnecessary: one could imagine a version of the Go runtime poller that takes in a Done channel instead of a deadline. It’s also not intuitive that you can set an immediate deadline to interrupt a concurrent call. In the first draft of this post, I set deadlines in the future and checked for Context.Done in a loop (thanks to Ian Lance Taylor for pointing out the more efficient implementation above). On the complexity side, there’s a large amount of my custom io.Reader implementation dedicated to handling the abandonable goroutine. The check for deadline support is fairly boilerplate, and it would be nice to eliminate it.
There’s also complexity that must necessarily be pushed onto users of the library: I have to document that io.Readers passed to the connection must either have a SetReadDeadline method or be safe to call Read concurrently with Close. This stomps on one of the key benefits of Go’s I/O interfaces: anything that implements the interface should just work. Library users now must carefully inspect the io.ReadWriteClosers they pass into my RPC library. And further, it makes it harder to compose I/O types. If the user wanted to create a custom io.ReadWriteCloser that uses *bufio.Reader on another io.Reader, they have to know to wrap SetReadDeadline as well, and it has to “poke through” *bufio.Reader to the underlying io.Reader’s SetReadDeadline. If the I/O operations had a standard way of propagating Context, then this wouldn’t be necessary.
A Note on Context values
One further observation: the solution I used in Cap’n Proto does not work for Context values. If I needed to propagate a Context value into the io.ReadWriteCloser (say for observability purposes), then I would be out of luck. While I don’t have a concrete need for this within this library, I have seen other places where this would be useful, notably in Google Cloud Storage’s storage.Reader and storage.Writer. The GCS package works around the lack of Context by currying the Context, similar to what this package does. It would be nice to address this use case, but I haven’t thought about it in as much depth.