The following are my notes from learning about the coroutine facilities added to c++ in c++20.
Disclaimer: I have yet to implement anything for production using coroutines. I have, however, spent a lot of time watching talks and reading up on this stuff, and a bit of time with toy examples. Take what I’ve documented here with a handful of salt. My goal is to understand how coroutines work under the hood, not present best practices for using them.
Motivation
Three things collided at the end of 2020 to spark my interest in coroutines.
First, a move from explicit futures to coroutines based on folly::coro was again highlighted in the guidance at work around coding style for concurrency in c++, as it had been for the last couple of years.
Second, the concept of structured concurrency (well explained in Eric Niebler’s recent article), the executors proposal for c++23, and the connection between sender/receiver and coroutines were all showing up in the social media and podcasts I follow.
Third, I tossed some c++20 coroutines into Compiler Explorer, and what I saw did not make intuitive sense. I wanted to know more.
tl;dr
- A coroutine is a generalization of a subroutine: it retains the call/return operations and adds suspend and resume operations.
- There are many design choices around coroutines; c++20 gave us coroutines that are stackless, first-class, and offer asymmetric (or symmetric) transfer. “The most efficient, scalable, open ended, versatile coroutines.” – Gor Nishanov
- Coroutines are not executors; there is nothing inherently parallel about using them.
- The c++20 standard does not even ship ready-to-use coroutine types (no task, no generator); it only defines the concepts[1] and building blocks for higher level abstractions!
- Great implementations of higher level abstractions exist already to play with: libunifex, folly::coro, cppcoro, etc.
What is a coroutine (in general)
Put simply, coroutines are a generalization of subroutines that additionally have operations to be suspended and resumed.
There are several agreed-upon classifications of the trade-offs in coroutine designs:[2]
- First-class / Constrained: First-class coroutines can be stored in a data structure and passed around as a parameter; constrained coroutines cannot.
- Symmetric / Asymmetric control transfer: With symmetric coroutines, suspending means explicitly transferring control to (resuming) another coroutine. With asymmetric coroutines, control is always transferred back to the coroutine’s invoker (whoever resumed it).
- Stackless / Stackful: Stackful coroutines save the whole call stack on suspend, so they can suspend from inside nested calls; stackless coroutines save only their own frame, so only the coroutine body itself can suspend.
Coroutines let developers write code that looks linear, and logically is linear; the actual execution flow, however, is cooperative multitasking between the coroutines.
Natural use cases of coroutines include:
- Implementing lazy generators to compute results on demand.
- Implementing structured concurrency.
- Hiding latency of blocking operations.
- Reactive stream programming.
- Implementing state machines.
- And much, much more.
What is a coroutine (c++20)
What were the design goals of c++ in adopting a language feature for coroutines?
“The most efficient, most scalable, most open/customizable coroutines of any programming language in existence.” - Gor Nishanov (source: every talk by Nishanov on the subject, 2014-2019[3] 🙂):
- Scalable (to billions of coroutines)
- Efficient (resume and suspend operations comparable in cost to function call overhead)
- Open ended coroutine machinery allowing library designers to develop coroutine libraries exposing high-level semantics, such as generators, green threads, tasks, and more.
- Usable in environments where exceptions are forbidden or not available
With the above design goals in mind, what did we get in c++20?
We got an asymmetric (but also symmetric[4]), first-class, stackless coroutine design.
A simple c++20 coroutine
How does one write a coroutine in c++?
Definition: A coroutine is any[5] function that uses any of the coroutine keywords: co_await[6], co_return, or co_yield.
For example, the simplest function to coroutine conversion might be:
void my_func() { return; } // a function
class task; // forward declare, will implement later..
task my_coro() { co_return; } // now a coroutine!
You will notice two things changed:
1. return became co_return, and since we now use one of the coroutine keywords, the compiler will treat this function as a coroutine.
2. We no longer return void, but instead some class called task.
Great, now let’s run it. Well, that’s the kicker: see that task? We need an implementation for it.
For c++20, several concepts, the interactions between them, and many customization points are defined, but no implementation is provided[7]. Without more work (or a third party library[8]) our coroutine cannot be used.
Before we implement task, however, let’s take a detour to explore what the c++20 standard did provide.
Classes and interactions defined
- promise_type
- std::coroutine_handle<promise_type>
- std::coroutine_traits<R, Args...>
- Awaitable – not defined in standard
- Awaiter – not defined in standard
I am adding two concepts not in the standard: Awaitable and Awaiter. Lewis Baker uses Awaitable and Awaiter in his writings[9] to make the interactions clearer. And note: the coroutine object can also be its own Awaitable / Awaiter.
One more terminology clarification before continuing: the term “coroutine” is overloaded to mean both the “coroutine factory” (the function you call to create a coroutine) and the “coroutine object” created by that factory. Thanks to Rainer Grimm’s draft c++20 book[10] for the insight; it was causing me confusion as well. Keep it in mind as you reason about coroutines.
Promise
The promise object is the primary way a coroutine’s behavior can be customized and how it communicates with its invoker.
Example of the customization points:
template<typename T>
struct promise_type {
class Awaitable; // ...
using coro_handle = std::experimental::coroutine_handle<promise_type>;
// required
auto get_return_object() { return coro_handle::from_promise(*this); }
Awaitable initial_suspend() noexcept;
Awaitable final_suspend() noexcept;
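// result handling -- exactly one of return_void / return_value is required
void return_void();
//void return_value(T v);
// only needed if the coroutine uses co_yield; the result is co_awaited
//Awaitable yield_value(T v);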
// error handling
void unhandled_exception();
// memory handling
static void* operator new (std::size_t size) noexcept; // called with just the frame size; noexcept so failure can return nullptr
static void operator delete (void* ptr);
static task get_return_object_on_allocation_failure();
// ...
//template<typename U> Awaiter await_transform(U);
};
Awaitable
We need this concept separate from an Awaiter to be able to talk about several customization points. Often the Awaitable and Awaiter will be the same object; however, it is possible to implement them as separate classes.
To go from an Awaitable to an Awaiter, we use the Awaitable’s member operator co_await() -> awaiter or a free operator co_await(awaitable) -> awaiter. If neither exists, the Awaitable is used directly as the Awaiter.
To add more complexity to the mix, the Awaitable can also be passed through promise.await_transform(awaitable) before operator co_await is applied. NOTE: the promise here is the promise object of the currently executing coroutine, i.e. the one performing the co_await operation on the Awaitable. await_transform enables what Lewis calls “Contextually Awaitable” types: we can co_await some class that is not otherwise an Awaitable if our promise can transform it into something that is. Don’t worry if you don’t follow along; this piece is very confusing, but not necessary for the basics.
class awaitable {
//awaiter operator co_await();
};
//awaiter operator co_await(awaitable);
Awaiter
The Awaiter is the object that implements the special functions for suspending and resuming a coroutine.
Again, often this can be the coroutine object itself.
From the standard: “await-suspend is the expression e.await_suspend(h), which shall be a prvalue of type void, bool, or std::experimental::coroutine_handle<Z> for some type Z.”
That legalese is saying that the return type of await_suspend specifies where control is transferred next:
- void, or a bool value of true: the coroutine stays suspended and control returns to the caller (or resumer).
- false: the suspension is cancelled and this coroutine resumes immediately.
- another coroutine_handle: that coroutine is resumed (e.g. to symmetrically transfer control to another coroutine, such as a continuation).
template<typename T>
struct awaiter
{
bool await_ready() noexcept;
void await_suspend(std::experimental::coroutine_handle<>) noexcept;
//bool await_suspend(std::experimental::coroutine_handle<>) noexcept;
//stdx::coroutine_handle<Z> await_suspend(std::experimental::coroutine_handle<>) noexcept;
void await_resume() noexcept;
//T await_resume() noexcept;
};
coroutine_handle
A type-erased coroutine handle: the interface for talking to a coroutine (resume, destroy, done), which also handles marshaling to a void * and back.
// Simplified from the coroutine TS (std::experimental)
template<> struct coroutine_handle<void> {
coroutine_handle() noexcept = default;
coroutine_handle(std::nullptr_t) noexcept;
coroutine_handle& operator=(std::nullptr_t) noexcept;
explicit operator bool() const noexcept;
static coroutine_handle from_address(void* a) noexcept;
void* address() const noexcept;
void operator()() const;
void resume() const;
void destroy() const;
bool done() const;
};
template<typename T>
struct coroutine_handle : coroutine_handle<void>
{
// Handle for a coroutine whose promise type is T
T& promise() const;
static coroutine_handle from_promise(T&) noexcept;
};
coroutine_traits
Provides an alternative way to look up the promise_type from a coroutine’s return type (and parameter types). This is useful when you cannot modify the return type itself, e.g. to make std::future a coroutine-compatible type.
template<class, class...>
struct coroutine_traits {};
template<class R, class... Args>
requires requires { typename R::promise_type; }
struct coroutine_traits<R, Args...> {
using promise_type = typename R::promise_type;
};
Minimal coroutine object
With those interactions in mind, we now have the context to dive in and implement the task<T> type required to get our example running.
We will make it “lazy” by returning stdx::suspend_always from initial_suspend, so the coroutine body does not start running until the coroutine is first resumed.
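#include <experimental/coroutine> // std::experimental::coroutine_handle, suspend_always
#include <exception> // std::terminate
// forward decl some promise types
template<typename U> struct promise_type_impl_base;
template<typename U> struct promise_type_impl;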
template<typename T>
class task {
public:
using promise_type = promise_type_impl<T>;
using handle_type = std::experimental::coroutine_handle<promise_type>;
task(handle_type handle) : handle_(handle) { }
task(const task&) = delete; task(task&&) = delete; // disable copy/move
bool await_ready() { return handle_.done(); }
bool await_resume() {
if (!handle_.done())
handle_.resume();
return !handle_.done();
}
template<typename PROMISE>
void await_suspend(std::experimental::coroutine_handle<PROMISE> coroutine) {}
~task() { handle_.destroy(); }
private:
handle_type handle_;
};
template<typename U>
struct promise_type_impl_base {
auto get_return_object() { return task<U>::handle_type::from_promise(*static_cast<typename task<U>::promise_type*>(this));}
auto initial_suspend() noexcept { return std::experimental::suspend_always{}; }
auto final_suspend() noexcept { return std::experimental::suspend_always{}; }
void unhandled_exception() { std::terminate(); };
#if 0
static void* operator new (std::size_t size) noexcept {
void * rv = malloc(size)/*nullptr*/; // set nullptr to hit get_return_object_on_allocation_failure
fmt::print("new {} at {}\n", size, rv);
return rv;
}
static void operator delete (void* ptr) { fmt::print("delete {}\n", ptr); free(ptr); }
static task<U> get_return_object_on_allocation_failure() { throw std::bad_alloc(); };
#endif
};
template<typename U>
struct promise_type_impl : public promise_type_impl_base<U> {
void return_value(U v) { _v = v; }
U& result() { return _v; }
// void yield_value(U v) { _v = v; }
U _v;
};
template<>
struct promise_type_impl<void> : public promise_type_impl_base<void> {
void return_void() {}
};
That’s it! Now we can compile and execute our coroutine example.
Transformations
Finally, with a working coroutine, we can look into what the compiler is doing to our code.
Await Flow
Let’s map out the flow through an Awaiter when we execute co_await awaitable:
Await flow (flowchart, summarized):
- The caller (or resumer) reaches co_await awaitable.
- The awaiter is obtained as operator co_await( promise.await_transform( awaitable ) ); each step applies only if the promise or type defines it.
- If awaiter.await_ready() returns true, there is no suspension and control goes straight to awaiter.await_resume().
- Otherwise the coroutine suspends and awaiter.await_suspend(handle) is called: true or void returns control to the caller; false goes straight to awaiter.await_resume(); returning another_coro_handle<P> resumes that coroutine instead.
- When a suspended coroutine is later resumed via coro_handle.resume(), execution continues at awaiter.await_resume().
- After awaiter.await_resume(), the coroutine runs to its next suspension point, and the flow starts again.
Promise Flow
In the context of the promise workflow:
Promise flow (flowchart, summarized):
- The caller invokes the coroutine factory, which allocates the frame with promise_type::operator new(sizeof_frame) (or the global operator new if the promise does not declare one).
- If that allocation returns nullptr (checked only when the promise declares get_return_object_on_allocation_failure), the factory returns promise_type::get_return_object_on_allocation_failure() instead.
- Otherwise, promise.get_return_object() creates the object handed back to the caller, and co_await promise.initial_suspend() goes through the awaitable flow.
- Once resumed, the coroutine’s code runs; every suspension point goes through the awaitable flow and back.
- The body finishes through promise.return_value / promise.return_void, or through promise.unhandled_exception() if an exception escapes.
- Each of these paths ends at co_await promise.final_suspend(), which goes through the awaitable flow one last time.
Promise transformation
With those two flow charts in mind, how might a compiler transform our code from the coroutine that we wrote to something executing those state machines?
The first transformation that must take place is defined in the coroutine TS (n4760#subsection.11.4.4):
// This coroutine
task<void> my_coro()
{
F;
co_return;
}
// becomes:
{
using P = task<void>::promise_type;
P p (promise-constructor-arguments);
co_await p.initial_suspend(); // initial suspend point
try { F; } catch(...) { p.unhandled_exception(); }
final_suspend:
co_await p.final_suspend(); // final suspend point
}
This covers the interactions with the promise object from within the coroutine. As you can see, it has inserted the two implicit suspend points (initial and final) and connected unhandled exceptions to the promise object. Not shown is the interaction with promise.get_return_object() or with the static operator new / operator delete / get_return_object_on_allocation_failure.
Note: for all of these examples, I will assume the lazy task<> defined as a minimal implementation in the Minimal coroutine object section.
Here’s what that might look like on a sample coroutine:
// Input
task<int> my_coro(int a) {
// implicit initial suspend
std::string s = "In";
co_await stdx::suspend_always{}; // suspend point 1
fmt::print(s);
s = " coroutine!\n";
co_await stdx::suspend_always{}; // suspend point 2
fmt::print(s);
co_return a;
// implicit final suspend
}
// After Transformation
task<int> my_coro(int a) {
using P = task<int>::promise_type;
P p;
co_await p.initial_suspend(); // initial suspend point
try {
std::string s = "In";
co_await stdx::suspend_always{}; // suspend point 1
fmt::print(s);
s = " coroutine!\n";
co_await stdx::suspend_always{}; // suspend point 2
fmt::print(s);
co_return a;
} catch(...) { p.unhandled_exception(); }
final_suspend:
co_await p.final_suspend(); // final suspend point
}
co_return / co_yield transformation
The two trivial keyword transforms are for co_return and co_yield; both just talk to the promise (co_yield additionally co_awaits whatever yield_value returns).
co_return transform
// ...
co_return a;
// ...
// becomes:
// ...
p.return_value(a); // a plain co_return; calls p.return_void() instead
goto final_suspend;
// ...
co_yield transform
// ...
co_yield ++a;
// ...
// becomes:
// ...
co_await p.yield_value(++a);
// ...
Await transformation
This one is a tad harder to express.
Let’s start with the transform described by Marcin Grzebieluch in his code::dive 2019 talk[11]. Conceptually this transform is easy to reason about; however, it is not particularly close to what clang actually does. If you’ve heard someone say “coroutines chop up your function into callbacks,” this is probably the mental model of the transformation they have in mind.
// Input: After Promise Transform (above)
task<int> my_coro(int a) {
using P = task<int>::promise_type;
P p (promise-constructor-arguments);
co_await p.initial_suspend(); // initial suspend point
try {
std::string s = "In";
co_await suspend_always{}; // suspend point 1
fmt::print(s);
s = " coroutine!\n";
co_await suspend_always{}; // suspend point 2
fmt::print(s);
co_return a;
} catch(...) { p.unhandled_exception(); }
final_suspend:
co_await p.final_suspend(); // final suspend point
}
// Transformed code
template<typename promise_type>
struct my_coro_frame {
my_coro_frame(int a) : a(a), s(), promise_(nullptr), current_suspend_point_(0) {}
void resume_from_suspension_point_initial(){
s = "In";
}
void resume_from_suspension_point_1(){
fmt::print(s);
s = " coroutine!\n";
}
void resume_from_suspension_point_2(){
fmt::print(s);
promise_->return_value(a);
}
void resume_from_suspension_point_final(){
}
// Capture arguments
int a;
// Capture locals
std::string s;
promise_type* promise_;
stdx::suspend_always suspension_point_initial_object;
stdx::suspend_always suspension_point_1_object;
stdx::suspend_always suspension_point_2_object;
stdx::suspend_always suspension_point_final_object;
// Not mentioned in Marcin’s talk, but you need somewhere to track
// the suspension point and a dispatch function to resume to the
// right place...
int current_suspend_point_;
void resume_state_machine_() {
try {
switch(current_suspend_point_) {
// ... code goes here ...
}
} catch(...) { promise_->unhandled_exception(); }
}
};
task<int> my_coro(int a) {
// TODO:
// create a new frame (use task<int>::operator new if available)
// Initialize the promise in the frame,
// move the passed args into the frame object,
// same for locals, ...
// Run the state machine
// Return the task
}
The problem with that transformation, however, is that if you throw our example coroutine into Compiler Explorer, the assembly just doesn’t match up. Additionally, since we left so much out by omitting the state machine, let’s look at another version of this transformation.
Andreas Fertig has a post showing some options for wiring up cppinsights.io for coroutine support. Since, as of February 2021, this is not available online, we will transform our example by hand.
There are also a few pieces from luncliff’s post that I want to incorporate (the coroutine_handle / frame_prefix bits).
Await Transform (detailed)
Factory
Let’s start with the factory function. Can we figure out what it is doing? Note: this is the -O2 optimized version; you should also look at the -O1 output to see more steps that get optimized away.
test_coro(int): # @test_coro(int)
push rbp
push rbx
push rax
mov ebp, esi
mov rbx, rdi
## Allocate 56 bytes on the heap for our coroutine frame
mov edi, 56
call operator new(unsigned long)
## First are two compiler reserved pointers for resume / destroy
mov qword ptr [rax], offset test_coro(int) [clone .resume]
mov qword ptr [rax + 8], offset test_coro(int) [clone .destroy]
## Next is the promise
## First the passed argument
mov dword ptr [rax + 20], ebp
## And zero out (what later looks to be the suspension point index)
mov byte ptr [rax + 48], 0
mov qword ptr [rbx], rax
mov rax, rbx
add rsp, 8
pop rbx
pop rbp
ret
I came up with:
namespace stdx = std::experimental;
struct my_fake_coro_frame {
using cb = void (*)(void *);
cb resume, destroy;
task<int>::promise_type p;
int a;
std::string s;
char suspend_point_id; // offset 48, suspension point index
stdx::suspend_always initial_suspend_object; // offset 49, initial_suspend_object
stdx::suspend_always suspend_point_1_object; // offset 50, suspend_point_1_object
stdx::suspend_always suspend_point_2_object; // offset 51, suspend_point_2_object
stdx::suspend_always final_suspend_object; // offset 52, final_suspend_object
char _53; // offset 53, padding
char _54; // offset 54 ...
char _55; // offset 55 ...
};
void my_fake_coro_resume(void *c);
void my_fake_coro_destroy(void *c);
task<int> my_fake_coro(int a) {
// Allocate raw space on the heap for the coroutine's frame (new char[]
// is aligned for fundamental types), then construct the frame in place
// (needs <new>) so the promise and locals (p, s) really are constructed
auto b = new char[sizeof(my_fake_coro_frame)];
my_fake_coro_frame *f = new (b) my_fake_coro_frame();
// Set up the prefix internals
f->resume = my_fake_coro_resume;
f->destroy = my_fake_coro_destroy;
// Save the parameters
f->a = std::move(a);
// Initialize suspension point id
f->suspend_point_id = 0;
// Grab our return object
auto ro = f->p.get_return_object();
f->initial_suspend_object = f->p.initial_suspend();
if (not f->initial_suspend_object.await_ready()) {
f->initial_suspend_object.await_suspend(
stdx::coroutine_handle<task<int>::promise_type>::from_address(
(void *)f));
return ro;
}
// Pump the state machine
my_fake_coro_resume(b);
// Return in a task
return ro;
}
State machinery
void my_fake_coro_resume(void *c) {
my_fake_coro_frame *f = (my_fake_coro_frame *)c;
fmt::print("RESUME FAKE {} suspend point {}\n", (uintptr_t)c, (int)f->suspend_point_id);
try {
auto &s = f->s;
switch (f->suspend_point_id) {
case 0:
f->initial_suspend_object.await_resume();
s = "In";
f->suspend_point_id = 1;
{
f->suspend_point_1_object = stdx::suspend_always{};
if (not f->suspend_point_1_object.await_ready()) {
f->suspend_point_1_object.await_suspend(
stdx::coroutine_handle<task<int>::promise_type>::from_address(c));
return;
}
}
[[fallthrough]];
case 1:
f->suspend_point_1_object.await_resume();
fmt::print(s);
s = " coroutine!\n";
f->suspend_point_id = 2;
{
f->suspend_point_2_object = stdx::suspend_always{};
if (not f->suspend_point_2_object.await_ready()) {
f->suspend_point_2_object.await_suspend(
stdx::coroutine_handle<task<int>::promise_type>::from_address(c));
return;
}
}
[[fallthrough]];
case 2:
f->suspend_point_2_object.await_resume();
fmt::print(s);
f->p.return_value(f->a);
f->suspend_point_id = 3;
{
f->final_suspend_object = stdx::suspend_always{};
if (not f->final_suspend_object.await_ready()) {
f->final_suspend_object.await_suspend(
stdx::coroutine_handle<task<int>::promise_type>::from_address(c));
return;
}
}
[[fallthrough]];
case 3:
f->final_suspend_object.await_resume();
f->resume = nullptr;
break;
}
} catch (...) {
f->p.unhandled_exception();
}
}
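void my_fake_coro_destroy(void *c) {
my_fake_coro_frame *f = (my_fake_coro_frame *)c;
fmt::print("DESTROY FAKE {}\n", (uintptr_t)c);
// Run the destructors of the promise and locals, then free the raw storage
f->~my_fake_coro_frame();
delete[] static_cast<char *>(c);
}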
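The same idea can be boiled down even further. The sketch below is plain C++ with no coroutine keywords at all; fake_frame, fake_resume, fake_destroy, fake_coro, and fake_done are hypothetical names, the awaiters are dropped, and printing is replaced by appending to a string so the result can be checked. It keeps the resume/destroy function pointers at the front of the frame and a switch over the suspend-point index:

```cpp
#include <cassert>
#include <string>

// Hand-written imitation of a compiled coroutine frame.
struct fake_frame {
  void (*resume)(fake_frame*);   // like clang's frame prefix: resume first...
  void (*destroy)(fake_frame*);  // ...then destroy
  int a;                         // captured parameter
  std::string s;                 // local that lives across suspension points
  std::string output;            // stands in for fmt::print
  int result = 0;                // stands in for promise.return_value
  int suspend_point = 0;         // which suspension point we are parked at
};

void fake_resume(fake_frame* f) {
  switch (f->suspend_point) {
    case 0:  // resumed from the initial suspend point
      f->s = "In";
      f->suspend_point = 1;
      return;  // suspend point 1
    case 1:
      f->output += f->s;
      f->s = " coroutine!\n";
      f->suspend_point = 2;
      return;  // suspend point 2
    case 2:
      f->output += f->s;
      f->result = f->a;  // co_return a;
      f->suspend_point = 3;
      f->resume = nullptr;  // parked at final suspend: "done"
      return;
  }
}

void fake_destroy(fake_frame* f) { delete f; }

// The "factory": allocate the frame, save the argument, stop at the initial
// suspend point (lazy, like the task above).
fake_frame* fake_coro(int a) {
  return new fake_frame{fake_resume, fake_destroy, a, {}, {}, 0, 0};
}

bool fake_done(const fake_frame* f) { return f->resume == nullptr; }
```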
As you can see, the compiler does a ton of work for us! Take a second to go back and look at how succinct the input was.
Having gone through that exercise, I highly recommend doing it yourself. While what I did is not exactly what clang does, it really hammered home what needs to happen to execute a coroutine.
Notes and Nits
On optimizations
In the above you can see that we are always allocating the frame, though we did pack the promise and the coroutine frame together. The compiler does this for us as well. Further, the compiler can eliminate even that allocation in some cases where it can reason about the lifetime of the coroutine (often called heap allocation elision, or HALO).
For example, in Gor Nishanov’s 2015 talk, he shows an example of a disappearing coroutine.
Aside: When I port that example forward to current clang trunk, the disappearing coroutine still works great with clang at -O2, but most of the production code I ship is compiled with -Oz (or -Os). I could not figure out exactly why it does not optimize away with either of those flags. I suspect it has something to do with inlining thresholds: adding __attribute__((always_inline)) with -Os allows it to optimize away, but nothing I did would make -Oz do so. 🤷🏽‍♀️ If anyone reads this and can figure out why, send me a note; I’d love to understand.
On lifetimes
The most common issue I see posted about at work has to do with the lifetime of captured variables. This can be caused by references being passed to a coroutine (more obvious in the code) or by something like iterators. It is possible to build clang-tidy checks to catch some of these cases, and as people get used to coroutines I suspect this will become less of an issue. Using a structured concurrency model would also likely alleviate some lifetime issues.
For now, however, be sure to keep an eye out for lifetime issues while using coroutines.
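template <typename It>
coro::Task<void> process(It begin, It end);
// Not a coroutine (it uses return, not co_await/co_return): values is
// destroyed as soon as this returns, but the lazy task it hands back
// still holds iterators into it.
coro::Task<void> process(std::vector<int> values) {
return process(values.begin(), values.end());
}
co_await process({...}); // BOOM: the iterators dangle once the task runs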
On RAII
Major limitation: currently we cannot rely on RAII for asynchronous cleanup, because a destructor cannot be a coroutine (it cannot co_await)! See the proposal to add a co_using keyword for this.
On await_transform
I found await_transform really difficult to understand. The key for me was Lewis Baker’s note, and the insight that await_transform is called on the promise of the currently executing coroutine (the one doing the co_await), not on the promise of the coroutine being co_awaited.
Resources
Sy Brand’s cat explains coroutines
When I started this journey, I was struggling to find good writings on coroutines (though there were plenty of great talks/videos), but perhaps my google-fu was poor, because now I am finding more and more good stuff! Below is a sampling of some of the resources I used to understand coroutines.
- Talks
- Playlist of talks on coroutines
- CppCon 2019: Eric Niebler, David Hollman “A Unifying Abstraction for Async in C++” – Great summary of structured concurrency, difference between concurrency and parallelism. Intro to the executors w/ sender/receiver proposal.
- CppCon 2014: Gor Nishanov “await 2.0: Stackless Resumable Functions”
- CppCon 2015: Gor Nishanov “C++ Coroutines - a negative overhead abstraction”
- CppCon 2016: Gor Nishanov “C++ Coroutines: Under the covers”
- CppCon 2017: Gor Nishanov “Naked coroutines live (with networking)”
- CppCon 2018: Gor Nishanov “Nano-coroutines to the Rescue! (Using Coroutines TS, of Course)” – hide memory latency w/ coroutines, have your mind blown 🤯.
- CppCon 2019: Lewis Baker “Structured Concurrency: Writing Safer Concurrent Code with Coroutines”
- CppCon 2019: Hartmut Kaiser “Asynchronous Programming in Modern C++”
- C++20 Coroutines: What’s next? - Dawid Pilarski - code::dive 2019
- C/C++ Dublin User Group: “Exploring C++20 Coroutines” by Justin Durkan
- CppCon 2016: James McNellis “Introduction to C++ Coroutines”
- CppCon 2017: Toby Allsopp “Coroutines: what can’t they do?”
- CppCon 2017: Anthony Williams “Concurrency, Parallelism and Coroutines”
- CppCon 2020: Rainer Grimm “40 Years Of Evolution from Functions to Coroutines”
- Writings on the interwebs
- Lewis Baker’s excellent posts on coroutines
- Dawid Pilarski 3-part series on coroutines
- Raymond Chen has a bunch of posts on coroutines at The Old New Thing
- Charlie Kolb’s report and slides on coroutines – found this via google, not sure if there’s a better link. Kolb did a particularly good job of drawing out state machines that made sense to me, took some ideas from these.
- Luncliff has a nice presentation and post that is quite similar to what I was trying to accomplish here (wish I discovered that sooner!). It includes some of the coroutine_frame information for both clang and msvc.
- Other writings (non-free)
- In his draft c++20 book, Rainer Grimm has a solid chapter on coroutines. This is where I got the idea to think of the two separate state machine flows (“promise workflow” and “awaiter workflow” in the book), as well as the idea that the term “coroutine” is overloaded to mean both the factory and the object.
- Another c++20 book, by Andreas Fertig, also has a section on c++20 coroutines that seems pretty good. Andreas used a nice state machine problem for his example, and it seems like a really good reference for learning how to use coroutines. I discovered it later and spent a bit less time with it than with Grimm’s book.
- Standards docs
- n4860 - draft final c++20 – 17.12 Coroutines, 7.6.2.3 Await, 7.6.17 Yielding a value, 9.5.4 Coroutine definitions
- n4760 - coroutine TS
- A Unified Executors Proposal for C++ | P0443R14
- Meta
- Matt P. Dziubinski’s c++ links – found this later, but has most of the resources listed here and more!
Conclusion
In c++20 we now have the lowest level of coroutine support. With third party libraries this is usable now. There is nothing magic about what the compiler does to your coroutine code and it can be understood, but it is not trivial either.
There are many things I’d love to have included here, but this has already taken me months, and “done is better than perfect”... right? Maybe I can cover them in a follow-up. One big hole I didn’t touch on is how to actually use coroutines; for that, there are some good examples linked from the Resources section.
My main goal in this article was understanding what we got in c++20 and roughly how it works under the hood. My hope is that now when I need to debug, say, a size regression in coroutine code I can look at the generated assembly and have a rough understanding of what is going on.
If this was helpful, or if you noticed any obvious mistakes, please reach out to me on Twitter and let me know.
[1] I am using the term “concepts” throughout here colloquially, not as in c++20’s concepts.
[2] See the revisiting coroutines paper summarized in this stackoverflow question.
[3] Seriously, follow the links in the Resources section and go watch them now. Nishanov is a fantastic speaker on this subject and always entertaining. I’ll wait…
[4] See Lewis Baker’s writing on symmetric transfer.
[5] There are a few “limitations” around turning a function into a coroutine. First, you cannot use a placeholder return type such as auto (this will make more sense as we get into the weeds), and the return type must have an associated promise_type (keep reading). Second, constexpr and consteval functions, constructors, destructors, and main cannot be coroutines. Finally, no C-style variadic arguments (though you can use a template parameter pack).
[6] Note: co_await in a range-based for (the for co_await form from the coroutine TS, which did not make it into c++20) may seem different, but it is just expanded to a couple of co_await expressions under the hood, see n4760#subsection.9.5.4:
{
auto && __range = FOR_RANGE_INITIALIZER;
auto __begin = co_await BEGIN_EXPR;
auto __end = END_EXPR;
for ( ; __begin != __end; co_await ++__begin ) {
FOR_RANGE_DECLARATION = *__begin;
STATEMENT
}
}
[7] Perhaps in c++23 we will have executors with coroutine support in the standard, which may provide a usable out-of-the-box solution!
[8] Existing implementations of coroutines and useful abstractions: libunifex, folly::coro, cppcoro, etc.
[9] Lewis Baker: understanding operator co_await - “The Promise interface specifies methods for customising the behaviour of the coroutine itself. The library-writer is able to customise what happens when the coroutine is called, what happens when the coroutine returns (either by normal means or via an unhandled exception) and customise the behaviour of any co_await or co_yield expression within the coroutine.” “The Awaitable interface specifies methods that control the semantics of a co_await expression. When a value is co_awaited, the code is translated into a series of calls to methods on the awaitable object that allow it to specify: whether to suspend the current coroutine, execute some logic after it has suspended to schedule the coroutine for later resumption, and execute some logic after the coroutine resumes to produce the result of the co_await expression.” “A type that supports the co_await operator is called an Awaitable type. To be more specific where required I like to use the term Normally Awaitable to describe a type that supports the co_await operator in a coroutine context whose promise type does not have an await_transform member. And I like to use the term Contextually Awaitable to describe a type that only supports the co_await operator in the context of certain types of coroutines due to the presence of an await_transform method in the coroutine’s promise type.” “An Awaiter type is a type that implements the three special methods that are called as part of a co_await expression: await_ready, await_suspend and await_resume.”
[10] c++20 book, Rainer Grimm. This book’s chapter on coroutines is a pretty solid summary; I would recommend it. I haven’t yet read any other parts, and so cannot comment on them.