Exploring coroutines

Posted on Apr 7, 2021 | ~23mins

header: photo Mount Rainier from Poo Poo Point

The following are my notes from learning about the coroutine facilities added to c++ in c++20.

Disclaimer: I have yet to implement anything for production using coroutines. I have, however, spent a lot of time watching talks and reading up on this stuff, and a bit of time with toy examples. Take what I’ve documented here with a handful of salt. My goal is to understand how coroutines work under the hood, not present best practices for using them.

Motivation

Three things collided at the end of 2020 to spark my interest in coroutines.

First, a move from explicit futures to coroutines based on folly::coro was again highlighted in the guidance at work around coding style for concurrency in c++, as it has been for the last couple of years.

Second, the concept of structured concurrency (well explained in Eric Niebler’s recent article), the executors proposal for c++23, and the connection between sender/receiver and coroutines kept coming up in the social media and podcasts I follow.

Third, I tossed some c++20 coroutines into the compiler explorer, and what I saw did not make intuitive sense. I wanted to know more.

tl;dr

A coroutine is a generalization of a subroutine: it retains the call/return operations and adds a suspend and a resume operation.
There are many design choices around coroutines. c++20 gave us coroutines that are stackless, first-class, and offer asymmetric (or symmetric) transfer. “The most efficient, scalable, open ended, versatile coroutines.” – Gor Nishanov
Coroutines are not executors; there is nothing inherently parallel about using them.
The c++20 standard does not even ship ready-to-use coroutine types; it only defines the concepts¹ for building higher level abstractions!
Great implementations of higher level abstractions exist already to play with: libunifex, folly::coro, cppcoro, etc.

What is a coroutine (in general)

Put simply, coroutines are a generalization of subroutines that additionally have operations to be suspended and resumed.

There are several agreed-upon classifications of trade-offs in coroutine designs:²

First-class / Constrained: First-class can be stored in a data structure and passed around as a parameter, constrained cannot.
Symmetric / Asymmetric control transfer: With symmetric coroutines, a suspend must name another coroutine to resume. With asymmetric coroutines, control is always transferred back to the coroutine’s invoker.
Stackless / Stackful: Save the whole stack on suspend (stackful), or save only the frame (stackless).

Coroutines allow developers to write code that looks linear, and logically is linear, while the actual execution flow is cooperative multitasking between the coroutines.

Natural use cases of coroutines include: Implementing lazy generators to compute results on demand. Implementing structured concurrency. Hiding latency of blocking operations. Reactive stream programming. Implementing state machines. And much much more.

What is a coroutine (c++20)

What were the design goals of c++ in adopting a language feature for coroutines?

“The most efficient, most scalable, most open/customizable coroutines of any programming language in existence.” - Gor Nishanov (source: every talk by Nishanov on the subject 2014-2019³ 🙂 ):

Scalable (to billions of coroutines)
Efficient (resume and suspend operations comparable in cost to function call overhead)
Open ended coroutine machinery allowing library designers to develop coroutine libraries exposing high-level semantics, such as generators, green threads, tasks, and more.
Usable in environments where exceptions are forbidden or not available

With the above design goals in mind, what did we get in c++20?

We got an asymmetric (but also symmetric⁴), first-class, stackless coroutine design.

A simple c++20 coroutine

How does one write a coroutine in c++?

Definition: A coroutine is any⁵ function using any of the coroutine keywords: co_await⁶, co_return, co_yield

For example, the simplest function to coroutine conversion might be:

void my_func() { return; } // a function

class task; // forward declare, will implement later..
task my_coro() { co_return; } // now a coroutine!

You will notice two things changed: 1. the return became co_return, and since we now use one of the coroutine keywords the compiler will treat this as a coroutine. 2. we no longer return void, but instead, some class called task.

Great, now let’s run it. Well, that’s the kicker: see that task? We need an implementation for it.

For c++20 several concepts, the interaction between these concepts, and many customization points are defined, but an implementation is not provided⁷. Without more work (or a third party library⁸) our coroutine cannot be used.

Before we implement task, however, let’s take a detour to explore what the c++20 standard did provide us.

Classes and interactions defined

promise_type
std::coroutine_handle<promise_type>
std::coroutine_traits<R, Args...>
Awaitable – not defined in standard
Awaiter – not defined in standard

I am adding two concepts not in the standard: Awaitable and Awaiter. Lewis Baker uses Awaitable and Awaiter in his writings⁹ to make understanding the interactions more clear. And note: The coroutine object can also be its own Awaitable / Awaiter.

One more terminology clarification before continuing: The term “coroutine” is overloaded for both the “coroutine factory” (call operation on the coroutine) and the “coroutine object” created by that factory. Thanks to Rainer Grimm’s draft c++20 book¹⁰ for the insight, it was causing me confusion as well. Keep it in mind as you reason about coroutines.

Promise

The promise object is the primary way a coroutine’s behavior can be customized and how it communicates with its invoker.

Example of the customization points:

template<typename T>
struct promise_type {
  class Awaitable; // ...
  using coro_handle = std::experimental::coroutine_handle<promise_type>;

  // required
  auto get_return_object() { return coro_handle::from_promise(*this); }
  Awaitable initial_suspend() noexcept;
  Awaitable final_suspend() noexcept;
  
  // result handling -- return_void or return_value, never both
  void return_void();
  //void return_value(T v);
  // only needed if the coroutine body uses co_yield
  //void yield_value(T v);

  // error handling
  void unhandled_exception();

  // memory handling
  static void* operator new (std::size_t size, const std::nothrow_t& tag) noexcept;
  static void operator delete (void* ptr);
  static task get_return_object_on_allocation_failure();
  
  // ...
  //template<typename U> Awaiter await_transform(U);
};

Awaitable

We need this concept separate from an Awaiter to enable talking about several customization points. Often the Awaitable and Awaiter will be the same object, however, it is possible to implement them as separate classes.

To go from an Awaitable to an Awaiter we would use the Awaitable’s operator co_await() -> awaiter or a free operator co_await(awaitable) -> awaiter

To add more complexity to the mix, the Awaitable can also be passed through promise.await_transform(awaitable) before the co_await operator – NOTE: this promise here would be the promise object of the currently executing coroutine that is performing a co_await operation on an Awaitable – await_transform enables what Lewis refers to as “Contextually Awaitable”, e.g. we could wait on some class that is not otherwise an Awaitable if our promise can transform it to something that is an Awaitable. Don’t worry if you don’t follow along, this piece is very confusing, but not necessary for the basics.

class awaitable {
  //awaiter operator co_await();
};

//awaiter operator co_await(awaitable);

Awaiter

The Awaiter is the object that implements the special functions for suspending and resuming a coroutine.

Again, often this can be the coroutine object itself.

From the standard: “await-suspend is the expression e.await_suspend(h), which shall be a prvalue of type void, bool, or std::experimental::coroutine_handle<Z> for some type Z.”

That legalese says the return type of await_suspend specifies where control is transferred next: void or true -> return to the caller (or resumer), false -> resume this coroutine, another coroutine_handle -> resume that coroutine (e.g. this can be used to symmetrically transfer control to another coroutine; a continuation).

template<typename T>
struct awaiter
{
  bool await_ready() noexcept;

  void await_suspend(std::experimental::coroutine_handle<>) noexcept;
  //bool await_suspend(std::experimental::coroutine_handle<>) noexcept;
  //std::experimental::coroutine_handle<Z> await_suspend(std::experimental::coroutine_handle<>) noexcept;

  void await_resume() noexcept;
  //T await_resume() noexcept;
};

coroutine_handle

Type erased coroutine handle. Interface to talk to a coroutine, also handles marshaling to a void * and back.

template<> struct coroutine_handle < void > {
  coroutine_handle() noexcept = default;
  coroutine_handle(nullptr_t) noexcept;
  coroutine_handle & operator=(nullptr_t) noexcept;
  explicit operator bool ( ) const noexcept;
  static coroutine_handle from_address (void * a) noexcept;
  void * address() const noexcept;
  void operator()() const;
  void resume() const;
  void destroy();
  bool done() const;
};

template < typename T > 
struct coroutine_handle : coroutine_handle < void > 
{
  // Coroutine handle that knows its promise type T
  T & promise();
  static coroutine_handle from_promise( T &) noexcept;
};

coroutine_traits

Provides an alternative way to look up the promise_type from the return type of a coroutine. For instance, if you cannot modify the type e.g. to make std::future a coroutine compatible type.

template<class, class...>
struct coroutine_traits {};
 
template<class R, class... Args>
    requires requires { typename R::promise_type; }
struct coroutine_traits<R, Args...> {
    using promise_type = typename R::promise_type;
};
};

Minimal coroutine object

From those interactions, we now have the context to dive in and implement our task<int> object required to get our example running.

We will make it “lazy” by returning stdx::suspend_always from our initial suspend.

// forward decl some promise types
template<typename U> struct promise_type_impl_base;
template<typename U> struct promise_type_impl;

template<typename T>
class task {
public:
  using promise_type = promise_type_impl<T>;
  using handle_type = std::experimental::coroutine_handle<promise_type>;

  task(handle_type handle) : handle_(handle) {  }
  task(const task&) = delete; task(task&&) = delete; // disable copy/move
  bool await_ready() { return handle_.done(); }
  bool await_resume() {
    if (!handle_.done())
      handle_.resume();
    return !handle_.done();
  }
  template<typename PROMISE>
  void await_suspend(std::experimental::coroutine_handle<PROMISE> coroutine) {}
  ~task() { handle_.destroy(); }

private:
  handle_type handle_;
};

template<typename U>
struct promise_type_impl_base {
  auto get_return_object() { return task<U>::handle_type::from_promise(*static_cast<typename task<U>::promise_type*>(this));}

  auto initial_suspend() noexcept { return std::experimental::suspend_always{}; }
  auto final_suspend() noexcept { return std::experimental::suspend_always{}; }
  void unhandled_exception() { std::terminate(); };

#if 0
  static void* operator new (std::size_t size, const std::nothrow_t& tag) noexcept {
	void * rv = malloc(size)/*nullptr*/; // set nullptr to hit get_return_object_on_allocation_failure 
	fmt::print("new {} at {}\n", size, rv);
	return rv;
  }
  static void operator delete (void* ptr) { fmt::print("delete {}\n", ptr); free(ptr); }
  static task<U> get_return_object_on_allocation_failure() { throw std::bad_alloc(); };
#endif
};

template<typename U>
struct promise_type_impl : public promise_type_impl_base<U> {    
  void return_value(U v) { _v = v; }
  U& result() { return _v; }
  // void yield_value(U v) { _v = v; }
  U _v;
};

template<>
struct promise_type_impl<void> : public promise_type_impl_base<void> {
  void return_void() {}
};

That’s it! Now we can compile and execute our coroutine example.

Transformations

Finally with a working coroutine, we can look into what the compiler is doing to our code.

Await Flow

Let’s map out the flow through an Awaiter when we do co_await awaitable.

graph TD
  Caller[caller]
  Start[co_await awaitable]
  Await_Ready{awaiter.await_ready}
  Await_Suspend{awaiter.await_suspend}
  Await_Resume[awaiter.await_resume]
  Promise_await_transform["awaiter =
operator co_await( promise.await_transform( awaitable ) )"]
%%  Await_operator_co_await[awaiter = operator co_await(awaitable)]
  Running[Run to next suspension point ...]
  Coro_Resume[coro_handle.resume]
  %% Return_To_Caller[return to caller]
  Another_Coro_Resume["another_coro_handle<P>.resume"]
  %% Promise_Return_X[promise.return_value / promise.return_void]

  %% style Promise_Return_X color:#fff,fill:#bbf
  style Caller fill:#79A2A1,color:#fff
  style Running fill:#79A2A1,color:#fff
%%  style Coro_Resume fill:#A2798F,color:#fff
%%  style Another_Coro_Resume fill:#A2798F,color:#fff
  style Await_Ready fill:#A2798F,color:#fff
  style Await_Suspend fill:#A2798F,color:#fff
  style Await_Resume fill:#A2798F,color:#fff
  style Start fill:#A2798F,color:#fff
  style Promise_await_transform fill:#798fa2,color:#fff

  Caller --> Start
  Start ---> Promise_await_transform
  Promise_await_transform --> Await_Ready
  Await_Ready -- false --> Await_Suspend
  Await_Suspend -- true / void --> Caller
  %% Return_To_Caller --> Caller
  Await_Suspend -- "another_coro_handle<P>" --> Another_Coro_Resume
  %% Another_Coro_Resume --> Await_Ready
  Await_Suspend -- false ---> Await_Resume
  Await_Ready -- true ---> Await_Resume
  Coro_Resume --> Await_Ready

  Await_Resume --> Running
  Running --> Start

  %% Running -- co_return --> Promise_Return_X

Promise Flow

In context of the promise workflow

graph TD
  Caller[caller]
  Coro_Factory[factory]
  Coro_Factory_new[promise_type::new sizeof_frame]
  Promise_get_return_object[promise.get_return_object]
  Promise_initial_suspend[co_await promise.initial_suspend]
  Promise_return_X[promise.return_value
promise.return_void]
  Promise_final_suspend[co_await promise.final_suspend]
  Promise_unhandled_exception[promise.unhandled_exception]
  AwaitableFlow_initial((awaitable flow))
  AwaitableFlow_final((awaitable flow))
  AwaitableFlow_susp((awaitable flow))
  Running((coroutine's
code
running))
%%  Coro_Destroy[coro_handle.destroy]
  Promise_get_return_object_on_alloc_failure[promise_type::
get_return_object_on_allocation_failure]
  
  
  
  style Caller fill:#79A2A1,color:#fff
  style Promise_get_return_object fill:#798fa2,color:#fff
  style Promise_return_X fill:#798fa2,color:#fff
  style Promise_get_return_object_on_alloc_failure fill:#798fa2,color:#000
  style Coro_Factory_new fill:#798fa2,color:#000
  style Promise_initial_suspend fill:#798fa2,color:#fff
  style Promise_unhandled_exception fill:#798fa2,color:#fff
  style Promise_final_suspend fill:#798fa2,color:#fff
  style AwaitableFlow_initial fill:#A2798F,color:#fff
  style AwaitableFlow_final fill:#A2798F,color:#fff
  style AwaitableFlow_susp fill:#A2798F,color:#fff


  Caller --> Coro_Factory
  Coro_Factory --> Coro_Factory_new
  Coro_Factory_new -- nullptr ---> Promise_get_return_object_on_alloc_failure
  Coro_Factory_new --> Promise_get_return_object
  Promise_get_return_object --> Promise_initial_suspend
  Promise_initial_suspend --> AwaitableFlow_initial
  AwaitableFlow_initial -- resume --> Running
  Running --> Promise_return_X
  Running --> Promise_unhandled_exception
  Running -- suspension point --> AwaitableFlow_susp
  AwaitableFlow_susp --> Running
%%  Promise_unhandled_exception --> Running
  Promise_unhandled_exception --> Promise_final_suspend
%%  Promise_return_X --> Running
  Promise_return_X --> Promise_final_suspend
  Running --> Promise_final_suspend
  Promise_final_suspend --> AwaitableFlow_final

Promise transformation

With those two flow charts in mind, how might a compiler transform our code from the coroutine that we wrote to something executing those state machines?

The first transformation that must take place is defined in the coroutine TS (n4760#subsection.11.4.4):

// This coroutine
task<void> my_coro()
{
  F;
  co_return;
}

// becomes:
{
  using P = task<void>::promise_type;
  P p (promise-constructor-arguments);
  co_await p.initial_suspend(); // initial suspend point
  try { F; } catch(...) { p.unhandled_exception(); }
final_suspend:
  co_await p.final_suspend(); // final suspend point
}

This covers the interactions from within the coroutine with the promise object. As you can see, it has inserted the two implicit suspend points (initial and final) and wired unhandled exceptions to the promise object. Not shown is the interaction with the static operator new/operator delete/get_return_object_on_allocation_failure.

Note: For all of these examples, I will assume the lazy task<> defined as a minimal implementation in the Minimal coroutine object section.

Here’s what that might look like on a sample coroutine:

// Input
task<int> my_coro(int a) {
                                // implicit initial suspend 
  std::string s = "In";
  co_await stdx::suspend_always{};    // suspend point 1
  fmt::print(s);
  s = " coroutine!\n";
  co_await stdx::suspend_always{};    // suspend point 2
  fmt::print(s);
  co_return a;
                                // implicit final suspend 
}


// After Transformation
task<int> my_coro(int a) {
  using P = task<int>::promise_type;
  P p;

  co_await p.initial_suspend(); // initial suspend point

  try {

    std::string s = "In";
    co_await stdx::suspend_always{};    // suspend point 1
    fmt::print(s);
    s = " coroutine!\n";
    co_await stdx::suspend_always{};    // suspend point 2
    fmt::print(s);
    co_return a;

  } catch(...) { p.unhandled_exception(); }

final_suspend:
  co_await p.final_suspend(); // final suspend point
}

co_return / co_yield transformation

The two trivial keyword transforms are for co_return and co_yield; both just talk to the promise.

co_return transform

// ...
	co_return a;
// ...

// ...
	promise->return_value(a);
	goto final_suspend;
// ...

co_yield transform

// ...
	co_yield ++a;
// ...

// ...
	co_await promise->yield_value(++a);
// ...

Await transformation

This one is a tad harder to express.

Let’s start with the transform described by Marcin Grzebieluch in his code::dive 2019 talk¹¹. Conceptually this transform is easy to reason about; however, it is not particularly close to what clang actually does. If you’ve heard someone say “coroutines chop up your function into callbacks,” this is probably the mental model of the transformation they have in mind.

// Input: After Promise Transform (above)
task<int> my_coro(int a) {
  using P = task<int>::promise_type;
  P p (promise-constructor-arguments);
  
  co_await p.initial_suspend(); // initial suspend point

  try {
  
    std::string s = "In";
    co_await suspend_always{};    // suspend point 1
    fmt::print(s);
    s = " coroutine!\n";
    co_await suspend_always{};    // suspend point 2
    fmt::print(s);
    co_return a;
	  
  } catch(...) { p.unhandled_exception(); }

final_suspend:
  co_await p.final_suspend(); // final suspend point
} 


// Transformed code
template<typename promise>
struct my_coro_frame {
  my_coro_frame(int a) : a(a), s(), current_suspend_point_(0) {}
  void resume_from_suspension_point_initial(){
  	s = "In";
  }
  void resume_from_suspension_point_1(){
    fmt::print(s);
    s = " coroutine!\n";
  }
  void resume_from_suspension_point_2(){
    fmt::print(s);
	promise_->return_value(a);
  }
  void resume_from_suspension_point_final(){
  }

  // Capture arguments
  int a;

  // Capture locals
  std::string s;
  promise* promise_;
  
  stdx::suspend_always suspension_point_initial_object;
  stdx::suspend_always suspension_point_1_object;
  stdx::suspend_always suspension_point_2_object;
  stdx::suspend_always suspension_point_final_object;
  
  // Not mentioned in Marcin's talk, but you need somewhere to track
  // the suspension point and a dispatch function to resume to the 
  // right place...
  int current_suspend_point_;
  void resume_state_machine_() {
    try {
      switch(current_suspend_point_) {
        // ... code goes here ...
      }
    } catch(...) { promise_->unhandled_exception(); }
  }
};

task<int> my_coro(int a) {
// TODO:
	// create a new frame (use promise_type::operator new if available)
	// Initialize the promise in the frame,
	//  move the passed args into the frame object,
	//  same for locals, ...
	// Run the state machine
	// Return the task
}

The problem with that transformation, however, is that if you throw our example coroutine into compiler explorer, the assembly just doesn’t match up. Additionally, given that we left so much out by omitting the state machine, let’s look at another version of this transformation.

Andreas Fertig has a post showing some options for wiring up cppinsights.io for coroutine support. Since as of February 2021 this is not online, we will manually transform our example.

There are also a few pieces in luncliff’s post that I want to incorporate as well (the coroutine_handle / frame_prefix bits).

Await Transform (detailed)

Factory

Let’s start with the factory function. Can we figure out what it is doing? Note that this is the -O2 optimized version; you should also look at -O1 to see more of the steps that get optimized away. ce

test_coro(int):                     # @test_coro(int)
        push    rbp
        push    rbx
        push    rax
        mov     ebp, esi
        mov     rbx, rdi

		## Allocate 56 bytes on the heap for our coroutine frame
        mov     edi, 56
        call    operator new(unsigned long)
		
		## First are two compiler reserved pointers for resume / destroy
        mov     qword ptr [rax], offset test_coro(int) [clone .resume]
        mov     qword ptr [rax + 8], offset test_coro(int) [clone .destroy]
		
		## Next is the promise
		
		## First the passed argument
        mov     dword ptr [rax + 20], ebp
		
		## And zero out (what later looks to be the suspension point index)
        mov     byte ptr [rax + 48], 0
        mov     qword ptr [rbx], rax
        mov     rax, rbx
        add     rsp, 8
        pop     rbx
        pop     rbp
        ret

I came up with: ce

namespace stdx = std::experimental;

struct my_fake_coro_frame {
  using cb = void (*)(void *);
  cb resume, destroy;
  task<int>::promise_type p;
  int a;
  std::string s;
  char suspend_point_id;       // offset 48, suspension point index
  stdx::suspend_always initial_suspend_object; // offset 49, initial_suspend_object
  stdx::suspend_always suspend_point_1_object; // offset 50, suspend_point_1_object
  stdx::suspend_always suspend_point_2_object; // offset 51, suspend_point_2_object
  stdx::suspend_always final_suspend_object;   // offset 52, final_suspend_object
  char _53;                     // offset 53, padding
  char _54;                     // offset 54 ...
  char _55;                     // offset 55 ...
};
void my_fake_coro_resume(void *c);
void my_fake_coro_destroy(void *c);

task<int> my_fake_coro(int a) {
  // Allocate space on the heap for coroutine's frame -- todo: alignment?
  auto b = new char[sizeof(my_fake_coro_frame)];
  my_fake_coro_frame *f = (my_fake_coro_frame *)b;

  // Set up the prefix internals
  f->resume = my_fake_coro_resume;
  f->destroy = my_fake_coro_destroy;

  // Save the parameters
  f->a = std::move(a);

  // Construct the promise and the locals in place: the frame is raw memory,
  // and the resume function assigns to s.
  new (&f->s) std::string();
  new (&f->p) task<int>::promise_type();

  // Initialize suspension point id
  f->suspend_point_id = 0;

  // Grab our return object
  auto ro = f->p.get_return_object();

  f->initial_suspend_object = f->p.initial_suspend();
  if (not f->initial_suspend_object.await_ready()) {
    f->initial_suspend_object.await_suspend(
        stdx::coroutine_handle<task<int>::promise_type>::from_address(
            (void *)f));
    return ro;
  }

  // Pump the state machine
  my_fake_coro_resume(b);

  // Return in a task
  return ro;
}

State machinery

void my_fake_coro_resume(void *c) {
  my_fake_coro_frame *f = (my_fake_coro_frame *)c;
  fmt::print("RESUME FAKE {} suspend point {}\n", (uintptr_t)c, (int)f->suspend_point_id);
  try {
    auto &s = f->s;

    switch (f->suspend_point_id) {
    case 0:
      f->initial_suspend_object.await_resume();

      s = "In";

      f->suspend_point_id = 1;
      {
        f->suspend_point_1_object = stdx::suspend_always{};
        if (not f->suspend_point_1_object.await_ready()) {
          f->suspend_point_1_object.await_suspend(
              stdx::coroutine_handle<task<int>::promise_type>::from_address(c));
          return;
        }
      }
      [[fallthrough]];
    case 1:
      f->suspend_point_1_object.await_resume();

      fmt::print(s);
      s = " coroutine!\n";

      f->suspend_point_id = 2;
      {
        f->suspend_point_2_object = stdx::suspend_always{};
        if (not f->suspend_point_2_object.await_ready()) {
          f->suspend_point_2_object.await_suspend(
              stdx::coroutine_handle<task<int>::promise_type>::from_address(c));
          return;
        }
      }
      [[fallthrough]];
    case 2:
      f->suspend_point_2_object.await_resume();
      
      fmt::print(s);
      f->p.return_value(f->a);

      f->suspend_point_id = 3;
      {
        f->final_suspend_object = stdx::suspend_always{};
        if (not f->final_suspend_object.await_ready()) {
          f->final_suspend_object.await_suspend(
              stdx::coroutine_handle<task<int>::promise_type>::from_address(c));
          return;
        }
      }
      [[fallthrough]];
    case 3:
      f->final_suspend_object.await_resume();
      f->resume = nullptr;
      break;
    }
  } catch (...) {
    f->p.unhandled_exception();
  }
}

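void my_fake_coro_destroy(void *c) {
  my_fake_coro_frame *f = (my_fake_coro_frame *)c;
  fmt::print("DESTROY FAKE {}\n", (uintptr_t)c);

  // The frame was allocated with new char[], so run the destructor by hand
  // and release the storage with the matching delete[].
  f->~my_fake_coro_frame();
  delete[] (char *)c;
}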

As you can see, the compiler does a ton of work for us! Take a second to go back and look at how succinct the input was.

After going through that exercise, I highly recommend doing the same. While what I did is not exactly what clang does, it really hammered home what needs to happen to execute a coroutine.

Notes and Nits

On optimizations

In the above you can see that we are always allocating the frame, though we did pack the promise and the coroutine frame together. The compiler is able to do this for us as well. Further, the compiler can eliminate even that allocation in some cases where it can reason about the lifetime of the coroutine.

For example, in Gor Nishanov’s 2015 talk, he shows an example of a disappearing coroutine.

Aside: When I port that forward to current clang-trunk, the disappearing coroutine example still works great with clang at -O2, but most of the production code I ship is compiled with -Oz (or -Os). I could not figure out exactly why it will not optimize away with either of those flags. I suspect it has something to do with inlining thresholds: adding __attribute__((always_inline)) with -Os allows it to optimize away, but nothing I did would make -Oz do so. 🤷🏽‍♀️ If anyone reads this and can figure out why, send me a note, I’d love to understand.

On lifetimes

The most common issue I see posted about at work has to do with the lifetime of captured variables. It can be caused by references being passed to a coroutine (the more obvious case when reading the code), or by something less obvious like iterators. It is possible to build clang-tidy linters to catch some of these cases, and as people start getting used to coroutines I suspect this will become less of an issue. Using a structured concurrency model would also likely alleviate some lifetime issues.

For now, however, be sure to keep an eye out for lifetime issues while using coroutines.

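template <typename It>
coro::Task<void> process(It begin, It end);

// Not a coroutine itself: values is destroyed when this function returns,
// while the returned task still holds iterators into it.
coro::Task<void> process(std::vector<T> values) {
  return process(values.begin(), values.end());
}

co_await process({...}); // BOOM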

On RAII

Major limitation: Currently we cannot wrap coroutines in RAII because a destructor cannot be a coroutine! See proposal to add co_using keyword for this.

On await_transform

I found await_transform really difficult to understand. The key for me was Lewis Baker’s note, and the insight that await_transform is called from the promise of the currently executing coroutine not the coroutine co_awaited on.

Resources

Sy Brand’s cat explains coroutines

When I started this journey, I was struggling to find good writings (but plenty of great talks/videos) on coroutines, but perhaps my google-fu was poor because now I am finding more and more good stuff! Below is a sampling of some of the resources I used to understand coroutines.

Talks
- Playlist of talks on coroutines
- CppCon 2019: Eric Niebler, David Hollman “A Unifying Abstraction for Async in C++” – Great summary of structured concurrency, difference between concurrency and parallelism. Intro to the executors w/ sender/receiver proposal.
- CppCon 2014: Gor Nishanov “await 2.0: Stackless Resumable Functions”
- CppCon 2015: Gor Nishanov “C++ Coroutines - a negative overhead abstraction”
- CppCon 2016: Gor Nishanov “C++ Coroutines: Under the covers”
- CppCon 2017: Gor Nishanov “Naked coroutines live (with networking)”
- CppCon 2018: Gor Nishanov “Nano-coroutines to the Rescue! (Using Coroutines TS, of Course)” – hide memory latency w/ coroutines, have your mind blown 🤯.
- CppCon 2019: Lewis Baker “Structured Concurrency: Writing Safer Concurrent Code with Coroutines”
- CppCon 2019: Hartmut Kaiser “Asynchronous Programming in Modern C++”
- C++20 Coroutines: What’s next? - Dawid Pilarski - code::dive 2019
- C/C++ Dublin User Group: “Exploring C++20 Coroutines” by Justin Durkan
- CppCon 2016: James McNellis “Introduction to C++ Coroutines”
- CppCon 2017: Toby Allsopp “Coroutines: what can’t they do?”
- CppCon 2017: Anthony Williams “Concurrency, Parallelism and Coroutines”
- CppCon 2020: Rainer Grimm “40 Years Of Evolution from Functions to Coroutines”
Writings on the interwebs
- Lewis Baker’s excellent posts on coroutines
- Dawid Pilarski 3-part series on coroutines
- Raymond Chen has a bunch of posts on coroutines at the old new thing
- Charlie Kolb’s report and slides on coroutines – found this via google, not sure if there’s a better link. Kolb did a particularly good job of drawing out state machines that made sense to me, took some ideas from these.
- Luncliff has a nice presentation and post that is quite similar to what I was trying to accomplish here (wish I discovered that sooner!). It includes some of the coroutine_frame information for both clang and msvc.
Other writings (non-free)
- In his draft c++20 book, Rainer Grimm has a solid chapter on coroutines. This is where I got the idea to think of the two separate state machine flows (“promise workflow” and “awaiter workflow” in the book) as well as the idea that the “coroutine” term is overloaded, meaning both the factory and the object.
- Another c++20 book, Andreas Fertig. This book also has a section on c++20 coroutines that seems pretty good. Andreas used a nice state machine problem for his example, and this seems like a really good reference to read for learning how to use coroutines. I discovered it later and spent a bit less time with it than Grimm’s book.
Standards docs
- n4860 - draft final c++20 – 17.12 Coroutines, 7.6.2.3 Await, 7.6.17 Yielding a value, 9.5.4 Coroutine definitions
- n4760 - coroutine TS
- A Unified Executors Proposal for C++ | P0443R14
Meta
- Matt P. Dziubinski’s c++ links – found this later, but has most of the resources listed here and more!

Conclusion

In c++20 we now have the lowest level of coroutine support. With third party libraries this is usable now. There is nothing magic about what the compiler does to your coroutine code and it can be understood, but it is not trivial either.

There are many things I’d love to have included here, but this has already taken me months and “done is better than perfect” .. right? Maybe I can hit them as a follow up. One big hole I didn’t touch on is the how of using coroutines; for that there are some good examples linked from the resources section.

My main goal in this article was understanding what we got in c++20 and roughly how it works under the hood. My hope is that now when I need to debug, say, a size regression in coroutine code I can look at the generated assembly and have a rough understanding of what is going on.

If this was helpful or if you noticed any obvious mistakes, please reach out to me on twitter and let me know.

I am using the term “concepts” throughout here colloquially, not as in c++20’s concepts. ↩︎
See revisiting coroutines paper summarized in this stackoverflow question ↩︎
Seriously, follow the links in the Resources section and go watch them now. Nishanov is a fantastic speaker on this subject and always entertaining. I’ll wait… ↩︎
See Lewis Baker’s writing on symmetric transfer ↩︎
There are a few “limitations” around turning a function into a coroutine. First, you cannot use a placeholder return type (this will make more sense as we get into the weeds), and the compiler must be able to find a promise_type for the return type (keep reading). Second, constexpr and consteval functions cannot be coroutines. Finally, no variadic arguments (though you can use a template parameter pack). ↩︎
Note: co_await in a range based for may seem different, but is just expanded to a couple of co_await expressions under the hood, see n4760#subsection.9.5.4 { auto && __range = FOR_RANGE_INITIALIZER; auto __begin = co_await BEGIN_EXPR; auto __end = END_EXPR; for ( ; __begin != __end; co_await ++__begin ) { FOR_RANGE_DECLARATION = *__begin; STATEMENT } } ↩︎
Perhaps in c++23 we will have executors with coroutine support in the standard which may provide a usable out of the box solution! ↩︎
Existing implementations of coroutines and useful abstractions: libunifex, folly::coro, cppcoro, etc. ↩︎
Lewis Baker: understanding operator co_await - “The Promise interface specifies methods for customising the behaviour of the coroutine itself. The library-writer is able to customise what happens when the coroutine is called, what happens when the coroutine returns (either by normal means or via an unhandled exception) and customise the behaviour of any co_await or co_yield expression within the coroutine.” “The Awaitable interface specifies methods that control the semantics of a co_await expression. When a value is co_awaited, the code is translated into a series of calls to methods on the awaitable object that allow it to specify: whether to suspend the current coroutine, execute some logic after it has suspended to schedule the coroutine for later resumption, and execute some logic after the coroutine resumes to produce the result of the co_await expression.” “A type that supports the co_await operator is called an Awaitable type. To be more specific where required I like to use the term Normally Awaitable to describe a type that supports the co_await operator in a coroutine context whose promise type does not have an await_transform member. And I like to use the term Contextually Awaitable to describe a type that only supports the co_await operator in the context of certain types of coroutines due to the presence of an await_transform method in the coroutine’s promise type.” “An Awaiter type is a type that implements the three special methods that are called as part of a co_await expression: await_ready, await_suspend and await_resume.” ↩︎
c++20 book, Rainer Grimm This book’s chapter on coroutines is pretty solid summary, would recommend. Haven’t yet read any other parts, and so cannot comment on them. ↩︎
Marcin Grzebieluch talk: Slides video ↩︎