Async Rust Mental Model

Feb 9, 2025

I recently noticed that I've been writing asynchronous code pretty frequently at work. Actually, nowadays it's much more common for me to work with async Rust than just "vanilla" Rust. The distinction sounds a little weird when you think about it (both are still just Rust), but the mental model for writing async code is a sort of "extension" of vanilla Rust: all the normal lifetime and borrowing rules apply, and you get plenty of complaints from the compiler if you do something wrong, just from within an async context.

I've had a lot of time to think about my mental model for async Rust, and I've talked to other engineers about how they view it. I'm not typically one for analogies, but I do think they're extremely helpful when starting to think about a foreign concept, as long as you later leave them behind like the training wheels on your bike and understand what actually happens. Let's do that with async Rust! The approach in this post is straightforward: we'll think of the async environment in Rust as a restaurant with clients, chefs, tables and dishes, and we'll go back and forth between the analogy and reality, exploring in a little more depth how Rust approaches async, and how other languages like Go approach the same concept. This is meant to be more of a 1,000-foot view than a 50-foot one, so we'll be skipping over some of the more intricate details of async Rust, but I think it's a good starting point for understanding how async Rust works under the hood.

Scheduling Strategies

Let's preface this post by giving some useful background information on some of the design decisions behind async Rust. The goal of async programming is to allow for concurrent execution of tasks, without blocking and without wasting resources. This is particularly important in environments where you have many tasks that are I/O-bound, such as web servers or network applications. We want efficient multitasking without the overhead of traditional threading models, since threads are expensive to create and manage--so how do we achieve this?

The two main strategies for scheduling tasks in async programming are cooperative multitasking and preemptive multitasking.

In cooperative multitasking, tasks voluntarily yield control of the CPU at specific points. It's like a game of tag where you can only tag someone if they agree to be tagged. Tasks decide when they're ready to pause and let other tasks run, typically at moments when they'd need to wait for something anyway (like I/O). This approach works really well at the language level where either programmers or compilers can insert yield points at logical pauses in execution.

The major advantage here is efficiency. Since tasks control when they pause, they can save exactly the state they'll need later, rather than having to save everything. This means less memory overhead per task. Even better, all tasks can actually share a single call stack! This dramatically reduces memory usage and lets you create thousands of concurrent tasks without breaking a sweat. The obvious downside? A single misbehaving task can hog the CPU indefinitely. If a task never yields, nothing else gets to run. This is why cooperative multitasking works best when all tasks are known to play by the rules, like within a single application. You wouldn't want your OS relying on arbitrary user programs to cooperate.

In contrast, preemptive multitasking gives the operating system complete control over when to switch tasks. The OS leverages interrupts to forcibly pause running tasks at any time. This means tasks can be interrupted mid-calculation without any say in the matter. When using preemption, each task needs its own separate stack and complete state backup. The OS has to save the entire CPU register state on each switch because it has no idea which registers the task was using. This creates more overhead, but it ensures that no single task can monopolize the system.

Preemptive multitasking is ideal for operating systems and untrusted code, where fairness matters more than raw efficiency. It guarantees that each task gets its fair share of CPU time without relying on cooperation.

For our purposes with async Rust, cooperative multitasking makes a lot more sense. It gives us the performance and memory benefits we need for handling thousands of concurrent connections with minimal overhead. After all, our tasks are all part of the same program, so we can trust them to yield appropriately. The language-level async/await syntax makes it easy to write code that cooperates naturally at logical pause points. The distinction between cooperative and preemptive isn't just academic—it fundamentally shapes how async Rust works under the hood and how we need to approach writing async code. We need to be mindful of including yield points (.await) in our code at appropriate intervals, especially when doing CPU-intensive work, and we'll get into specific patterns for this later in the post.
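
As a taste of what that looks like in practice, here's a minimal sketch of manually inserting yield points into CPU-bound work. tokio::task::yield_now suspends the current task and immediately reschedules it, letting other tasks on the same worker thread run in between chunks of work (the squaring loop is just a stand-in for real computation):

use tokio::task;

// CPU-heavy work that would otherwise monopolize the worker thread.
// The periodic yield gives the executor a chance to run other tasks.
async fn crunch_numbers(items: Vec<u64>) -> u64 {
    let mut total: u64 = 0;
    for (i, item) in items.iter().enumerate() {
        total += item * item; // the actual "CPU-intensive" step
        if i % 1_000 == 0 {
            task::yield_now().await; // cooperative yield point
        }
    }
    total
}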

Futures: Async Computation

The most atomic unit of async programming in Rust is a future. You can make anything you want into a Future by implementing the Future trait for it, defined in the standard library as

pub trait Future {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}

Conceptually, a future is an async computation that may not have completed yet, or even started; it represents a value that will be available at some point in the future. It's like a schematic or a plan of execution. We can't do much with a schematic on its own, since it's useless unless we actually take the steps detailed in it. Thinking about this in the context of a kitchen at a restaurant, a future is like a recipe card. It tells you how to prepare a dish, but it doesn't actually cook anything until a chef picks it up and follows it. That chef--going back to the tokio world--is the tokio executor, or more specifically, the tokio worker thread(s). We'll talk more about the tokio runtime later, but for now, it's enough to know that the executor is what actually runs the future and drives it to completion.

Let's look at Poll from the above definition:

pub enum Poll<T> {
    Ready(T),
    Pending,
}

poll defines an interface for the executor to check if the future is ready to make progress or not. It returns Poll::Ready(value) if the future has completed, or Poll::Pending if it needs to wait for something. You as the programmer typically never need to call poll directly; you just tell the executor how to drive the future to completion and let it do its thing.

Note: A future is a representation of a computation that can be done asynchronously. It does nothing unless awaited. A future can only be awaited from within async functions and blocks.

Executing poll should never take a long time. Think of poll as a way for the executor to quickly check if a task is ready to make progress. It's crucial that this check be quick: either it returns Poll::Ready(Self::Output) or Poll::Pending, and nothing in between. If you're making something in the kitchen, for example boiling water, think of it as quickly glancing over at the pot to check if the water has reached a boil. You're not supposed to stand there and stare at the pot for 10 minutes, waiting for the water to boil. You just check if it's boiling, and if it is, you can move on to the next step in your recipe. If it's not, you continue doing what you're doing and check again later. We'll talk about how the executor knows when to check again later, but for now, just remember that poll should be a quick check.
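
To make the shape of this concrete, here's a toy future: a minimal sketch that pretends to be a pot of water, where poll does nothing but inspect a flag and return immediately. (The cx.waker() call asks the executor to poll us again; wakers are covered properly a few sections below.)

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// A toy future that is "not ready" the first time it's polled and
// ready the second time. Note that poll itself does almost no work:
// it just checks state and returns immediately.
struct PotOfWater {
    boiling: bool,
}

impl Future for PotOfWater {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.boiling {
            Poll::Ready(())
        } else {
            self.boiling = true;
            // Ask the executor to poll us again right away. A real
            // leaf future would instead hand the waker to the reactor.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}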

An important detail to keep in mind is that your async computations get turned into state machines by the compiler. Behind the scenes, the compiler transforms your async function into an enum with different states, each representing a point in the function where it can yield control. This is how we keep track of the state of the computation and know where to pick up when our future gets scheduled again.

When we write something like:

async fn prepare_carbonara() -> Dish {
    let pasta = boil_pasta().await; // Step 1
    let pancetta = fry_pancetta().await; // Step 2
    let sauce = mix_eggs_and_cheese().await; // Step 3
    let combined = combine(pasta, pancetta, sauce).await; // Step 4
    garnish_and_serve(combined).await // Step 5
}

Calling this function doesn't actually do any cooking. It just returns a future - basically a promise that the dish will be cooked when someone follows this recipe. The actual work happens when this future is awaited somewhere.
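
Tying this back to the state-machine transform described above, the enum below is a hand-sketched approximation of what the compiler generates for prepare_carbonara. It is not real compiler output, and the stub types are hypothetical stand-ins for the intermediate values and sub-futures; the point is just that each variant holds exactly what must survive across its .await point:

// Stub types standing in for the real intermediate values and
// sub-futures; in the real transform these are compiler-generated.
struct Pasta;
struct Pancetta;
struct BoilPasta;
struct FryPancetta;
struct MixEggsAndCheese;
struct Combine;
struct GarnishAndServe;

// Roughly the shape of the state machine the compiler generates:
// one state per suspension point, holding what must live across it.
enum PrepareCarbonara {
    Start,
    WaitingOnPasta { fut: BoilPasta },
    WaitingOnPancetta { pasta: Pasta, fut: FryPancetta },
    WaitingOnSauce { pasta: Pasta, pancetta: Pancetta, fut: MixEggsAndCheese },
    WaitingOnCombine { fut: Combine },
    WaitingOnServe { fut: GarnishAndServe },
    Done,
}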

Just as a recipe might require certain equipment or ingredients to be ready before proceeding to the next step, futures can be composed of other futures. This brings us to an important distinction:

  • Leaf futures represent primitive operations, usually I/O-related, like tokio::net::TcpStream::connect("127.0.0.1:8000"). These are like fundamental cooking techniques - boiling water, frying an egg.

  • Non-leaf futures combine multiple steps, like when you write an async fn that awaits multiple operations. These are like full recipes that combine several techniques.

Tasks: An Order that Came In

As we've mentioned before, a future on its own doesn't do anything. It's a schematic of sorts--a set of instructions. If we want to complete these instructions, we need to create a task. An analogy that I really like to use is that futures are like classes, and tasks are like instances of those classes. When you're creating a task, you're creating an "instance" of a future, and the executor can then poll that task to completion.

The Tokio docs put it clearly:

In the Tokio execution model, futures are lazy. When a future is created, no work is performed. In order for the work defined by the future to happen, the future must be submitted to an executor. A future that is submitted to an executor is called a 'task'.

If we really try to relate this to how a restaurant would function, we can think of it as follows:

  • Futures are written-down recipes that tell you how to prepare a given dish.
  • Tasks are orders for a given dish that have come in. This is what the chef actually works on (or polls) to completion.

When we write code like:

let table1 = task::spawn(serve_table(1));

We're taking the serve_table(1) recipe and assigning a chef to execute it. The spawn function creates a new task, which can be scheduled and executed independently.
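
For completeness, spawn hands back a JoinHandle, which is itself a future: awaiting it gives you the task's result, wrapped in a Result because the task may have panicked or been cancelled. A minimal sketch, with serve_table as a hypothetical stand-in:

use tokio::task;

// Hypothetical stand-in for the real work of serving a table.
async fn serve_table(table: u32) -> String {
    format!("table {} served", table)
}

#[tokio::main]
async fn main() {
    // The task starts running as soon as it's spawned...
    let handle = task::spawn(serve_table(1));
    // ...and the JoinHandle lets us wait for (and retrieve) its result.
    let result = handle.await.expect("task panicked");
    println!("{}", result);
}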

The Executor: The Chef

The term "executor" should be understood as something somewhat abstract to encompass different async runtimes. In the more abstract sense, the executor goes hand-in-hand with the reactor, which we will talk about later. In the context of Tokio, the executor is the Tokio runtime itself. It manages the execution of tasks and coordinates their scheduling. We won't actually go into the details of how tokio schedules tasks, but we should note that there really are two types of executors: single-threaded and multi-threaded. By default, tokio uses a multi-threaded runtime, which is what [tokio::main] desugars to. However, you can also use a single-threaded runtime by using [tokio::main(flavor = "current_thread")].

In the multi-threaded runtime, tokio creates n worker threads (or executors), where n is the number of CPU cores available. Each worker thread can run multiple tasks concurrently. The runtime uses a work-stealing algorithm to balance the load across threads, ensuring that tasks are executed efficiently. Tokio maintains one global queue from which worker threads pick off tasks. Each worker thread, in turn, has its own local queue of tasks that it works on. If a worker thread runs out of tasks in its local queue, it can "steal" tasks from other threads' local queues. This helps to balance the load and keep all threads busy.
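
If you'd rather be explicit than rely on the #[tokio::main] attribute, you can construct the runtime by hand. A small sketch using tokio's runtime Builder (the worker thread count is just for illustration; tokio defaults to one per core):

use tokio::runtime::Builder;

fn main() {
    // Multi-threaded flavor: tokio defaults to one worker per core;
    // we pin it to 4 here purely for illustration.
    // Builder::new_current_thread() gives the single-threaded flavor.
    let runtime = Builder::new_multi_thread()
        .worker_threads(4)
        .enable_all() // turn on the I/O and time drivers
        .build()
        .expect("failed to build runtime");

    runtime.block_on(async {
        println!("running inside the runtime");
    });
}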

When the executor polls a future, it checks if the future is ready to make progress. If the future returns Poll::Ready, the executor can proceed with the next step in the computation. If it returns Poll::Pending, the executor knows that it needs to wait for something to happen before it can continue--this is where wakers come in. The executor creates a waker and passes it to the future, allowing the future to notify the executor when it's ready to be polled again.

In our restaurant, the executor is the manager who assigns work to chefs and coordinates the kitchen. The manager decides which chef works on which dish and when, ensuring that the kitchen runs efficiently.

Wakers: The Pager System

If you recall the note about futures earlier in the post, I mentioned that poll should never take a long time. This is because the executor is constantly polling futures, and if one of them takes too long, it can block the entire system. An easy way to block the executor is for poll to perform a blocking operation. I also mentioned that if poll returns Poll::Pending, the executor continues polling other futures, and that eventually it would poll ours again. I'm now going to describe how it knows when to poll it again, and this is where wakers come in.

Again we look to the docs to get us started on the Waker.

A Waker is a handle for waking up a task by notifying its executor that it is ready to be run.

The typical life of a Waker is that it is constructed by an executor, wrapped in a Context, then passed to Future::poll(). Then, if the future chooses to return Poll::Pending, it must also store the waker somehow and call Waker::wake() when the future should be polled again.

Looking back at the Future trait's poll definition, it looks like we've now uncovered what that Context is for: it's how the future gets hold of the waker that the executor gave it, so it can store it for later.

Going back to the kitchen analogy, when we're boiling pasta, we know how long it takes to boil, at least more or less. However, say we're particularly anal about timing, and choose to set a timer that will go off in 10 minutes. The timer is our Waker. We poll our boiling pot and see that it's nowhere near boiling yet, so we set the timer and go do something else. When the timer goes off, it wakes us up and tells us to check on the pasta again.
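
We can make the timer analogy concrete with a deliberately naive future. This is a sketch, not how tokio's real timer works: a production runtime hands the waker to its timer driver, whereas here we just spawn a throwaway thread whose only job is to call wake() when the deadline passes:

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};
use std::time::Instant;

// A naive timer future. On each pending poll we clone the waker and
// promise (via a helper thread) to call wake() once the deadline hits.
struct Delay {
    when: Instant,
}

impl Future for Delay {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if Instant::now() >= self.when {
            Poll::Ready(())
        } else {
            let waker = cx.waker().clone();
            let when = self.when;
            std::thread::spawn(move || {
                let now = Instant::now();
                if now < when {
                    std::thread::sleep(when - now);
                }
                waker.wake(); // "the timer goes off"
            });
            Poll::Pending
        }
    }
}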

The Reactor: The Kitchen's Notification System (sort of)

We've established that after returning Poll::Pending, futures need a way to notify the executor when they're ready to be polled again, and that wakers do this. But now we can try to answer the question: what wakes a waker?

Async runtimes are typically divided into two parts: the executor and the reactor. We've already talked about the executor; we've established that it's what drives tasks to completion and decides who gets to run and when. The reactor, on the other hand, is responsible for handling I/O events and dispatching them to the executor. Typically, these are I/O events such as network requests, timers, or file I/O.

In tokio's case, the reactor is an event loop. From the tokio runtime docs, one of the things the runtime contains is

An I/O event loop, called the driver, which drives I/O resources and dispatches I/O events to tasks that depend on them.

Since the reactor is responsible for reacting to I/O events, it requires tight integration with non-blocking I/O primitives. It uses a system of epoll (on Linux), kqueue (on macOS) or IOCP (on Windows) to monitor file descriptors and notify the executor when they are ready for I/O operations.

The details of how these OS primitives handle non-blocking I/O are outside the scope of this post, and they're abstracted away for tokio anyway, since tokio uses mio under the hood.

While the executor is actively polling futures, the reactor is passively waiting for I/O events to occur. It's watching for things like "is this network request complete?" or "has this timer expired?".

The reactor maintains a collection of Wakers and notifies the appropriate one when an event happens. When the reactor receives a notification that an event has occurred, it finds the corresponding waker and calls wake() on it, which tells the executor that the associated future is ready to make progress.

Importantly, the executor and reactor don't communicate directly; they coordinate through wakers. Wakers are provided by Rust's standard library, and are what closes the gap between the executor (the chef) and the reactor (the notification system). The reactor doesn't need to know about the executor's inner workings, and the executor doesn't need to know about the reactor's event loop. They just need to agree on a way to wake each other up when something happens, and that's all that wakers really are.

Using Concurrency

Now that we've covered the basics of the interface Rust provides for writing async code, and how async runtimes such as tokio use that interface, we can look at how to actually use tokio to write async code. There are two important concepts I want to highlight here: concurrency and parallelism. This fits pretty well into our kitchen analogy; you can have multiple chefs working on different orders at the same time (parallelism), or you can have one chef working on multiple orders at the same time (concurrency).

In reality, parallelism is a specific type of concurrency. You can have concurrency on a single worker thread, where you interleave the execution of multiple futures in a single task, and you can have concurrency on multiple worker threads, where you can have multiple tasks running at the same time on different threads. The distinction here is intratask concurrency vs intertask concurrency.

Note: Interleaving the execution of multiple futures in a single task is known as intratask concurrency. Spawning multiple tasks to execute each future, potentially in parallel, is known as intertask concurrency.

Intertask Concurrency: Multiple Chefs

When you want completely independent operations to happen concurrently, you use intertask concurrency by spawning multiple tasks. This is like having multiple chefs working independently on different orders:

let table1 = task::spawn(serve_table(1));
let table2 = task::spawn(serve_table(2));

Each task gets its own "thread" of execution. With a multi-threaded runtime like Tokio's default, these tasks can genuinely run in parallel across different CPU cores. Tokio does not guarantee this, however. You can spawn more tasks than you have cores, and the runtime will schedule them across the available threads as best it can.

Intratask Concurrency: One Chef, Multiple Orders

Sometimes you want concurrency within a single task. This happens when you use combinators like join to run multiple futures concurrently but within the same task context:

// `join` here is the futures crate's futures::future::join;
// the tokio::join! macro discussed below works the same way.
let dish1_future = cook_dish(&order[0], 3);
let dish2_future = cook_dish(&order[1], 4);
let (dish1, dish2) = join(dish1_future, dish2_future).await;

This is like having one chef cooking multiple dishes at once - working on one while another simmers. It's more resource-efficient since you're only using one task, but all these futures are dependent on the same task making progress. For example, while the chef is cooking pasta, he arrives at a point where he's waiting for the water to boil. He sets a timer for 10 minutes (registers a waker) and goes to chop vegetables (polls another future). When the timer goes off, he checks the pot again (polls the future again) and sees that the water is boiling. He can now add the pasta to the pot and continue cooking.

One interesting thing about this approach is that it creates what some Rust engineers call a "perfectly sized stack" - the exact amount of memory needed for all the operations is allocated upfront as part of the task's state machine. This means no dynamic allocations are needed for each concurrent operation, making it extremely efficient. This is different from Go's goroutines, which have a more complex stack management system. In Go, goroutines can grow and shrink their stacks dynamically, which adds some overhead and complexity. In Rust, the stack size is fixed at compile time, which makes it easier to reason about memory usage and performance.

Working with Multiple Futures

If you're just working with a single task, your async code might as well be synchronous. The real power of async comes from working with multiple futures concurrently. In Rust, we have two main patterns for this: join and select. We can also spawn new top-level futures, aka tasks, using tokio::spawn.

There are some useful macros that come with tokio that can help us with this, such as tokio::join! and tokio::try_join!. These macros behave like the join and try_join combinators, respectively: they allow you to run multiple futures concurrently and wait for all of them to complete.

Join: Wait for All Dishes

The join pattern runs multiple futures concurrently and waits for all of them to complete. It's like a waiter ensuring that all dishes for a table are ready before serving. For example, if we have a group of people at our restaurant waiting for their food, we'd ideally want to bring everything out at the same time, so that nobody is left waiting for their food while everyone else is eating.

Looking at the definition,

The join! macro takes a list of async expressions and evaluates them concurrently on the same task. Each async expression evaluates to a future and the futures from each expression are multiplexed on the current task.

Note that it says evaluates them concurrently on the same task. This means that the same chef is executing both of these (recipes) as a single task (order). We could instead also use tokio::spawn to spawn each of these futures as separate tasks, and join their handles, which would enable them to be run in parallel on separate worker threads, though again, not guaranteed. This would essentially be multiple chefs working on each order independently, and we would only proceed to serve the dishes once they're all done. The former approach is intratask concurrency, since we're multiplexing within a single task, while the latter is intertask concurrency, since we're spreading the work across multiple tasks and worker threads.

From the docs:

By running all async expressions on the current task, the expressions are able to run concurrently but not in parallel. This means all expressions are run on the same thread and if one branch blocks the thread, all other expressions will be unable to continue. If parallelism is required, spawn each async expression using tokio::spawn and pass the join handle to join!.

let (dish1, dish2) = tokio::join!(
    cook_dish("Pasta"),
    cook_dish("Pizza")
);

This says: "Cook these dishes concurrently, but don't proceed until both are done." Only when all futures complete will the code continue, with the results of all futures in a tuple.

If you want to do this with a number of futures unknown until runtime, you can use FuturesUnordered from the futures crate.
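
A sketch of that, reusing the hypothetical cook_dish from earlier: FuturesUnordered is a set of futures polled as a stream, and completed results arrive in whatever order the futures finish, not the order you pushed them:

use futures::stream::{FuturesUnordered, StreamExt};
use std::time::Duration;

// Hypothetical dish-cooking future from the earlier examples.
async fn cook_dish(name: &str, minutes: u64) -> String {
    tokio::time::sleep(Duration::from_secs(minutes)).await;
    format!("{} is ready", name)
}

async fn cook_all(orders: Vec<(&str, u64)>) {
    let mut dishes: FuturesUnordered<_> = orders
        .into_iter()
        .map(|(name, minutes)| cook_dish(name, minutes))
        .collect();

    // Dishes come out in whatever order they finish.
    while let Some(dish) = dishes.next().await {
        println!("{}", dish);
    }
}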

To better understand the difference between intratask and intertask concurrency, let's look at a practical example:

async fn intratask_example() {
    // Both futures are multiplexed on this single task.
    let (dish1, dish2) = tokio::join!(
        cook_dish("Pasta", 3),
        cook_dish("Salad", 2)
    );
}

// Intertask concurrency with spawn, passing the join handles of the spawned tasks to `join!`
async fn intertask_example() {
    let handle1 = tokio::spawn(cook_dish("Pasta", 3));
    let handle2 = tokio::spawn(cook_dish("Salad", 2));

    // JoinHandles are themselves futures, and join! awaits them, so
    // no extra .await is needed. Each result comes wrapped in a
    // Result, since a spawned task may panic or be cancelled.
    let (dish1, dish2) = tokio::join!(handle1, handle2);
}

The key differences:

  • join runs futures within the same task (one chef multitasking)
  • spawn creates separate tasks that can run independently (multiple chefs)
  • spawn has more overhead but allows truly parallel execution on multiple threads

Select: Whichever Comes First

While join waits for all futures, select races them and takes the first one to finish, canceling the others. We could try relating this to our restaurant analogy by thinking of someone waiting for a table to be ready when our restaurant is full. If we have three tables (we're a super small restaurant) and all are occupied, naturally we'd want to sit our guests at the first table that becomes available.

let available = tokio::select! {
    table1 = table1_future => table1,
    table2 = table2_future => table2,
    table3 = table3_future => table3,
};

sit_guests(available).await;

This is like a bartender serving whichever drink gets prepared first. As soon as one future completes, select! returns its result and drops the remaining, still-pending branches. (If you want to keep the loser around, the futures crate's select function hands the still-pending future back to you instead.)

The select pattern is ideal for:

  • Implementing timeouts (race an operation against a timer)
  • Fetching data from multiple redundant sources (use whichever responds first)
  • Handling user input that could come from multiple sources
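
The timeout case is common enough that tokio ships a combinator for it, but it's worth seeing that it's really just a race. A sketch of both forms, with fetch_data as a hypothetical slow operation:

use tokio::time::{self, Duration};

// Hypothetical slow operation.
async fn fetch_data() -> String {
    time::sleep(Duration::from_secs(5)).await;
    "data".to_string()
}

async fn with_timeout() {
    // Hand-rolled: race the operation against a timer.
    tokio::select! {
        data = fetch_data() => println!("got {}", data),
        _ = time::sleep(Duration::from_secs(1)) => println!("timed out"),
    }

    // Or use the built-in combinator, which does the same race.
    match time::timeout(Duration::from_secs(1), fetch_data()).await {
        Ok(data) => println!("got {}", data),
        Err(_) => println!("timed out"),
    }
}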

As a segue into pinning, let's look at part of the documentation for tokio::select!:

Using the same future in multiple select! expressions can be done by passing a reference to the future. Doing so requires the future to be Unpin. A future can be made Unpin by either using Box::pin or stack pinning.

Here, a stream is consumed for at most 1 second.

use tokio_stream::{self as stream, StreamExt};
use tokio::time::{self, Duration};

#[tokio::main]
async fn main() {
    let mut stream = stream::iter(vec![1, 2, 3]);
    let sleep = time::sleep(Duration::from_secs(1));
    tokio::pin!(sleep);

    loop {
        tokio::select! {
            maybe_v = stream.next() => {
                if let Some(v) = maybe_v {
                    println!("got = {}", v);
                } else {
                    break;
                }
            }
            _ = &mut sleep => {
                println!("timeout");
                break;
            }
        }
    }
}

Let's talk about why pinning is necessary in some scenarios.

Pinning: Keeping Things in Place

Pinning is a critical concept in Rust's async ecosystem that solves a fundamental challenge: how to safely work with self-referential structures in memory.

When the compiler transforms an async function into a state machine, it may create internal references - pointers between different parts of that state machine. This becomes problematic because Rust normally assumes values can be freely moved in memory, which would invalidate these internal references.

The Pin<&mut T> type in the Future::poll signature provides a crucial guarantee: a pinned future won't be moved for the duration of its lifetime. This is essential for:

  1. Preserving self-references in future state machines
  2. Ensuring safety for types that rely on stable memory locations
  3. Maintaining memory layout stability as part of the API contract
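
To see why movement is dangerous, here's a sketch of the hazard using a hand-rolled self-referential struct (simplified; real generated state machines are more involved, and become !Unpin precisely because of fields like this):

// A simplified self-referential value: `data_ref` is meant to point
// at the `data` field. If the whole struct moves, `data_ref` still
// holds the old field address -- the hazard Pin exists to prevent.
struct Stage {
    data: String,
    data_ref: *const String, // like a reference held across an .await
}

fn main() {
    let mut stage = Stage {
        data: String::from("carbonara"),
        data_ref: std::ptr::null(),
    };
    stage.data_ref = &stage.data; // self-reference established

    let addr_before = stage.data_ref;
    let moved = stage; // moving the struct gives `data` a new address...
    // ...but the stored pointer still holds the old one, so it now
    // dangles. Pin rules this move out for !Unpin types like real
    // async state machines.
    println!("field was at {:p}, is now at {:p}", addr_before, &moved.data);
}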

In the last example, we need to reuse the sleep future across multiple loop iterations. Without pinning, Rust would move the future each time, breaking any internal references. Notice how we:

  1. Create the sleep future
  2. Pin it with tokio::pin!(sleep)
  3. Use a reference (&mut sleep) in the select! macro

This pattern is common when you need to repeatedly poll the same future, especially in loops with select! where you're waiting for either some data or a timeout. Essentially, when you need to pass a reference to your future or when you're maintaining a reference across .await points, you need to pin it.

There are two common ways to pin values:

// Box pinning (heap allocation)
let future = Box::pin(async_operation());

// Stack pinning (no heap allocation)
let mut future = async_operation();
tokio::pin!(future);  // Macro that creates a pinned reference

Returning to our restaurant analogy: pinning isn't just about designating a fixed workstation - it's about acknowledging that the chef has created a complex web of interdependencies. The chef has arranged ingredients in specific positions and set up tools that reference each other's locations. Moving this entire arrangement would cause these carefully established relationships to break down.

When working with low-level async code or implementing custom futures, understanding pinning becomes essential for ensuring memory safety and preventing subtle bugs that could arise from unexpected movement of self-referential data.

Don't Take Up All the Chef's Time.

We've been using tokio::time::sleep as an example of a future that represents a delay, or as a lazy way for me to say "we're doing something that takes a long time." Why not just use std::thread::sleep? Asynchronous code tends to creep its way into your codebase, which some people complain about: you mark your main function as async, and pretty much every other function you have will end up async too, at least in my experience. Once you decide you have a valid reason to use an async runtime, it's normal for that to happen, and you will need to opt for "async-aware" versions of "normal" functions. The difference between tokio::time::sleep and std::thread::sleep is that, as you might expect, tokio::time::sleep is a future that will do nothing unless .awaited, while std::thread::sleep will block the thread it's called on until the sleep time is up.

To illustrate this, consider the following code, where we wait for 4 futures to finish concurrently on a single-threaded runtime. We use the single-threaded runtime here because on the multi-threaded one, the other worker threads can keep running tasks while one worker is blocked, which can mask the problem (though nothing guarantees which thread a given task lands on).
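
Here's a sketch of that experiment. With tokio::time::sleep, all four futures finish in roughly one second; swap in std::thread::sleep and they serialize to roughly four seconds, because each blocking sleep hogs the only worker thread:

use std::time::{Duration, Instant};

async fn wait_a_second(id: u32) {
    // Async sleep: yields to the executor while waiting.
    tokio::time::sleep(Duration::from_secs(1)).await;
    // Swap in the line below instead, and the four futures run one
    // after another, because each blocks the only worker thread:
    // std::thread::sleep(Duration::from_secs(1));
    println!("future {} done", id);
}

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let start = Instant::now();
    tokio::join!(
        wait_a_second(1),
        wait_a_second(2),
        wait_a_second(3),
        wait_a_second(4),
    );
    println!("all done in {:?}", start.elapsed()); // ~1s, not ~4s
}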

Blocking calls like thread::sleep aren't the only hazard. The most problematic case is the standard library's Mutex. If you hold this mutex across an .await point:

let mut data = mutex.lock().unwrap();
something_async().await;  // Other tasks can't take the lock while we wait!
*data = new_value;

You could easily deadlock if another task on the same thread tries to take the same lock while your task is waiting on the future. This is why async runtimes usually provide their own async mutex implementations, like tokio's tokio::sync::Mutex. These async mutexes are designed to work with the async runtime and allow other tasks to make progress while waiting for the lock. They also operate on a FIFO basis, guaranteeing that if multiple tasks call .lock, they will acquire the lock in the order they called it.
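
A sketch of the async-aware alternative: tokio's lock() returns a future, so waiting for the lock yields to the executor instead of blocking the thread, and holding the guard across an .await point is fine:

use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Mutex;

async fn update_shared(mutex: Arc<Mutex<u64>>, new_value: u64) {
    // lock() is a future: while we wait for the lock, the executor
    // is free to run other tasks on this thread.
    let mut data = mutex.lock().await;
    // Holding a tokio MutexGuard across an .await point is allowed.
    tokio::time::sleep(Duration::from_millis(10)).await;
    *data = new_value;
}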

There are, however, cases where blocking an executor is inevitable, and not just a mistake of writing non-cooperative code. Some code is just not cooperative. Returning to our restaurant analogy, consider a step in a recipe where we have to whip cream continuously for 5 minutes, and the key for it to have the consistency we want is for us to keep stirring non-stop. Since we're stirring continuously, we have no way of including any "yield" points in this step of our recipe, so anything we have that's ready for us to get back to will have to wait until we're done stirring. This step of the recipe never yields back control to the executor--or in our case, the chef.

Luckily, tokio has a way to understand that we're dealing with blocking code; you just need to tell it nicely that what you want to do will be a blocking operation. To do this, we have two options:

  1. Use tokio::task::spawn_blocking to spawn a blocking task on a separate, dedicated thread pool for blocking tasks.
  2. Use tokio::task::block_in_place to run the blocking code on the current tokio worker thread, transitioning it to a blocking thread and moving the tasks that were running on it to another worker thread.

Thinking back to our restaurant kitchen analogy, suppose we have two chefs: Jeff and Nick. Jeff is working on a dish that involves many steps, but one of those steps is whipping cream for 5 minutes non-stop. Nick, however, isn't doing anything at all, but is at the ready to help out if needed.

  1. tokio::task::spawn_blocking would be equivalent to Jeff telling Nick "Hey, I need you to take care of this cream whipping business while I go cut some vegetables." Nick would then take over the cream whipping task, while Jeff can continue working on his dish.
  2. tokio::task::block_in_place would be equivalent to Jeff saying "Hey, I need to whip this cream for 5 minutes, so take over my station and once I'm done, I'll be the one waiting to help if needed." Nick would then take over Jeff's station, and once Jeff is done whipping the cream, he can now do what Nick was doing before, which is waiting to help out if needed.

Note that you have one Tokio worker thread per core, while the default limit set by tokio for the blocking thread pool is 512 threads. This means that using spawn_blocking will pretty much always work as a method of offloading blocking code to the blocking thread pool even if you're running a single-threaded executor (or if you're running on a low-resource environment like a 1 vCPU ECS container). However, with block_in_place, it's possible that you're transitioning your only Tokio worker thread to a blocking thread, and there's no other Tokio worker thread to run the tasks that you offloaded from it. To use block_in_place, you must be using a multi-threaded executor. However, the benefit of using block_in_place is that it doesn't require the overhead of spawning a new thread, so it's more efficient for short-lived blocking operations.
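
A sketch of both options side by side, with whip_cream standing in for any genuinely blocking, non-cooperative call:

use tokio::task;

// Stand-in for genuinely blocking, non-cooperative work.
fn whip_cream() -> String {
    std::thread::sleep(std::time::Duration::from_secs(5));
    "whipped cream".to_string()
}

async fn jeff_and_nick() {
    // Option 1: hand the work to the dedicated blocking pool (Nick),
    // and .await the result without blocking this worker thread.
    let cream = task::spawn_blocking(whip_cream)
        .await
        .expect("blocking task panicked");

    // Option 2: block right here, after tokio moves this worker's
    // other tasks elsewhere. Requires the multi-threaded runtime.
    let more_cream = task::block_in_place(whip_cream);

    println!("{} and {}", cream, more_cream);
}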

Note: Blocking the executor is a bad idea. If you have blocking code, use spawn_blocking or block_in_place to offload it to a separate thread pool. Your code should play nice and be cooperative; otherwise you're not holding up your end of the async contract, which says you won't keep the executor busy with your code for too long without yielding control back.

Go's Approach: Goroutines

To provide some contrast, let's look at how Go handles concurrency with goroutines, which are quite different from Rust's futures.

Goroutines are lightweight threads managed by the Go runtime. Unlike OS threads, which might require megabytes of stack space, goroutines start with just a few kilobytes and can grow or shrink as needed. This makes them much more resource-efficient than traditional threads.

The Go runtime includes a scheduler that multiplexes goroutines onto a smaller number of OS threads. This scheduler operates on an m:n scheduling principle, where many goroutines (m) are mapped onto a smaller number of OS threads (n).

Here's a simple example in Go:

func printMessage() {
    fmt.Println("Hello from goroutine")
}

func main() {
    go printMessage()  // Start a new goroutine
    fmt.Println("Hello from main function")
    time.Sleep(time.Second)  // Wait for the goroutine to finish
}

The go keyword starts a new goroutine, and the Go runtime takes care of scheduling it efficiently.

Key Differences from Rust's Approach

  1. Implicit vs. Explicit Yielding: In Go, the runtime can preempt goroutines during function calls or at certain points, allowing for transparent scheduling without explicit yield points. In Rust, you must explicitly .await to yield control.

  2. Memory Model: Go's goroutines are stackful coroutines with a dynamic stack. Rust's futures are stackless coroutines that store their state in a compiler-generated state machine (typically heap-allocated when spawned onto a runtime).

  3. Programming Model: In Go, you write code as if it were synchronous, and the runtime handles the asynchronous parts. In Rust, async functions are explicitly marked, making the distinction between synchronous and asynchronous code clear in the type system.

  4. Error Handling: Go uses multiple return values for error handling, while Rust uses the Result type, which can be combined with async using the ? operator.

Go's approach makes concurrent programming simpler, but at the cost of some control and transparency. Rust's approach is more explicit and gives you more control, but requires more careful management of async/await points.

Wrapping Thoughts

Async Rust is built around several key abstractions working together. Futures represent potential computations that don't run until awaited. They're transformed by the compiler into state machines where each .await creates a suspension point. Tasks are futures that have been submitted to an executor, which manages their execution and decides when to poll them. When a future can't make progress, it registers a waker with the reactor, which will notify the executor when the future is ready to continue.

Rust's async environment leverages cooperative multitasking, where tasks explicitly yield control at .await points rather than being preemptively scheduled. This approach enables efficient concurrency with minimal overhead, as futures only consume memory proportional to their state. The downside, however, is that you must be careful to avoid blocking the executor with long-running operations. Rust's async model working properly depends on you honoring the cooperative multitasking "contract".

For practical usage, we generally work with higher-level abstractions: join to run multiple futures concurrently and wait for all to complete, select to race futures and take the first to finish, and spawn to create independent tasks. Again, we must be careful not to block worker threads with CPU-intensive operations or synchronous I/O, using spawn_blocking or block_in_place for such work instead.

Once you understand these fundamental concepts, writing effective async Rust becomes more intuitive, letting you focus on solving problems rather than wrestling with the machinery.