Asynchronous Iteration Patterns In Node.js - Part 2

This is a follow-up to my previous article “Asynchronous iteration patterns”.

The feedback has been great; some corrections and remarks were sent my way.

Also, some node.js hate was spread around because of what I think is the added complexity sometimes needed to handle asynchronous IO. It was not my intention to spread fear about asynchronous IO, but to gather and share my view on how to handle it. In my view, asynchronous IO is good and here to stay; you should learn to embrace it.

But first, a correction:

Doh!

Some readers alerted me that the serialize_timeout.js version (where we detach from the stack using a setTimeout) is not necessary, since we are already detaching from the stack by calling an asynchronous IO operation.

Right you are.
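
To make the correction concrete, here is a minimal sketch of serialized iteration that relies only on the asynchronous call itself. The insertInto function is hypothetical, standing in for any real asynchronous IO operation; since its callback already fires on a fresh stack, the recursion cannot grow the stack:

var collection = [1, 2, 3, 4];

// hypothetical asynchronous insert - stands in for any real IO call
function insertInto(element, callback) {
  setTimeout(function() {
    console.log('inserted ' + element);
    callback();
  }, 100);
}

(function serialize(i) {
  if (i >= collection.length) {
    console.log('all done');
    return;
  }
  insertInto(collection[i], function() {
    // the IO callback already runs on a fresh stack,
    // so no extra setTimeout is needed before recursing:
    serialize(i + 1);
  });
})(0);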

Apples and Oranges - being fair

If we are going to compare node.js IO programming with “normal” blocking programming, let’s be fair and compare two equivalent solutions.

In my opinion, comparing the syntax of blocking IO APIs with that of non-blocking IO APIs is not fair, since non-blocking IO allows you to do much more. If you want to compare the two, you have to compare the node.js solution to the following solution in the blocking world:

Since non-blocking IO allows you to run a lot of IO operations in the background, it’s only fair that we demand the same capability of the blocking world.

To get the same behaviour in the blocking world you would have to have, to start with, a thread pool. No, you can’t get away with creating a fresh new thread for each IO operation, since node.js does away with that overhead.

Then you would have to assign each IO operation to a thread from the pool.

Then the main thread would block waiting for a completion signal from one of them.

Each thread would have to update a global completion counter in order to know when all the operations are done. And yes, you would have to synchronize thread access to that piece of memory.

When one thread detects that all operations are done, it has to notify the main thread, which is blocked waiting, so it can resume. Not simpler, in my opinion.

You could also go with co-routines, continuations, or fibers (however you wish to call cooperative multi-tasking). That would be a simplification, since you wouldn’t have to synchronize access to the counter, but you would still have to manage multiple contexts. Also not simpler, in my opinion.
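
For contrast, here is roughly what that fan-out looks like in node.js: a plain completion counter, with no locking, since all callbacks run on a single thread. This is a sketch; insertInto again stands in for any asynchronous IO call:

var collection = [1, 2, 3, 4];

// hypothetical asynchronous insert - any real IO call would do
function insertInto(element, callback) {
  setTimeout(callback, 100);
}

var pending = collection.length; // completion counter

collection.forEach(function(element) {
  insertInto(element, function() {
    pending--; // single thread, so no synchronization needed
    if (pending === 0) {
      console.log('all done');
    }
  });
});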

Abstractions

The previous article also served to expose the underlying patterns and their consequences. Day-to-day programmers should not have to deal with this complexity, which is why some abstractions have been devised.

Here is one of them:

Step

Step is Tim Caswell’s flow-control library. It makes chaining callbacks easy.

For example, let’s say we have a function called async that simulates asynchronous IO, abstracting away the database interaction:

function async(i, callback) {
  // random delay of up to 3 seconds to simulate IO latency
  var timeout = Math.round(Math.random() * 3000);
  setTimeout(function() {
    console.log(i + ' is done');
    callback();
  }, timeout);
}
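
Calling it directly, you would see the message after a random delay of up to 3 seconds:

async(1, function() {
  console.log('callback fired');
});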

Next we need to install step using npm. Just type:

$ npm install step

Then, say we need to insert 10 elements into the database in parallel:

var Step = require('step');

var collection = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

Step(
  // first step: kick off all the inserts in parallel
  function insertAll() {
    var self = this;
    collection.forEach(function(element) {
      async(element, self.parallel());
    });
  },
  // second step: runs once every parallel callback has fired
  function finalize(err) {
    if (err) { console.log(err); return; }
    console.log('done with no problem');
  }
);

You simply use Step with two functions.

In the first one you just kick off all the inserts in parallel. As each callback you pass this.parallel(), which is how you create a callback in Step that tracks one of several parallel operations. Once all of them have completed, Step invokes the next function, passing an error as the first argument if any operation failed.
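
Running this, you should see something like the following (the order varies, since each timeout is random; I'm assuming here the script was saved as step_parallel.js):

$ node step_parallel.js
3 is done
1 is done
7 is done
...
10 is done
done with no problem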