Today, a young man on acid realized that all matter is merely energy condensed to a slow vibration - that we are all one consciousness experiencing itself subjectively. There’s no such thing as death, life is only a dream, and we’re the imagination of ourselves. Here’s Tom with the weather.
- Bill Hicks
This is a follow-up to my previous article “Asynchronous iteration patterns”.
The feedback has been great, and some corrections and remarks were sent my way.
Also, some node.js hate was spread around because of what I think is the added complexity sometimes needed to handle asynchronous IO. It was not my intention to spread fear about asynchronous IO, but to gather and share my view on how to handle it. In my view, asynchronous is good and here to stay, and you should learn to embrace it.
But first, a correction:
Some readers pointed out that the serialize_timeout.js version (where we detach from the stack using a setTimeout) is not necessary, since calling an asynchronous IO operation already detaches us from the stack.
Right you are.
If we are going to compare node.js IO programming with “normal” blocking programming, let’s be fair and compare equivalent things.
In my opinion, comparing the syntax of blocking IO APIs with that of non-blocking IO APIs is not fair, since non-blocking IO allows you to do much more. If you want to compare both, you have to compare the node.js solution with the following solution in the blocking world:
Since non-blocking IO allows you to put a lot of IO operations in the background, it’s only fair that we compare it with the same capability in the blocking world.
To get the same behaviour in the blocking world you would need, to start with, a thread pool. No, you can’t get away with creating a brand new thread for each IO operation, since node.js does away with that overhead.
Then you would have to assign each IO operation to a thread from the pool.
Then the main thread would block, waiting for a completion signal from one of them.
Each thread would have to update a shared completion counter in order to know when all operations are done. And yes, you have to synchronize thread access to that piece of memory.
When a thread detects that all operations are done, it has to notify the waiting main thread so that it unblocks.
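To make those steps concrete, here is a rough sketch in JavaScript itself. It uses Node's worker_threads module as a stand-in for the blocking world. Several things are assumptions made up for illustration: the pool size, the number of operations, and the "blocking IO" (a synchronous sleep via Atomics.wait). Operations are also assigned to the pool's threads round-robin up front, which makes this a simplified pool.

```javascript
const { Worker } = require('worker_threads');

const TOTAL_OPERATIONS = 10; // hypothetical number of IO operations
const POOL_SIZE = 3;         // fixed pool, not one thread per operation

// Code each pool thread runs: perform its assigned operations one by one,
// blocking on each, and update the shared completion counter.
const workerSource = `
  const { workerData } = require('worker_threads');
  const counter = new Int32Array(workerData.shared);
  const sleeper = new Int32Array(new SharedArrayBuffer(4));
  for (const op of workerData.operations) {
    // Stand-in for a blocking IO call: sleep synchronously for a few ms.
    Atomics.wait(sleeper, 0, 0, 5 + (op % 3) * 5);
    // Synchronized increment of the shared counter (returns the old value).
    const done = Atomics.add(counter, 0, 1) + 1;
    // The thread that completes the last operation wakes the main thread.
    if (done === workerData.total) Atomics.notify(counter, 0);
  }
`;

// Shared memory holding the completion counter.
const shared = new SharedArrayBuffer(4);
const counter = new Int32Array(shared);

// Assign each operation to a thread from the pool (round-robin).
for (let w = 0; w < POOL_SIZE; w++) {
  const operations = [];
  for (let op = w; op < TOTAL_OPERATIONS; op += POOL_SIZE) operations.push(op);
  new Worker(workerSource, {
    eval: true,
    workerData: { shared, operations, total: TOTAL_OPERATIONS },
  });
}

// The main thread blocks until the counter reaches TOTAL_OPERATIONS.
// The loop re-reads the counter every time Atomics.wait returns.
let seen;
while ((seen = Atomics.load(counter, 0)) < TOTAL_OPERATIONS) {
  Atomics.wait(counter, 0, seen);
}
console.log(`all ${counter[0]} operations done`);
```

Atomics.add is the synchronized counter update. The while loop is the main thread blocking until it receives the completion signal.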
Not simpler in my opinion.
You could also use co-routines, continuations or fibers (whatever you prefer to call cooperative multi-tasking). That would be a simplification, since you wouldn’t have to synchronize access to the counter, but you would still have to manage multiple contexts. Also not simpler, in my opinion.
Abstractions
The previous article also served to expose the underlying patterns and their consequences. Day-to-day programmers should not have to deal with this complexity, which is why some abstractions have been devised.
Here is one of them:
Step is Tim Caswell’s flow control library. It allows for chaining callbacks in an easy way.
For example, let’s say we have a function called async that simulates asynchronous IO, sparing us from dealing with a real database.
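The original code for async is not reproduced here. A minimal sketch of such a function could look like the following. The random delay and the (err, value) callback signature are assumptions, chosen to follow the usual node.js callback convention.

```javascript
// Hypothetical simulation of asynchronous IO: it never calls back
// synchronously, but after a short random delay, with (err, value).
function async(value, callback) {
  const delay = Math.floor(Math.random() * 50); // 0 to 49 ms
  setTimeout(function () {
    callback(null, value);
  }, delay);
}

// The callback runs only after the current code has finished.
const events = [];
async('inserted', function (err, result) {
  events.push(result);
});
events.push('after the call');
```

async is not a reserved word in JavaScript, so it is a legal function name here. A less ambiguous name may still read better in real code.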
Next, install Step using npm:
$ npm install step
Then, say we need to insert 10 elements into the database in parallel.
With Step you would simply pass it two functions.
In the first one you start all the inserts in parallel, passing this.parallel() as each insert’s callback; this.parallel() is how Step creates a callback that handles parallel requests.
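The Step code itself is not reproduced here. To show what a helper like this.parallel() saves you from writing, here is a hand-rolled sketch of the same idea. It is not Step’s implementation. The parallel function below is hypothetical, and so is the async function that simulates an insert.

```javascript
// Hypothetical simulated insert: calls back later with (err, value).
function async(value, callback) {
  setTimeout(function () {
    callback(null, value);
  }, Math.floor(Math.random() * 50));
}

// Hypothetical helper: starts every task at once and calls done exactly once,
// either with the first error or with all results in task order.
function parallel(tasks, done) {
  const results = new Array(tasks.length);
  let pending = tasks.length;
  let finished = false;
  if (pending === 0) return done(null, results);
  tasks.forEach(function (task, i) {
    task(function (err, value) {
      if (finished) return; // ignore anything after done has run
      if (err) {
        finished = true;
        return done(err);
      }
      results[i] = value;
      pending -= 1;
      if (pending === 0) {
        finished = true;
        done(null, results);
      }
    });
  });
}

// Insert 10 elements in parallel, then handle all the results in one place.
const elements = Array.from({ length: 10 }, (_, i) => 'element ' + i);
let inserted = null;
parallel(
  elements.map((el) => (cb) => async(el, cb)),
  function (err, results) {
    if (err) throw err;
    inserted = results;
  }
);
```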
Source: nodenerd
GitHub Commit Monitor currently uses the default, in-memory session provider, which will mostly work for development, but has some downsides:
- When the app restarts or is updated all of the session data is lost
- If the session data is large or there are a large number of sessions, the amount of…
Para lá de Teerão (Beyond Tehran)
Caetano Veloso - “Qualquer Coisa” with Los Super Seven (via brontis)
Source: youtube.com
Some patterns are hard to grasp, especially when programming asynchronously, as you have to when doing IO in node.js.
For example, let’s suppose you had to program the following routine:
Insert a collection of objects into the database and then, when finished, call a callback.
So, if you had to write this in a synchronous fashion you would do something like this:
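The original code is not shown here. A minimal sketch of the synchronous version follows, assuming a hypothetical db object whose insert blocks until the row is stored.

```javascript
// Hypothetical synchronous database stub: insert() returns only when done.
const db = {
  rows: [],
  insert(obj) {
    this.rows.push(obj);
  },
};

// Each insert finishes before the next one starts, so by the time the loop
// ends every object is stored and the callback can safely be called.
function insertCollection(collection, callback) {
  for (const obj of collection) {
    db.insert(obj);
  }
  callback();
}

let storedAtCallback = -1;
insertCollection(['a', 'b', 'c'], function () {
  storedAtCallback = db.rows.length;
});
```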
So, since we are using node.js, db.insert is most probably asynchronous. We have to turn this into an asynchronous function.
I have seen some obviously wrong implementations like this one:
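Here is a sketch of that kind of wrong implementation. It assumes a hypothetical db.insert(obj, callback) that completes on a later tick of the event loop.

```javascript
// Hypothetical asynchronous database stub: the row is stored later.
const db = {
  rows: [],
  insert(obj, callback) {
    setTimeout(() => {
      this.rows.push(obj);
      callback(null);
    }, Math.floor(Math.random() * 10));
  },
};

// WRONG: callback runs as soon as the inserts have been *started*.
function insertCollection(collection, callback) {
  collection.forEach(function (obj) {
    db.insert(obj, function (err) {
      // completion (and any error) is ignored
    });
  });
  callback();
}

let storedAtCallback = -1;
insertCollection(['a', 'b', 'c'], function () {
  storedAtCallback = db.rows.length; // 0: nothing has been stored yet
});
```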
The problem with this one is obvious: the callback is called right after launching all the db.insert calls in the background, without giving them a chance to finish. By the time the callback gets called, none of the inserts has finished.
Another approach would be this one:
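Here is that approach, sketched with a hypothetical db stub. Each object carries an explicit, made-up delay so that the first insert is the slowest one.

```javascript
// Hypothetical asynchronous stub: each object carries its own delay in ms.
const db = {
  rows: [],
  insert(obj, callback) {
    setTimeout(() => {
      this.rows.push(obj.id);
      callback(null);
    }, obj.delay);
  },
};

// WRONG: calls back when the *last* insert calls back, assuming it ends last.
function insertCollection(collection, callback) {
  collection.forEach(function (obj, i) {
    db.insert(obj, function (err) {
      if (i === collection.length - 1) callback(err);
    });
  });
}

let storedAtCallback = null;
insertCollection(
  [{ id: 1, delay: 50 }, { id: 2, delay: 5 }],
  function () {
    storedAtCallback = db.rows.slice(); // only [2]: insert 1 is still running
  }
);
```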
So there is some temptation to think “we have to call back when the last insert calls back”, but this is plain wrong. The first insert can still be executing when the last one calls back. You never know.
I think that the safest approach is to do something like this:
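Here is a sketch of a counter-based approach, again with a hypothetical asynchronous db stub. The error handling is an assumption about what you want on failure: it reports the first error and ignores later completions. The essential part is the pending counter.

```javascript
// Hypothetical asynchronous database stub with random completion order.
const db = {
  rows: [],
  insert(obj, callback) {
    setTimeout(() => {
      this.rows.push(obj);
      callback(null);
    }, Math.floor(Math.random() * 20));
  },
};

function insertCollection(collection, callback) {
  let pending = collection.length; // inserts that have not called back yet
  let finished = false;            // guarantees callback runs only once
  if (pending === 0) return callback(null);
  collection.forEach(function (obj) {
    db.insert(obj, function (err) {
      if (finished) return;
      if (err) {
        finished = true;
        return callback(err);
      }
      pending -= 1;
      if (pending === 0) { // the last one to finish, whichever it is
        finished = true;
        callback(null);
      }
    });
  });
}

let storedAtCallback = -1;
insertCollection(['a', 'b', 'c', 'd'], function (err) {
  if (err) throw err;
  storedAtCallback = db.rows.length; // 4: every insert has completed
});
```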
You should only call back when all of the inserts have called back.
Serialization
Sometimes you want to control the flow and/or the order of execution.
You may want the inserts to be perfectly ordered in this case, or you may want to stop inserting if an error occurs so you can recover more easily.
If that’s the case, you can do something like this:
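Here is a sketch of the serialized version, using a hypothetical asynchronous db stub. The stub fails on null so that the stop-on-error behaviour is visible. insertNext is an illustrative name.

```javascript
// Hypothetical asynchronous database stub (illustration only).
const db = {
  rows: [],
  insert(obj, callback) {
    setTimeout(() => {
      if (obj == null) return callback(new Error('cannot insert ' + obj));
      this.rows.push(obj);
      callback(null);
    }, Math.floor(Math.random() * 10));
  },
};

// Inserts the objects one at a time, in order; stops at the first error.
function insertCollection(collection, callback) {
  function insertNext(i) {
    if (i >= collection.length) return callback(null); // all done
    db.insert(collection[i], function (err) {
      if (err) return callback(err); // stop: later objects are not inserted
      insertNext(i + 1);             // recurse to insert the next one
    });
  }
  insertNext(0);
}

let result = null;
insertCollection(['a', 'b', null, 'd'], function (err) {
  result = { err, rows: db.rows.slice() };
});
```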
Here we are using tail recursion to keep inserting the records.
This example has one problem: it uses the stack, so if the collection is too big, you might end up blowing up the stack.
One solution to this problem is to abandon the stack when recursing, which you can do using a setTimeout with a timeout value of 0. This makes the inner function get called after the stack unwinds:
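Here is a sketch of that setTimeout variant. To make the stack concern real, the hypothetical db stub below calls back synchronously. With a truly asynchronous db.insert, the stack already unwinds between inserts.

```javascript
// Hypothetical stub that calls back synchronously: without the setTimeout,
// each insert would add stack frames on top of the previous one.
const db = {
  rows: [],
  insert(obj, callback) {
    this.rows.push(obj);
    callback(null);
  },
};

function insertCollection(collection, callback) {
  function insertNext(i) {
    if (i >= collection.length) return callback(null);
    db.insert(collection[i], function (err) {
      if (err) return callback(err);
      // Run the next step from the event loop, on a fresh stack.
      setTimeout(function () {
        insertNext(i + 1);
      }, 0);
    });
  }
  insertNext(0);
}

const items = Array.from({ length: 200 }, (_, i) => i);
let finished = false;
insertCollection(items, function (err) {
  if (err) throw err;
  finished = true;
});
```

In current Node versions, setImmediate(fn) is the usual way to run a function after the stack unwinds without the timer's minimum delay.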
Follow-up
See the follow-up article Asynchronous iteration patterns in Node.js - part 2
Update:
Also (as pointed out by Tim Caswell), it’s important that exceptions don’t propagate up into the event loop instead of ending up in the callback. So you should wrap your db.insert, or any other external function call, in a try/catch. Our last example should then be:
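Here is a sketch of the last example with that wrapping in place. The db stub is hypothetical: it throws synchronously on non-string input, so you can see the exception reach the callback instead of the event loop.

```javascript
// Hypothetical db stub whose insert throws synchronously on bad input.
const db = {
  rows: [],
  insert(obj, callback) {
    if (typeof obj !== 'string') throw new TypeError('expected a string');
    setTimeout(() => {
      this.rows.push(obj);
      callback(null);
    }, 0);
  },
};

function insertCollection(collection, callback) {
  function insertNext(i) {
    if (i >= collection.length) return callback(null);
    try {
      db.insert(collection[i], function (err) {
        if (err) return callback(err);
        setTimeout(function () {
          insertNext(i + 1);
        }, 0);
      });
    } catch (err) {
      // A synchronous throw goes to the callback, not up to the event loop.
      return callback(err);
    }
  }
  insertNext(0);
}

let result = null;
insertCollection(['a', 42, 'c'], function (err) {
  result = { err, rows: db.rows.slice() };
});
```

Only the synchronous db.insert call sits inside the try. The callback passed to it runs later, from a timer, so an exception thrown there is not caught by this try/catch.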