Metaduck

Node.js, Ruby, Rails and more


The news

Today, a young man on acid realized that all matter is merely energy condensed to a slow vibration - that we are all one consciousness experiencing itself subjectively. There’s no such thing as death, life is only a dream, and we’re the imagination of ourselves. Here’s Tom with the weather

- Bill Hicks


Asynchronous iteration patterns in Node.js - part 2

This is a follow-up to my previous article “Asynchronous iteration patterns”.

The feedback has been great, and some corrections and remarks were sent my way.

Also, some node.js hate was spread around because of what I think is the added complexity sometimes needed to handle asynchronous IO. It was not my intention to spread fear about handling asynchronous IO, but to gather and share my view on how to handle it. In my view, asynchronous is good and here to stay, and you should learn to embrace it.

But first, a correction:

Doh!

I was alerted by some readers that the serialize_timeout.js version (where we detach from the stack using a setTimeout) would not be necessary since we are already detaching from the stack by calling an asynchronous IO operation.

Right you are.

Apples and Oranges - being fair

If we are going to compare node.js IO programming with “normal” blocking programming, let’s be fair, and compare two equivalent objects.

In my opinion, comparing syntaxes for blocking IO APIs with non-blocking IO APIs is not fair, since non-blocking allows you to do much more. If you want to compare both, you have to compare the node.js solution to the following solution on the blocking world:

Since non-blocking IO allows you to put a lot of IO operations in the background, it’s only fair that we compare the same capability in the blocking world.

To have the same behaviour on the blocking world you would have to have, to start with, a thread pool. No, you can’t get away with creating a fresh new thread for each IO operation, since node.js does away with that overhead.

Then you would have to assign each IO operation to a new thread from the thread pool.

Then the main thread would block waiting for a completion signal from one of them.

Each thread would have to keep a global completion counter in order to know when all operations are done with. And yes, you have to synchronize thread access to that piece of memory.

When one thread detects that all operations are done, it has to notify the main thread that’s waiting on it so that it can unblock.

Not simpler in my opinion.

You could also use co-routines, continuations or fibers (whatever you wish to call cooperative multi-tasking). That would be a simplification, since you wouldn’t have to synchronize access to the counter, but you would still have to manage multiple contexts. Also not simpler, in my opinion.

Abstractions

Also, the article served to expose the underlying patterns and their consequences. Day-to-day programmers should not have to deal with this complexity. That’s why some abstractions have been devised.

Here is one of them:

Step

Step is Tim Caswell’s flow control library. It allows for chaining callbacks in an easy way.

For example, let’s say we have a function called async, which serves to simulate asynchronous IO, abstracting us away from the database interaction.
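A minimal sketch of what such a helper could look like. The random delay and the Node-style `(err, result)` callback signature are assumptions made for illustration, not necessarily what the original listing used:

```javascript
// Hypothetical stand-in for a database call: it waits a few
// milliseconds and then reports success through a Node-style callback.
function async(value, callback) {
  const delay = Math.floor(Math.random() * 20);
  setTimeout(function () {
    callback(null, value); // (error, result) convention
  }, delay);
}

// Usage: the callback runs later, never before async() has returned.
async('some record', function (err, result) {
  if (err) throw err;
  console.log('inserted', result);
});
```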

Next we need to install step using npm. Just type:

$ npm install step

Then, we need to, say, insert 10 elements into the database in parallel.

You would simply use step with 2 functions.

On the first you would just insert all the elements in parallel. To the callback you pass this.parallel(), which is the way to create a callback in step that handles parallel requests.

Filed under node.js


Node Nerd: Durable Sessions

nodenerd:

GitHub Commit Monitor currently uses the default, in-memory session provider, which will mostly work for development, but has some downsides:

  • When the app restarts or is updated all of the session data is lost
  • If the session data is large or there are a large number of sessions, the amount of…


Asynchronous iteration patterns in Node.js

Some patterns are hard to grasp, especially when programming asynchronously, as you have to when you’re doing IO on node.js.

For example, let’s suppose you had to program the following routine:

Insert a collection of objects on the database and then, when finished, call a callback.

So, if you had to write this in a synchronous fashion you would do something like this:
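A minimal sketch of such a synchronous version, assuming a hypothetical in-memory `db` whose `insert` returns only once the record is stored:

```javascript
// Hypothetical blocking store: insert() returns only when the record is stored.
const db = {
  records: [],
  insert: function (doc) {
    this.records.push(doc);
  }
};

function insertCollection(collection) {
  for (let i = 0; i < collection.length; i++) {
    db.insert(collection[i]);
  }
  // Every insert has completed by the time we get here.
}

insertCollection([{ id: 1 }, { id: 2 }]);
```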

So, since we are using node.js, db.insert is most probably asynchronous. We have to turn this into an asynchronous function.

I have seen some obviously wrong implementations like this one:

The problem with this one is obvious: callback is called right after launching all the db.inserts in the background, not leaving them a chance to finish. By the time callback gets called, none of the inserts has terminated.
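The mistake can be reproduced with a small sketch. The `db` object here is a hypothetical store that completes each insert on a later tick, standing in for a real asynchronous driver:

```javascript
// Hypothetical asynchronous store: insert() completes on a later tick.
const db = {
  records: [],
  insert: function (doc, callback) {
    const self = this;
    setTimeout(function () {
      self.records.push(doc);
      callback(null);
    }, 10);
  }
};

// Broken on purpose: callback fires before any insert has finished.
function insertCollection(collection, callback) {
  for (let i = 0; i < collection.length; i++) {
    db.insert(collection[i], function () {});
  }
  callback();
}

insertCollection([{ id: 1 }, { id: 2 }], function () {
  // Nothing has been stored yet at this point.
  console.log('records stored when callback ran:', db.records.length);
});
```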

Another approach would be this one:

So, there is some temptation to think “we have to call back when the last insert calls back”, but this is plain wrong. The first insert can still be executing when the last one calls back. You never know.
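A sketch of that second mistake, assuming a hypothetical `db` whose insert time depends on a `delay` field on each document, so that the first insert can be made slower than the last:

```javascript
// Hypothetical store: each insert takes as long as the document's `delay` field.
const db = {
  records: [],
  insert: function (doc, callback) {
    const self = this;
    setTimeout(function () {
      self.records.push(doc);
      callback(null);
    }, doc.delay);
  }
};

// Broken on purpose: assumes the last insert started is the last to finish.
function insertCollection(collection, callback) {
  collection.forEach(function (doc, i) {
    db.insert(doc, function (err) {
      if (i === collection.length - 1) callback(err);
    });
  });
}

insertCollection([{ id: 1, delay: 50 }, { id: 2, delay: 5 }], function () {
  // The slow first insert is still pending when this runs.
  console.log('stored so far:', db.records.map(function (r) { return r.id; }));
});
```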

I think that the safest approach is to do something like this:

You should only call back when all of the inserts have called back.
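One way to do that is to count outstanding inserts. This is a minimal sketch under assumptions: `db` is a hypothetical store with random completion times, and the choice to report only the first error is illustrative:

```javascript
// Hypothetical asynchronous store with random completion times.
const db = {
  records: [],
  insert: function (doc, callback) {
    const self = this;
    setTimeout(function () {
      self.records.push(doc);
      callback(null);
    }, Math.floor(Math.random() * 20));
  }
};

function insertCollection(collection, callback) {
  let pending = collection.length; // inserts that have not called back yet
  let firstError = null;

  if (pending === 0) {
    // Nothing to insert: still call back asynchronously for consistency.
    return process.nextTick(function () { callback(null); });
  }

  collection.forEach(function (doc) {
    db.insert(doc, function (err) {
      if (err && !firstError) firstError = err;
      pending -= 1;
      if (pending === 0) callback(firstError); // the last one to finish, whichever it is
    });
  });
}

insertCollection([{ id: 1 }, { id: 2 }, { id: 3 }], function (err) {
  console.log('all done, stored', db.records.length, 'records');
});
```

The order in which the inserts finish no longer matters, because the decision to call back depends only on how many are still pending.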

Serialization

Sometimes you want to control the flow and/or the order of the execution.

You may want the inserts to be perfectly ordered in this case, or you may want to stop inserting if an error occurs so you can recover more easily.

If that’s the case, you can do something like this:

Here we are using tail recursion to keep inserting the records.
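A sketch of this serialized version, assuming a hypothetical `db` whose documents can carry a `delay` (simulated latency) and a `fail` flag (simulated error):

```javascript
// Hypothetical store: `delay` simulates latency, `fail` simulates an error.
const db = {
  records: [],
  insert: function (doc, callback) {
    const self = this;
    setTimeout(function () {
      if (doc.fail) return callback(new Error('insert failed for ' + doc.id));
      self.records.push(doc);
      callback(null);
    }, doc.delay || 0);
  }
};

function insertCollection(collection, callback) {
  let index = 0;
  function insertNext() {
    if (index >= collection.length) return callback(null); // all done
    db.insert(collection[index++], function (err) {
      if (err) return callback(err); // stop at the first error
      insertNext();                  // only now start the next insert
    });
  }
  insertNext();
}

insertCollection([{ id: 1, delay: 20 }, { id: 2, delay: 1 }], function (err) {
  // Insertion order is preserved even though the second insert is faster.
  console.log(db.records.map(function (r) { return r.id; }));
});
```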

This example has one problem: it uses the stack, so if the collection is too big, you might end up blowing the stack.

One solution to this problem is to abandon the stack when recursing. You can do it using a setTimeout with a timeout value of 0. This makes the inner function run after the stack unwinds:
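A sketch of that variant. The hypothetical `db` below calls back synchronously, which is the case where detaching from the stack actually matters; with a truly asynchronous `db.insert` the stack has already unwound by the time the callback runs:

```javascript
// Hypothetical store that calls back synchronously (for example, from a cache).
const db = {
  records: [],
  insert: function (doc, callback) {
    this.records.push(doc);
    callback(null);
  }
};

function insertCollection(collection, callback) {
  let index = 0;
  function insertNext() {
    if (index >= collection.length) return callback(null);
    db.insert(collection[index++], function (err) {
      if (err) return callback(err);
      // Schedule the next insert on a fresh stack instead of recursing directly.
      setTimeout(insertNext, 0);
    });
  }
  insertNext();
}

insertCollection([{ id: 1 }, { id: 2 }, { id: 3 }], function (err) {
  console.log('stored', db.records.length, 'records');
});
```

The price is one timer per record, so the loop yields to the event loop between inserts and runs slower than direct recursion.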

Follow-up

See the follow-up article Asynchronous iteration patterns in Node.js - part 2

Update:

Also (as pointed out by Tim Caswell), it’s important that no exceptions go back up into the event loop instead of ending up on the callback. So, you should wrap your db.insert or any other external function call. Our last example should then be:
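A sketch of the serialized version with that protection added. The hypothetical `db` throws synchronously for documents flagged `explode`, and the `done` guard (an addition for illustration) ensures the callback is never invoked twice:

```javascript
// Hypothetical store: throws synchronously for documents flagged `explode`.
const db = {
  records: [],
  insert: function (doc, callback) {
    if (doc.explode) throw new Error('cannot insert ' + doc.id);
    const self = this;
    setTimeout(function () {
      self.records.push(doc);
      callback(null);
    }, 1);
  }
};

function insertCollection(collection, callback) {
  let index = 0;
  let finished = false;
  function done(err) {
    if (finished) return; // never call back twice
    finished = true;
    callback(err);
  }
  function insertNext() {
    if (index >= collection.length) return done(null);
    try {
      db.insert(collection[index++], function (err) {
        if (err) return done(err);
        insertNext();
      });
    } catch (err) {
      done(err); // route the exception to the callback, not the event loop
    }
  }
  insertNext();
}

insertCollection([{ id: 1 }, { id: 2, explode: true }], function (err) {
  console.log(err ? 'failed: ' + err.message : 'ok');
});
```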


Resources:

  • Step - a framework for asynchronous flow control, by Tim Caswell
  • Futures - a promises framework, by AJ O’Neal

Filed under node.js


Introducing Alfred.js - a node.js in-process key-value store

In November 2010 I blogged about wanting to have an in-process key-value store for node.js. I was displeased with the impedance mismatch between external datastores and node.js, and soon noticed there was nothing really similar to what I wanted. Following the advice from Tom Preston-Werner, I created my own BOMTYCC problem.

Two months and a lot of long nights later, Alfred.js emerges.

To sum it up, Alfred.js is a key-document store to be used internally on node.js apps.

It uses append-only files, has in-memory indexes, supports master-slave replication, supports atomic operations on one record, and has some other features.

Also, as in CouchDB, you can have a live feed of changes.

It supports a MongoDB-like query syntax, and also a power-user javascript finder interface.

Benchmarks

Speed is a major focus of Alfred.js.

I have some non-scientific benchmarks (which you will probably only understand if you familiarize yourself with the internals of Alfred.js and the benchmarks themselves).

They were done on my trusty MBP with a magnetic 5400 RPM drive, and yet they indicate that 15,000 to 20,000 writes per second may be possible.

They also indicate you can do 500,000 to 1,100,000 reads per second.

Hey, I may have some basic error on the benchmarks, but I would love to see these benchmarks running on a fast SSD…

Summing up

This is still experimental stuff. The API may change. It may delete your entire data. Use it at your own risk.

I still have to try it out with a lot of data to see how it behaves (I plan to use it on my next node-powered web app project).

But try it out. I hear it’s fast.

Filed under node.js


My almost perfect in-process node.js key-value store

Lately I have been looking into node-dirty and nStore. Both are in-process key-value stores for node.js. nStore is a bit more complex than node-dirty (it allows query results streaming, for instance), but both are fairly simple approaches.

I like the power of these databases, since they allow for javascript-function-based queries. And since everything is in-process, there is no over-the-wire protocol for delegating queries, which saves a lot of time and effort.

But… every time you perform a query, node-dirty and nStore have to scan all the stored objects and apply the filter to them. nStore has to load and scan the whole database from disk to do that, on each query. A node-dirty database avoids that, but only because it has to sit entirely in memory.

So, my (almost) perfect in-process node.js key-value store would have the following features:

Buckets

Buckets, containers, databases, tables - to allow you to partition your information across several buckets: one for users, another for products, etc.

This could be easily simulated with nStore or node-dirty by using separate databases.

Indexes. In memory.

You should be able to define indexes. These indexes would sit in memory. And that would be the only data that’s filterable using a javascript function. All the actual data would be on the filesystem.

Indexes would also be eventually persisted, so they would survive without having to reconstitute the indexes from meta-data.
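A rough sketch of the idea, with hypothetical names throughout: `storage` stands in for records on the filesystem, and `ageIndex` keeps only one indexed field in memory. The filter function runs over the index alone, and full records are fetched only for the matching keys:

```javascript
// Hypothetical layout, for illustration only.
const storage = new Map();  // key -> full record (imagine this lives on disk)
const ageIndex = new Map(); // key -> indexed value (kept in memory)

function put(key, record) {
  storage.set(key, record);
  ageIndex.set(key, record.age); // only the indexed field is duplicated
}

function find(filter) {
  const keys = [];
  ageIndex.forEach(function (value, key) {
    if (filter(value)) keys.push(key); // the filter never sees full records
  });
  return keys.map(function (key) { return storage.get(key); });
}

put('u1', { name: 'Ann', age: 30 });
put('u2', { name: 'Bob', age: 12 });

console.log(find(function (age) { return age >= 18; }));
```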

Meta-data

Meta-data containing information about buckets and indexes. Persisted.

Append-only files

The file format would be append-only to avoid file corruption and allow for some nice features. For instance, we could rsync the file at any given time without the fear of catching it in an inconsistent state.

Time-stamping

Each record written would be time-stamped. This way we could easily reconstitute (and audit) the state of the database at a given time.
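A small sketch of how time-stamped, append-only records would allow that. The in-memory `log` array and the `put` and `stateAt` names are hypothetical; a real store would append to a file instead:

```javascript
// Hypothetical append-only log: entries are only ever pushed, never changed.
const log = [];

function put(key, value, timestamp) {
  log.push({
    key: key,
    value: value,
    timestamp: timestamp === undefined ? Date.now() : timestamp
  });
}

// Rebuild the state of the database as it was at `time`.
function stateAt(time) {
  const state = {};
  log.forEach(function (entry) {
    if (entry.timestamp <= time) state[entry.key] = entry.value; // later writes win
  });
  return state;
}

put('a', 1, 100);
put('a', 2, 200);
put('b', 3, 300);

console.log(stateAt(250)); // 'a' has its second value, 'b' does not exist yet
```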

Compacting

Since we are using append-only, the data file would grow indefinitely.

We should then be able to easily compact the database if we wanted to, losing past state. This should not interrupt reads or writes.

This should also not introduce much more load on the process.

Replication

Any decent database allows for asynchronous replication, so this one should too. Each instance, running on each node, could act as a master or a slave.

This should allow for master-master replication so you can easily have a cluster of node.js processes where you can read or write to any of them. An appropriate conflict-resolution mechanism should be thought of.

Durability

There would be an in-process write queue, which you could force to flush.

Filed under node.js