Posts tagged node.js
Banzai is a document processing framework for Node.js.
You define a set of pipelines into which you push documents. Each document in a pipeline has a given state. A state transition triggers a state entry handler that can transform the document and interact with the outside world. The document ends up either in a defined final state or in an “error” state.
You can roll-back the state of a document to a certain previous state, and playback the pipeline flow. This can be useful, for instance, if a given document enters an error state because of a bug or a networking problem somewhere. You can correct the bug, roll-back to a previous state and play the pipeline from thereon, hopefully escaping that error condition.
Each state transition has a “next state”, a priority and an optional pre-condition. The candidate transitions (there can be more than one) are evaluated in priority order. If a candidate has a pre-condition, it is evaluated first; when a candidate matches, the corresponding state transition handler is triggered.
Each state transition can have an “undo handler” that takes care of undoing the effects of that transition. This can be useful if external services were changed and you need to revert those changes when you revert a transition.
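To make the selection rule concrete, here is a minimal sketch of priority-ordered candidates with optional asynchronous pre-conditions. This is not Banzai's API: the pickTransition function, the shape of the candidate objects and the "lower number first" ordering are all assumptions made for illustration.

```javascript
// Hypothetical sketch, not Banzai's API: choose the next transition for a document.
// Each candidate has a next state, a priority and an optional async pre-condition.
function pickTransition(doc, candidates, callback) {
  // Assumption: a lower priority number is evaluated first.
  var sorted = candidates.slice().sort(function(a, b) {
    return a.priority - b.priority;
  });

  (function tryNext(i) {
    if (i >= sorted.length) { return callback(null, null); } // nothing matched
    var candidate = sorted[i];
    // A candidate without a pre-condition always matches.
    if (!candidate.condition) { return callback(null, candidate); }
    candidate.condition(doc, function(err, matches) {
      if (err) { return callback(err); }
      if (matches) { return callback(null, candidate); }
      tryNext(i + 1);
    });
  })(0);
}

// Hypothetical candidates for one state.
var candidates = [
  { next: 'archived', priority: 2 },
  { next: 'reviewed', priority: 1,
    condition: function(doc, cb) { cb(null, doc.words > 100); } }
];

pickTransition({ words: 20 }, candidates, function(err, chosen) {
  console.log(chosen.next); // 'archived'
});
```

The short document fails the pre-condition of the higher-priority candidate, so the unconditional one is chosen.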
The state transition handlers and the pre-conditions are all defined in JavaScript and are asynchronous, meaning that you can perform I/O inside them. The pipeline definition is also written in JavaScript.
A Banzai deployment has 4 main components: the document store, the state store, the workers and the work queue.
The document store is where - you guessed it - the documents are stored. The document is retrieved when entering a state transition, passed into the state transition handler and saved when the handler is done. This way a state transition can be picked by any worker and the document is always persisted, surviving failures.
The state store holds the state of each document that is moving, or has moved, through a pipeline. There you can also find additional metadata, such as all the transitions that occurred, their start and end times, and any metadata the state transitions chose to save.
Workers are processes that are listening for state transitions and that pick up the work of invoking the state transition handler and deciding the next state.
The Work Queue is an event queue that persists and distributes the transitions to be picked up by the workers.
Currently the only supported database is CouchDB, but technically any document database could be plugged in. The database should, by the way, store every version of the documents (as CouchDB does) if you want to be able to roll back to earlier versions of the documents.
The module for supporting CouchDB is banzai-couchdb-store.
Currently we support Redis (any version >= 2.1.7) if you use the banzai-redis module, but any queueing system that allows the same semantics should work.
Check out the project README.
When developing server-side JavaScript for Node.js, I tend to encapsulate classes inside CommonJS modules and expose the constructor function as the module itself.
As an incomplete example of how I used to do it, let’s build a module that exposes a rectangle class:
function Rectangle(x, y, width, height) {
this.x = x;
this.y = y;
this.width = width;
this.height = height;
}
Rectangle.prototype.area = function() {
return this.width * this.height;
};
module.exports = Rectangle;
Let’s say you save this module under the name “rectangle.js”, in the current directory.
Then, to instantiate a rectangle you must do:
var Rectangle = require('./rectangle');
var rectangle = new Rectangle(1,2,3,4);
rectangle.area(); // -> 12
All is fine and dandy, right?
Nope. This way you can tamper with the rectangle object, changing properties and even overriding functions. I think this is not a major problem, but exposes a major design flaw, which I’ll cover later.
Now you want to add a private function. You have two main options: 1) add it as a function in the module scope, or 2) add it as a function on the Rectangle.prototype object, prefixed with an underscore so everyone knows they shouldn’t be calling it.
Let’s say, for the purpose of the example, that you want to add a private function named “coalesce”, which you want to call at the end of the constructor.
function coalesce() {
var self = this;
['x', 'y', 'width', 'height'].forEach(function(prop) {
if (!self[prop]) { self[prop] = 0; }
});
}
function Rectangle(x, y, width, height) {
this.x = x;
this.y = y;
this.width = width;
this.height = height;
coalesce.apply(this);
}
Rectangle.prototype.area = function() {
return this.width * this.height;
};
module.exports = Rectangle;
Here the constructor calls the “coalesce” function through Function.prototype.apply(), which sets “this” inside coalesce to the object being constructed. That was the first option. The second option, adding the function to the prototype, looks like this:
function Rectangle(x, y, width, height) {
this.x = x;
this.y = y;
this.width = width;
this.height = height;
this._coalesce();
}
Rectangle.prototype.area = function() {
return this.width * this.height;
};
Rectangle.prototype._coalesce = function() {
var self = this;
['x', 'y', 'width', 'height'].forEach(function(prop) {
if (!self[prop]) { self[prop] = 0; }
});
};
module.exports = Rectangle;
This way is simpler, but we’re exposing the coalesce function, which is ugly.
As I said earlier, this pattern exposes the methods and the data on the rectangle object.
The ultimate goal would be to expose the methods and encapsulate the data. How can we do that?
Here is a solution I like to use:
function Rectangle(x, y, width, height) {
function area() {
return width * height;
};
function coalesce() {
if (! x) { x = 0; }
if (! y) { y = 0; }
if (! width) { width = 0; }
if (! height) { height = 0; }
}
coalesce();
return {
area: area
};
}
module.exports = Rectangle;
And a client of this module would look like:
var Rectangle = require('./rectangle');
var rectangle = Rectangle(undefined, undefined, 3, 4);
The constructor simply returns an object that has the methods we want to expose. The data is encapsulated inside the constructor function, which also contains all the functions (private and public) that have privileged access to these.
We are also dropping the “new” keyword from the class clients (omitting it could cause a lot of problems in the previous model, since the constructor would then run with the wrong “this”).
This pattern also allows for object methods (private or public) to call each other with no restraints, since we are not relying on the leaky this object.
A useful way of declaring that a class (or pseudo-class, if you will) inherits from another one is having the constructor prototype point to an object that it “inherits” behavior from. Node has a convenience function for doing this, util.inherits(), and almost all JavaScript frameworks offer something similar.
For instance, say you want our Rectangle class (as in our first incarnation) inheriting from the Node EventEmitter class:
var inherit = require('util').inherits
, EventEmitter = require('events').EventEmitter;
function Rectangle(x, y, width, height) {
this.x = x;
this.y = y;
this.width = width;
this.height = height;
}
inherit(Rectangle, EventEmitter);
Rectangle.prototype.area = function() {
return this.width * this.height;
};
module.exports = Rectangle;
Convenient, heh? (You must be careful to call inherit before setting the prototype properties, or else they will be nuked.) How can we then implement inheritance if we’re not using the traditional JavaScript constructor functions?
Here is a way:
var EventEmitter = require('events').EventEmitter;
function Rectangle(x, y, width, height) {
var that;
function area() {
return width * height;
};
function coalesce() {
if (! x) { x = 0; }
if (! y) { y = 0; }
if (! width) { width = 0; }
if (! height) { height = 0; }
}
coalesce();
that = {
area: area
};
that.__proto__ = EventEmitter.prototype;
return that;
}
module.exports = Rectangle;
So, we’re using the __proto__ object, which is reserved in Javascript for the actual prototype object. So if you call any EventEmitter-specific methods like on() and emit(), the runtime will look into the rectangle object, and if not found, will search inside the prototype chain.
Mind you that the __proto__ object is not entirely portable to all Javascript platforms and browsers, but there are ways around that.
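One such way around it, as a minimal sketch assuming an ES5-capable runtime, is to create the returned object with Object.create() so that EventEmitter.prototype is in its prototype chain from the start. The rest follows the closure pattern above; the explicit EventEmitter.call(that) is my addition to initialize the emitter's internal state.

```javascript
var EventEmitter = require('events').EventEmitter;

function Rectangle(x, y, width, height) {
  // Build the object with EventEmitter.prototype as its prototype,
  // instead of assigning __proto__ after the fact.
  var that = Object.create(EventEmitter.prototype);
  EventEmitter.call(that); // initialize the emitter state on the new object

  // Data stays private inside the closure; only area is exposed.
  that.area = function() {
    return (width || 0) * (height || 0);
  };

  return that;
}

var rectangle = Rectangle(undefined, undefined, 3, 4);
rectangle.on('resized', function() { console.log('resized'); });
rectangle.emit('resized');
```

The object still hides x, y, width and height, and on() and emit() are found through the prototype chain.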
I love testing and I love small easily testable modules in Node.
Recently I had to build a library module that interacts with some web services via HTTP.
To test this module as it was would mean that I would have to have a sandboxed account on the other end. I also would have to have setup and teardown routines that would reset the sandbox to a known state, etc, etc.
What I really wanted for the unit tests was to test the module in isolation. In this case, it would mean capturing the HTTP requests and replying a pre-made response.
Enter nock.
Nock is an HTTP mocking and expectations library for Node.js.
With Nock you can easily mock a GET request:
var nock = require('nock');
var scope = nock('http://myapp.iriscouch.com')
.get('/users/1')
.reply(200, {_id: "123ABC", _rev: "946B7D1C", username: 'pgte', email: 'pedro.teixeira@gmail.com'});
or a POST request with a specified body (string or json-encoded object):
var scope = nock('http://myapp.iriscouch.com')
.post('/users', {username: 'pgte', email: 'pedro.teixeira@gmail.com'})
.reply(201, {ok: true, id: "123ABC", rev: "946B7D1C"});
or a PUT or a DELETE in the same fashion.
You can also specify the response as a string:
var scope = nock('http://api.app.com')
.post('/users', {username: 'pgte', email: 'pedro.teixeira@gmail.com'})
.reply(201, "OK");
or as a JSON-encoded object:
var scope = nock('http://api.app.com')
.post('/users', {username: 'pgte', email: 'pedro.teixeira@gmail.com'})
.reply(201, {ok: true, _id: "abcdef", _rev: "1234"});
or from the contents of a file:
var scope = nock('http://api.app.com')
.post('/users', {username: 'pgte', email: 'pedro.teixeira@gmail.com'})
.replyWithFile(201, __dirname + '/assets/reply.json');
If you have time-dependent or random data you want to filter out from the request path or body, you can use a regular expression, much like String.prototype.replace:
var scope = nock('http://api.app.com')
.filterPath(/timestamp=[^&]*/g, '')
.post('/users', {username: 'pgte', email: 'pedro.teixeira@gmail.com'})
.replyWithFile(201, __dirname + '/assets/reply.json');
.filterPath() also accepts a function as sole argument. That function should return the filtered path.
As said, Nock also supports request body filtering much the same way it does with path filtering. Just use .filterRequestBody like this:
var scope = nock('http://api.app.com')
.filterRequestBody(/timestamp=[^&]*/g, '')
.post('/users', {username: 'pgte', email: 'pedro.teixeira@gmail.com'})
.replyWithFile(201, __dirname + '/assets/reply.json');
or even with a function as the only argument.
When a scope is defined Nock intercepts every HTTP request being made in that process to that host. If a match is not found - Nock matches verb, path and body - an exception is raised.
When a mocking match is found, Nock removes that match.
At the end of the test, if you wish to test that all the expected calls were made, you can use
scope.done();
and a detailed exception will be thrown if some expectations were not met.
I hope this module makes testing easier for you.
Feedback with suggestions is welcome!
Configuration is always a chore, a simple thing you have to do, and keep reinventing every time you start a new project.
I usually have the following setup:
I have one configuration file per domain and several domains. A domain may be “www”, “couchdb” or a remote service that requires some parametrization like Postmark.
These are JSON files that lie in some directory.
Now, I want to be able to override them for each environment the application works on. I may have a “development” environment, a “test” environment, a “staging” environment and a “production” environment. I may even have different development environment configurations depending on the developer.
I want, for each of these environments, to be able to slightly tweak each configuration for some domains. For instance, I use localhost for Redis in my development environment, but redis.hostname.com in my production one. I want to be able to just specify the differences, not the whole configuration again, because soon that starts to be unmaintainable.
Looks simple, right? I searched through the existing Node.js modules that manage configuration and found none that met these requirements.
So I came up with Konphyg.
$ npm install konphyg
First you import konphyg and give it a source dir like this:
var config = require('konphyg')(__dirname + "/config");
Then you create the “config” dir and put a configuration file for each domain. For instance, for the “postmark” domain, I can have:
{
"host": "api.postmarkapp.com"
, "ssl": false
, "api_key": "myapikey"
}
I place this file inside the “config” dir and name it “postmark.json”.
Now, for my development environment I need to specify my API key. So, inside the config dir I place the file “postmark.development.json” with just this:
{
"api_key": "ABCDEFGHI"
}
Then, in my Node code, if I do:
console.log(config("postmark"));
I get:
{
"host": "api.postmarkapp.com"
, "ssl": false
, "api_key": "ABCDEFGHI"
}
Simple, right?
Konphyg can also handle deep object nesting in your configuration file, and it will correctly merge the environment-specific configuration with the base one.
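To illustrate what merging nested configuration means, here is a minimal sketch of a recursive overlay. It is not Konphyg's own code: the deepMerge and isPlainObject helpers and the "retry" settings are hypothetical, written only to show the idea of specifying just the differences.

```javascript
// Hypothetical helper, not Konphyg's implementation:
// recursively overlay `override` on top of `base` without mutating either.
function deepMerge(base, override) {
  var result = {};
  Object.keys(base).forEach(function(key) { result[key] = base[key]; });
  Object.keys(override).forEach(function(key) {
    var a = result[key];
    var b = override[key];
    // Only recurse when both sides are plain objects; otherwise the override wins.
    result[key] = (isPlainObject(a) && isPlainObject(b)) ? deepMerge(a, b) : b;
  });
  return result;
}

function isPlainObject(value) {
  return value !== null && typeof value === 'object' && !Array.isArray(value);
}

// Base configuration and a development override that changes one nested value.
var base = { host: 'api.postmarkapp.com', retry: { times: 3, delay: 100 } };
var development = { retry: { delay: 10 } };

var merged = deepMerge(base, development);
console.log(merged.retry); // { times: 3, delay: 10 }
```

Only retry.delay is overridden; retry.times and host come from the base file.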
Konphyg uses the NODE_ENV environment variable to determine the environment. If not present, it defaults to “development”.
This is a follow-up to my previous article “Asynchronous iteration patterns”.
The latest feedback has been great, and some corrections and remarks were sent my way.
Also, some node.js hate was spread around because of what I think is the added complexity sometimes needed to handle asynchronous IO. It was not my intention to spread fear about handling asynchronous IO, but to gather and share my view on how to handle it. In my view, asynchronous is good and here to stay, and you should learn to embrace it.
But first, a correction:
I was alerted by some readers that the serialize_timeout.js version (where we detach from the stack using a setTimeout) would not be necessary since we are already detaching from the stack by calling an asynchronous IO operation.
Right you are.
If we are going to compare node.js IO programming with “normal” blocking programming, let’s be fair, and compare two equivalent objects.
In my opinion, comparing syntaxes for blocking IO APIs with non-blocking IO APIs is not fair, since non-blocking allows you to do much more. If you want to compare both, you have to compare the node.js solution to the following solution on the blocking world:
Since non-blocking IO allows you to put a lot of IO operations on the background, it’s only fair that we compare the same capability on the blocking world.
To have the same behaviour on the blocking world you would have to have, to start with, a thread pool. No, you can’t get away with creating a fresh new thread for each IO operation, since node.js does away with that overhead.
Then you would have to assign each IO operation to a new thread from the thread pool.
Then the main thread would block waiting for a completion signal from one of them.
Each thread would have to keep a global completion counter in order to know when all operations are done with. And yes, you have to synchronize thread access to that piece of memory.
When one thread detects that all operations are done, it has to notify the main thread that is waiting on it, so that the main thread can unblock.
Not simpler in my opinion.
You could also run with co-routines, continuations or fibers (however you wish to call cooperative multi-tasking). It would be a simplification, since you wouldn’t have to synchronize access to the counter, but you would still have to manage multiple contexts. Also not simpler, in my opinion.
Abstractions
Also, the article served to expose the underlying patterns and their consequences. Day-to-day programmers should not have to deal with this complexity. That’s why some abstractions have been devised.
Here is one of them:
Step is Tim Caswell’s flow control library. It allows for chaining callbacks in an easy way.
For example, let’s say we have a function called async, which serves to simulate asynchronous IO, abstracting us away from the database interaction.
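A minimal sketch of such a function, assuming all it needs to do is call back with its argument after a short random delay (the delay range is my assumption):

```javascript
// Hypothetical stand-in for a database call:
// calls back with (null, value) after a short random delay.
function async(value, callback) {
  setTimeout(function() {
    callback(null, value);
  }, Math.floor(Math.random() * 10));
}

async('some row', function(err, value) {
  console.log(value); // 'some row'
});
```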
Next we need to install step using npm, just type
$ npm install step
Then, we need to, say, insert 10 elements into the database in parallel.
You would simply use step with 2 functions.
On the first you would just insert all the elements in parallel. To the callback you pass this.parallel(), which is the way to create a callback in step that handles parallel requests.
Some patterns are hard to grasp, especially when programming asynchronously, as you have to when you’re doing IO on node.js.
For example, let’s suppose you had to program the following routine:
Insert a collection of objects on the database and then, when finished, call a callback.
So, if you had to write this in a synchronous fashion you would do something like this:
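A minimal sketch of the synchronous shape, assuming a hypothetical in-memory db object whose insert returns only once the row is stored:

```javascript
// Hypothetical blocking store, used only to illustrate the synchronous shape.
var db = {
  rows: [],
  insert: function(object) { this.rows.push(object); }
};

function insertCollection(collection) {
  for (var i = 0; i < collection.length; i++) {
    db.insert(collection[i]); // returns only once the row is stored
  }
}

insertCollection([{ id: 1 }, { id: 2 }]);
console.log(db.rows.length); // 2
```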
So, since we are using node.js, db.insert is most probably asynchronous. We have to turn this into an asynchronous function.
I have seen some obviously wrong implementations like this one:
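A sketch of that kind of mistake, assuming a hypothetical db whose insert completes on a later tick of the event loop:

```javascript
// Hypothetical asynchronous store: each insert completes a few milliseconds later.
var db = {
  rows: [],
  insert: function(object, callback) {
    var self = this;
    setTimeout(function() { self.rows.push(object); callback(null); }, 5);
  }
};

// WRONG: callback runs as soon as the inserts are launched, not when they finish.
function insertCollection(collection, callback) {
  for (var i = 0; i < collection.length; i++) {
    db.insert(collection[i], function(err) { /* result ignored */ });
  }
  callback();
}

insertCollection([{ id: 1 }, { id: 2 }], function() {
  console.log(db.rows.length); // 0: nothing has been stored yet
});
```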
The problem with this one is obvious: callback is called right after launching all the db.inserts on the background, not leaving them a chance to finish. By the time callback gets called, none of the inserts has terminated.
Another approach would be this one:
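A sketch of this second attempt, assuming a hypothetical db where each insert takes a different amount of time (the delay property on the objects exists only to make the timing visible):

```javascript
// Hypothetical store whose inserts take a variable amount of time.
var db = {
  rows: [],
  insert: function(object, callback) {
    var self = this;
    setTimeout(function() { self.rows.push(object); callback(null); }, object.delay);
  }
};

// WRONG: assumes the last insert launched is also the last one to finish.
function insertCollection(collection, callback) {
  collection.forEach(function(object, i) {
    db.insert(object, function(err) {
      if (i === collection.length - 1) { callback(err); }
    });
  });
}

insertCollection([{ delay: 50 }, { delay: 1 }], function() {
  console.log(db.rows.length); // 1: the slow first insert is still pending
});
```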
So, there is some temptation to think “we have to call back when the last insert calls back”, but this is plain wrong. The first insert can still be executing when the last one calls back. You never know.
I think that the safest approach is to do something like this:
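A minimal sketch of that approach, assuming the same kind of hypothetical asynchronous db, keeps a counter of pending inserts. Stopping at the first error is my design choice here, not something the text prescribes.

```javascript
// Hypothetical asynchronous store with random completion times.
var db = {
  rows: [],
  insert: function(object, callback) {
    var self = this;
    setTimeout(function() { self.rows.push(object); callback(null); },
               Math.floor(Math.random() * 10));
  }
};

function insertCollection(collection, callback) {
  var pending = collection.length; // inserts that have not called back yet
  var failed = false;

  if (pending === 0) { return callback(null); }

  collection.forEach(function(object) {
    db.insert(object, function(err) {
      if (failed) { return; }                        // already reported an error
      if (err) { failed = true; return callback(err); }
      pending -= 1;
      if (pending === 0) { callback(null); }         // every insert has called back
    });
  });
}

insertCollection([{ id: 1 }, { id: 2 }, { id: 3 }], function(err) {
  console.log(db.rows.length); // 3
});
```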
You should only callback when all of the inserts have called back.
Serialization
Sometimes you want to control the flow and / or the order of the execution.
You may want the inserts to be perfectly ordered in this case, or you may want to stop inserting if an error occurs so you can recover more easily.
If that’s the case, you can do something like this:
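A minimal sketch of serialized inserts, assuming a hypothetical asynchronous db: each insert starts only after the previous one has called back, and the first error stops the chain.

```javascript
// Hypothetical asynchronous store with random completion times.
var db = {
  rows: [],
  insert: function(object, callback) {
    var self = this;
    setTimeout(function() { self.rows.push(object); callback(null); },
               Math.floor(Math.random() * 10));
  }
};

function insertCollection(collection, callback) {
  var items = collection.slice(); // work on a copy, leave the caller's array alone

  (function insertOne() {
    if (items.length === 0) { return callback(null); } // all done
    db.insert(items.shift(), function(err) {
      if (err) { return callback(err); }               // stop at the first error
      insertOne();                                     // recurse for the next record
    });
  })();
}

insertCollection([{ id: 1 }, { id: 2 }, { id: 3 }], function(err) {
  console.log(db.rows.map(function(row) { return row.id; })); // [ 1, 2, 3 ]
});
```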
Here we are using tail recursion to keep inserting the records.
This example has one problem: it uses the stack, so if the collection is too big you might end up blowing the stack.
One solution to this problem is to abandon the stack when recursing. You can do that using a setTimeout with a timeout value of 0. This makes the inner function get called after the stack unwinds:
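A minimal sketch of the setTimeout variant. To show why it matters, the hypothetical db here calls back synchronously, which is the case where plain recursion would keep growing the stack.

```javascript
// Hypothetical store that calls back synchronously (for example, from a cache).
var db = {
  rows: [],
  insert: function(object, callback) { this.rows.push(object); callback(null); }
};

function insertCollection(collection, callback) {
  var items = collection.slice();

  (function insertOne() {
    if (items.length === 0) { return callback(null); }
    db.insert(items.shift(), function(err) {
      if (err) { return callback(err); }
      setTimeout(insertOne, 0); // continue on a fresh stack
    });
  })();
}

insertCollection(['a', 'b', 'c'], function(err) {
  console.log(db.rows); // [ 'a', 'b', 'c' ]
});
```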
Follow-up
See the follow-up article Asynchronous iteration patterns in Node.js - part 2
Update:
Also (as pointed out by Tim Caswell), it’s important that no exceptions go back up into the event loop instead of ending up on the callback. So, you should wrap your db.insert or any other external function call. Our last example should then be:
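A minimal sketch of that wrapping, assuming a hypothetical db whose insert throws synchronously on invalid input and otherwise completes asynchronously. The try/catch routes the exception to the callback instead of letting it reach the event loop.

```javascript
// Hypothetical store: throws on an empty object, otherwise completes asynchronously.
var db = {
  rows: [],
  insert: function(object, callback) {
    if (!object) { throw new Error('cannot insert an empty object'); }
    var self = this;
    setTimeout(function() { self.rows.push(object); callback(null); }, 1);
  }
};

function insertCollection(collection, callback) {
  var items = collection.slice();

  (function insertOne() {
    if (items.length === 0) { return callback(null); }
    try {
      db.insert(items.shift(), function(err) {
        if (err) { return callback(err); }
        setTimeout(insertOne, 0); // continue on a fresh stack
      });
    } catch (exception) {
      callback(exception); // report the thrown error through the callback
    }
  })();
}

insertCollection([{ id: 1 }, null], function(err) {
  console.log(err.message); // 'cannot insert an empty object'
});
```

Note that this relies on db.insert never invoking its callback synchronously; if it did, an exception thrown by the callback itself would also land in the catch block.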