Using MongoDB and Node.js with Express-Resource to create a REST service

A while back I had a problem I was trying to solve.

I was using node.js with express-resource to create a REST (ish) web service and return data from mongodb.

My app looked something like this:

app.js (server)

var express = require('express');
var Resource = require('express-resource');
var app = express.createServer();

// create express-resource handler, which essentially does app.get('/things', ...)
var things = app.resource('things', require('./things.js'));

app.listen(port);

things.js (request handler)

var sendThings = require('./sendThings').sendThings;
var db = require('./sendThings').db;

// handle requests to 'http://example.com/things'
exports.index = function(request, response) {
  sendThings(db, response);
};

sendThings.js (handles mongodb queries)

var mongodb = require('mongodb');

// create database connection
var server = new mongodb.Server(host, port, {auto_reconnect: true});
var db = new mongodb.Db(dbName, server);

db.open(function (err, db) {
  if (err) { console.error(err); }
  // auto_reconnect will reopen connection when needed
});

function sendThings(db, response) {
  db.collection('things', function(err, collection) {
    collection.find(function(err, cursor) {
      cursor.toArray(function(err, things) {
        response.send(things);
      });
    });
  });
}

module.exports.sendThings = sendThings;
module.exports.db = db;

My main issue was that, node.js being asynchronous, I had to pass my response object to my mongodb function in order to send it. That meant my db handling function had to know how to send a response. Not something I wanted: it’s tightly coupled and very hard to test.

I posted a question on StackOverflow.com describing the problem:
how to use events to keep mongodb logic out of node.js request handlers

I only got one response, and it was somewhat less than useful. I reached out to co-workers, who said things like “try these other frameworks”.

What I was really looking for was a way to register requests with an event handler so that the DB could then send a notification when it was done. Of course this would mean passing the db result to the event handler. A co-worker suggested looking at EventEmitter2. I was a bit worried that I would be reinventing the node.js event processing.

I settled on a strategy suggested by a co-worker of wrapping the response in a closure, and then adding functionality that knows how to send the response, handle errors, no result found, etc. So instead of passing the response, I pass a callback that contains the response. Not ideal, but good enough.

Here is what I came up with:

I used mongojs, which greatly simplifies the mongodb interface (at the cost of some configuration flexibility) and hides the nested callbacks the mongodb driver requires. It also makes the syntax much more like the mongo shell client.
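For comparison, the whole nested collection/find/toArray dance above collapses to something like this with mongojs (the connection string here is just a placeholder):

var mongojs = require('mongojs');
var db = mongojs.connect('localhost/mydb', ['things']);

// equivalent of the collection/find/toArray nesting in sendThings.js
db.things.find(function(err, things) {
  // use things
});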

I then wrap the HTTP Response object in a closure and pass this closure to the mongodb query method in a callback.

var MongoProvider = require('./MongoProvider');
var mongoProvider = new MongoProvider(config);  // config holds host, db name, and collections

exports.index = function(request, response) {
  function sendResponse(err, data) {
    if (err) {
      response.send(500, err);
      return;
    }
    response.send(data);
  }
  mongoProvider.fetchAll('things', sendResponse);
};

It is still essentially just passing the response object to the database provider, but by wrapping it in a closure that knows how to handle the response, it keeps that logic out of my database module.

A slight improvement is to use a function to create a response handler closure outside my request handler:

function makeSendResponse(response){
  return function sendResponse(err, data) {
    if (err) {
      console.warn(err);
      response.send(500, {error: err});
      return;
    }

    response.send(data);
  };
}

So now my request handler just looks like this:

exports.index = function(request, response) {
  mongoProvider.fetchAll('things', makeSendResponse(response));
};

And my MongoProvider looks like this:

var mongojs = require('mongojs');

var MongoProvider = function(config) {
  this.configure(config);
  this.db = mongojs.connect(this.url, this.collections);
}

MongoProvider.prototype.configure = function(config) {
  this.url = config.host + "/" + config.name;
  this.collections = config.collections;
}

MongoProvider.prototype.connect = function(url, collections) {
  return mongojs.connect(url || this.url, collections || this.collections);
}

MongoProvider.prototype.fetchAll = function fetchAll(collection, callback) {
  this.db.collection(collection).find(callback);
}

MongoProvider.prototype.fetchById = function fetchById(id, collection, callback) {
  var objectId = mongojs.ObjectId(id.toString());
  this.db.collection(collection).findOne({ "_id": objectId }, callback);
}

MongoProvider.prototype.fetchMatches = function fetchMatches(json, collection, callback) {
  this.db.collection(collection).find(JSON.parse(json), callback);
}

module.exports = MongoProvider;

I can also extend MongoProvider for specific collections to simplify the API and do additional validation:

var MongoProvider = require('./MongoProvider');

var ThingsProvider = function(config) {
  this.collection = 'things';
  this.mongoProvider = new MongoProvider(config);
}

ThingsProvider.prototype.fetchAll = function(callback) {
  this.mongoProvider.fetchAll(this.collection, callback);
}

//etc...

module.exports = ThingsProvider;

(I originally used util.extend, but settled on composition as more reliable, since extend breaks instanceof.)

I might continue looking for an event handling solution, or build one myself. It seems like a common enough use case that I’m surprised there isn’t a notification-based web server framework (a rough sketch follows the list below):

  • onRequest – register an EventHandler, create a Response. If no other events are registered send the “done” event which sends the Response and flushes the EventQueue.
  • onTimeout – send the Response, either all that has been built at this point, or an error — configurable. Possibly also send “202 Accepted”.
  • onEventQueueEmpty – send “done” Event and send the Response. Send Document Empty if response has had nothing added (204 No Content) and has no defaults.
  • registerEvent – add an Event to the Queue.
  • unregisterEvent – fails if unable to stop processing
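Here is a very rough sketch of the idea, built on node’s EventEmitter. All of the names (RequestTracker, registerEvent, notify) are hypothetical; this is just to show the shape of it, not a working framework.

var EventEmitter = require('events').EventEmitter;

// Wrap a response; send it when every registered event has fired,
// or send 202 Accepted if the timeout hits first.
function RequestTracker(response, timeoutMs) {
  var emitter = new EventEmitter();
  var pending = 0;
  var body = {};
  var finished = false;

  function finish(statusOrData) {
    if (finished) { return; }
    finished = true;
    clearTimeout(timer);
    response.send(statusOrData);
  }

  var timer = setTimeout(function() { finish(202); }, timeoutMs);

  return {
    registerEvent: function(name) {
      pending++;
      emitter.once(name, function(result) {
        body[name] = result;
        if (--pending === 0) { finish(body); }   // event queue empty: send what we have
      });
    },
    notify: function(name, result) { emitter.emit(name, result); }
  };
}

module.exports = RequestTracker;

The request handler would register the events it cares about and hand the tracker to the db layer, which only has to call tracker.notify('things', docs); it never sees the response.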

Getting started with Ruby and Selenium

I got an email recently asking for info on getting started with Selenium & Ruby.  I replied:

I can offer training with Selenium, but I’d be happy to try to answer specific questions for free.

I’ve written a blog post about using Selenium with Ruby:
https://fijiaaron.wordpress.com/2010/09/29/writing-page-based-tests-with-selenium-in-ruby/

But your best reference is probably the Selenium home page:
http://seleniumhq.org/

You can install Selenium IDE (a Firefox plugin) and experiment with recording tests and then convert them to Ruby.  You’ll need to turn on formats, explained here:
http://blog.reallysimplethoughts.com/2011/06/10/does-selenium-ide-v1-0-11-support-changing-formats/

Here’s an example I created in Selenium IDE:

After checking “Enable Experimental Features” under the “Options” menu, you can see Ruby code that looks like this:

 

A good introduction on using Selenium with Ruby and RSpec:
http://selenium.rubyforge.org/getting-started.html

Refactoring for Testability


(or how I learned to stop worrying and love failing tests)

link to slides: PPTX PDF

I’d like to talk about refactoring. Specifically, refactoring to improve testability.  So we can write better tests and have more confidence when we refactor code (and add new features) that we don’t break existing functionality — except when we really want to.

That’s a nice idea, but it’s not practical. I have all this legacy code — some of it I don’t understand — or even want to.

Here in the real world, we do what we can, test new features as best we can, and try to make sure we don’t break anything with regression testing (often manual) — and try not to look too closely at that cruft.  Maybe we’ll poke it occasionally with a long sharp stick to see if it’s still working, but there’s no time to clean it up even if we wanted to.

I hope I can touch on a few points that will help you think about how to tackle a refactoring task with an eye to improving testability.  And talk a bit about why that’s a good thing.

What is refactoring?

First a few definitions:

“Refactoring is a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior”

That’s from Martin Fowler.  He literally wrote the book on refactoring.

In simpler words, refactoring means “changing your code but not changing what it does.”

That’s why it’s so hard to convince your manager that you really need to go back and clean up that mess you shipped last month.  “If it ain’t broke, don’t fix it.”

If refactoring doesn’t change how code works, the test of a successful refactor is that after your changes, your tests (you do have tests, right?) still pass.  If you have confidence in your tests, you should be confident in your refactoring.

Nice theory, right?

What is testability?

So what do I mean by testability?  Being able to write tests that you have confidence in.

Tests are often brittle, and sometimes inaccurate or misleading. A code coverage tool can report 100% and still not cover the use cases very thoroughly. Above all, keeping our tests up to date with code changes is a never-ending and thankless task.

Testers are often limited in what they can test, and we often have to rely on UI testing, manual testing, heuristics, and even our gut feeling or instincts.

But sometimes a slight change in code can make our work easier.  That tantalizing hope is what refactoring for testability is about.

How do we do it?

I’m going to highlight a few strategies for refactoring and go over how they can improve the testability of code.

Why Refactor

Why should we refactor?  Especially if “It Just Works” (TM).  If we know it works, it must have been tested, and if it’s well tested, it must be testable.

My code is good

[Image - Michelangelo’s Sistine Chapel]

Right?

We’re all great coders, we use the appropriate design patterns, and we don’t make mistakes…

Our code is flawless. But those guys before us left a mess, and we have to clean it up.

The only constant is change

[Image - Garth from Wayne’s World: We fear change]

Even if you’re the Michelangelo of code, time will affect your code.  If not yours, then all other code around it.  Even the Sistine Chapel needed a touch up after a few hundred years.

The only constant is change.

And systems will change, requirements will change.  Operating Systems and device drivers will change.  Database schemas and use cases will change.

Maybe not all for the better, but it’s something we have to learn to live with.

And some of us may actually improve our skills over time.  Code I wrote 2 or 5 years ago is not as clean as code I wrote yesterday (but you should see the horrible mess I wrote last week).

If you practice agile, you start coding with a simple solution (the simplest thing that could possibly work) and elaborate from there. It’s like starting with an impressionist painting and adding the details, until the picture comes into clearer focus as you tease the requirements and implementation details out from the business customers.

All of these are causes of change.

User Interface Code

How many of you have seen something like this?

PHP Code Sample

 [Image - PHP MySQL Example]

I chose this example — right out of the PHP manual — because it’s an easy target.  But the problems illustrated aren’t just because it’s PHP.

You can write fairly clean PHP.  I’ve written fairly decent PHP.  But why would you want to, right?

And now that my credibility is completely shot…

First of all, there’s the classic mistake that everyone made in the bad old days of the internet (admit it, you did it too – or were still in diapers).

Separate business logic from display logic

That’s mixing business and display logic.  But this is only one particularly egregious example.  Hopefully no one does this anymore.

Encapsulate implementation details

But there are other layers that can get mixed just as easily, and shouldn’t be: database logic with business logic, or relying on a single vendor’s implementation.

And then there are all these other problems…

  • Don’t hard code configuration
  • Handle errors gracefully
  • Don’t expose other layers
  • Don’t tie your code to vendor
  • Use meaningful variable names

I’m sure you could spot more in this example.

Refactoring the UI

Since we’re on the subject, how many of you have written UI tests? Aren’t they fun to automate?  …and maintain?

There are a few simple things we can do to improve our UI tests.  The easiest is probably to not do them at all.  But alas, sometimes there is no other apparent way.

[Image - Santa Claus]

Yes, Virginia, there really is UI logic

And sometimes, there really is logic in the UI that needs to be tested.

Sometimes we start to wonder if we really need these tests, like a little girl who doesn’t know if she should still believe in Santa Claus.

ID tags

One thing that would be nice is if our HTML elements all had IDs.  Or at least the important ones.  Add them.  Or get the owner of the code to add them.  Just do it.  Send XPath back to where it came from (the W3C – Have you ever tried to read one of their specs?).

Individually Testable Components

Something else that would be nice is if a GUI component could be rendered (and thus tested) in isolation.  You can’t test a full page layout this way, but creating individually testable UI components is a step in the right direction.  It helps you concentrate on the issue at hand — and it may help make the backend code cleaner.

There’s often more “PHP” than we’d like to admit in our code.

Maintainable UI tests

There are ways to make UI tests themselves maintainable.  Things like Page Objects.  I can talk more about that another time.  There is a better way to create UI tests than record / playback.

 

Pushing testing down the stack

[Image – Pushing down suitcase]

It really is better to push as much testing as possible further down the stack.  Eliminate the UI component from your tests wherever possible.

Reduce brittle UI tests

You can start by making your tests not dependent on the UI automation framework.  If a method has the word “test” in it, make sure it doesn’t also have the word “Selenium”.  Describe your tests in terms of functionality, not clicks and wait fors.
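For example, keep the Selenium calls behind a page object and let the test read like the feature. All the names here are hypothetical, and browser stands in for whatever Selenium wrapper you use:

// the page object is the only place that knows about the Selenium API
function LoginPage(browser) {
  this.browser = browser;                       // your Selenium/WebDriver wrapper
}
LoginPage.prototype.loginAs = function(username, password) {
  this.browser.type('username', username);      // hypothetical wrapper methods
  this.browser.type('password', password);
  this.browser.click('login');
};

// the test talks about functionality, not clicks and waits
function testUserCanLogIn(browser) {
  new LoginPage(browser).loginAs('aaron', 'secret');
  // assert on what the user should see next, not on Selenium internals
}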

As you refactor your tests, you can then reduce the brittle UI component in stages.

Run tests faster

There are more advantages to this than are immediately apparent. The first is that your tests will run faster. Not having to instantiate a browser for every test is a huge win. Eliminating network latency for an HTTP response, and not having to render (or parse) HTML are other advantages.

Isolate the system under test

If you isolate the system under test, it allows you to focus your testing as well.  It removes unnecessary variables, and likely makes the code cleaner.

Think about code as individual components

When you think about individual components with specific behaviors, and code accordingly, it reduces dependencies, and increases testability.

SOLID

I’m sure you’ve all heard of SOLID principles.  Right?

[Image - cracked egg]

Oops, wrong picture.  That’s not SOLID.

Just to review, SOLID is an acronym for:

Single Responsibility Principle

An object should do only one thing, and do it well.

This can lead to huge wins for testing, and I’ll talk about it more when I talk about Dependency Injection.

Open/Closed Principle

Objects should be open to extension, but closed for modification.

While true, when you’re refactoring, sometimes you have to break this egg.  You might have to destroy the object’s interface in order to save it.

Liskov Substitution Principle

You should be able to replace an object with one of its subtypes without breaking things.

I don’t know who Liskov is.  I suspect he was invented just to fit the mnemonic.

Interface Segregation Principle

Many specific interfaces are better than one general purpose interface.

There is a balance here between simplicity and complexity — but the principle is generally true. Lines of code per method is often a good measure of whether you’re doing this right.

Dependency Inversion Principle

Not dependency injection — which is something almost completely but not entirely unlike…actually they’re somewhat related.

The idea is that you should not depend on a concrete implementation.

Refactoring Strategies

[Image - chessboard]

It helps to keep SOLID principles in mind when refactoring; they help you spot areas that are ripe for it. Let’s talk a little bit about some specific examples of refactoring.

Martin Fowler, as I mentioned, wrote a book on refactoring and maintains the website refactoring.com.  It has a catalog of refactoring examples, and I’ve tried to distill some common refactoring strategies here.

Extract, Consolidate

If a method is too long, or does more than one thing, you can turn it into multiple methods.  Use descriptive names to make your code more readable.

Consolidate conditional statements and describe algorithms in business terms.
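A tiny sketch of what that can look like, with made-up order logic:

// before: one method with an anonymous conditional buried in it
// function shippingCost(order) {
//   if (order.total > 100 && order.customer.years > 2 && !order.expedited) { return 0; }
//   return 5 + order.weight * 0.5;
// }

// after: extracted methods that read in business terms
function isLoyalCustomer(customer) { return customer.years > 2; }

function qualifiesForFreeShipping(order) {
  return order.total > 100 && isLoyalCustomer(order.customer) && !order.expedited;
}

function standardRate(order) { return 5 + order.weight * 0.5; }

function shippingCost(order) {
  return qualifiesForFreeShipping(order) ? 0 : standardRate(order);
}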

Change, Move

Rename a variable or method, or rearrange the order of statements.

Encapsulate, Substitute

Provide limited accessors to a collection (i.e. add, remove).

Substitute a different, maybe more efficient algorithm.
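For example, instead of handing out the underlying array, expose only the operations you want to allow (a minimal, made-up sketch):

function Playlist() {
  var songs = [];                                   // hidden; callers never touch the array
  this.add = function(song) { songs.push(song); };
  this.remove = function(song) {
    var i = songs.indexOf(song);
    if (i >= 0) { songs.splice(i, 1); }
  };
  this.count = function() { return songs.length; };
}

Because callers only see add, remove, and count, you are also free to substitute a different data structure or algorithm behind them without breaking anyone.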

Hide, Expose

Make fields and methods private that are not used externally.

Expose fields that may be useful for extension, testing, etc.

Pull Up, Push Down

Move methods to a parent or child class as appropriate.

And many more…  See refactoring.com for more examples.

Refactoring Example

Summer Discount

Here’s an example demonstrating several refactorings.

[Image - getDiscountRate code]
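The slide itself is an image, but the shape of the refactoring is roughly this (the dates, rates, and names are made up for illustration):

// before: magic numbers and a date check buried in one method
// function getDiscountRate(order) {
//   var month = new Date().getMonth();
//   if (month >= 5 && month <= 7) { return 0.1; }
//   return 0.05;
// }

// after: named constants, an extracted condition, and the date passed in
var SUMMER_DISCOUNT = 0.10;
var REGULAR_DISCOUNT = 0.05;

function isSummer(date) {
  var month = date.getMonth();
  return month >= 5 && month <= 7;   // June through August
}

function getDiscountRate(order, date) {
  return isSummer(date) ? SUMMER_DISCOUNT : REGULAR_DISCOUNT;
}

Passing the date in, instead of reaching for the system clock, is also what makes the summer case easy to test.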

Adding Layers of Abstraction

[Image - Onion]

Indirection

One of the pioneers of Computer Science, David Wheeler (the inventor of the subroutine) famously said:

“Any problem in computer science can be solved with another layer of indirection”

There are several abstraction layers we use every day, without even thinking about them.  File I/O is a great example.  Indeed “Files” and “Windows” are excellent metaphors that help programmers as well as users to think in abstractions.

Abstraction is the generalization of a model or algorithm, apart from the specific implementation.

Reusability

The goal should be reusability or simplification, not abstraction for its own sake.

Composition over Inheritance

Composition is when one object contains instances of other objects — as opposed to inheritance when it extends a base object and implements additional functionality.  The reason composition is preferable to inheritance is that it allows the composing object to be (somewhat) in the dark about its children.  It leads to cleaner code — that is easier to test — and helps simplify interfaces.

Here is an example of multiple abstraction layers.
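The original example isn’t reproduced here, but a minimal sketch of composition (with made-up names) looks like this:

// Report doesn't extend anything; it is composed of the pieces it needs
function Report(formatter, store) {
  this.formatter = formatter;   // knows how to format, nothing else
  this.store = store;           // knows how to persist, nothing else
}
Report.prototype.publish = function(data) {
  this.store.save(this.formatter.format(data));
};

// each layer can be swapped or tested in isolation
var report = new Report(
  { format: function(data) { return JSON.stringify(data); } },
  { save: function(text) { console.log(text); } }
);
report.publish({ revenue: 42 });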

Simplifying Interfaces

[Image - Light Switch]

There is a constant battle between simplicity and complexity.  Coding is primarily the exercise of reducing complexity.

Abstraction is one way to combat complexity.  But abstraction can also create its own complexity.  We need to be careful about this, and only add layers of abstraction when it adds value.

A simple interface leads to cleaner code, but too simple and it can be difficult to test.

Exposing functionality

Use your judgment about when to expose functionality.

Decoupling

[Image - Jet Engine Schematic]

One of the main goals of refactoring…and especially refactoring for testability is to decouple dependencies.

Too often one component depends on another component — when it shouldn’t have to.

Presentation and Business Logic

Decoupling presentation and business logic shouldn’t need any more justification.  MVC frameworks exist to do this.

Persistence and Domain Objects

Decoupling persistence and domain objects is another low hanging fruit.  ORM frameworks try to do this.

Interface from Implementation

It’s not just about creating separate interface and implementation files; even more, it’s about not relying on a specific implementation in your code. For example, using a database abstraction layer (like ODBC) instead of a specific database driver. Or using a standard API (like OpenGL) instead of a vendor library.

This allows you to swap out algorithms and other implementation details.  It also allows you to create mocks–for the file system or network interface, for example.

Dependency Injection

[Image - Gary Gnu puppet]

How many of you have heard of dependency injection?

Can you explain what it does?

The main point behind dependency injection is to remove internal dependencies.  To enable decoupling.  Which makes for more modular code.  And makes testing easier as a side benefit.

No news is good news

One simple check for dependencies: do you instantiate composite objects in your classes? And especially, do you have to configure those objects?

As my obscure puppet friend always said: “No Gnus is good Gnus…whatsoever!”

No framework is necessary

When people hear dependency injection, they often think of some complex Aspect Oriented Framework and run for cover.  But no fancy framework is necessary to accomplish dependency injection in many cases.

Factory pattern

You can start by using factories — whenever you have more than one class that implements an interface, a factory can be used to “inject” the right implementation for the occasion.
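A factory can be as plain as this (a hypothetical notifier example):

// two implementations of the same "send" interface
function EmailNotifier() { this.send = function(message) { /* smtp stuff */ }; }
function SmsNotifier()   { this.send = function(message) { /* gateway stuff */ }; }

// the factory "injects" the right implementation; callers never call `new` themselves
function createNotifier(config) {
  return config.channel === 'sms' ? new SmsNotifier() : new EmailNotifier();
}

var notifier = createNotifier({ channel: 'sms' });
notifier.send('the build failed');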

Constructors, Setters, Builders

Dependency injection (even with fancy frameworks) is usually accomplished by passing instantiated (and configured) objects into your constructor, or using an explicit setter method to pass one to your object before using it.

Sometimes this is done with bytecode manipulation, but when you have the source… you can do it the easy way.

There is a risk of polluting your API with a bunch of object parameters you’d rather not think about.  Another thing you can do is create “builders” that know how to assemble your object and sew in all its dependencies, thus hiding the ugly details behind — that’s right — another layer of abstraction.

Your builder may have reasonable defaults that cover the common cases so only when you need to build an exotic configuration do you ever have to worry about the details.
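A builder in plain JavaScript might look something like this, reusing the made-up Report from the composition sketch above; it keeps the wiring and the defaults in one place:

function ReportBuilder() {
  // reasonable defaults for the common case
  this.formatter = { format: function(data) { return JSON.stringify(data); } };
  this.store = { save: function(text) { console.log(text); } };
}
ReportBuilder.prototype.withFormatter = function(formatter) {
  this.formatter = formatter;
  return this;
};
ReportBuilder.prototype.withStore = function(store) {
  this.store = store;
  return this;
};
ReportBuilder.prototype.build = function() {
  return new Report(this.formatter, this.store);
};

// common case: the defaults are fine
var report = new ReportBuilder().build();

// exotic case: only now do you think about the details
var auditLogStore = { save: function(text) { /* write to the audit log */ } };
var auditReport = new ReportBuilder().withStore(auditLogStore).build();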

Loose Coupling

Again, the goal of dependency injection is loose coupling.

Dependency Injection Example

MyFileHandler

[Image - MyFileHandler code]

MyFileHandler has methods to read, write, and delete a file.  With a pretty simple implementation.  Of course this isn’t a realistic example, but I hope it illustrates the point.

FileUtil

[Image - FileUtil code]

FileUtil can open a file and alphabetize it.  Pretty neat, huh?  I’m looking for venture capital for this revolutionary new app.
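The slides are images, but in spirit the starting point is something like this (a JavaScript stand-in for the original code):

var fs = require('fs');

// MyFileHandler: read, write, and delete a file
function MyFileHandler() {}
MyFileHandler.prototype.read = function(path) { return fs.readFileSync(path, 'utf8'); };
MyFileHandler.prototype.write = function(path, text) { fs.writeFileSync(path, text); };
MyFileHandler.prototype.delete = function(path) { fs.unlinkSync(path); };

// FileUtil: opens a file and alphabetizes it, and constructs its own handler,
// which is exactly the dependency we want to get rid of
function FileUtil() {
  this.handler = new MyFileHandler();
}
FileUtil.prototype.alphabetize = function(path) {
  var lines = this.handler.read(path).split('\n').sort();
  this.handler.write(path, lines.join('\n'));
};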

What can we do to improve this?

Extract Interface

[Image - Extract Interface]

First, we can create an interface for MyFileHandler to implement. Not very spectacular, but it allows us to now…

Inject Dependency

[Image - FileUtil refactored code]

Inject a dependency: any class that implements the read(), write(), and delete() methods will do. This means we can swap out one implementation for another (with, say, a factory) and use Unix style sockets or pipes in the same way as files.
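Carrying on with the JavaScript stand-in, the refactored FileUtil just takes its handler as a constructor argument; anything with read() and write() will do:

// refactored: the handler is injected rather than constructed inside
function FileUtil(fileHandler) {
  this.handler = fileHandler;
}
FileUtil.prototype.alphabetize = function(path) {
  var lines = this.handler.read(path).split('\n').sort();
  this.handler.write(path, lines.join('\n'));
};

// production:        new FileUtil(new MyFileHandler())
// sockets or pipes:  new FileUtil(new SocketHandler())   // any class with the same methods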

[Image - Test 1]

Use mocks

Or help us test by allowing us to create mocks, and inject them into the code.  Now, we’re not dependent on the file system.  Or it could be the database, or a web service, or some third party framework — or something that isn’t completely developed yet.
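With the injected handler from the sketch above, a test needs nothing but a throwaway object:

// in a test: a mock handler, no file system required
var written = null;
var mockHandler = {
  read: function() { return 'banana\napple\ncherry'; },
  write: function(path, text) { written = text; }
};

new FileUtil(mockHandler).alphabetize('ignored.txt');
// expect written === 'apple\nbanana\ncherry'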

But let’s see if we can do even better.

[Image - Test 2]

That’s more like it.  We don’t want to test the file system.  And we don’t want to create mocks either.  We only want to test our alphabetizer.

That it is used for alphabetizing files is a detail we don’t care about.  An object should do one thing and do it well.

I suppose you want to see how it does it…

[Image - TextUtil code]

In the fine tradition of computer science, as popularized by Mr. Donald Knuth, I’ll leave it as an exercise for the reader.

But you get the idea.

Reduce dependencies

We want to reduce dependencies.

[Image - No Mocks]

It’s not realistic that we’ll never have to use mocks.  But you should keep an eye out for where you can eliminate them.  Because when you eliminate a dependency, an angel gets his wings…

Or something like that.

Coding in Business Terms

[Image - UserFriendly Cthulhu in a suit]

This is one last area I want to touch on in refactoring: trying to write your code in business terms.

Common domain language

Not only does it give you a common domain language to talk about the problem, but it makes your tests more readable, and that helps them more clearly reflect the business needs.

Try writing tests first that express your API as close as possible to the business language.  And then write your interface to match the tests.
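For example (a hypothetical domain, mocha-style syntax, and made-up helper and API names), write the test in the words the business uses and let that drive the interface:

var assert = require('assert');

// the interface the test drives out (sketched inline so this runs)
var membership = {
  canRenewWithoutPenalty: function(member) { return member.daysLapsed <= 30; }
};
function aMemberWhoLapsedDaysAgo(days) { return { daysLapsed: days }; }

describe('membership renewal', function() {
  it('gives lapsed members a 30 day grace period', function() {
    var member = aMemberWhoLapsedDaysAgo(15);
    assert(membership.canRenewWithoutPenalty(member));
  });
});

// the interface falls out of the test: membership.canRenewWithoutPenalty(member),
// not something like checkStatusFlag(member, 7)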

Acceptance Criteria

This helps you to write tests that more closely resemble the acceptance criteria.  Acceptance tests are different from functional tests in that they’re not testing the implementation.  They are testing the business requirements.

Often there is no easy way to do this.  That’s why we frequently end up with UI tests that are brittle, unmaintainable, etc.

If we can push testing down the stack so that we’re not relying on the UI to test business features, we can make our tests better — and our code more testable.

BDD frameworks

One thing worth exploring is Behavior Driven Development frameworks like RSpec, SpecFlow, JBehave, and Cucumber. They attempt to use a natural language syntax to express tests, with the aim of making them more readable.

Given {some state}
When {a condition}
It should {do something}

I’d encourage you to look at them if you haven’t yet, but you don’t really need a framework to help you write better tests.

Often, what you really need is just a little refactoring.

[Image - Dr. Strangelove]

Quality Bots at Ancestry.com

I’ve been working at Ancestry.com for the past month. Today we gave a presentation on the QualityBots framework we’ve been developing, inspired by a similar tool used by Google.

It crawls the web site and scrapes web pages using WebDriver. Then it compares the HTML source and screenshots, giving a percentage of change that can be measured against a baseline and threshold. A really smart coworker developed the diff tool and algorithms.

It’s great for comparing different versions of the site (say, development against production) or doing a pixel comparison in different browsers.

I created the dashboard UI with dustjs client-side JavaScript templates and jQuery for special effects. Another co-worker has taken over that aspect.

It reads local files using the HTML5 File API and uses mongodb with the C# driver for persistence. So it can be used as a standalone tool, or save scrapes to the database for future comparison.

The MongoDB document store is a great fit because we can serialize our hierarchical data and persist it without creating a DB schema, and then send JSON from Mongo to the dashboard, where it is rendered, with only a thin web service wrapper layer.

I built a simple node.js REST API for storing and retrieving comparisons from the database with express-resource.
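The real code isn’t public, but the shape of that service is basically the same express-resource pattern from the first post. The collection name, config values, and the makeSendResponse module path here are placeholders:

var express = require('express');
var Resource = require('express-resource');            // adds app.resource()
var MongoProvider = require('./MongoProvider');
var makeSendResponse = require('./makeSendResponse');   // the closure helper from the first post

var app = express.createServer();
var provider = new MongoProvider({
  host: 'localhost',
  name: 'qualitybots',
  collections: ['comparisons']
});

app.resource('comparisons', {
  // GET /comparisons: list stored page comparisons
  index: function(request, response) {
    provider.fetchAll('comparisons', makeSendResponse(response));
  },
  // GET /comparisons/:comparison: fetch one comparison by id
  show: function(request, response) {
    provider.fetchById(request.params.comparison, 'comparisons', makeSendResponse(response));
  }
});

app.listen(3000);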

After our presentation to several testers, a couple of managers, and some development architects, it seems like there’s a lot of excitement for using the tool. I can’t make any promises, but we may even be able to open source it.