Getting started with ElasticSearch on Node

“Searching,
Seek and Destroy
Searching,
Seek and Destroy” – Metallica

I recently had to set up ElasticSearch within a Node project.  I found the supporting documentation to be scattered and had a difficult time finding examples of what I would consider everyday production configurations.  So I’ve gathered a lot of what I learned here and hopefully this saves some of you a bit of time.

Get the elasticsearch-js client

First, use the elasticsearch-js npm package, which appears to be actively developed.  Specifically, do not use the elasticsearchclient package, as that seems to be end-of-lifed.  Create your client like so:

ElasticSearch = require("elasticsearch")
client = new ElasticSearch.Client {host: 'localhost:9200'}
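To verify the client can actually reach your cluster, you can ping it.  A quick sketch in plain JavaScript (ping is part of the elasticsearch-js API; the timeout value is just an example):

// Ping the cluster to confirm the client is wired up correctly.
client.ping({ requestTimeout: 3000 }, function (error) {
  if (error) {
    console.error('ElasticSearch cluster is unreachable!');
  } else {
    console.log('ElasticSearch cluster is up.');
  }
});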

Create the index

Next, you’ll want to create your indices.  This is where it can get overwhelming.  ElasticSearch is great because it can be configured in countless ways, but understanding all the variants can be hard at the beginning.  Here is what I think would be a great place to start.  Using your ES client, create an index ‘users’:

client.indices.create
  index: "users"
  body:
    settings:
      analysis:
        filter:
          mynGram:
            type: "nGram"
            min_gram: 2
            max_gram: 12
        analyzer:
          default_index:
            type: "custom"
            tokenizer: "standard"
            filter: [
              "lowercase"
            ]
          ngram_indexer:
            type: "custom"
            tokenizer: "standard"
            filter: [
              "lowercase"
              "mynGram"
            ]
          default_search:
            type: "custom"
            tokenizer: "lowercase"
            filter: [
              "standard"
            ]
    mappings:
      user:
        dynamic: "true"
        properties:
          id:
            type: "string"
            index: "not_analyzed"
          name:
            type: "string"
            index_analyzer: "ngram_indexer"
          suggest:
            type: "completion"
            index_analyzer: "simple"
            search_analyzer: "simple"
            payloads: true

Here we have:

  • defined the default index and search analyzers (default_index, default_search).  These will be used for all fields unless otherwise specified, such as in…
  • mappings.user.properties.name.  Here we apply the mynGram filter when indexing the user’s name so we can do fun things like search within names: a query for ‘ike’ would return both Mike and Ike.
  • defined a completion suggester field in mappings.user.properties.suggest
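To see the mapping in action, you can index a sample user.  A sketch; the field values here are made up purely for illustration:

// Index a sample user document (hypothetical data).
client.index({
  index: 'users',
  type: 'user',
  id: '1',
  body: {
    id: '1',
    name: 'Michael',
    suggest: {
      input: ['Michael', 'Mike'],  // strings the completion suggester matches on
      output: 'Michael',           // what the suggester returns
      payload: { userId: '1' }     // extra data, enabled by payloads: true
    }
  }
}, function (err, resp) {
  if (err) console.error(err);
});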

Test the setup

That’s the bare-bones setup.  To see what an analyzer is doing, hit the _analyze endpoint, specify the analyzer you want to test, and you’ll get back the list of tokens it generates:

localhost:9200/users/_analyze?pretty&analyzer=ngram_indexer&text=Michael
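You can do the same thing through the client; something like this should work (indices.analyze is part of elasticsearch-js):

// Print the tokens the ngram_indexer analyzer generates for "Michael".
client.indices.analyze({
  index: 'users',
  analyzer: 'ngram_indexer',
  text: 'Michael'
}, function (err, resp) {
  if (!err) console.log(resp.tokens);
});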

Using a single global DB connection in Node.js

“And I’m single, yeah, I’m single,
And I’m single, tonight I’m single,
And I ain’t tripping on nothing, I’m sipping on something,
And my homeboy say he got a bad girl for me tonight” – Lil Wayne

Since JavaScript, and in turn Node.js, is single-threaded, we can get by with just one database connection throughout an entire application.  Pretty cool!  We no longer have to open, close, and maintain connections and connection pools.  Let’s see how this could be done.

Think Globally, Code Locally

In pure JavaScript, any variable you declare that is not defined within another scope (e.g. inside a function) is added to the global scope.  However, because Node wraps each file into its own CommonJS module, a module’s top-level variables stay local to that module rather than becoming global.  But Node provides a handy workaround:

global.db = ...
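To see the reach of this, here is a tiny two-module sketch (the file names and value are hypothetical):

// a.js
global.answer = 42;

// b.js
require('./a');       // make sure a.js has run
console.log(answer);  // prints 42 -- not found locally, so resolved via global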

global.varName is accessible from any module.  Perfect.  So we can simply set a db connection on global.db and throw a reuse celebration party!  But before that party, let’s see how we would code this.

Global DB in Express with Mongoose

In this example we will create a single connection to Mongo using the Mongoose library.  In app.js, we define the global connection, using the environment to determine the db URI.  We create it lazily, so it only gets created when it is first requested.

// app.js
var express = require('express');
var app = express();
var mongoose = require('mongoose');

// Configuration
app.configure('development', function(){
  app.set('dburi', 'localhost/mydevdb');
});

app.configure('production', function(){
  app.set('dburi', 'mongodb://somedb.mongolab.com:1234/myproddb');
});

// Reuse the existing connection if one was already created.
global.db = (global.db ? global.db : mongoose.createConnection(app.settings.dburi));

Now let’s say that in another file we want to create a Mongoose schema for an Alien being.  We simply do it like so:

// otherfile.js
var mongoose = require('mongoose');
var Schema = mongoose.Schema;

var alienSchema = new Schema({
  planet : { type: String }
});
var Alien = db.model('Alien', alienSchema); // db resolves to global.db

What is happening here is that db.model is looking for db within the current module, otherfile.js, not finding it, and then going up the scope chain until it reaches global.db.  Then boom!  We’ve got aliens.
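And to prove the connection works end to end, a quick save (again, just a sketch):

// otherfile.js (continued)
var et = new Alien({ planet: 'Mars' });
et.save(function (err) {
  if (err) return console.error(err);
  console.log('Alien saved over the shared global connection.');
});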

Closing the connection

Since we want the connection open for the entire lifespan of the application, we can simply allow it to terminate upon the application’s termination.  No cleanup needed.  (If you do want an explicit hook, perhaps for some additional cleanup or logging, one option is sketched below.)
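One way to get that explicit hook is to listen for the process’s termination signal and close the connection there.  A sketch, assuming the global.db Mongoose connection from above:

// app.js -- optional explicit cleanup on Ctrl-C / termination
process.on('SIGINT', function () {
  global.db.close(function () {
    console.log('Mongoose connection closed; exiting.');
    process.exit(0);
  });
});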

Using Node cronjobs to replace Heroku worker dynos

“Time keeps on slipping, slippin…into the future.”  – Steve Miller Band

Hopscotch.fm relies on data pulled in from several music-related services.  I have created multiple tasks, each of which can be run from the command line, like:

node run manualrun getShows sf

This will get the shows for San Francisco. On Heroku, I was using the Scheduler to automatically run this periodically. All good so far.
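For reference, the runner behind that command could be a small dispatcher along these lines (a sketch; the file layout and task names are hypothetical):

// run.js -- invoked as `node run manualrun getShows sf`
var tasks = {
  getShows: require('./tasks/getShows')  // hypothetical task module
};

var mode = process.argv[2];  // e.g. 'manualrun'
var name = process.argv[3];  // e.g. 'getShows'
var city = process.argv[4];  // e.g. 'sf'

if (mode === 'manualrun' && tasks[name]) {
  tasks[name](city, function (err) {
    process.exit(err ? 1 : 0);
  });
}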

The Problem

The solution worked great up until I started needing several of these workers: one to get shows, one to get artist metadata, one to get artist songs, one to clean up, and more.  Easy enough, I thought; I just added more tasks to the Heroku Scheduler.  Except there is a limit to the free tier on Heroku…

The Surprise (the Heroku bill!)

My Heroku bill was over $70!  How did this happen?  It turns out I had exceeded the free monthly hours with all the worker dynos I had been spinning up.  I needed a solution, quick.  I host hopscotch.fm on Nodejitsu, so I figured why not just use that.

The Solution (cron!)

Enter node-cron.  If you’ve ever used Linux/UNIX cron jobs, it’s nearly identical: the syntax for schedules is the same, and all you need to do is specify the function to run.  Here is a cron job in the file cronjobs.js that crunches show data for a city:

var cronJob = require('cron').CronJob;
var cruncher = require('./cruncher'); // internal module that does the crunching

// Fields are: second minute hour day-of-month month day-of-week,
// so this runs at 00:05:00 every day, Pacific time.
var jobMetroRadio = new cronJob('00 05 00 * * 1-7', function() {
  console.log('Starting crunching shows.');
  cruncher.crunchShows(function() {
    console.log('Finished crunching shows.');
  });
}, null, true, 'America/Los_Angeles');

Then in app.js I just:

require('./cronjobs');

And lastly, add the two packages to package.json and install them (time gives node-cron its timezone support):

npm install cron time --save

The Result

I’ve moved all of the tasks over from Heroku Scheduler onto the Nodejitsu deployment and everything is running smoothly. Hooray for cron!