
How to Sanitize Inputs for Web App Security in Node.js

By AstroMacGuffin

If your website is hackable, it will be hacked eventually. One of the friendly members of the JavaScript Mastery discord server did me a favor by performing some security auditing on this website. I admit, I was in a rush to launch, and I wasn't in any hurry to spend time on security steps. When I tried the mongo-sanitize NPM package it did nothing, so there went my lazy option. But I already had code for stripping symbols from a string, thanks to the search index / relevance-weighted search project. It just needed a little adjustment.

Once you have something that can sanitize inputs, you need to use it. And, because every input is different, there's no getting around this part - you have to analyze your input-handling code line-by-line for ways you can be hacked. That means inputs that:

  • get used for database inputs and queries
  • get used as filenames
  • get used for logical control structures
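
To make the first bullet concrete, here's a minimal sketch of how a request body can smuggle a query operator into a MongoDB-style filter. No database is involved, and `buildLoginQuery`, `honest`, `hostile`, and `isPlainString` are made-up names for illustration. It assumes the server parses JSON request bodies, so a field can arrive as an object instead of a string.

```javascript
// Hypothetical helper: builds a login filter straight from user input.
function buildLoginQuery(body) {
  return { $and: [{ username: body.username }, { password: body.password }] };
}

// What the login form is supposed to send.
const honest = { username: 'alice', password: 'hunter2' };

// What an attacker can send if the server parses JSON bodies:
// the password field is an object holding a query operator, not a string.
const hostile = JSON.parse('{"username":"alice","password":{"$ne":null}}');

console.log(JSON.stringify(buildLoginQuery(honest)));
// {"$and":[{"username":"alice"},{"password":"hunter2"}]}
console.log(JSON.stringify(buildLoginQuery(hostile)));
// {"$and":[{"username":"alice"},{"password":{"$ne":null}}]}

// A type check is enough to spot this particular trick.
const isPlainString = (value) => typeof value === 'string';
console.log(isPlainString(honest.password));  // true
console.log(isPlainString(hostile.password)); // false
```

In MongoDB, `$ne: null` matches any stored value that isn't null, so the second filter no longer checks the password at all. That's why a sanitizer should refuse anything that isn't a string before it even looks at the characters.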

Here's a brief primer from someone who can explain it like a newbie, because when it comes to security, I only know so much. In other words this is a starting point, not the end-all-be-all, when it comes to web app security.


Yes, there are other steps to security: configuring your server software to remove potential attack vectors, for example. But today we're just dealing with how to sanitize inputs. First, you need code for the sanitizer function. Then you need to use it.

The Sanitizer Function/Method

I have a MiscUtils object in my usual kit, and I added sanitizeString() to that object. Notice first that fully a third of the method's overall definition is just laying out the default options.

Newbie Hint: Remember, object classes don't use the keyword function to declare a method, so don't be confused by not seeing function in the code below.

  sanitizeString(
    /******* From here to the comment-line below, we're just defining parameters and default options *******/
    s, replacer='-',
    arg = {
      stripNewlines: true,
      stripAllWhitespace: true,
      stripSlashes: true,
      stripDots: false,
      stripQuestions: false,
      stripDollars: true,
      stripExclamations: false,
      stripPipes: true,
      stripQuotes: true,
      stripBraces: true,
      stripBrackets: true,
      stripUnderscores: false,
      stripDashes: false,
      stripCommas: false,
      stripSpecialChars: true,
      stripParens: true,
      stripLessGreaters: true,
    }
    /******* From here to the comment-line above, we're just defining parameters and default options *******/
  ) {
    if (!s || s === '' || typeof s !== 'string') return '';
    try {
      if (arg.stripSpecialChars) s = s
        .replace(/@/gi, replacer)
        .replace(/#/gi, replacer)
        .replace(/\%/gi, replacer)
        .replace(/\^/gi, replacer)
        .replace(/&/gi, replacer)
        .replace(/\*/gi, replacer)
        .replace(/=/gi, replacer)
        .replace(/\+/gi, replacer)
        .replace(/:/gi, replacer)
        .replace(/;/gi, replacer);
      if (arg.stripParens) s = s
        .replace(/\(/gi, replacer)
        .replace(/\)/gi, replacer);
      if (arg.stripLessGreaters) s = s
        .replace(/</gi, replacer)
        .replace(/>/gi, replacer);
      if (arg.stripUnderscores) s = s.replace(/_/gi, replacer);
      if (arg.stripCommas) s = s.replace(/,/gi, replacer);
      if (arg.stripDashes) s = s.replace(/-/gi, replacer);
      if (arg.stripBrackets) s = s
        .replace(/\[/gi, replacer)
        .replace(/\]/gi, replacer);
      if (arg.stripBraces) s = s
        .replace(/\{/gi, replacer)
        .replace(/\}/gi, replacer);
      if (arg.stripQuotes) s = s
        .replace(/"/gi, replacer)
        .replace(/'/gi, replacer)
        .replace(/`/gi, replacer);
      if (arg.stripExclamations) s = s.replace(/!/gi, replacer);
      if (arg.stripPipes) s = s.replace(/\|/gi, replacer);
      if (arg.stripQuestions) s = s.replace(/\?/gi, replacer);
      if (arg.stripDots) s = s.replace(/\./gi, replacer);
      if (arg.stripDollars) s = s.replace(/\$/gi, replacer);
      if (arg.stripSlashes)
        s = s.replace(/\//gi, replacer).replace(/\\/gi, replacer);
      if (arg.stripNewlines) s = s.replace(/\n/gi, replacer);
      if (arg.stripAllWhitespace) s = s.replace(/\s+/gi, replacer);
      s = s.trim();
      return s;
    }
    catch (e) {
      console.log(`Error sanitizing string: ${e}`);
      return '';
    }
  }

Things to note:

  • if an option is true or evaluates to true, it will strip all instances of that character or group of characters.
  • if an option is false or evaluates to false -- including if it's missing from the options arg -- those characters will not be stripped.
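
That second note has a consequence worth seeing in code: because the defaults live in a default parameter value, passing any options object replaces the whole default set instead of merging with it. Here's a small sketch of that behavior, plus one possible way to merge instead. The names `stripsQuotes`, `DEFAULTS`, and `stripsQuotesMerged` are invented for this example and aren't part of MiscUtils.

```javascript
// Same pattern as sanitizeString(): the defaults are a default parameter value.
function stripsQuotes(arg = { stripQuotes: true, stripDots: false }) {
  return Boolean(arg.stripQuotes);
}

console.log(stripsQuotes());                    // true: the defaults apply
console.log(stripsQuotes({ stripDots: true })); // false: stripQuotes is now missing

// One possible alternative (an assumption, not the method shown above):
// merge the caller's options over a separate defaults object.
const DEFAULTS = { stripQuotes: true, stripDots: false };
function stripsQuotesMerged(arg = {}) {
  const opt = { ...DEFAULTS, ...arg };
  return Boolean(opt.stripQuotes);
}

console.log(stripsQuotesMerged({ stripDots: true })); // true: the default is kept
```

Neither approach is wrong. The replace-everything behavior just means every options object you write has to list each character group you want stripped.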

Here's how to use the method. Say you have an input called username defined with something like this:

<input type="text" name="username" />

In Express.js, assuming a POST-method form submission, that username field becomes req.body.username. All nice and official-looking, but we know it came from user input. We need to sanitize this field before it gets used in a database operation or something. If you were creating a temporary variable username from sanitizing the unsanitized field using default options, you'd do this:

// sanitizeString method is on an object called mu
let username = mu.sanitizeString(req.body.username);

Don't forget the options, though: your username format may not make sense with the default sanitizeString options. The options object is the third parameter, so to provide options you also must provide a replacer string. That's just as well, because hyphens are rubbish in plenty of cases. So here's an example with options:

let username = mu.sanitizeString(req.body.username, "" /* empty string */, {
    stripNewlines: true, stripAllWhitespace: true, stripDollars: true, stripSpecialChars: true,
    stripSlashes: true, stripDots: true, stripQuestions: true, stripQuotes: true, stripBraces: true,
    stripBrackets: true, stripCommas: true, stripParens: true, stripLessGreaters: true
  }
);

As you can see, this would be a lot of code to repeat, so if you're going to use a set of options more than once, save it as a variable, an object field, a class static field, or whatever makes sense.

Where in My Code does the Sanitization Go?

There are two schools of thought about when to sanitize, and the most important thing is being consistent. That said, one method is clearly better if your data travels multiple paths in your code for multiple reasons:

  • You can sanitize when you receive the data -- that is, as early in each path of execution flow as you can confirm you should have the input.
  • You can sanitize when you use the data -- that is, as late in each path of execution flow as you can wait, right up at the moment it goes to an API for disk operations, database operations, logical flow, etc.

The first option is quicker with smaller apps, but the second option is better in the long run. Using the second method, you'll end up saving almost every sanitizeString options object for reuse, which is at least double the work for a small project, but it's better for growth, especially in the eyes of security. Meanwhile, the risk of sanitizing early is that you might need the original version of the string value later in your code.
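
Here's a rough sketch of the sanitize-late idea. The two functions are simplified stand-ins for sanitizeString() with two different options objects, and their names, along with the sample input, are made up for this example.

```javascript
// Stand-in for a database-oriented rule set: drops $, braces, brackets, quotes.
function stripForQuery(s) {
  return typeof s === 'string' ? s.replace(/[${}\[\]"'`]/g, '') : '';
}

// Stand-in for a filename rule set: same as above, plus slashes and dots.
function stripForFilename(s) {
  return stripForQuery(s).replace(/[\/\\.]/g, '');
}

const raw = '../etc/$passwd';

// Sanitize late: the raw value stays available, and each use gets its own rules.
const queryValue = stripForQuery(raw);
const fileValue = stripForFilename(raw);

console.log(raw);        // ../etc/$passwd
console.log(queryValue); // ../etc/passwd
console.log(fileValue);  // etcpasswd
```

If the input had been sanitized once on arrival with the filename rules, the query path would have lost its slashes and dots for no reason, and the original string would be gone.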

So let's say your web app is a Node.js / Express.js server. Let's say you're also using the MongoDB database server. Here are the steps:

Include MiscUtils (or whatever) Wherever you Need sanitizeString()

So you've put the sanitizeString method into a class definition. Let's say you even called the file MiscUtils.js and the module's export is a new MiscUtils() instance. Then you'd include it in your Node module and/or app script, like this:

const mu = require('./MiscUtils.js');
// and/or
const sanitizeString = require('./MiscUtils.js').sanitizeString;

For the sake of examples, let's say you did both of the above. Now mu is a place for reusable options objects, and sanitizeString() is a function.
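
Pulling a method off an object like that only works when the method never touches `this`, which is the case for sanitizeString() as written. Here's a small sketch of the difference, using a made-up `Demo` class in place of MiscUtils.

```javascript
class Demo {
  constructor() {
    this.replacer = '-';
  }
  // Uses only its arguments, the way sanitizeString() does.
  dashSpaces(s) {
    return s.replace(/\s+/g, '-');
  }
  // Relies on this, so it breaks once it's detached from the object.
  dashSpacesWithThis(s) {
    return s.replace(/\s+/g, this.replacer);
  }
}

const demo = new Demo();
const dashSpaces = demo.dashSpaces;
const dashSpacesWithThis = demo.dashSpacesWithThis;

console.log(dashSpaces('a b')); // a-b

try {
  dashSpacesWithThis('a b'); // this is undefined here
} catch (e) {
  console.log(e instanceof TypeError); // true
}

// bind() ties the method back to its object if you ever need this.
const bound = demo.dashSpacesWithThis.bind(demo);
console.log(bound('a b')); // a-b
```

So if you later change sanitizeString() to read default options from `this`, switch to calling it as `mu.sanitizeString()` or bind it first.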

Make sanitizeString Options

Let's say you come up with the perfect list of options for your username format. You edit e.g. MiscUtils.js and (since the module exports a live object instead of a class) add something like this in its constructor method:

    this.sanitizeUsernameOpt = {
      // list of sanitizeString options
    };
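
For a fuller picture, here's a sketch of what such a constructor might look like. The class name `MiscUtilsSketch` and the `OPTION_NAMES` list are made up for this example. The username options are the ones from the earlier example, and the strip-everything object is one way to build the `sanitizeStripAllOpt` used in the login example further down.

```javascript
// Option names copied from the sanitizeString() defaults.
const OPTION_NAMES = [
  'stripNewlines', 'stripAllWhitespace', 'stripSlashes', 'stripDots',
  'stripQuestions', 'stripDollars', 'stripExclamations', 'stripPipes',
  'stripQuotes', 'stripBraces', 'stripBrackets', 'stripUnderscores',
  'stripDashes', 'stripCommas', 'stripSpecialChars', 'stripParens',
  'stripLessGreaters',
];

class MiscUtilsSketch {
  constructor() {
    // The username options from the earlier example, saved once for reuse.
    this.sanitizeUsernameOpt = {
      stripNewlines: true, stripAllWhitespace: true, stripDollars: true,
      stripSpecialChars: true, stripSlashes: true, stripDots: true,
      stripQuestions: true, stripQuotes: true, stripBraces: true,
      stripBrackets: true, stripCommas: true, stripParens: true,
      stripLessGreaters: true,
    };
    // Every option switched on.
    this.sanitizeStripAllOpt = Object.fromEntries(
      OPTION_NAMES.map((name) => [name, true])
    );
  }
}

const mu = new MiscUtilsSketch();
console.log(Object.keys(mu.sanitizeStripAllOpt).length);       // 17
console.log(mu.sanitizeStripAllOpt.stripUnderscores);          // true
console.log(Boolean(mu.sanitizeUsernameOpt.stripUnderscores)); // false
```

Building the strip-everything object from a list of names means a new option only has to be added in one place.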

Go on a Sanitizing Spree

Wherever your database calls are happening that involve this user-inputted variable, you sanitize. For example, in processing a login form, your Express.js route would have something like the following:

  const isVerified = await db.verifyLogin(req.body);

That is, you have a db object class where you write all your database code, and it has a .verifyLogin() method. The .verifyLogin() method expects an object as its sole argument: we know req.body contains .username and .password in this part of the program execution flow, because this is the login route, which is being sent those fields, because those fields are on the login form. So you need to find your .verifyLogin() method. Inside that, find the part where you're doing, or preparing for, the MongoDB query. For example:

      const query = {$and: [
        {username: arg.username},
        {password: arg.password},
      ]};

…and turn that into this:

      const emptyString = "";
      const query = {$and: [
        {username: sanitizeString(
          arg.username, emptyString, mu.sanitizeStripAllOpt
        )},
        {password: sanitizeString(
          arg.password, emptyString, mu.sanitizePasswordOpt
        )},
      ]};

But if you're paying attention, you'll notice our options objects -- mu.sanitizePasswordOpt and mu.sanitizeStripAllOpt -- which would have to be added. A .sanitizeStripAllOpt would be every option of sanitizeString set to true, and a .sanitizePasswordOpt would be whatever makes sense for you…

…Which comes partially down to the struggle between…

Portability vs Brevity

Different sites are going to have different versions of MongoDB and, to a lesser extent, different versions of Node.js. I think it makes sense to code like it's the stone age when it comes to sanitizing my inputs against potential vulnerabilities, though. By that I only mean that I write code as if any potential vulnerability could become an actual vulnerability when running this app at some future web host.

So that's the long way of saying, I think it makes sense to strip dollar signs, curly braces, square brackets, and all three kinds of quotes, in as many cases as possible, for database security. Then if something is being shown on a browser and you don't want the user to be able to alter and break your website's presentation, or worse, execute unwanted scripts, simply replace the less-than and greater-than symbols with their HTML-encoded counterparts, &lt; and &gt;.
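
Here's a minimal sketch of that encoding step. The function name `encodeAngleBrackets` is made up, and encoding the ampersand as well is my own addition, so that text which already contains something like an entity is shown literally instead of being interpreted by the browser.

```javascript
// Hypothetical helper: encodes characters so the browser shows them as text.
function encodeAngleBrackets(s) {
  if (typeof s !== 'string') return '';
  return s
    .replace(/&/g, '&amp;') // first, so the entities added below aren't re-encoded
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

console.log(encodeAngleBrackets('<script>alert(1)</script>'));
// &lt;script&gt;alert(1)&lt;/script&gt;
console.log(encodeAngleBrackets('Tom & Jerry <3'));
// Tom &amp; Jerry &lt;3
```

This covers text placed between HTML tags. Values that end up inside an HTML attribute would also need their quotes encoded, which this sketch doesn't do.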

Security is a constant uphill battle, but you have to start somewhere, and also… why make it easy for your website to be hacked?


About the author of this piece, AstroMacGuffin: "I enjoy coding in Node.js and JavaScript as a hobby. I find Node.js to be a welcome escape, after a lifetime of code, with previous experience with PHP and JavaScript on the LAMP stack. I'm one of those people who spends way too much time herding electric sheep in front of a computer. You can find me on the JavaScript Mastery discord, where I'm a moderator, or on the Classic Shadowrun discord, where I'm the owner and founder. "
