How to Sanitize Inputs for Web App Security in Node.js

By AstroMacGuffin
![If your website is hackable, it will be hacked eventually.](/static/img/mm/villains/he-buries-the-competitionSmall.jpg)

One of the friendly members of the JavaScript Mastery discord server did me a favor by performing some security auditing on this website. I admit, I was in a rush to launch, and I wasn't in any hurry to spend time on security steps. When I tried the `mongo-sanitize` NPM package it did nothing, so there went my lazy option. But I already had code for stripping symbols from a string, thanks to the search index / relevance-weighted search project. It just needed a little adjustment.

Once you have something that can sanitize inputs, you need to use it. And, because every input is different, there's no getting around this part - you have to analyze your input-handling code line-by-line for ways you can be hacked. That means inputs that:

- get used for database inputs and queries
- get used as filenames
- get used for logical control structures

Here's a brief primer from someone who can explain it like a newbie, because when it comes to security, I only know so much. In other words, this is a starting point, not the end-all-be-all, when it comes to web app security.

Yes, there are other steps to security: configuring your server software to remove potential attack vectors, for example. But today we're just dealing with how to sanitize inputs. First, you need code for the sanitizer function. Then you need to use it.

### The Sanitizer Function/Method

I have a `MiscUtils` object in my usual kit, and I added `sanitizeString()` to that object. Notice first that fully a third of the method's overall definition is just laying out the default options.

***Newbie Hint: Remember, object classes don't use the keyword `function` to declare a method, so don't be confused by *not* seeing `function` in the code below.***

```js
// This method lives inside the MiscUtils class body.
sanitizeString(
    /******* From here to the comment-line below, we're just defining parameters and default options *******/
    s,
    replacer = '-',
    arg = {
        stripNewlines: true,
        stripAllWhitespace: true,
        stripSlashes: true,
        stripDots: false,
        stripQuestions: false,
        stripDollars: true,
        stripExclamations: false,
        stripPipes: true,
        stripQuotes: true,
        stripBraces: true,
        stripBrackets: true,
        stripUnderscores: false,
        stripDashes: false,
        stripCommas: false,
        stripSpecialChars: true,
        stripParens: true,
        stripLessGreaters: true,
    }
    /******* From here to the comment-line above, we're just defining parameters and default options *******/
) {
    // Anything that isn't a non-empty string (objects, arrays, numbers, null...) becomes ''.
    if (typeof s !== 'string' || s === '') return '';
    try {
        // Each enabled option replaces every instance of its characters with `replacer`.
        if (arg.stripSpecialChars) s = s.replace(/[@#%^&*=+:;]/g, replacer);
        if (arg.stripParens) s = s.replace(/[()]/g, replacer);
        if (arg.stripLessGreaters) s = s.replace(/[<>]/g, replacer);
        if (arg.stripUnderscores) s = s.replace(/_/g, replacer);
        if (arg.stripCommas) s = s.replace(/,/g, replacer);
        if (arg.stripDashes) s = s.replace(/-/g, replacer);
        if (arg.stripBrackets) s = s.replace(/[\[\]]/g, replacer);
        if (arg.stripBraces) s = s.replace(/[{}]/g, replacer);
        if (arg.stripQuotes) s = s.replace(/["'`]/g, replacer);
        if (arg.stripExclamations) s = s.replace(/!/g, replacer);
        if (arg.stripPipes) s = s.replace(/\|/g, replacer);
        if (arg.stripQuestions) s = s.replace(/\?/g, replacer);
        if (arg.stripDots) s = s.replace(/\./g, replacer);
        if (arg.stripDollars) s = s.replace(/\$/g, replacer);
        if (arg.stripSlashes) s = s.replace(/[\/\\]/g, replacer);
        if (arg.stripNewlines) s = s.replace(/\n/g, replacer);
        // Runs of whitespace collapse into a single replacer.
        if (arg.stripAllWhitespace) s = s.replace(/\s+/g, replacer);
        return s.trim();
    } catch (e) {
        console.log(`Error sanitizing string: ${e}`);
        return '';
    }
}
```

Things to note:

- if an option is true or evaluates to true, it will strip all instances of that character or group of characters.
- if an option is false or evaluates to false -- *including if it's missing from the options `arg`* -- those characters will *not* be stripped.

Here's how to use the method. Say you have an input called `username` defined with something like this:

```html
<input type="text" name="username" />
```

In Express.js, assuming a `POST`-method form submission, that `username` field becomes `req.body.username`. All nice and official-looking, but we know it came from user input. We need to sanitize this field before it gets used in a database operation or something. If you were creating a temporary variable `username` from sanitizing the unsanitized field *using default options*, you'd do this:

```js
// sanitizeString method is on an object called mu
let username = mu.sanitizeString(req.body.username);
```

Don't forget the options, though: your username format may not make sense with the default `sanitizeString` options.
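The note above about missing options is worth seeing in isolation: because the defaults are declared as a default parameter value, passing *any* options object replaces the whole default object instead of merging into it. Here's a minimal sketch using a hypothetical two-option stand-in called `strip` (not the real `sanitizeString`), plus one possible way to merge caller options over the defaults with object spread, if that's the behavior you'd rather have. The names `strip`, `stripMerged` and `STRIP_DEFAULTS` are made up for this illustration.

```js
// Hypothetical stand-in with the same parameter shape as sanitizeString,
// cut down to two options so the behavior is easy to follow.
function strip(s, replacer = '-', arg = { stripDollars: true, stripDots: true }) {
    if (typeof s !== 'string') return '';
    if (arg.stripDollars) s = s.replace(/\$/g, replacer);
    if (arg.stripDots) s = s.replace(/\./g, replacer);
    return s;
}

// Assumed alternative: merge the caller's options over a defaults object,
// so that leaving an option out keeps its default instead of disabling it.
const STRIP_DEFAULTS = { stripDollars: true, stripDots: true };
function stripMerged(s, replacer = '-', arg = {}) {
    return strip(s, replacer, { ...STRIP_DEFAULTS, ...arg });
}

const input = '$a.b';

// No options passed: the default object applies, both characters are replaced.
const withDefaults = strip(input);

// Partial options: stripDollars is missing, so the dollar sign survives.
const partial = strip(input, '', { stripDots: true });

// Merged variant: stripDollars keeps its default even though it was left out.
const merged = stripMerged(input, '', { stripDots: true });

// A non-string input (such as an object smuggled in via JSON) comes back as ''.
const rejected = strip({ $ne: null });

console.log({ withDefaults, partial, merged, rejected });
```

Whether replace-all or merge is the better design is a judgment call: replacing forces you to spell out every option you care about, while merging is more forgiving if you forget one.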
To provide options you also must provide a `replacer` string, because it's the second positional parameter. That's just as well, because hyphens are rubbish in plenty of cases. So here's an example with options:

```js
let username = mu.sanitizeString(req.body.username, "" /* empty string */, {
    stripNewlines: true,
    stripAllWhitespace: true,
    stripDollars: true,
    stripSpecialChars: true,
    stripSlashes: true,
    stripDots: true,
    stripQuestions: true,
    stripQuotes: true,
    stripBraces: true,
    stripBrackets: true,
    stripCommas: true,
    stripParens: true,
    stripLessGreaters: true
});
```

As you can see, this would be a lot of code to repeat, so if you're going to use a set of options more than once, save it as a variable, an object field, a class static field, or whatever makes sense.

### Where in My Code does the Sanitization *Go?*

There are two schools of thought about *when* to use sanitization, and the main thing that matters is being consistent. Well, that and the fact that one method is clearly better if your data travels multiple paths in your code for multiple reasons:

- You can sanitize when you *receive* the data -- that is, as early in each path of execution flow as you can confirm you *should* have the input.
- You can sanitize when you *use* the data -- that is, as late in each path of execution flow as you can wait, right up at the moment it goes to an API for disk operations, database operations, logical flow, etc.

The first option is quicker with smaller apps, but the second option is better in the long run, and the safer choice regardless of app size. Using the second method, you'll end up saving almost every `sanitizeString` options object for reuse. That's at least double the work for a small project, but it's better for growth, especially in the eyes of security. Meanwhile, the risk of sanitizing early is that you might need the original version of the string value later in your code.

So let's say your web app is a Node.js / Express.js server. Let's say you're also using the MongoDB database server.
Here are the steps:

#### Include MiscUtils (or whatever) Wherever you Need `sanitizeString()`

So you've put the `sanitizeString` method into a class definition. Let's say you even called the file `MiscUtils.js` and the module exports a `new MiscUtils()` instance. Then you'd include it in your Node module and/or app script, like this:

```js
const mu = require('./MiscUtils.js');
// and/or
const sanitizeString = require('./MiscUtils.js').sanitizeString;
```

For the sake of examples, let's say you did both of the above. Now `mu` is a place for reusable options objects, and `sanitizeString()` is a function. (Pulling the method off the instance like this works because `sanitizeString` never refers to `this`.)

#### Make `sanitizeString` Options

Let's say you come up with the perfect list of options for your username format. You edit e.g. `MiscUtils.js` and (since the module exports a live object instead of a class) add something like this in its `constructor` method:

```js
this.sanitizeUsernameOpt = {
    // list of sanitizeString options
};
```

#### Go on a Sanitizing Spree

Wherever your database calls involve this user-inputted variable, you sanitize. For example, in processing a login form, your Express.js route would have something like the following:

```js
const isVerified = await db.verifyLogin(req.body);
```

That is, you have a `db` object class where you write all your database code, and it has a `.verifyLogin()` method. The `.verifyLogin()` method expects an object as its sole argument: we know `req.body` contains `.username` and `.password` in this part of the program execution flow, **because this is the login route, which is being sent those fields, because those fields are on the login form**.

So you need to find your `.verifyLogin()` method. Inside that, find the part where you're doing, or preparing for, the MongoDB query.
For example:

```js
const query = {$and: [
    {username: arg.username},
    {password: arg.password},
]};
```

...and turn that into this:

```js
const emptyString = "";
const query = {$and: [
    {username: sanitizeString(
        arg.username,
        emptyString,
        mu.sanitizeStripAllOpt
    )},
    {password: sanitizeString(
        arg.password,
        emptyString,
        mu.sanitizePasswordOpt
    )},
]};
```

If you're paying attention, you'll notice that the options objects -- `mu.sanitizePasswordOpt` and `mu.sanitizeStripAllOpt` -- don't exist yet and would have to be added. A `.sanitizeStripAllOpt` would be every option of `sanitizeString` set to true, and a `.sanitizePasswordOpt` would be whatever makes sense for you...

...Which comes partially down to the struggle between...

### Portability vs Brevity

Different sites are going to have different versions of MongoDB and, to a lesser extent, different versions of Node.js. I think it makes sense to code like it's the stone age when it comes to sanitizing my inputs against potential vulnerabilities, though. By that I only mean that I write code as if any potential vulnerability could become an actual vulnerability when running this app at some future web host.

So that's the long way of saying, I think it makes sense to strip dollar signs, curly braces, square brackets, and all three quote characters (double, single, and backtick), in as many cases as possible, for database security. Then, if something is being shown in a browser and you don't want the user to be able to alter and break your website's presentation, or worse, execute unwanted scripts, simply replace the less-than and greater-than symbols with their HTML-encoded counterparts, `&lt;` and `&gt;`.

Security is a constant uphill battle, but you have to start somewhere, and also... why make it easy for your website to be hacked?
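As a parting illustration of the HTML-encoding step mentioned above, here's a minimal sketch of an output-encoding helper. The name `escapeHtml` is hypothetical (it isn't part of `MiscUtils`), and the sketch goes a little beyond what's described above by also encoding the ampersand and the two straight quote characters, which is a common precaution when user text might end up inside an HTML attribute.

```js
// Hypothetical helper: HTML-encode the characters that let user text
// break out of markup. The ampersand is encoded first so that the
// entities produced by the later replacements are not double-encoded.
function escapeHtml(s) {
    if (typeof s !== 'string') return '';
    return s
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

// Example: a comment field containing a script tag.
const comment = '<script>alert("hi")</script>';
const safeComment = escapeHtml(comment);

// The browser would now display the tag as text instead of running it.
console.log(safeComment);
```

Unlike stripping, encoding keeps the user's original characters visible on the page, which is usually what you want for displayed text such as comments.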

About the author of this piece, AstroMacGuffin: "I enjoy coding in Node.js and JavaScript as a hobby. I find Node.js to be a welcome escape, after a lifetime of code, with previous experience with PHP and JavaScript on the LAMP stack. I'm one of those people who spends way too much time herding electric sheep in front of a computer. You can find me on the JavaScript Mastery discord, where I'm a moderator, or on the Classic Shadowrun discord, where I'm the owner and founder. "


