A Beyond the Basics Guide
Samer Buna
Copyright © 2024 Samer Buna. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
See http://oreilly.com/catalog/errata.csp?isbn=9781098145194 for release details.
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Efficient Node.js, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.
The views expressed in this work are those of the author and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
978-1-098-14519-4
When it comes to learning Node, many tutorials, books, and courses tend to focus on the packages and tools available within the Node ecosystem, rather than the Node runtime itself. They prioritize teaching how to utilize popular Node libraries and frameworks, instead of starting from the native capabilities of the Node runtime.
This approach is understandable because Node is a low-level runtime. It does not offer comprehensive solutions, but rather a collection of small essential modules that make creating solutions easier and faster. For example, a full-fledged web server will have options like serving static files (images, CSS files, and so on). With the Node built-in http module, you can build a web server that serves binary data, and with the Node built-in fs module, you can read the content of a file on the file system. You can combine both of these features to serve static assets, using your own JavaScript code. There’s no built-in Node way to serve static assets under a web server.
Popular libraries that are not part of the Node runtime (such as Express.js, Next.js, and many others with .js in their names) aim to provide nearly complete solutions within specific domains, for example, creating and running a web server (and serving static assets, and many other neat features). Practically, most developers will not be using Node on its own, so it makes sense for educational materials to focus on the comprehensive-solution packages, so learners can skip to the good parts. The common thinking here is that only developers whose job is to write these packages need to understand the underlying base layer of the Node runtime.
However, I would argue that a solid understanding of the Node runtime is essential before utilizing any of its popular packages and tools. Having a deep understanding of the Node runtime allows developers to make informed decisions when choosing which packages to use and how to use them effectively. This book is my attempt to prioritize learning the native capabilities of the Node runtime first, and then use that knowledge to efficiently utilize the powerful packages and tools in its ecosystem.
In this first chapter, I will start with an introduction to the Node runtime and why I believe it is a great option for both backend and frontend development. I’ll discuss both its benefits and limitations. I will also provide instructions on how to install and set up a Node development environment, and execute a Node script. Furthermore, I will provide examples of utilizing some of the built-in modules within the Node runtime, and demonstrate how to install and use a non-built-in package as well.
Throughout the book, I use the term Node instead of Node.js for brevity. The official name of the runtime is Node.js, but referring to it as just Node is very common. Don’t confuse that with “node” (with a lower-case n), which is the command we use to execute a Node script.
Ryan Dahl started the Node runtime in 2009 after he was inspired by the performance of the V8 JavaScript engine in the Google Chrome web browser. V8 uses an event-driven model, which makes it efficient at handling concurrent connections and requests. Ryan wanted to bring this same high-performance, event-driven architecture to server-side applications. The event-driven model is the first and most important concept you need to understand about the Node runtime (and the V8 engine as well). I’ll explain it briefly in this section, but we’ll have a chance to talk about it a lot more in Chapter 4.
I decided to give Node a spin and learn more about it after watching the presentation Ryan Dahl gave to introduce it. I think you’ll benefit by starting there as well. Search YouTube for “Ryan Dahl introduction to Node”. Node has changed significantly since then, so don’t focus on the examples, but rather the concepts and explanations.
At its core, Node enables developers to use the JavaScript language on any machine without needing a web browser. Node is usually defined as “JavaScript on backend servers”. Before Node, that was not a common or easy thing. JavaScript was mainly a frontend thing.
However, this definition isn’t really an accurate one. Node offers a lot more than executing JavaScript on servers. In fact, the execution of JavaScript is not done by Node at all. It’s done with a Virtual Machine (VM) like V8 or Chakra. Node is just the coordinator. It’s the one who instructs a VM like V8 to execute your JavaScript.
Node is better defined as a server environment that wraps V8 and provides small modules that can facilitate building software applications with JavaScript.
When you write JavaScript code and execute it with Node, Node will pass your JavaScript to V8, V8 will execute that JavaScript and tell Node what the result is, and Node will make the result available to you. In addition to that, Node has a few handy built-in modules that provide easy-to-use asynchronous APIs. Let’s talk about that, and a few other reasons why developers are picking Node over many other options when it comes to creating services for their backends.
V8 is Google’s open source JavaScript engine. It’s written in C++ and used in Google Chrome and in Node. Both Chrome and Node use V8 to execute JavaScript code. V8 is the default VM in Node, but other VMs can be used with Node as well.
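If you are curious which V8 build your Node installation wraps, the process.versions object reports it. This is a small sketch that assumes a standard Node build that bundles V8.

```javascript
// process.versions lists the versions of Node and of the components bundled with it
const { node, v8 } = process.versions;

console.log(`Node version: ${node}`);
console.log(`V8 version:   ${v8}`);
```

The exact numbers depend on the Node release you have installed.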
The event-driven model in Node (which is also known as the non-blocking I/O model) is based on a single-threaded event loop. There’s a lot to unpack about that statement but the gist is that Node can handle multiple tasks at the same time by registering an “event listener” for each task. When a task is completed, its event listener is triggered. Node utilizes threads and native asynchronous capabilities of the underlying operating system to accomplish that.
The Node event loop is responsible for handling all the asynchronous I/O operations. For example, when a web client sends a request to a Node web server, Node processes the request internally without blocking the main single thread, and when Node is done with the request, it picks up the request listener (which is a special JavaScript function that’s part of the web server code), and adds it to an event queue. The event loop is a forever ticking loop that waits for V8’s single thread to be available, picks an event listener from the event queue, and sends it to V8 for processing.
The exact same flow happens when you instruct Node to read a file from the file system, start a timer, encrypt data, and so on. Everything has a listener function (which is also known as a callback function, because Node basically calls it back when it’s ready for it). Every listener function is queued in the event queue. Sometimes the event queue will have multiple pending events which are all ready for processing. That’s why there’s a forever ticking loop: it picks events one by one, in queue order (first-in, first-out).
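The queueing behavior can be observed with a few timers. The following is a minimal sketch (the order array exists only for this illustration): two callbacks are registered with the same delay, and synchronous code runs right after them.

```javascript
const order = [];

// Each callback is queued once its timer expires, then run by the event loop
setTimeout(() => order.push('first timer'), 0);
setTimeout(() => order.push('second timer'), 0);

// Print the recorded order a little later, after both timers have run
setTimeout(() => console.log(order), 10);

// Synchronous code always finishes before any queued callback runs
order.push('synchronous code');
```

Running this script prints [ 'synchronous code', 'first timer', 'second timer' ]. The synchronous line wins even though it appears last in the file, and the two timer callbacks run in the order they were registered.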
In Chapter 2, we’ll go over some code examples of how exactly this flow works with more details around the interactions between Node and V8.
Node comes with feature-rich modules that make it a great platform for hosting and managing servers. These modules offer features like reading and writing files, sending and receiving data over the network, and even compressing and encrypting data. You don’t need to install these modules. They come natively packaged with Node.
The great thing about these modules is that they offer asynchronous APIs that you can use without worrying about threads (thanks to Node’s event loop). This is really why Node became very popular very quickly. You can do asynchronous programming in Node and do things in parallel without needing to deal with threads. Writing code to deal with threads is not an easy task, and Node was the escape.
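Here is a small sketch of what these asynchronous APIs look like in practice, using the built-in zlib and crypto modules. The input text and the steps array are made up for the illustration. Both tasks are handed to Node, and the script carries on without creating or managing any threads itself.

```javascript
const zlib = require('zlib');
const crypto = require('crypto');

const steps = [];
const input = Buffer.from('Node '.repeat(1000));

// Compress the buffer; Node calls the function back when the result is ready
zlib.gzip(input, (err, compressed) => {
  if (err) throw err;
  console.log(`gzip: ${input.length} bytes in, ${compressed.length} bytes out`);
});

// Generate 16 random bytes, also without blocking the main thread
crypto.randomBytes(16, (err, bytes) => {
  if (err) throw err;
  console.log(`random bytes: ${bytes.toString('hex')}`);
});

// This runs first, while both tasks are still in progress
steps.push('both tasks requested');
console.log(steps);
```

The order in which the two callbacks run is not guaranteed, because each one is queued whenever its own task finishes.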
The asynchronous nature of Node modules works great with VMs like V8 because these VMs are all single-threaded. In Node (and in web browsers as well), you only get a single precious thread to work with. It’s extremely important to not block that thread (hence the non-blocking model). For example, in a browser, if a website blocks that single thread for, say, 2 seconds, the user cannot scroll up and down during these 2 seconds! In Node, if an incoming HTTP connection to a web server was handled synchronously rather than asynchronously, that’ll block the single thread, and the whole web server cannot handle any other incoming connections while the synchronous operation is active.
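The cost of blocking the single thread is easy to demonstrate. In this sketch, blockFor is a hypothetical helper that simulates synchronous, CPU-heavy work with a busy loop, and a timer that asked for 10 ms has to wait until that work is done.

```javascript
const start = Date.now();

// Ask for a callback after 10 ms
setTimeout(() => {
  console.log(`Timer ran after ${Date.now() - start} ms`);
}, 10);

// Simulate synchronous work that holds the thread for a given duration
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // busy-wait: nothing else can run on this thread
  }
}

blockFor(100);
const blockedFor = Date.now() - start;
console.log(`Thread was busy for ${blockedFor} ms`);
```

The timer callback reports a delay of 100 ms or more instead of the 10 ms it asked for. A web server that blocks its thread this way delays every pending connection in the same manner.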
Beyond the built-in modules you get with Node, Node has first-class support for C++ addons. You can write C++ code to create high-performing modules, and link them to be used directly within Node.
Node also ships with a powerful debugger and has some other handy, generic utilities that enhance the JavaScript language and provide extra APIs (for example, to create timers, work with data types, and process arrays and objects).
Node ships with a powerful package manager named npm. We did not have a package manager in the JavaScript world before Node. npm was nothing short of revolutionary. It changed the way we work with JavaScript.
You can make a feature-rich Node application just by using code that’s freely available on npm. The npm registry has hundreds of thousands of packages that you can just install and use in your Node servers. npm is a reliable package manager which comes with a simple CLI (the npm command). The npm command offers simple ways to install and maintain third-party packages, share your own code, and reuse it too.
You can install packages for Node from other package registries as well. For example, you can install them directly from GitHub.
Node also comes with a reliable module dependency manager (different from npm). This module dependency manager is another thing that we did not have in the JavaScript world before Node. JavaScript today has what’s known as ES modules (ES is short for ECMAScript) and Node has first-class support for them. In this book, we’ll see examples of both the original module dependency management in Node (named CommonJS), and the new support for ES modules.
Node’s original module dependency management has been available since Node was released and it opened the door to so much flexibility in how we code JavaScript! It is widely used, even for JavaScript that gets executed in the browser, because npm has many tools to bridge the gap between modules written in Node and what browsers can work with today.
npm and Node’s module systems together make a big difference when you work with any JavaScript system, not just the JavaScript that you execute on backend servers or web browsers. For example, if you have a fancy fridge monitor that happens to run on JavaScript, you can use Node and npm for the tools to package, organize, and manage dependencies, and then bundle your code, and ship it to your fridge!
The packages that you can run on Node come in all shapes and forms. Some are small and dedicated to specific programming tasks, some offer tools to assist in the life cycles of an application, and others help developers every day to build and maintain big and complicated applications. Here are a few examples of some of my favorite ones:
ESLint: A tool that you can include in any Node application to find problems with your JavaScript code and, in some cases, automatically fix them. You can use ESLint to enforce best practices and consistent code style, but ESLint can help point out potential runtime bugs too. You don’t ship ESLint in your production environments. It’s just a tool that can help you increase the quality of your code as you write it.
Webpack: A tool that assists with asset bundling. The Webpack Node package makes it very easy to bundle a multi-file frontend framework application into a single file for production and compile JavaScript extensions (like JSX for React) during that process. This is an example of a Node tool that you can use on its own. You do not need a Node web server to work with Webpack.
Prettier: An opinionated code formatting tool. With Prettier, you don’t have to manually indent your code, break long code into multiple lines, remember to use a consistent style for the code (for example, always use single or double quotes, always use semicolons or no semicolons). Prettier automatically takes care of all that.
TypeScript: A tool that adds static typing and other features to the JavaScript language. It is useful because it can help developers catch errors before the code is run, making it easier to maintain and scale large codebases. TypeScript’s static typing can also improve developer productivity by providing better code auto-completion and documentation in development tools.
All of these tools (and many more) enrich the experience of creating and maintaining JavaScript applications, both on the frontend and the backend. Even if you choose not to host your frontend applications on Node, you can still use Node for its tools. For example, you can host your frontend application with another framework such as Ruby on Rails and use Node to build assets for the Rails Asset Pipeline.
By using Node, you’re committing to the simple and flexible JavaScript language, which is used on every website today. It is a very popular programming language and despite its many historical problems, I believe JavaScript is a good language today.
With Node, you get to have a single language across the full-stack. You use JavaScript in the browser and you use it for the backend as well. There are some subtle but great benefits to that:
One language means less syntax to keep in your head, fewer APIs and tools to work with, and fewer mistakes overall.
One language means better integrations between your frontend code and your backend code. You can actually share code between these two sides. For example, you can build a frontend application with a JavaScript framework like React, then use Node to render the same components of that frontend application on the server and generate initial HTML views for the frontend application. This is known as server-side rendering (SSR) and it’s now something that many Node packages offer out of the box.
One language means teams can share responsibilities among different projects. Projects don’t need a dedicated team for the frontend and a different team for the backend. You would also eliminate some dependencies between teams. A full-stack project can be assigned to a single team, The JavaScript People; they can develop APIs, they can develop web and network servers, they can develop interactive websites, and they can even develop mobile and desktop applications. Hiring JavaScript developers who can contribute to both frontend and backend applications is attractive to employers.
While Node has also played a significant role in the growing popularity of JavaScript, the language itself is simple, flexible, easy to learn, and available on every computer (client with browsers, and thanks to Node, servers as well). JavaScript is widely adopted in the programming community, particularly among beginner programmers, coding bootcamps, and startups.
Node’s approach to handling code in an asynchronous and non-blocking manner is a unique model of thinking and reasoning about code. If you’ve never done it before, it will feel weird. You need time to get your head wrapped around this model and get used to it.
Node has a relatively small standard library. This means that developers need to rely on third-party modules to perform most big tasks. There is a large number of third-party modules available for Node. You need to do some research to pick the most appropriate and efficient ones. Many of these modules are small, which means you’ll need to use multiple modules in a single project. It’s not uncommon for a Node project to use hundreds of third-party modules. While this can enhance maintainability and scalability, it also requires more management and oversight. As modules are regularly updated or abandoned, it becomes necessary to closely monitor and update all modules used within a project, replacing deprecated options and ensuring that your code is not vulnerable to any of the security threats these modules might introduce.
Smaller code is actually why Node is named Node! In Node, we build simple small single-process building blocks (nodes) that can be organized with good networking protocols, to have them communicate with each other and scale up to build large, distributed programs.
Additionally, Node is optimized for I/O and high-level programming tasks but it may not be the best choice for CPU-bound tasks, such as image and video processing, which require a lot of computational power. Because Node is single-threaded, meaning that it can only use one core of a CPU at a time, performing tasks that require a lot of CPU processing power might lead to performance bottlenecks. JavaScript itself is not the best language for high-performance computation, as it is less performant than languages like C++ or Rust.
Node also has a high rate of release and version updates. This can create the need for constant maintenance and updates of the codebase, which can be a disadvantage for long-term projects.
Finally, the language you use in Node, JavaScript, has one big valid argument against it. It is a dynamically typed language, which means objects don’t have explicitly declared types and they can change during runtime. This is fine for small projects but for bigger ones, the lack of strong typing can lead to errors that are difficult to detect and debug and it generally makes the code harder to reason about and to maintain.
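A tiny sketch shows the kind of error dynamic typing lets through. The add function and the total variable are made up for the illustration.

```javascript
// Nothing stops callers from passing values of unexpected types
function add(a, b) {
  return a + b;
}

console.log(add(1, 2));   // prints 3
console.log(add(1, '2')); // prints 12 (the string '12', not the number 3)

// A variable can also change its type while the program runs
let total = 10;
total = 'ten';
console.log(typeof total); // prints string
```

No error is raised anywhere in this script. The mistake only surfaces later, when some other part of the code expects a number.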
The TypeScript language, which can easily be used with Node, is one popular way to mitigate the problems with dynamically-typed JavaScript. It provides a significant advantage over plain JavaScript by mitigating the weakness of dynamic typing and providing developers with powerful tools for creating secure, maintainable code.
If you have Node installed on your computer, you should have the commands node and npm available in a terminal. If you have these commands, make sure the Node version is a recent one (20.x or higher). You can verify by opening a terminal and running the command node -v.
If you don’t have these commands at all, you’ll need to download and install Node. You can download the latest version from the official Node website (https://nodejs.org/). The installation process is straightforward and should only take a few minutes.
For Mac users, Node can also be installed using the Homebrew package manager with the command brew install node.
Another option to install Node is using Node Version Manager (NVM). NVM allows you to run and switch between multiple versions of Node. It works on Mac and Linux, and there’s an NVM-windows option as well.
To get started, open a terminal and issue the node command on its own without any arguments:
$ node
Throughout this book, I use the $ sign to indicate a command line to be executed in a terminal. The $ sign is not part of the command.
This will start a Node REPL session. REPL stands for Read, Eval, Print, Loop. It’s a convenient way to quickly test simple JavaScript and Node code. You can type any JavaScript code in the REPL. For example, type Math.random() and then press Enter:
Node will read your line, evaluate it, print the result, and loop over these 3 things until you exit the session (which you can do with a CTRL+D).
Note how the “Print” step happened automatically. We didn’t need to add any instructions to print the result. Node will just print the result of each line you type. This is not the case when you execute code in a Node script. Let’s do that next.
We’ll discuss Node’s REPL mode (and command-line options) in detail in Chapter 2.
Create a new directory for the exercises of this book, and then cd into it:
$ mkdir efficient-node
$ cd efficient-node
Open up your editor for this directory, then create a file named index.js. Put the same Math.random() line into it:
Math.random();
Now to execute that file, in the terminal, type the command:
$ node index.js
You’ll notice that the command will basically do nothing. That’s because we have not outputted anything from that file. To output something, you can use the console object, which is similar to the one available in browsers:
console.log( Math.random() );
Executing index.js now will output a random number.
Note how in this simple example we’re using both JavaScript (Math object), and an object from the Node API (console). Let’s look at a more interesting example next.
The console object is one of many top-level global objects that we can access in Node without needing to declare any dependencies. Node has a global object similar to the window object in browsers. The console object is part of the global object. All properties of the global object can be accessed directly: you can write console.log instead of global.console.log (which also works). Other examples of global objects in Node are process and timer functions like setTimeout and setInterval. We’ll discuss these in Chapter 2.
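You can verify the relationship between these names with a few comparisons. This is a quick sketch. The checks array exists only for the illustration, and globalThis is the standard JavaScript name for the same global object.

```javascript
// Each comparison checks that a bare global name is a property of the global object
const checks = [
  global === globalThis,
  global.console === console,
  global.process === process,
  global.setTimeout === setTimeout,
];

console.log(checks); // prints [ true, true, true, true ]
```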
You can create a simple web server in Node using its built-in http module.
Create a server.js file and write the following code in there:
const http = require('http');

const server = http.createServer((req, res) => {
  res.end('Hello World\n');
});

server.listen(3000, () => {
  console.log('Server is running...');
});
This is Node’s version of a “Hello World” example. You don’t need to install anything to run this script. This is all Node’s built-in power.
When you execute this script:
$ node server.js
Node will run a web server, and you’ll notice that the Node process does not exit in that case, since the script we’re executing has a “listener” that needs to run in the background.
Let’s decipher this simple web server example:
The require function (on the first line) is what you use in Node to manage the dependencies of modules. It allows a module (like server.js) to load and use the exports of another module (like http). This web server example depends on the built-in http module to create a web server. There are many other libraries that you can use to create a web server, but this one is built-in. You don’t need to install anything to use it, but you do need to require it.
In a Node REPL session, built-in modules (like http) are available immediately without needing to require them. This is not the case with executable scripts. You can’t use modules (including built-in ones) without requiring them first.
The second line creates a server constant by invoking the createServer function from the http module. This function is one of many functions that are available under the http module’s API. You can use it to create a web server object. It accepts an argument that is known as the Request Listener. The request listener is a simple function that Node will invoke every time there is an incoming connection request to the web server.
This is why this listener function receives the request object as an argument (named req above but you can name it whatever you want). The other argument this listener function receives, named res in the example, is a response object. It’s the other side of a request connection. We can use the res object to write things back to the requester. It’s exactly what our simple web server is doing. It’s writing back the Hello World string using the .end method.
The .end method can be used as a shortcut to write data and then end the response in one line.
The createServer function only creates the server object. It does not activate it. To activate this web server, you need to invoke the listen method on the created server.
The listen method accepts many arguments, like what OS port and host to use for this server. The last argument for it is a function that will be invoked once the server is successfully running on the specified port. The example above just logs a message to indicate that the server is running successfully at that point.
While the server is running, if you go to a browser and ask for an http connection on localhost with the port that was used in the script (3000 in this case), you will see the Hello World string that this example had in its request listener function.
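To see the request and response objects at work, here is a variation that writes the response in two steps and then plays the role of the browser itself. This is a sketch under a few assumptions: the requestListener name is arbitrary, port 0 asks the operating system for any free port, and the script requests a made-up /hello path before stopping the server.

```javascript
const http = require('http');

// The request listener: Node calls it once for every incoming request
function requestListener(req, res) {
  res.setHeader('Content-Type', 'text/plain');
  res.write(`You sent a ${req.method} request for ${req.url}\n`);
  res.end(); // write() followed by end() has the same effect as end(data)
}

const server = http.createServer(requestListener);

// Port 0 lets the OS pick a free port, so this cannot clash with another server
server.listen(0, '127.0.0.1', () => {
  const { port } = server.address();

  // Act as the client: request a path, print the response, then stop the server
  const options = { host: '127.0.0.1', port, path: '/hello', agent: false };
  http.get(options, (response) => {
    response.setEncoding('utf8');
    response.on('data', (chunk) => process.stdout.write(chunk));
    response.on('end', () => server.close());
  });
});
```

Because the server is closed after the single response, the Node process exits on its own this time.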
Both functions passed as arguments to createServer and listen are examples of listener functions that get queued in Node’s event queue and later picked up by the event loop when V8 is ready to execute them. It’s easy to understand these simple examples without the complexity of how things work in the background, but when the code gets more complicated, this understanding helps avoid critical errors.
Let’s now look at an example of how to use an npm module in Node. Let’s use the popular lodash module, which is a JavaScript utility library with many useful methods you can run on numbers, strings, arrays, objects, and more.
First, you need to download the module. You can do that using the npm install command:
$ npm install lodash
This command will download the lodash module from the npm registry, and then place it under a node_modules folder (which it will create if it’s not there already). You can verify with an ls command:
$ ls node_modules
You should have a folder named lodash in there.
Now in our Node code, we can require the lodash module to use it. For example, lodash has a random method that can generate a random number between any 2 numbers we specify for it. Here’s an example of how to use it:
const _ = require('lodash');
console.log( _.random(1, 99) );
Running this script, you’ll get a random number between 1 and 99.
The _ is a common name to use for lodash, but you can use any name.
Since we called the require method with a non built-in module lodash, Node will look for it under the node_modules folder. Thanks to npm, it’ll find it.
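If you want to see where Node would look, require.resolve.paths lists the folders it searches for a given module name. This is a small sketch, and the folders printed depend on where the script lives. The list starts with the node_modules folder next to the script and continues up through the parent directories.

```javascript
// Folders Node would search for a non built-in module such as lodash
const searchPaths = require.resolve.paths('lodash');
console.log(searchPaths);

// Built-in modules are not looked up on disk, so there are no folders to list
const builtinPaths = require.resolve.paths('http');
console.log(builtinPaths); // prints null
```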
In a team Node project, when you make the project depend on a third-party module, you need to let other developers know of that dependency. You can do so in Node using a package.json file at the root of the project.
With a package.json file, when you npm install a module, the npm command will also list the module and its current version in package.json, under a dependencies section. When other developers pull your code, they can run the command npm install without any arguments, and npm will read all the dependencies from package.json and install them in the node_modules folder.
The package.json file also contains information about the project, including the project’s name, version, description, and more. It can also be used to specify scripts that can be run from the command line to perform various tasks, like building or testing the project.
Here’s an example of a package.json file:
{
  "name": "efficient-node",
  "version": "1.0.0",
  "description": "A comprehensive guide to learning the Node.js runtime from scratch",
  "license": "MIT",
  "scripts": {
    "start": "node index.js"
  },
  "dependencies": {
    "lodash": "^4.17.21"
  }
}
You can create a package.json file for a Node project using the npm init command:
$ npm init
This command will ask a few questions and you can interactively supply your answers (or press Enter to keep the defaults, which often are good because npm tries to detect what it can about the project).
You can use npm init -y to generate your package.json file with the default values (the y is for yes to all questions).
Now that the project has a package.json file, npm install a new module (for example, express) and see how it gets written to the package.json file. Then npm uninstall the module and see how it gets removed from package.json.
You can also install a module that’s only needed in the development environment, but not in production. An example of that is eslint. To install eslint as a development dependency only, you add a --save-dev argument (or -D for short) to the npm install command.
$ npm install -D eslint
This will install eslint in the node_modules folder, and document it as a development dependency under a devDependencies section in package.json. This is where you should place things like your testing framework, your formatting tools, or anything else that you use only while developing your project.
On a production machine, development dependencies are usually ignored. The npm install command has a --production flag to make it ignore them (recent npm versions prefer the equivalent --omit=dev option). You can also use the NODE_ENV environment variable and set it to “production” before you run the npm install command. We’ll learn more about Node environment variables in Chapter 2.
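The split between the two sections can be pictured with a short sketch. This is only an illustration of the idea, not npm’s actual implementation. The packagesToInstall function is hypothetical, and the eslint version in the sample data is made up.

```javascript
// Sample package.json content with both dependency sections
const pkg = {
  dependencies: { lodash: '^4.17.21' },
  devDependencies: { eslint: '^8.0.0' }, // hypothetical version
};

// List the packages an install would need for a given environment
function packagesToInstall(pkg, env = process.env.NODE_ENV) {
  const names = Object.keys(pkg.dependencies ?? {});
  if (env !== 'production') {
    // Development tools are only needed outside of production
    names.push(...Object.keys(pkg.devDependencies ?? {}));
  }
  return names;
}

console.log(packagesToInstall(pkg, 'production'));  // prints [ 'lodash' ]
console.log(packagesToInstall(pkg, 'development')); // prints [ 'lodash', 'eslint' ]
```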
The require method is used by Node to implement the CommonJS module system, which is the default module system used in Node, but Node also supports the ES module system (which is part of JavaScript itself).
Let’s go through another example, but this time, write it using ES modules.
To create a feature-rich web server in Node, one popular option is Express.js (available as express from npm). With the express module, you can easily handle routing, create and use middleware, and handle many other common web server functionalities.
You’ll need to install express as a new dependency:
$ npm install express
This will download express and extract it under the node_modules folder, but if you take a look at what’s under node_modules now, you’ll notice that there are a lot more modules there. The express module depends on all these other modules, and our little example project now does too, because it depends on express.
Since we’re going to use ES modules, we need to use the .mjs file extension to signal to Node that we’re using the new module system.
In a server.mjs file, write the following code:
import express from 'express';

const app = express();

app.get('/', (req, res) => {
  res.send('Hello Express');
});

export default app;
Note the use of import/export statements. This is the syntax for ES modules. You use import to declare a module dependency and export to define what other modules can use when they depend on your module.
In this example, the server.mjs module exports an app object, which we created using the express module, and made it able to handle connections to the root path on the server.
To use this module, just like we imported express into server.mjs, we now need to import the server.mjs module itself.
In an index.mjs file, write the following code:
import app from './server.mjs';

app.listen(3000, () => {
  console.log('Server listening on http://localhost:3000');
});
The “./” in the import line signals to Node that this import is a relative one. Node expects to find the server.mjs file in the same folder where index.mjs is. You can also use a “../” to make Node look for the module up one level, or “../../” for two levels, and so on. Without “./” or “../”, Node assumes that the module you’re trying to import is either a built-in module, or a module that exists under the node_modules folder.
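These rules can be summarized in a few lines of code. The following is a simplified sketch, not Node’s actual resolution algorithm. The classifySpecifier function is hypothetical, and the sketch is written as a CommonJS script so it can run on its own with the node command.

```javascript
const { builtinModules } = require('module');

// Simplified classification of the string used in an import line
function classifySpecifier(specifier) {
  if (specifier.startsWith('./') || specifier.startsWith('../')) {
    return 'relative file';
  }
  if (builtinModules.includes(specifier)) {
    return 'built-in module';
  }
  return 'package under node_modules';
}

console.log(classifySpecifier('./server.mjs')); // prints relative file
console.log(classifySpecifier('http'));         // prints built-in module
console.log(classifySpecifier('express'));      // prints package under node_modules
```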
With this code, the index.mjs module depends on the server.mjs module, and uses its default export (app) to run the server on port 3000.
You can execute this code with:
$ node index.mjs
This will start an Express.js web server on port 3000 and log a message to the console when the server is ready. If you go to http://localhost:3000/ in the browser, you will see the Hello Express string that this example had in its root path handler function.
If you want to use the .js extension with ES modules, you can configure Node to assume that all .js files are ES modules. For that, you can add a “type” property in package.json and give it the value of “module” (the default value for it is “commonjs”):
"type": "module"
With that, ES module files can use the .js extension.
Node is a powerful runtime for building network applications. Its event-driven, non-blocking I/O model, single-threaded event loop, and built-in module system make it easy for developers to create efficient and scalable applications.
Node wraps a VM like V8 to enable developers to execute JavaScript code in a simple way.
Node built-in modules provide easy-to-use asynchronous APIs. Node’s module system allows developers to organize their code into reusable modules. These modules can be imported and used in other parts of the application.
Node has a large and active community that has created many popular modules that can be easily integrated into Node projects. These modules can be found and downloaded from the npm registry.
In this chapter, we will first get comfortable with Node’s CLI and REPL mode, then learn the fundamentals of how modules work in Node. We’ll see examples of functions that use Node’s concurrency model, then learn how Node’s event loop works with events and how event functions get executed when their events are triggered.
In Chapter 1, we used the node command briefly to explore Node’s REPL mode and then to execute Node scripts. The node command has many options and its behavior can be customized. It also supports arguments and environment variables to further customize what it does and pass data from the operating system environment to Node’s process environment. Let’s take a look:
In the terminal, type
$ node -h | less
This will output the “help” documentation for the command (one page at a time because we piped the output to the less command). I find it useful to get familiar with the help pages for the commands I use often.
Usage: node [options] [ script.js ] [arguments]
       node inspect [options] [ script.js | host:port ] [arguments]

Options:
  -                           script read from stdin (default if no
                              file name is provided, interactive mode
                              if a tty)
  --                          indicate the end of node options
  --abort-on-uncaught-exception
                              aborting instead of exiting causes a
                              core file to be generated for analysis
  --build-snapshot            Generate a snapshot blob when the
                              process exits. Currently only supported
                              in the node_mksnapshot binary.
  -c, --check                 syntax check script without executing
  --completion-bash           print source-able bash completion
                              script
  -C, --conditions=...        additional user conditions for
                              conditional exports and imports
  --cpu-prof                  Start the V8 CPU profiler on start up,
:
The first 2 lines specify how to use the node command. Anything in square brackets is optional, which means, according to the first line, that we can use the node command on its own without any options, scripts, or arguments. That’s what we did to start a REPL session. To execute a script, we used the node script.js syntax (“script” can be any name there).
What’s new here is that there are options and arguments that we can use with the command. Let’s talk about these.
The second usage line is to start a terminal debugging session for Node. While that’s sometimes useful, in Chapter 6, I’ll show you a much better way to debug code in Node.
In the help page, right after the usage lines, there is a list of all the options that you can use with the command. Most of these options are advanced, but knowing of their existence is a helpful reference. You should scan through this list just to get a quick idea of all the types of things that you can do, but let me highlight a few of the options that I think you should be aware of.
The -c option (or --check) lets you check the syntax of a Node script without running that script. An example use of that option is to automate a syntax check before sharing code with others.
The -e and -p options (or --eval and --print) can both be used for executing code directly from the command line. I like the -p one more because it executes and prints (just like in the REPL mode). To use these options, you pass a string of Node code in quotes. For example:
$ node -p "Math.random()"
This is handy, as you can use it to create your own powerful commands (and alias them if you want). For example, say you need a command to generate a unique random string (to be used as a password maybe). You can leverage Node’s crypto module in a short -p one liner:
$ node -p "crypto.randomBytes(16).toString('hex')"
Pretty cool, isn’t it!
Note how the crypto module is available to the -p option without needing to require it (just like in the REPL mode).
How about a command to count the words in any file?! This one will help us understand how to use arguments with the node command:
$ node -p "fs.readFileSync(process.argv[1]).toString().split(' ').length" ~/.bashrc
Don’t panic. There’s a lot going on with this one. It leverages the powers of both Node and JavaScript. Go ahead and try it first. You can replace ~/.bashrc with a path to any file on your system.
Let’s decipher this one a bit:
The readFileSync function takes a file path as an argument and synchronously returns a binary representation of that file’s data. That’s why I chained a .toString call to it, to get the file’s actual content (in UTF-8). Furthermore, instead of hardcoding the file path in the command, I put the path as the first argument to the node command itself and used process.argv[1] to read the value of that argument. With -p (or -e) there is no script file, so process.argv[0] is the path to the node executable and process.argv[1] is the first argument after the code string. This allows the word-counting one-liner to be generic. We can alias it without the path argument and then use the alias with a path argument.
Then once I have the content of the file, I use JavaScript’s split method, which is available on any string, to split the content using spaces, giving me an array of words. Then I just counted those with a .length call to estimate the number of words.
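Here’s a rough sketch of the same idea as a script file instead of a one-liner. The file name count-words.js and the countWords helper are hypothetical names used only for illustration. When Node runs a script file, process.argv[1] holds the script path, so the first user-supplied argument moves to process.argv[2]. This version also splits on any run of whitespace, so newlines and repeated spaces don’t inflate the estimate.

```javascript
// count-words.js (hypothetical file name)
const fs = require('node:fs');

// Estimate the number of words by splitting on runs of whitespace.
function countWords(text) {
  return text.split(/\s+/).filter(Boolean).length;
}

// In a script file, the first user-supplied argument is process.argv[2].
const filePath = process.argv[2];

if (filePath && fs.existsSync(filePath)) {
  const content = fs.readFileSync(filePath, 'utf8');
  console.log(countWords(content));
} else {
  console.log('Usage: node count-words.js <path-to-existing-file>');
}
```

You would run it with something like node count-words.js ~/.bashrc.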
The -r option (or --require) allows you to require a module before executing the main script. This is useful if you need to load a specific module before running your code or if you want to set up certain configurations or variables.
For example, let’s say you have a Node project that requires the use of a module called dotenv, which loads environment variables from a file. Normally, you would need to include something like require('dotenv').config() at the beginning of your main file to use the dotenv module. However, with the -r option, you can load the module automatically without having to add it to each file:
$ node -r dotenv/config index.js
The --watch option allows you to watch a file (and its dependencies) for changes. It automatically restarts Node when a change is detected. This is very useful in development environments. You can test it with any of the files we wrote so far. For example, to run the Express.js example in watch mode, you can run:
$ node --watch index.mjs
This will start the server in watch mode. Make a change to the server.mjs file (change the “Hello Express” string, for example) and notice how the node command will automatically restart.
The --test option makes Node look for and execute code that’s written for testing. Node uses a simple naming convention for that. For example, it’ll look for any files named with a .test.js suffix, or files whose names begin with test-.
There are a lot more options, but most of them are for advanced use. It’s good to be aware of them so that in the future, you can look up if there’s one particular option that might make a task you’re doing simpler.
Since Node is a wrapper around V8, and V8 itself has CLI options, the node command accepts many V8 options as well. The list of all the V8 options you can use with the node command can be printed with:
$ node --v8-options | less
This is an even bigger list! You can set JavaScript harmony flags (to turn on/off experimental features), you can set tracing flags, customize the engine memory management, and many other customizations. As with the node command options, it’s good to know that all these options exist.
Toward the end of the node -h output, you can see a list of environment variables, like NODE_DEBUG, NODE_PATH, and many more. Environment variables are another way to customize the behavior of Node or make custom data available to the Node process (similar to command arguments).
Every time you run the node command, you start an operating system process. In Linux, the ps command can be used to list all running processes. If you run it while a Node process is running (like the Express.js example), one of the listed processes will be Node (and you can see its process ID, and stop it from the terminal if you need to). Here’s a command to output process details and filter the output for processes that have the word “node” in them:
$ ps -ef | grep "node"
Node’s global process object represents a bridge between the Node environment and the operating system environment. We can use it to exchange information between Node and the operating system. In fact, when you console.log a message, under the hood, the code is basically using the process object to write a string to the operating system stdout (standard output) data stream.
Environment variables are one way to pass information from the operating system environment (used to execute the node command), to the Node environment, and we can read their values using the env property of the process object.
Here’s an example to demonstrate that:
$ NAME="Reader" node -p "'Hello ' + process.env.NAME"
This will output “Hello Reader”. It sets an environment variable NAME then reads its value with process.env.NAME. You can even set multiple environment variables if you need, either directly from the command line like this example, or using the Linux export command prior to executing the node command:
$ export GREETING="Hello"; export NAME="Reader"; \
  node -p "process.env.GREETING + ' ' + process.env.NAME"
In Linux, you can use a semicolon to execute multiple commands on the same line, and \ to split a command into multiple lines.
You can use environment variables to make your code customizable on different machines or environments. For example, the Express.js example in Chapter 1 hard-coded the port to be 3000. However, on a different machine, 3000 might not be available, or you might need to run the server on a different port in a production environment. To accomplish that, you can modify the code to use process.env.PORT ?? 3000 instead of just 3000 (in the listen method) and then run the node command with a custom port when you need to:
$ PORT=4000 node index.mjs
Note that if you don’t specify a port, the default port would be 3000 because I used the ?? (nullish coalescing) operator to specify a fallback value when process.env.PORT does not have one. This is a common practice.
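One detail worth keeping in mind is that values read from process.env are always strings. Here’s a minimal sketch of the fallback pattern that also converts the value to a number. The resolvePort helper is a hypothetical name, and it accepts the environment object as a parameter only to make the behavior easy to try out.

```javascript
// Environment variable values are strings, so convert the port to a number.
// resolvePort is a hypothetical helper name used only for this sketch.
function resolvePort(env = process.env) {
  return Number(env.PORT ?? 3000);
}

console.log(resolvePort({ PORT: '4000' })); // 4000
console.log(resolvePort({}));               // 3000
```

In the Express.js example, you could pass the result of a helper like this to app.listen instead of the hard-coded 3000.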
You can’t use Node’s process.env object to change an operating system environment variable. It’s basically a copy of all the environment variables available to the process.
The list of environment variables shown toward the end of node -h output are Node’s built-in environment variables. These are variables that Node will look for and use if they have values. Here are a few examples:
NODE_DEBUG can be used to tell Node to output more debugging information when it uses certain libraries. We give it a comma-separated list of modules to debug, for example, with NODE_DEBUG=fs,http, Node will start outputting debugging messages when the code uses either the fs or http modules. Many packages support this environment variable.
NODE_OPTIONS is an alternative way to specify the options Node supports instead of passing them to the command line each time.
NODE_PATH can be used to simplify require calls in CommonJS modules by using short absolute paths instead of relative ones (it is not used when resolving ES module import statements). We’ll see an example of that later in the chapter.
In Node’s REPL mode, as we learned in Chapter 1, you can type any JavaScript code, and Node will execute it and automatically print its result. This is a convenient way to quickly test short JavaScript expressions (and it works for bigger code too). However, there are a few other helpful things you can do in REPL mode beyond the quick tests.
In REPL mode, you usually type an expression (for example: 0.1 + 0.2), and hit Enter to see its result. You can also type statements that are not expressions (for example: let v = 21;) and when you hit Enter, the variable v will be defined, and the REPL mode will print undefined since that statement does not evaluate to anything. If you need to clear the screen, you can do so with CTRL+L.
If you try to define a function, you can write the first line and hit Enter, and the REPL mode will detect that your line is not complete, and it will go into a multiline mode so that you can complete it. Try and define a small function to test that.
The REPL multiline mode is limited, but there’s an integrated basic editor available within REPL as well. While in REPL mode, type .editor to start the basic editor mode. You can then type as many lines of code as you need, define multiple functions, or paste code from the clipboard. When you are done, hit CTRL+D to have Node execute all the code you typed in the editor.
The .editor command is one of many REPL commands which you can see by typing the .help command:
> .help
.break    Sometimes you get stuck, this gets you out
.clear    Alias for .break
.editor   Enter editor mode
.exit     Exit the REPL
.help     Print this help message
.load     Load JS from a file into the REPL session
.save     Save all evaluated commands in this REPL session to a file

Press Ctrl+C to abort current expression, Ctrl+D to exit the REPL
The .break command (or its .clear alias) lets you get out of some weird cases in REPL sessions. For example, when you paste some code in Node’s multiline mode and you are not sure how many curly braces you need to get to an executable state. You can completely discard your pasted code by using a .break command. This saves you from killing the whole session to get yourself out of situations like these.
The .exit command exits the REPL or you can simply press Ctrl+D.
The .save command enables you to save all the code you typed in one REPL session into a file. The .load command enables you to load JavaScript code from a file and make it all available within the REPL session. Both of these commands take a file name as an argument.
One of my favorite things about Node’s REPL mode is how I can inspect basically everything that’s available natively in Node without needing to require them. All the built-in modules (like fs, http, etc) are defined globally and you can use the TAB key to inspect their APIs.
Just like in a terminal or editor, hitting the TAB key once in a REPL session will attempt to auto-complete anything you partially type. Try typing cr and hit TAB to see it get auto-completed to crypto. Hitting the TAB key twice can be used to see a list of all the possible things you can type from whatever partially-typed text you have. For example, type a and hit TAB twice to see all the available global objects that begin with a.
This is great if you need to type less and avoid typing mistakes, but it gets better. You can use the TAB key to inspect the methods and properties available on any object. For example, type Array. and hit TAB twice to see all the methods and properties that you can use with the JavaScript Array class. This works with Node modules as well. Try it with fs. or http..
It even works with objects that you create. For example, create an empty array using let myArr = [];, then type myArr. and hit TAB twice to see all the methods available on an array instance.
Best of all, TAB discoverability works on the global level. If you hit TAB twice on an empty line, you get a list of everything that is globally available. This is a big list, but it’s a useful one. It has all the globals in the JavaScript language itself (like Array, Number, and Math), all the globals from Node (like process and setTimeout), and all the core modules that are available natively in Node (like fs and http).
AbortController AbortSignal AggregateError Array ArrayBuffer Atomics BigInt BigInt64Array BigUint64Array Blob Boolean BroadcastChannel Buffer ByteLengthQueuingStrategy CompressionStream ...
In the list of all global things, you’ll notice an underscore character _. This is a handy little shortcut in REPL that stores the value of the last evaluated expression. For example, after executing a Math.random() line, you can type _ to access that same random value. You can even use it in any place where you use a JavaScript expression. Try let random = _;.
A module is simply a reusable block of code. Something you can include and use in any application, as many times as you need. In Node, a module can be a single file or a group of files with a main one. There’s always a main file in a Node module, and that’s the file that we “require” (or “import”). Modules have public APIs. When we require a module, we usually get back an object that represents the module’s API.
For the rest of this section, I’ll mostly use the terminology and concepts for CommonJS modules. ES modules are similar but I’ll point out some differences too.
To understand one important aspect about Node modules, let’s create a new module, name it main.js and put the following line in it:
console.log(arguments);
What do you think executing this file will output?
If you don’t know that all Node modules are wrapped in special functions, you’d say undefined. But the output of that line will reveal 5 argument values!
You need to run main.js as a CommonJS module. If you have the type property in package.json set to module, you need to change it to commonjs or remove it (the default is commonjs).
Node wraps every module implicitly with a function. When you execute a module, Node calls that function, and - as the output reveals - passes 5 arguments to that function as well.
You can actually see the wrapping function detail if you print the value of require('module').wrapper (You can do that in a REPL session).
(function(exports, require, module, __filename, __dirname) {
// Module code actually lives in here
});
When you use exports, require, module, __filename, or __dirname in a module, you’re not using “global” variables. You’re just using the implicit wrapper function’s arguments.
Similar to CommonJS module wrapping, ES modules are executed in an implicit function scope, but you can’t access the arguments keyword there, and the 5 arguments are not defined in ES module scopes.
The __filename value has the absolute path of the current module’s file. The __dirname value has the absolute path to the directory that contains that file.
The exports, require, and module arguments are Node’s way to manage a module’s API. To understand them, let’s create another module in the same directory as main.js, let’s name this one config.js. Usually, you’d put any configuration logic in a separate module like that.
Since config.js is yet another module that will be wrapped by Node, it’ll have the 5 arguments as well. Let’s console.log the exports argument in config.js and execute the file with node config.js:
console.log(exports)
As you can see, the value for exports is simply an empty object, and we can change that object and add properties to it, just like we can change any JavaScript variable.
There are 2 ways to execute a module in Node. So far we used the first and main way, which is to specify the file path for the node command. The other way to execute a module is through the require argument (which is a function). One module can require another module using that function. For example, the main.js module can require the config.js module:
require('./config.js');
We invoke the require function with the path to the module we’re interested in. The path can be a relative one when it starts with a ., or an absolute one (for example: /Users/samer/efficient-node/main.js).
When we execute main.js now using the node command, we’ll see the console.log line from config.js.
Now we can say that the main module “depends” on the config module, or that the config module is a dependency for the main module. This is where the term “dependency management” comes from. We are managing the dependencies of a module here and bringing one module’s API to use in another module.
Let’s define the API for the config module. Let’s define a static property and a function property:
exports.PORT = process.env.PORT ?? 8080;
exports.SERVER_URL = (host = process.env.HOST ?? "localhost") =>
`http://${host}:${exports.PORT}`;
The exports argument in CommonJS modules is an alias to module.exports which is initialized as an empty object. The official API for the module is the module.exports value. As long as that value is an object, we can use the exports alias to define the API. In some cases, you might need the top-level API object to be a function or a class, or anything else that’s not a simple aliased object. In these cases, you’ll need module.exports to define the API (we’ll see an example of that soon).
When we use the require function in main.js to get the API for config.js, we’re basically invoking the wrapping function for config and getting back the value of module.exports. It’s a bit more complicated than that, but that is a good simplification to remember.
Let’s capture that value and print it:
const config = require('./config.js');
console.log(config);
When you execute main.js now, you’ll see the 2 properties we defined in config.js (PORT and SERVER_URL).
Note how I used process.env variables to make the configurations customizable on different environments. I also made SERVER_URL a function that receives a host argument, which is customizable through the environment as well. Making a configuration value a function allows it to be customizable at run time.
To understand another concept about how Node modules work, let’s repeat the require line in main.js multiple times:
require('./config.js');
require('./config.js');
require('./config.js');
Given these 3 require lines, when we execute main.js, how many times will the console.log in config.js be outputted?
The answer is not 3 times. It’ll only be outputted once.
Modules in Node are cached after the first call. A module is executed the first time you require it, then when you require it again, Node loads it up from a cache.
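Here’s a small, self-contained sketch that shows the cache at work. It writes a throwaway module into a temporary directory so that it can run anywhere. The file, the directory prefix, and the loadCount counter are all hypothetical names made up for this demonstration.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Create a throwaway module in a temporary directory (hypothetical demo file).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cache-demo-'));
const configPath = path.join(dir, 'config.js');
fs.writeFileSync(
  configPath,
  'globalThis.loadCount = (globalThis.loadCount ?? 0) + 1;\n' +
    'exports.PORT = 8080;\n',
);

const first = require(configPath);
const second = require(configPath);

console.log(globalThis.loadCount); // 1, the module body ran only once
console.log(first === second);     // true, both calls return the cached exports

// The cache is keyed by the resolved file name of the module.
const isCached = require.resolve(configPath) in require.cache;
console.log(isCached); // true

// Clean up the temporary directory.
fs.rmSync(dir, { recursive: true, force: true });
```

Both require calls return the very same object, which is why a change made to a module’s exported object in one place is visible everywhere else that module is required.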
If you look at front-end applications, like React for example, all component files require the React module, and that’s okay, because only the first require will do the work, the rest will use the cache.
But what if I do want the console.log message to show up multiple times every time we require config.js?
You can actually clear the modules cache, but generally, that’s not a good practice. However, you can make the top export of config.js a function instead of an object, put all the code there inside the function, and call the function every time you need the code to be executed. The cache, in that case, will cache the definition of the function. The Non-object APIs sidebar has an example of that.
When you require a module in Node, Node uses the following procedure to determine how and where to look for the required module:
If the module does not start with a . (denoting a relative path) or a / (denoting an absolute path), Node will first check if the module is a core one (like fs or http). If it is, it’ll load it directly.
If the module is not a core one, Node will look for it under node_modules folders, starting from the directory where the requiring module is and going up the folder hierarchy. For example, if the requiring module is in /Users/samer/efficient-node/src, Node will first look under src for a node_modules folder. If it does not find one, it’ll look next under efficient-node, and so on all the way to the root path.
You can use this lookup nature to localize modules dependencies by having multiple node_modules folders in your project, but that generally increases the complexity of the project.
You can also use this lookup nature to have multiple projects share a node_modules folder by placing that folder in a parent directory common to all projects, or even have a global node_modules folder for all projects on your machine. While this might be useful in some cases, having a single node_modules folder per project is the standard and recommended practice.
If the required module starts with a . or /, Node will look for it in the relative or absolute directory specified by the path.
If you set the NODE_PATH environment variable before executing a script, Node will also search the paths specified by NODE_PATH for required modules that it can’t find elsewhere. NODE_PATH can be a single path, or multiple paths separated by a colon (or by a semicolon on Windows). This can be useful to use short absolute paths instead of confusing relative ones. For example, with NODE_PATH set to src, you can require a module under src using require('my-module') even when the requiring module is multiple levels deep under src, instead of doing something like require('../../../my-module').
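The upward node_modules search described above can be sketched with a few lines of code. This is a simplified illustration, not Node’s actual resolution algorithm (which handles more cases). The candidateFolders helper is a hypothetical name, and the sketch uses path.posix so that the result looks the same on every operating system.

```javascript
const path = require('node:path');

// List the node_modules folders to check, from startDir up to the root.
// candidateFolders is a hypothetical helper used only for this sketch.
function candidateFolders(startDir) {
  const folders = [];
  let current = startDir;
  while (true) {
    folders.push(path.posix.join(current, 'node_modules'));
    const parent = path.posix.dirname(current);
    if (parent === current) break; // reached the root
    current = parent;
  }
  return folders;
}

console.log(candidateFolders('/Users/samer/efficient-node/src'));
```

In a CommonJS module, you can compare this with the real list Node uses for that module by printing module.paths.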
Besides JavaScript files, you can also require JSON files in Node. When you require a JSON file, you get back a JavaScript object representing the data in the JSON file.
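Here’s a minimal sketch of requiring a JSON file. It creates the JSON file in a temporary directory first so that the example is self-contained. The settings.json name and its contents are hypothetical demo data.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Write a small JSON file to a temporary directory (hypothetical demo data).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'json-demo-'));
const jsonPath = path.join(dir, 'settings.json');
fs.writeFileSync(jsonPath, '{ "name": "efficient-node", "port": 8080 }');

// require parses the file and returns a regular JavaScript object.
const settings = require(jsonPath);
console.log(settings.name); // efficient-node
console.log(settings.port); // 8080

// Clean up the temporary directory.
fs.rmSync(dir, { recursive: true, force: true });
```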
You can delay the execution of a code block, or make it repeat regularly using timer functions in Node like setTimeout or setInterval. These functions behave very similarly to how they do in browser environments.
A timer function receives a function as an argument. Here’s an example:
const printGreeting = () => console.log('Hello');
setTimeout(printGreeting, 4_000);
This code uses the setTimeout timer function to delay the printing of “Hello” by 4 seconds (the second argument to setTimeout is the delay period in milliseconds).
The printGreeting function (which is passed as the first argument to setTimeout) is the function whose execution will be delayed. This is usually referred to in Node as a callback function.
If we run this script with the node command, Node will pause for 4 seconds and then it’ll print the greeting and exit after that.
If you need to delay the execution of a function that receives arguments, you can pass its arguments starting from the third argument to setTimeout.
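For example, here’s a short sketch that passes two arguments to a delayed function. The greet function and the greetings array are hypothetical names. The array is only there to keep a record of what the callback received.

```javascript
// Keep a record of each greeting so the result is easy to inspect afterwards.
const greetings = [];

const greet = (greeting, name) => {
  greetings.push(`${greeting} ${name}`);
  console.log(`${greeting} ${name}`);
};

// Arguments after the delay are passed to the callback when it runs.
setTimeout(greet, 10, 'Hello', 'Reader');
```

After about 10 milliseconds, this prints Hello Reader.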
To repeat the execution of a block of code, you can use the setInterval timer function. If we replace setTimeout with setInterval in the last example, Node will print the “Hello” message every 4 seconds, forever.
All timer functions can be canceled once they are defined. When you call a timer function, you get back a unique timer ID. You can use that timer ID to cancel the scheduled timer. We can use clearTimeout(timerId) to stop timers started by setTimeout, and clearInterval(timerId) to stop timers started by setInterval.
For example, in this code:
const timerId = setTimeout(
  () => console.log('Hello'),
  0,
);
clearTimeout(timerId);
Even though we started a timer to print a message after 0 milliseconds, that message will not be printed at all because we canceled the timer right after it was defined.
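The same idea applies to intervals. Here’s a minimal sketch of an interval that cancels itself after three ticks. The ticks counter and the 10-millisecond period are arbitrary choices made for this illustration.

```javascript
let ticks = 0;

// Run every 10 milliseconds, then cancel the interval after the third tick.
const intervalId = setInterval(() => {
  ticks += 1;
  console.log(`tick ${ticks}`);
  if (ticks === 3) {
    clearInterval(intervalId);
  }
}, 10);
```

Without the clearInterval call, the callback would keep running and the Node process would never exit on its own. Note that in Node, the value returned by a timer function is a Timeout object rather than a plain number, but you pass it to clearTimeout or clearInterval in the same way.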
Node also supports a function named setImmediate, which behaves like a timer with a 0-millisecond delay and does not receive a delay argument.
0-milliseconds delayed code is a way to “schedule” code to be immediately invoked when all the synchronous code defined after it is done executing. This is an example of why Node is “non-blocking”. You can basically define code to be executed in a way that does not block the code after it. Here’s an example to understand that:
setTimeout(
  () => {
    for (let i = 0; i <= 1_000_000_000; i++) {
      // ...
    }
  },
  0,
);
console.log('Hello');
In this example, although we defined a loop that ticks 1 billion times, that code will not block the printing of the Hello message. The printing will happen first, then the big loop will be executed.
I’m using a big loop here as a simplification of something that’ll take a long time to execute, but in practice, you should never use a big loop like that synchronously in Node. Because Node is single-threaded, any loop like that will block everything else, including code that was scheduled before it. For example:
setTimeout(
  () => console.log('Hello!'),
  0,
);
for (let i = 0; i <= 1_000_000_000; i++) {
  // ...
}
Here, even though the printing of “Hello” is scheduled to be executed immediately, it will not. Node will have to wait on the for loop to finish first, and then, a few seconds later (when I tested this on my machine), it’ll execute the delayed function.
This is a general observation about timer functions: their delays are not guaranteed to be exact, but are rather a minimum amount. Delaying a function by 10 milliseconds means that the execution of the function will happen after a minimum of 10 milliseconds, but possibly longer depending on the code that comes after it!
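You can measure this effect with a short sketch. It asks for a 10-millisecond delay, then keeps the call stack busy for roughly 50 milliseconds. The 10 and 50 values and the variable names are arbitrary choices for this demonstration, and the busy-wait loop is something you should only write for experiments like this one.

```javascript
const scheduledAt = Date.now();
let actualDelay;

// Ask for a 10-millisecond delay.
setTimeout(() => {
  actualDelay = Date.now() - scheduledAt;
  console.log(`Requested 10ms, ran after ${actualDelay}ms`);
}, 10);

// Keep the call stack busy for about 50 milliseconds.
const busyUntil = Date.now() + 50;
while (Date.now() < busyUntil) {
  // busy waiting, for demonstration only
}
```

The reported delay should be at least 50 milliseconds, because the callback can’t run until the synchronous loop is done.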
Why exactly does a for loop block the code that was scheduled before it? It’s time to dive into the details of Node’s concurrency model.
We learned that Node uses a single-threaded event loop for its non-blocking nature. To understand how that is achieved, we need to learn about a stack and a queue! The stack is known as “The Call Stack”, and the queue is known as “The Event Queue”.
The call stack is part of V8 (not Node), and it’s how V8 manages function calls. A stack is a last-in/first-out data structure. Every time we call a function in our code, a reference to that function is placed on the call stack. When you nest function calls (when functions call other functions), the function references are stacked in the call stack. Then V8 will pop one function at a time (from the top of the stack) to complete the initial call.
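The last-in/first-out order is easy to observe with nested function calls. In this small sketch, the first, second, and third functions are hypothetical names, and the trace array records when each call starts and finishes.

```javascript
const trace = [];

function third() {
  trace.push('enter third');
  trace.push('exit third');
}

function second() {
  trace.push('enter second');
  third();
  trace.push('exit second');
}

function first() {
  trace.push('enter first');
  second();
  trace.push('exit first');
}

first();
console.log(trace);
```

The trace shows that third, which was the last function to be called, is the first one to finish, and first, which started the chain, is the last one to finish.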
Any JavaScript code you write in Node has to be placed in the call stack for V8 to execute it. The call stack is single-threaded, which means when there are functions in the call stack, everything else (including event-driven callbacks) will have to wait until the call stack is available again.
This is exactly why the for loop in the previous example blocked the execution of a function that was scheduled to be immediately executed. We simply made the call stack busy with that loop and you should never do that. Any code that needs to run for a long time should be done with either asynchronous tools, or in its own worker thread (more on that later).
When an asynchronous function like setTimeout is placed on the call stack and it’s time for it to be popped, Node will take control of it, freeing the call stack to pop the next stacked function if any. Asynchronous functions usually have a callback function that needs to be invoked once the asynchronous function is done.
Callback functions can be generalized under the “event” terminology. We define an event, and a function to be executed after that event. For the timer case, the event was “time has passed”, but other events can include user input, changes in system state, or messages from other parts of the program.
This is why there is an “Event Queue” in this structure. Node queues the event functions that are ready to be executed. When the timer is ready, Node will queue its callback function into the event queue. Multiple event functions can be queued to later be processed in order (a queue is a first-in/first-out structure).
This is where the event loop comes into action. The event loop is a simple infinite loop, continuously ticking to monitor both the call stack and the event queue. When the call stack is free, and there are queued functions in the event queue, the event loop takes the first function in the queue, and places it on the call stack for V8 to execute it in our program. The event loop keeps doing that until there are no functions left in the event queue, in which case, the Node process will exit.
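To make the queue-and-loop idea more tangible, here’s a toy model written in plain JavaScript. It is not how Node implements its event loop. The eventQueue array and the enqueue and runLoop functions are hypothetical stand-ins that only mirror the simplified description above.

```javascript
// A toy model only: a plain array stands in for the event queue.
const eventQueue = [];
const log = [];

// Queue a callback to run later (first-in/first-out).
function enqueue(callback) {
  eventQueue.push(callback);
}

// Keep taking the first queued callback until the queue is empty.
function runLoop() {
  while (eventQueue.length > 0) {
    const callback = eventQueue.shift();
    callback(); // "placed on the call stack" and executed
  }
}

enqueue(() => log.push('first'));
enqueue(() => {
  log.push('second');
  enqueue(() => log.push('fourth')); // callbacks can queue more callbacks
});
enqueue(() => log.push('third'));

log.push('synchronous code'); // runs before any queued callback
runLoop();

console.log(log);
```

The synchronous line is logged first, then the queued callbacks run in the order they were queued. The callback that was queued from inside another callback goes to the end of the line, and the loop stops once the queue is empty.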
Node CLI has many powerful options that we can control. We can also pass arguments to it, set environment variables before running it, and both of these options allow us to pass data from the operating system environment to a running Node process. Node’s process object is the bridge.
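As a quick reminder of what that bridge looks like in code, here is a minimal sketch. The APP_MODE variable name is hypothetical and only used for this example.

```javascript
// Everything after the script path is a user-supplied argument.
const userArgs = process.argv.slice(2);

// APP_MODE is a hypothetical environment variable used only in this sketch.
const appMode = process.env.APP_MODE ?? "development";

console.log({ userArgs, appMode });
```

Running it with something like APP_MODE=production node script.js one two (where script.js is a placeholder file name) should show both arguments and the mode that was set in the environment.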
Node’s REPL mode is a good way to explore everything you can use in Node, and take a quick look at the API of anything, including core modules, installed modules, and even objects you instantiate.
CommonJS Modules in Node are implicitly wrapped in a function and are passed 5 arguments. We use the require function (which is one of the 5 arguments), to make modules depend on each other and get access to their APIs. Node manages a cache for all required modules. To discover where a required module is, Node follows a predefined set of rules depending on the path of the module. A path can be a relative one, an absolute one, or just a name. For the latter case, Node looks for the module in node_modules folders.
Node’s event loop handles asynchronous tasks using the call stack and event queue. The call stack is a data structure managed by V8 that tracks function calls. Any JavaScript code in Node must be placed in the call stack for V8 to execute it. The event queue is used to handle asynchronous tasks such as timers or I/O operations. When an asynchronous function is ready, its callback function is registered to the event queue. The event loop monitors the call stack and event queue, and when the call stack is free, it pops the first function off the event queue and adds it to the call stack for execution.
In Chapter 1, we briefly learned about Node’s default package manager, npm. It’s now time to take a deeper look and get comfortable finding, using, and creating packages for Node.
The term “package” is what the software world uses to describe a folder that contains code. In Node, that folder will also have a package.json file that describes the metadata and dependencies of the package.
The term “module” refers to a single file or a collection of related files that encapsulate a set of functionality. Modules allow developers to organize their code into separate and reusable units. A Node package often represents a single Node module, but some packages have more than one module.
A package usually refers to “external” code that a project depends on, but I think a better word to describe package code is “generic”. You can make pieces of your own code generic and extract them into a package that you can then use in many projects.
If packages are just folders, why exactly do we need a “package manager” for them?
Keeping track of these package folders becomes challenging when there are many of them, and when these packages depend on other packages. This is especially true for a team of developers working on the same Node project. Package management tools provide a simplified and systematic approach to handling the common tasks around packages. They provide simple commands to install, update, and remove packages, and to ensure that a project has exactly what it needs to function correctly and consistently on all machines that are running it.
More importantly, package management tools can manage any conflicts among all the dependencies in the project, which are usually referred to as “the dependency tree”. It’s a tree because a project has a main list of dependencies, and these main dependencies have their own dependencies, and so on. The term “transitive dependency” is often used to refer to all the dependencies in a project that are beyond the first level of the dependency tree.
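To make the tree idea concrete, here is a small sketch that walks a made-up dependency graph and collects the transitive dependencies. All package names below are hypothetical, and the function is a simplified illustration rather than how npm itself resolves dependencies.

```javascript
// A hypothetical dependency graph: each key lists its direct dependencies.
const graph = {
  "my-app": ["web-framework", "db-client"],
  "web-framework": ["router-lib", "parser-lib"],
  "db-client": ["pool-lib"],
  "router-lib": [],
  "parser-lib": ["bytes-util"],
  "pool-lib": [],
  "bytes-util": [],
};

// Collect every dependency that sits below the first level of the tree.
function transitiveDependencies(root, deps) {
  const direct = new Set(deps[root] ?? []);
  const seen = new Set();
  const stack = [...direct];
  while (stack.length > 0) {
    const name = stack.pop();
    for (const child of deps[name] ?? []) {
      if (!seen.has(child)) {
        seen.add(child);
        stack.push(child);
      }
    }
  }
  // A package that is also a direct dependency is not reported as transitive.
  return [...seen].filter((name) => !direct.has(name)).sort();
}

console.log(transitiveDependencies("my-app", graph));
```

In this graph, the project lists only two dependencies, but four more packages come along with them.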
npm has long been the default tool for managing packages and their dependencies in Node projects, but today, there are a few alternatives, the most popular of which is yarn. npm alternatives have their unique features and advantages. They often offer improvements on performance, disk space usage, and version management. This healthy competition has pushed npm to improve as well. In this book, we’re only covering npm, but you might end up using a different package manager. The basic concepts of package management are all similar, but the command interfaces and what happens behind the scenes are a bit different.
The term “npm” is mainly used to refer to the CLI (npm command) that ships with Node and provides tools to manage Node packages. There is also an npm website (npmjs.com) which hosts the public registry of many open-source npm packages. The npm registry is like a big warehouse full of JavaScript packages, offering many options for common features and functionalities that you might need to add to your projects. For example, if you need your project to handle web requests, web sockets, or connect to a database, you do not need to build these features from scratch or deal with low-level code. You can download and use ready-made and often battle-tested generic solutions from a package registry, and then build your custom needs on top of them.
But are these ready-made solutions to be trusted? You need to be the judge of that, but many of these packages have already established the trust and respect of the JavaScript and Node communities. The code of public npm packages is available for you to inspect, so you can do your own research. There have been bad actors in the space before, so pick your packages carefully and keep an eye on their updates. Even a trusted package might be hacked, but looking at the source code changes, and the activities around the code changes (like GitHub issues, pull requests, etc.) helps mitigate the risk.
The npmjs.com registry is the default registry for the npm command, but npm is highly configurable. You can for example configure it to use a different registry.
Adopting a systematic approach to managing package dependencies is essential for a team project. With npm, because all package dependencies are configured in the project’s package.json file (which is shared among all developers), it becomes easy to set up a new environment, or update an older one. All developers on the team use similar versions for the project packages, and when conflicts happen, they can be detected early.
Packages usually get updated often to fix bugs, add new features, and improve things overall. With a package manager, you’re in control of how to handle these updates. You can specify which exact versions of packages the project needs, to ensure compatibility and prevent conflicts. You can also automate installing important security patches.
The npm project started with a small set of Node scripts to manage common tasks around folders that contain code for Node, and it has since evolved into a fully featured package manager that is useful for all JavaScript code, not just Node. If you browse the packages that are hosted on npmjs.com, you’ll find packages that are for Node and packages that are libraries and frameworks meant to be used in a browser or a mobile application. If you dig deep enough you’ll even see examples of apps for robots, routers, and countless other places where JavaScript can be executed.
The npmjs.com registry has lots of useless packages. Anyone can publish packages, and there is no quality control. Don’t take the presence of a package on that registry as a trust signal. Always do your research and look at how the package is used in other open-source projects, and preferably, inspect its code yourself.
Node packages come in all shapes and sizes. Some represent big frameworks, some represent smaller libraries of a certain utility, and many others provide small isolated utility functions. A typical Node project will have hundreds of npm packages managed under the dependency tree.
The npm CLI is the main tool you need to learn. It’s a powerful one that supports many commands. To see usage instructions and the list of all the available commands, you can run npm --help.
~ $ npm --help
npm <command>
Usage:
npm install install all the dependencies in your project
npm install <foo> add the <foo> dependency to your project
npm test run this project's tests
npm run <foo> run the script named <foo>
npm <command> -h quick help on <command>
npm -l display usage info for all commands
npm help <term> search for help on <term>
npm help npm more involved overview
All commands:
access, adduser, audit, bin, bugs, cache, ci, completion,
config, dedupe, deprecate, diff, dist-tag, docs, doctor,
edit, exec, explain, explore, find-dupes, fund, get, help,
hook, init, install, install-ci-test, install-test, link,
ll, login, logout, ls, org, outdated, owner, pack, ping,
pkg, prefix, profile, prune, publish, rebuild, repo,
restart, root, run-script, search, set, set-script,
shrinkwrap, star, stars, start, stop, team, test, token,
uninstall, unpublish, unstar, update, version, view, whoami
Don’t be overwhelmed by the number of commands you see here. You don’t really need many of them. The commands you’ll use often are install and update. You’ll also probably use run commands like start and test, which we’ll learn about later in the chapter.
Most other commands you’ll use infrequently. Here are a few highlights:
npm init: Initializes a new npm package in a project folder. We used this one in Chapter 1. It asks a few questions about the project, like the name, version, description, and more. It also tries to detect some information about the project and include them as default answers to the interactive questions. It’ll then use the answers to create a package.json file.
npm search <search terms>: Searches the npm registry for packages based on the provided search query. For example, try npm search lodash.
npm list: Displays a tree-like view of installed packages and their dependencies. A common alias here is npm ls.
npm publish: Publishes your package to the npm registry, making it available for others to install. We’ll see an example of that later in the chapter.
npm link: Creates a symbolic link between a package in your local file system and a package installed under node_modules (or globally). This allows you to develop and test packages locally without the need for publishing or reinstalling.
npm cache clean: Clears the npm cache, which can help resolve certain installation issues or outdated package versions. Recent versions of npm require the --force flag to run this command.
You can get further details and instructions on any npm command using npm <command> -h. Here’s an example help summary for the npm install command:
~ $ npm install -h
Install a package
Usage:
npm install [<@scope>/]<pkg>
npm install [<@scope>/]<pkg>@<tag>
npm install [<@scope>/]<pkg>@<version>
npm install [<@scope>/]<pkg>@<version range>
npm install <alias>@npm:<name>
npm install <folder>
npm install <tarball file>
npm install <tarball url>
npm install <git:// url>
npm install <github username>/<github project>
Options:
[-S|--save|--no-save|--save-prod|--save-dev|--save-optional|..
...
aliases: add, i, in, ins, inst, ...
As you can see, we can use the install command in many ways and with many options. It also has many aliases, like i for example (so you can use npm i express to install the express package).
You don’t need to remember all of the usage ways and options, but a quick scan for later reference is certainly helpful. This is actually the summarized version of the install help page. You can see the full help page using npm help install.
Here are a few challenges for you to figure out from the help text of npm install:
Install a package that’s hosted under a scope. An npm scope is a way to group related packages under a specific namespace or organization. An example scope is @babel. An example package under that scope is core.
Install a package directly from GitHub. Try to install lodash from GitHub. To verify, look at the dependencies section of the project’s package.json file. lodash should have a github label.
Install a package globally to make it available to any Node project on the machine. This option is commonly used for command-line tools. For example, you can install the yarn package globally using npm, and that will make the yarn command available everywhere.
Avoid installing npm packages globally unless you really need to. Installing packages globally reduces the modularity of your projects, and can lead to version conflicts between different projects. It can also cause your projects to behave inconsistently across different environments.
The npm update command can be used to update packages listed in package.json to their latest versions, as constrained in the file. To understand that, we first need to learn about Semantic Versioning (or SemVer for short).
npm uses SemVer when it’s time to update packages. Every package has a version; this is one of the required pieces of information about a package. That version in npm is written in the SemVer format. For example, when we installed the lodash package in Chapter 1, the line that was added to the package.json dependencies section was:
"lodash": "^4.17.21"
The 4.17.21 part is the SemVer string and it’s basically a simple contract between a package author and the users of that package. When that number gets bumped up to release a new version of the package, the SemVer string communicates how big of a change that new release will be.
The first number, which is called the MAJOR number, is used to communicate that breaking changes happened in the release. Those are changes that will require users to change their code to make it work with the new release. The next time that happens for lodash, it’ll be released with a SemVer string that begins with 5 instead of 4.
The second number, which is called the MINOR number, is used to communicate that new features were added in a release but older features should still work as is. A minor version release might also include warnings about future deprecations and API changes. Minor version updates should still be backward-compatible and it should be safe for users to update to them without needing to make any changes to their projects.
The last number, which is called the PATCH number, is used to communicate that the release only contains bug fixes and security improvements. They should not introduce any new features or breaking changes.
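The three numbers can be compared mechanically. The sketch below is a simplified illustration that assumes plain MAJOR.MINOR.PATCH strings with no prerelease or build suffixes. The function names are made up, and the target versions in the examples are hypothetical.

```javascript
// Split "MAJOR.MINOR.PATCH" into numbers.
// Assumes plain x.y.z strings with no prerelease or build suffixes.
function parseVersion(version) {
  const [major, minor, patch] = version.split(".").map(Number);
  return { major, minor, patch };
}

// Report the most significant part that differs between two releases.
function bumpType(from, to) {
  const a = parseVersion(from);
  const b = parseVersion(to);
  if (a.major !== b.major) return "major";
  if (a.minor !== b.minor) return "minor";
  if (a.patch !== b.patch) return "patch";
  return "none";
}

// The target versions here are hypothetical.
console.log(bumpType("4.17.21", "5.0.0")); // major
console.log(bumpType("4.17.21", "4.18.0")); // minor
console.log(bumpType("4.17.21", "4.17.22")); // patch
```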
You’ll often see special characters before the version strings in the package.json file. These special characters represent a range of acceptable versions and are put to use when you instruct npm to update your dependency tree.
For example, the tilde (~) character means that an update can install the most recent patch version (remember patch is the third number). The caret (^) character is a more relaxed constraint that means that an update can install the most recent minor version. If we update the lodash package while its version string is “^4.17.21”, it’ll try to find the latest version that begins with the 4 major number. So it might install a 4.19.1 package, but it will not install a 5.1.2 package.
Other special characters are =, >, >=, <, <=. If no special character is used, it means the version to be used should always be the exact one that’s specified by the SemVer string.
Instead of a version string, a * can be used to mean the latest version available.
Another way to specify the version constraint is with an x in the string. For example, a 4.x version string means any version that begins with a 4. A 4.17.x string means any version that begins with 4.17.
You can also manually specify a range using the - character, for example: “4.15.0 - 4.17.0”.
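To get a feel for how such strings constrain versions, here is a rough sketch of a matcher that handles only the *, x, and - forms, and only with complete x.y.z versions. It is a simplified illustration with made-up function names, not the matching logic npm actually uses, which covers many more cases.

```javascript
// Turn "x.y.z" into [x, y, z]. Assumes plain numeric versions.
const toParts = (version) => version.split(".").map(Number);

// Compare two versions: negative, zero, or positive, like a sort comparator.
function compareVersions(a, b) {
  const pa = toParts(a);
  const pb = toParts(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// Supports only "*", x-ranges such as "4.x" or "4.17.x", and "low - high".
function matchesRange(version, range) {
  if (range === "*") return true;
  if (range.includes(" - ")) {
    const [low, high] = range.split(" - ");
    return compareVersions(version, low) >= 0 && compareVersions(version, high) <= 0;
  }
  const wanted = range.split(".");
  const actual = version.split(".");
  // Each listed position must match unless the range has "x" there.
  return wanted.every((part, i) => part === "x" || part === actual[i]);
}

console.log(matchesRange("4.17.21", "4.x")); // true
console.log(matchesRange("4.17.21", "4.15.0 - 4.17.0")); // false
```

The second call returns false because 4.17.21 is newer than the upper end of the range.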
For more details on version strings and for an interactive way to test them, check out semver.npmjs.com. You can enter a version string for a particular npm package to see all the available versions constrained by that string.
I think SemVer is great. Responsible npm developers should respect it when they release new versions of their code, but it’s good to treat what it communicates as a promise rather than a guarantee because even a patch release might leak breaking changes through its own dependencies. A minor version for example might introduce new elements that conflict with elements you previously thought are okay to use. Testing your code is the only way to provide some form of guarantee that it’s not broken after an update.
When the packages your project depends on get updates, you can issue the npm update <package-name> command to update a single package, or the npm update command to update all the packages in the dependency tree.
Let’s simulate a case where an update is going to happen by installing an older version of lodash. To do that, we just specify the exact version we are interested in by adding it after an @ character:
$ npm install lodash@3.9.1
You can verify which version npm installed using the npm ls command. It should be 3.9.1.
Now take a look at package.json and note how the version string starts with a ^ character. This permits npm to update the package to the latest minor version available.
To see what version will be installed using the npm update command, you can first run the npm outdated command. It’ll list all packages and if any of them has a valid update (permitted by the version strings constraints), the updated version will be listed under the “Wanted” column. The output will also include the latest version.
Now because of the ^ constraint, the Wanted version in this case will be 3.10.1. That was the last version released under the 3 major branch.
If you change the ^ into a ~ and run the npm outdated command, the Wanted version will be 3.9.3. That was the last version released under the 3.9 minor branch.
If you change ~ into > and run the npm outdated command, the Wanted version will match the latest one.
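The way a Wanted version falls out of a constraint can be sketched in a few lines. The code below is an illustration under simplifying assumptions: it only understands the ^, ~, and > prefixes, it expects a major version of 1 or more, and its list of available versions contains only the lodash versions mentioned in this chapter, not the full registry history. The function names are made up.

```javascript
const toParts = (version) => version.split(".").map(Number);

// Compare two x.y.z versions like a sort comparator.
const compare = (a, b) => {
  const pa = toParts(a);
  const pb = toParts(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
};

// Handles only "^x.y.z", "~x.y.z", and ">x.y.z" (major version 1 or more).
function satisfies(version, range) {
  const operator = range[0];
  const base = range.slice(1);
  const [major, minor] = toParts(version);
  const [baseMajor, baseMinor] = toParts(base);
  if (operator === ">") return compare(version, base) > 0;
  if (operator !== "^" && operator !== "~") {
    throw new Error(`Unsupported range: ${range}`);
  }
  if (compare(version, base) < 0) return false;
  if (operator === "^") return major === baseMajor;
  return major === baseMajor && minor === baseMinor;
}

// Pick the highest available version that the range permits.
function wantedVersion(available, range) {
  const matches = available.filter((v) => satisfies(v, range)).sort(compare);
  return matches.length > 0 ? matches[matches.length - 1] : null;
}

// Only the versions mentioned in this chapter; the registry has many more.
const available = ["3.9.1", "3.9.3", "3.10.1", "4.17.21"];
console.log(wantedVersion(available, "^3.9.1")); // 3.10.1
console.log(wantedVersion(available, "~3.9.1")); // 3.9.3
console.log(wantedVersion(available, ">3.9.1")); // 4.17.21
```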
The outdated command is like a dry run for you to verify what packages will be updated. It does not do the update. To update, you run the npm update command.
Experiment with the outdated, update, and ls commands with a package like “express” that has its own dependencies. Install an older version of that as well, for example:
$ npm i express@3
Note the version I used there. That 3 is the major version and the syntax here means install the latest “express” version that begins with 3. See what version was installed with the npm ls command.
Now what happens if you change the version string in package.json to something older? For example, change the “express” version string to “~3.10.0”. Since that constraint specifies something older than what you currently have installed, running the npm update command will actually downgrade the express package. Verify that with npm ls.
The update command will update all dependencies, including transitive ones, based on the version strings constraints specified in the package.json files of the packages that depend on them.
To make the outdated command show all the dependencies to be updated, run it with the -a flag:
$ npm outdated -a
Let’s say that we decided we no longer want to use “express”. You can remove it from package.json manually but that will not remove it from node_modules. To remove it from both package.json and node_modules, you can run the npm uninstall <package-name> command. The uninstall command is the better way here.
However, if someone on the team used the uninstall command, and you pulled that code change, all you’re seeing is the line being removed from package.json. The node_modules folder is not usually shared in Git repos. You’ll need to run npm commands to sync your node_modules folder with the updates in package.json.
To simulate that, remove the “express” line from package.json. You now have packages installed but no longer needed (according to package.json). If you run the npm ls command now, it’ll list these packages with an “extraneous” label next to them.
$ npm ls
efficient-node@1.0.0 /Users/samer/efficient-node
├── accepts@1.3.8 extraneous
├── array-flatten@1.1.1 extraneous
├── body-parser@1.20.1 extraneous
...
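The idea behind that label can be sketched as a comparison between what package.json accounts for and what is installed. The code below is a simplified model with made-up data (the installed map lists only a few of the packages involved). It is not how npm itself computes the label.

```javascript
// Find installed packages that cannot be reached from package.json dependencies.
function findExtraneous(manifest, installed) {
  const reachable = new Set();
  const queue = Object.keys(manifest.dependencies ?? {});
  while (queue.length > 0) {
    const name = queue.shift();
    if (reachable.has(name) || !Object.hasOwn(installed, name)) continue;
    reachable.add(name);
    queue.push(...installed[name]);
  }
  return Object.keys(installed)
    .filter((name) => !reachable.has(name))
    .sort();
}

// package.json after the "express" line was removed.
const manifest = { dependencies: { lodash: "^3.9.1" } };

// A partial, hand-written picture of node_modules: package -> its dependencies.
const installed = {
  lodash: [],
  express: ["accepts", "array-flatten", "body-parser"],
  accepts: [],
  "array-flatten": [],
  "body-parser": [],
};

console.log(findExtraneous(manifest, installed));
```

With this data, express and the three packages it brought in are reported, while lodash is not.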
To remove all unused packages from the project, you can use the npm prune command:
$ npm prune
removed 58 packages, and audited 2 packages in 1s
found 0 vulnerabilities
Now if you run the npm ls command again, there should not be any extraneous packages.
To ensure that a project’s dependencies are in sync with changes in package.json, whenever you pull new code and notice changes to package.json, run both the prune and install commands.
However, the npm install command will always install the latest version of a package as permitted by version string constraint. That means between the time a dependency is added by one developer, and another developer pulling the code to install it, a new version of that dependency might have been released, and if the version string specified in package.json allows it, npm install will install that new version, which is different from the one that’s installed on the machine that added the dependency in the first place.
That’s why npm automatically maintains another file in the root of the project, the package-lock.json file. The purpose of that file is to lock versions of packages so that all project developers use the exact same versions of all the packages. This is true for both direct dependencies, and transitive ones.
Every time a dependency is added, updated, or removed, npm will modify the package-lock.json file to describe the entire tree of dependencies (direct and transitive), along with what exact versions to install.
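If you are curious about what the lock file records, the sketch below pulls the exact versions out of a parsed lock object. It assumes the "packages" layout that recent npm versions write, and the sample object is hand-written and heavily trimmed. The function name is made up for this illustration.

```javascript
// Map each installed package location to the exact version in the lock file.
function lockedVersions(lock) {
  const result = {};
  for (const [location, entry] of Object.entries(lock.packages ?? {})) {
    if (location === "") continue; // the empty key describes the project itself
    result[location] = entry.version;
  }
  return result;
}

// A hand-written, heavily trimmed example of the lock file's shape.
const sampleLock = {
  name: "efficient-node",
  lockfileVersion: 3,
  packages: {
    "": { name: "efficient-node", version: "1.0.0" },
    "node_modules/lodash": { version: "4.17.21" },
  },
};

console.log(lockedVersions(sampleLock));
```

In a real project, you would parse the content of package-lock.json and pass the resulting object to the function. Note how the lock entry holds one exact version, not a range like the one in package.json.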
Because the package-lock.json file should be part of the project Git repository for others to use it, its change history can be used to go back to previous states of what was exactly under the node_modules folder.
npm also uses the package-lock.json file to optimize its operations.
Let’s create and then publish a simple npm package that provides a function named printInFrame. That function takes a string argument and outputs that string within a frame made of * characters.
Let’s name the package “print-in-frame”.
Here’s an example of how we’d use it:
import printInFrame from "print-in-frame";
printInFrame("Hello World");
This should output:
***************
* Hello World *
***************
First, make a new folder to host this package code. The name of the folder usually matches the name of the package (although that’s not a requirement):
$ mkdir print-in-frame && cd print-in-frame
Next step is to make this empty folder into an npm package. We do that by adding a package.json file. We can use npm init for that.
$ npm init
Answer the questions and confirm. You can use the default answers. After the file is created, manually add the "type": "module" property to it to instruct Node that this project will exclusively use ES modules.
Open up your code editor on this folder. Then create an index.js file in the root of the project, and define an empty printInFrame function in there, and make it the default export:
const printInFrame = (text) => {
// ...
};
export default printInFrame;
To write an implementation for printInFrame, let’s start with a test. Node has a few built-in tools to write and run tests.
Create an index.test.js file in the root of the project, and start it by importing the Node test and assert objects, and the printInFrame function that we need to test:
import test from "node:test";
import assert from "node:assert/strict";
import printInFrame from "./index.js";
The “node:test” module provides a way to organize your tests and describe them. The “node:assert” module provides assertion methods to implement the logic of the tests.
Here’s what I came up with to implement the “Hello World” test:
// ...
const output = printInFrame("Hello World");
const expectedOutput = `
***************
* Hello World *
***************
`.trim();
test("printInFrame", (t) => {
assert.equal(output, expectedOutput);
});
To run tests with Node, you can use the --test argument:
$ node --test
This will run all tests by scanning through all files to locate the ones named using certain patterns. I like to use the file.test.js pattern. You can also add the --watch flag to make Node rerun the tests every time you change the code.
The one test we have here should obviously fail. To implement the printInFrame function, we basically need to read the length of the text and use that to print a set of * characters before and after. This can be done in many ways. Here’s what I did:
import times from "lodash.times";
const printInFrame = (text) => {
const frameWidth = text.length + 4; // 2 stars + 2 spaces
let textToPrint = "";
times(frameWidth, () => (textToPrint = textToPrint + "*"));
textToPrint = textToPrint + "\n" + "* " + text + " *" + "\n";
times(frameWidth, () => (textToPrint = textToPrint + "*"));
console.log(textToPrint);
return textToPrint;
};
export default printInFrame;
I made the function depend on “lodash.times” which provides a function that can repeat a block of code any number of times. I used that to prepare the frame header and footer lines.
You need to npm install lodash.times. After that, running the test again should make it pass.
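As one of the many other ways, here is a dependency-free variation that builds the frame with the built-in String.prototype.repeat method. This is only an alternative sketch (the frameText name is made up), and the rest of the chapter continues with the lodash.times version, which gives the package a dependency of its own.

```javascript
// Build the framed string with String.prototype.repeat instead of a helper library.
function frameText(text) {
  const border = "*".repeat(text.length + 4); // 2 stars + 2 spaces
  return [border, `* ${text} *`, border].join("\n");
}

console.log(frameText("Hello World"));
```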
To use the “print-in-frame” package in a Node project, we need to install it. We can actually install it directly from the file system:
$ npm install ../print-in-frame
While this works okay, when you share your code with others, you’ll have to share the “print-in-frame” folder as well. To keep them separate, we’ll need to use an npm registry and publish the package there.
If you want to publish your package on npmjs.com, you need to have an account there. Then you can use the npm login command to authenticate your local npm client with your account. It’ll ask you for your username and password.
Since the package name is unique at the npm registry, to avoid conflict, add a prefix to your package name. I changed the name property in package.json to “samer-print-in-frame”. While you’re there, add a description to the package as well. It’s optional, but it makes the package easier to discover.
When you’re ready, run the npm publish command. If everything works, your package will be available at npmjs.com (use the UI search there to find it). You can also use the npm search command to find it.
With the package published, in your main Node project, install it with: npm install PREFIX-print-in-frame, replacing “PREFIX” with the prefix that you used.
Now look at the output of npm ls --all. You should see 2 new dependencies: your prefixed print-in-frame package, and lodash.times nested under it.
Now to make updates to your package and test them in a project before you publish a new version, you can use the npm link command to temporarily make a project use a local package rather than the one installed through the registry. In the print-in-frame folder, run npm link, then in the main project folder, run npm link PREFIX-print-in-frame.
Now you can make changes to your local package folder and test them in your main project. Once you’re done, you can increment the version property in the package.json file under your package, and run npm publish again.
I used ES modules for print-in-frame. This means it can only be used under projects that use ES modules. If you want to create a package that can be used in any Node project, you’ll need to create a CommonJS version as well. You can use tools like Babel or TypeScript to automate tasks like these.
npm run scripts are a feature in npm that enables developers to easily perform (or automate) common tasks like building, testing, and deploying applications.
You can define a run script under the “scripts” section in package.json. When you run the npm init command, it’ll include an example run script:
"scripts": {
"test": "echo \"Error: no test specified\" && exit 1"
},
You can use that “test” script by running npm run test. A few common run script names (like “test”, “start”, “stop”) have a shortcut alias as well. You can run the “test” script here with just npm test.
If you run the npm run command without any arguments, it’ll list all defined scripts under the project.
The sample “test” script just outputs an error message, but note how it used shell commands like echo and exit. You can use any of the shell commands available on your machine. For example, try a script to ls -al or to npm ls | grep 'extraneous'. The latter is a good example of how a common project task can be simplified into a run script, and documented for other team members who don’t know about it. What’s a good intuitive name for that task? Maybe “list-unused-packages”?
"scripts": {
...
"list-unused-packages": "npm ls | grep 'extraneous'"
},
Now a developer who does not know about this extraneous label, can look at this run script and figure out how to list any unused packages in the project. They just need to npm run list-unused-packages.
This becomes more important when you publish packages for other teams to use. npm run scripts are the best place to communicate to developers using your packages how to use them.
npm run scripts help developers automate running tasks. First, if you need to run something repeatedly for the project, for example, run all integration tests, you’ll have a simple and intuitive way of doing it, rather than trying to figure out the exact command every time. More importantly, an npm run script will make running this task consistent among all developers. All developers should be using the exact same command to run all integration tests. Even more importantly, if the way to run all integration tests needs to change, instead of manually announcing this change in a chat channel, you can communicate it with a change to package.json, that’s forever kept in the project’s Git history.
You can even make the automation official and adopt a way to run tasks automatically before or after other tasks. For example, I often forget to run npm prune && npm install after pulling new code and trying to run all tests. An npm run script can be used to automatically run the pruning and installing every time you run the tests.
To do that, you can define script names using a pre or post prefix. For this example, we can define a “pretest” script to prune/install:
"scripts": {
...
"pretest": "npm prune && npm install"
},
With that special script in place, every time you run npm test, the prune/install commands will be executed before running the tests.
This works with any script name. If you have a script named “dosomething”, you can define the “predosomething” and “postdosomething” scripts to execute tasks before or after you run “dosomething”.
This is great for many use cases. To name a few examples, you can automate running tests before you can push new code, formatting/linting/compiling code, or generating documentation.
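The naming convention is simple enough to model in a few lines. The sketch below is a simplified illustration of the ordering, not npm’s actual code. The executionOrder function is made up, and the "node --test" value for the “test” script is an assumption for this example.

```javascript
// Return the script names that would run, in order, for a given script.
// A simplified model of the pre/post convention, not npm's actual code.
function executionOrder(name, scripts) {
  return [`pre${name}`, name, `post${name}`].filter((candidate) =>
    Object.hasOwn(scripts, candidate)
  );
}

// The "test" command below is an assumption for this sketch.
const scripts = {
  pretest: "npm prune && npm install",
  test: "node --test",
  lint: "eslint",
};

console.log(executionOrder("test", scripts)); // [ 'pretest', 'test' ]
console.log(executionOrder("lint", scripts)); // [ 'lint' ]
```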
One other cool thing about npm run scripts is that they’ll execute any command-line tools installed under the project. You don’t need to explicitly specify the path to these commands.
For example, run npm i eslint under the project to install the eslint command-line tool.
Now if you’re in the project folder, and you try to execute the eslint command, it would not be available. That command is somewhere under the node_modules folder, but npm does not make it globally available. However, npm run scripts recognize them. To test that, add the following script:
"scripts": {
...
"lint": "eslint"
},
Now you can npm run lint and npm will find the eslint command and execute it. You can even include arguments after a -- separator and npm will pass them to what you’re executing. Try:
$ npm run lint -- --help
I named the script “lint” (instead of “eslint”) intentionally. Generic names are better under npm scripts. Maybe in the future we’ll use something other than eslint to lint. Changing a run script name might break things in the future, especially automated tasks.
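The reason a run script can find eslint is that npm adds the project’s node_modules/.bin folder to the PATH that it gives to scripts. The sketch below is a rough model of that idea only. It assumes a POSIX-style system, the function name is made up, and npm’s real behavior has more detail than this.

```javascript
// Build a PATH value in which the project's local executables come first.
// Assumes a POSIX-style system (":" separator, "/" in paths).
function pathWithLocalBin(projectDir, currentPath) {
  const localBin = `${projectDir}/node_modules/.bin`;
  return currentPath ? `${localBin}:${currentPath}` : localBin;
}

console.log(
  pathWithLocalBin("/Users/samer/efficient-node", "/usr/local/bin:/usr/bin")
);
```

Because the local folder comes first, a tool installed in the project is found before any tool with the same name installed elsewhere on the machine.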
If you just need to execute a command-line tool that’s installed in the project for one time, you can also use the npx command. For example, running npx eslint --help will also work.
Executing local command-line tools is just one of the many use cases for npx. You can actually use npx to execute a remote command-line tool as well. If you npm uninstall eslint from the project and then run npx eslint --help again, it would still work. npx will automatically install a temporary copy of eslint to use.
You can even use npx with specific versions. For example, let’s say that you need to find out which of eslint options (which you can see in the help page) existed early on, since the first available version of eslint.
You can use the npm view command to find out the earliest available version of eslint:
npm view eslint versions
When I tested this command, the earliest version of eslint was 0.4.0. Note that an earlier version might have been available but the maintainers of eslint decided to purge it from the registry.
To see the help page of the 0.4.0 eslint command, you can run npx eslint@0.4.0 --help.
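If you prefer to pick the earliest version programmatically, npm view accepts a --json flag that prints the version list as a JSON array. The sketch below works on a small made-up sample rather than the real registry output, and it assumes that prerelease versions can be ignored; earliestVersion is a hypothetical helper.

```javascript
// Hypothetical helper: find the lowest "major.minor.patch" version in a list.
// Assumption: prerelease versions (anything with a suffix) are skipped.
function earliestVersion(versions) {
  const stable = versions.filter((v) => /^\d+\.\d+\.\d+$/.test(v));
  const toNumbers = (v) => v.split('.').map(Number);
  const compare = (a, b) => {
    const [x, y] = [toNumbers(a), toNumbers(b)];
    // Compare major first, then minor, then patch.
    return x[0] - y[0] || x[1] - y[1] || x[2] - y[2];
  };
  // Copy before sorting so the caller's array is left untouched.
  return [...stable].sort(compare)[0];
}

// Made-up sample; real data would come from: npm view eslint versions --json
const sample = ['0.10.0', '0.4.0', '1.0.0-rc-1', '0.4.2', '1.0.0'];

console.log(earliestVersion(sample)); // prints 0.4.0
```

Comparing the parts as numbers matters here: a plain string sort would place 0.10.0 before 0.4.0.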
npx is commonly used to bootstrap a project from a template. An example of a package that can be used that way is create-react-app. You can use it through the npx command to generate a working React application using one of the many supported templates:
npx create-react-app your-app-name-here
Not only will this download a temporary copy of the “create-react-app” package, it will also run the executable that the package declares through the bin field of its package.json. For a “generator” package like this one, that default command creates a project.
Generator packages can even have multiple commands. Check out the help page for the @vue/cli generator package:
npx @vue/cli --help
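Which executable counts as the default is decided from the package’s bin field. The sketch below approximates the selection rule as I understand it from the npm exec documentation: a single bin entry wins, and with several entries the one matching the unscoped package name is used. It is a simplification (for example, it ignores aliases of the same command), and both sample packages are hypothetical.

```javascript
// Simplified sketch of choosing which executable to run for a package.
// `defaultCommand` and both sample packages are hypothetical.
function defaultCommand(pkg) {
  // Drop the scope, so "@scope/name" becomes "name".
  const unscopedName = pkg.name.split('/').pop();
  // A string "bin" means one executable, named after the package.
  if (typeof pkg.bin === 'string') return unscopedName;
  const commands = Object.keys(pkg.bin ?? {});
  // A single entry is unambiguous.
  if (commands.length === 1) return commands[0];
  // With several entries, prefer the one named like the package.
  return commands.includes(unscopedName) ? unscopedName : undefined;
}

const generator = {
  name: 'create-my-app',
  bin: { 'create-my-app': './index.js' },
};
const toolkit = {
  name: '@my-scope/toolkit',
  bin: { toolkit: './cli.js', 'toolkit-dev': './dev.js' },
};

console.log(defaultCommand(generator)); // prints create-my-app
console.log(defaultCommand(toolkit)); // prints toolkit
```

When the function returns undefined, there is no obvious default, and you would need to tell npx which command to run.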
A package manager like npm is an important part of working on a Node project. It introduces a simple and standard way to deal with a project’s external dependencies and keep them updated, consistent, and conflict-free.
npm packages are hosted on a public registry, and the npm command is configured to work with that registry by default. Commands like install, update, and search all communicate with it.
The package.json and package-lock.json files are automatically modified by npm every time there is a change to the project dependency tree. These files store what versions of packages are installed and what range of versions to use when updating packages.
In addition to the npm command, there’s also an npx command that can be used to execute local or remote command-line tools.