Streams process data incrementally instead of loading it all into memory at once: reading a 1GB file with readFileSync consumes 1GB of RAM, while a stream processes it in small chunks.
const fs = require('fs');
// Without streams — loads entire file into memory
const data = fs.readFileSync('large-file.txt');
// With streams — processes in chunks
const stream = fs.createReadStream('large-file.txt');
stream.on('data', (chunk) => console.log(chunk.length));
stream.on('end', () => console.log('Done'));
Why it matters: Processing multi-gigabyte files, serving large downloads, or handling high-throughput network connections requires streams; without them, Node.js would need to buffer entire payloads in memory, leading to out-of-memory crashes under load.
Real applications: Video streaming services pipe file read streams directly to HTTP responses; data ETL pipelines chain transform streams to parse, filter, and write CSV records without loading the full file; real-time log processing uses readable streams with transform stages.
Common mistakes: Concatenating chunks inside a data event handler into a string instead of using pipeline() — this defeats the memory benefits of streaming and can cause encoding issues when chunks split multi-byte characters at chunk boundaries.
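One way to avoid the multi-byte pitfall while keeping the memory benefit is to let the stream decode for you with setEncoding(); a minimal sketch (file name illustrative):
const fs = require('fs');
// setEncoding() uses a StringDecoder internally, so multi-byte characters
// split across chunk boundaries are never broken apart.
const stream = fs.createReadStream('large-file.txt');
stream.setEncoding('utf8');
stream.on('data', (chunk) => {
  // chunk is a decoded string; process it incrementally instead of accumulating
  process.stdout.write(chunk.toUpperCase());
});
stream.on('error', (err) => console.error('Read failed:', err));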
Node.js has four fundamental stream types: Readable (a data source, e.g., fs.createReadStream), Writable (a destination, e.g., fs.createWriteStream), Duplex (both readable and writable, e.g., TCP sockets), and Transform (a duplex stream that modifies data passing through, e.g., zlib.createGzip). Each type extends EventEmitter and emits standard events (data, end, error, finish); understanding which type to use determines whether you implement _read(), _write(), or _transform().
const { Transform } = require('stream');
const upperCase = new Transform({
transform(chunk, encoding, callback) {
callback(null, chunk.toString().toUpperCase());
}
});
Why it matters: Choosing the wrong stream type leads to unnecessary complexity — a Transform when you need a Readable, or a Duplex when Transform suffices; knowing each type's contract (what methods to implement) directly impacts performance and correctness.
Real applications: HTTP servers use Readable (request) and Writable (response) streams; WebSocket connections are Duplex streams; gzip compression, CSV parsing, and encryption are implemented as Transform streams that can be plugged into any pipeline.
Common mistakes: Implementing a Duplex when you only need a Transform — Transform is simpler because it handles the relationship between the read and write sides internally, while Duplex requires you to manage them independently.
pipe() connects a readable stream to a writable stream, handling data flow and backpressure automatically so the producer never overwhelms the consumer. It returns the destination stream, which enables chaining multiple pipe calls together to build processing pipelines for compression, encryption, or format conversion. This pattern is used extensively for file processing, HTTP responses, and data transformation without loading entire payloads into memory.
const fs = require('fs');
const zlib = require('zlib');
// Simple pipe: read → write
fs.createReadStream('input.txt')
.pipe(fs.createWriteStream('output.txt'));
// Chained pipes: read → gzip → write
fs.createReadStream('data.txt')
.pipe(zlib.createGzip())
.pipe(fs.createWriteStream('data.txt.gz'));
// Pipe to HTTP response
app.get('/file', (req, res) => {
fs.createReadStream('large.pdf').pipe(res);
});
Why it matters: pipe() is the simplest way to connect streams but has a critical flaw — errors in intermediate streams are not forwarded to the destination, so a failed transform leaves the writable stream open and leaking file handles or network connections.
Real applications: Serving a compressed file download chains fs.createReadStream().pipe(zlib.createGzip()).pipe(res); however, production code uses pipeline() from stream/promises so that any failure automatically destroys all streams and triggers cleanup.
Common mistakes: Using pipe() without attaching an error event handler on each intermediate stream — an unhandled error event causes an uncaught exception and crashes the process; prefer pipeline() which handles error propagation and cleanup automatically.
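If pipe() is used directly, every stage needs its own error handler and cleanup; a sketch of the chained compression example with explicit handlers (file names illustrative):
const fs = require('fs');
const zlib = require('zlib');
const source = fs.createReadStream('data.txt');
const gzip = zlib.createGzip();
const dest = fs.createWriteStream('data.txt.gz');
// pipe() does not forward errors, so each stream needs its own handler and must
// destroy the others to release file descriptors
source.on('error', (err) => { console.error('read failed:', err); gzip.destroy(); dest.destroy(); });
gzip.on('error', (err) => { console.error('gzip failed:', err); source.destroy(); dest.destroy(); });
dest.on('error', (err) => { console.error('write failed:', err); source.destroy(); gzip.destroy(); });
source.pipe(gzip).pipe(dest);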
Backpressure keeps a fast producer from overwhelming a slow consumer: the readable side is paused when write() returns false and resumed on the drain event when the buffer clears.
const fs = require('fs');
const readable = fs.createReadStream('large-file.txt');
const writable = fs.createWriteStream('output.txt');
readable.on('data', (chunk) => {
const canContinue = writable.write(chunk);
if (!canContinue) {
readable.pause(); // Stop reading until drain
}
});
writable.on('drain', () => {
readable.resume(); // Resume reading
});
Why it matters: Ignoring backpressure is one of the most common causes of memory leaks in Node.js stream applications — a fast disk read piped to a slow network write will buffer gigabytes in memory before the slow consumer can process it, eventually crashing the process.
Real applications: Uploading to a slow S3 endpoint while reading from a fast local disk requires backpressure-aware writing; pipeline() handles this automatically; database write streams implement backpressure by pausing the input until the current batch commit completes.
Common mistakes: Ignoring the return value of writable.write() — when it returns false the internal buffer is full and you must stop writing until the drain event fires; continuing to write after a false return bypasses backpressure and leads to unbounded memory growth.
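As noted above, pipeline() takes care of this pause/resume bookkeeping automatically; a minimal sketch of the equivalent copy using the same illustrative file names:
const { pipeline } = require('stream/promises');
const fs = require('fs');
async function copyFile() {
  // pipeline() applies backpressure between source and destination and
  // destroys both streams if either one errors
  await pipeline(
    fs.createReadStream('large-file.txt'),
    fs.createWriteStream('output.txt')
  );
}
copyFile().catch(err => console.error('Copy failed:', err));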
Buffer represents a fixed-length sequence of raw bytes allocated outside the V8 heap; it is how Node.js handles binary data from files, sockets, and crypto APIs, with methods to create, read, write, and compare byte sequences.
// Create buffers
const buf1 = Buffer.from('Hello');
const buf2 = Buffer.alloc(10); // 10 zero-filled bytes
const buf3 = Buffer.from([72, 101]); // from byte array
// Read/write
console.log(buf1.toString()); // 'Hello'
console.log(buf1.length); // 5
console.log(buf1[0]); // 72 (ASCII 'H')
// Compare
buf1.equals(Buffer.from('Hello')); // true
Why it matters: Every file read, network packet, and TLS handshake in Node.js involves Buffers; mishandling binary data by converting to strings with the wrong encoding corrupts binary files, image data, or encrypted payloads in ways that are hard to debug.
Real applications: Image processing reads files into Buffers before passing to Sharp or Jimp; JWT tokens decode the Base64url payload using Buffer.from(token, 'base64url').toString('utf8'); binary protocol implementations (MQTT, Redis) parse packet headers by reading specific byte offsets from Buffers.
Common mistakes: Using Buffer.concat() to assemble chunks inside a data event handler without tracking the total length — Buffer.concat(chunks) in the loop is O(n²); collect all chunks in an array and concat once at the end, or use stream.pipeline() with a writable that handles accumulation.
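A sketch of the concat-once pattern the note recommends — suitable only for bounded inputs, since it still buffers the whole payload:
const fs = require('fs');
function readWholeFile(path) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    const stream = fs.createReadStream(path);
    stream.on('data', (chunk) => chunks.push(chunk)); // O(1) per chunk
    stream.on('end', () => resolve(Buffer.concat(chunks))); // single O(n) concat
    stream.on('error', reject);
  });
}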
stream.pipeline() connects streams and provides proper error handling and automatic cleanup as a drop-in replacement for chained .pipe() calls. If any stream in the pipeline errors, all streams are automatically destroyed to prevent file handle and memory leaks. The promise-based version from stream/promises integrates cleanly with async/await, making stream pipelines as straightforward to write as synchronous code.
const { pipeline } = require('stream/promises');
const fs = require('fs');
const zlib = require('zlib');
// Promise-based pipeline
async function compress(input, output) {
await pipeline(
fs.createReadStream(input),
zlib.createGzip(),
fs.createWriteStream(output)
);
console.log('Compression complete');
}
compress('data.txt', 'data.txt.gz')
.catch(err => console.error('Pipeline failed:', err));
Why it matters: Without pipeline(), an error in one stream of a pipe() chain leaves the other streams open and still consuming file descriptors — this kind of resource leak is silent, cumulative, and eventually causes EMFILE (too many open files) errors under load.
Real applications: Every file compression, upload processing, or ETL pipeline in production Node.js uses pipeline() from stream/promises; the async/await version makes it easy to handle errors with try/catch and log exactly which step of the pipeline failed.
Common mistakes: Using require('stream').pipeline (the callback version) and forgetting to handle the error in the callback — the callback form requires explicit error handling; prefer the promise version from require('stream/promises') so errors are handled by the surrounding try/catch.
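For completeness, here is how the callback form mentioned above looks when the error is actually handled — a sketch reusing the same illustrative file names:
const { pipeline } = require('stream');
const fs = require('fs');
const zlib = require('zlib');
pipeline(
  fs.createReadStream('data.txt'),
  zlib.createGzip(),
  fs.createWriteStream('data.txt.gz'),
  (err) => {
    // The final callback receives the first error from any stage
    if (err) console.error('Pipeline failed:', err);
    else console.log('Compression complete');
  }
);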
Custom readable streams are built by extending the Readable class and implementing the _read() method, which is called automatically by the stream machinery when the consumer needs more data. You push data chunks into the internal buffer using this.push(chunk) and signal the end of the stream with this.push(null); the consumer receives chunks via data events or async iteration. The highWaterMark option controls how many bytes are buffered before backpressure is applied to the producer.
const { Readable } = require('stream');
class Counter extends Readable {
constructor(max) {
super();
this.current = 0;
this.max = max;
}
_read() {
if (this.current <= this.max) {
this.push(String(this.current++) + '\n');
} else {
this.push(null); // Signal end of stream
}
}
}
const counter = new Counter(5);
counter.pipe(process.stdout); // 0 1 2 3 4 5
Why it matters: Custom Readable streams are needed when consuming non-standard data sources like database cursor results, sensor data, computed sequences, or paginated API responses that don't map to file or network streams Node.js provides out of the box.
Real applications: Database cursor streams push query result rows one at a time so large result sets can be piped to CSV writers without buffering all rows in memory; timer-based streams push data on intervals for server-sent events or metrics broadcasting.
Common mistakes: Calling this.push() in a tight loop inside _read() without checking its return value — this floods the internal buffer and ignores backpressure; stop pushing as soon as push() returns false (or simply push one chunk per call) and wait for _read() to be called again when the consumer needs more.
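As a sketch of the database-cursor case mentioned above — assuming a hypothetical cursor object whose async next() resolves to a row, or to null when exhausted — each _read() call fetches and pushes a single row:
const { Readable } = require('stream');
class CursorStream extends Readable {
  constructor(cursor) {
    super({ objectMode: true });
    this.cursor = cursor; // assumed: async next() returning a row or null
  }
  _read() {
    // _read() is not called again until push() runs, so fetches never overlap
    this.cursor.next()
      .then(row => this.push(row))      // push(null) when the cursor is exhausted ends the stream
      .catch(err => this.destroy(err)); // surface errors instead of swallowing them
  }
}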
Custom transform streams extend the Transform class and implement the _transform() method, which receives each incoming chunk, processes it, and optionally pushes output using this.push() or the callback's second argument. Setting objectMode: true allows the stream to pass JavaScript objects instead of strings or Buffers, enabling object-to-object transformations in a pipeline. The _flush() method handles any data that must be pushed after the last chunk is processed, such as closing delimiters or aggregated summaries.
const { Transform } = require('stream');
class CSVToJSON extends Transform {
constructor() {
super({ objectMode: true });
this.headers = null;
}
_transform(chunk, encoding, callback) {
// Assumes each incoming chunk is a single CSV line (e.g., emitted by an upstream line splitter)
const line = chunk.toString().trim();
if (!this.headers) {
this.headers = line.split(',');
} else {
const values = line.split(',');
const obj = {};
this.headers.forEach((h, i) => obj[h] = values[i]);
this.push(JSON.stringify(obj) + '\n');
}
callback();
}
}
Why it matters: Transform streams enable reusable, composable data processing stages that can be plugged into any pipeline — a single CSV-to-JSON Transform built once can be used with any readable data source and any writable destination without code duplication.
Real applications: Data ETL pipelines chain CSV parsing, validation, and database write transforms; compression middleware wraps responses in a gzip transform before sending to clients; logging transforms add timestamps and severity to structured log objects before serializing to JSON.
Common mistakes: Not calling callback() in _transform() when a chunk is intentionally filtered out (not pushed) — the stream stalls waiting for the callback and stops processing any further chunks, causing a pipeline deadlock that is difficult to diagnose.
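The CSV example above assumes each chunk arrives as a complete line; a sketch of an upstream line splitter shows both the always-call-the-callback rule and the _flush() hook described earlier (names are illustrative):
const { Transform } = require('stream');
class LineSplitter extends Transform {
  constructor() {
    super({ readableObjectMode: true }); // each pushed line stays a discrete chunk downstream
    this.remainder = '';
  }
  _transform(chunk, encoding, callback) {
    const lines = (this.remainder + chunk.toString()).split('\n');
    this.remainder = lines.pop(); // last element may be an incomplete line
    for (const line of lines) this.push(line);
    callback(); // always call the callback, even when nothing was pushed
  }
  _flush(callback) {
    // Emit whatever is left after the final chunk
    if (this.remainder) this.push(this.remainder);
    callback();
  }
}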
HTTP responses in Node.js are writable streams, so large files can be served by piping a file read stream into the response with pipe() or pipeline(); the response falls back to chunked transfer encoding when no Content-Length is set, and the same pattern extends to Range requests by opening the read stream with start and end offsets. Streaming is especially important for video, audio, and large file downloads where loading the full content first would exhaust server memory.
const fs = require('fs');
const path = require('path');
app.get('/download/:file', (req, res) => {
const filePath = path.join(__dirname, 'files', req.params.file);
const stat = fs.statSync(filePath);
res.set({
'Content-Type': 'application/octet-stream',
'Content-Length': stat.size,
'Content-Disposition': `attachment; filename="${req.params.file}"`
});
fs.createReadStream(filePath).pipe(res);
});
// Stream JSON array
app.get('/users', (req, res) => {
const cursor = User.find().cursor();
cursor.pipe(new JSONArrayTransform()).pipe(res);
});
Why it matters: Streaming HTTP responses lets a 10GB video file be served by a Node.js process using only megabytes of RAM; without streaming, res.send(bigBuffer) would require the full file to be in memory simultaneously for every concurrent download request.
Real applications: File download endpoints pipe fs.createReadStream through optional transform stages (gzip, encryption) to the res writable stream; video-on-demand endpoints use range request headers to seek to specific byte positions and stream only the requested segment.
Common mistakes: Not handling errors on the read stream when piping to a response — if the file is deleted mid-transfer, the read stream emits an error that is not automatically propagated to the response stream, leaving the connection hung; use pipeline() so errors close the response properly.
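A sketch of the same download route rewritten with pipeline() so a mid-transfer failure destroys both the file stream and the response (Express-style handler as in the example above, paths illustrative):
const { pipeline } = require('stream/promises');
const fs = require('fs');
const path = require('path');
app.get('/download/:file', async (req, res) => {
  const filePath = path.join(__dirname, 'files', req.params.file);
  try {
    await pipeline(fs.createReadStream(filePath), res);
  } catch (err) {
    // pipeline() has already destroyed both streams; report if nothing was sent yet
    console.error('Download failed:', err);
    if (!res.headersSent) res.status(500).end('Download failed');
  }
});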
Buffer.alloc() creates a zero-filled buffer, guaranteeing that no old data from previous memory allocations is exposed, while Buffer.allocUnsafe() skips initialization for better performance but may contain sensitive residual data. The key tradeoff is security vs. speed — uninitialized buffers can inadvertently expose passwords, keys, or other secrets that were previously stored in the reused memory region. Always prefer alloc() by default and only use allocUnsafe() when performance profiling proves it necessary and you immediately overwrite the entire buffer.
// Safe — filled with zeros
const safe = Buffer.alloc(10);
console.log(safe); // <Buffer 00 00 00 00 00 00 00 00 00 00>
// Unsafe — may contain old data, faster
const unsafe = Buffer.allocUnsafe(10);
// Must fill before reading to avoid data leaks
unsafe.fill(0);
// From existing data (always safe)
const buf = Buffer.from('Hello World');
Why it matters: In 2016, a widely publicized vulnerability in some npm packages demonstrated that Buffer(size) (the old unsafe constructor, now deprecated) could return uninitialized memory containing sensitive data from other parts of the app — this is why the safe alloc() API was introduced.
Real applications: Cryptographic operations that pre-allocate a buffer for a random key should always use Buffer.alloc() before filling with crypto.randomFillSync(); video chunk assemblers in tight loops may use allocUnsafe after benchmarking confirms it provides meaningful throughput improvement.
Common mistakes: Using the deprecated new Buffer(size) constructor, which behaved like allocUnsafe and now emits a runtime deprecation warning — always use the explicit Buffer.alloc() or Buffer.allocUnsafe() factory methods so the intent and safety guarantee are clear.
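A minimal sketch of the key-allocation pattern from the applications note above:
const crypto = require('crypto');
// Zero-filled allocation, then fully overwritten with random bytes
const key = Buffer.alloc(32);
crypto.randomFillSync(key);
// key now holds 256 bits of cryptographically secure random data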
Readable streams can be consumed with for await...of loops, eliminating the need to manually manage data, end, and error event listeners. Every readable stream in Node.js implements the async iterable protocol since Node.js 10, making streams first-class citizens in async/await code without any adapter layer. Backpressure is handled automatically — the stream pauses while the loop body executes and resumes when the iteration moves to the next chunk.
const fs = require('fs');
const readline = require('readline');
// Async iteration over a readable stream
async function processFile(filePath) {
const stream = fs.createReadStream(filePath, { encoding: 'utf8' });
for await (const chunk of stream) {
console.log('Chunk:', chunk.length, 'bytes');
}
console.log('Stream finished');
}
// Line-by-line reading with readline
async function readLines(filePath) {
const rl = readline.createInterface({
input: fs.createReadStream(filePath),
crlfDelay: Infinity
});
let lineNum = 0;
for await (const line of rl) {
lineNum++;
console.log(lineNum + ':', line);
}
}
Why it matters: Event-listener-based stream consumption requires careful management of multiple events (data, end, error, close) and manual error handling; async iteration replaces all of this with a single try/catch around a for await loop that is far easier to read and maintain.
Real applications: Line-by-line file processing using readline.createInterface as an async iterable; streaming database cursor results with for await (const row of cursor); consuming Server-Sent Events or WebSocket messages in a maintainable loop.
Common mistakes: Not wrapping the for await loop in a try/catch — stream errors are thrown as exceptions from the loop and without a catch block they become unhandled promise rejections; always catch errors from async iteration to allow proper cleanup and error reporting.
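A sketch of that error handling, wrapping the chunk loop from the example above in try/catch:
const fs = require('fs');
async function processFileSafely(filePath) {
  const stream = fs.createReadStream(filePath, { encoding: 'utf8' });
  try {
    for await (const chunk of stream) {
      console.log('Chunk:', chunk.length, 'bytes');
    }
  } catch (err) {
    // Stream errors surface here as thrown exceptions instead of unhandled rejections
    console.error('Stream failed:', err);
  }
}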
highWaterMark sets the size of a stream's internal buffer — the threshold at which a Readable stops pulling from its source and at which a Writable's write() returns false. The default is 16KB for byte streams and 16 objects for object-mode streams (fs.createReadStream uses a larger 64KB default); tuning this value directly affects memory usage and I/O throughput.
const fs = require('fs');
const { Readable } = require('stream');
// Custom highWaterMark for reading large files
const stream = fs.createReadStream('large.log', {
highWaterMark: 256 * 1024 // 256KB chunks instead of the 64KB fs default
});
// Object mode stream with custom highWaterMark
const objectStream = new Readable({
objectMode: true,
highWaterMark: 100, // Buffer up to 100 objects
read() {}
});
// Writable stream — write() returns false when its buffer is full
const writable = fs.createWriteStream('output.txt', {
  highWaterMark: 32 * 1024 // 32KB write buffer
});
async function writeChunk(data) {
  const canWrite = writable.write(data);
  if (!canWrite) {
    // Buffer full — wait for the drain event before writing more
    await new Promise(resolve => writable.once('drain', resolve));
  }
}
Why it matters: Setting highWaterMark too high causes large memory allocations per stream connection; setting it too low increases the number of system calls per operation; finding the right balance for your specific I/O pattern (file size, network speed, CPU cost) is key to optimal throughput.
Real applications: Log file processors increase highWaterMark to 64KB for faster disk reads; database batch inserters use object mode with highWaterMark: 500 to collect rows before flushing; real-time streaming applications use small buffers (1KB) to minimize latency between production and consumption.
Common mistakes: Setting a very high highWaterMark on a server that handles many concurrent connections — if each connection buffers 10MB, 1000 concurrent connections will allocate 10GB just in stream buffers; keep the default unless you have measured that a different value improves performance.
Custom writable streams extend the Writable class and implement the _write() method, which receives each chunk and must call the callback to signal completion before the next chunk is delivered. The optional _final() method runs before the stream closes, making it ideal for flushing remaining data, committing transactions, or closing external connections. For batch processing, _writev() can handle multiple buffered chunks at once when writes have queued up behind a slow write.
const { Writable } = require('stream');
class DatabaseWriter extends Writable {
constructor(db) {
super({ objectMode: true });
this.db = db;
this.batch = [];
}
_write(record, encoding, callback) {
this.batch.push(record);
if (this.batch.length >= 100) {
this.db.insertMany(this.batch)
.then(() => { this.batch = []; callback(); })
.catch(callback);
} else {
callback();
}
}
_final(callback) {
// Flush remaining records
if (this.batch.length > 0) {
this.db.insertMany(this.batch)
.then(() => callback())
.catch(callback);
} else {
callback();
}
}
}
const writer = new DatabaseWriter(db);
dataStream.pipe(writer);
Why it matters: Custom Writable streams are needed when writing to non-standard destinations like databases, message queues, cloud storage APIs, or in-memory aggregators that don't have built-in stream support; they encapsulate the write logic so it can be composed with any readable source.
Real applications: MongoDB bulk insert writers collect chunks into batches of 500 records before each insertMany call; Elasticsearch indexing streams implement _writev() to process entire queued batches in a single bulk index request for maximum throughput.
Common mistakes: Forgetting to call callback() in all code paths inside _write() — if an async operation succeeds but the callback is only called in the then handler and not the catch, a rejected promise silently stalls the entire stream with no error event.
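One way to guarantee the callback fires exactly once on every path is to funnel the async work through try/catch; a sketch assuming a hypothetical db object with an async insertOne():
const { Writable } = require('stream');
class SafeDatabaseWriter extends Writable {
  constructor(db) {
    super({ objectMode: true });
    this.db = db; // assumed: async insertOne(record)
  }
  _write(record, encoding, callback) {
    // async/await plus try/catch ensures the callback runs on success and failure alike
    (async () => {
      try {
        await this.db.insertOne(record);
        callback();
      } catch (err) {
        callback(err); // surfaces as an 'error' event on the stream
      }
    })();
  }
}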
Readable.from() creates a readable stream from any iterable or async iterable, including arrays, generators, and async generators, without requiring a custom stream class. It provides a convenient bridge between the generator/iterator ecosystem and the stream ecosystem, automatically wrapping each yielded value as a stream chunk. This utility, available since Node.js 12, is commonly used for testing pipelines with in-memory data and for converting paginated async data sources into streams.
const { Readable } = require('stream');
const { pipeline } = require('stream/promises');
const fs = require('fs');
// From an array
const arrayStream = Readable.from(['hello\n', 'world\n']);
arrayStream.pipe(process.stdout);
// From a generator function
function* generateNumbers(max) {
for (let i = 0; i <= max; i++) {
yield i + '\n';
}
}
Readable.from(generateNumbers(5)).pipe(process.stdout);
// From an async generator (e.g., paginated API)
async function* fetchPages(url) {
let page = 1;
while (true) {
const res = await fetch(url + '?page=' + page);
const data = await res.json();
if (data.length === 0) break;
yield JSON.stringify(data) + '\n';
page++;
}
}
pipeline(
  Readable.from(fetchPages('https://api.example.com/items')),
  fs.createWriteStream('all-items.json')
).catch(err => console.error('Pipeline failed:', err));
Why it matters: Readable.from() eliminates the need to write a full Readable subclass when the data source is an existing iterable — coupling arrays, generator functions, and async generators directly into stream pipelines without boilerplate dramatically reduces code complexity.
Real applications: Test suites use Readable.from(['chunk1', 'chunk2']) to feed mock data into pipeline tests without touching the filesystem; async generator functions that paginate REST API responses stream all records through a transform pipeline to a database writer.
Common mistakes: Using Readable.from() with a synchronous array of very large strings without considering that the array must be fully in memory — if the goal is memory efficiency, the data source should be an async generator that yields values lazily rather than a pre-built array of all items.
stream.compose() combines multiple Transform and Duplex streams into a single composite stream that behaves as one unified transform, introduced in Node.js 16. This creates a reusable, encapsulated pipeline that can be stored in a variable, passed to functions, and composed with other streams as if it were a single atomic unit. The pattern promotes separation of concerns by letting each processing stage be developed and tested independently before composing them for production use.
const { compose, Transform } = require('stream');
const fs = require('fs');
// Individual transform stages (parseCSV assumes each incoming chunk is one CSV line)
const parseCSV = new Transform({
objectMode: true,
transform(chunk, enc, cb) {
const fields = chunk.toString().trim().split(',');
cb(null, { name: fields[0], age: Number(fields[1]) });
}
});
const filterAdults = new Transform({
objectMode: true,
transform(obj, enc, cb) {
if (obj.age >= 18) cb(null, obj);
else cb(); // Skip minors
}
});
const toJSON = new Transform({
objectMode: true,
transform(obj, enc, cb) {
cb(null, JSON.stringify(obj) + '\n');
}
});
// Compose into a single reusable stream
const processCSV = compose(parseCSV, filterAdults, toJSON);
// Use the composed stream in a pipeline
fs.createReadStream('people.csv')
.pipe(processCSV)
.pipe(fs.createWriteStream('adults.json'));
Why it matters: Without compose(), reusable multi-stage pipelines require either wrapping chains of .pipe() in factory functions (with broken error propagation) or custom Duplex subclasses; compose() provides a clean, error-safe way to package a pipeline as a single reusable stream.
Real applications: A CSV-processing pipeline that parses, validates, and normalizes records can be composed into a single const processCSV = compose(parse, validate, normalize) that is reused across multiple import routes; logging pipelines compose JSON serialization, metadata injection, and rotation into one transport stream.
Common mistakes: Passing streams of the wrong kind to compose() in the wrong positions — only the first stream may be purely Readable, only the last may be purely Writable, and everything in between must be Duplex or Transform; mixing types incorrectly results in a TypeError at runtime.