API Rate Limiting and Batch Processing Best Practices — TongoRender Blog
guides · rate-limiting · batch · performance

API Rate Limiting and Batch Processing Best Practices

Master API rate limiting and batch processing: understand rate limits, use batch endpoints effectively, manage queues, and implement robust error handling with retries.

TongoRender Team · January 15, 2026 · 9 min read

When your application generates PDFs or screenshots at scale, you will inevitably hit API rate limits. Understanding how rate limiting works — and designing your application to handle it gracefully — is the difference between a robust production system and one that fails under load. This guide covers rate limiting concepts, batch processing strategies, queue management, and error handling patterns.

Understanding Rate Limits

Rate limits protect both the API provider and its users. They ensure fair resource allocation and prevent any single client from overwhelming the service. Rate limits are typically expressed as:

  • Requests per second (RPS) — e.g., 10 requests/second
  • Requests per minute (RPM) — e.g., 100 requests/minute
  • Concurrent requests — e.g., 5 simultaneous requests
  • Monthly quota — e.g., 10,000 renders/month
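
A per-second limit can also be enforced client-side, before the first 429 ever happens. Below is a minimal token-bucket sketch; the class name and the rate/burst values are our own illustration, not part of any TongoRender SDK:

```javascript
// Minimal token bucket: tokens refill continuously at ratePerSecond,
// and each request consumes one token. acquire() resolves once a token
// is available, spacing calls out to the configured rate.
class TokenBucket {
  constructor(ratePerSecond, burst = ratePerSecond) {
    this.capacity = burst;          // max tokens that can accumulate
    this.tokens = burst;            // start full, allowing an initial burst
    this.refillRate = ratePerSecond;
    this.lastRefill = Date.now();
  }

  refill() {
    const now = Date.now();
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillRate);
    this.lastRefill = now;
  }

  async acquire() {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return;
    }
    // Not enough budget: wait for the deficit to refill, then try again
    const waitMs = ((1 - this.tokens) / this.refillRate) * 1000;
    await new Promise(resolve => setTimeout(resolve, waitMs));
    return this.acquire();
  }
}

// Usage: await the bucket before each API call
// const bucket = new TokenBucket(10); // 10 requests/second
// await bucket.acquire();
// await fetch(...);
```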

TongoRender returns rate limit information in response headers:

X-RateLimit-Limit: 100        // Max requests per window
X-RateLimit-Remaining: 73     // Requests remaining in current window
X-RateLimit-Reset: 1679529600 // Unix timestamp when the window resets
Retry-After: 30               // Seconds to wait (only on 429 responses)
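
A small helper can turn those headers into a wait time. The function name is our own; the header names are as documented above:

```javascript
// Given the Headers object from a fetch() Response, return how many
// milliseconds to wait before it is safe to send the next request.
function msUntilNextRequest(headers) {
  const remaining = parseInt(headers.get('X-RateLimit-Remaining') ?? '1', 10);
  if (remaining > 0) return 0; // budget left in the current window

  // Window exhausted: wait until the Unix timestamp in X-RateLimit-Reset
  const resetMs = parseInt(headers.get('X-RateLimit-Reset') ?? '0', 10) * 1000;
  return Math.max(0, resetMs - Date.now());
}
```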

Implementing Rate-Aware Clients

Build a client that respects rate limits proactively, rather than waiting for 429 errors:

class RateLimitedClient {
  constructor(apiKey, maxConcurrent = 5) {
    this.apiKey = apiKey;
    this.maxConcurrent = maxConcurrent;
    this.activeRequests = 0;
    this.queue = [];
    this.remaining = Infinity;
    this.resetTime = 0;
  }

  async request(endpoint, body) {
    // Wait if we've hit the rate limit
    if (this.remaining <= 1) {
      const waitTime = (this.resetTime * 1000) - Date.now();
      if (waitTime > 0) {
        console.log(`Rate limit reached. Waiting ${Math.ceil(waitTime / 1000)}s...`);
        await new Promise(resolve => setTimeout(resolve, waitTime));
      }
    }

    // Wait if too many concurrent requests
    while (this.activeRequests >= this.maxConcurrent) {
      await new Promise(resolve => setTimeout(resolve, 100));
    }

    this.activeRequests++;
    let response;
    try {
      response = await fetch(`https://api.tongorender.io/v1/${endpoint}`, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'X-API-Key': this.apiKey,
        },
        body: JSON.stringify(body),
      });

      // Update rate limit tracking, keeping previous values if a header is missing
      const remaining = response.headers.get('X-RateLimit-Remaining');
      const reset = response.headers.get('X-RateLimit-Reset');
      if (remaining !== null) this.remaining = parseInt(remaining, 10);
      if (reset !== null) this.resetTime = parseInt(reset, 10);
    } finally {
      // Release the concurrency slot before any retry wait, so a sleeping
      // retry does not block other requests
      this.activeRequests--;
    }

    if (response.status === 429) {
      const retryAfter = parseInt(response.headers.get('Retry-After') || '30', 10);
      await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
      return this.request(endpoint, body); // Retry after the server-specified delay
    }

    return response;
  }
}

const client = new RateLimitedClient(process.env.TONGORENDER_API_KEY, 5);

Batch Processing Strategies

When processing large volumes, use these strategies to stay within rate limits while maximizing throughput:

Strategy 1: Controlled Concurrency

Process items in parallel with a concurrency limit using p-limit:

const pLimit = require('p-limit');
const limit = pLimit(5); // Max 5 concurrent requests

async function batchGenerate(items) {
  const results = await Promise.allSettled(
    items.map(item =>
      limit(() => generatePDF(item))
    )
  );

  const succeeded = results.filter(r => r.status === 'fulfilled');
  const failed = results.filter(r => r.status === 'rejected');

  console.log(`Completed: ${succeeded.length}/${items.length}`);
  if (failed.length > 0) {
    console.log(`Failed: ${failed.length} items`);
  }

  return { succeeded, failed };
}

Strategy 2: Chunked Processing

Split large batches into smaller chunks with delays between them:

async function processInChunks(items, chunkSize = 10, delayMs = 2000) {
  const results = [];

  for (let i = 0; i < items.length; i += chunkSize) {
    const chunk = items.slice(i, i + chunkSize);
    console.log(`Processing chunk ${Math.floor(i / chunkSize) + 1} of ${Math.ceil(items.length / chunkSize)}`);

    const chunkResults = await Promise.allSettled(
      chunk.map(item => generatePDF(item))
    );

    results.push(...chunkResults);

    // Wait between chunks to respect rate limits
    if (i + chunkSize < items.length) {
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }

  return results;
}

Strategy 3: Queue-Based Processing

For truly large volumes (thousands of items), use a job queue like BullMQ:

const { Queue, Worker } = require('bullmq');
const Redis = require('ioredis');

// BullMQ workers require maxRetriesPerRequest: null on the Redis connection
const connection = new Redis(process.env.REDIS_URL, { maxRetriesPerRequest: null });
const pdfQueue = new Queue('pdf-generation', { connection });

// Add jobs to the queue
async function queuePDFGeneration(items) {
  const jobs = items.map(item => ({
    name: 'generate-pdf',
    data: item,
    opts: {
      attempts: 3,
      backoff: { type: 'exponential', delay: 5000 },
    },
  }));

  await pdfQueue.addBulk(jobs);
  console.log(`Queued ${jobs.length} PDF generation jobs`);
}

// Process jobs with controlled concurrency
const worker = new Worker('pdf-generation', async (job) => {
  const pdf = await generatePDF(job.data);
  await savePDF(pdf, job.data.id);
  return { id: job.data.id, size: pdf.length };
}, {
  connection,
  concurrency: 5,
  limiter: { max: 10, duration: 1000 }, // Max 10 jobs per second
});

Error Handling and Retries

Robust error handling is essential for batch operations. Implement exponential backoff for transient failures:

async function withRetry(fn, maxRetries = 3, baseDelay = 1000) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const isRetryable = error.status === 429 || error.status >= 500;

      if (!isRetryable || attempt === maxRetries) {
        throw error;
      }

      const delay = baseDelay * Math.pow(2, attempt - 1) + Math.random() * 1000;
      console.log(`Attempt ${attempt} failed. Retrying in ${Math.round(delay)}ms...`);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}

// Usage
const pdf = await withRetry(() => generatePDF(html), 3, 2000);

Monitoring and Observability

Track your API usage to prevent surprises:

class APIUsageTracker {
  constructor() {
    this.stats = { total: 0, succeeded: 0, failed: 0, rateLimited: 0 };
    this.startTime = Date.now();
  }

  record(status) {
    this.stats.total++;
    if (status === 200) this.stats.succeeded++;
    else if (status === 429) this.stats.rateLimited++;
    else this.stats.failed++;
  }

  report() {
    const elapsed = (Date.now() - this.startTime) / 1000;
    return {
      ...this.stats,
      elapsedSeconds: elapsed,
      requestsPerSecond: (this.stats.total / elapsed).toFixed(2),
      successRate: ((this.stats.succeeded / this.stats.total) * 100).toFixed(1) + '%',
    };
  }
}

Best Practices Summary

  • Read rate limit headers — Do not guess limits; use the values the API tells you.
  • Implement exponential backoff — Never retry immediately after a 429 or 500 error.
  • Add jitter to delays — Random jitter prevents thundering herd problems when multiple workers retry simultaneously.
  • Use concurrency limits — Never fire unlimited parallel requests. Cap concurrency at the API's documented limit.
  • Track usage — Monitor your daily and monthly consumption to avoid quota exhaustion.
  • Fail gracefully — Store failed items for later retry instead of losing them.
  • Use webhooks — For large batch jobs, prefer async processing with webhook callbacks over synchronous polling.

TongoRender provides generous rate limits, clear response headers, and a batch endpoint for high-volume use cases. By following these patterns, your application can handle high volumes reliably.

Scale with TongoRender — 100 free renders per month, no credit card required.
