Dynamic Sitemap Generation

A static XML sitemap goes stale the moment content scales across nested taxonomies, locales, and draft states — in a headless stack the sitemap is a build artifact, not a file. This guide treats it as a data pipeline: a status-filtered CMS query, framework-native route enumeration, and edge caching that keeps the sitemap fresh without rebuilding the site. The goal is to feed crawlers only canonical, published URLs. It’s a building block of Localization & SEO Optimization; the implementation lives in Generating XML sitemaps from headless CMS routes.

Route Discovery & CMS Queries

The sitemap moves through five stages, each feeding the next: a status-filtered query, route enumeration, XML serialization, edge caching, and post-build validation.

flowchart LR
  Q["Status-filtered CMS query (published, locale, slug)"] --> E["Framework route enumeration per locale"]
  E --> X["Serialize / stream XML"]
  X --> C["Edge cache: stale-while-revalidate"]
  C --> V["Post-build audit: cross-check live routes"]
  V --> Crawl["Crawlers see canonical published URLs"]

Start with a deterministic fetch of every routable entity. Contentful, Sanity, and Strapi expose GraphQL, GROQ, or REST endpoints suited to bulk retrieval. Filter strictly by publication status, locale, and slug; exclude drafts and archived entries unless you’re targeting a preview environment. Use projection queries to request only the fields the sitemap needs (slug, updatedAt, locale, and any changefreq or priority hint you derive or store), and flatten nested structures at the query layer to avoid expensive client-side recursion.

GROQ
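// Assumes a custom `status` field on each document; Sanity itself marks drafts with a "drafts." _id prefix.
*[_type in ["post", "page", "category"] && defined(slug.current) && status == "published"] {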
  "slug": slug.current,
  "type": _type,
  "lastmod": _updatedAt,
  "locale": coalesce(locale, "default"),
  "priority": select(
    _type == "page" => 1.0,
    _type == "post" => 0.8,
    0.5
  )
}

Missing localized routes need Content Fallback & Routing so fallback URLs don’t pollute the index. Run queries against read-optimized CDN endpoints with retry logic for transient failures.
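The retry logic mentioned above can be sketched as a small wrapper around any async fetch. This is a minimal illustration, not part of any CMS SDK: the name withRetry, the three-attempt default, and the 200 ms base delay are all assumptions chosen for the example.

```typescript
// Hypothetical helper: retries an async operation with exponential backoff.
// `attempts` and `baseDelayMs` are illustrative defaults, not SDK values.
async function withRetry<T>(
  operation: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Skip the wait after the final failed attempt.
      if (attempt < attempts - 1) {
        // Delay doubles each round: baseDelayMs, 2x, 4x, ...
        const delay = baseDelayMs * 2 ** attempt;
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

A CMS call would then be wrapped as `withRetry(() => client.fetch(query, params))`. A production version would probably retry only on errors known to be transient (timeouts, 429 and 5xx responses) rather than on every rejection.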

Framework Implementation

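The framework dictates how the sitemap reaches crawlers. In the Next.js App Router, generateSitemaps() splits the sitemap into one file per returned id, while generateStaticParams() enumerates the page routes themselves. Return locale identifiers from generateSitemaps() and fetch per locale so each sitemap stays small; if you serve the XML from a custom route handler instead, stream the response to avoid memory spikes on large builds.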

TypeScript
// app/sitemap.ts
import type { MetadataRoute } from 'next';
import { createClient } from '@sanity/client';

const client = createClient({
  projectId: process.env.SANITY_PROJECT_ID!,
  dataset: 'production',
  apiVersion: '2024-01-01',
  useCdn: true,
});

export async function generateSitemaps() {
  return [{ id: 'en' }, { id: 'es' }, { id: 'fr' }];
}

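export default async function sitemap({ id }: { id: string }): Promise<MetadataRoute.Sitemap> {
  // Sitemap URLs must be absolute. SITE_URL is an assumed env var name.
  const baseUrl = process.env.SITE_URL ?? 'https://example.com';

  // `id` is one of the locale ids returned by generateSitemaps().
  const query = `*[_type in ["post", "page"] && locale == $locale && defined(slug.current) && status == "published"] {
    "path": "/" + $locale + "/" + slug.current,
    "lastModified": _updatedAt
  }`;

  const routes = await client.fetch(query, { locale: id });

  return routes.map((route: { path: string; lastModified: string }) => ({
    // Resolve the locale-prefixed path against the site origin.
    url: new URL(route.path, baseUrl).toString(),
    lastModified: new Date(route.lastModified),
    changeFrequency: 'weekly' as const,
    priority: 0.8,
  }));
}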

Nuxt 3 uses server routes or @nuxtjs/sitemap; Astro uses getStaticPaths() plus the @astrojs/sitemap integration or community plugins. Whatever the framework, explicit locale routing prevents duplicate indexing, and Route Mapping for Multilingual Sites keeps hreflang annotations aligned with sitemap entries.
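When a framework leaves serialization to you, for example in a hand-written server route, the XML can be assembled directly. The sketch below is framework-agnostic and illustrative only: SitemapEntry, escapeXml, and serializeSitemap are hypothetical names, and the alternates map (hreflang code to absolute URL) is an assumed shape for keeping hreflang annotations next to each entry.

```typescript
// Illustrative types; `SitemapEntry` and `alternates` are assumptions, not a framework API.
interface SitemapEntry {
  url: string; // absolute canonical URL
  lastModified: string; // W3C datetime, e.g. 2024-01-01
  alternates?: Record<string, string>; // hreflang code -> absolute URL
}

// Escape the five characters that are unsafe in XML text and attribute values.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Serialize entries into a <urlset>, emitting one xhtml:link per alternate locale.
function serializeSitemap(entries: SitemapEntry[]): string {
  const urls = entries.map((entry) => {
    const links = Object.entries(entry.alternates ?? {}).map(
      ([hreflang, href]) =>
        `    <xhtml:link rel="alternate" hreflang="${escapeXml(hreflang)}" href="${escapeXml(href)}"/>`,
    );
    return [
      '  <url>',
      `    <loc>${escapeXml(entry.url)}</loc>`,
      `    <lastmod>${escapeXml(entry.lastModified)}</lastmod>`,
      ...links,
      '  </url>',
    ].join('\n');
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">',
    ...urls,
    '</urlset>',
  ].join('\n');
}
```

Building the whole document as one string is fine for a few thousand URLs; for very large sitemaps the same per-entry logic could write to a stream instead.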

Caching & Edge Delivery

Sitemap endpoints trade freshness against CDN efficiency. Use stale-while-revalidate with a conservative max-age (around 1 hour) to absorb traffic without serving stale data indefinitely. At scale, pre-generate sitemaps in CI/CD and persist them to object storage (S3, Cloudflare R2) to shed origin load — which pairs with Incremental sitemap regeneration for dynamic CMS routes so invalidation fires only when a content type actually changes.
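As a rough sketch of that caching policy, the headers for a sitemap response could be built as follows. The helper name sitemapCacheHeaders and the one-day stale-while-revalidate window are assumptions for illustration; only the roughly one-hour max-age comes from the guidance above, and CDN support for stale-while-revalidate varies by provider.

```typescript
// Hypothetical helper: builds response headers for a sitemap endpoint.
// The 86400 s (one day) stale window is an assumed value, not a recommendation from the CDN vendor.
function sitemapCacheHeaders(
  maxAgeSeconds = 3600,
  staleWhileRevalidateSeconds = 86400,
): Record<string, string> {
  return {
    'Content-Type': 'application/xml; charset=utf-8',
    // Serve from cache for max-age, then serve stale while refetching in the background.
    'Cache-Control': [
      'public',
      `max-age=${maxAgeSeconds}`,
      `stale-while-revalidate=${staleWhileRevalidateSeconds}`,
    ].join(', '),
  };
}
```

In a route handler the result would typically be passed as `new Response(xml, { headers: sitemapCacheHeaders() })`.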

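Google’s sitemap guidelines cap a single sitemap at 50 MB uncompressed or 50,000 URLs, whichever is reached first. Past that, use a sitemap index (sitemap_index.xml) referencing locale- or type-chunked sitemaps. Give each chunk its own URL, keyed by a path segment or query parameter, and let edge functions serve the right chunk from that; Accept-Language is a weaker routing signal because many crawlers do not send it.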
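Chunking against the URL limit and emitting the index can be sketched as below. The function names and the /sitemaps/<n>.xml path scheme are hypothetical, and the sketch checks only the URL count, not the 50 MB size cap.

```typescript
// Protocol limit on URLs per sitemap file (the size cap is not checked here).
const MAX_URLS_PER_SITEMAP = 50000;

// Split a list into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number = MAX_URLS_PER_SITEMAP): T[][] {
  const chunks: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    chunks.push(items.slice(start, start + size));
  }
  return chunks;
}

// Build the index document. The `/sitemaps/<n>.xml` path scheme is hypothetical,
// and `baseUrl` is assumed to contain no characters that need XML escaping.
function buildSitemapIndex(baseUrl: string, chunkCount: number, lastModified: string): string {
  const entries: string[] = [];
  for (let n = 0; n < chunkCount; n++) {
    entries.push(
      [
        '  <sitemap>',
        `    <loc>${baseUrl}/sitemaps/${n}.xml</loc>`,
        `    <lastmod>${lastModified}</lastmod>`,
        '  </sitemap>',
      ].join('\n'),
    );
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</sitemapindex>',
  ].join('\n');
}
```

With all URLs in an array, `chunk(urls)` yields the per-file lists and `buildSitemapIndex(origin, chunks.length, date)` yields the index that points at them.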

Validation

Generating the sitemap is half the job; keeping it accurate needs continuous checks. Validate XML structure, URL accessibility, and lastmod formatting in the deploy pipeline, and lint for mixed HTTP/HTTPS, trailing-slash inconsistencies, and preview URLs leaking into production. Run Automated SEO audits for headless CMS deployments post-build to cross-reference sitemap URLs against live routes, validate robots.txt, and confirm canonical alignment. The Next.js sitemap file convention covers framework-level generation.
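A few of those lint rules fit in a small pure function that a deploy step could run over the generated URL list. This is a sketch under assumptions: lintSitemapUrls and LintIssue are hypothetical names, preview hosts are supplied by the caller, and the URL parsing is a deliberately simple regular expression rather than a full parser.

```typescript
interface LintIssue {
  url: string;
  problem: string;
}

// Captures protocol, host, and path of an absolute http(s) URL (simplified on purpose).
const ABSOLUTE_URL = /^(https?):\/\/([^\/:?#]+)(?::\d+)?([^?#]*)/i;

// Hypothetical linter: flags relative URLs, plain HTTP, preview hosts,
// and a mix of trailing-slash styles across the whole list.
function lintSitemapUrls(urls: string[], previewHosts: string[] = []): LintIssue[] {
  const issues: LintIssue[] = [];
  let withSlash = 0;
  let withoutSlash = 0;

  for (const url of urls) {
    const match = ABSOLUTE_URL.exec(url);
    if (!match) {
      issues.push({ url, problem: 'not an absolute http(s) URL' });
      continue;
    }
    const [, protocol = '', host = '', path = ''] = match;
    if (protocol.toLowerCase() !== 'https') {
      issues.push({ url, problem: 'non-HTTPS URL' });
    }
    if (previewHosts.indexOf(host.toLowerCase()) !== -1) {
      issues.push({ url, problem: 'preview host in production sitemap' });
    }
    // Ignore the site root when counting trailing-slash styles.
    if (path !== '' && path !== '/') {
      if (path.charAt(path.length - 1) === '/') withSlash++;
      else withoutSlash++;
    }
  }

  if (withSlash > 0 && withoutSlash > 0) {
    issues.push({ url: '*', problem: 'mixed trailing-slash styles' });
  }
  return issues;
}
```

Failing the build when the returned array is non-empty turns these checks into a gate; reachability and lastmod format checks would need to be added separately.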

Conclusion

Strict query projections, framework-native routing APIs, and edge-aware caching turn the sitemap from a stale file into a live reflection of your published content graph. Paired with fallback routing and post-build audits, content velocity stops costing you crawl efficiency.