Guillermo de la Puente

How to make large dynamic sitemaps with Next.js and next-sitemap

Development



Working on dynamically generated sitemaps

At After, a platform for creating beautiful online memorials, each memorial has its own dynamic URL. Every hour, dozens of new memorials are created, and they need to be indexed so that family members and friends can find them easily. That means we need to keep our sitemaps up to date.

So how do you dynamically generate sitemaps for thousands of URLs of user-generated content?


An introduction to sitemaps

  • A sitemap is an XML file containing the list of indexable URLs of a domain.

  • next-sitemap is a library that conveniently generates the sitemap XML document after reading the Next.js build manifests or when given a list of URLs.

Check out some real examples, like the Google sitemap index or the sitemap of this website.
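
To make the format concrete, here is a minimal sketch that renders a sitemap document by hand. It is only an illustration of the XML structure, not how next-sitemap works internally, and the names SitemapEntry, escapeXml and renderSitemap are made up for this example.

```typescript
// Hypothetical shape for one sitemap entry; `lastmod` is optional.
interface SitemapEntry {
  loc: string;
  lastmod?: string; // W3C datetime, e.g. "2023-01-15"
}

// Escape the characters that cannot appear verbatim in XML text.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Render a list of entries as a sitemap XML document.
function renderSitemap(entries: SitemapEntry[]): string {
  const urls = entries.map(entry => {
    const lastmod = entry.lastmod
      ? `<lastmod>${escapeXml(entry.lastmod)}</lastmod>`
      : '';
    return `  <url><loc>${escapeXml(entry.loc)}</loc>${lastmod}</url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    '</urlset>',
  ].join('\n');
}

console.log(
  renderSitemap([
    { loc: 'https://example.com/a?x=1&y=2', lastmod: '2023-01-15' },
  ])
);
```

Note that characters such as & must be escaped inside loc, which is one of the details a library like next-sitemap takes care of for you.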


Static sitemaps generated at build time with next-sitemap

Static routes generated at build time are automatically picked up by next-sitemap. That is the case for both static pages and paths generated by getStaticPaths. It works out of the box!

// next-sitemap.config.js

/** @type {import('next-sitemap').IConfig} */
const config = {
  siteUrl: process.env.SITE_URL, // or whatever your domain is
};

module.exports = config;

You may add other options, like paths to exclude, additionalPaths or generateRobotsTxt.

Then, automate sitemap generation so that it runs after every build. To do that, add next-sitemap as the postbuild script in package.json:

// package.json
{
  ...
  "scripts": {
    "build": "next build",
    "postbuild": "next-sitemap",
    ...
  },
  ...
}

Adding additional info to a statically generated sitemap

If you need to add extra information to a sitemap, like last modification dates, you may need to call an API endpoint from the sitemap generation config file.

That is the case for the sitemap of this website. I wanted each item to contain lastmod, the date of the last modification, so that Google can recrawl post pages when they are updated.

Check out the configuration here:

https://github.com/guillermodlpa/site/blob/main/next-sitemap.config.js
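
The linked config is the reference for how this site does it. As a rough illustration of the idea only, the sketch below attaches lastmod to a sitemap entry using a lookup from path to last update date. The names toSitemapField and lastModifiedByPath are hypothetical. In a real config, logic like this would typically live in next-sitemap's transform option, which receives each path and returns the entry for it.

```typescript
// Local stand-in for the fields of a sitemap entry used in this sketch.
interface SitemapField {
  loc: string;
  lastmod?: string;
}

// Build the entry for one path, adding `lastmod` when a date is known.
// `lastModifiedByPath` is a hypothetical lookup, e.g. filled from a CMS API.
function toSitemapField(
  path: string,
  lastModifiedByPath: Map<string, Date>
): SitemapField {
  const modified = lastModifiedByPath.get(path);
  return modified
    ? { loc: path, lastmod: modified.toISOString() }
    : { loc: path };
}

const lastModifiedByPath = new Map<string, Date>([
  ['/blog/hello', new Date('2023-01-15T10:00:00Z')],
]);

console.log(toSitemapField('/blog/hello', lastModifiedByPath));
console.log(toSitemapField('/about', lastModifiedByPath));
```

Paths without a known date are emitted without lastmod, rather than with the build date, so that crawlers are not told a page changed when it did not.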


How to build large dynamic sitemaps at runtime

Let’s say you have user-generated URLs. You could pull all the URLs at build time in the next-sitemap config file, but then your sitemaps would only be updated on each deploy. So let’s switch approaches and generate them on demand.

You’ll need new routes to render the sitemap index and each of the sitemap pages. Sitemaps should be at the root level, with clean URLs like /dynamic-sitemap.xml and /dynamic-sitemap-0.xml, /dynamic-sitemap-1.xml, etc. Since Next.js doesn’t support page file names where only part of the name is dynamic, like dynamic-sitemap-[page].ts, we can leverage rewrites.

Create the following pages:

/pages
  /dynamic-sitemap
    /index.ts <-- this corresponds to the sitemap index
    /[page].ts <-- this corresponds to an individual sitemap

Then, add the rewrites in the Next.js config:

// next.config.js

/** @type {import('next').NextConfig} */
const config = {
  ...
  rewrites: async () => [
    {
      source: '/dynamic-sitemap.xml',
      destination: '/dynamic-sitemap',
    },
    {
      source: '/dynamic-sitemap-:page.xml',
      destination: '/dynamic-sitemap/:page',
    },
  ],
};
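
To illustrate what these two rewrites do, here is a small sketch that emulates the mapping with a regular expression. This is an approximation for explanation purposes only: Next.js uses its own path matcher, and resolveSitemapRewrite is a made-up name.

```typescript
// Map a public sitemap URL to the internal page path, or null if it
// is not one of the two sitemap URLs handled by the rewrites.
function resolveSitemapRewrite(pathname: string): string | null {
  // First rewrite: the sitemap index.
  if (pathname === '/dynamic-sitemap.xml') {
    return '/dynamic-sitemap';
  }
  // Second rewrite: capture whatever sits between "-" and ".xml" as :page.
  const match = /^\/dynamic-sitemap-([^/]+)\.xml$/.exec(pathname);
  return match ? `/dynamic-sitemap/${match[1]}` : null;
}

console.log(resolveSitemapRewrite('/dynamic-sitemap.xml'));
console.log(resolveSitemapRewrite('/dynamic-sitemap-3.xml'));
console.log(resolveSitemapRewrite('/about'));
```

The captured :page value is not validated by the rewrite itself, so the page that receives it still has to check that it is a number.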

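next-sitemap provides two APIs to generate server-side sitemaps from the pages directory:

  • getServerSideSitemapIndexLegacy to generate the sitemap index file.

  • getServerSideSitemapLegacy to generate a single sitemap file.

In recent versions of next-sitemap, the functions without the Legacy suffix (getServerSideSitemapIndex and getServerSideSitemap) are meant for route handlers in the app directory, so the Legacy variants are the ones to use with getServerSideProps.

For the index file, we just need to get the number of sitemap pages that will exist, and pass their URLs to getServerSideSitemapIndexLegacy.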

// dynamic-sitemap/index.ts
// route rewritten from /dynamic-sitemap.xml

import type { GetServerSideProps } from 'next';
import { getServerSideSitemapIndexLegacy } from 'next-sitemap';

const URLS_PER_SITEMAP = 10000;

export const getServerSideProps: GetServerSideProps = async ctx => {
  // obtain the count by hitting an API endpoint or checking the DB
  // (fetchCountOfDynamicPages and getBaseUrl are app-specific helpers, not shown)
  const count = await fetchCountOfDynamicPages();
  const amountOfSitemapFiles = Math.ceil(count / URLS_PER_SITEMAP);

  // one URL per sitemap file: /dynamic-sitemap-0.xml, /dynamic-sitemap-1.xml, ...
  const sitemaps = Array(amountOfSitemapFiles)
    .fill('')
    .map((v, index) => `${getBaseUrl()}/dynamic-sitemap-${index}.xml`);

  return getServerSideSitemapIndexLegacy(ctx, sitemaps);
};

// Default export to prevent Next.js errors
export default function MemorialSitemapIndexPage() {}
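
The arithmetic behind the index can be checked in isolation. The following sketch extracts it into a pure function, with a hypothetical name (buildSitemapIndexUrls) and example.com standing in for the real domain.

```typescript
// Return the URL of every sitemap file needed to hold `totalUrls` entries,
// with at most `urlsPerSitemap` entries per file. Files are numbered from 0.
function buildSitemapIndexUrls(
  totalUrls: number,
  urlsPerSitemap: number,
  baseUrl: string
): string[] {
  const fileCount = Math.ceil(totalUrls / urlsPerSitemap);
  return Array.from(
    { length: fileCount },
    (_, index) => `${baseUrl}/dynamic-sitemap-${index}.xml`
  );
}

// 25,000 URLs at 10,000 per file need three files: 0, 1 and 2.
console.log(buildSitemapIndexUrls(25000, 10000, 'https://example.com'));
```

Math.ceil matters here: rounding down would drop the last, partially filled sitemap file from the index.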

For the individual sitemaps, we need to fetch their corresponding page of items and pass the URLs to getServerSideSitemapLegacy.

// dynamic-sitemap/[page].ts
// route rewritten from /dynamic-sitemap-[page].xml

import type { GetServerSideProps } from 'next';
import { getServerSideSitemapLegacy } from 'next-sitemap';

const URLS_PER_SITEMAP = 10000;

export const getServerSideProps: GetServerSideProps<
  any,
  { page: string }
> = async ctx => {
  if (!ctx.params?.page || isNaN(Number(ctx.params?.page))) {
    return { notFound: true };
  }
  const page = Number(ctx.params?.page);

  // this would load the items that make dynamic pages
  // (fetchDynamicPagesForSitemap and getSiteUrl are app-specific helpers, not shown)
  const response = await fetchDynamicPagesForSitemap({
    page,
    pageSize: URLS_PER_SITEMAP,
  });

  // a page past the last one has no items, so respond with a 404
  if (response.data.items.length === 0) {
    return { notFound: true };
  }

  const fields = response.data.items.map(memorial => ({
    loc: `${getSiteUrl()}/${memorial.slug}`,
    lastmod: memorial.created_at,
  }));

  return getServerSideSitemapLegacy(ctx, fields);
};

// Default export to prevent Next.js errors
export default function MemorialSitemapPage() {}
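
One detail worth considering: the isNaN(Number(...)) check accepts values such as "1.5", "-1" or "1e3", which then reach the data layer as page numbers. If that matters for your backend, a stricter parser is easy to add. This is a sketch with a hypothetical name (parseSitemapPage), not part of the setup described above.

```typescript
// Parse the :page route parameter. Only non-negative integers written
// with plain digits are accepted; anything else yields null.
function parseSitemapPage(raw: string | undefined): number | null {
  if (raw === undefined || !/^\d+$/.test(raw)) {
    return null;
  }
  const page = Number(raw);
  // Reject digit strings too large to be represented exactly.
  return Number.isSafeInteger(page) ? page : null;
}

console.log(parseSitemapPage('0'));
console.log(parseSitemapPage('12'));
console.log(parseSitemapPage('1.5'));
console.log(parseSitemapPage('1e3'));
```

Inside getServerSideProps, a null result would map to return { notFound: true }, in the same way as the existing check.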

Caching the dynamic sitemaps

Since the sitemaps are hitting our API or DB to load a lot of items, we don’t want to execute those queries too often.

Next.js lets you set a Cache-Control header from server-side functions, including getServerSideProps, so that a CDN in front of the app can cache the response. This works automatically when deployed to Vercel. Elsewhere, you’ll need a CDN or proxy that honors s-maxage and stale-while-revalidate, or your own cache with Redis or similar.

...

const cacheMaxAgeUntilStaleSeconds = 60; // 1 minute
const cacheMaxAgeStaleDataReturnSeconds = 15 * 60; // 15 minutes

ctx.res.setHeader(
  'Cache-Control',
  `public, s-maxage=${cacheMaxAgeUntilStaleSeconds}, stale-while-revalidate=${cacheMaxAgeStaleDataReturnSeconds}`
);

return ...

Learn more about Vercel caching here. Note that the response size can’t exceed 10 MB!
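
To make the two durations easier to reason about, here is a small sketch of how a shared cache is generally expected to treat a response of a given age under these directives. It models the standard semantics of s-maxage and stale-while-revalidate, not Vercel's exact implementation, and the function names are made up for the example.

```typescript
type CacheState = 'fresh' | 'stale-while-revalidate' | 'expired';

// Build the header value used above from the two durations.
function buildCacheControl(
  sMaxAgeSeconds: number,
  staleWhileRevalidateSeconds: number
): string {
  return `public, s-maxage=${sMaxAgeSeconds}, stale-while-revalidate=${staleWhileRevalidateSeconds}`;
}

// Classify a cached response by its age in seconds.
function cacheStateAt(
  ageSeconds: number,
  sMaxAgeSeconds: number,
  staleWhileRevalidateSeconds: number
): CacheState {
  if (ageSeconds < sMaxAgeSeconds) {
    return 'fresh'; // served from cache, origin is not contacted
  }
  if (ageSeconds < sMaxAgeSeconds + staleWhileRevalidateSeconds) {
    return 'stale-while-revalidate'; // served stale, refreshed in the background
  }
  return 'expired'; // the request waits for a new response from the origin
}

console.log(buildCacheControl(60, 15 * 60));
console.log(cacheStateAt(30, 60, 15 * 60));
console.log(cacheStateAt(300, 60, 15 * 60));
console.log(cacheStateAt(2000, 60, 15 * 60));
```

With the values above, the sitemap queries run at most about once a minute per sitemap URL under steady traffic, and crawlers still get an immediate (possibly slightly stale) response for up to 15 minutes after that.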


Real-world example

Check out the dynamic sitemap index at After, https://after.io/memorial-sitemap.xml, and one of the sitemap pages, https://after.io/memorial-sitemap-0.xml. These are generated on the fly and cached with the strategy explained above!

Photo of dynamically generated sitemap
