Creating multiple sitemaps for 50,000+ URLs

Are you also infuriated when you see a website without an XML sitemap? Or are you mad because you see tens of thousands of URLs crammed into just one sitemap? Well, you’re not alone. As SEOs, we all know that sitemaps help bots crawl a website, which makes the discovery (and possibly ranking) of your pages faster.

The Benefits of Multiple Sitemaps

As we all know, Google limits a single sitemap to 50MB (uncompressed) and 50,000 URLs. The problem is that a single XML sitemap makes it hard to tell which pages are and are not indexed. With multiple sitemaps, you can view in Google Search Console how many pages in each sitemap have been submitted vs. indexed.

If your site is small, it’s not that hard to figure out which pages are the problem, but when you have a large website, diagnosing indexing issues can be quite challenging. With the use of multiple XML sitemaps—categorized by the various sections of your website—you’ll be able to determine which pages are causing indexation issues.
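To get a feel for the 50,000-URL limit, here is a minimal Python sketch that works out how many sitemap files a URL list needs and splits the list accordingly. The function names are my own for illustration, and the sketch only accounts for the URL count, not the 50MB size limit.

```python
import math

MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit mentioned above


def sitemaps_needed(total_urls: int) -> int:
    """Return the minimum number of sitemap files for `total_urls` URLs."""
    return max(1, math.ceil(total_urls / MAX_URLS_PER_SITEMAP))


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a URL list into consecutive chunks of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]
```

For example, a site with 120,000 URLs needs at least three sitemap files by URL count alone.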

How to create a sitemap by category

Step 1. Crawl the site using an SEO tool

My go-to tool for crawling websites is Screaming Frog. You can also use another crawler, such as DeepCrawl, if that’s what you prefer.

Note: Given that we’re dealing with tens of thousands of URLs, you’ll need a paid tool for this. The free version of Screaming Frog, for example, stops crawling at 500 URLs.

Step 2. Extract all URLs from GSC and GA

For GSC, do this:

  1. Go to “Coverage”
  2. Make sure to click only “Valid” and “Excluded”

Note: Why do we include “Excluded”? Because we also need to analyze the URLs that are within the “Crawled – currently not indexed” and “Discovered – currently not indexed” sections.

For “Valid”: Get all the URLs listed under the “Submitted and indexed” and “Indexed, not submitted in sitemap” sections.

For “Excluded”: Get all the URLs listed under the “Crawled – currently not indexed” and “Discovered – currently not indexed” sections.

  3. Compile them in a Google Sheet or Excel.

For GA, do this:

  1. Select the website account and the appropriate View. In most cases, it’s either the Raw View or the Filtered View based on your settings. If you’re not familiar with these types of views, you can learn more here.
  2. If you’re using Universal Analytics, go to Behavior > Site Content > All Pages
  3. Choose the number of rows that is greater than or equal to the total number of pages.

Example: If GA reports a total of 10 pages, choose either 10 or 25 rows when exporting.

  4. Click Export.

Step 3. Cross-check the data you’ve gathered from GSC, GA, and your favorite SEO Tool.

“Why do I need to gather data from other sources if I can get those in one crawl using (insert SEO tool)?” 

— Probably someone

The reason we’re cross-checking the URLs is that we want to make sure the list is complete.

What do I mean by complete?

Based on my experience, there were instances when Screaming Frog wasn’t able to crawl a few pages that were still detected by GSC or GA. We want to be sure that we have all the needed URLs for our sitemap by cross-checking those URLs with different tools.

To do this, I usually combine the data I found from GSC and GA. Then I remove the duplicate URLs, URLs with parameters, and all pages that don’t need to be indexed. I’ll do the same thing with the Screaming Frog data, removing the unnecessary URLs.
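If you’d rather script this cleanup than do it by hand, here is a rough Python sketch of the same filtering. The function name and the excluded path prefixes (`/cart`, `/checkout`) are assumptions for illustration only; swap in whatever sections of your own site should stay out of the index.

```python
from urllib.parse import urlsplit


def clean_urls(urls, excluded_prefixes=("/cart", "/checkout")):
    """Drop duplicates, URLs with parameters, and paths that should not be indexed.

    `excluded_prefixes` is a hypothetical example list, not a recommendation.
    """
    seen = set()
    cleaned = []
    for raw in urls:
        url = raw.strip()
        parts = urlsplit(url)
        if not url or parts.query:
            continue  # skip blank rows and URLs with parameters
        if parts.path.startswith(excluded_prefixes):
            continue  # skip sections that don't need to be indexed
        if url not in seen:
            seen.add(url)  # keep only the first occurrence of each URL
            cleaned.append(url)
    return cleaned
```

You would run it once on the combined GSC + GA list and once on the crawler export, so both lists are clean before you compare them.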

After that, I cross-check the URLs from all three sources using Excel. In the first column, I put all the URLs from GA + GSC. In the second column, I put all the URLs from Screaming Frog. Highlight both columns, then click Conditional Formatting > Highlight Cells Rules > Duplicate Values.

Since each column was already de-duplicated, a colored cell means the URL was seen in both the GSC + GA data and the Screaming Frog crawl. A non-colored cell means the URL was seen in only one of the two. Take note of the non-colored ones, as these are the ones we’re gonna add to our final compilation.
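The same comparison can be sketched outside Excel with Python sets. This is only an illustration of the idea; the function and key names are made up, and it assumes both lists use the same URL format (same protocol, same trailing-slash style).

```python
def cross_check(gsc_ga_urls, crawler_urls):
    """Compare two URL lists and report where each URL was seen."""
    gsc_ga = set(gsc_ga_urls)
    crawled = set(crawler_urls)
    return {
        "in_both": sorted(gsc_ga & crawled),        # the "colored" cells
        "only_gsc_ga": sorted(gsc_ga - crawled),    # missed by the crawler
        "only_crawler": sorted(crawled - gsc_ga),   # not reported by GSC or GA
    }
```

The `only_gsc_ga` list corresponds to the non-colored cells in the GSC + GA column, which are the URLs the crawl missed.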

Step 4. Compile and categorize

In your Excel file or Google Sheet, create a new tab where you compile all the needed URLs for your sitemap. What I usually do is get all the needed URLs from the Screaming Frog crawl, then add the non-colored URLs from the GSC + GA column in the previous step.

Once I have all the URLs that I need, I categorize them. I’ll create another tab with one column per category.

For an e-commerce store, I usually categorize the URLs into four groups: Products, Categories, Blogs, and Pages.

The first three are self-explanatory, but the last one is where I usually include all the miscellaneous pages (About Us, Contact Us, Privacy Policy, etc.).
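If your URLs follow a predictable structure, the sorting can be sketched in Python by matching path prefixes. The prefixes below (`/products/`, `/collections/`, `/blog/`) are hypothetical; your store’s URL structure will likely differ, so treat this as a starting point rather than a finished tool.

```python
from urllib.parse import urlsplit

# Hypothetical path prefixes; replace them with your site's real URL structure.
CATEGORY_PREFIXES = {
    "products": "/products/",
    "categories": "/collections/",
    "blogs": "/blog/",
}


def categorize(urls):
    """Group URLs by path prefix; anything unmatched goes under "pages"."""
    buckets = {name: [] for name in CATEGORY_PREFIXES}
    buckets["pages"] = []
    for url in urls:
        path = urlsplit(url).path
        for name, prefix in CATEGORY_PREFIXES.items():
            if path.startswith(prefix):
                buckets[name].append(url)
                break
        else:
            # No prefix matched: treat it as a miscellaneous page.
            buckets["pages"].append(url)
    return buckets
```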

Once you’ve done this, you’re ready for the final step.

Step 5. Create, upload, and submit

For each category, you need to create a sitemap. For example, with the four categories I mentioned earlier, you end up with four category sitemaps plus a main sitemap.

The main sitemap only lists the category sitemaps. Take note that the main sitemap is optional. You can upload the category sitemaps in your cPanel and submit them in GSC without combining all of them into the main one. To know more about sitemap splitting, you can take a look at Google’s documentation on splitting large sitemaps.
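To make the structure concrete, here is a minimal Python sketch that builds a category sitemap and a main sitemap (a sitemap index) as XML strings, using the standard sitemap namespace. The function names and the example file names are assumptions for illustration; the sketch only emits the required `loc` element and does not check the 50,000-URL or 50MB limits.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'


def build_sitemap(urls):
    """Return the XML for one category sitemap listing `urls`."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return XML_HEADER + ET.tostring(urlset, encoding="unicode")


def build_sitemap_index(sitemap_urls):
    """Return the XML for the main sitemap, which only lists other sitemaps."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = url
    return XML_HEADER + ET.tostring(index, encoding="unicode")
```

You could then save each string to its own file (for instance a hypothetical `sitemap-products.xml`) and point the main sitemap at the uploaded locations of those files.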

In case you do not know how to create an XML file, you can read about different sitemap creation methods here.

Once you’re done creating your sitemaps, you can now upload them in your cPanel (you can ask your dev for help if you do not know how) and submit them to Google Search Console.

Is It Worth It To Manually Create Multiple Sitemaps?

My straightforward answer is no. It takes a lot of time and energy to filter data and update multiple sitemaps by hand. I’d rather let a plugin like Yoast do it for me. It can be worth it if you don’t update your site that often, but considering we’re in the SEO game, not updating your site means the total downfall of your website.
