XML sitemaps and SEO: What you need to know

XML sitemaps turned 18 in June, and in honor of their full-fledged introduction into adulthood, I’m breaking down how XML sitemaps work, how to build one, and why they offer a boatload of SEO benefits.

What is an XML sitemap?

In a nutshell, an XML sitemap is a file in XML format, following the Sitemaps protocol, that tells search engines about the URLs on a website that are available to crawl.

XML sitemaps were meant to share information about each URL, like:

When it was last updated.

How often it changes.

How important it is in relation to other URLs across your website. 
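To make those fields concrete, here is a minimal sketch in Python that reads them back out of a one-entry sitemap. The sample XML, the example.com URL, and the read_entries() helper are illustrative assumptions, not part of any specific tool.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical single-entry sitemap used only for illustration.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/page1</loc>
    <lastmod>2023-06-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>"""


def read_entries(xml_text):
    """Return one dict per <url> entry; optional fields are None when absent."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url in root.findall("sm:url", NS):
        entries.append({
            "loc": url.findtext("sm:loc", namespaces=NS),
            "lastmod": url.findtext("sm:lastmod", namespaces=NS),
            "changefreq": url.findtext("sm:changefreq", namespaces=NS),
            "priority": url.findtext("sm:priority", namespaces=NS),
        })
    return entries


entries = read_entries(SAMPLE)
print(entries[0]["loc"], entries[0]["lastmod"])
```

Only <loc> is required by the protocol; the other three fields are optional.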

XML sitemaps allow search engines to crawl your website more efficiently if certain URLs are isolated. 

If you’re an SEO professional, you probably already use XML sitemaps built into your Yoast WordPress plugin. But there are plenty of extras that you might be missing. 

How to build a dynamic XML sitemap

If you’re using WordPress, leverage SEO plugins like Yoast and RankMath to generate a dynamic XML sitemap. 

If you’re not using WordPress, follow these steps to build a dynamic XML sitemap. 

Understand your navigational structure 

Pull together a list of all the URLs on the website and group them to align with your navigational structure and content groupings.
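One rough way to do that grouping is by the first path segment of each URL. The sketch below assumes your navigation mirrors your URL folders, which will not be true for every site; group_by_section() and the example.com URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_section(urls):
    """Group URLs by their first path segment (e.g. /blog/, /products/)."""
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        # The homepage and single-segment pages (e.g. /about) go under "root".
        section = segments[0] if len(segments) > 1 else "root"
        groups[section].append(url)
    return dict(groups)


sample_urls = [
    "https://www.example.com/",
    "https://www.example.com/about",
    "https://www.example.com/blog/post-1",
    "https://www.example.com/blog/post-2",
    "https://www.example.com/products/shoes",
]
grouped = group_by_section(sample_urls)
for section, members in grouped.items():
    print(section, len(members))
```

Each group can later become its own XML sitemap if the site is large enough to need several.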

Partner with your developer team 

Share the grouped list of URLs with your dev team and enlist their help in creating a dynamic XML sitemap using Python, PHP, or JavaScript.

Here is an example of how to code a dynamic XML sitemap using Python: 

import datetime
import xml.etree.ElementTree as ET

# List of URLs (replace these with your actual URLs)
urls = [
    "https://www.example.com/page1",
    "https://www.example.com/page2",
    "https://www.example.com/page3",
    # Add more URLs here
]

# Create the root element
urlset = ET.Element("urlset")
urlset.set("xmlns", "http://www.sitemaps.org/schemas/sitemap/0.9")

# Loop through URLs and create URL entries
for url in urls:
    url_elem = ET.SubElement(urlset, "url")
    loc_elem = ET.SubElement(url_elem, "loc")
    loc_elem.text = url
    lastmod_elem = ET.SubElement(url_elem, "lastmod")
    # Placeholder: today's date as YYYY-MM-DD. Use each page's real modification date in production.
    lastmod_elem.text = datetime.date.today().isoformat()
    changefreq_elem = ET.SubElement(url_elem, "changefreq")
    changefreq_elem.text = "daily"  # You can set this to the desired frequency
    priority_elem = ET.SubElement(url_elem, "priority")
    priority_elem.text = "0.5"  # You can set this to a value between 0 and 1

# Create the XML tree
tree = ET.ElementTree(urlset)

# Save the XML to a file
tree.write("sitemap.xml", encoding="utf-8", xml_declaration=True)

When working with developers, I’ve leveraged Flask and Python together to create a truly dynamic XML sitemap that changes whenever the website’s content changes. 

The idea is to define a sitemap route that builds the file on request from the website’s current content, using a function like get_dynamic_urls() to retrieve all URLs.

import datetime
import xml.etree.ElementTree as ET

from flask import Flask

app = Flask(__name__)


def get_dynamic_urls():
    # Placeholder: replace with your dynamic URL retrieval logic (database, CMS, etc.)
    return ["https://www.example.com/page1"]


@app.route("/sitemap.xml", methods=["GET"])
def sitemap():
    urls = get_dynamic_urls()
    urlset = ET.Element("urlset")
    urlset.set("xmlns", "http://www.sitemaps.org/schemas/sitemap/0.9")
    for url in urls:
        url_elem = ET.SubElement(urlset, "url")
        loc_elem = ET.SubElement(url_elem, "loc")
        loc_elem.text = url
        lastmod_elem = ET.SubElement(url_elem, "lastmod")
        # Placeholder: use each page's real modification date in production.
        lastmod_elem.text = datetime.date.today().isoformat()
        changefreq_elem = ET.SubElement(url_elem, "changefreq")
        changefreq_elem.text = "daily"
        priority_elem = ET.SubElement(url_elem, "priority")
        priority_elem.text = "0.5"
    # xml_declaration requires Python 3.8 or later
    xml_content = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
    return xml_content, {"Content-Type": "application/xml"}

If you’re sharing code across multiple domain properties, ensure the files are coded properly on your HTML pages. 

For example, your CSS and JavaScript files should be linked using <link> and <script> tags. 

Before sharing the code across different domains, you’ll need to create a centralized server to host the shared XML sitemap rules. 

Expect to go through multiple iterations of your dynamic XML sitemap with your developers until all the rules are confirmed.

Don’t forget to check that the dynamic URLs are actually being retrieved and integrated into your sitemap logic.
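A simple check along those lines is to parse the generated sitemap and compare its <loc> values against the URLs you expect. This is only a sketch: missing_from_sitemap() is a hypothetical helper, and the sample sitemap and expected list are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def missing_from_sitemap(sitemap_xml, expected_urls):
    """Return expected URLs that are not listed as <loc> values in the sitemap."""
    root = ET.fromstring(sitemap_xml)
    listed = {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", NS)
        if loc.text
    }
    return sorted(set(expected_urls) - listed)


# Hypothetical generated sitemap and expected URL list.
generated = (
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>https://www.example.com/page1</loc></url>"
    b"</urlset>"
)
expected = ["https://www.example.com/page1", "https://www.example.com/page2"]
missing = missing_from_sitemap(generated, expected)
print(missing)
# ['https://www.example.com/page2']
```

In practice, the sitemap bytes would come from the response body of your sitemap route, and the expected list from your CMS or database.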

Upload to your server 

Once your XML sitemap is compressed, upload it to your website’s root directory.

Submit to search engines 

Let search engines know about your XML sitemaps by submitting them through Google Search Console and Bing Webmaster Tools. 

Ongoing maintenance 

An XML sitemap is not just a “set it and forget it” SEO tactic. It’s essential to update your XML sitemap as your website changes. 

15 tips to keep in mind when building an XML sitemap

1. XML sitemaps are not a ranking factor

When asked on X (previously Twitter) if there is a ranking advantage to using XML sitemaps, Google’s Gary Illyes responded:

“No, not direct anyway.”

2. Google can find your pages without a sitemap

Websites don’t need a sitemap if they are small and linked properly, Daniel Waisberg, Search Advocate at Google, shared in a Google Search Console training video. 

However, he followed up by saying a sitemap might be especially helpful if: 

The site is really large.

The site’s pages are isolated.

The site is new or changes quickly.

Also, it’s important to note that Google’s John Mueller stated on X (previously Twitter) that having an XML sitemap is a “minimal baseline for any serious website.”

It seems like something to fix regardless. Making a sitemap file automatically seems like a minimal baseline for any serious website, imo.

— John Mueller (official) · #MaybeABot (@JohnMu) November 10, 2020

3. Google checks to see if your XML sitemap has changed before reprocessing

Google does check to see if your XML sitemap has changed before it wastes resources parsing and processing the file again, Illyes confirmed. 

He stated in an SEO Office Hours session:

“As soon as you change something in your sitemap, be that the URL element or last mod, the sitemap will be parsed again and generally reprocessed. That doesn’t mean that the URLs will be surely crawled of course they are still subject to the quality evaluations like any other URL. It’s also worth to mention that if you remove a URL from the sitemap because perhaps it doesn’t exist anymore that doesn’t mean it’s automatically going to be dropped from the index or even prioritized for crawling so it can be dropped sooner.”

4. XML sitemaps can get your new content indexed quicker

Mueller confirmed that telling Google your XML sitemap is updated can help get your new content indexed quicker. 

He shared:

“Pinging a sitemap is a bit different than just providing it in the robots.txt — by pinging, you’re actively flagging a change in the sitemap file. If you’re keen on having new content indexed quickly, that’s a good practice.”

@JohnMu for a news site – that uses RSS feeds, as well as a news XML sitemap – and specifies the XML sitemap locations in the robots.txt – should it also use the ping tool option? Or is the above 3/4 sufficient to notify Google of new/updated content?

— Dan Smullen (@dansmull) October 12, 2022

5. XML sitemaps must be UTF-8 encoded 

If your XML sitemap isn’t using UTF-8, search engines can’t read it. 
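A quick way to sanity-check the encoding is to try decoding the file’s bytes as UTF-8. The is_valid_utf8() helper below is a hypothetical name used for this sketch.

```python
def is_valid_utf8(data):
    """Return True if the raw sitemap bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same URL path encoded two ways: UTF-8 passes, Latin-1 does not.
good = "https://www.example.com/café".encode("utf-8")
bad = "https://www.example.com/café".encode("latin-1")
print(is_valid_utf8(good), is_valid_utf8(bad))
# True False
```

To check a real file, read it in binary mode and pass the bytes to the function.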

6. Keep the file size under 50 MB (uncompressed) or 50,000 URLs

There are size limits to your XML sitemaps. Keep it under 50,000 URLs per sitemap or 50 MB (uncompressed). 

If you have a larger file size or more URLs, use multiple XML sitemaps. 
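As a sketch of that split, the helper below breaks a URL list into chunks of at most 50,000. It only enforces the URL count, not the 50 MB size limit, and chunk_urls() is a hypothetical name.

```python
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit from the sitemaps protocol


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield successive lists of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


# Hypothetical site with 120,001 URLs, which needs three sitemap files.
all_urls = [f"https://www.example.com/p{i}" for i in range(120_001)]
chunks = list(chunk_urls(all_urls))
print([len(c) for c in chunks])
# [50000, 50000, 20001]
```

Each chunk would then be written to its own sitemap file.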

7. You can have multiple XML sitemaps

Google lets you submit multiple sitemaps. Splitting up your sitemaps into different sections of your website to measure performance can be useful. 

For example, if you’re a large ecommerce website, you can segment your XML sitemaps by product type (e.g., women’s shoes vs. men’s shoes).

Or, if you have multiple languages, you may want to add hreflang annotations to each <loc> entry in your XML sitemap.

Alternate-language annotations do not count toward the 50,000 URL max.
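Here is a hedged sketch of how hreflang annotations can be added with xhtml:link elements, where each language version gets its own <url> entry that lists every alternate, itself included. The build_hreflang_sitemap() helper and the example.com URLs are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize sitemap tags without a prefix and alternates as xhtml:link.
ET.register_namespace("", SM_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_hreflang_sitemap(pages):
    """pages: list of dicts mapping hreflang code -> URL for one piece of content."""
    urlset = ET.Element(f"{{{SM_NS}}}urlset")
    for alternates in pages:
        for url in alternates.values():
            url_elem = ET.SubElement(urlset, f"{{{SM_NS}}}url")
            ET.SubElement(url_elem, f"{{{SM_NS}}}loc").text = url
            for lang, href in alternates.items():
                ET.SubElement(
                    url_elem,
                    f"{{{XHTML_NS}}}link",
                    {"rel": "alternate", "hreflang": lang, "href": href},
                )
    # xml_declaration requires Python 3.8 or later
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


hreflang_xml = build_hreflang_sitemap([
    {"en": "https://www.example.com/en/", "de": "https://www.example.com/de/"},
])
print(hreflang_xml.decode("utf-8"))
```

With two language versions, this produces two <url> entries and four xhtml:link elements.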

Or, if you have a lot of videos or images, you could separate your XML sitemaps into a video XML sitemap and an image XML sitemap. 

There is no ranking benefit to having multiple XML sitemaps.

8. If you have multiple XML sitemaps, remember to use a sitemap index file 

If you have multiple XML sitemaps for a large site, use a sitemap index file, which lets you submit many sitemaps at once.
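A sitemap index is itself a small XML file that lists the location of each child sitemap. A minimal sketch, assuming hypothetical file names on example.com and a hypothetical build_sitemap_index() helper:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls, lastmod=None):
    """Build a <sitemapindex> document listing each child sitemap URL."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = sitemap_url
        if lastmod:
            ET.SubElement(sitemap, "lastmod").text = lastmod
    # xml_declaration requires Python 3.8 or later
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


index_xml = build_sitemap_index(
    [
        "https://www.example.com/sitemap-products.xml",
        "https://www.example.com/sitemap-blog.xml",
    ],
    lastmod="2023-06-01",
)
print(index_xml.decode("utf-8"))
```

You would then submit the index file rather than each child sitemap individually.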

9. The lastmod tag should be flawless

The <lastmod> tag is one of the most essential parts of your XML sitemap. 

For instance, my structured data leverages the <lastmod> date of my XML sitemap. 

The <lastmod> tag also signals to search engines whether you’ve refreshed old content or launched new content. 

Mueller confirmed this by stating:

“…If you’re just changing the timestamp in the footer, sure, the page has technically changed too, but should that be prioritized – probably not.”

Are you providing something new for search engines that you’d like reflected in search? If so, flag it. If you’re just changing the timestamp in the footer, sure the page has technically changed too, but should that be prioritized — probably not.

— John Mueller (official) · #MaybeABot (@JohnMu) February 24, 2023

Bing has even taken an official stance on the lastmod tag in XML sitemaps, stating:

“For XML sitemaps, one of the most critical tags you can include in your sitemap is the “lastmod” tag.” 
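One way to keep <lastmod> accurate is to derive it from the timestamp of the last meaningful content change stored in your CMS, rather than from the time the sitemap was generated. The sketch below formats such a timestamp as a W3C Datetime string in UTC. format_lastmod() is a hypothetical helper, and treating naive datetimes as UTC is an assumption.

```python
from datetime import datetime, timezone


def format_lastmod(modified_at):
    """Format a content-modified datetime as a W3C Datetime string in UTC."""
    if modified_at.tzinfo is None:
        # Assumption: naive datetimes are already expressed in UTC.
        modified_at = modified_at.replace(tzinfo=timezone.utc)
    utc_value = modified_at.astimezone(timezone.utc)
    return utc_value.strftime("%Y-%m-%dT%H:%M:%S+00:00")


# Hypothetical "last significant edit" timestamp pulled from a CMS.
lastmod_value = format_lastmod(datetime(2023, 2, 24, 9, 30, 0))
print(lastmod_value)
# 2023-02-24T09:30:00+00:00
```

A date-only value such as 2023-02-24 is also valid if you do not track times.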

10. XML sitemaps should only include 200 status URLs that are all self-canonicalized

Remember to include only live URLs that return a 200 status, are indexable, and are self-canonicalized.

If you continue to list 404 or 301 redirected pages, search engines could stop crawling your XML sitemap entirely.

Mueller responded to a comment on X (formerly known as Twitter):

“We’d stop fetching sitemap files if the URLs are invalid, but if you’re returning content or redirecting (which is kinda recommended), we’ll keep trying them. It shouldn’t cause problems, since overall sitemap files are only a tiny-tiny part of all URLs fetched from a site.”

— John Mueller (official) · #MaybeABot (@JohnMu) January 16, 2019
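If you already have crawl data for your pages, a filter like the one below can decide which URLs belong in the sitemap. The record shape (url, status, canonical, noindex) is a hypothetical format invented for this sketch, so adapt it to whatever your crawler exports.

```python
def is_sitemap_worthy(page):
    """True only for 200-status, indexable, self-canonicalized pages."""
    return (
        page["status"] == 200
        and not page.get("noindex", False)
        and page.get("canonical") == page["url"]
    )


def filter_sitemap_urls(pages):
    """Return the URLs of the pages that pass every check."""
    return [page["url"] for page in pages if is_sitemap_worthy(page)]


# Hypothetical crawl records.
crawl = [
    {"url": "https://www.example.com/a", "status": 200,
     "canonical": "https://www.example.com/a", "noindex": False},
    {"url": "https://www.example.com/old", "status": 301,
     "canonical": "https://www.example.com/a", "noindex": False},
    {"url": "https://www.example.com/gone", "status": 404,
     "canonical": None, "noindex": False},
    {"url": "https://www.example.com/dupe", "status": 200,
     "canonical": "https://www.example.com/a", "noindex": False},
    {"url": "https://www.example.com/private", "status": 200,
     "canonical": "https://www.example.com/private", "noindex": True},
]
kept = filter_sitemap_urls(crawl)
print(kept)
# ['https://www.example.com/a']
```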

11. Search engines can check your XML sitemap daily

Fabrice Canel from Microsoft shared that Microsoft checks your XML sitemap daily:

Sitemaps is a great way to tell search engines all the relevant URLs on your site. We attempt looking at sitemaps once a day. Do you like to wait 2+ days to see your content indexed? IndexNow https://t.co/glgkf4OfsG allows you to have your content index now, not in few days.

— Fabrice Canel (@facan) October 18, 2021

Meanwhile, Mueller shared that how often Google checks your XML sitemap is algorithmic and depends on how frequently the file changes (a.k.a., your <lastmod> tag):

It’s algorithmic – some files never change, some change frequently. Also, the ping-functionality is a good way to let us know about changes. (And, not all sitemaps lead to useful / relevant / high-quality content, unfortunately)

— John Mueller (official) · #MaybeABot (@JohnMu) October 19, 2021

12. Page speed indirectly impacts your XML sitemap

We all know how vital page speed is to our SEO efforts. But how does a slow-performing website impact your XML sitemap?

Mueller has confirmed page speed should not impact your XML sitemap by stating:

“As long as it doesn’t time out, you should be fine. Maybe cache the latest version to prevent unnecessary server load?”

— John Mueller (official) · #MaybeABot (@JohnMu) September 13, 2017

The truth is, I’ve worked on slow-performing websites where the XML sitemap was not loading when search engines crawled the website, causing major ranking fluctuations.

There is an indirect impact on your XML sitemap if your website does not load properly. 

This is why it’s important to use Gzip compression to reduce file size and speed up loading times. 
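Compressing a sitemap with Gzip takes only a few lines using Python’s standard library. In this sketch, compress_sitemap() and write_gzipped_sitemap() are hypothetical helper names, and the sample XML is illustrative. Keep in mind that the 50 MB limit applies to the uncompressed file.

```python
import gzip


def compress_sitemap(xml_bytes):
    """Return the Gzip-compressed form of the sitemap bytes."""
    return gzip.compress(xml_bytes)


def write_gzipped_sitemap(xml_bytes, path="sitemap.xml.gz"):
    """Write the compressed sitemap to disk, ready to upload."""
    with open(path, "wb") as handle:
        handle.write(compress_sitemap(xml_bytes))


# Hypothetical sitemap with 1,000 entries to show the size difference.
sample_xml = (
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    + b"".join(
        b"<url><loc>https://www.example.com/page%d</loc></url>" % i
        for i in range(1000)
    )
    + b"</urlset>"
)
compressed = compress_sitemap(sample_xml)
print(len(sample_xml), len(compressed))
```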

13. Remove <priority> tag from your XML sitemaps

Illyes compared the <priority> tag in XML sitemaps to “a bag of noise.” 

we ignore those. It’s essentially a bag of noise

— Gary 鯨理/경리 Illyes (so official, trust me) (@methode) March 28, 2017

So if you’re depending on search engines to crawl your URLs in your XML sitemap based on priority, you’ll be waiting a long time. 

I recommend removing this tag from your XML sitemaps. 
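If your sitemap is a static file, the tag can be stripped with a short script. This is a sketch using Python’s standard library; strip_priority() is a hypothetical helper and the sample sitemap is made up.

```python
import xml.etree.ElementTree as ET

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Keep the sitemap namespace as the default (unprefixed) one on output.
ET.register_namespace("", SM_NS)


def strip_priority(sitemap_xml):
    """Return the sitemap with every <priority> element removed."""
    root = ET.fromstring(sitemap_xml)
    for url in root.findall(f"{{{SM_NS}}}url"):
        for priority in url.findall(f"{{{SM_NS}}}priority"):
            url.remove(priority)
    # xml_declaration requires Python 3.8 or later
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


sample_sitemap = (
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>https://www.example.com/page1</loc>"
    b"<priority>0.5</priority></url>"
    b"</urlset>"
)
cleaned = strip_priority(sample_sitemap)
print(cleaned.decode("utf-8"))
```

For a dynamically generated sitemap, it is simpler to stop emitting the element in the generator itself.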

14. If you have navigational structure issues, pair your XML sitemap with an HTML sitemap

There have always been mixed feelings about HTML sitemaps.

Mueller shared that HTML sitemaps “should never be needed.” 

He continued:

“I changed my mind on HTML sitemaps over the years, they should never be needed. Sites small & large should always have a clear navigational structure. If you feel the need for a HTML sitemap, spend the time improving your site’s architecture instead. HTML sitemaps fit into the same bin with HTML loading spinners.”

When working on large websites at an enterprise level, there are often multiple layers of decision-makers on a website navigation structure. And oftentimes, SEO best practices aren’t taken into consideration.

I pair my XML sitemap with an HTML sitemap until I’m confident in our website navigation and internal linking hierarchy.

15. Don’t forget to include all your XML sitemaps in your robots.txt file

Don’t forget to include your XML sitemaps (all of them) as links in your robots.txt file. 

Here is an example: 
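In robots.txt, each sitemap is referenced on its own line with a Sitemap: directive followed by the absolute URL. As a small sketch, the Python helper below generates those lines. The sitemap_directives() name and the example.com file names are assumptions.

```python
def sitemap_directives(sitemap_urls):
    """Return robots.txt lines, one 'Sitemap:' directive per absolute URL."""
    return "\n".join(f"Sitemap: {url}" for url in sitemap_urls)


robots_lines = sitemap_directives([
    "https://www.example.com/sitemap-products.xml",
    "https://www.example.com/sitemap-blog.xml",
])
print(robots_lines)
# Sitemap: https://www.example.com/sitemap-products.xml
# Sitemap: https://www.example.com/sitemap-blog.xml
```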

Don’t worry about where in the robots.txt file these should go. The position of your XML sitemap entries within your robots.txt file has no impact.

Mueller confirmed this: 

No, it doesn’t matter.

— John Mueller (official) · #MaybeABot (@JohnMu) September 23, 2021

XML sitemaps are the blueprint of your website for search engines

Whether you’re launching a new webpage or just adding a canonical tag to a duplicate page, your XML sitemap is something you should update 24/7.

But do so with caution. 

While you shouldn’t be paranoid, your XML sitemap is the blueprint of your house for search engines. 

So don’t forget to submit your XML sitemaps to Google Search Console and Bing Webmaster Tools.

The post XML sitemaps and SEO: What you need to know appeared first on Search Engine Land.