What is a sitemap and why should every publisher have one?

26.01.2022 · By LOYAL AI

A sitemap helps search engines navigate your website in the smoothest way possible. Follow these best web practices to help Google understand your website structure and boost your content’s discoverability. 

Just like a human needs clear signposting to find each article on your website, machines do too. A sitemap is your opportunity to provide search engines with a clear understanding of your website structure and it’s a great way to signal to Google that you have fresh content ready to index and rank.

Although it’s good practice for every website, large and small, to have a sitemap, it’s especially important if you’re a publishing brand with a large content archive and a team of journalists who regularly produce new content. A strong internal linking strategy coupled with a clear sitemap will give your content the best chance of being discovered, which can in turn improve your website traffic and overall SEO performance.

In this post we’ll break down exactly what a sitemap is, which types you should be considering as a publisher, and best practices to follow when creating one. 

What exactly is a sitemap?

Just as the name suggests, a sitemap is essentially a map of all of the pages on your website. “Your sitemap should be the most complete tree of all the URLs within your website,” explains Ricardo Ribeiro, Senior Software Engineer at LOYAL AI.

It helps Google connect the dots between pages and understand the relationships between them. It also provides important information, such as when each page was last updated, so that web crawlers can find new and existing content quickly and easily.

What does a sitemap look like?

Viewing your sitemap for the first time can be a little daunting. You’ll be faced with a long block of markup written for machines to read. Fortunately, if you use a CMS like WordPress, a sitemap should already have been created for you automatically, and you can view it by visiting www.yourdomain.com/sitemap.xml.

Here’s an example of LOYAL’s XML sitemap, generated with the Yoast SEO plugin for WordPress. As you can see, we’ve split our pages across seven different sitemaps to make sure the content of each page is as easy to find and read as possible.

XML sitemap generated by YoastSEO

If you click on one of the index sitemaps, you’ll see all of the URLs within that particular sitemap and on the right you’ll find the last modified date. This is key as it tells Google which is the most recent content to crawl and index.

URLs in XML sitemap
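To make the role of the last modified date more concrete, here is a minimal Python sketch that reads a sitemap and lists its URLs with the most recently modified first. The sample XML and the example.com addresses are invented for illustration, and the function name `list_urls_newest_first` is our own. The sketch also assumes every entry carries a `lastmod` value in the same full date-and-time format.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content; example.com stands in for your own domain.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/older-article/</loc>
    <lastmod>2022-01-10T09:00:00+00:00</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/newer-article/</loc>
    <lastmod>2022-01-25T14:30:00+00:00</lastmod>
  </url>
</urlset>"""


def list_urls_newest_first(sitemap_xml: str) -> list[tuple[str, datetime]]:
    """Return (URL, last modified) pairs, most recently modified first."""
    root = ET.fromstring(sitemap_xml.encode("utf-8"))
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        # Assumption: every <url> entry has a <lastmod> element.
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        entries.append((loc.strip(), datetime.fromisoformat(lastmod.strip())))
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


for loc, lastmod in list_urls_newest_first(SAMPLE_SITEMAP):
    print(lastmod.date(), loc)
```

A crawler does something broadly similar when it decides which pages to revisit, although how Google actually prioritises URLs is not something this sketch tries to model.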

Why do you need a sitemap?

If your pages are properly linked, Google should be able to discover your content, but a sitemap makes it even easier and is therefore best practice.

Google’s documentation says that sitemaps are beneficial for “websites with large archives,” and “websites which use rich media content.”

If you have a large publishing archive, find out how you can search archived content quickly with LOYAL’s AI-powered tool.

The key benefits of sitemaps to publishers are:

1. Speedy indexation: With a sitemap, search engines can be alerted to new articles much faster, so they are indexed and displayed in search results in less time.

2. Pages aren’t missed: Some pages on your website may not have internal links connecting them with other pages, so search engines may not be able to find them. With a sitemap you can list all of the pages that you’d like to be crawled.

3. Monitor which pages are being indexed: You can use Google Search Console and your sitemap to monitor which articles are being indexed by Google and identify ways to improve your content’s discoverability.

If you want to learn more about the inner workings of Google Search, check out the Search off the Record podcast. It features key discussions around search and trending conversations within the SEO community.

3 types of sitemaps to consider

There are a few different types of sitemaps and several formats Google supports, including XML, Google News Sitemaps and RSS.

Here are the key ones to consider as a publisher.

Standard sitemaps

These are the largest, most extensive sitemaps that include all of the URLs on your website in XML format.

This is what a very basic sitemap in XML for one URL looks like:

Basic sitemap in XML for one URL

Once you add all the URLs under each sitemap, it’ll look something like this, clearly signposting each article and its last modified date to Google:

Complete sitemap with article URLs
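As a rough illustration of that structure, the sketch below builds a standard XML sitemap in Python from a list of URLs and last modified dates. The `build_sitemap` function and the example.com addresses are hypothetical. A CMS plugin such as Yoast generates the equivalent file for you, so treat this as a way to see the moving parts rather than something you need to run in production.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Serialise the sitemap namespace as the default (prefix-free) namespace.
ET.register_namespace("", SITEMAP_NS)


def build_sitemap(pages):
    """Build sitemap XML from (url, last_modified) pairs."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    ET.indent(urlset)  # pretty-print; requires Python 3.9+
    xml_bytes = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
    return xml_bytes.decode("utf-8")


# Hypothetical articles; replace with your own URLs and dates.
pages = [
    ("https://www.example.com/first-article/", "2022-01-20"),
    ("https://www.example.com/second-article/", "2022-01-25"),
]
print(build_sitemap(pages))
```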

You can access a sitemaps report in Google Search Console where you can track its status (we’ll go into a little more detail on how to set this up later). If you need a little more help or have any burning questions, head over to Google’s webmaster help forum section on sitemaps where you can post queries and find a collection of helpful resources. 

Each sitemap can contain up to 50,000 URLs and must be no larger than 50MB uncompressed, so you may want to think about organising your content into several different sitemaps under specific categories.

The Yoast SEO plugin sets the limit even lower at 1,000 URLs so that your sitemap loads as fast as possible. 
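To show how splitting works in practice, here is a small Python sketch that divides a long list of URLs into chunks that respect the 50,000-URL limit and then builds a sitemap index pointing at each chunk. The function names and the `sitemap-1.xml` style file names are assumptions made for this example. Only the URL count is checked here, not the file size. Passing `max_urls=1000` would mimic the lower Yoast limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)

MAX_URLS_PER_SITEMAP = 50_000


def split_into_sitemaps(urls, max_urls=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into chunks of at most max_urls each."""
    return [urls[i:i + max_urls] for i in range(0, len(urls), max_urls)]


def build_sitemap_index(sitemap_locations):
    """Build a sitemap index that references each child sitemap."""
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for location in sitemap_locations:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = location
    ET.indent(index)  # pretty-print; requires Python 3.9+
    xml_bytes = ET.tostring(index, encoding="utf-8", xml_declaration=True)
    return xml_bytes.decode("utf-8")


# Hypothetical archive of 120,000 article URLs.
all_urls = [f"https://www.example.com/article-{n}/" for n in range(120_000)]
chunks = split_into_sitemaps(all_urls)
print([len(chunk) for chunk in chunks])  # [50000, 50000, 20000]

# Hypothetical file names for the child sitemaps.
locations = [
    f"https://www.example.com/sitemap-{n}.xml"
    for n in range(1, len(chunks) + 1)
]
print(build_sitemap_index(locations))
```

Chunking by position is the simplest option. Grouping by category or publication date, as suggested above, usually makes the resulting sitemaps easier to monitor.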

Google News Sitemap

This is another type of sitemap every publisher should be aware of, especially if you’re producing new content on a daily basis. Google News Sitemaps contain the URLs of all the articles that have been published in the last two days.

Google News sitemaps have some restrictions: no more than 1,000 URLs should be present in a single file. If you have more than 1,000 URLs, break your sitemap into several smaller sitemaps and reference them with a sitemap_index.xml.

Google News sitemap example
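For readers who want to see the moving parts, the following Python sketch builds a Google News sitemap that keeps only articles published in the last two days and refuses to write more than 1,000 URLs to one file. The `build_news_sitemap` function, the article data, and the publication name are all invented for illustration. The `news:` element names follow Google's News sitemap schema.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("news", NEWS_NS)

MAX_NEWS_URLS = 1000


def build_news_sitemap(articles, publication_name, language, now):
    """Build a News sitemap from articles published in the last two days."""
    cutoff = now - timedelta(days=2)
    recent = [a for a in articles if a["published"] >= cutoff]
    if len(recent) > MAX_NEWS_URLS:
        raise ValueError("Split into several sitemaps: more than 1,000 URLs")

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for article in recent:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = article["url"]
        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        publication = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(publication, f"{{{NEWS_NS}}}name").text = publication_name
        ET.SubElement(publication, f"{{{NEWS_NS}}}language").text = language
        date = ET.SubElement(news, f"{{{NEWS_NS}}}publication_date")
        date.text = article["published"].isoformat()
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = article["title"]
    ET.indent(urlset)  # pretty-print; requires Python 3.9+
    xml_bytes = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
    return xml_bytes.decode("utf-8")


# Hypothetical articles; only the first falls inside the two-day window.
now = datetime(2022, 1, 26, 12, 0, tzinfo=timezone.utc)
articles = [
    {"url": "https://www.example.com/fresh-story/",
     "title": "Fresh story",
     "published": datetime(2022, 1, 26, 8, 0, tzinfo=timezone.utc)},
    {"url": "https://www.example.com/old-story/",
     "title": "Old story",
     "published": datetime(2022, 1, 20, 8, 0, tzinfo=timezone.utc)},
]
print(build_news_sitemap(articles, "Example Publisher", "en", now))
```

Passing `now` in as a parameter, rather than reading the clock inside the function, keeps the example reproducible. A real job would pass the current UTC time each time it regenerates the file.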

The benefit of this type of sitemap is that Google can discover news articles faster, which improves content coverage.

Here’s more information on how to create a Google News sitemap and submit it through Google Search Console.

RSS feeds

An RSS feed is also useful for publications that publish new content frequently as this type of sitemap contains only the most recent updates to your site, rather than an entire archive of all of your pages.

This means that search engines can quickly pinpoint new content to crawl. Google explains:

“For optimal crawling, we recommend using both XML sitemaps and RSS/Atom feeds. XML sitemaps will give Google information about all of the pages on your site. RSS/Atom feeds will provide all updates on your site, helping Google to keep your content fresher in its index.”

Fortunately, every WordPress site has an RSS feed by default (read an introduction to RSS feeds in WordPress) and they offer a way for readers to subscribe and read your content in reader apps like Feedly.
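To illustrate how a feed surfaces only the latest updates, here is a short Python sketch that reads an RSS 2.0 feed and returns its newest items. The sample feed, the example.com links, and the `latest_items` helper are made up for this example. A real feed from your own site will contain more fields than the three read here.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical RSS 2.0 feed; a real feed will carry more fields per item.
SAMPLE_FEED = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Publisher</title>
    <link>https://www.example.com/</link>
    <description>Latest articles</description>
    <item>
      <title>Older story</title>
      <link>https://www.example.com/older-story/</link>
      <pubDate>Mon, 24 Jan 2022 09:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Newer story</title>
      <link>https://www.example.com/newer-story/</link>
      <pubDate>Wed, 26 Jan 2022 08:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def latest_items(feed_xml, limit=10):
    """Return up to `limit` feed items, newest first."""
    root = ET.fromstring(feed_xml.encode("utf-8"))
    items = []
    for item in root.findall("./channel/item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            # RSS dates use the RFC 822 format, which email.utils can parse.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        })
    items.sort(key=lambda entry: entry["published"], reverse=True)
    return items[:limit]


for entry in latest_items(SAMPLE_FEED):
    print(entry["published"].date(), entry["link"])
```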

How to submit your sitemap to Google

It’s quick and easy to submit your sitemap to Google; you just need to follow a couple of steps. First, sign in to your Google Search Console account, then follow this short tutorial on how to submit a sitemap and track its status with Google Search Console.

We recommend setting up the Yoast SEO plugin for WordPress (you can find setup instructions here), or you can check out these 10 best XML sitemap generator tools recommended by SEMRush.

Content discovery with LOYAL AI

Here at LOYAL AI we use the latest technologies to provide the best, most accurate content extraction solution to date. Our search tools are able to search your entire website archive at speed to retrieve the most relevant information, provided you have sitemaps in place.

“Upon your content being discovered we extract and standardise the content within a URL, we then process this data, enrich and index your page/article,” explains Ricardo.

We are constantly assessing the best practices for content discoverability and extraction. Creating and maintaining a sitemap helps both search engines and LOYAL crawl your website, and it is the type of best practice every publisher should be following, along with strong internal linking.

For more advice on SEO best practice in the newsroom, read our blog post A step-by-step guide to SEO in the newsroom and find out how LOYAL’s archive search tool can help you improve your website structure and content discoverability.