CEMEX Technical SEO Best Practice – Page One
Page One Covers: Site Architecture, HTTP Status Codes, Redirects, Duplicate Content and Canonicalization, HTTPS, and Handling Old Content
The way a website is structured can have a significant impact on how well its pages rank in Google. There are a number of reasons for this:
- A well-organised site structure makes for a good user experience, which means lower bounce rates, longer time-on-page, better CTRs, etc. – all factors Google’s algorithm takes into account when ranking pages
- The better the site structure, the easier it is for Google to access, crawl, index, and return the pages of a site
- Having a good site structure increases your chances of having sitelinks show up in SERPs
Sitelinks are beneficial for increasing click-through rates, pointing users to relevant information and improving the navigability of your site.
- Choose a logical site hierarchy – keep it simple for the sake of users and search engine crawlers, and make sure that your most important pages are at the top level of your navigation
- Create a descriptive URL structure that aligns with your navigation hierarchy – e.g. example.com/page/sub-page
- Develop a solid approach to internal linking – this helps establish an information hierarchy, improves semantic relevancy and builds link equity
- Include the main navigation in the header of the website
- Use an XML sitemap
Site structure plays a major role in SEO, and reorganizing navigational elements further down the track isn’t always easy. With that in mind, site architecture should be carefully considered and planned ahead when developing any website.
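As an illustration of the XML sitemap recommendation, here is a minimal sketch that builds a sitemap from a list of URLs using only the Python standard library. The page URLs are hypothetical and simply mirror the example.com/page/sub-page pattern used above; a real sitemap would normally be generated by your CMS or a plugin.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string listing the given absolute URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages that follow the navigation hierarchy.
pages = [
    "https://example.com/",
    "https://example.com/page/",
    "https://example.com/page/sub-page/",
]
print(build_sitemap(pages))
```

Listing URLs in the same order as the navigation hierarchy is not required by the format, but it makes the file easier to review by hand.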
HTTP STATUS CODES
HTTP (Hypertext Transfer Protocol) status codes are 3-digit codes issued by a server in response to a client’s request made to the server. We can think of these codes as a conversation between your browser and the server.
These codes tell us whether things between your browser and server are:
- All good
- Touch and go
For SEO, it’s important to understand the different types of status codes and how they impact your SEO efforts. Learning how to use and respond to these codes will help you to diagnose site errors quickly, allow search engines to crawl your site efficiently and improve site performance.
How to Monitor:
In order to monitor your site and understand what HTTP status codes exist, you can use a crawler such as Screaming Frog or Deepcrawl. Their crawl reports provide useful information about the HTTP status codes the crawler receives from the site and pages that it attempts to crawl.
Different Types of Status Codes
The first digit of each three-digit status code is a number from 1 through 5; each of those ranges encompasses a different class of server response:
- 1xxs – Informational responses: The server is thinking through the request
- 2xxs – Success! The request was successfully completed
- 3xxs – Redirection: You got redirected somewhere else
- 4xxs – Client errors: Page not found. The site or page couldn’t be reached
- 5xxs – Server errors: Failure. A valid request was made but the server failed to complete the request
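The five classes above can be derived from the first digit alone. The following is a minimal sketch; the function name and the class labels are illustrative choices rather than part of any standard library.

```python
def status_class(code):
    """Map a three-digit HTTP status code to its response class."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    # Integer division by 100 leaves only the first digit.
    return classes[code // 100]


for code in (200, 301, 404, 503):
    print(code, status_class(code))
```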
Despite the potential for lots of codes, only a handful of them are common and need to be focussed on for SEO:
HTTP Status Code 200 – OK
This is your ideal status code and represents a properly functioning page. You don’t need to do anything and technically everything is just as it should be.
HTTP Status Code 301 – Permanent Redirect
A 301 redirect should be utilized any time one URL needs to be redirected to another permanently. This status means visitors and bots that land on that page will be passed to the new URL. Link equity (all the authority from links pointing to your site) is also passed to the new URL through a 301 redirect. A 301 redirect remains the preferred method of choice for permanent page redirects.
HTTP Status Code 302 – Temporary Redirect
A 302 redirect is like a 301 in that visitors and bots are passed to the new page. We do not recommend using 302 redirects for permanent moves as it’s likely you’ll lose link equity. Depending on the size of your site, it’s wise to regularly crawl (check) your site to ensure you don’t have unwanted 302s and switch them to 301s when appropriate.
HTTP Status Code 404 – Not Found
404 means the resource could not be found at this moment in time. Sometimes this can be a temporary problem and because of this search engines will often try to grab the content from a particular URL again even if it responds with a 404.
As with 302s, it’s worth monitoring which 404s exist on your website so that you can 301 redirect users to appropriate content, as having many unwanted 404s is a big problem for SEO. We also recommend creating a custom 404 page that features further site navigation so that when a 404 does occur, your visitors get the best user experience possible and can continue to navigate your site.
HTTP Status Code 410 – Gone
A 410 is more permanent than a 404; it means that the page is gone. The page is no longer available from the server and no forwarding address has been set up. Any links you have on your site that are pointing to a 410 page are sending bots and visitors to a dead resource, so keep an eye out for unwanted 410s, though bear in mind they are useful if you want to permanently get rid of a page.
HTTP Status Code 500 – Internal Server Error
A 500 error indicates a problem with the server. Human visitors and bots alike will be lost, and your link equity will also be negatively impacted, so any 500 errors your website returns need to be addressed.
HTTP Status Code 503 – Service Unavailable
A server sends a 503 message when it is unable to handle the request due to an outage or overload; users and bots are essentially asked to come back later. This could be due to temporarily overloading the server or maintenance of the server. This status code should be used during planned server maintenance as search engines will know to come back later.
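As a rough sketch of how a crawl report could be triaged against the guidance in this section, the snippet below groups crawled URLs by a suggested next step. The (URL, status) rows are hypothetical, the action wording is a paraphrase of the advice above, and the code does not reflect the export format of any particular crawler.

```python
from collections import defaultdict

# Suggested next step per status code, paraphrasing the guidance above.
SEO_ACTIONS = {
    301: "check the destination is relevant",
    302: "switch to a 301 if the move is permanent",
    404: "301 redirect to relevant content if any exists",
    410: "confirm the page was removed on purpose",
    500: "investigate the server error",
    503: "confirm this is planned maintenance or a temporary overload",
}


def triage(crawl_rows):
    """Group crawled URLs by suggested action, skipping healthy 200 pages."""
    todo = defaultdict(list)
    for url, status in crawl_rows:
        if status == 200:
            continue
        todo[SEO_ACTIONS.get(status, "review manually")].append(url)
    return dict(todo)


# Hypothetical crawl results.
rows = [("/", 200), ("/old-offer", 302), ("/missing", 404), ("/api", 500)]
for action, urls in triage(rows).items():
    print(action, urls)
```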
Useful Reference Resource
A complete list of status codes can be found on the W3C website.
A redirect sends both users and search engines (bots) to a different URL from the one they originally requested. Ensuring redirects are implemented correctly will help ensure link equity is distributed across to the new location, and that the user or bot isn’t sent to a dead page.
Key Types of Redirects
301 – Permanent Redirect
A 301 redirect is a permanent redirect which passes between 90% and 99% of link equity to the redirected page. It’s the SEO’s go-to redirect, and it’s generally the best option for redirecting any URLs.
In addition, it’s important to note that any page you are 301 redirecting must be relevant to the new destination. If you redirect to an irrelevant page, this is a signal of poor quality and could have a negative effect on site performance. If there is no relevant equivalent page, and the page needs to be removed, we suggest using a 410 to simply remove the page.
302 – Temporary Redirect
Despite some Google representatives suggesting that 301s and 302s may be treated equally in the eyes of the search engines, we still recommend using 301s whenever possible to redirect content. 307s are another, more recent, example of a type of temporary redirect where a 301 is generally preferable.
Meta refresh is a method of instructing a web browser to automatically refresh the current web page or frame after a given time interval. Meta refreshes are executed at the page level rather than the server level. These redirects typically offer a poor user experience as they often prevent the “back” button from working, are usually slower, and lose link equity, so they are not a recommended SEO technique.
Whenever you delete a page, change your URL structure or switch to a new domain, you are going to have to redirect your URLs. Redirecting URLs can be complex, especially in a big site migration, and it’s essential to be organised: a poorly executed migration can massively impact your SEO visibility. Ensuring good planning and processes are in place will help avoid common pitfalls like redirect chains, redirect loops, or missed URLs.
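Redirect chains and loops can be caught before launch by checking the planned redirect map itself. The sketch below assumes the map is available as a simple source-to-target dictionary; the paths and function names are hypothetical.

```python
def resolve_redirect(url, redirects):
    """Follow `url` through a {source: target} redirect map.

    Returns (final_url, hops). Raises ValueError if the map loops.
    """
    seen = [url]
    while url in redirects:
        url = redirects[url]
        if url in seen:
            raise ValueError("redirect loop: " + " -> ".join(seen + [url]))
        seen.append(url)
    return url, len(seen) - 1


def flatten_redirects(redirects):
    """Point every source straight at its final destination (no chains)."""
    return {src: resolve_redirect(src, redirects)[0] for src in redirects}


# Hypothetical migration map containing a two-hop chain.
planned = {"/old": "/interim", "/interim": "/new"}
print(resolve_redirect("/old", planned))
print(flatten_redirects(planned))
```

Any source that resolves in more than one hop is a chain; flattening the map so each source points directly at its final URL removes it.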
DUPLICATE CONTENT AND CANONICALIZATION
Duplicate content can be a huge barrier to SEO success: it can result in the wrong version of a page being indexed or dilute the ranking potential of your site entirely. If you don’t explicitly tell Google which URL is the “canonical”, Google will make the choice for you, might consider both versions of equal weight, or might ignore them and rank neither, any of which can lead to unwanted results.
Problem with URLs
Publishing duplicate content on your site is easier than you might think. For search engines, the important thing to remember is that every unique URL is a separate page, even if the content is the same on the page, and the URL is similar.
For example, search crawlers will identify each of these pages as different, even if we see them as a single page:
In addition, many CMSs, e-commerce platforms and dynamic, code-driven websites make the problem worse by automatically adding tags, allowing multiple paths and URLs to the same content, and adding URL parameters for searches, sorts, currency options, etc. Many large sites have thousands of duplicate URLs without realising it.
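One way to surface likely duplicates in a crawl export is to normalise each URL and group the ones that collapse to the same value. The sketch below is illustrative only: the example URLs are hypothetical, and the rules (ignore protocol, "www", trailing slash, and a hand-picked set of parameters) are assumptions that need adjusting per site, since some parameters do change page content.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: parameters that alter tracking or presentation, not content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sort", "currency"}


def normalise(url):
    """Collapse common URL variations into one comparable form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS)
    # Scheme is forced to https and the fragment is dropped.
    return urlunsplit(("https", host, path, urlencode(kept), ""))


def group_duplicates(urls):
    """Return {normalised URL: [original URLs]} for groups of two or more."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalise(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


crawled = [
    "http://www.example.com/shoes/",
    "https://example.com/shoes?sort=price",
    "https://EXAMPLE.com/shoes/?utm_source=news",
    "https://example.com/shoes?colour=red",
]
print(group_duplicates(crawled))
```

A match here only suggests that the URLs serve the same content; the pages themselves still need checking before a canonical or redirect is applied.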
What Is the Canonical Tag?
The canonical tag (aka “rel canonical”) tells Google and other search engines which version of a URL you want to appear in the search results. This can be an extremely effective tool in battling identical or duplicate content appearing on URLs.
The good news is that a canonical tag is an effective solution if implemented correctly, offering webmasters a lot of control over their content.
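In HTML, the canonical tag is a link element with rel="canonical" placed in the page head. As a minimal sketch, the snippet below reads the canonical URL(s) out of a page using Python's standard HTML parser; the sample markup and URL are hypothetical.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and (attr.get("rel") or "").lower() == "canonical":
            self.canonicals.append(attr.get("href"))


def find_canonicals(html):
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical product page.
page = """<html><head>
<title>Blue widget</title>
<link rel="canonical" href="https://example.com/widgets/blue">
</head><body></body></html>"""
print(find_canonicals(page))
```

Returning a list rather than a single value makes it easy to spot pages that carry no canonical tag, or more than one.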
When to Use a Canonical?
- If the on-page content is extremely similar or exactly duplicate e.g. multiple colors of the same product.
- If the content is serving the same (or nearly the same) searcher intent e.g. the same data but in a different order based on user selections.
- If you’re republishing, refreshing or updating old content e.g. a Christmas campaign page.
- If content, a product, an event, etc. is no longer available and there’s a near best match on another URL.
Here are the key things to remember when implementing canonical tags:
1. Proactively canonicalize your home-page
Homepage duplicates are very common, being available under common URLs such as /home or via www, non-www, http and https versions. It’s usually a good idea to put a canonical tag on your homepage template to prevent unforeseen problems.
2. Manually check your dynamic canonical tags
Some sites and ecommerce platforms write a different canonical tag for every version of the URL. This isn’t helpful and misses the point of what the tag should be used for. Make sure to manually review your URLs, especially on e-commerce and CMS-driven sites using a crawler like Screaming Frog.
3. Avoid common pitfalls
As with 301s, canonicals are easy to get wrong unless they are implemented through an organised process. Common mistakes include:
- Canonicalizing page A -> page B and then page B -> page A
- Canonicalizing page A -> page B and then 301 redirecting page B -> page A
- Creating chained canonical tags (A -> B, B -> C, C -> D)
4. Be careful canonicalizing near-duplicates
Don’t use canonical tags across pages that are too different. It is possible to use the canonical tag on near-duplicates i.e. pages with very similar content such as a product page that only differs by currency, location, or some product variation, but bear in mind that if the pages are too different, search engines may ignore the tag.
5. Canonicalize cross-domain duplicates
If you control both sites, you can use the canonical tag across domains, but there should be a strong case for doing so.
6. Canonical tags can be self-referential
It’s ok if a canonical tag points to the current URL i.e. it can point to itself.
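The pitfalls listed under point 3 can be checked mechanically once the canonical targets of a site have been collected, for example from a crawl. The sketch below assumes two hypothetical dictionaries, one mapping each page to its canonical target and one mapping pages to their 301 destinations; the issue labels are illustrative.

```python
def canonical_issues(canonicals, redirects=None):
    """Flag loops, redirect conflicts and chains in a canonical map.

    canonicals: {page: canonical target}
    redirects:  {page: 301 destination} (optional)
    """
    redirects = redirects or {}
    issues = []
    for page, target in canonicals.items():
        if target == page:
            continue  # self-referential canonicals are fine (point 6)
        if canonicals.get(target) == page:
            issues.append((page, "loop"))
        elif redirects.get(target) == page:
            issues.append((page, "redirect conflict"))
        elif target in canonicals and canonicals[target] != target:
            issues.append((page, "chain"))
    return issues


# Hypothetical site data: /a -> /b -> /c is a chain, /x is self-referential.
print(canonical_issues({"/a": "/b", "/b": "/c", "/x": "/x"}))
```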
Google’s documentation explains in detail how to consolidate duplicate URLs.
HTTPS, or the secure server protocol, is a secure way of transferring data between web servers and browsers. It is effectively an encrypted way of viewing web pages that prevents that exchange from being intercepted on its route across the internet. HTTP was the standard, non-secure way of browsing the web and was used by default until it was paired with the Secure Sockets Layer (SSL) to form HTTPS; the two protocols otherwise function in the same way.
From an SEO perspective, secure pages became a more prominent issue in August 2014 when Google, having changed over to using the HTTPS protocol for its own search results, publicly announced that it was going to use secure servers as a ranking signal. Google claims that websites that use HTTPS will have a small ranking benefit because of their more secure nature; however, this is only a “very lightweight signal” within the overall ranking algorithm, carrying less weight than other signals such as good quality content.
Duplicate Content Issues
Many web servers are configured for both HTTP and HTTPS, allowing their pages to be visited using either protocol. This ultimately results in a duplicate content issue, as the two versions are considered separate websites. For this reason it’s recommended that you choose one protocol and use it across your entire site.
Referral Data Loss Considerations
Issues can arise when tracking users as they move through your site using both protocols. This isn’t so much of a problem if their activity is tracked via cookies, but it can cause problems with analytics as a user’s referrer (the URL they’ve arrived from) isn’t passed, for security reasons, when users move from a secure HTTPS page to non-secure HTTP content. This means that when users move from a ‘logged into’ area to non-secure content, their visits are counted as direct visits rather than as existing users simply having moved around your site. The list below shows the different protocol referral circumstances under which referral data is lost or retained:
- HTTP to HTTP >>> Referral data passed
- HTTP to HTTPS >>> Referral data passed
- HTTPS to HTTP >>> Referral data lost
- HTTPS to HTTPS >>> Referral data passed
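The four cases above reduce to a single rule: referral data is lost only on a move from HTTPS to HTTP. A minimal sketch of that rule follows; the function name and URLs are hypothetical, and it models only the behaviour described in the list.

```python
from urllib.parse import urlsplit


def referral_passed(from_url, to_url):
    """Return False only for the HTTPS -> HTTP case, True otherwise."""
    source = urlsplit(from_url).scheme.lower()
    destination = urlsplit(to_url).scheme.lower()
    return not (source == "https" and destination == "http")


print(referral_passed("https://example.com/account", "http://example.com/blog"))
```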
Benefits of Using HTTPS
HTTPS offers higher levels of site security while also offering the following additional potential benefits:
As stated, Google has confirmed the slight ranking boost of sites using HTTPS. As with the majority of ranking factors, its exact benefits are very hard to isolate, but this is still something to consider. In addition, the benefits of switching to HTTPS are likely to increase over time.
As discussed above, traffic that passes to an HTTPS site preserves the referral data.
Security and privacy
HTTPS adds security to a website in several ways:
- It verifies that the website is the one the visitor’s browser is supposed to be talking to
- It prevents tampering by third parties
- It makes the site more secure for visitors
- It encrypts all communication, including URLs, which protects things like browsing history and log in details
Switching to HTTPS
The following tips are provided as best practice when switching to HTTPS:
- Decide the kind of certificate required: options include single, multi-domain, or wildcard certificates
- Security key strength: Google recommends 2048-bit key certificates
Internal Linking Considerations
- Use relative URLs for resources that reside on the same secure domain
- Use protocol-relative URLs for all other domains, i.e. URLs that begin with // and omit the scheme so that HTTP or HTTPS is inherited from the page that references them.
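As a sketch of how existing absolute links might be rewritten to follow these two rules, the function below turns same-domain links into relative URLs and links to other domains into protocol-relative URLs (those starting with //). The function name, host names and paths are hypothetical.

```python
from urllib.parse import urlsplit


def rewrite_link(href, site_host):
    """Rewrite an absolute link for a site served from `site_host`."""
    parts = urlsplit(href)
    if not parts.netloc:
        return href  # already relative, leave untouched
    tail = parts.path or "/"
    if parts.query:
        tail += "?" + parts.query
    if parts.fragment:
        tail += "#" + parts.fragment
    if parts.netloc.lower() == site_host.lower():
        return tail  # same domain: relative URL
    return "//" + parts.netloc + tail  # other domain: protocol-relative URL


print(rewrite_link("http://www.example.com/img/logo.png", "www.example.com"))
print(rewrite_link("http://cdn.example.org/lib.js", "www.example.com"))
```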
HANDLING OLD CONTENT
There is no one set answer for how to determine what to do with old content on your site. It’s a strategic decision that should be made on a case by case basis. There are some specific best practices and scenarios however, that are important to mention from an SEO point of view:
Does the Content Have Value?
When making any decision about what to do with content, the first step is to look at how it’s performing, e.g. does it receive lots of traffic? Or has it got lots of quality links pointing to it? If so, the likelihood is that you’ll want to capture some of this link equity by redirecting or repurposing the page. On the flip side, if it’s poorly performing old content it’s likely you’ll need to take a different action.
Options for Old Content
301 – Permanent Redirect
This will be the likely scenario for the majority of old content especially if the new content is similar i.e. relevant to the user. It will pass link equity and be best practice for user experience. We would recommend using a 301 redirect to the most relevant page whenever handling old content as the first choice if possible. As well as maintaining link equity, it also demonstrates to search engines that your site is well-maintained and up-to-date.
Content should now be thought of as a ‘quality over quantity’ game. If you are effectively pruning bad content, you are making your site more efficient for Google to crawl, condensing SEO authority and helping bots to focus on more important pages.
If the decision has been taken to delete old content rather than redirect it, then the options are a 404 or 410 status code. Google recommends using a 404 if it’s a temporary removal of content, but if the page is permanently gone and there is no other page that could substitute for it or that you should point to, then serve a 410. Scenarios might include old campaign pages that are no longer relevant, or low performing blog posts from 5 years ago.
Repurposing & Refreshing Old Content
There are also occasions where the content on the page should just be refreshed without content being deleted or redirected. An example of this could be updating seasonal pages such as a Christmas product page or Valentine’s Day page. If you delete the page each time and start over every year, then you are building authority from scratch annually. In addition, repurposing or refreshing content is a better option than a 301 redirect, as otherwise, over the course of a few years, you would end up with 301 chains across each page e.g. xmas 2014 >301> xmas 2015 >301> xmas 2016 >301> xmas 2017.
Leaving Pages Alone
Whilst not the most obvious option, some ecommerce sites do this for successful products that have gone out of stock or have been discontinued. If the product has been discontinued, it means you’re able to inform the user of something useful and may be able to continue their journey. We wouldn’t recommend this be done regularly, but there are examples where it works.
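The options in this section can be summarised as a simple decision flow. The sketch below is a rough heuristic only: the boolean inputs stand in for judgement calls that should be made case by case, and the function and parameter names are hypothetical.

```python
def old_content_action(has_value, relevant_page_exists,
                       recurring=False, removal_is_permanent=True):
    """Suggest how to handle an old page, following the options above."""
    if recurring:
        # Seasonal pages: refresh in place to keep authority, avoid 301 chains.
        return "refresh"
    if relevant_page_exists:
        # First choice whenever a relevant destination exists.
        return "301"
    if has_value:
        # Traffic or quality links but no destination: repurpose or leave alone.
        return "repurpose or leave"
    # Nothing worth keeping and nowhere to point.
    return "410" if removal_is_permanent else "404"


print(old_content_action(has_value=False, relevant_page_exists=False))
```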