404 errors happen all the time, but Soft 404s are slightly different from the error pages we land on by mistake.
Soft 404s are another aspect of technical SEO that is generally grouped with crawl errors. The page is working fine for users, so we can skip it and move on, right?
What is a Soft 404?
Rather than being a real 404 as defined by the HTTP response, a soft 404 occurs when a page returns a different response code but is treated as a 404.
How does this differ from a “regular” 404?
A Soft 404 is most often a page that returns a 200 response while acting as a custom 404 page. A page can also be classified as a Soft 404 by crawlers if it exists but has little or no content, such as empty pages, old categories, or template pages.
It can be a bit confusing if you’re unfamiliar with response codes and how they’re read, but Google puts it like this:
“It’s like a giraffe with a name tag that says ‘dog’. Just because the name tag says it’s a dog doesn’t mean it’s actually a dog. Likewise, just because a page says 404 does not mean that it returns a 404 status code.”
So imagine every 404 page as having two views: one for the user and one for a crawler.
404 error – looks like a 404 page to users, and crawlers see an HTTP 404 response.
Soft 404 error – may or may not look like a 404 page to users, and crawlers see a 200 response code but think the page should be a 404.
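The distinction above can be sketched as a small classifier. This is only a rough illustration, not how any search engine actually decides: the phrase list, the 50-word threshold, and the function names are all assumptions made for the example.

```python
import re

# Hypothetical phrases that suggest an error page; tune these for your own site.
NOT_FOUND_PHRASES = ("page not found", "no longer available")
# Assumed threshold for "little or no content"; not an official figure.
MIN_WORDS = 50


def visible_words(html: str) -> list[str]:
    """Crudely strip scripts, styles, and tags, then return the remaining words."""
    text = re.sub(r"<(script|style)\b.*?</\1>", " ", html, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", text)
    return text.split()


def classify(status_code: int, html: str) -> str:
    """Label a response as 'hard 404', 'possible soft 404', 'ok', or 'other'."""
    if status_code == 404:
        return "hard 404"
    if status_code != 200:
        return "other"
    words = visible_words(html)
    text = " ".join(words).lower()
    looks_like_error = any(phrase in text for phrase in NOT_FOUND_PHRASES)
    if looks_like_error or len(words) < MIN_WORDS:
        return "possible soft 404"
    return "ok"
```

The same "Page not found" body is a hard 404 when served with status 404 and a possible Soft 404 when served with status 200, which is the giraffe-with-a-dog-name-tag situation described above.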
Soft 404 examples
There are many ways for Soft 404s to emerge, but the two main causes are:
Incorrect custom 404 response
As mentioned above, if your site’s 404 pages display or redirect to a custom 404 page, you might be serving a 200 response page – and I don’t need to remind you that 404 and 200 are different numbers!
You can still have this personalized 404 page, but it must have a 404 response! This can be checked with various tools and plugins, such as the Redirect Path browser plugin, list mode in Screaming Frog, or services like https://httpstatus.io/.
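If you prefer to check this yourself, a minimal sketch in Python could look like the following. It requests a URL that almost certainly does not exist and reports the final status code. The probe path, the User-Agent string, and the function names are made up for the example. Note that urllib follows redirects by default, so a redirect to a custom error page that answers 200 shows up here as 200.

```python
import uuid
import urllib.error
import urllib.request
from urllib.parse import urljoin


def probe_url(base_url: str) -> str:
    """Build a URL that almost certainly does not exist on the site."""
    return urljoin(base_url, f"/soft-404-probe-{uuid.uuid4().hex}")


def final_status(url: str, timeout: float = 10.0) -> int:
    """Return the status code of the final response, after any redirects."""
    request = urllib.request.Request(url, headers={"User-Agent": "soft-404-check"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises 4xx and 5xx responses as HTTPError; the code is what we want.
        return error.code


def missing_pages_return_404(base_url: str) -> bool:
    """True if a made-up URL on the site ends in a real 404 response."""
    return final_status(probe_url(base_url)) == 404
```

Calling missing_pages_return_404("https://example.com") with your own domain in place of the placeholder returns False when missing pages are answered with anything other than a 404.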
Moz has a blog post on helpful .htaccess directives, with number 5 showing how to set up a custom 404 error page.
Empty or thin pages
These are often generated automatically by certain CMSs, the most obvious being WordPress.
If you write a blog and add a new tag, it will create a default page for all posts with that tag. If the page is created but there are no actual posts, then it will have created an empty page.
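As a rough sketch of how you might spot these empty tag pages before a crawler does, the snippet below takes a mapping of tags to posts and lists the archive URLs that would be empty. The data structure and the "/tag/" URL pattern are assumptions for the example; your CMS may expose this differently.

```python
def empty_tag_pages(posts_by_tag: dict[str, list[str]], base: str = "/tag/") -> list[str]:
    """Return the archive paths of tags that have no posts assigned."""
    return sorted(f"{base}{tag}/" for tag, posts in posts_by_tag.items() if not posts)


# Hypothetical export of tags and the posts that use them.
tags = {
    "seo": ["soft-404-guide", "crawl-budget-basics"],
    "redirects": [],
    "wordpress": ["tag-archives-explained"],
    "drafts": [],
}

print(empty_tag_pages(tags))  # ['/tag/drafts/', '/tag/redirects/']
```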
Soft 404 issues
You have Soft 404s in Search Console, but is it important for SEO?
Short answer – yes. Long answer – also yes.
404s are not harmful when properly labeled, because naturally you will have old products, old posts, and other pages that are no longer in use. But if you start getting Soft 404s on pages that should be indexed, search engines will be confused and won’t know where to show your pages in the results.
It mainly comes down to a site’s crawl budget. This is how often your site is crawled and how many pages are crawled each time. For example, your 4,000-page site might be fully crawled once a month, while your home page might be crawled daily if it is deemed important enough.
Crawls cost time and energy, so it makes sense for search engines to crawl the most important pages more often and leave those obscure category pages for quieter times.
When we combine crawl budgets and Soft 404s, we end up with crawlers that see pages returning 200 and crawl them, when in reality they should be 404 pages. This eats up crawl budget, creates unnecessary work, and leaves some genuine 200 pages uncrawled, reducing the visibility of your pages.
Now that might not be a problem for a website with a dozen pages, but if a bigger site starts getting Soft 404s, they can quickly multiply and create a big crawl budget problem.
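To put rough numbers on this, here is a deliberately simplistic model. It assumes a crawler fetches a fixed number of URLs per day and treats every URL equally, which real crawlers do not do, and the figures of 200 fetches per day and 2,000 Soft 404 URLs are invented for illustration.

```python
import math


def days_to_cover(real_pages: int, soft_404_urls: int, fetches_per_day: int) -> int:
    """Days needed to fetch every URL once if all URLs are treated equally."""
    return math.ceil((real_pages + soft_404_urls) / fetches_per_day)


def wasted_share(real_pages: int, soft_404_urls: int) -> float:
    """Fraction of fetches spent on URLs that should have been 404s."""
    return soft_404_urls / (real_pages + soft_404_urls)


print(days_to_cover(4000, 0, 200))     # 20
print(days_to_cover(4000, 2000, 200))  # 30
print(round(wasted_share(4000, 2000), 2))  # 0.33
```

Under these assumptions, 2,000 Soft 404 URLs on a 4,000-page site stretch a full crawl from 20 days to 30, with a third of all fetches spent on pages that should not exist.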
Fix Soft 404 errors
First, you need to identify the pages marked as Soft 404. If there are fewer than 1,000 URLs, you can download them directly from Google Search Console. If there are more, you will either have to fix the first 1,000 and then work on the next batch, or use a third-party tool that can download them through Google’s API.
Then you will need to find the actual response code for those pages. As mentioned before, there are plugins, websites, and third-party tools that can help you do this in bulk, with the easiest being everyone’s favorite – Screaming Frog.
Once you have your list of URLs, they will need to be assigned to one of the following three groups:
The page is no longer available – For pages without a direct replacement, the URL should return a 404 (not found) response code. This makes it clear to browsers and crawlers that the page no longer exists. It can display a custom 404 page, but this must be served with a 404 status through the server’s error-page configuration (for example the .htaccess method on Apache), not through a catch-all redirect.
The page has moved – The URL should be redirected to its clear replacement via a 301 redirect. Other redirect responses can be used, but the general rule is to use a 301 unless you have a specific reason to use another.
The page is incorrectly marked as Soft 404 – This happens when pages are thin, or at least considered thin by search engines. Most of the time these are pages that have little value or duplicate content, such as blog tag pages, filtered pages, etc. They should be handled with canonical tags and robots.txt disallow rules.
If you still see pages that you feel are important enough to be indexed, you will need to improve those pages by improving content and links.
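The triage described above can be sketched as a small decision function. The Page fields and the wording of the returned actions are hypothetical, and in practice deciding whether a page still exists or is worth indexing is a manual judgment that this code simply takes as input.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    url: str
    exists: bool                       # is the content still meant to be live?
    replacement: Optional[str] = None  # URL of a clear replacement, if any
    worth_indexing: bool = False       # your own call on the page's value


def triage(page: Page) -> str:
    """Map a URL flagged as Soft 404 to one of the fixes described above."""
    if not page.exists:
        if page.replacement:
            return f"301 redirect to {page.replacement}"
        return "return 404"
    if page.worth_indexing:
        return "improve content and links"
    return "canonical tag or robots.txt disallow"


flagged = [
    Page("/products/discontinued-kettle", exists=False),
    Page("/products/kettle-v1", exists=False, replacement="/products/kettle-v2"),
    Page("/tag/redirects/", exists=True),
    Page("/category/kettles", exists=True, worth_indexing=True),
]

for page in flagged:
    print(page.url, "->", triage(page))
```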
How it will contribute to performance
Once you fix your Soft 404 errors, crawlers will have a clear path to your important content without wasting their crawl budget and bandwidth.
The larger the site and the more Soft 404s you correct, the more marked the improvements will be.
Our evidence is anecdotal, but the traffic jumps we have seen mostly concern ecommerce sites with changing product ranges, category filters, and other options that tend to create thin pages and incorrect redirects.
Unfortunately, this is an area that is often overlooked by SEOs and will not be a consideration for many developers. The fix is relatively easy from a technical point of view, with the bulk of the work being the analysis and identification of these errors.
It is recommended that you try to navigate to a 404 page on your site and see what the response code actually is. If you have a 200 response and are unsure of the next steps, contact us to see how we can help.