Businesses and marketers can see which of their pages have been indexed and can use Google Search Console to diagnose problems with their website's indexability.
Google's Index Coverage report helps webmasters understand how Google sees and crawls their site, but it also surfaces a broad range of errors and warnings. Checking your Google index status and resolving any problems on your site is vital for maintaining search engine optimization. Learning what these messages mean and how to respond to them helps ensure that critical portions of your website are not missing from search results and that bad practices do not damage your organic traffic.
What is the Coverage Report?
Google’s search console index coverage report principally aims to let webmasters know which of their pages are and are not in the search index.
In practice, however, the tool provides much more information than that. It shows every page of the website that Google has visited or attempted to visit; specifically, the URLs that Googlebot has crawled or tried to crawl on your Search Console property.
All pages are grouped by status, with counts of valid (indexed) pages, pages excluded from the index, pages with errors, and pages with warnings.
The “Summary Page” of the Google index report
Users can click any row on the summary page to view all URLs that share the same status and reason, and to get more detail about the issue.
This page also provides extra information, including the primary crawler for the site (the Googlebot type) and when the report was last updated.
To access Google's Index Coverage report, you must have a Search Console account. This free tool is specifically designed to show webmasters and business owners how Google is handling their website. Select your property in Search Console and navigate to the "Coverage" report via the left-hand navigation panel.
How to use the index coverage report
Google's Index Coverage report is a fantastic tool for webmasters and businesses to check the health of their website. Use it to verify your site's indexing and to see whether Google is having problems crawling or indexing the proper URLs.
The report's graph gives a good picture of how many pages are correctly indexed. Many SEOs and webmasters simply do a visual scan of this chart to spot spikes in errors and warnings, and you should do the same whenever you monitor your site's Google index status.
The errors shown here should be given priority, since they point to the issues most likely to cause problems. Once these are fixed, your SEO team can keep monitoring for warnings and keep your pages healthy and indexed. Note that the Index Coverage report does not provide tools or development resources for fixing these difficulties; it is mainly useful for monitoring and diagnosis.
Working through a technical SEO audit checklist is a terrific way to test and isolate problem practices on your website and to address the report's errors. Helpfully, because the Index Coverage report groups URLs by message, you can look for qualities those URLs share and narrow down the common elements that may be causing the problem.
You can also use the URL Inspection tool in Search Console for specific information about individual problem URLs. Running a live test on a URL provides more detailed information about any faults it has.
Have your webmaster or web developer make corrections, then use Search Console's "Validate Fix" option to prompt Google to recrawl the pages and update the index once the errors are fixed.
Here’s every message in the index coverage report
When you check your Google index status, you may encounter many different messages; all of them are covered below. Most of the time these notices demand no action or change, they simply describe the condition of your website. In practice, you will see a handful of them quite often.
For a better understanding, here is every message that can appear in the Index Coverage report.
- Submitted URL marked ‘noindex’: The page was submitted for indexing, but an on-page robots directive (a meta tag or HTTP header) requests that it not be indexed, so the request and the directive contradict each other.
- Submitted URL seems to be a Soft 404: The page was submitted for indexing, but when Googlebot crawled it the page behaved like a 404 without returning a true 404 status. These pages can be left alone if their "empty" or "out of stock" state is only temporary; nonetheless, checking them is a good idea to ensure they include valuable material. In some situations a page that renders blank or nearly blank for Google, or that contains thin content of little value, may mistakenly look like a soft 404.
- Submitted URL returns unauthorized request (401): A 401 status means "unauthorized": the page requires authentication credentials. This usually applies to backend sections of a domain such as login pages and account pages. If access to these pages is limited, their SEO value is probably limited too; review whether they belong in your sitemap and remove them to conserve crawl budget.
- Submitted URL not found (404): A page submitted for indexing via the sitemap returns a 404 error. These pages should be removed from the sitemap and, if they carry any page authority, redirected so that SEO value is preserved.
- Submitted URL has crawl issue: Google encountered an unspecified crawling error that does not fall into the other categories. Webmasters can use the URL Inspection tool to check for problems, or simply wait for Googlebot to recrawl the URL.
- Indexed, though blocked by robots.txt: This notice tells webmasters that Google has decided to index the page even though the robots.txt file blocks it from being crawled. Google attaches a warning to this message because it cannot be sure whether the block was deliberate. Blocking pages from the index with robots.txt is an outdated SEO approach, especially now that Google no longer supports noindex directives in robots.txt. Note that Google typically indexes such a page because it considers the material strong enough to include in its search results.
- Submitted and indexed: This is the notice for "valid" pages: Google indexed the submitted page. This is the ideal default and great for SEO. Keep in mind that the number of submitted and indexed pages may not always match your expectations, because Google ignores duplicate URLs, non-canonical URLs, and parameter URLs (these can be checked in the URL Inspection tool), so the count shown here can fluctuate.
- Excluded by ‘noindex’ tag: Google crawled the page but found a "noindex" robots meta tag in the HTML, so the page was not included in the index. A meta robots tag is the best technique for keeping pages out of the index; however, if the page should be indexed, webmasters may need to check their CMS settings and remove the tag. (A small scripted check for these signals appears after this list.)
- Blocked by page removal tool: Someone manually removed the page from the index using the URL removal request feature in Google Search Console. This is only temporary; if nothing else is done, the page will eventually be recrawled and reindexed after around 90 days. A "noindex" directive is a better way to keep a page out of the index permanently.
- Blocked by robots.txt: The domain's robots.txt file contains directives that block the page. The page was not submitted; Google simply discovered it organically and found the robots.txt block. This does not guarantee the page will stay out of the index. Since Google has deprecated noindex directives in robots.txt, a better alternative for preventing indexing is the on-page meta robots tag.
- Blocked due to unauthorized request (401): The page returns a 401 status to Googlebot, meaning it requires authentication to access. If the page should be indexed for SEO purposes, the site's webmaster or developer should ensure its content is fully accessible to both browser users and search engine crawlers.
- Crawl anomaly: This message indicates some kind of error occurred when Google crawled the page; it may mean a 4xx or 5xx response, or some other issue loading the page. Check for problems using the URL Inspection tool.
- Crawled – currently not indexed: This is one of the most commonly reported messages in the report. It is vital to bear in mind that "crawled" does not automatically mean the page has been added to Google's search index. One likely explanation is that, because of thin or missing content, Google has concluded the page is not valuable enough to index. Another is that Google does not consider it part of your site's core content. You can check the page in the URL Inspection tool to determine whether there are rendering difficulties. To signal to search engines that the page is valuable to your site or business, you can request indexing or include the page in your sitemap. You also need to ensure that your material is visible to search engines; Google cannot understand some JavaScript or Flash, for example.
- Discovered – currently not indexed: Google knows about the URL, but Googlebot has not yet crawled or fetched it. Usually this means the site's server was overloaded and Google paused crawling to avoid hurting the site's performance; in effect, Google is managing a "crawl budget" for your website. There is usually no need to act on your Google index status here. In most circumstances nothing needs to be done: Google will reschedule the crawl and return later.
- Alternate page with proper canonical tag: Google knows about this page but has not indexed this exact URL, because the canonical URL has been indexed instead. Nothing needs to be done here for SEO purposes; it signifies that Google properly understands the site.
- Duplicate without user-selected canonical: Google detected several duplicate URLs, or multiple pages with duplicate content, and none of them declared a canonical. Google will usually select its own version to index, and the URL Inspection tool will show you how the page is being treated. Generally no fix is necessary, as a canonical chosen by Google works the same as one declared by the user, but webmasters who want more control or wish to index a particular URL can set the canonical themselves.
- Duplicate, Google chose different canonical than user: The page declares a canonical URL, but Google has chosen a different one instead. If Google's selection looks reasonable, webmasters can update their canonical to match (or leave it alone); otherwise, the site structure may need to be optimized to give search engines clearer signals or to reduce duplicate URLs. Also, make sure the content of the "duplicate" pages genuinely matches the content of your declared canonical URL; if not, Google may not regard it as a good canonical.
- Not found (404): The page returned a 404 error. This appears in the Index Coverage report because Googlebot discovered a link to the page without it being specifically submitted, and the URL returned a 404. This typically means a broken link somewhere in your site content or a broken backlink from another domain. The best approach is usually to redirect the URL to the closest relevant page, which helps preserve and transfer any ranking or page authority associated with the original page. Google states that its crawler may keep trying to visit the page for a while and cannot be told to forget or ignore a URL permanently, though over time it will be crawled less often.
- Page removed because of legal complaint: A third party has complained about a copyright breach or another violation of Google's legal policies, such as phishing, violence, or explicit content, and Google has removed the content from its index. Bear in mind that stolen, scraped, or plagiarized content also puts marketers at risk of a manual content penalty. It is usually best to create high-quality original content or to provide proper attribution to sources.
- Page with redirect: This notice means the URL redirects and so was not added to the index. Google should crawl the destination URL on its own without any extra effort.
- Soft 404: A soft 404 can be a custom, "user-friendly" 404 page that the site returns, such as a page offering suggestions for where to go next, without the server sending an actual 404 response. It can also be a page with thin or no content that Google mistakenly treats as a soft 404. Depending on the scenario this can be fine or a problem. When you see this entry in your Index Coverage report, it means Google has interpreted the page as a soft 404. If your page is wrongly flagged, it may be because the page renders mostly blank for Google: ensure the primary content and key elements render on desktop and mobile, and that search engines are not blocked from required resources such as JavaScript.
- Duplicate, submitted URL not selected as canonical: Google did not index the page even though it was submitted in a sitemap, because Google considers another URL a better version. The difference between this status and "Duplicate, Google chose different canonical than user" is that this one appears when someone specifically requested indexing of the URL. The warning is mostly harmless for SEO, since another duplicate version of the page is indexed instead.
- Server error (5xx): A server-level error prevented the page from loading. Google will not include these pages in the index, so this error can have a significant SEO impact on important, high-quality pages.
- Submitted URL blocked by robots.txt: This message is fairly self-explanatory. Because you specifically asked for the page to be indexed (via the URL Inspection tool or your sitemap) while robots.txt blocks it, Google has received contradictory requests, so the message appears on the "Error" tab of the report. If you do want the page indexed, you will need to alter your robots.txt file; if you do not, the notice is harmless.
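For webmasters comfortable with a little scripting, the signals behind several of these messages (a robots.txt block, and "noindex" delivered via a meta tag or the X-Robots-Tag header) can be spot-checked outside of Search Console. Below is a minimal, standard-library Python sketch against a hypothetical URL; it does not render JavaScript, so treat it as a quick first check rather than a substitute for the URL Inspection tool.

```python
import re
import urllib.request
import urllib.robotparser
from urllib.parse import urlparse

def indexability_signals(url, user_agent="Googlebot"):
    """Report the basic crawl/index signals for a single URL (rough check only)."""
    parsed = urlparse(url)

    # 1) Is the URL blocked by robots.txt?
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"{parsed.scheme}://{parsed.netloc}/robots.txt")
    rp.read()
    crawl_allowed = rp.can_fetch(user_agent, url)

    # 2) Does the response carry a noindex signal in the header or the HTML?
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        status = resp.status
        x_robots = resp.headers.get("X-Robots-Tag", "")
        html = resp.read().decode("utf-8", errors="replace")
    meta = re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]*content=["\']([^"\']+)["\']', html, re.I)
    meta_robots = meta.group(1) if meta else ""

    return {
        "url": url,
        "http_status": status,
        "robots_txt_allows_crawl": crawl_allowed,
        "x_robots_tag": x_robots,
        "meta_robots": meta_robots,
        "noindex": "noindex" in (x_robots + " " + meta_robots).lower(),
    }

if __name__ == "__main__":
    # Hypothetical URL; substitute a page flagged in your own report.
    print(indexability_signals("https://www.example.com/some-page/"))
```

If the output shows a noindex signal or a robots.txt block you did not intend, that usually explains the corresponding exclusion message above.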
Crawled — Currently Not Indexed
The report includes numerous distinct "statuses" that tell webmasters how Google is handling the content of their websites. Although many of the statuses provide insight into Google's crawling and indexing decisions, "Crawled — currently not indexed" remains one of the least clear.
Ever since the "Crawled — currently not indexed" status appeared, we have heard from a number of website owners asking about its significance. One of the advantages of working at an agency is access to a lot of data, and because we have seen this message across several accounts, we have started to collect trends from the reported URLs.
Google’s definition
Let's begin with the official definition. Google's documentation indicates that this status means: "The page has been crawled, but not indexed by Google. In the future it may or may not be indexed; you need not re-send the URL to crawl."
So, essentially what we know is that:
- Google can access the page
- Google took time to crawl the page
- After crawling, Google decided not to include it in the index
The key to understanding this status is to consider why Google "intentionally" chose not to index the page. We know Google has no issue discovering the page, but for whatever reason it has decided that users would not benefit from finding it.
Not knowing why your content is not indexed can be rather frustrating. Below I will discuss some of the most common reasons our team has found for why a website is affected by this mysterious status.
1. False positives
Priority: Low
We begin with spot checks of URLs flagged in the report as "Crawled — currently not indexed." It is not unusual to encounter URLs that are reported as excluded but that turn out to appear in Google's index after all.
For example, here’s a URL that’s getting flagged in the report for our website: https://gofishdigital.com/meetup/
However, when using the site search operator, we can see that this URL is in fact included in Google's index. To do this, add the text "site:" in front of your URL in a Google search.
Whenever you encounter URLs reported under this status, I recommend using the site search operator to check whether they are actually indexed. Sometimes the report turns out to be wrong.
Solution: Do nothing! You’re good.
2. RSS feed URLs
Priority: Low
This is one of the most common examples. If your website uses an RSS feed, you may find its URLs appearing in the "Crawled — currently not indexed" report. These URLs often end with the "/feed/" string.
Google finds these RSS feed URLs linked from the home page, often via a "rel=alternate" link element. WordPress plugins like Yoast can generate these URLs automatically.
Solution: Do nothing! You’re good.
- Google may choose, for good reason, not to index these URLs. If you visit an RSS feed URL, you will see an XML document.
- Although this XML document is useful for RSS readers, there is no need for Google to include it in the index. The content is not meant for users and would make for a poor search experience.
3. Paginated URLs
Priority: Low
Another extremely common reason for the “Crawled — currently not indexed” exclusion is pagination. We will often see a good number of paginated URLs appear in this report.
Solution: Do nothing! You’re good.
To access the full site, Google needs to crawl through paginated URLs; that is how it reaches content such as deeper category pages or product description pages. While Google does not necessarily need to index the paginated URLs themselves, pagination is a mechanism for reaching the content they link to.
That said, make sure you do nothing to interfere with the crawling of the individual pagination pages. Confirm that each one contains a self-referencing canonical tag and is free of any "nofollow" directives. These pages are an avenue for Google to crawl other important pages on your site, so you want Google to keep crawling them.
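As a quick way to audit those signals across a set of paginated URLs, here is a rough Python sketch (the category URLs are hypothetical). It relies on simple regexes and does not render JavaScript, so confirm anything suspicious in the URL Inspection tool.

```python
import re
import urllib.request

def pagination_signals(url):
    """Check a paginated URL for a self-referencing canonical and a nofollow directive."""
    req = urllib.request.Request(url, headers={"User-Agent": "pagination-check"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    # NOTE: the patterns assume rel/name come before href/content; adjust for your markup.
    canonical = re.search(
        r'<link[^>]+rel=["\']canonical["\'][^>]*href=["\']([^"\']+)["\']', html, re.I)
    robots = re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]*content=["\']([^"\']+)["\']', html, re.I)
    canonical_href = canonical.group(1) if canonical else None
    robots_content = (robots.group(1) if robots else "").lower()
    return {
        "url": url,
        "canonical": canonical_href,
        "self_referencing": canonical_href is not None
                            and canonical_href.rstrip("/") == url.rstrip("/"),
        "nofollow": "nofollow" in robots_content,
    }

# Hypothetical paginated category URLs; flag anything that is not self-referencing.
for page in ("https://www.example.com/category/?page=2",
             "https://www.example.com/category/?page=3"):
    print(pagination_signals(page))
```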
4. Expired products
Priority: Medium
When analyzing individual pages in the report, we regularly find client URLs containing text indicating an expired or out-of-stock product. Google appears to check whether a particular product is available, especially on e-commerce sites, and if a product is found to be unavailable, the page may be excluded from the index.
From a UX viewpoint this makes sense: Google may not want to send users to product pages for items they cannot actually buy.
However, if these items actually are in stock on your website, a lot of SEO opportunity may be lost. With the pages excluded from the index, your content has no chance of ranking at all.
Moreover, Google checks more than the content visible on the page. In some cases we found no indication in the visible content that a product was unavailable, yet when checking the structured data we could see the "availability" property set to "OutOfStock".
Google appears to take cues about a particular product's availability from both the visible content and the structured data, so you must check both the content and the schema.
Solution: Check your inventory availability.
If you see items in this report that are actually available, you will want to review every product that may be listed inaccurately. Crawl your site and use the custom extraction feature of a tool like Screaming Frog to scrape the relevant data from your product pages.
For instance, if you want to see at scale all of your URLs whose schema is set to "OutOfStock", you can configure a custom "Regex" extraction for the string "availability":"http://schema.org/OutOfStock". This should automatically scrape all of the URLs carrying that property.
Export this list and cross-reference it against your inventory data in Excel or a business intelligence tool. This should make it easy to spot discrepancies between the structured data on your website and the actual availability of your items. The same exercise can be done where your visible content says products are expired.
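If you prefer scripting the cross-reference instead of doing it in Excel, here is a minimal Python sketch. The file names and column headers (url, sku, availability, qty) are assumptions for illustration; map them to whatever your crawl export and inventory system actually produce.

```python
import csv

def load_inventory(path):
    """Map SKU -> quantity on hand from an inventory CSV with columns sku,qty (assumed)."""
    with open(path, newline="") as f:
        return {row["sku"]: int(row["qty"]) for row in csv.DictReader(f)}

def find_mismatches(extraction_path, inventory_path):
    """Flag URLs whose schema says OutOfStock while the inventory shows stock on hand."""
    inventory = load_inventory(inventory_path)
    mismatches = []
    # Assumed columns in the crawl export: url, sku, availability
    with open(extraction_path, newline="") as f:
        for row in csv.DictReader(f):
            marked_oos = "OutOfStock" in row["availability"]
            actually_in_stock = inventory.get(row["sku"], 0) > 0
            if marked_oos and actually_in_stock:
                mismatches.append(row["url"])
    return mismatches

if __name__ == "__main__":
    # Hypothetical file names for the Screaming Frog export and the inventory export.
    for url in find_mismatches("oos_extraction.csv", "inventory.csv"):
        print("Schema says OutOfStock but inventory shows stock:", url)
```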
5. 301 redirects
Priority: Medium
An intriguing example we have seen with this status involves the destination URLs of 301-redirected pages. We will often see that Google crawls the destination URL but does not include it in the index. On examining the SERPs, however, we find that the redirecting URL is the one that is indexed. Because the redirecting URL is still the indexed one, the destination URL gets added to the "Crawled — currently not indexed" report.
The issue is that Google may not yet have recognized the redirect. As a result, the destination URL is treated as a "duplicate" because the redirecting URL is still indexed.
Solution: Create a temporary sitemap.xml.
When this happens to a significant number of URLs, it is worth sending stronger signals to Google. This status can indicate that Google is slow to recognize your redirects, which leads to unconsolidated content in the search results.
One option is a "temporary sitemap": a sitemap built solely to speed up the crawling of these redirected URLs. This is an approach John Mueller has previously suggested.
To create one, you will need to reverse-engineer redirects that you have created in the past:
- Export all of the URLs from the “Crawled — currently not indexed” report.
- Match them up in Excel with redirects that have been previously set up.
- Find all of the redirects that have a destination URL in the “Crawled — currently not indexed” bucket.
- Create a static sitemap.xml of these URLs with Screaming Frog (or with a small script such as the sketch further below).
- Upload the sitemap and monitor the “Crawled — currently not indexed” report in Search Console.
The goal here is for Google to crawl the URLs in the temporary sitemap.xml more frequently than it otherwise would have. This will lead to faster consolidation of these redirects.
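If you would rather script the sitemap step than generate it in Screaming Frog, the sketch below builds a basic static sitemap.xml from a plain-text list of the redirect destination URLs. The input file name is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

def build_sitemap(urls, out_path="temporary-sitemap.xml"):
    """Write a minimal sitemap.xml containing one <url><loc> entry per URL."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for url in urls:
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = url
    ET.ElementTree(urlset).write(out_path, encoding="utf-8", xml_declaration=True)

if __name__ == "__main__":
    # destination_urls.txt (hypothetical) holds one redirect destination URL per line.
    with open("destination_urls.txt") as f:
        build_sitemap([line.strip() for line in f if line.strip()])
```

Upload the resulting file alongside your normal sitemap and remove it once the redirects have consolidated.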
6. Thin content
Priority: Medium
Sometimes we see URLs listed in this report that are incredibly light on content. These pages may be set up correctly from a technical standpoint and may even be properly interlinked, but when Google crawls them it finds very little actual content.
Such a page ends up labeled "Crawled — currently not indexed" simply because its content is so sparse.
The page may be too thin for Google to consider it useful, or there may be so little material that Google treats it as a duplicate of another page. Either way, Google leaves the content out of the index.
Here is another example: on the Go Fish Digital site, Google was able to crawl a component page of ours (shown above). While the content on this page is unique to us, Google does not seem to think a page consisting of a single statement deserves to be indexed.
Again, because of a lack of quality, Google decided to omit the page from the index.
Solution: Add more content or adjust indexation signals.
The next steps depend on how important it is to you that these pages be indexed.
If you believe a page should definitely be in the index, consider adding more content. This helps Google see the page as more valuable to users.
If indexing the content is unnecessary, the bigger question is whether you should take further steps to signal clearly that it should not be indexed. The "Crawled — currently not indexed" status indicates that the content is eligible to appear in Google's index, but that Google is choosing not to include it.
Other low-quality pages may also exist where Google does not apply this reasoning. You can run a general "site:" search to look for indexed pages that fit the same criteria as the examples above. If you find a large number of such pages in the index, you might consider stronger measures to ensure they are removed, such as a "noindex" tag, a 404, or removing them entirely from your internal link structure.
7. Duplicate content
Priority: High
This is the highest-priority cause we have seen when assessing this exclusion across a wide number of clients. If Google considers your content to be duplicate, it may crawl the content but choose not to include it in the index. This is one way Google avoids duplication in the SERPs: by removing duplicate content from the index, it ensures users have more unique pages to interact with. The report sometimes labels such URLs explicitly as duplicates (for example, "Duplicate, Google chose different canonical than user"), but that is not always the case.
This is a very important issue, particularly on e-commerce sites. Key pages such as product description pages often carry the same or similar product descriptions as many other results on the web. If Google finds them too similar to other pages, internally or externally, it may exclude those pages from the index.
Solution: Add unique elements to the duplicate content.
If you think that this situation applies to your site, here's how you can test for it (a small helper for building the check URLs appears after this list):
- Take a snippet of the potential duplicate text and paste it into Google.
- In the SERP URL, append the following string to the end: “&num=100”. This will show you the top 100 results.
- Use your browser’s “Find” function to see if your result appears in the top 100 results. If it doesn’t, your result might be getting filtered out of the index.
- Go back to the SERP URL and append the following string to the end: “&filter=0”. This should show you Google’s unfiltered result (thanks, Patrick Stox, for the tip).
- Use the “Find” function to search for your URL. If you see your page now appearing, this is a good indication that your content is getting filtered out of the index.
- Repeat this process for a few URLs with potential duplicate or very similar content you’re seeing in the “Crawled — currently not indexed” report.
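As a small convenience, the sketch below simply builds the two check URLs described above ("&num=100" and "&filter=0") for a given text snippet so you can open them in a browser; the sample snippet is hypothetical.

```python
from urllib.parse import quote_plus

def duplicate_check_urls(snippet):
    """Build the top-100 and unfiltered Google search URLs for an exact-match snippet."""
    base = "https://www.google.com/search?q=" + quote_plus(f'"{snippet}"')
    return {
        "top_100": base + "&num=100",
        "unfiltered": base + "&filter=0",
    }

# Hypothetical snippet copied from a suspected duplicate page.
for name, url in duplicate_check_urls("a sentence copied from your product description").items():
    print(name, url)
```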
If you’re consistently seeing your URLs getting filtered out of the index, you’ll need to take steps to make your content more unique.
While there is no one-size-fits-all standard for achieving this, here are some options:
- Rewrite the content to be more unique on high-priority pages.
- Use dynamic properties to automatically inject unique content onto the page.
- Remove large amounts of unnecessary boilerplate content. Pages with more templated text than unique text might be getting read as duplicate.
- If your site is dependent on user-generated content, inform contributors that all provided content should be unique. This may help prevent instances where contributors use the same content across multiple pages or domains.
8. Private-facing content
Priority: High
In some cases, Google's crawlers gain access to content that should not be available to them. If Google finds dev environments, it may include those URLs in this report. We have even seen Google crawl a client's JIRA instance, which triggered an explosion of crawling focused on URLs that should never be considered for indexing.
The problem is that Google's crawl of the site loses focus, and it spends time crawling URLs that serve no purpose for searchers. This can have enormous consequences for a site's crawl budget.
Solution: Adjust your crawling and indexing initiatives.
The solution depends entirely on the scenario and on what Google is able to access. Typically, the first thing you want to determine is how Google is discovering these private URLs, especially whether it is through your internal link structure.
Start a standard Screaming Frog crawl from your primary domain's home page and see whether the unwanted subdomains can be reached. If they can, it is safe to say Googlebot may be finding them in exactly the same way. To cut off Google's access, you will want to remove any internal links to this content.
The next step is to check the indexing status of the URLs that should be excluded. Is Google keeping them all out of the index, or have some been caught in it? If Google is not indexing substantial amounts of this content, you can consider updating your robots.txt file right away to block crawling. If it is, options on the table include "noindex" tags, canonicals, and password-protecting the pages.
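For a quick spot check of the internal-link question before running a full crawl, the sketch below fetches a single page and flags links pointing at hosts you never want crawled. The host names are assumptions; it checks only one page and does not execute JavaScript, so it is not a replacement for a Screaming Frog crawl.

```python
import re
import urllib.request
from urllib.parse import urljoin, urlparse

# Hypothetical hosts you never want Googlebot to reach.
UNWANTED_HOSTS = {"dev.example.com", "jira.example.com"}

def find_unwanted_links(start_url):
    """Return links on a single page that point at any of the unwanted hosts."""
    req = urllib.request.Request(start_url, headers={"User-Agent": "link-audit"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    flagged = set()
    for href in re.findall(r'href=["\']([^"\']+)["\']', html, re.I):
        absolute = urljoin(start_url, href)
        if urlparse(absolute).netloc.lower() in UNWANTED_HOSTS:
            flagged.add(absolute)
    return sorted(flagged)

if __name__ == "__main__":
    for link in find_unwanted_links("https://www.example.com/"):
        print("Internal link to an unwanted host:", link)
```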
Case study: duplicate user-generated content
Here is an example of how we diagnosed this problem in a real situation on a client's site. The client resembles an e-commerce website in that much of its content lives on product description pages. However, these pages are all user-generated content.
In essence, third parties can create listings on this platform, but those third parties often added very brief descriptions to their pages, resulting in thin content. The recurring problem was that these user-supplied product description pages kept appearing in the "Crawled — currently not indexed" report. The outcome was squandered SEO opportunity, since pages capable of generating organic traffic were excluded from the index entirely.
We observed that the client's product description pages were quite weak in unique material: the excluded pages contained only a paragraph or less of unique text, and the majority of the content on these page types was templated. Because the unique content was so limited, Google likely treated the pages as duplicates of one another based on the templated content. The result was that these pages were left out of the index with the status "Crawled — currently not indexed."
To resolve this, we worked with the client to determine which of the templated content on the product description pages was unnecessary. We were able to remove the superfluous templated text from thousands of URLs, and as Google began to see each page as more unique, the number of "Crawled — currently not indexed" pages dropped significantly.
Conclusion
Hopefully this helps search marketers better understand the cryptic "Crawled — currently not indexed" status in the Index Coverage report. Google may of course categorize URLs this way for many other reasons, but these are the most prevalent causes we have encountered with our clients.
More generally, the Index Coverage report is one of the most powerful tools in Search Console. I would urge search marketers to get to know its data and reports, because we routinely uncover crawling and indexing inefficiencies with it, particularly on large sites. If you have seen other types of URLs in the "Crawled — currently not indexed" report, please let me know in the comments!