Intermittent Noindex Tags

harrytwatter

I keep finding pages from a client site included in their Google Search Console's "Excluded by 'noindex' tag" report. However, when I test the live URL and view the page indexability settings, they are both set to "index".

Initially thought Cloudflare's default security challenge page was the root cause, given it contains a "noindex,nofollow" tag, but this seems not to be the case.

Given I can't catch these phantom "noindex" tags live, I can't help but continue to suspect this is occurring as Googlebot first hits the page.

Anyone have similar experiences or ideas? Sucks seeing high-value content show up in these noindex reports :/
 
Have you tried mimicking Googlebot to see what gets returned? You can set your user agent in a tool like Screaming Frog or with a browser extension.

You could also try switching Cloudflare to development mode, which will turn off the cache temporarily.
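
If you'd rather script the user-agent comparison than click through a crawler, here is a rough sketch using only the Python standard library. It pulls the X-Robots-Tag response headers and any robots/googlebot meta tags so you can compare what a browser-like request and a Googlebot-like request get back. The function names, the two user-agent strings, and the example URL are my own placeholders, not anything from the site in question. Keep in mind a spoofed user agent still comes from your IP rather than Google's, so a firewall may not treat it the same way as real Googlebot.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser

# Illustrative user-agent strings (assumptions; swap in whatever you want to test).
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0 Safari/537.36")


class RobotsMetaParser(HTMLParser):
    """Collects the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.append(attr.get("content", "").lower())


def robots_signals(header_values, html):
    """Return (header_directives, meta_directives) as lists of lowercase strings."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return [v.lower() for v in header_values], parser.directives


def fetch_signals(url, user_agent):
    """Fetch url with the given User-Agent; return (status, header_directives, meta_directives)."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        resp = urllib.request.urlopen(req, timeout=15)
        status = resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (e.g. a challenge page) still carry headers and a body.
        resp, status = err, err.code
    with resp:
        html = resp.read().decode("utf-8", errors="replace")
        header_values = resp.headers.get_all("X-Robots-Tag") or []
    headers, metas = robots_signals(header_values, html)
    return status, headers, metas
```

Calling `fetch_signals("https://example.com/some-page", BROWSER_UA)` and then the same URL with `GOOGLEBOT_UA` gives you two tuples to compare side by side. If only the Googlebot-like request comes back with a noindex, something between the origin and the client is varying the response by user agent.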
 
My initial hypothesis was a dud. I had thought that sending conflicting signals to Google via "noindex,follow" tags + canonicals on parameterized URLs was causing the canonical URLs to be errantly included in the noindex reports.

Tried mimicking Googlebot with Screaming Frog and received 200s, but also found a "noindex,nofollow,noarchive" X-Robots-Tag in the response headers for some reason.

It thus seems Googlebot is finding the noindex at the point of request on otherwise 200 OK URLs.
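
Since the directive only shows up some of the time, one way to pin it down might be to request the same URL repeatedly and tally how often a noindex appears and where it comes from. Below is a minimal sketch of that idea. The `fetch` argument is a hypothetical callable you supply yourself (anything that returns a status code, a list of X-Robots-Tag values, and a list of robots meta contents); `has_noindex` and `sample` are names I made up for illustration.

```python
import time
from collections import Counter


def has_noindex(directives):
    """True if any directive string contains 'noindex' or 'none'.

    Handles a user-agent prefix such as 'googlebot: noindex' by looking
    only at the part after the last colon of each comma-separated token.
    """
    for value in directives:
        for token in value.split(","):
            if token.split(":")[-1].strip().lower() in ("noindex", "none"):
                return True
    return False


def sample(fetch, url, attempts=20, delay=0.0):
    """Call fetch(url) repeatedly and count the outcomes.

    fetch must return (status, header_directives, meta_directives).
    A noindex in the header is counted ahead of one in the meta tag.
    """
    counts = Counter()
    for _ in range(attempts):
        status, headers, metas = fetch(url)
        if has_noindex(headers):
            counts[(status, "x-robots-tag")] += 1
        elif has_noindex(metas):
            counts[(status, "meta")] += 1
        else:
            counts[(status, "indexable")] += 1
        time.sleep(delay)
    return counts
```

A result with a mix of `(200, "indexable")` and `(200, "x-robots-tag")` entries would back up the idea that the header is being added intermittently at request time rather than living in the page template. Use a sensible `delay` so you don't hammer the client's server.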

@ryandiscord what would a person look for in Cloudflare development mode to identify the cause of the noindex,nofollow,noarchive X-Robots-Tag directive?
 
Temporarily turn off that security challenge, flip Cloudflare to development mode so that nothing is cached, then crawl the site again with Screaming Frog. That should help you rule out the security challenge page as the issue.

So there aren't any noindex tags present when viewing as a regular user agent, but when you crawl as Googlebot you do see the noindex tag? I would look to see if the site has an SEO plugin on it that could be rewriting the tag for search engine user agents. If not, there may be a script running that is causing it. It's unlikely to be in Cloudflare, but check whether there are any JavaScript Workers or page rules running that could cause the rewrite.
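
To help narrow down which layer is adding the directive, it can be worth dumping a few response headers alongside the X-Robots-Tag. The sketch below is only a rough aid and rests on assumptions: as far as I understand Cloudflare's documentation, challenge responses carry a `cf-mitigated: challenge` header and proxied responses report `server: cloudflare` and usually a `cf-cache-status`, but verify that against your own responses. `describe_response` is a made-up helper name.

```python
def describe_response(headers):
    """Summarise which layer likely produced a response, based on its headers.

    headers: any dict-like mapping of header name -> value.
    The Cloudflare header names checked here are assumptions to verify.
    """
    h = {k.lower(): v for k, v in headers.items()}
    notes = []
    if h.get("server", "").lower() == "cloudflare":
        notes.append("served via Cloudflare")
    if h.get("cf-mitigated", "").lower() == "challenge":
        notes.append("Cloudflare challenge page, not the origin's HTML")
    if "cf-cache-status" in h:
        notes.append("cache status: " + h["cf-cache-status"])
    if "x-robots-tag" in h:
        notes.append("X-Robots-Tag: " + h["x-robots-tag"])
    return notes
```

If the noindex header only ever appears together with the challenge marker, the security challenge is back on the suspect list. If it appears on ordinary responses, the origin (a plugin, server config, or script) or a Worker is the more likely source.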
 