Google crawl error after site migration
-
I think I found the answer here. It's Cloudflare's "Bot Fight Mode" that causes this.
(Screenshots: Bot Fight Mode on, Bot Fight Mode off)
Evidently, it's a well-known issue. If you want crawlers to ignore your site, just switch that bad boy on with no exceptions.
-
-
@julian I mean, the purpose of the product is to fight bad bots, not known good bots (like search engines). But it seems that, at least on free plans, it blocks spiders a lot. And even on a paid plan you apparently have to do quite a bit of work to get it working properly.
-
@tankerkiller125 that's exactly what it's doing. In addition, it also blocks its own page speed test! I uploaded over 2000 IP addresses into Cloudflare as a bypass rule for Google's ASN, but that never worked at all, despite being suggested by Cloudflare in the first place.
Even the Pro plan has issues with this, as you suggest. I've switched it off, and now all crawl errors on my site have cleared.
-
-
Back again with this. I thought the issue was solved, but it appears not. I've discussed this with Google themselves, and they tell me that their crawler is receiving an HTTP 307 redirect.
Is there any special permission that needs to be applied to spiders in NodeBB? The page, according to Googlebot, redirects to /register, which of course it shouldn't.
In the browser, everything works as expected and you end up on the page you requested. I have no idea why Googlebot thinks this is a redirect when it isn't. Bing seems to have no issues indexing the site either.
Here's an example of the 307 redirect:
https://view.hugo-decoded.be/?scheme=https&url=sudonix.org%2Ftags%2Fjavascript&ua=Googlebot&ref=Google
Edit - it seems Googlebot is getting a 307 and being redirected to /register/complete.
-
I suppose the obvious question here is why every single link, when queried by Googlebot, is redirected to /register/complete? I thought that the purpose of spider permissions was so that crawlers would work unhindered?
It seems currently, having checked properly, that every single page generated by NodeBB is seen as a 307 redirect to /register/complete by Google, Bing, and Yandex - they can't all be wrong.
This site, httpstatus.io (Bulk URL HTTP Status Code, Header & Redirect Checker), is pretty useful and is at least mobile friendly. If I check the URL of this page, for example, I get an HTTP 200, so nothing wrong here, but my install obviously says differently, and I'd like to fix that.
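For anyone who wants to reproduce this without a third-party checker, here is a minimal sketch in Python that requests a page once with a Googlebot User-Agent and does not follow redirects. The function names `check` and `classify` are made up for illustration. Note that this only imitates the User-Agent header, not Google's IP ranges, so anything that filters on IP (Cloudflare, for instance) may respond differently than it does to the real crawler.

```python
import http.client
from urllib.parse import urlsplit

# User-Agent string Googlebot commonly sends; only the header is imitated here.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def classify(status, location):
    """Summarise a single response: redirect target for 3xx, else just the status."""
    if 300 <= status < 400:
        return f"{status} redirect to {location or '(no Location header)'}"
    return f"{status} (no redirect)"


def check(url, user_agent=GOOGLEBOT_UA, timeout=10):
    """Send one GET request and describe the response.

    http.client never follows redirects, so a 307 is reported as-is.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path, headers={"User-Agent": user_agent})
        response = conn.getresponse()
        return classify(response.status, response.getheader("Location"))
    finally:
        conn.close()
```

Calling `check("https://sudonix.org/tags/javascript")` and then the same URL with `user_agent="Mozilla/5.0"` should show whether the server answers crawlers and browsers differently.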
-
@phenomlab likely related to recent changes re: email handling. That's a regression that I should fix; it shouldn't act like that to spiders.
Best guess is I handled it by doing a check for != 0 when it should be > 0
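To illustrate why those two comparisons would behave differently for crawlers, here is a small sketch in Python. It is not NodeBB code: the function names are hypothetical, and it assumes that guests carry a uid of 0 while recognised spiders carry a negative uid (-1 here).

```python
GUEST_UID = 0    # assumption: anonymous visitors
SPIDER_UID = -1  # assumption: recognised crawlers get a negative uid


def needs_registration_step_buggy(uid):
    """The suspected check: anything that is not a guest gets redirected."""
    return uid != 0


def needs_registration_step_fixed(uid):
    """The intended check: only real, logged-in accounts (uid > 0) qualify."""
    return uid > 0


for label, uid in [("guest", GUEST_UID), ("spider", SPIDER_UID), ("member", 42)]:
    print(label, needs_registration_step_buggy(uid), needs_registration_step_fixed(uid))
```

Under these assumptions the two checks agree for guests and members and differ only for the spider, which would match crawlers being sent to /register/complete while browsers are not.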
-
@julian funny you should say that, as my other site hostrisk.com doesn't have this issue and is running 3.0.1 (needs an upgrade).
But the caveat here is that hostrisk does not permit registrations, so I guess that regression (if it existed in 3.0.1) won't apply? I'll upgrade that and see if the issue remains without registration enabled.
-