Google crawl error after site migration

Unsolved Technical Support
22 Posts 5 Posters 756 Views
  • phenomlab
    wrote
    #1

    I recently moved my site from sudonix.com to sudonix.org and placed a redirect at Cloudflare's edge to handle anyone still using the old domain. This all works as intended, but Google Search Console insists that it won't crawl the pages of my forum because it thinks they have been redirected.

    The sitemaps seem ok and reflect the new domain name, but is there anywhere else that may retain the previous .com domain causing Google to think it's a redirect?

    I changed config.json to point to the new domain and everything works as expected. Interestingly, Bing doesn't seem to have any issue and indexes as expected.

    Any thoughts before I raise this with Google directly?

    Thanks
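
One way to rule out stale .com entries in the sitemap is to parse it and list every URL whose host is not the new domain. This is only a minimal Python sketch, assuming the sitemap follows the standard sitemaps.org schema. The function name and the sample URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by standard sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def stale_sitemap_urls(sitemap_xml: str, expected_host: str) -> list[str]:
    """Return every <loc> URL whose host differs from expected_host."""
    root = ET.fromstring(sitemap_xml)
    stale = []
    for loc in root.iter(f"{SITEMAP_NS}loc"):
        url = (loc.text or "").strip()
        if urlparse(url).hostname != expected_host:
            stale.append(url)
    return stale


# Illustrative sample only; the paths are hypothetical.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://sudonix.org/tags/javascript</loc></url>
  <url><loc>https://sudonix.com/recent</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in stale_sitemap_urls(SAMPLE, "sudonix.org"):
        print("still on old domain:", url)
```

An empty result would suggest the sitemap itself is not what is pointing Google at the old domain.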

  • PitaJ Global Moderator Plugin & Theme Dev
    wrote
    #2

    Change of Address Tool - Search Console Help (support.google.com)

  • phenomlab
    replied to PitaJ
    #3

    @PitaJ thanks, but I've already read that and it's pretty much useless

  • PitaJ Global Moderator Plugin & Theme Dev
    wrote, last edited by PitaJ
    #4

    You tried using the Change of Address tool and it didn't work, or what?

  • phenomlab
    replied to PitaJ
    #5

    @PitaJ yes, it sat there and did absolutely nothing for almost 2 weeks. Google absolutely insists the URLs are being redirected, while Bing and Yandex process them without issue.

  • julian GNU/Linux
    wrote
    #6

    A 301 redirect should instruct Google to assign the SEO score to the new URL, so that should be OK?

    If it shows errors on the old URLs, that might be OK insofar as the new URL is already indexed properly?

    May need to confirm with webmaster tools, etc.

  • phenomlab
    replied to PitaJ
    #7

    @PitaJ and even if I delete everything from Google search and start again, it's the same. I'm thinking the canonical has something to do with this but changing it for every single page isn't an option.

  • phenomlab
    replied to julian
    #8

    @julian yes, agree, but the new domain isn't indexed.

  • julian GNU/Linux
    replied to phenomlab
    #9

    @phenomlab is the canonical URL defined in the page head (the rel="canonical" link tag) still .com?

  • phenomlab
    replied to julian, last edited by phenomlab
    #10

    @julian that I'm unsure of, but I think that is going to be the issue. I just can't think of an easy way to do this programmatically.
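
Checking which domain the canonical tag points at can be scripted without touching each page by hand. A rough Python sketch, using only the standard library and assuming the page exposes a rel="canonical" link in its head. The class and function names are hypothetical, and the sample HTML is illustrative rather than taken from the site.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.canonical = attr.get("href")


def canonical_host(html: str):
    """Return the host named by the canonical link, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return urlparse(finder.canonical).hostname if finder.canonical else None


# Illustrative page source only.
PAGE = '<html><head><link rel="canonical" href="https://sudonix.com/topic/1" /></head></html>'

if __name__ == "__main__":
    print(canonical_host(PAGE))
```

Feeding this the HTML of a handful of topic pages would show whether any still advertise the old .com host.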

  • phenomlab
    wrote
    #11

    I think I found the answer here. It's Cloudflare's "Bot Fight Mode" that causes this.

    Mode on: image.png

    Mode off: image.png

    Evidently, it's a well-known issue. If you want crawlers to ignore your site, just switch that bad boy on with no exceptions 😕

    (community.cloudflare.com)

  • julian GNU/Linux
    replied to phenomlab
    #12

    @phenomlab what a fun name for a product 😃

  • phenomlab has marked this topic as solved
  • tankerkiller125
    replied to julian
    #13

    @julian I mean, the purpose of the product is to fight bad bots, not known good bots (like search engines). But it seems that, at least on free plans, it blocks spiders a lot. And even on a paid plan, you apparently have to do quite a bit of work to get it working properly.

  • phenomlab
    replied to tankerkiller125
    #14

    @tankerkiller125 that's exactly what it's doing. In addition, it also blocks its own page speed test! I uploaded over 2000 IP addresses into Cloudflare to create a bypass rule for Google's ASN, but that never worked at all, despite being suggested by Cloudflare in the first place.

    Even the pro plan has issues with this as you suggest. I've switched it off, and now all crawl errors on my site have cleared down.

  • phenomlab has marked this topic as unsolved
  • phenomlab
    wrote, last edited by phenomlab
    #15

    Back again with this. I thought that the issue was solved, but it appears not. I've discussed this with Google themselves, and they tell me that their crawler is receiving an HTTP 307 redirect.

    Is there any special permission that needs to be applied to spiders in NodeBB? The page, according to the Googlebot, redirects to /register which it shouldn't of course.

    In the browser, everything works as expected and you land up on the page you requested. I have no idea why Googlebot thinks this is a redirect when it isn't. Seems Bing has no issues indexing the site either.

    Here's an example of the 307 redirect
    https://view.hugo-decoded.be/?scheme=https&url=sudonix.org%2Ftags%2Fjavascript&ua=Googlebot&ref=Google

    Edit - seems Googlebot is getting a 307 and being redirected to /register/complete.

    Hugo's URL Viewer (view.hugo-decoded.be)
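
The same check can be reproduced locally by requesting a page with a Googlebot-style User-Agent and refusing to follow redirects. A minimal Python sketch using only the standard library. The helper names are hypothetical, and a spoofed User-Agent is not guaranteed to be treated the same way as the real Googlebot by Cloudflare or the origin.

```python
import urllib.error
import urllib.request

# Classic Googlebot User-Agent string; used here only to imitate the crawler.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stops urllib from following 3xx responses so they can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def describe(status: int, location) -> str:
    """Summarise a response as either a redirect (with target) or not."""
    if 300 <= status < 400:
        return f"{status} redirect to {location}"
    return f"{status} no redirect"


def check(url: str, user_agent: str = GOOGLEBOT_UA) -> str:
    """Fetch url once, without following redirects, and describe the result."""
    opener = urllib.request.build_opener(NoRedirect)
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener.open(req, timeout=10) as resp:
            return describe(resp.status, resp.headers.get("Location"))
    except urllib.error.HTTPError as err:
        return describe(err.code, err.headers.get("Location"))


if __name__ == "__main__":
    print(check("https://sudonix.org/tags/javascript"))
```

Running it a second time with a normal browser User-Agent passed as user_agent would show whether the 307 is specific to crawlers.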

  • phenomlab
    wrote, last edited by phenomlab
    #16

    I suppose the obvious question here is why every single link, when queried by Googlebot, is redirected to /register/complete? I thought that the purpose of spider permissions was so that crawlers would work unhindered?

    It seems currently, having checked properly, that every single page generated by NodeBB is seen as a 307 redirect to /register/complete by Google, Bing, and Yandex - they can't all be wrong.

    This site, Bulk URL HTTP Status Code, Header & Redirect Checker (httpstatus.io), is pretty useful and is at least mobile friendly. If I check the URL of this page for example, I get an HTTP 200 so nothing wrong here, but my install obviously says differently, and I'd like to fix that.

  • julian GNU/Linux
    replied to phenomlab, last edited by julian
    #17

    @phenomlab likely related to recent changes re: email handling. That's a regression that I should fix, it shouldn't act like that to spiders.

    Best guess is I handled it by doing a check for != 0 when it should be > 0
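
To see why that one comparison would matter, here is an illustrative Python sketch. It is not the NodeBB source. It assumes guests carry uid 0 and spiders carry a negative uid such as -1, which the thread does not confirm, and both function names are hypothetical.

```python
GUEST_UID = 0    # assumption: logged-out visitors
SPIDER_UID = -1  # assumption: recognised crawlers


def needs_email_redirect_buggy(uid: int, has_confirmed_email: bool) -> bool:
    """The suspected check: anything that is not a guest counts as a user."""
    return uid != 0 and not has_confirmed_email


def needs_email_redirect_fixed(uid: int, has_confirmed_email: bool) -> bool:
    """The proposed check: only real accounts (positive uid) are redirected."""
    return uid > 0 and not has_confirmed_email


if __name__ == "__main__":
    for label, uid in (("guest", GUEST_UID), ("spider", SPIDER_UID), ("user", 5)):
        print(label,
              needs_email_redirect_buggy(uid, False),
              needs_email_redirect_fixed(uid, False))
```

Under those assumptions, a spider with no email on record passes the != 0 test and is sent to the email completion page, while the > 0 test lets it through. That would match every crawler being redirected to /register/complete.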

  • phenomlab
    replied to julian, last edited by phenomlab
    #18

    @julian funny you should say that, as my other site hostrisk.com doesn't have this issue and is running 3.0.1 (needs an upgrade).

    But the caveat here is that hostrisk does not permit registrations, so I guess that regression (if it existed in 3.0.1) won't apply? I'll upgrade that and see if the issue remains without registration enabled.

  • julian GNU/Linux
    wrote
    #19

    You can disable email requirement in the ACP to work around this temporarily.

  • phenomlab
    replied to julian, last edited by phenomlab
    #20

    @julian thanks. Let me test.

    Edit - yep, all good. Disabled and now getting HTTP 200 which I'd expect.

    Just kicked off a validation request in Google Search Console, which I'm hoping will work.


Copyright © 2023 NodeBB | Contributors