Almost forgot about the robots.txt...

  • I was up late configuring my robots.txt. The old links I had for my website were tanking my SEO badly. I read up and learned robots.txt in less than 10 minutes. So here's what I suggest: if you're running NodeBB standalone, you should optimize your robots.txt like so.

    User-Agent: *
    Disallow: *
    
    Allow: /recent
    Allow: /popular
    Allow: /user/*
    Allow: /category/*
    Allow: /topic/*
    Allow: /channel/*
    
    Disallow: /admin*
    Disallow: /api/*
    Disallow: /templates/*
    Disallow: /language/*
    Disallow: /plugins/
    
    Sitemap: http://domain.com/sitemap.xml
    

    Okay, so the reason I used wildcards is obvious. For example, /admin would get indexed by Google but would just return an error, and I don't want to see errors at all, so I threw a wildcard on it. As for the other disallowed paths, I don't want those indexed by any search engine, ever. I saw some of these things being indexed by Google and thought, you know what, this is getting ridiculous.

    Am I missing something here? Let me know your thoughts on the current robots configuration, or how we can make it better. This is what I have (along with a few other things) in my robots.txt file.
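    A quick way to sanity-check rules like these before deploying them is Python's standard-library `urllib.robotparser`. One caveat: it matches by plain path prefix and does not expand `*` wildcards the way Googlebot does, so this sketch uses prefix-only versions of the disallow rules:

```python
from urllib import robotparser

# Prefix-only approximation of the rules above: urllib.robotparser
# treats "*" in a path literally, so /admin* is written as /admin here
# and the blanket "Disallow: *" line is dropped.
rules = """\
User-Agent: *
Disallow: /admin
Disallow: /api/
Disallow: /templates/
Disallow: /language/
Disallow: /plugins/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

for path in ["/recent", "/topic/42/some-thread", "/admin/dashboard", "/api/recent"]:
    verdict = "allowed" if rp.can_fetch("*", "http://domain.com" + path) else "blocked"
    print(f"{path}: {verdict}")
```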

  • Great effort, but this is slightly confusing to me, given my understanding of robots.txt. Where in NodeBB's directory structure are /recent, /popular, etc.?

  • @planner said:

    Great effort, but this is slightly confusing to me, given my understanding of robots.txt. Where in NodeBB's directory structure are /recent, /popular, etc.?

    Well, see, these aren't actually directories; they're routes created by the application.
    If you did have a folder under /public/, such as /public/blah, and it was referenced via a link on your site or another site, Google would index whatever is in that directory. It would be accessible like this: http://domain.com/blah
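    As a minimal sketch of the idea (hypothetical code, not NodeBB's actual router): the application maps paths like /recent to handler functions, and only paths with no matching route fall through to static files on disk:

```python
# Hypothetical router sketch: "/recent" exists because the application
# registers a handler for that path, not because a /recent folder exists.
routes = {
    "/recent": lambda: "recent topics page",
    "/popular": lambda: "popular topics page",
}

def handle(path):
    handler = routes.get(path)
    if handler:
        return handler()
    # No route matched: a real server would now look for a static file,
    # e.g. /public/blah/index.html would be served at /blah/index.html.
    return "404 or static file lookup"

print(handle("/recent"))  # served by a route handler
print(handle("/blah"))    # falls through to the static layer
```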

  • @trevor You're editing your robots.txt from the ACP, right? We have a default set, so if you leave it blank, the robots.txt will always return this:

    User-agent: *
    Disallow: /admin/
    Sitemap: http://community.nodebb.org/sitemap.xml
    

    Though of course, the sitemap link changes based on your configured URL.
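    Under standard prefix matching, that default keeps crawlers out of the admin area and nothing else; here's a quick check with Python's stdlib `urllib.robotparser`:

```python
from urllib import robotparser

# The default rules returned when the ACP field is left blank.
default_rules = ["User-agent: *", "Disallow: /admin/"]

rp = robotparser.RobotFileParser()
rp.parse(default_rules)

# /admin/ and everything under it is blocked; all other routes are crawlable.
print(rp.can_fetch("Googlebot", "http://community.nodebb.org/admin/dashboard"))  # False
print(rp.can_fetch("Googlebot", "http://community.nodebb.org/topic/1/welcome"))  # True
```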

  • @julian said:

    @trevor You're editing your robots.txt from the ACP, right? We have a default set, so if you leave it blank, the robots.txt will always return this:

    @julian Yeah, from the ACP.

