Almost forgot about the robots.txt...
-
I was up configuring my robots.txt. The old links I had for my website were tanking my SEO badly. I read up and learned robots.txt in less than 10 minutes. So here's what I suggest: if you're running NodeBB standalone, you should optimize your robots.txt along these lines.
User-Agent: *
Disallow: *
Allow: /recent
Allow: /popular
Allow: /user/*
Allow: /category/*
Allow: /topic/*
Allow: /channel/*
Disallow: /admin*
Disallow: /api/*
Disallow: /templates/*
Disallow: /templates/
Disallow: /language/*
Disallow: /language/
Disallow: /plugins/
Sitemap: http://domain.com/sitemap.xml
Okay, so the reason I used wildcards is obvious. For example, /admin will be cached by Google, but it will just give an error. I don't even want to see errors, because I hate errors, so I threw it behind a wildcard, same as the other things that are disallowed. I don't want those cached by any search engine, ever. I saw some of these things being cached via Google and thought, you know what, this is getting ridiculous. Am I missing something here? Let me know your thoughts on the current robots configuration, or how we can make it better, but this is what I have (and some other things) in my robots.txt file.
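To make the wildcard behaviour concrete, here is a rough sketch of how a Google-style matcher treats these patterns: * matches any run of characters, the longest matching rule wins, and Allow beats Disallow on a tie. The rules below are pulled from the config above; the matcher itself is a simplified illustration of how this is generally described, not any crawler's actual code.

// Simplified sketch of Google-style robots.txt path matching (TypeScript).
// "*" matches any run of characters; the longest matching rule wins,
// and Allow beats Disallow when the lengths tie.

type Rule = { allow: boolean; pattern: string };

function toRegex(pattern: string): RegExp {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*/g, ".*");                 // "*" becomes "match anything"
  return new RegExp("^" + escaped);
}

function isAllowed(path: string, rules: Rule[]): boolean {
  let best: Rule | undefined;
  for (const rule of rules) {
    if (toRegex(rule.pattern).test(path)) {
      if (!best || rule.pattern.length > best.pattern.length ||
          (rule.pattern.length === best.pattern.length && rule.allow)) {
        best = rule;
      }
    }
  }
  return best ? best.allow : true; // nothing matched, so crawling is allowed
}

// A few of the rules from the config above:
const rules: Rule[] = [
  { allow: false, pattern: "/admin*" },
  { allow: false, pattern: "/api/*" },
  { allow: true,  pattern: "/topic/*" },
];

console.log(isAllowed("/admin/settings", rules)); // false
console.log(isAllowed("/api/recent", rules));     // false
console.log(isAllowed("/topic/123", rules));      // true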
-
@planner said:
Great effort, but slightly confusing to me, from my understanding of robots.txt. Where in the directory structure of NodeBB are /recent, /popular, etc?
Well, see, these are not necessarily directories themselves, but rather routes created by the application.
If you did have a folder in /public/, such as /public/blah, and it was referenced somehow via a link on your site or another site, Google will cache whatever is in that directory. It would be accessible like this: http://domain.com/blah
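To picture the difference, here's a rough sketch of the two cases. NodeBB is built on Express, but this is just a generic example, not NodeBB's actual code; the route and folder names come from this thread.

import express from "express";

const app = express();

// A route: /recent exists only because the application answers requests to it.
// There is no /recent folder anywhere on disk.
app.get("/recent", (_req, res) => {
  res.send("recent topics, rendered by the app");
});

// A real directory: everything in ./public is served as static files,
// so a file at ./public/blah/index.html is reachable at http://domain.com/blah/.
app.use(express.static("public"));

app.listen(4567); // example port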
-
@trevor You're editing your robots.txt from the ACP, right? We have a default set, so if you leave it blank, the robots.txt will always return this:
User-agent: *
Disallow: /admin/
Sitemap: http://community.nodebb.org/sitemap.xml
Though of course, the sitemap link changes based on your configured URL.
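In other words, the default robots.txt isn't a file on disk either; conceptually it's another route that builds the response from what you've configured. A rough sketch of the idea (hypothetical, not NodeBB's actual implementation; the base URL and ACP value are placeholders):

import express from "express";

const app = express();
const baseUrl = "http://community.nodebb.org"; // whatever base URL you configured
const customRobots = "";                       // whatever you saved in the ACP, if anything

app.get("/robots.txt", (_req, res) => {
  // Blank ACP field: fall back to the built-in default,
  // with the Sitemap line pointing at the configured URL.
  const body = customRobots.trim() !== ""
    ? customRobots
    : `User-agent: *\nDisallow: /admin/\nSitemap: ${baseUrl}/sitemap.xml`;
  res.type("text/plain").send(body);
});

app.listen(4567); // example port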
-