Almost forgot about the robots.txt...
-
I was up configuring my robots.txt. The old links I had for my website were tanking my SEO badly. I read up and learned robots.txt in less than 10 minutes. So here's what I suggest: if you're running NodeBB standalone, you should optimize your robots.txt along these lines.
User-Agent: *
Disallow: *
Allow: /recent
Allow: /popular
Allow: /user/*
Allow: /category/*
Allow: /topic/*
Allow: /channel/*
Disallow: /admin*
Disallow: /api/*
Disallow: /templates/*
Disallow: /templates/
Disallow: /language/*
Disallow: /language/
Disallow: /plugins/
Sitemap: http://domain.com/sitemap.xml
Okay, so the reason I used wildcards is obvious. For example, /admin will be cached by Google, but it will just give an error. I don't even want to see errors, because I hate errors, so I threw it behind a wildcard, same as the other things that are disallowed. I don't want those cached by any search engine, ever. I saw some of these things being cached via Google and thought, you know what, this is getting ridiculous. Am I missing something here? Let me know your thoughts on the current robots configuration, or how we can make it better, but this is what I have (and some other things) in my robots.txt file.
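To make the wildcard behaviour concrete, here is a rough sketch of how a Google-style matcher treats these patterns: * matches any run of characters, the longest matching rule wins, and Allow beats Disallow on a tie. The rules below are pulled from the config above; the matcher itself is a simplified illustration of how this is generally described, not any crawler's actual code.

// Simplified sketch of Google-style robots.txt path matching (TypeScript).
// "*" matches any run of characters; the longest matching rule wins,
// and Allow beats Disallow when the lengths tie.

type Rule = { allow: boolean; pattern: string };

function toRegex(pattern: string): RegExp {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*/g, ".*");                 // "*" becomes "match anything"
  return new RegExp("^" + escaped);
}

function isAllowed(path: string, rules: Rule[]): boolean {
  let best: Rule | undefined;
  for (const rule of rules) {
    if (toRegex(rule.pattern).test(path)) {
      if (!best || rule.pattern.length > best.pattern.length ||
          (rule.pattern.length === best.pattern.length && rule.allow)) {
        best = rule;
      }
    }
  }
  return best ? best.allow : true; // nothing matched, so crawling is allowed
}

// A few of the rules from the config above:
const rules: Rule[] = [
  { allow: false, pattern: "/admin*" },
  { allow: false, pattern: "/api/*" },
  { allow: true,  pattern: "/topic/*" },
];

console.log(isAllowed("/admin/settings", rules)); // false
console.log(isAllowed("/api/recent", rules));     // false
console.log(isAllowed("/topic/123", rules));      // true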
-
@planner said:
Great effort, but slightly confusing to me, from my understanding of robots.txt. Where in the directory structure of NodeBB are /recent, /popular, etc?
Well, see, these are not necessarily directories themselves, but rather routes created by the application.
If you did have a folder in /public/, such as /public/blah, and it was referenced somehow via a link on your site or another site, Google will cache whatever is in that directory. It would be accessible like this: http://domain.com/blah
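To picture the difference, here's a rough sketch of the two cases. NodeBB is built on Express, but this is just a generic example, not NodeBB's actual code; the route and folder names come from this thread.

import express from "express";

const app = express();

// A route: /recent exists only because the application answers requests to it.
// There is no /recent folder anywhere on disk.
app.get("/recent", (_req, res) => {
  res.send("recent topics, rendered by the app");
});

// A real directory: everything in ./public is served as static files,
// so a file at ./public/blah/index.html is reachable at http://domain.com/blah/.
app.use(express.static("public"));

app.listen(4567); // example port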
-
@trevor You're editing your robots.txt from the ACP, right? We have a default set, so if you leave it blank, the robots.txt will always return this:
User-agent: *
Disallow: /admin/
Sitemap: http://community.nodebb.org/sitemap.xml
Though of course, the sitemap link changes based on your configured URL.
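In other words, the default robots.txt isn't a file on disk either; conceptually it's another route that builds the response from what you've configured. A rough sketch of the idea (hypothetical, not NodeBB's actual implementation; the base URL and ACP value are placeholders):

import express from "express";

const app = express();
const baseUrl = "http://community.nodebb.org"; // whatever base URL you configured
const customRobots = "";                       // whatever you saved in the ACP, if anything

app.get("/robots.txt", (_req, res) => {
  // Blank ACP field: fall back to the built-in default,
  // with the Sitemap line pointing at the configured URL.
  const body = customRobots.trim() !== ""
    ? customRobots
    : `User-agent: *\nDisallow: /admin/\nSitemap: ${baseUrl}/sitemap.xml`;
  res.type("text/plain").send(body);
});

app.listen(4567); // example port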
-