Securing pages from Bots

**Reacher** · 05-18-2016

I recently checked my website stats and found that bots were crawling some pages that I think should not be crawled. These include pages for login, contact, and email sign-ups; cgi-bin; PHP scripts; thank-you and unsubscribe notes (for email sign-ups); TOS and privacy statements; and the admin blog sign-in (for the webmaster only).

I'm wondering, what is the best way to keep these kinds of pages from being publicized outside of those who directly visit the site?

**franko** · 05-18-2016

Put this meta tag, <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">, in the head of the pages you don't want indexed. A more reliable way is to use .htaccess, but that requires (a) a Unix (Apache) server and (b) a certain amount of knowledge. Google for it; there are plenty of tutorials on the web.
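A quick way to confirm the tag actually ended up on a page is to parse the HTML and look for it. The following is only a minimal sketch using Python's standard library; the class name, the helper `robots_directives`, and the sample page are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html_text):
    """Return the set of robots directives found in the page, e.g. {'noindex'}."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    return finder.directives


# Hypothetical page that uses the tag from the post above.
sample_page = """<html><head>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<title>Thank you</title>
</head><body>Thanks for signing up.</body></html>"""

print(sorted(robots_directives(sample_page)))  # ['nofollow', 'noindex']
```

An empty result means the page has no robots meta tag, so well-behaved crawlers would treat it as indexable.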

**Reacher** · 05-18-2016

Originally Posted by franko

Put this meta tag, <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">, in the head of the pages you don't want indexed. A more reliable way is to use .htaccess, but that requires (a) a Unix (Apache) server and (b) a certain amount of knowledge. Google for it; there are plenty of tutorials on the web.

Earlier, I found the tag you just mentioned on a Googlebot site and will set it up for these pages. I thought there was a way to select this option within NOF 2015, but I can't locate it. Maybe I'm thinking too much! I have .htaccess on my server and can set up private areas, but I have reservations. For example, if I put a CGI script inside a protected directory, will it still be accessible and work correctly? Maybe I should ask my web host about this.

**gotFusion** · 05-18-2016

You should find these gotFusion tutorials helpful:

http://www.gotfusion.com/tutorials/tut.cfm?itemID=10

http://www.gotfusion.com/tutsTD/metatags.cfm

You can do a single-word search on the GF site for "meta" to find a lot more helpful information.

**RayC** · 05-19-2016

Originally Posted by franko

Put this meta tag, <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">, in the head of the pages you don't want indexed. A more reliable way is to use .htaccess, but that requires (a) a Unix (Apache) server and (b) a certain amount of knowledge. Google for it; there are plenty of tutorials on the web.

This is like putting a sign on your lawn that says "Please don't rob my house."

Legitimate bots (e.g., Google) will follow the instruction, but malicious bots will simply ignore it and try to mine whatever data they can.

**Reacher** · 05-19-2016

Originally Posted by gotFusion

You should find these gotFusion tutorials helpful:

http://www.gotfusion.com/tutorials/tut.cfm?itemID=10

http://www.gotfusion.com/tutsTD/metatags.cfm

You can do a single-word search on the GF site for "meta" to find a lot more helpful information.

I looked at the GF tutorial about the robots.txt file. I just added the robots meta tag to the HTML head of specific pages on my site. But I'm wondering: is the robots.txt file more effective, or is it just easier than adding a tag to each page?

Also, referencing RayC's post: this seems more like a request for legitimate bots to stay away, and it doesn't stop malicious bots. The only way to stop them effectively would be to password-protect a folder.

**Reacher** · 05-19-2016

Originally Posted by RayC

This is like putting a sign on your lawn that says "Please don't rob my house."

Legitimate bots (e.g., Google) will follow the instruction, but malicious bots will simply ignore it and try to mine whatever data they can.

This is what I thought, but it may be better to use the meta tag than to do nothing.

**gotFusion** · 05-19-2016

Only bots that follow the "rules" will honor them.

Spambots will crawl anyway.

You can always block the IP addresses of the spambots, but most often they spoof their IPs.

I've barred entire IP blocks covering Eastern Europe, Africa, and Asia. I do not have any customers in those regions, so there is no point in letting anyone from those regions view any of my content.
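For anyone wondering what "barring an IP block" amounts to, here is a rough sketch of the range check in Python. The two networks are placeholder documentation ranges (RFC 5737), not real spambot networks, and `BLOCKED_NETWORKS` and `is_blocked` are names invented for this example. On a real site the block would normally live in the server or firewall configuration rather than in a script.

```python
import ipaddress

# Placeholder ranges reserved for documentation (RFC 5737).
# Swap in the CIDR blocks you actually want to bar.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(ip_text):
    """Return True if the address falls inside any blocked network."""
    ip = ipaddress.ip_address(ip_text)
    return any(ip in network for network in BLOCKED_NETWORKS)


print(is_blocked("203.0.113.45"))  # True: inside 203.0.113.0/24
print(is_blocked("192.0.2.10"))    # False: not in either range
```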

**gotFusion** · 05-19-2016

Originally Posted by Reacher

This is what I thought, but it may be better to use the meta tag than to do nothing.

You should have a single robots.txt file in your domain root and a robots meta tag in the document head of each page you want kept out of the index.

They are not the same thing, so use them both.

Look at your stats. If the file is missing, you should see 404 entries for the bots that request robots.txt upon visiting the domain. All well-behaved bots look at the robots.txt file before starting a crawl, as it tells them what they may and may not crawl.
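To see how a well-behaved bot would read a robots.txt file, you can feed the rules to Python's built-in parser. This is only a sketch: the rules and paths below (`/cgi-bin/`, `/login.html`, `/unsubscribe.html`) and the agent name `ExampleBot` are illustrative guesses based on the pages mentioned in this thread, not anyone's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; adjust the paths to match your own site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /login.html
Disallow: /unsubscribe.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="ExampleBot"):
    """Return True if a rule-following bot may fetch the given path."""
    return parser.can_fetch(agent, path)


for path in ("/index.html", "/login.html", "/cgi-bin/mail.pl"):
    print(path, allowed(path))
# /index.html True
# /login.html False
# /cgi-bin/mail.pl False
```

This only models bots that choose to obey the file; it says nothing about crawlers that ignore it.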

**Reacher** · 05-20-2016

Originally Posted by gotFusion

You should have a single robots.txt file in your domain root and a robots meta tag in the document head of each page you want kept out of the index.

They are not the same thing, so use them both.

Look at your stats. If the file is missing, you should see 404 entries for the bots that request robots.txt upon visiting the domain. All well-behaved bots look at the robots.txt file before starting a crawl, as it tells them what they may and may not crawl.

Yes, I will add the separate robots.txt file just for extra insurance and monitor the stats. I mostly want to keep Google and other legitimate bots from crawling these pages, because I don't want them to show up in Google's or other search engines' results. As for the malicious bots, I'm learning that there isn't much I can do except password-protect, which I have already done for a couple of pages.
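For monitoring the stats, a small script can list which bots asked for robots.txt and what status they got back. This sketch assumes the server writes the common Apache "combined" log format; the regular expression, the function `robots_txt_requests`, and the sample lines are all made up for illustration.

```python
import re
from collections import Counter

# Matches the Apache "combined" log format (an assumption about your server).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def robots_txt_requests(lines):
    """Count robots.txt requests per (user agent, status code)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path") == "/robots.txt":
            counts[(match.group("agent"), match.group("status"))] += 1
    return counts


# Invented sample entries; in practice, read the lines from your access log.
SAMPLE_LOG = [
    '203.0.113.7 - - [19/May/2016:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 404 209 "-" "ExampleBot/1.0"',
    '203.0.113.7 - - [19/May/2016:10:00:02 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
    '198.51.100.9 - - [19/May/2016:10:05:00 +0000] "GET /login.html HTTP/1.1" 200 1830 "-" "OtherBot/2.0"',
]

for (agent, status), count in robots_txt_requests(SAMPLE_LOG).items():
    print(agent, status, count)  # ExampleBot/1.0 404 1
```

A 404 next to a bot's name means it looked for robots.txt and found none; a bot that never appears in the output did not ask for the file at all.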
