Thread: Securing pages from Bots

  1. #1
    Banned
    Join Date
    Jul 2013
    Posts
    24

    Securing pages from Bots

    I recently checked my website stats and found that bots were crawling some pages that I think should not be crawled. These include the login, contact, and email sign-up pages; cgi-bin; PHP scripts; the thank-you and unsubscribe notes (for email sign-ups); the TOS and privacy statements; and the admin blog sign-in (for the webmaster only).

    I'm wondering, what is the best way to keep these kinds of pages from being publicized outside of those who directly visit the site?

  2. #2
    Senior Member franko's Avatar
    Join Date
    Apr 2010
    Location
    Tasmania Australia
    Posts
    2,642

    Put this meta tag, <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">, in the head of the pages you don't want indexed. A more reliable way is to use .htaccess, but that requires (a) a Unix server and (b) a certain amount of knowledge. Google for it; there are plenty of tutorials on the web.
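
    One way to confirm the tag actually ended up in a page's head is to parse the page and look for it. The following is only a minimal sketch using Python's standard html.parser module; the sample page and the helper name robots_directives are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> matches too.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.append(directive)


def robots_directives(html):
    """Return the robots directives found in the given HTML text (hypothetical helper)."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


# Hypothetical page used only for this example.
SAMPLE_PAGE = """<html><head>
<title>Login</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head><body>...</body></html>"""

print(robots_directives(SAMPLE_PAGE))  # ['noindex', 'nofollow']
```

    An empty list means the page carries no robots meta tag, so a well-behaved crawler would treat it as indexable.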

  3. #3
    Banned
    Join Date
    Jul 2013
    Posts
    24

    Quote: Originally Posted by franko
    Put this meta tag, <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">, in the head of the pages you don't want indexed. A more reliable way is to use .htaccess, but that requires (a) a Unix server and (b) a certain amount of knowledge. Google for it; there are plenty of tutorials on the web.

    Earlier, I found the tag you just mentioned on a Googlebot site and will set it up for these pages. I thought there was a way to select this option within NOF 2015, but I can't locate it. Maybe I'm overthinking it! I have .htaccess on my server and can set up private areas, but I have reservations. For example, if I put a CGI script inside a protected directory, will it still be accessible and work correctly? Maybe I should ask my hosting provider about this.
    Last edited by Reacher; 05-18-2016 at 10:45 PM.

  4. #4
    Senior Member gotFusion's Avatar
    Join Date
    Jan 2010
    Location
    www.gotHosting.biz
    Posts
    4,529

    You should find these gotFusion tutorials helpful:

    http://www.gotfusion.com/tutorials/tut.cfm?itemID=10

    http://www.gotfusion.com/tutsTD/metatags.cfm

    You can do a single-word search on the GF site for the word "meta" to find a lot more helpful information.
    NetObjects Fusion Cloud Linux enabled Web Hosting, support + training starts at $14.95
    NetObjects Fusion web Hosting and support + ASP + PHP + ColdFusion + MySQL + MS SQL
    FREE NetObjects Fusion Support & training comes with all web hosting accounts
    NetObjects Fusion Web Hosting: http://www.gotHosting.biz

  5. #5
    Senior Member RayC's Avatar
    Join Date
    Apr 2010
    Location
    Toronto-ish, Canada
    Posts
    1,732

    Quote: Originally Posted by franko
    Put this meta tag, <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">, in the head of the pages you don't want indexed. A more reliable way is to use .htaccess, but that requires (a) a Unix server and (b) a certain amount of knowledge. Google for it; there are plenty of tutorials on the web.

    This is like putting a sign on your lawn that says "Please don't rob my house."

    Legitimate bots (e.g. Google) will follow the instruction, but malicious bots will simply ignore it and try to mine whatever data they can.
    Ray Cambpell
    Sounds In Sync
    Linked in

  6. #6
    Banned
    Join Date
    Jul 2013
    Posts
    24

    Quote: Originally Posted by gotFusion
    You should find these gotFusion tutorials helpful:

    http://www.gotfusion.com/tutorials/tut.cfm?itemID=10

    http://www.gotfusion.com/tutsTD/metatags.cfm

    You can do a single-word search on the GF site for the word "meta" to find a lot more helpful information.

    I looked at the GF tutorial about the robots.txt file. I have just added the robots tag to the HTML head of specific pages on my site. But I'm wondering: is the robots.txt file more effective, or is it just easier than adding a tag to each page?

    Also, regarding RayC's post: this seems more like a request for legitimate bots to stay away, and it doesn't stop malicious bots. The only way to do that effectively would be to password-protect a folder.

  7. #7
    Banned
    Join Date
    Jul 2013
    Posts
    24

    Quote: Originally Posted by RayC
    This is like putting a sign on your lawn that says "Please don't rob my house."

    Legitimate bots (e.g. Google) will follow the instruction, but malicious bots will simply ignore it and try to mine whatever data they can.

    This is what I thought, but it may be better to use the meta tag than to do nothing.

  8. #8
    Senior Member gotFusion's Avatar
    Join Date
    Jan 2010
    Location
    www.gotHosting.biz
    Posts
    4,529

    Only bots that follow "rules" will follow them.

    Spambots will crawl.

    You can always block the IP addresses of the spambots, but most of the time they spoof their IPs.

    I've barred entire IP blocks for regions such as Eastern Europe, Africa, and Asia. I do not have any customers from those regions, so there is no point in letting anyone from those regions view any of my content.
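
    For illustration only, the block-by-range idea can be sketched in Python with the standard ipaddress module. The ranges below are reserved documentation addresses used as placeholders, not real regional blocks, and is_blocked is a hypothetical helper. In practice this kind of blocking would normally be done in the server or firewall configuration rather than in a script.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), NOT real regional allocations.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(address):
    """Return True if the visitor address falls inside any blocked range (hypothetical helper)."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BLOCKED_NETWORKS)


for visitor in ("192.0.2.17", "203.0.113.5"):
    print(visitor, "blocked" if is_blocked(visitor) else "allowed")
```

    As noted above, a bot that changes or spoofs its address simply lands outside the list, so range blocking reduces noise rather than providing real protection.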

  9. #9
    Senior Member gotFusion's Avatar
    Join Date
    Jan 2010
    Location
    www.gotHosting.biz
    Posts
    4,529

    Quote: Originally Posted by Reacher
    This is what I thought, but it may be better to use the meta tag than to do nothing.

    You should have a single robots.txt file in your domain root and a meta robots line in the doc head of the pages you want kept out of search results.

    They are not the same thing, so use them both.

    Look at your stats. If there is no robots.txt on the site yet, you should see 404 entries for the bots that request robots.txt when they visit the domain. All well-behaved bots look at the robots.txt file before starting a crawl, as it tells them what they may crawl and what they may not.
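
    To see how a well-behaved bot reads robots.txt, here is a minimal sketch using Python's standard urllib.robotparser module. The rules and paths are hypothetical examples based on the kinds of pages mentioned in this thread, and may_crawl is a made-up helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are examples only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /login.html
Disallow: /unsubscribe.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(path, agent="Googlebot"):
    """Return True if the parsed rules allow this agent to fetch the path (hypothetical helper)."""
    return parser.can_fetch(agent, path)


for path in ("/index.html", "/login.html", "/cgi-bin/form.pl"):
    print(path, "allowed" if may_crawl(path) else "disallowed")
```

    Nothing forces a bot to run this check. The file only works for crawlers that choose to honor it, which is why it cannot keep malicious bots out.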

  10. #10
    Banned
    Join Date
    Jul 2013
    Posts
    24

    Quote: Originally Posted by gotFusion
    You should have a single robots.txt file in your domain root and a meta robots line in the doc head of the pages you want kept out of search results.

    They are not the same thing, so use them both.

    Look at your stats. If there is no robots.txt on the site yet, you should see 404 entries for the bots that request robots.txt when they visit the domain. All well-behaved bots look at the robots.txt file before starting a crawl, as it tells them what they may crawl and what they may not.

    Yes, I will add the separate robots.txt file for extra insurance and monitor the stats. I mostly want to keep Google and other legitimate bots from crawling these pages, because I don't want them to show up in the results of Google and other search engines. As for the malicious bots, I'm learning that there isn't much I can do except password-protect, which I have already done for a couple of pages.
