
Thread: Securing pages from Bots

  1. #11
    gotFusion (Senior Member)
    Join Date: Jan 2010
    Location: www.gotHosting.biz
    Posts: 4,529

    Read my robots.txt tutorial or google for others.

    You can restrict doc and file types as well as entire pages or folders.
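
    robots.txt works on URL paths, so file types are usually restricted with pattern rules. The file below is a hypothetical example (the /private/ folder, drafts.html and the .pdf/.doc patterns are made up). The `*` wildcard and the trailing `$` end anchor are described in RFC 9309 and honored by the major search engines, but not every bot supports them. Python's urllib.robotparser does not implement wildcards, so this sketch uses a small hand-written matcher instead. It is deliberately simplified: it reads only the `User-agent: *` group, ignores Allow lines, and skips the longest-match precedence rule.

```python
import re

# Hypothetical rules for illustration: block a folder, one page, and all PDF/DOC files.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts.html
Disallow: /*.pdf$
Disallow: /*.doc$
"""


def pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def disallow_rules(robots_txt: str) -> list:
    """Collect Disallow patterns from the 'User-agent: *' group only (simplified)."""
    rules, in_star_group = [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if ":" not in line:
            continue
        field, value = (s.strip() for s in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            in_star_group = value == "*"
        elif field == "disallow" and in_star_group and value:
            # An empty Disallow value means "allow everything", so it adds no rule.
            rules.append(pattern_to_regex(value))
    return rules


def is_blocked(path: str, rules: list) -> bool:
    # re.match anchors at the start of the path, so every rule is a prefix
    # match unless the pattern ended with '$'.
    return any(rule.match(path) for rule in rules)


rules = disallow_rules(ROBOTS_TXT)
for path in ("/private/notes/a.html", "/files/report.pdf", "/index.html"):
    print(path, "blocked" if is_blocked(path, rules) else "allowed")
```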

    If you want to be uber stealthy, you can also add a restriction for the Wayback Machine, so that even the Internet Archive's bot does not store your content.
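
    A restriction aimed at one bot is written as its own User-agent group. The sketch below assumes the archive crawler identifies itself as ia_archiver, the token commonly used for Internet Archive exclusions; check the Internet Archive's current guidance before relying on it. The domain is a placeholder, and Python's standard urllib.robotparser is used only to show how a compliant parser reads the groups.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the (assumed) ia_archiver agent from the whole
# site, while leaving every other bot unrestricted.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("ia_archiver", "Googlebot"):
    # A named group wins over the '*' group for the agent it matches.
    print(agent, parser.can_fetch(agent, "http://www.example.com/index.html"))
```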

    https://web.archive.org/web/*/http://www.gotfusion.com
    Last edited by gotFusion; 05-21-2016 at 12:57 PM.
    NetObjects Fusion Cloud Linux enabled Web Hosting, support + training starts at $14.95
    NetObjects Fusion web Hosting and support + ASP + PHP + ColdFusion + MySQL + MS SQL
    FREE NetObjects Fusion Support & training comes with all web hosting accounts
    NetObjects Fusion Web Hosting: http://www.gotHosting.biz

  2. #12
    Banned
    Join Date: Jul 2013
    Posts: 24

    Quote, originally posted by gotFusion:
    You should have a single robots.txt file in your domain root, and a robots meta tag in the document head of each page you don't want indexed.

    They are not the same thing so use them both.

    Look at your stats: if there is no robots.txt, you should see 404 entries from the bots that request it when they visit the domain. All well-behaved bots read robots.txt before starting a crawl, because it tells them what they may and may not crawl.
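
    The quoted advice separates two checks: robots.txt decides whether a bot may fetch a URL at all, and the robots meta tag in a fetched page decides whether that page may be indexed. The sketch below models both under stated assumptions: the function and class names are hypothetical, no network access is done, and a 404 for robots.txt is represented as None and treated as "no restrictions", which matches RFC 9309's handling of an unavailable file and what Python's RobotFileParser.read() does for most 4xx responses.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from any <meta name="robots" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # HTMLParser already lowercases tag and attribute names
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def may_crawl(robots_txt: Optional[str], url: str, agent: str = "*") -> bool:
    """robots_txt is None when the server answered 404 for /robots.txt."""
    if robots_txt is None:
        return True  # no robots.txt: nothing is off limits
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def may_index(html: str) -> bool:
    """False if the page's robots meta tag says noindex (or 'none')."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return not ({"noindex", "none"} & finder.directives)
```

    A page is only indexed when both checks pass: the bot must be allowed to fetch it, and the fetched HTML must not carry noindex. This is also why the two mechanisms are not interchangeable.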

    I have made a robots.txt file from the example in the gotFusion tutorial, and I have a couple of questions.

    (1) If I disallow a folder, are all subfolders and pages inside that folder disallowed as well?

    (2) I found a robots.txt file in my public-html directory, placed there by NOF by default, that mentions "sitemap.xml". I assume this means bots can crawl everything on the site. Should I replace it with my new robots.txt file, which includes "DISALLOW: /sitemap.xml"? And should I leave the existing sitemap.xml file in place?
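
    Both questions come down to how a parser reads the file. Under the standard, a Disallow value is matched as a path prefix, so a rule for a folder also covers everything beneath it, and a Sitemap line is a separate directive that can sit in the same file as the Disallow rules. The sketch below shows this with Python's urllib.robotparser (site_maps() needs Python 3.8 or later); the /members/ folder and the example.com domain are placeholders, not taken from the thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one disallowed folder plus a Sitemap line of the
# kind a site-building tool might generate.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/

Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

base = "http://www.example.com"
for path in ("/members/", "/members/2016/minutes.html", "/membership.html", "/index.html"):
    # Prefix match: anything starting with /members/ is blocked, but
    # /membership.html is not, because the trailing slash is part of the rule.
    print(path, parser.can_fetch("*", base + path))

# The Sitemap directive is stored separately from the allow/disallow rules.
print(parser.site_maps())
```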
