<!DOCTYPE html>
<html>
<head>
<link rel="icon" type="image/png" href="https://owlman.neocities.org/favicon.ico" />
<title>robots.txt</title>
</head>
<body bgcolor="#CCCCCC" text="#000000">
<!-- -->
<!-- Because you are reading this, it can mean only one thing: you -->
<!-- are looking at our page source. Well, we hope you like the -->
<!-- look of it! -->
<!-- -->
<!-- The Penny's Pages Wiki was made by users of the Neocities -->
<!-- webhost, for your enjoyment and our pain. We hope you are -->
<!-- enjoying our articles. -->
<!-- -->
<!-- Penny's Pages is composed of original material and may be used -->
<!-- as long as you follow the terms of CC BY-NC-SA 3.0. -->
<!-- -->
<!-- Our URL: https://thewikion.neocities.org/ -->
<!-- -->
<!-- Enjoy the rest of your night, young Internet search astronaut! -->
<!-- -->
<TABLE WIDTH=750><TD VALIGN=TOP>
<h1>robots.txt</h1>
<p>
Simply put, the robots exclusion standard (also called the robots exclusion protocol or robots.txt protocol) is an easy way of telling Web crawlers and other Web robots which parts of a Web site they may and may not visit.
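<p>
To make this concrete, here is a minimal sketch in Python of how a well-behaved robot might read a robots.txt file, using the standard <tt>urllib.robotparser</tt> module. The rules, the robot names <tt>NosyBot</tt> and <tt>FriendlyBot</tt>, the address example.neocities.org and the helper <tt>allowed()</tt> are all made up for illustration.
<pre>
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a robot named "NosyBot" is asked to stay out of
# the whole site, and every other robot ("*") is asked to skip /private/.
ROBOTS_TXT = """\
User-agent: NosyBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: may user_agent fetch url under these rules?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    site = "https://example.neocities.org"  # made-up site address
    print(allowed(ROBOTS_TXT, "FriendlyBot", site + "/index.html"))          # True
    print(allowed(ROBOTS_TXT, "FriendlyBot", site + "/private/diary.html"))  # False
    print(allowed(ROBOTS_TXT, "NosyBot", site + "/index.html"))              # False
</pre>
<p>
A <tt>User-agent</tt> line says which robots the rules after it are meant for (<tt>*</tt> means any robot), and each <tt>Disallow</tt> line gives the start of a path those robots are asked to stay out of.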
<p>
To tell robots which parts of your site they may access, put a plain text (.txt) file named robots.txt in the main (top-level) directory of your Web site, e.g. <tt><a href="https://owlman.neocities.org/robots.txt">https://owlman.neocities.org/robots.txt</a></tt>. Robots only look for the file there, not in subdirectories. Keep in mind that the file is a request, not a lock: well-behaved robots follow it, but some robots, especially malicious (or bad) ones, simply ignore it.
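<p>
Because the file lives in the main directory, a robot that wants to visit any page can work out where to find robots.txt from that page's address alone. A small Python sketch of that step, using the standard <tt>urllib.parse.urljoin</tt> function (the helper name <tt>robots_txt_url()</tt> and the page path are hypothetical):
<pre>
from urllib.parse import urljoin


def robots_txt_url(page_url: str) -> str:
    """Hypothetical helper: where a robot looks for robots.txt for page_url.

    Joining with the absolute path "/robots.txt" keeps the scheme and host
    of page_url but replaces its whole path (and drops any query string).
    """
    return urljoin(page_url, "/robots.txt")


if __name__ == "__main__":
    # Prints https://owlman.neocities.org/robots.txt
    print(robots_txt_url("https://owlman.neocities.org/some/deep/page.html"))
</pre>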
<p>
If there is no robots.txt file, Web robots assume that they may visit all parts of your site.
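<p>
In other words, a robot can treat a missing robots.txt exactly like an empty one. A minimal Python sketch of that idea, again using <tt>urllib.robotparser</tt> (the helper <tt>rules_from()</tt>, the robot name <tt>AnyBot</tt> and the page address are hypothetical):
<pre>
from typing import Optional
from urllib.robotparser import RobotFileParser


def rules_from(robots_txt: Optional[str]) -> RobotFileParser:
    """Hypothetical helper: build a parser, using None for "no robots.txt".

    A missing file is treated like an empty one, so no rules apply.
    """
    parser = RobotFileParser()
    parser.parse((robots_txt or "").splitlines())
    return parser


if __name__ == "__main__":
    no_file = rules_from(None)
    # Prints True: with no rules at all, every page is allowed.
    print(no_file.can_fetch("AnyBot", "https://example.neocities.org/private/diary.html"))
</pre>
<p>
In CPython's implementation, <tt>can_fetch()</tt> answers False until <tt>parse()</tt> or <tt>read()</tt> has been called, which is why the sketch calls <tt>parse()</tt> even when there is nothing to parse.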
<p>
An example of a good robot (and a good boy).
<p>
<pre> \ oo
\____|\mm
//_//\ \_\
/K-9/ \/_/
/___/_____\
-----------</pre>
<p>
<h1><a href="#Outside links">Outside links</a><a name="Outside links"></a></h1>
<p>
Here are some useful links on robots.txt that may help you.
<p>
<a href="https://en.wikipedia.org/wiki/Robots_exclusion_standard">English Wikipedia article on robots.txt</a>
<p>
<a href="https://simple.wikipedia.org/wiki/Robots_exclusion_standard">Simple English Wikipedia article on robots.txt</a>
<p>
<a href="http://www.robotstxt.org/">The Web Robots Pages</a>
</TD></TABLE>
</body>
</html>