  1. <?xml version="1.0" encoding="utf-8"?>
  2. <!--
  3. h t t :: / / t /
  4. h t t :: // // t //
  5. h ttttt ttttt ppppp sssss // // y y sssss ttttt //
  6. hhhh t t p p s // // y y s t //
  7. h hh t t ppppp sssss // // yyyyy sssss t //
  8. h h t t p s :: / / y .. s t .. /
  9. h h t t p sssss :: / / yyyyy .. sssss t .. /
  10. <https://y.st./>
  11. Copyright © 2016 Alex Yst <mailto:copyright@y.st>
  12. This program is free software: you can redistribute it and/or modify
  13. it under the terms of the GNU General Public License as published by
  14. the Free Software Foundation, either version 3 of the License, or
  15. (at your option) any later version.
  16. This program is distributed in the hope that it will be useful,
  17. but WITHOUT ANY WARRANTY; without even the implied warranty of
  18. MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  19. GNU General Public License for more details.
  20. You should have received a copy of the GNU General Public License
  21. along with this program. If not, see <https://www.gnu.org./licenses/>.
  22. -->
  23. <!DOCTYPE html>
  24. <html xmlns="http://www.w3.org/1999/xhtml">
  25. <head>
  26. <base href="https://y.st./en/weblog/2016/01-January/15.xhtml" />
  27. <title>More specific spider output &lt;https://y.st./en/weblog/2016/01-January/15.xhtml&gt;</title>
  28. <link rel="icon" type="image/png" href="/link/CC_BY-SA_4.0/y.st./icon.png" />
  29. <link rel="stylesheet" type="text/css" href="/link/basic.css" />
  30. <link rel="stylesheet" type="text/css" href="/link/site-specific.css" />
  31. <script type="text/javascript" src="/script/javascript.js" />
  32. <meta name="viewport" content="width=device-width" />
  33. </head>
  34. <body>
  35. <nav>
  36. <p>
  37. <a href="/en/">Home</a> |
  38. <a href="/en/a/about.xhtml">About</a> |
  39. <a href="/en/a/contact.xhtml">Contact</a> |
  40. <a href="/a/canary.txt">Canary</a> |
  41. <a href="/en/URI_research/"><abbr title="Uniform Resource Identifier">URI</abbr> research</a> |
  42. <a href="/en/opinion/">Opinions</a> |
  43. <a href="/en/coursework/">Coursework</a> |
  44. <a href="/en/law/">Law</a> |
  45. <a href="/en/a/links.xhtml">Links</a> |
  46. <a href="/en/weblog/2016/01-January/15.xhtml.asc">{this page}.asc</a>
  47. </p>
  48. <hr/>
  49. <p>
  50. Weblog index:
  51. <a href="/en/weblog/"><abbr title="American Standard Code for Information Interchange">ASCII</abbr> calendars</a> |
  52. <a href="/en/weblog/index_ol_ascending.xhtml">Ascending list</a> |
  53. <a href="/en/weblog/index_ol_descending.xhtml">Descending list</a>
  54. </p>
  55. <hr/>
  56. <p>
  57. Jump to entry:
  58. <a href="/en/weblog/2015/03-March/07.xhtml">&lt;&lt;First</a>
  59. <a rel="prev" href="/en/weblog/2016/01-January/14.xhtml">&lt;Previous</a>
  60. <a rel="next" href="/en/weblog/2016/01-January/16.xhtml">Next&gt;</a>
  61. <a href="/en/weblog/latest.xhtml">Latest&gt;&gt;</a>
  62. </p>
  63. <hr/>
  64. </nav>
  65. <header>
  66. <h1>More specific spider output</h1>
  67. <p>Day 00314: Friday, 2016 January 15</p>
  68. </header>
  69. <p>
  70. I awoke this morning to find that <a href="/en/domains/newdawn.local.xhtml">newdawn</a> had frozen on me.
  71. With the spider rewritten so that interruptions do no lasting harm to the database, I have not bothered to run it in a way that would let it keep going when newdawn goes down, even though it runs from <a href="/en/domains/cepo.local.xhtml">cepo</a>.
  72. It has been crawling a particularly large website over the past couple of days though, so now it will have to start crawling that site from the beginning.
  73. I hate not being able to see how close the spider is to reaching a savable state.
  74. I previously had to remove the progress report feature as it was not compatible with the new MySQL database, but now, the spider is back to working partially from <abbr title="random-access memory">RAM</abbr>.
  75. I have added back the progress report feature, though it only reports the progress in relation to the current site, not the whole crawl.
  76. </p>
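<p>
A rough sketch of a per-site progress report of this kind, written in Python rather than the spider's <abbr title="PHP: Hypertext Preprocessor">PHP</abbr>, might look like the following.
The class name, its methods, and the report format are all hypothetical; the assumption is simply that the pages known and crawled for the current site are tracked in memory.
</p>

```python
class SiteProgress:
    """Tracks crawl progress for one site only, held entirely in memory.

    Hypothetical illustration; not the spider's actual data structure.
    """

    def __init__(self, site):
        self.site = site
        self.known = set()    # URIs discovered on this site so far
        self.crawled = set()  # URIs on this site that have been fetched

    def discover(self, uri):
        """Record a URI found on the current site."""
        self.known.add(uri)

    def mark_crawled(self, uri):
        """Record that a URI has been fetched (it is also known by then)."""
        self.known.add(uri)
        self.crawled.add(uri)

    def report(self):
        """Return a one-line summary relative to the current site only."""
        total = len(self.known)
        done = len(self.crawled)
        percent = 100 * done // total if total else 0
        return f"{self.site}: {done}/{total} known pages crawled ({percent}%)"


progress = SiteProgress("example.org")
progress.discover("/a")
progress.discover("/b")
progress.mark_crawled("/a")
print(progress.report())
```

<p>
Because new pages keep being discovered while a site is crawled, a percentage like this can go down as well as up; it only says how much of the currently known site is done.
</p>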
  77. <p>
  78. After that, I decided to add a new feature to the <abbr title="Client for URLs/Client URL Request Library/Curl URL Request Library">cURL</abbr> download-limiting class, so once more, I got to work on building another wrapper class.
  79. I went through the <a href="https://secure.php.net/manual/en/resource.php">list of resource types</a> and found eighteen sections of the manual that mention functions in need of wrapping.
  80. Fourteen of these sections document standard <abbr title="PHP: Hypertext Preprocessor">PHP</abbr> extensions while four document <abbr title="PHP Extension Community Library">PECL</abbr> extensions.
  81. I will start with the standard extensions before working on the <abbr title="PHP Extension Community Library">PECL</abbr> stuff.
  82. I wanted to work on the <a href="https://secure.php.net/manual/en/book.sem.php">semaphore, shared memory and <abbr title="inter-process communication">IPC</abbr></a> functions today, as I wanted to see why the functions in this section of the manual have three different prefixes.
  83. However, <a href="https://secure.php.net/manual/en/function.ftok.php"><code>\ftok()</code></a>, the one function in this section of the manual that has no prefix, depends on resources from the <a href="https://secure.php.net/manual/en/ref.shmop.php">shared memory</a> extension, so I worked on the wrapper class for those functions instead.
  84. After actually completing this tiny class, I realized that <code>\ftok()</code> did not really need to be implemented in any class, let alone as a method that required another class.
  85. </p>
  86. <p>
  87. With that out of the way, I added a new feature to my curl_limit class to output information to the command line about the current progress of a download.
  88. The problem with this, though, is that it would output information whether or not the calling script wants it, so I called it a debugging feature, made it optional, and turned it off by default.
  89. With it already being optional, I decided to make it optional in the spider too.
  90. I added a new configuration option, turned it on in the example configuration, and modified the existing output lines in the spider to respect the debug output setting.
  91. Finally, I moved the code that makes the spider work over <abbr title="The Onion Router">Tor</abbr> into its own constant so that it would be reusable.
  92. I made a few other minor adjustments to the spider as well, but nothing noteworthy.
  93. I have a few other somewhat important features that I want to implement in it, but I think that I will mostly put it aside for now and only fix any bugs I find in it.
  94. I want to get to work building some forum software; I have learned almost everything that the spider project had to teach me, though admittedly there is still a little left to learn from it.
  95. When the spider finishes its long crawl, perhaps that is when I will pick it back up again.
  96. </p>
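<p>
The opt-in debug output described above can be sketched as follows.
This is Python rather than <abbr title="PHP: Hypertext Preprocessor">PHP</abbr>, it does not use <abbr title="Client for URLs/Client URL Request Library/Curl URL Request Library">cURL</abbr> at all, and every name in it is hypothetical; it also assumes that the limit is a maximum number of bytes, which may not match the real curl_limit class.
</p>

```python
import io
import sys


class LimitedDownload:
    """Reads a stream in chunks and refuses to exceed a byte limit.

    Hypothetical stand-in for a download-limiting class. Progress lines
    are printed only when debug is True, and debug is off by default.
    """

    def __init__(self, max_bytes, debug=False, log=sys.stderr):
        self.max_bytes = max_bytes
        self.debug = debug
        self.log = log

    def read(self, stream, chunk_size=4096):
        received = bytearray()
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            received.extend(chunk)
            if self.debug:
                # Only callers that asked for debug output see this.
                print(f"received {len(received)} bytes", file=self.log)
            if len(received) > self.max_bytes:
                raise ValueError("download exceeded the configured limit")
        return bytes(received)


# Quiet by default; a caller such as a spider could pass its own
# configuration value through as the debug argument.
quiet = LimitedDownload(max_bytes=10)
print(quiet.read(io.BytesIO(b"12345")))
```

<p>
Keeping the flag on the class and merely passing a configuration value through from the caller means the same setting can silence both the class and the caller's own output lines.
</p>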
  97. <p>
  98. We made plans to head to Eugene on Sunday or Monday, in order to work on moving what junk we still have in Springfield into a storage unit so we can work on getting that house on the market.
  99. The plan was tentative, as if there was anything that we could do to help with Cyrus&apos; Boy Scout project in that time, we would do that instead.
  100. However, tonight, Cyrus finally got his Boy Scout project approved! We are headed to the library, where he will be leading some sort of organization effort.
  101. </p>
  102. <p>
  103. Normally, my mother does not want to hear anything about what I am up to.
  104. I have learned to be pretty quiet and not bring things up if I can help it.
  105. However, she actually asked me today, so I took a risk, and told her about the spider.
  106. To avoid having her check back on its progress and see something disappointing, I explained that given my lack of resources (my lack of disk space), I could not support a powerful spider or build a working search engine around it.
  107. She suggested that we pick up a new hard drive at the second-hand computer store when we are in Eugene! I do not think that she understood just how much hard drive space I think I need.
  108. I figure that I need at least a terabyte or two.
  109. Getting a hard drive that large will not be cheap, even second hand.
  110. Now that I think on it, I would probably also need a lot more <abbr title="random-access memory">RAM</abbr> and a better processor, too.
  111. A simple upgrade will not work.
  112. I would need an entire replacement machine, and it would need to be a powerful one.
  113. </p>
  114. <p>
  115. I wrote back to my old school asking again how I can get my password reset.
  116. Hopefully they will actually respond this time.
  117. </p>
  118. <p>
  119. My <a href="/a/canary.txt">canary</a> still sings the tune of freedom and transparency.
  120. </p>
  121. <hr/>
  122. <p>
  123. Copyright © 2016 Alex Yst;
  124. You may modify and/or redistribute this document under the terms of the <a rel="license" href="/license/gpl-3.0-standalone.xhtml"><abbr title="GNU&apos;s Not Unix">GNU</abbr> <abbr title="General Public License version Three or later">GPLv3+</abbr></a>.
  125. If for some reason you would prefer to modify and/or distribute this document under other free copyleft terms, please ask me via email.
  126. My address is in the source comments near the top of this document.
  127. This license also applies to embedded content such as images.
  128. For more information on that, see <a href="/en/a/licensing.xhtml">licensing</a>.
  129. </p>
  130. <p>
  131. <abbr title="World Wide Web Consortium">W3C</abbr> standards are important.
  132. This document conforms to the <a href="https://validator.w3.org./nu/?doc=https%3A%2F%2Fy.st.%2Fen%2Fweblog%2F2016%2F01-January%2F15.xhtml"><abbr title="Extensible Hypertext Markup Language">XHTML</abbr> 5.1</a> specification and uses style sheets that conform to the <a href="http://jigsaw.w3.org./css-validator/validator?uri=https%3A%2F%2Fy.st.%2Fen%2Fweblog%2F2016%2F01-January%2F15.xhtml"><abbr title="Cascading Style Sheets">CSS</abbr>3</a> specification.
  133. </p>
  134. </body>
  135. </html>