05.xhtml 9.0 KB

  1. <?xml version="1.0" encoding="utf-8"?>
  2. <!--
  3. h t t :: / / t /
  4. h t t :: // // t //
  5. h ttttt ttttt ppppp sssss // // y y sssss ttttt //
  6. hhhh t t p p s // // y y s t //
  7. h hh t t ppppp sssss // // yyyyy sssss t //
  8. h h t t p s :: / / y .. s t .. /
  9. h h t t p sssss :: / / yyyyy .. sssss t .. /
  10. <https://y.st./>
  11. Copyright © 2016 Alex Yst <mailto:copyright@y.st>
  12. This program is free software: you can redistribute it and/or modify
  13. it under the terms of the GNU General Public License as published by
  14. the Free Software Foundation, either version 3 of the License, or
  15. (at your option) any later version.
  16. This program is distributed in the hope that it will be useful,
  17. but WITHOUT ANY WARRANTY; without even the implied warranty of
  18. MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  19. GNU General Public License for more details.
  20. You should have received a copy of the GNU General Public License
  21. along with this program. If not, see <https://www.gnu.org./licenses/>.
  22. -->
  23. <!DOCTYPE html>
  24. <html xmlns="http://www.w3.org/1999/xhtml">
  25. <head>
  26. <base href="https://y.st./en/weblog/2016/02-February/05.xhtml" />
  27. <title>The spider seems to be running smoothly now &lt;https://y.st./en/weblog/2016/02-February/05.xhtml&gt;</title>
  28. <link rel="icon" type="image/png" href="/link/CC_BY-SA_4.0/y.st./icon.png" />
  29. <link rel="stylesheet" type="text/css" href="/link/basic.css" />
  30. <link rel="stylesheet" type="text/css" href="/link/site-specific.css" />
  31. <script type="text/javascript" src="/script/javascript.js" />
  32. <meta name="viewport" content="width=device-width" />
  33. </head>
  34. <body>
  35. <nav>
  36. <p>
  37. <a href="/en/">Home</a> |
  38. <a href="/en/a/about.xhtml">About</a> |
  39. <a href="/en/a/contact.xhtml">Contact</a> |
  40. <a href="/a/canary.txt">Canary</a> |
  41. <a href="/en/URI_research/"><abbr title="Uniform Resource Identifier">URI</abbr> research</a> |
  42. <a href="/en/opinion/">Opinions</a> |
  43. <a href="/en/coursework/">Coursework</a> |
  44. <a href="/en/law/">Law</a> |
  45. <a href="/en/a/links.xhtml">Links</a> |
  46. <a href="/en/weblog/2016/02-February/05.xhtml.asc">{this page}.asc</a>
  47. </p>
  48. <hr/>
  49. <p>
  50. Weblog index:
  51. <a href="/en/weblog/"><abbr title="American Standard Code for Information Interchange">ASCII</abbr> calendars</a> |
  52. <a href="/en/weblog/index_ol_ascending.xhtml">Ascending list</a> |
  53. <a href="/en/weblog/index_ol_descending.xhtml">Descending list</a>
  54. </p>
  55. <hr/>
  56. <p>
  57. Jump to entry:
  58. <a href="/en/weblog/2015/03-March/07.xhtml">&lt;&lt;First</a>
  59. <a rel="prev" href="/en/weblog/2016/02-February/04.xhtml">&lt;Previous</a>
  60. <a rel="next" href="/en/weblog/2016/02-February/06.xhtml">Next&gt;</a>
  61. <a href="/en/weblog/latest.xhtml">Latest&gt;&gt;</a>
  62. </p>
  63. <hr/>
  64. </nav>
  65. <header>
  66. <h1>The spider seems to be running smoothly now</h1>
  67. <p>Day 00335: Friday, 2016 February 05</p>
  68. </header>
  69. <p>
  70. My task for today was to complete my <abbr title="Free Application for Federal Student Aid">FAFSA</abbr>.
  71. It went very smoothly.
  72. Of note, it was possible to leave the telephone number blank, despite the message that I received claiming that government forms usually require a telephone number, even from people who do not have one.
  73. </p>
  74. <p>
  75. Before I went to bed last night, I ran into a bug: the spider assumed that every normalized <abbr title="Uniform Resource Identifier">URI</abbr> has a host component, and a <abbr title="Uniform Resource Identifier">URI</abbr> using the <code>javascript:</code> scheme, which has no host, triggered it.
  76. I did not remember seeing that one on the <a href="https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml">official scheme list</a>, so I took another look.
  77. It is not listed! There are a bunch of schemes listed there that almost no one uses, yet the fairly-well-known <code>javascript:</code> scheme did not make the cut.
  78. In any case, I have repaired that bug in the spider.
  79. When I woke up, I found that a bug that I had intentionally coded into the spider had been set off as well.
  80. Namely, I programmed it to assume that all <abbr title="Uniform Resource Identifier">URI</abbr>s that it finds are valid.
  81. My functions are designed to throw exceptions in case of malformed <abbr title="Uniform Resource Identifier">URI</abbr>s, but the spider was not designed to catch them.
  82. I knew that this bug would cause an issue eventually; I just did not realize that it would be so soon.
  83. Some people just do not know how to form a valid <abbr title="Uniform Resource Identifier">URI</abbr>, which I think is the case here, and in any case you should never trust user input.
  84. I hate to blindly ignore these bad <abbr title="Uniform Resource Identifier">URI</abbr>s, but I do not know what else to do.
  85. For now, the spider tries to make sense of each <abbr title="Uniform Resource Identifier">URI</abbr>, but if the syntax is invalid, it catches the exception, outputs an error message that will go unseen because it is buried in all the other output, then moves on to the next <abbr title="Uniform Resource Identifier">URI</abbr>.
  86. I suppose that I will need to start grepping the log from now on.
  87. </p>
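<p>
The catch-and-continue behavior described above might look roughly like the following minimal sketch.
It is written in Python rather than the spider's <abbr title="PHP: Hypertext Preprocessor">PHP</abbr>, and the names <code>host_of</code>, <code>crawlable_hosts</code>, <code>MalformedURI</code>, and the <code>BAD-URI</code> log prefix are all hypothetical; none of them come from the actual spider.
</p>
<pre><code>
```python
import logging
from urllib.parse import urlsplit

logger = logging.getLogger("spider")


class MalformedURI(ValueError):
    """Hypothetical exception for a URI that cannot be parsed."""


def host_of(uri):
    """Return the lowercased host, or None when the URI has no host."""
    try:
        parts = urlsplit(uri)
        _ = parts.port  # reading .port validates it and may raise ValueError
    except ValueError as error:
        raise MalformedURI(f"{uri!r}: {error}") from error
    # hostname is None for host-less URIs such as javascript:void(0)
    return parts.hostname


def crawlable_hosts(uris):
    """Collect hosts from found URIs, logging and skipping malformed ones."""
    hosts = []
    for uri in uris:
        try:
            host = host_of(uri)
        except MalformedURI as error:
            # A fixed prefix (an assumption) makes these lines easy to grep for.
            logger.warning("BAD-URI %s", error)
            continue
        if host is None:
            # Do not assume that every URI has a host component.
            continue
        hosts.append(host)
    return hosts
```
</code></pre>
<p>
With a fixed prefix on every such message, something like <code>grep BAD-URI</code> over the log would pull the rejected <abbr title="Uniform Resource Identifier">URI</abbr>s out of the surrounding output.
</p>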
  88. <p>
  89. I wrote a couple of the new scheme-specific normalization functions, but then I decided to delete them and remove the callback hook from the main normalization function.
  90. This is starting to lead off on an unnecessary tangent.
  91. If someone wants to compare two <abbr title="Uniform Resource Identifier">URI</abbr>s for equivalence, they should probably use both the generic <abbr title="Uniform Resource Identifier">URI</abbr> normalization function and, if necessary, a scheme-specific <abbr title="Uniform Resource Identifier">URI</abbr> normalization function.
  92. I also removed the parts of the function that normalize the port component based on the default port of a given scheme and that check for the presence of <abbr title="Uniform Resource Identifier">URI</abbr> components that are required or forbidden by a given scheme.
  93. I considered leaving them, as at least the port normalization is useful to me, but they seem like they are outside the scope of the function.
  94. I need a function that can take the place of <abbr title="PHP: Hypertext Preprocessor">PHP</abbr>&apos;s broken <code>\parse_url()</code> function.
  95. The RFC 3986-specified normalization goes a bit beyond that, though it is quite useful when trying to keep a list of pages and sites that the spider has already visited and is generic enough to be useful in many cases.
  96. Anything scheme-specific is outside the scope of that function though and should be delegated to another function if that functionality is required.
  97. </p>
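<p>
As an illustration of what purely generic, scheme-independent normalization can cover, here is a minimal sketch in Python rather than <abbr title="PHP: Hypertext Preprocessor">PHP</abbr>.
It lowercases the scheme and host, uppercases the hex digits of percent-encoded octets, decodes percent-encoded unreserved characters, and removes dot segments, while deliberately leaving the port alone.
The function names are hypothetical and this is not the function described above.
</p>
<pre><code>
```python
import re
from urllib.parse import urlsplit, urlunsplit

UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
PERCENT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def _fix_escape(match):
    """Decode an unreserved octet; otherwise uppercase the hex digits."""
    char = chr(int(match.group(1), 16))
    if char in UNRESERVED:
        return char
    return "%" + match.group(1).upper()


def remove_dot_segments(path):
    """Remove "." and ".." segments, following RFC 3986 section 5.2.4."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./") or path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment, with its leading slash, to the output.
            cut = path.find("/", 1)
            if cut == -1:
                cut = len(path)
            output.append(path[:cut])
            path = path[cut:]
    return "".join(output)


def normalize_generic(uri):
    """Apply only scheme-independent normalization; the port is untouched."""
    parts = urlsplit(uri)  # the scheme comes back lowercased
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()  # userinfo is case-sensitive
    path = remove_dot_segments(PERCENT_TRIPLET.sub(_fix_escape, parts.path))
    query = PERCENT_TRIPLET.sub(_fix_escape, parts.query)
    fragment = PERCENT_TRIPLET.sub(_fix_escape, parts.fragment)
    return urlunsplit((parts.scheme, netloc, path, query, fragment))
```
</code></pre>
<p>
Anything that depends on the scheme, such as dropping an explicit <code>:80</code> from an <code>http:</code> <abbr title="Uniform Resource Identifier">URI</abbr>, would live in a separate function applied afterward.
</p>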
  98. <p>
  99. I received an email from eSmart reminding me to finish filing my taxes through them, even though none of their supervisors has gotten back to me.
  100. Because of their email, I pestered them on Twitter a bit, as they had included a link to their Twitter account in it.
  101. On Twitter, they gave me the support email address! I doubt that I will pursue this, but if I want to, I now have another way to reach them.
  102. </p>
  103. <p>
  104. It seems that my mother is going to Portland for some training and she will be dropping me off in Springfield along the way.
  105. There, I will work on cleaning up our former residence on Thursday, Friday, and Saturday.
  106. I may not have Internet access during that time though, so I may not update this weblog until I get back.
  107. </p>
  108. <hr/>
  109. <p>
  110. Copyright © 2016 Alex Yst;
  111. You may modify and/or redistribute this document under the terms of the <a rel="license" href="/license/gpl-3.0-standalone.xhtml"><abbr title="GNU&apos;s Not Unix">GNU</abbr> <abbr title="General Public License version Three or later">GPLv3+</abbr></a>.
  112. If for some reason you would prefer to modify and/or distribute this document under other free copyleft terms, please ask me via email.
  113. My address is in the source comments near the top of this document.
  114. This license also applies to embedded content such as images.
  115. For more information on that, see <a href="/en/a/licensing.xhtml">licensing</a>.
  116. </p>
  117. <p>
  118. <abbr title="World Wide Web Consortium">W3C</abbr> standards are important.
  119. This document conforms to the <a href="https://validator.w3.org./nu/?doc=https%3A%2F%2Fy.st.%2Fen%2Fweblog%2F2016%2F02-February%2F05.xhtml"><abbr title="Extensible Hypertext Markup Language">XHTML</abbr> 5.1</a> specification and uses style sheets that conform to the <a href="http://jigsaw.w3.org./css-validator/validator?uri=https%3A%2F%2Fy.st.%2Fen%2Fweblog%2F2016%2F02-February%2F05.xhtml"><abbr title="Cascading Style Sheets">CSS</abbr>3</a> specification.
  120. </p>
  121. </body>
  122. </html>