Gajja tutorial
##############

:Author: Ben Finney <ben+python@benfinney.id.au>
:Updated: 2016-02-14

Faking a file's size
====================

Fake content with `io.BytesIO`
------------------------------

Sometimes you will need to test a program which responds differently
depending on some file's content::

    >>> class BatchingReader:
    ...     """ A reader which batches its input. """
    ...     batch_size = 200
    ...     def __init__(self, infile):
    ...         self.infile = infile
    ...     def read(self):
    ...         data = self.infile.read(self.batch_size)
    ...         return data

You start by writing a test case that will assert that the `read`
method on a large input file returns no more than `batch_size` bytes::

    >>> import unittest
    >>> class CachingReader_TestCase(unittest.TestCase):
    ...     """ Test cases for `BatchingReader` class. """
    ...     def setUp(self):
    ...         """ Set up test fixtures. """
    ...         self.test_infile = make_fake_file("Lorem ipsum", 100000)
    ...     def test_single_read_returns_batch_size_bytes(self):
    ...         """ A single read should return at most `batch_size` bytes. """
    ...         reader = BatchingReader(self.test_infile)
    ...         read_bytes = reader.read()
    ...         self.assertLessEqual(len(read_bytes), reader.batch_size)

How to make that fake input file? To fake the file content you can use
an `io.BytesIO`::

    >>> import io
    >>> def make_fake_file(line_text, num_lines, encoding="utf-8"):
    ...     """ Make a fake file of `num_lines` lines, each `line_text`. """
    ...     test_content = "".join([line_text + "\n"] * num_lines)
    ...     fake_file = io.BytesIO(test_content.encode(encoding))
    ...     return fake_file

    >>> test_case = CachingReader_TestCase(
    ...         'test_single_read_returns_batch_size_bytes')
    >>> test_case.run()
    <unittest.result.TestResult run=1 errors=0 failures=0>

For programs which accept any file-like object, this is often enough.

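As a rough illustration of why an in-memory buffer is enough for this
kind of program, the sketch below drains a fake file in batches and
checks that nothing is lost. The helper `read_all_batches` is
hypothetical, not part of the tutorial's `BatchingReader`; it relies
only on the standard behaviour of `io.BytesIO.read(size)`.

```python
import io


def read_all_batches(infile, batch_size=200):
    """ Read `infile` to the end, returning a list of batches.

    Hypothetical helper for illustration only. Each batch is at most
    `batch_size` bytes; reading stops at the first empty read.
    """
    batches = []
    while True:
        data = infile.read(batch_size)
        if not data:
            break
        batches.append(data)
    return batches


# 100 lines of 12 bytes each ("Lorem ipsum" plus a newline): 1200 bytes.
content = ("Lorem ipsum\n" * 100).encode("utf-8")
batches = read_all_batches(io.BytesIO(content))

# Every batch respects the limit, and the content survives intact.
assert all(len(batch) <= 200 for batch in batches)
assert b"".join(batches) == content
```

With 1200 bytes of content and a batch size of 200, the loop performs
six non-empty reads before the buffer reports end of file.
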
Fake buffers don't have a corresponding filesystem entry
--------------------------------------------------------

Many programs, though, will not just read the input file, but also
interrogate the corresponding filesystem entry. Suppose our program
uses `os.stat` to request the file size::

    >>> import os
    >>> class BatchingReader:
    ...     """ A reader which batches its input. """
    ...     batch_size = 200
    ...     def __init__(self, infile):
    ...         self.infile = infile
    ...     def read(self):
    ...         data = self.infile.read(self.batch_size)
    ...         return data
    ...     def estimate_batch_count(self):
    ...         infile_stat = os.stat(self.infile.name)
    ...         infile_size = infile_stat.st_size
    ...         batch_count = infile_size / self.batch_size
    ...         return batch_count

A normal call to `os.stat` with the path of a real file will return a
stat result object. The file size is one of the attributes::

    >>> import sys
    >>> os.path.exists(sys.executable)
    True
    >>> stat_result = os.stat(sys.executable)
    >>> stat_result.st_size > 1000
    True

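To make the meaning of `st_size` concrete, here is a small sketch
which writes a known number of bytes to a real temporary file and
compares the reported size. It touches the real filesystem, so treat
it only as an illustration of `os.stat`, not as a suggested test
technique.

```python
import os
import tempfile

content = b"Lorem ipsum\n" * 10

# Create a real temporary file. `delete=False` lets us stat it by name
# after it is closed, which works the same way on every platform.
with tempfile.NamedTemporaryFile(delete=False) as outfile:
    outfile.write(content)
    temp_path = outfile.name

try:
    stat_result = os.stat(temp_path)
    # `st_size` is the size of the file's content, in bytes.
    reported_size = stat_result.st_size
finally:
    # Clean up the real file, even if the stat call failed.
    os.remove(temp_path)

assert reported_size == len(content)
```
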
For testing the `BatchingReader.estimate_batch_count` method, the
`io.BytesIO` instance won't help. It doesn't have a filesystem entry
name, so interrogating its name will fail::

    >>> test_file = io.BytesIO("Lorem ipsum".encode("utf-8"))
    >>> reader = BatchingReader(test_file)
    >>> reader.estimate_batch_count()
    Traceback (most recent call last):
    ...
    AttributeError: '_io.BytesIO' object has no attribute 'name'

We can give a unique name to our fake file, using `tempfile.mktemp`
because we don't actually want to create the filesystem object. But
then, the lack of a corresponding filesystem entry will make `os.stat`
fail::

    >>> import tempfile
    >>> test_file.name = tempfile.mktemp()
    >>> reader = BatchingReader(test_file)
    >>> reader.estimate_batch_count()
    ...     # doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    FileNotFoundError: [Errno 2] No such file or directory: '...'

Testing with real files is a bad answer
---------------------------------------

We want to keep using `io.BytesIO` to provide the fake file's read and
write behaviour.

An `io.BytesIO` exists only in program memory and never needs to touch
slower storage. Using the real filesystem for temporary test files
will be slower. By using real files in unit tests, we would create a
disincentive to write as many file-related test cases as we need.

Using the real filesystem will be more complex. Real files need to be
properly created, handled, and cleaned up after use.

Using the real filesystem introduces more possibilities for unrelated,
intermittent test failure. If a temporary test file is accessible when
it should not be, or is not accessible when it should be, or has
different properties at some time during a test, or in any other way
behaves differently from what the test author expects, the test
failure will be needlessly difficult to diagnose.

We need to keep using in-memory fake files with constructed content,
*and* be able to construct the filesystem access behaviour of a fake
file.

Solution: `gajja.FileDouble`
----------------------------

The `gajja` library provides the test doubles we need for substituting
behaviour for specific fake files::

    >>> import gajja
    >>> fake_file = make_fake_file("Lorem ipsum", 100000)
    >>> fake_file_path = tempfile.mktemp()
    >>> file_double = gajja.FileDouble(fake_file_path, fake_file)

The `FileDouble` instance maintains the behaviour for a fake
filesystem entry. You can omit the `path` argument; it will default to
``None``. You can omit the `fake_file` argument; it will default to an
empty file-like object::

    >>> file_double = gajja.FileDouble(path=fake_file_path)
    >>> file_double.fake_file.read()
    ''
    >>> file_double.fake_file.name == fake_file_path
    True

    >>> file_double = gajja.FileDouble(fake_file=fake_file)
    >>> file_double.path is None
    True
    >>> file_double.fake_file.name is None
    True

For our example `BatchingReader` test cases, we don't care about the
filesystem path of the double, but we still need to make our own fake
file object to control its contents.

We construct a file double, specify the fake file with its test
content, and arrange for `os.stat` to pay attention to Gajja's special
handling per filesystem path::

    >>> class CachingReader_TestCase(unittest.TestCase):
    ...     """ Test cases for `BatchingReader` class. """
    ...     def setUp(self):
    ...         """ Set up test fixtures. """
    ...         # Patch `os.stat` for this test case.
    ...         gajja.patch_os_stat(self)
    ...         # Determine the properties of the fake file.
    ...         test_infile = make_fake_file("Lorem ipsum", 100000)
    ...         # Make the `FileDouble` instance and register it.
    ...         self.infile_double = gajja.FileDouble(fake_file=test_infile)
    ...         self.infile_double.register_for_testcase(self)
    ...     def test_estimate_batch_count_returns_expected_result(self):
    ...         """ `estimate_batch_count` should return expected count. """
    ...         reader = BatchingReader(self.infile_double.fake_file)
    ...         result = reader.estimate_batch_count()
    ...         fake_file_size = len(self.infile_double.fake_file.getvalue())
    ...         expected_result = fake_file_size / reader.batch_size
    ...         self.assertEqual(result, expected_result)

When the test cases run, and the program calls `os.stat` with the
filesystem path of our file double, the stat result's `st_size` is the
length of the fake file's content. The program will then act on that
fake file size::

    >>> test_case = CachingReader_TestCase(
    ...         'test_estimate_batch_count_returns_expected_result')
    >>> test_case.run()
    <unittest.result.TestResult run=1 errors=0 failures=0>

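The arithmetic behind that test can be worked through by hand. The
sketch below repeats it with plain numbers, using the same line text,
line count, and batch size as the tutorial. The rounding step at the
end is an assumption added for illustration; the tutorial's
`estimate_batch_count` does not round.

```python
import math

line_text = "Lorem ipsum"
num_lines = 100000
batch_size = 200

# Each fake line is the text plus a newline: 12 bytes in UTF-8.
line_size = len((line_text + "\n").encode("utf-8"))
fake_file_size = line_size * num_lines

# True division, as in `estimate_batch_count`, yields a float.
estimate = fake_file_size / batch_size

# Assumption for illustration: a size that is not a multiple of the
# batch size needs one extra, partial read to reach end of file.
uneven_size = fake_file_size + 1
whole_reads = math.ceil(uneven_size / batch_size)
```

Here `fake_file_size` is 1200000 bytes, so the estimate is the float
`6000.0`. One extra byte pushes the number of whole reads to 6001.
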
Other calls to `os.stat` outside our test cases, or with different
file paths, will be handed to the real `os.stat` and behave as
expected::

    >>> os.stat(tempfile.mktemp())
    ...     # doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    FileNotFoundError: [Errno 2] No such file or directory: '...'

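The idea of dispatching on the filesystem path can be sketched with
the standard library alone. This is not Gajja's implementation; it is
a minimal illustration, assuming `unittest.mock.patch.object` and a
hypothetical registry `fake_sizes` keyed by path. Registered paths get
a constructed `os.stat_result`; every other path falls through to the
real `os.stat`.

```python
import os
import unittest.mock

# Hypothetical registry for this sketch: path -> fake size in bytes.
fake_sizes = {"/fake/lorem.txt": 1200000}

real_os_stat = os.stat


def fake_os_stat(path, *args, **kwargs):
    """ Return a fake stat result for registered paths only. """
    if path in fake_sizes:
        # Field order: mode, ino, dev, nlink, uid, gid, size,
        # atime, mtime, ctime. Only the size matters here.
        return os.stat_result(
                (0o100644, 0, 0, 1, 0, 0, fake_sizes[path], 0, 0, 0))
    return real_os_stat(path, *args, **kwargs)


with unittest.mock.patch.object(os, "stat", new=fake_os_stat):
    # A registered path reports the fake size, with no real file.
    fake_result = os.stat("/fake/lorem.txt")
    # An unregistered, nonexistent path still fails as usual.
    try:
        os.stat("/fake/not-registered.txt")
    except FileNotFoundError:
        unregistered_failed = True
    else:
        unregistered_failed = False

assert fake_result.st_size == 1200000
assert unregistered_failed
```

Once the `with` block exits, `os.stat` is the real function again, so
the substitution cannot leak into unrelated code.
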
..
    This document is written using `reStructuredText`_ markup, and can
    be rendered with `Docutils`_ to other formats.

    .. _Docutils: http://docutils.sourceforge.net/
    .. _reStructuredText: http://docutils.sourceforge.net/rst.html

..
    This is free software: you may copy, modify, and/or distribute this work
    under the terms of the GNU General Public License as published by the
    Free Software Foundation; version 3 of that license or any later version.
    No warranty expressed or implied. See the file ‘LICENSE.GPL-3’ for details.

..
    Local variables:
    coding: utf-8
    mode: text
    mode: rst
    time-stamp-format: "%:y-%02m-%02d"
    time-stamp-start: "^:Updated:[ ]+"
    time-stamp-end: "$"
    time-stamp-line-limit: 20
    End:
    vim: fileencoding=utf-8 filetype=rst :