0001-Bug-1384062-Make-SystemResourceMonitor.stop-more-res.patch 4.9 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118
  1. From 2874ecd82e9671f774bdfda41fe0857fcb916c13 Mon Sep 17 00:00:00 2001
  2. Message-Id: <2874ecd82e9671f774bdfda41fe0857fcb916c13.1506634385.git.jan.steffens@gmail.com>
  3. From: Mike Hommey <mh+mozilla@glandium.org>
  4. Date: Wed, 16 Aug 2017 13:16:16 +0900
  5. Subject: [PATCH] Bug 1384062 - Make SystemResourceMonitor.stop more resilient
  6. to errors. r=ahal,gps
  7. The poll() call in SystemResourceMonitor.stop might fail even though
  8. there is something to read from the pipe, in some corner cases, and
  9. python won't let us know about it. In that case, an exception is thrown,
  10. leaving the SystemResourceMonitor (and its callers) in a weird state. In
  11. practice, this leads BuildMonitor.__exit__ to recall stop, which then
  12. fails.
  13. So when poll() throws an exception, we pretend there's still something
  14. to read, and we try to read anyways. If there is something to read,
  15. recv() will return it, otherwise, it will throw an exception of its own,
  16. which we catch, pretending we're done.
  17. Furthermore, when there is nothing to read from the pipe, poll() simply
  18. returns False, and our loop never sets `done` to True, and we then hit
  19. an assert, which doesn't have its place here, so we remove it.
  20. Finally, the other end of the pipe might have died at any time, making
  21. sending over the pipe fail, so we also protect against that.
  22. With all these changes, it feels like the reason to backout bug 1239939
  23. in bug 1272782 should have been dealt with, and we can drop the timeout
  24. again.
  25. --HG--
  26. extra : rebase_source : ac72dd5b2602cf3ffddfb429f95e02380f939893
  27. ---
  28. .../mozsystemmonitor/resourcemonitor.py | 38 +++++++++++++++-------
  29. 1 file changed, 26 insertions(+), 12 deletions(-)
  30. diff --git a/testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py b/testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py
  31. index 8f2ac95cbe505540..38f9bc986ac2a120 100644
  32. --- a/testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py
  33. +++ b/testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py
  34. @@ -289,47 +289,61 @@ class SystemResourceMonitor(object):
  35. assert self._running
  36. assert not self._stopped
  37. - self._pipe.send(('terminate',))
  38. + try:
  39. + self._pipe.send(('terminate',))
  40. + except Exception:
  41. + pass
  42. self._running = False
  43. self._stopped = True
  44. self.measurements = []
  45. - done = False
  46. -
  47. # The child process will send each data sample over the pipe
  48. # as a separate data structure. When it has finished sending
  49. # samples, it sends a special "done" message to indicate it
  50. # is finished.
  51. - while self._pipe.poll(1.0):
  52. - start_time, end_time, io_diff, cpu_diff, cpu_percent, virt_mem, \
  53. - swap_mem = self._pipe.recv()
  54. +
  55. + # multiprocessing.Pipe is not actually a pipe on at least Linux. that
  56. + # has an effect on the expected outcome of reading from it when the
  57. + # other end of the pipe dies, leading to possibly hanging on revc()
  58. + # below. So we must poll().
  59. + def poll():
  60. + try:
  61. + return self._pipe.poll(0.1)
  62. + except Exception:
  63. + # Poll might throw an exception even though there's still
  64. + # data to read. That happens when the underlying system call
  65. + # returns both POLLERR and POLLIN, but python doesn't tell us
  66. + # about it. So assume there is something to read, and we'll
  67. + # get an exception when trying to read the data.
  68. + return True
  69. + while poll():
  70. + try:
  71. + start_time, end_time, io_diff, cpu_diff, cpu_percent, virt_mem, \
  72. + swap_mem = self._pipe.recv()
  73. + except Exception:
  74. + # Let's assume we're done here
  75. + break
  76. # There should be nothing after the "done" message so
  77. # terminate.
  78. if start_time == 'done':
  79. - done = True
  80. break
  81. io = self._io_type(*io_diff)
  82. virt = self._virt_type(*virt_mem)
  83. swap = self._swap_type(*swap_mem)
  84. cpu_times = [self._cpu_times_type(*v) for v in cpu_diff]
  85. self.measurements.append(SystemResourceUsage(start_time, end_time,
  86. cpu_times, cpu_percent, io, virt, swap))
  87. # We establish a timeout so we don't hang forever if the child
  88. # process has crashed.
  89. self._process.join(10)
  90. if self._process.is_alive():
  91. self._process.terminate()
  92. self._process.join(10)
  93. - else:
  94. - # We should have received a "done" message from the
  95. - # child indicating it shut down properly. This only
  96. - # happens if the child shuts down cleanly.
  97. - assert done
  98. if len(self.measurements):
  99. self.start_time = self.measurements[0].start
  100. --
  101. 2.14.2