  1. [***** This file best-viewed with a 4-column tab setting, with line-wrapping *****]
  2. ================= Mlucas v20.1.1 command line options listing =================
  3. Note: besides the basic post-build self-test flags covered by the online README page
  4. ( http://www.mersenneforum.org/mayer/README.html ),
  5. the most-crucial command-line flags the average user doing GIMPS production work
  6. needs to be familiar with are '-cpu' and (for stage 2 of p-1 work), '-maxalloc'.
  7. ===============================================================================
  8. Sections:
  9. [0]: Default mode
  10. [1]: Post-build self-testing for various FFT-length ranges
  11. [2]: FFT-length setting
  12. [3]: FFT radix-set specification
  13. [4]: Mersenne-number primality testing
  14. [5]: Fermat-number primality testing
  15. [6]: Residue shift
  16. [7]: Probable-primality testing mode
  17. [8]: Iteration-number setting
  18. [9]: P-1 Factoring:
  19. [9a]: Setting maximum-percentage of system free-RAM to use per stage 2 instance
  20. [9b]: How to run stage 2 continuation for a given stage 1 bound
  21. [10]: Setting threadcount and CPU core affinity
  22. [11]: User control options in mlucas.ini
  23. [12]: Savefile format and creation
  24. [13]: *** DON'Ts ***
  25. [14]: General troubleshooting
  26. [14a]: How to safely interrupt a running instance
  27. ===============================================================================
  28. Symbol and abbreviation key:
  29. <CR>: carriage return
  30. | : separator for one-of-the-following multiple-choice menus
  31. {} : denotes user-supplied numerical arguments of the type noted:
  32. {int} means nonnegative integer, {+int} = positive int, {float} = float. Args not
  33. enclosed in [] are required, in the sense that "if you invoke the command-line
  34. flag in question, you must follow it with a value (or set thereof) as listed."
  35. {arg1|arg2|...} means "pick one value from the set as the required numeric argument".
  36. [] : encloses optional arguments.
  37. The required/optional syntax is nested, e.g. for -cpu {lo[:hi[:incr]]},
  38. all args {int}, 'lo' is required at a minimum. If the user wants a core-range,
  39. ':hi' is also required, and for a range-with-constant-stride, ':incr' is also
  40. required.
  41. [] May also be used to include optional chars in a given command-line flag name,
  42. e.g. "-fft[len]" means "-fft" is a short-form alternative to "-fftlen".
  43. Vertical stacking indicates argument short 'nickname' options, e.g. for
  44. -argument : [blah blah]
  45. -arg : [blah blah]
  46. The stacking means that '-arg' can be used in place of '-argument'.
  47. Prefixes for binary multiples: The prefixes K, M and G are used herein in the binary sense,
  48. i.e. represent 2^10, 2^20 and 2^30 of the quantity in question. I eschew the more recently
  49. adopted convention (https://physics.nist.gov/cuu/Units/binary.html) of Ki, Mi and Gi for same,
  50. mainly because the corresponding pronounced prefixes sound silly to me: e.g. MiB stands for a
  51. mebibyte, which to me sounds like a computer-geek joke punchline, "mebbe I'll have a byte to eat,
  52. and mebbe I won't. Ha, ha, a million laughs, thanks for coming, you've been a swell audience, and
  53. don't forget to generously tip the waitstaff."
  54. ===============================================================================
  55. Supported command-line arguments:
  56. [0]:
  57. <CR> Default mode: looks for a worktodo.ini file in the local directory;
  58. if none found, prompts for manual keyboard entry
  59. ======================
  60. [1]: Post-build self-testing for various FFT-length ranges:
  61. FOR BEST RESULTS, RUN ANY SELF-TESTS UNDER ZERO- OR CONSTANT-LOAD CONDITIONS.
  62. The following self-test options will cause an mlucas.cfg file containing
  63. the optimal FFT radix set for the runlength(s) tested to be created (if one did not
  64. exist previously) or appended (if one did) with new timing data. Such a file-write is
  65. triggered by each complete set of FFT radices available at a given FFT length being
  66. tested, i.e. by a self-test without a user-specified -radset argument.
  67. A user-specified Mersenne exponent may be supplied via the -m flag; if none is specified,
  68. the program will use the largest permissible exponent for the given FFT length, based on
  69. its internal length-setting algorithm. If the user does use -m to specify an exponent,
  70. the -iters argument is also required.
  71. The default number of iterations for the self-test is 100 for <= 4 threads and 1000 for more
  72. than 4 threads. The user may also override the default via the -iters flag; while it is not
  73. required, it is best to use one of the standard timing-test values of -iters = {100|1000|10000},
  74. with the larger values being preferred for multithreaded timing tests, in order to minimize
  75. noise in timing data.
  76. Similarly, it is recommended to not use the -m flag for such tests, unless
  77. roundoff error levels on a given compute platform are such that the default exponent at one or
  78. more FFT lengths of interest prevents a reasonable sampling of available radix sets at same.
  79. If the user lets the program set the exponent and uses one of the aforementioned standard
  80. self-test iteration counts, the resulting best-timing FFT radix set will only be written to the
  81. resulting mlucas.cfg file if the timing-test residues match the internally-stored precomputed
  82. ones for the given default exponent at the iteration count in question, with eligible radix sets
  83. consisting of those for which the roundoff error also remains below an acceptable threshold.
  84. If the user instead specifies the exponent (only allowed for a single-FFT-length timing test)
  85. and an iteration number (required in this case), the resulting best-timing FFT radix set will
  86. only be written to the resulting mlucas.cfg file if the timing-test results match each other.
  87. This is important for tuning code parameters to your particular platform.
  88. Options - again note the user can override the default iteration count based on #threads via
  89. '-iters {+int}', though only 100|1000|10000-iteration cases have precomputed reference residues.
  90. The (very rough) time estimates are for 1000 iterations done using 4 or more cores.
  91. -s {anything other than the mnemonics below}: Self-test, user must also supply exponent (via -m or -f),
  92. iteration number and (optionally) FFT length to use.
  93. -s tiny Runs 100 (<= 4 threads) or 1000-iteration self-tests on a set of 32 Mersenne exponents,
  94. -s t covering FFT lengths 8K-120K. This will take around 1 minute on a fast CPU.
  95. -s small Runs 100 or 1000-iteration self-tests on a set of 32 Mersenne exponents, covering
  96. -s s FFT lengths 120K-1920K. This will take around 10 minutes on a fast CPU.
  97. **************** THIS IS THE ONLY SELF-TEST ORDINARY USERS ARE RECOMMENDED TO DO: *************
  98. * *
  99. * -s medium Runs 100 or 1000-iteration self-tests on a set of 16 Mersenne exponents, covering *
  100. * -s m FFT lengths from 2M to 7.5M. This will take around an hour on a fast CPU. *
  101. * *
  102. ***********************************************************************************************
  103. -s large Runs 100 or 1000-iteration self-tests on a set of 24 Mersenne exponents, covering
  104. -s l FFT lengths 8M to 60M. This will take around 10 hours on a fast CPU.
  105. -s all Runs 100 or 1000-iteration self-tests of all the above FFT lengths.
  106. -s a This will take around 12 hours on a fast CPU.
  107. -s xl [The 'h' form is short for 'huge'.] Runs 100 or 1000-iteration self-tests on a set
  108. -s h of 16 M(p), covering FFT lengths 64M-240M. This will take several days on a fast CPU.
  109. -s xxl [The 'e' form is short for 'egregious'.] Runs 100 or 1000-iteration self-tests on a set
  110. -s e of 9 M(p), covering FFT lengths 256M-512M. This will take several days on a fast CPU, and
  111. requires '-shift 0' also be added to the command line, which restriction will be removed
  112. at a later date.
  113. ======================
  114. [2]: FFT-length setting:
  115. -fft[len] {+int[K|M]}
  116. If {+int[K|M]} is one of the available FFT lengths, runs all available FFT radices
  117. available at that length, unless the -radset flag is invoked (see below for details).
  118. If -fft is invoked without the -iters flag, it is assumed the user wishes to do a
  119. production run with a non-default FFT length; in this case the program requires a
  120. valid worktodo.ini-file entry with exponent not more than 5% larger than the
  121. default maximum for that FFT length.
  122. Without the optional [K|M] alphabetic suffixes (i.e. with a pure-integer argument),
  123. the code treats the numeric value as representing Kilodoubles. The code also supports
  124. a floating-point numeric argument with either a 'K' or 'M' suffix. For example, the
  125. following are all equivalent: -fftlen 5632, -fft 5632, -fft 5632K, -fft 5.5M, and all
  126. result in an FFT having length 5.5 × 2^20 = 5632 × 2^10 = 5767168 doubles. Any numeric
  127. value must map to a supported FFT length; for Mlucas these are of form k × 2^n, where k
  128. is an odd integer in the set (1,3,5,7,9,11,13,15).
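The argument rules above can be sketched in a few lines of Python. This is only an
illustration of the stated conventions (bare integer = Kilodoubles, optional K or M suffix,
length of form k × 2^n with odd k <= 15); the function name parse_fft_len is hypothetical,
and the real program additionally enforces its own minimum and maximum supported lengths,
which this sketch does not model.

```python
def parse_fft_len(arg):
    """Return the FFT length in doubles for a -fft style argument (illustrative only)."""
    text = arg.strip().upper()
    if text.endswith("M"):
        value = float(text[:-1]) * 2**20      # e.g. '5.5M'
    elif text.endswith("K"):
        value = float(text[:-1]) * 2**10      # e.g. '5632K'
    else:
        value = float(int(text)) * 2**10      # bare integer is taken as Kilodoubles
    if value <= 0 or not value.is_integer():
        raise ValueError("FFT length must be a positive whole number of doubles")
    n = int(value)
    # Strip factors of 2; what remains is the odd multiplier k in k * 2^n.
    k = n
    while k % 2 == 0:
        k //= 2
    if k > 15:
        raise ValueError("odd multiplier %d is not in (1,3,5,7,9,11,13,15)" % k)
    return n


for example in ("5632", "5632K", "5.5M"):
    print(example, "->", parse_fft_len(example))
```

All three sample arguments map to the same length in doubles, whereas an argument such as
'17K' is rejected because its odd multiplier is 17.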
  129. If -fft is invoked with a user-supplied value of -iters but without a
  130. user-supplied exponent, the program will do the specified number of iterations
  131. using the default self-test Mersenne or Fermat-number exponent for that FFT length.
  132. If -fft is invoked with a user-supplied value of -iters and either the
  133. -m or -f flag and a user-supplied exponent, the program will do the specified
  134. number of iterations of either the Lucas-Lehmer test with starting value 4 (-m)
  135. or the Pe'pin test with starting value 3 (-f) on the user-specified modulus.
  136. In either of the latter 2 cases, the program will produce a cfg-file entry based
  137. on the timing results, assuming at least 50% of the available radix sets at the
  138. given FFT length ran the specified #iters to completion without suffering a fatal
  139. error of some kind, e.g. excessive roundoff error, mismatching residues versus-
  140. tabulated, or "No AVX-512 support; Skipping this leading radix" for certain smaller
  141. leading radices.
  142. Use this to find the optimal radix set for a single FFT length on your hardware.
  143. NOTE: If you use other than the default modulus or #iters for such a single-fft-
  144. length timing test, it is up to you to manually verify that the residues output
  145. match for all fft radix combinations and that the roundoff errors are reasonable.
  146. ======================
  147. [3]: FFT radix-set specification:
  148. -radset {+int[comma-separated list +int,...,+int]}
  149. Requires a supported value of -fft {+int}[K|M] to also be specified, and a value of
  150. -iters. If this argument is invoked, a single-FFT-length and single-set-of-FFT-radices
  151. timing test is assumed. If a single {+int} argument is supplied, this indicates the
  152. specific index of a set of complex FFT radices to use, based on the big select table
  153. in the function get_fft_radices().
  154. Optionally, the -radset flag can take an actual set of comma-separated FFT radices.
  155. Said radix set must be one of those present in the aforementioned select table for the
  156. FFT length in question.
  157. For example, the following are equivalent, for -fft 6M:
  158. -radset 1
  159. -radset 192,16,32,32
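As a quick sanity check on an explicit radix list, the sketch below tests whether a set of
radices is size-compatible with a given FFT length. It rests on an assumption not spelled
out in this help text: that the complex radices multiply to half the real FFT length in
doubles. The function name is hypothetical, and passing this check does not mean the set
appears in the get_fft_radices() select table.

```python
from math import prod


def radset_matches_fft_len(radices, fft_len_doubles):
    """True if the complex radices multiply to half the real FFT length (assumed convention)."""
    return 2 * prod(radices) == fft_len_doubles


# The -fft 6M example from the text: 6M doubles = 6 * 2^20.
print(radset_matches_fft_len([192, 16, 32, 32], 6 * 2**20))
```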
  160. ======================
  161. [4]: Mersenne-number primality testing:
  162. -m {+int}
  163. Performs a Lucas-Lehmer primality test of the Mersenne number M(int) = 2^int - 1,
  164. where int must be an odd prime. If -iters is also invoked, this indicates a timing test,
  165. and allows for suitable added arguments (optionally -fft, and if that is invoked,
  166. optionally, -radset) to be supplied.
  167. If the -fft option (and optionally -radset) is also invoked but -iters is not, the
  168. program first checks the first eligible (any line whose first non-whitespace character
  169. is alphabetic is treated as such) line of the worktodo.ini file to see if the assignment
  170. specified there is a Lucas-Lehmer test with the same exponent as specified via the -m
  171. argument. If so, the -fft argument is treated as a user override of the default FFT
  172. length for the exponent. If -radset is also invoked, this is similarly treated as a user-
  173. specified radix set for the user-set FFT length; otherwise the program will use the
  174. mlucas.cfg file to select the radix set to be used for the user-forced FFT length.
  175. If the first eligible worktodo.ini file entry does not specify an LL-test (i.e. does not
  176. begin with "Test=" or "DoubleCheck=") of an exponent matching the -m value, a set of
  177. timing self-tests is run on the user-specified Mersenne number using all sets of FFT
  178. radices available at the specified FFT length.
  179. If the -fft option is not invoked, the self-tests use all sets of FFT radices available at
  180. that exponent's default FFT length. Users can use this to find the optimal radix set for a
  181. given Mersenne number exponent on their hardware, similarly to the -fft option. Performs
  182. as many iterations as specified via the -iters flag (which is required if -m is invoked).
  183. ======================
  184. [5]: Fermat-number primality testing:
  185. -f {+int}
  186. Performs a base-3 Pe'pin test on the Fermat number F(num) = 2^(2^num) + 1.
  187. If desired this can be invoked together with the -fft option, as for the Mersenne-number
  188. self-tests (see notes about the -m flag). Note that not all FFT lengths supported for -m
  189. are available for -f; the supported ones are of form k × 2^n, where k is an odd integer
  190. in the set [1,7,15,63].
  191. Optimal radix sets and timings are written to a fermat.cfg file. Performs as many iterations
  192. as specified via the -iters flag (which is required if -f is invoked).
  193. ======================
  194. [6]: Residue shift:
  195. -shift {+int}
  196. Number of bits by which to shift the initial seed (= iteration-0 residue). This initial
  197. shift count is doubled (modulo the binary exponent of the modulus being tested) each
  198. iteration; for Fermat-number tests the mod-doubling is further augmented by addition of a
  199. random bit, in order to keep the shift count from landing on 0 after (Fermat-number index)
  200. iterations and remaining 0. (Cf. https://mersenneforum.org/showthread.php?p=582525#post582525)
  201. Savefile residues are rightward-shifted by the current shift count
  202. before being written to the file; thus savefiles contain the unshifted residue, and
  203. separately the current shift count, which the program uses to leftward-shift the
  204. savefile residue when the program is restarted from interrupt.
  205. The shift count is a 32-bit unsigned int; any modulus having > 2^32 bits (thus using an
  206. FFT length of 256M or larger) requires 0 shift.
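The shift bookkeeping for the Mersenne case can be illustrated with a toy Lucas-Lehmer loop
in Python. This is a sketch under stated assumptions, not the Mlucas implementation: the
function name is hypothetical, the -2 term is shifted by hand so that shifted and unshifted
runs agree, and the random-bit augmentation used for Fermat-number tests is not modeled.

```python
def ll_residue(p, iters, shift=0):
    """Run `iters` Lucas-Lehmer iterations mod M(p) = 2^p - 1 on a shifted residue,
    then return the unshifted residue (as a savefile would store it)."""
    m = (1 << p) - 1
    s = shift % p
    x = (4 << s) % m                    # shifted initial seed 4 * 2^s
    for _ in range(iters):
        s = (2 * s) % p                 # squaring doubles the shift count, mod p
        x = (x * x - (2 << s)) % m      # the constant 2 must carry the new shift
    # Undo the shift: multiplying by 2^(p-s) equals dividing by 2^s, since 2^p == 1 mod M(p).
    return (x << ((p - s) % p)) % m


# M(13) is prime, so p-2 = 11 iterations give residue 0 for any initial shift.
print(ll_residue(13, 11), ll_residue(13, 11, shift=5))
```

The same function applied to a composite such as M(11) returns a nonzero residue that is
likewise independent of the initial shift.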
  207. ======================
  208. [7]: Probable-primality testing mode:
  209. -prp [(+int)base]
  210. This flag requires no further arguments; it simply invokes PRP-test mode for the specified
  211. Mersenne-number self-test(s). This means that instead of running the rigorous Lucas-Lehmer
  212. primality test, a "pure-squarings-modified Fermat-PRP test" (cf. next paragraph) is run
  213. instead, to either a base specified via the optional numeric argument following the -prp
  214. flag, or to a default base of 3.
  215. The standard Fermat-PRP test of a number N consists of selecting a base b coprime to N and
  216. checking whether b^(N-1) == 1 (mod N), in which case N is a probable prime (PRP) to base b.
  217. For a Mersenne number N = M(p) = 2^p-1, N-1 has a binary representation 2^p-2 = 111...110,
  218. (p-1) binary ones followed by a single 0. Starting with initial seed x = b - which must be
  219. coprime to M(p) and further not equal to a power of 2, since due to their binary form all
  220. M(p) pass a Fermat-PRP test to base 2^k, whether M(p) is prime or not - the standard Fermat-
  221. PRP test, implemented as a left-to-right binary modular powering, consists of (p-2) iterations
  222. of form x := b*x^2 (a squaring and scalar multiply-by-b, mod M(p)) plus a final mod-squaring
  223. x := x^2 (mod M(p)), with M(p) being a probable-prime to base b if the result == 1.
  224. However, the rigorous-as-far-as-we-know "Gerbicz error check" requires a pure sequence
  225. of mod squarings, thus we replace the standard Fermat-PRP test with a modified one whereby
  226. we check whether b^(N+1) == b^2 (mod N). For N = M(p) we have N+1 = 2^p, thus this variant
  227. requires (p-1) pure mod-squarings. Any self-tests done in PRP mode will do the first (iters)
  228. of these.
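For small exponents the two test variants described above can be compared directly in
Python. This is a minimal sketch for illustration only: the function names are hypothetical,
and the modified test is evaluated with Python's built-in pow() rather than an explicit
squaring chain with Gerbicz checking.

```python
def fermat_prp_ladder(p, b=3):
    """Standard Fermat-PRP residue b^(N-1) mod N for N = M(p), via the left-to-right ladder."""
    m = (1 << p) - 1
    x = b % m
    for _ in range(p - 2):
        x = (b * x * x) % m     # squaring plus scalar multiply-by-b
    return (x * x) % m          # final mod-squaring for the trailing 0 bit


def modified_prp_passes(p, b=3):
    """Modified test: does b^(N+1) == b^2 (mod N) hold for N = M(p)?"""
    m = (1 << p) - 1
    return pow(b, m + 1, m) == (b * b) % m


for p in (7, 11, 13):
    print(p, fermat_prp_ladder(p), modified_prp_passes(p))
```

For the primes M(7) and M(13) the ladder residue is 1 and the modified test passes; for
the composite M(11) = 2047 both variants fail.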
  229. ======================
  230. [8]: Iteration-number setting:
  231. -iters {+int}
  232. Do {+int} self-test iterations of the type determined by the modulus-related options:
  233. -s/-m means Lucas-Lehmer test iterations with initial seed 4, -f means Pe'pin-test
  234. squarings with initial seed 3.
  235. ======================
  236. [9]: P-1 Factoring:
  237. [9a]: Setting maximum-percentage of system free-RAM to use for p-1 stage 2 work per instance:
  238. -maxalloc {+int}
  239. Maximum percentage of available system RAM to use per instance. Must be >= 10. Default = 90;
  240. user-specified values greater than 90 are allowed, but only recommended if the program is
  241. underestimating system free RAM for some reason (use the free-RAM field in the first few
  242. lines of Linux 'top' output to check this), or if on a system whose amount of RAM allows
  243. only a modest number of stage 2 buffers (less than 100, say), the default allows a buffer
  244. count which is just below the next-larger allowable one. The buffer counts usable by Mlucas
  245. (as of this writing, v20.x.y) are multiples of 24 and of 40; if you consult the table titled
  246. "With small-prime relocation" inside the long comment block preceding the pm1_bigstep_size()
  247. function in the pm1.c source file, you will see that the '%modmul' numbers (measured relative
  248. to stage 2 run with the minimum 24-buffers and without small-prime relocation, which was a
  249. late-added optimization to v20) for 24 and 40 buffers differ by 6%, which is appreciable.
  250. Thus if at start of stage 2 you see something like "Available memory allows up to 39
  251. Stage 2 residue-sized memory buffers. Using 24 Stage 2 memory buffers" in the console output,
  252. I suggest "kill -9"ing the process and restarting with '-maxalloc 100', seeing if that gives
  253. you 40 buffers, then just keeping an eye on the 'top' output to check for swapping-to-disc
  254. once the stage 2 inits have finished and prime-processing starts.
  255. On the other hand '% modmul' counts for 40 and 48 buffers differ by less than 1%, so
  256. it's not worth trying to force the higher value to be used. (The next significant gain to
  257. be had is at 72 = 3*24 buffers.)
  258. Under MacOS the default is 50% of available (not free) RAM.
  259. If the system is swapping between RAM and HD/SSD during stage 2, as evidenced by
  260. free-RAM dropping to near-0 and 'kswapd' entries appearing among the CPU-%-sorted
  261. lines of the 'top' output, the value needs to be lowered. If the default is leaving plenty
  262. of available RAM and the resulting buffer count is under 100, it is worth trying to force
  263. a higher fraction to be used, by invoking -maxalloc [some value > 50].
  264. -pm1_s2_nbuf {+int}
  265. Since available RAM fluctuates depending on current load, this flag alternatively
  266. allows the user to set the maximum number of p-1 stage 2 buffers to use per instance.
  267. Currently, the number of stage 2 buffers must be a multiple of 24 or 40; if the user-
  268. set maximum value is not such, the largest such multiple <= the user-specified value
  269. is used for stage 2 work. For stage 2 restarts there is an added constraint related
  270. to small-prime relocation, namely that if stage 2 was begun with a multiple of 24 or
  271. 40 buffers, the restart-value must also be a multiple of the same base-count, 24 or 40.
  272. Said constraint will be automatically enforced. If the resulting buffer count exhausts
  273. available memory, performance will suffer due to system memory-swapping, thus this flag
  274. should only be invoked by users who know what they are doing.
  275. Only one of the 2 flags may be set via the command line.
  276. These 2 flags are only important in the context of stage 2 of p-1 factoring, which
  277. will be done automatically before a Lucas-Lehmer primality or probable-primality-test
  278. if the GIMPS assignment in question indicates that some p-1 effort is warranted.
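The buffer-count rounding rule (largest multiple of 24 or of 40 not exceeding the allowed
maximum) can be sketched as follows. The function name is hypothetical, and the sketch
ignores the extra stage 2 restart constraint on the base count described above.

```python
def usable_stage2_buffers(max_buffers):
    """Largest multiple of 24 or of 40 that is <= max_buffers (0 if fewer than 24 fit)."""
    candidates = [base * (max_buffers // base) for base in (24, 40)]
    return max(candidates)


# The case from the text: memory allows 39 buffers, so only 24 are used;
# one more buffer's worth of memory would allow 40.
for allowed in (39, 40, 48, 72, 100):
    print(allowed, "->", usable_stage2_buffers(allowed))
```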
  279. [9b]: How to run stage 2 continuation for a given stage 1 bound:
  280. On completion of an initial p-1 run with stage bounds B1 and B2 and a contiguous stage 2
  281. (i.e. one which covered the interval [B1,B2]), the program leaves the respective p-1 residues
  282. in a pair of savefiles named p[exp].s1 and p[exp].s2, respectively. The former of these may
  283. be re-used for one or more deeper stage 2 interval runs, in distributed fashion (multiple
  284. program instances, each of which processes a given stage 2 interval so as to cover a desired
  285. expanded prime interval), if desired. Say the original run used a worktodo.ini assignment
  286. Pminus1={aid,}k,b,n,c,B1,B2
  287. where {aid} is a 32-hexit assignment ID (which may be omitted or filled with 'n/a' - the
  288. quotes are for emphasis only, they must not appear in the actual assignment - if the user
  289. wishes to create such an assignment for him-or-herself). Here, k,b,n,c define the desired
  290. modulus k*b^n+c (for prime-exponent Mersenne number M(p) = 2^p-1, k = 1, b = 2, n = p, c = -1; for
  291. Fermat number F[m] = 2^(2^m)+1, k = 1, b = 2, n = 2^m, c = +1) and B1 and B2 are the p-1
  292. stage bounds. If the original used a Pfactor assignment:
  293. Pfactor={aid},k,b,n,c,TF_BITS,ll_tests_saved_if_factor_found
  294. then you must read the resulting p-1 stage bounds from the run log file, p[exp].stat .
  295. Once you have the original-run stage bounds in hand, for a desired stage 2 continuation
  296. interval with bounds [B2_start, B2], the proper assignment syntax is
  297. Pminus1={aid},k,b,n,c,B1,B2,TF_BITS,B2_start[,known_factors]
  298. where any known factors should be in the form of a comma-separated list bookended
  299. with double quotes, e.g. known factors f1 and f2 appear as "f1,f2". These will not be reported if
  300. found in any of the GCD steps which test for a factor having been found, and the run
  301. will only exit if a new factor (one not appearing in the known_factors list) is found.
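A small Python helper can assemble a continuation line in the syntax shown above, which
helps avoid field-ordering mistakes when hand-editing worktodo.ini. The function name and
all numeric values in the example are placeholders chosen for illustration; they are not a
real assignment, and the 'n/a' assignment ID follows the convention mentioned earlier for
self-created assignments.

```python
def pminus1_continuation(k, b, n, c, b1, b2, tf_bits, b2_start,
                         known_factors=(), aid="n/a"):
    """Build a 'Pminus1=' stage 2 continuation line for worktodo.ini (illustrative)."""
    fields = [aid, k, b, n, c, b1, b2, tf_bits, b2_start]
    line = "Pminus1=" + ",".join(str(f) for f in fields)
    if known_factors:
        # Known factors go in one double-quoted, comma-separated field.
        line += ',"' + ",".join(str(f) for f in known_factors) + '"'
    return line


# Placeholder values: exponent 110503, B1 = 10^6, continuation interval [3*10^7, 5*10^7].
print(pminus1_continuation(1, 2, 110503, -1, 1000000, 50000000, 70, 30000000))
```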
  302. ======================
  303. [10]: Setting threadcount and CPU core affinity:
  304. Note: As of this writing (v20.x.y) setting core affinity is not effective when running
  305. on Windows Subsystem for Linux (WSL), presumably due to virtualization. Processes
  306. will core-hop, negatively impacting efficiency. Also, on WSL, core count is limited
  307. to 64 due to MS Windows' "processor group" construct. Effectively though, it is less;
  308. in actuality on a 68-core & x4 HT Xeon Phi 7250 for example, a running Mlucas instance
  309. and the WSL session in which it runs will occupy at most 16 cores x their 4 hyperthreads,
  310. or 4 cores x their 4 hyperthreads, depending on which processor group is allocated.
  311. (Obsolescent - not recommended, please use the -cpu flag instead:)
  312. -nthread {int}
  313. For multithread-enabled builds, run with this many threads.
  314. If the user does not specify a thread count, the default is to run single-threaded
  315. with that thread's affinity set to logical core 0.
  316. AFFINITY: The code will attempt to set the affinity of the resulting threads
  317. 0:n-1 to the same-indexed processor cores - whether this means distinct physical
  318. cores is entirely up to the CPU vendor - E.g. Intel uses such a numbering scheme
  319. but AMD does not. For this reason as of v17 this option is deprecated in favor of
  320. the -cpu flag, whose usage is detailed below, with the online README page providing
  321. guidance for the core-numbering schemes of popular CPU vendors.
  322. If n exceeds the available number of logical processor cores (call it #cpu), the
  323. program will halt with an error message.
  324. For greater control over affinity setting, use the -cpu option, which supports two
  325. distinct core-specification syntaxes (which may be mixed together), as follows:
  326. (Recommended variant of affinity setting:)
  327. -cpu {lo[:hi[:incr]]}
  328. (All args {int} here.) Set thread/CPU affinity. NOTE: This flag and -nthread are
  329. mutually exclusive! If -cpu is used, the threadcount is inferred from the numeric-argument
  330. triplet which follows. If only the 'lo' argument of the triplet is supplied, it means
  331. "run single-threaded with affinity to logical core {lo}." (Note that in the absence
  332. of hyperthreading, logical and physical cores are the same.)
  333. If the increment (third) argument of the triplet is omitted, it defaults to incr = 1.
  334. The CPU set encoded by the integer-triplet argument to -cpu corresponds to the
  335. values of the integer loop index i in the C-loop for(i = lo; i <= hi; i += incr),
  336. excluding the loop-exit value of i. Thus '-cpu 0:3' and '-cpu 0:3:1' are both
  337. exactly equivalent to '-nthread 4', whereas '-cpu 0:6:2' and '-cpu 0:7:2' both
  338. specify affinity setting to logical cores 0,2,4,6, assuming said cores exist.
  339. Lastly, note that no whitespace is permitted within the colon-separated numeric field.
  340. -cpu {triplet0[,triplet1,...]} This is simply an extended version of the above affinity-
  341. setting syntax in which each of the comma-separated 'triplet' subfields is in the above
  342. form and, analogously to the one-triplet-only version, no whitespace is permitted within
  343. the colon-and-comma-separated numeric field. Thus '-cpu 0:3,8:11' and '-cpu 0:3:1,8:11:1'
  344. both specify an 8-threaded run with affinity set to logical core quartets 0-3 and 8-11,
  345. whereas '-cpu 0:3:2,8:11:2' means run 4-threaded on cores 0,2,8,10. As described for the
  346. -nthread option, it is an error for any core index to exceed the available number of logical
  347. processor cores.
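To check which logical cores a given -cpu argument selects before launching a run, the
triplet syntax can be expanded with a short Python sketch. The function name is
hypothetical and the code only mirrors the loop semantics described above; it does not
check core indices against the number of logical cores actually present.

```python
def expand_cpu_arg(arg):
    """Expand a -cpu argument such as '0:3:2,8:11:2' into a list of logical core indices."""
    cores = []
    for triplet in arg.split(","):
        parts = [int(field) for field in triplet.split(":")]
        lo = parts[0]
        hi = parts[1] if len(parts) > 1 else lo     # lone 'lo' means a single core
        incr = parts[2] if len(parts) > 2 else 1    # omitted increment defaults to 1
        # Same index set as the C loop for(i = lo; i <= hi; i += incr).
        cores.extend(range(lo, hi + 1, incr))
    return cores


for example in ("0:3", "0:7:2", "0:3,8:11", "0:3:2,8:11:2"):
    print(example, "->", expand_cpu_arg(example))
```

The length of the returned list is the thread count the program would infer from the
argument.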
  348. ======================
  349. [11]: User control options in mlucas.ini:
  350. Any mlucas.ini file in the user's run directory will be checked at program (re)start
  351. for the supported options listed below. Each such option is assumed to be formatted as
  352. [ws][optname][ws][=][ws][value][ws][comment]
  353. with [ws] denoting whitespace and [value] a numeric value parseable and representable
  354. as an IEEE-754 64-bit double-precision float. The numeric value may be followed by anything,
  355. so long as it does not affect parsing of the preceding numeric entry. For instance, mlucas.ini
  356. on a low-RAM system might contain
  357. CheckInterval = 10000 /* I'm using the space right of the value for a C-style comment. */
  358. LowMem = 1 // But one could also use C++ comment format...
  359. InterimGCD = 0 # Or bash-shell and Python-style comment format...
  360. LowMem = 2 Or no comment delimiter at all. Note that the program uses the first entry
  361. matching a given option name, so the second LowMem entry here will be ignored.
  362. This specifies savefile-writes for LL/PRP/p-1 every 10000 iterations, and allows PRP-testing
  363. but excludes p-1 stage 2. The InterimGCD option setting is moot in this case since it only
  364. applies to p-1 stage 2.
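The line format and first-match rule can be mimicked in Python as a way to preview how a
given mlucas.ini would be read. This is a sketch under assumptions: the regular expression
and function name are illustrative, and the program's own parser may accept or reject edge
cases differently.

```python
import re

# [ws][optname][ws][=][ws][value]; anything after the numeric value is ignored.
_OPTION_LINE = re.compile(
    r"^\s*([A-Za-z_]\w*)\s*=\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)"
)


def parse_mlucas_ini(text):
    """Return {option name: float value}, keeping only the first entry for each name."""
    options = {}
    for line in text.splitlines():
        match = _OPTION_LINE.match(line)
        if match and match.group(1) not in options:
            options[match.group(1)] = float(match.group(2))
    return options


SAMPLE_INI = """\
CheckInterval = 10000 /* C-style comment */
LowMem = 1 // C++ comment
InterimGCD = 0 # shell-style comment
LowMem = 2 ignored, since an earlier LowMem entry exists
"""

print(parse_mlucas_ini(SAMPLE_INI))
```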
  365. Note that the option-name and comment formats in mlucas.ini differ from those in local.ini -
  366. the latter is read and updated by the several primenet.py work-management scripts described
  367. on the online Mlucas README.html file, thus the flag names therein should be all-lowercase
  368. and comments must conform to Python rules (https://docs.python.org/3/library/configparser.html):
  369. "Configuration files may include comments, prefixed by specific characters (# and ;).
  370. Comments may appear on their own in an otherwise empty line, or may be entered in lines
  371. holding values or section names. In the latter case, they need to be preceded by a
  372. whitespace character to be recognized as a comment. (For backwards compatibility,
  373. only ; starts an inline comment, while # does not.)"
  374. o FREQUENCY OF SAVEFILE WRITES: The default frequency is threadcount-dependent: every 10000
  375. iterations for <= 4 threads; every 100,000 iterations for more than 4 threads. (Note that
  376. as of v20, if the exponent ratio in the "p[ = ***]/pmax_rec" informational message printed at
  377. run-start is > 0.97, the program will set ITERS_BETWEEN_CHECKPOINTS to the smaller of any user-set
  378. value or 10000, irrespective of the threadcount.)
  379. At run-start, you will see this captured in the informational terminal output (which gets
  380. piped to nohup.out if you prefix your instance invocation with "nohup", as is recommended):
  381. Set affinity for the following 4 cores: 0.1.2.3.
  382. NTHREADS = 4
  383. Setting ITERS_BETWEEN_CHECKPOINTS = 10000.
  384. For p-1 factoring, slightly different terminology is used for .stat-file entries documenting
  385. savefile-writes: "S1 bit = ..." reflects which bit of the p-1 stage 1 small-prime-powers
  386. product (whose bitlength is roughly 1.4x the stage 1 bound B1) has been reached, and "S2 at q = ..."
  387. reflects which stage 2 primes have been processed (prime less than and nearest the printed
  388. value). However, in all three cases the underlying savefile frequency is based on the same
  389. metric: the number of mod-M(p) multiplies done during the current task. Thus p-1 stage 1
  390. checkpoints will occur at roughly the same wall-clock frequency as for LL and PRP-test ones;
  391. the stage 2 savefile-update frequency will be perhaps 10-20% slower, reflecting the fact that
  392. while LL/PRP/stage-1 all do chains of in-place mod-M(p) "autosquarings", p-1 stage 2 does
  393. 2-input mod-M(p) multiplies, each of which requires 2x more data to stream between the CPU
  394. and the cache+memory subsystem.
  395. The ITERS_BETWEEN_CHECKPOINTS value can be customized by adding a "CheckInterval = [value]"
  396. line to one's mlucas.ini file, but note that there are constraints on the value related to
  397. the Gerbicz-checking done for PRP tests. Specifically, the CheckInterval value must be a
  398. multiple of 1000 and must divide 1 million. Violation of these constraints will trigger an
  399. assertion-exit if a PRP-test is attempted.
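Before editing mlucas.ini it can be useful to check a candidate CheckInterval against the
rules just stated. The Python sketch below encodes only what this section says (the
threadcount-dependent default, the cap at 10000 when the exponent ratio exceeds 0.97, and
the multiple-of-1000, divides-one-million constraint); the function names are hypothetical.

```python
def is_valid_check_interval(value):
    """True if value is a multiple of 1000 that divides 1,000,000 (the PRP-test constraint)."""
    return value > 0 and value % 1000 == 0 and 1_000_000 % value == 0


def effective_check_interval(nthreads, user_value=None, exponent_ratio=0.0):
    """Checkpoint interval per the rules in this section (illustrative only)."""
    if user_value is not None:
        interval = user_value
    else:
        interval = 10000 if nthreads <= 4 else 100000
    if exponent_ratio > 0.97:
        interval = min(interval, 10000)   # capped irrespective of the threadcount
    return interval


# Enumerate every value that satisfies the constraint.
valid = [v for v in range(1000, 1_000_001, 1000) if is_valid_check_interval(v)]
print(valid)
```

The enumeration shows that only a handful of values qualify, for example 10000, 50000 and
100000, while a value such as 3000 does not.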
  400. o RUNNING ON LOW-MEMORY SYSTEMS: The LowMem option provides two supported low-memory run
  401. modes for low-RAM systems:
  402. LowMem = 1 allows PRP-testing but excludes p-1 stage 2. For a given exponent, this
  403. mode will use 2-4x as much working memory for PRP-tests as it does for LL-tests.
  404. LowMem = 2 excludes both PRP-testing and p-1 stage 2. This is the minimum-memory option.
  405. For QA-testing purposes, this allows self-tests to some modest number of iterations to
  406. be done up to the maximum supported FFT length of 512M on systems with 8GB of memory.
  407. o DISABLING INTERIM GCDs IN P-1 STAGE 2: Set InterimGCD = 0 in mlucas.ini. This will cause
  408. the program to wait until any p-1 stage 2 is finished to take a GCD (check for a factor),
  409. irrespective of the depth of the stage.
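
As a quick way to sanity-check a candidate CheckInterval value before adding it to mlucas.ini, the two constraints noted above (multiple of 1000, divisor of 1 million) can be sketched in Python as follows. The function names are hypothetical and not part of Mlucas; the program's own enforcement is the assertion-exit described above.

```python
def is_valid_check_interval(value: int) -> bool:
    """Return True if value meets both documented constraints for PRP tests."""
    # Must be a positive multiple of 1000 and must divide 1 million exactly.
    return value > 0 and value % 1000 == 0 and 1_000_000 % value == 0


def valid_check_intervals() -> list:
    """List every value meeting both constraints, in ascending order."""
    return [v for v in range(1000, 1_000_001, 1000) if is_valid_check_interval(v)]


if __name__ == "__main__":
    print(valid_check_intervals())
```

For example, 10000 and 100000 pass both checks, while 3000 (a multiple of 1000 which does not divide 1 million) fails.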
  410. ======================
  411. [12]: Savefile format and creation:
  412. Mlucas creates savefiles (a.k.a. "checkpoint" files) at regular intervals - cf. section [11]
  413. for how to override the default value of same - to permit safe restart in case a run is
  414. interrupted for any reason, such as the host machine going down or a program crash. All
  415. savefile data are stored in bytewise minimum-size and endian-independent form, according to the
  416. schema implemented in the [read|write]_ppm1_savefiles function pair in the Mlucas.c source file.
  417. (Cf. the long comment "Set of functions to Read/Write full-length residue data in bytewise format"
  418. preceding said set of functions.)
  419. Such savefile writes are reflected in the run logfile (.stat file) latest-progress summary lines.
  420. o PRP tests save both a test current-residue value and a Gerbicz error check residue, thus
  421. PRP-test savefiles are roughly twice the size of those for LL tests and p-1 factoring.
  422. o LL, PRP-test and p-1 stage 1 residues are stored in redundant savefile pairs. For work on
  423. the Mersenne number M[exponent] = 2^[exponent] - 1, these files are named p[exponent] and
  424. q[exponent]. At the conclusion of a p-1 factoring run, the primary p[exponent] savefile is
  425. renamed p[exponent].s1. The reason for this is twofold:
  426. [1] So an ensuing LL or PRP-test of the same exponent (assuming a factor was not found),
  427. which uses the same-named savefile pair, does not overwrite the p-1 stage 1 data. "Why
  428. might I want to save those?" you ask. Here's why:
  429. [2] The stage 1 result in the .s1 file permits optional later stage 2 continuation runs:
  430. Say you ran p-1 on a low-memory system, such that only stage 1 was run. Later, you install
  431. more RAM, which allows for more efficient stage 2 work. At that point, it may make sense
  432. to rerun some exponents to deeper stage 2 bounds. See section [9b] for how to do this.
  433. o P-1 runs also store a file p[exponent].s1_prod encoding the precomputed small-prime-powers
  434. product used for the stage 1 left-to-right modular binary powering. This can save time on
  435. restart for large stage 1 bounds, say B1 = 5 million or larger, since as of this writing I have
  436. not made a special effort to optimize the computation of said product beyond use of 64-bit
  437. integer hardware multiply where available. (Speeding things by replacing the O(n^2) "grammar
  438. school" multiply with a subquadratic one is a possible future enhancement.) For a given
  439. stage 1 bound B1, the s1_prod file will be roughly 0.2*B1 bytes in size.
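
To make the naming scheme and the size rule of thumb above concrete, here is a small Python sketch. The helper names and dictionary keys are hypothetical (not part of Mlucas), and the 0.2*B1 figure is just the rough estimate quoted above, not an exact file size.

```python
def p1_savefile_names(exponent: int) -> dict:
    """Map each documented p-1 savefile role to its file name for M(exponent)."""
    p = f"p{exponent}"
    return {
        "primary": p,                       # primary savefile during the run
        "secondary": f"q{exponent}",        # redundant partner of the primary
        "stage1_final": f"{p}.s1",          # primary file, renamed when the p-1 run concludes
        "stage1_product": f"{p}.s1_prod",   # precomputed small-prime-powers product
        "stage2": f"{p}.s2",                # stage 2 savefile
    }


def approx_s1_prod_bytes(b1: int) -> int:
    """Rough s1_prod size: about 0.2 bytes per unit of B1, i.e. B1 / 5."""
    return b1 // 5
```

For B1 = 5 million this gives roughly 1 MB, in line with the product bitlength of roughly 1.4x B1 noted earlier (1.4/8 = 0.175 bytes per unit of B1).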
  440. ======================
  441. [13]: *** DON'Ts ***
  442. o DON'T omit actually *reading* - not 'skimming' - the latest version of the README.html.
  443. This is especially important for new users - Mlucas is not a one-or-two-click "do everything
  444. for me" program. Experienced users will still want to peruse the online readme page,
  445. especially for details about the latest releases.
  446. o DON'T skip the post-build self-test step.
  447. o DON'T run multiple Mlucas instances in a given run directory.
  448. ======================
  449. [14]: General troubleshooting:
  450. Please start by looking for posts about your issue in the thread specific to the release
  451. you are using, in the mersenneforum.org Mlucas subforum at
  452. https://www.mersenneforum.org/forumdisplay.php?f=118.
  453. If you don't find anything, make a post describing your problem.
  454. [14a]: How to safely interrupt a running instance:
  455. Note that halting (as opposed to merely suspending using 'kill -STOP' as described below) an Mlucas
  456. instance should produce a clean "at end of the current iteration, write savefiles and exit" for LL-test,
  457. PRP and p-1 stage 1 processing. For p-1 stage 2, the state machine involved is of sufficient complexity
  458. that I have not (yet) implemented a clean-exit-with-savefile-write: an interrupt will cause you to lose
  459. whatever work has been done since the last stage 2 savefile (p[expo].s2) write.
  460. To act on all Mlucas instances running on a system at once, use 'killall -STOP Mlucas' to merely suspend
  461. processing and 'killall -CONT Mlucas' to resume; to instead terminate all instances, use 'killall Mlucas'.
  462. To suspend or halt just one or a subset of multiple running instances, use 'pidof Mlucas' to find the process IDs (pids).
  463. If more than one ID comes up and you only want to interrupt or halt one or a subset thereof, you need
  464. to first figure out which process IDs map to the desired subset. If you used a batch shell script to
  465. start multiple instances, the resulting process IDs should end up being in ascending numeric order.
  466. Once you have identified the desired process IDs:
  467. [1] If you only wish to suspend processing and resume it later, use 'kill -STOP [pid]' to suspend
  468. and 'kill -CONT [pid]' to resume, where [pid] refers to the process ID. Use "man kill" for detailed
  469. info regarding the kill command and the flags it takes - note that it is not Mlucas but rather the
  470. OS which listens for this particular pair of signal types.
  471. [2] If you wish to kill the process(es), use 'Ctrl-c' for a foreground Mlucas process, which sends
  472. a SIGINT signal. For a background process, use 'kill [pid]' -- the default signal for kill is SIGTERM,
  473. which is another one of the signal types the program listens for. To kill all Mlucas instances running
  474. on a given system, use 'killall Mlucas'.
  475. If for any reason you end up with an Mlucas instance which is not responding to the above kinds of
  476. interrupt signal, use the "kill it with fire" option, 'kill -9 [pid]'. If it's a run which you want
  477. to resume at some later time, you can minimize lost runtime by waiting until the current iteration
  478. interval completes, i.e. until the next savefile update occurs, before issuing the kill.
  479. For the current list of signal types which the program listens for, see the sig_handler() function near
  480. the top of Mlucas.c . As of this writing (v20.1.1), the program listens for INT,TERM,HUP,ALRM,USR1 and USR2.
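
For users who prefer to script the suspend/resume/halt steps above rather than type the kill commands by hand, the following is a minimal Python sketch. The function names are hypothetical and not part of Mlucas; each one simply sends the same signal as the corresponding kill command, and the pid is assumed to have been obtained separately, e.g. from 'pidof Mlucas'. This is POSIX-only.

```python
import os
import signal


def suspend(pid: int) -> None:
    """Pause a process; same effect as 'kill -STOP [pid]'."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid: int) -> None:
    """Continue a suspended process; same effect as 'kill -CONT [pid]'."""
    os.kill(pid, signal.SIGCONT)


def halt(pid: int) -> None:
    """Request termination via SIGTERM; same effect as plain 'kill [pid]'."""
    os.kill(pid, signal.SIGTERM)
```

As with the shell commands, halt() during p-1 stage 2 loses whatever work has been done since the last stage 2 savefile write.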
  481. ======================
  482. Last updated: 28 Nov 2021