- [***** This file best-viewed with a 4-column tab setting, with line-wrapping *****]
- ================= Mlucas v20.1.1 command line options listing =================
- Note: besides the basic post-build self-test flags covered by the online README page
- ( http://www.mersenneforum.org/mayer/README.html ),
- the most-crucial command-line flags the average user doing GIMPS production work
- needs to be familiar with are '-cpu' and (for stage 2 of p-1 work) '-maxalloc'.
- ===============================================================================
- Sections:
- [0]: Default mode
- [1]: Post-build self-testing for various FFT-length ranges
- [2]: FFT-length setting
- [3]: FFT radix-set specification
- [4]: Mersenne-number primality testing
- [5]: Fermat-number primality testing
- [6]: Residue shift
- [7]: Probable-primality testing mode
- [8]: Iteration-number setting
- [9]: P-1 Factoring
- [9a]: Setting maximum-percentage of system free-RAM to use per stage 2 instance
- [9b]: How to run stage 2 continuation for a given stage 1 bound
- [10]: Setting threadcount and CPU core affinity
- [11]: User control options in mlucas.ini
- [12]: Savefile format and creation
- [13]: *** DON'Ts ***
- [14]: General troubleshooting
- [14a]: How to safely interrupt a running instance
- ===============================================================================
- Symbol and abbreviation key:
- <CR>: carriage return
- | : separator for one-of-the-following multiple-choice menus
- {} : denotes user-supplied numerical arguments of the type noted:
- {int} means nonnegative integer, {+int} = positive int, {float} = float. Args not
- enclosed in [] are required, in the sense that "if you invoke the command-line
- flag in question, you must follow it with a value (or set thereof) as listed".
- {arg1|arg2|...} means "pick one value from the set as the required numeric argument".
- [] : encloses optional arguments.
- The required/optional syntax is nested, e.g. for -cpu {lo[:hi[:incr]]},
- all args {int}, 'lo' is required at a minimum. If the user wants a core-range,
- ':hi' is also required, and for a range-with-constant-stride, ':incr' is also
- required.
- [] May also be used to include optional chars in a given command-line flag name,
- e.g. "-fft[len]" means "-fft" is a short-form alternative to "-fftlen".
- Vertical stacking indicates argument short 'nickname' options, e.g. for
- -argument : [blah blah]
- -arg : [blah blah]
- The stacking means that '-arg' can be used in place of '-argument'.
- Prefixes for binary multiples: The prefixes K, M and G are used herein in the binary sense,
- i.e. represent 2^10, 2^20 and 2^30 of the quantity in question. I eschew the more recently
- adopted convention (https://physics.nist.gov/cuu/Units/binary.html) of Ki, Mi and Gi for same,
- mainly because the corresponding pronounced prefixes sound silly to me: e.g. MiB stands for a
- mebibyte, which to me sounds like a computer-geek joke punchline, "mebbe I'll have a byte to eat,
- and mebbe I won't. Ha, ha, a million laughs, thanks for coming, you've been a swell audience, and
- don't forget to generously tip the waitstaff."
- ===============================================================================
- Supported command-line arguments:
- [0]:
- <CR> Default mode: looks for a worktodo.ini file in the local directory;
- if none found, prompts for manual keyboard entry
- ======================
- [1]: Post-build self-testing for various FFT-length ranges:
- FOR BEST RESULTS, RUN ANY SELF-TESTS UNDER ZERO- OR CONSTANT-LOAD CONDITIONS.
- The following self-test options will cause an mlucas.cfg file containing
- the optimal FFT radix set for the runlength(s) tested to be created (if one did not
- exist previously) or appended (if one did) with new timing data. Such a file-write is
- triggered by each complete set of FFT radices available at a given FFT length being
- tested, i.e. by a self-test without a user-specified -radset argument.
- A user-specified Mersenne exponent may be supplied via the -m flag; if none is specified,
- the program will use the largest permissible exponent for the given FFT length, based on
- its internal length-setting algorithm. If the user does use -m to specify an exponent,
- the -iters argument is also required.
- The default number of iterations for the self-test is 100 for <= 4 threads and 1000 for more
- than 4 threads. The user may also override the default via the -iters flag; while it is not
- required, it is best to use one of the standard timing-test values of -iters = {100|1000|10000},
- with the larger values being preferred for multithreaded timing tests, in order to minimize
- noise in timing data.
- Similarly, it is recommended to not use the -m flag for such tests, unless
- roundoff error levels on a given compute platform are such that the default exponent at one or
- more FFT lengths of interest prevents a reasonable sampling of available radix sets at same.
- If the user lets the program set the exponent and uses one of the aforementioned standard
- self-test iteration counts, the resulting best-timing FFT radix set will only be written to the
- resulting mlucas.cfg file if the timing-test residues match the internally-stored precomputed
- ones for the given default exponent at the iteration count in question, with eligible radix sets
- consisting of those for which the roundoff error also remains below an acceptable threshold.
- If the user instead specifies the exponent (only allowed for a single-FFT-length timing test)
- and an iteration number (required in this case), the resulting best-timing FFT radix set will
- only be written to the resulting mlucas.cfg file if the timing-test results match each other.
- This is important for tuning code parameters to your particular platform.
- Options - again note the user can override the default iteration count based on #threads via
- '-iters {+int}', though only 100|1000|10000-iteration cases have precomputed reference residues.
- The (very rough) time estimates are for 1000 iterations done using 4 or more cores.
- -s {anything other than the mnemonics below}: Self-test, user must also supply exponent (via -m or -f),
- iteration number and (optionally) FFT length to use.
- -s tiny Runs 100 (<= 4 threads) or 1000-iteration self-tests on a set of 32 Mersenne exponents,
- -s t covering FFT lengths 8K-120K. This will take around 1 minute on a fast CPU.
- -s small Runs 100 or 1000-iteration self-tests on a set of 32 Mersenne exponents, covering
- -s s FFT lengths 120K-1920K. This will take around 10 minutes on a fast CPU.
- **************** THIS IS THE ONLY SELF-TEST ORDINARY USERS ARE RECOMMENDED TO DO: *************
- * *
- * -s medium Runs 100 or 1000-iteration self-tests on a set of 16 Mersenne exponents, covering *
- * -s m FFT lengths from 2M to 7.5M. This will take around an hour on a fast CPU. *
- * *
- ***********************************************************************************************
- -s large Runs 100 or 1000-iteration self-tests on a set of 24 Mersenne exponents, covering
- -s l FFT lengths 8M to 60M. This will take around 10 hours on a fast CPU.
- -s all Runs 100 or 1000-iteration self-tests of all the above FFT lengths.
- -s a This will take around 12 hours on a fast CPU.
- -s xl [The 'h' form is short for 'huge'.] Runs 100 or 1000-iteration self-tests on a set
- -s h of 16 M(p), covering FFT lengths 64M-240M. This will take several days on a fast CPU.
- -s xxl [The 'e' form is short for 'egregious'.] Runs 100 or 1000-iteration self-tests on a set
- -s e of 9 M(p), covering FFT lengths 256M-512M. This will take several days on a fast CPU, and
- requires '-shift 0' also be added to the command line, which restriction will be removed
- at a later date.
- ======================
- [2]: FFT-length setting:
- -fft[len] {+int[K|M]}
- If {+int[K|M]} is one of the available FFT lengths, runs all FFT radix sets
- available at that length, unless the -radset flag is invoked (see below for details).
- If -fft is invoked without the -iters flag, it is assumed the user wishes to do a
- production run with a non-default FFT length; in this case the program requires a
- valid worktodo.ini-file entry with exponent not more than 5% larger than the
- default maximum for that FFT length.
- Without the optional [K|M] alphabetic suffixes (i.e. with a pure-integer argument),
- the code treats the numeric value as representing Kilodoubles. The code also supports
- a floating-point numeric argument with either a 'K' or 'M' suffix. For example, the
- following are all equivalent: -fftlen 5632, -fft 5632, -fft 5632K, -fft 5.5M, and all
- result in an FFT having length 5.5 × 2^20 = 5632 × 2^10 = 5767168 doubles. Any numeric
- value must map to a supported FFT length; for Mlucas these are of form k × 2^n, where k
- is an odd integer in the set (1,3,5,7,9,11,13,15).
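- A minimal C sketch of the argument rules just described, assuming nothing beyond them: a bare
- integer means Kdoubles, a float needs a K or M suffix, and the result must be k × 2^n doubles
- with odd k <= 15. The name fft_arg_to_doubles() is hypothetical; this is not Mlucas's own
- parser, and it does not check whether the length lies in the range the program supports.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: map a -fft argument ("5632", "5632K", "5.5M", ...) to an
   FFT length in doubles. Returns 0 if the string is malformed or the length is
   not of the form k * 2^n with k odd and in {1,3,5,...,15}. */
uint64_t fft_arg_to_doubles(const char *arg)
{
	char *end;
	double val, len, scale = 1024.0;   /* no suffix: value is in Kdoubles */
	int suffix = 0;
	uint64_t n, k;
	assert(arg != NULL);
	val = strtod(arg, &end);
	if (end == arg || !(val > 0.0))    /* no number, <= 0, or NaN */
		return 0;
	if (*end == 'K') { suffix = 1; ++end; }
	else if (*end == 'M') { suffix = 1; scale = 1024.0 * 1024.0; ++end; }
	if (*end != '\0')                  /* trailing junk */
		return 0;
	if (!suffix)                       /* a bare value must be a plain integer */
		for (const char *c = arg; *c; ++c)
			if (!isdigit((unsigned char)*c))
				return 0;
	len = val * scale;
	if (len > 9.0e15 || len != (double)(uint64_t)len)  /* must be a whole number of doubles */
		return 0;
	n = k = (uint64_t)len;
	while ((k & 1) == 0)               /* strip factors of 2 to get the odd part */
		k >>= 1;
	return (k <= 15) ? n : 0;
}
```

- E.g. "5632", "5632K" and "5.5M" all map to 5767168, while "17K" is rejected because 17 is not
- in the allowed odd-multiplier set.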
- If -fft is invoked with a user-supplied value of -iters but without a
- user-supplied exponent, the program will do the specified number of iterations
- using the default self-test Mersenne or Fermat-number exponent for that FFT length.
- If -fft is invoked with a user-supplied value of -iters and either the
- -m or -f flag and a user-supplied exponent, the program will do the specified
- number of iterations of either the Lucas-Lehmer test with starting value 4 (-m)
- or the Pe'pin test with starting value 3 (-f) on the user-specified modulus.
- In either of the latter 2 cases, the program will produce a cfg-file entry based
- on the timing results, assuming at least 50% of the available radix sets at the
- given FFT length ran the specified #iters to completion without suffering a fatal
- error of some kind, e.g. excessive roundoff error, mismatching residues versus tabulated
- ones, or "No AVX-512 support; Skipping this leading radix" for certain smaller
- leading radices.
- Use this to find the optimal radix set for a single FFT length on your hardware.
- NOTE: If you use other than the default modulus or #iters for such a single-fft-
- length timing test, it is up to you to manually verify that the residues output
- match for all fft radix combinations and that the roundoff errors are reasonable.
- ======================
- [3]: FFT radix-set specification:
- -radset {+int[comma-separated list +int,...,+int]}
- Requires a supported value of -fft {+int}[K|M] to also be specified, and a value of
- -iters. If this argument is invoked, a single-FFT-length and single-set-of-FFT-radices
- timing test is assumed. If a single {+int} argument is supplied, this indicates the
- specific index of a set of complex FFT radices to use, based on the big select table
- in the function get_fft_radices().
- Optionally, the -radset flag can take an actual set of comma-separated FFT radices.
- Said radix set must be one of those present in the aforementioned select table for the
- FFT length in question.
- For example, the following are equivalent, for -fft 6M:
- -radset 1
- -radset 192,16,32,32
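- The -fft 6M example above implies that the complex radices multiply to half the FFT length in
- doubles (192*16*32*32 = 3M complex elements = 6M doubles). The C sketch below, with the
- hypothetical name radset_product_matches(), applies that inferred rule to a comma-separated
- radix list. It is a necessary check only; the real arbiter is the get_fft_radices() table.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if the comma-separated complex radices multiply to exactly half of
   fft_doubles (the rule inferred from the -fft 6M example), else 0. */
int radset_product_matches(const char *list, uint64_t fft_doubles)
{
	uint64_t prod = 1;
	const char *p = list;
	assert(list != NULL && fft_doubles > 0);
	for (;;) {
		char *end;
		unsigned long r = strtoul(p, &end, 10);
		if (end == p || r < 2)           /* empty field or radix < 2 */
			return 0;
		if (r > fft_doubles / prod)      /* product would exceed the FFT length */
			return 0;
		prod *= r;
		if (*end == '\0')
			break;
		if (*end != ',')                 /* only commas may separate radices */
			return 0;
		p = end + 1;
	}
	return 2 * prod == fft_doubles;
}
```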
- ======================
- [4]: Mersenne-number primality testing:
- -m {+int}
- Performs a Lucas-Lehmer primality test of the Mersenne number M(int) = 2^int - 1,
- where int must be an odd prime. If -iters is also invoked, this indicates a timing test,
- and allows for suitable added arguments (optionally -fft, and if that is invoked,
- optionally, -radset) to be supplied.
- If the -fft option (and optionally -radset) is also invoked but -iters is not, the
- program first checks the first eligible line of the worktodo.ini file (any line whose first
- non-whitespace character is alphabetic is treated as eligible) to see if the assignment
- specified there is a Lucas-Lehmer test with the same exponent as specified via the -m
- argument. If so, the -fft argument is treated as a user override of the default FFT
- length for the exponent. If -radset is also invoked, this is similarly treated as a user-
- specified radix set for the user-set FFT length; otherwise the program will use the
- mlucas.cfg file to select the radix set to be used for the user-forced FFT length.
- If the first eligible worktodo.ini file entry does not specify an LL-test (i.e. does not
- begin with "Test=" or "DoubleCheck=") of an exponent matching the -m value, a set of
- timing self-tests is run on the user-specified Mersenne number using all sets of FFT
- radices available at the specified FFT length.
- If the -fft option is not invoked, the self-tests use all sets of FFT radices available at
- that exponent's default FFT length. Users can use this to find the optimal radix set for a
- given Mersenne number exponent on their hardware, similarly to the -fft option. Performs
- as many iterations as specified via the -iters flag (which is required if -m is invoked).
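- For reference, the Lucas-Lehmer recurrence behind -m (seed 4, then x := x^2 - 2 mod M(p),
- p-2 times, with M(p) prime iff the result is 0) can be sketched in C for toy exponents p <= 31,
- where ordinary 64-bit integer arithmetic suffices. Mlucas performs the squarings with FFT-based
- multiprecision arithmetic instead; lucas_lehmer() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Toy Lucas-Lehmer test of M(p) = 2^p - 1 for odd prime 3 <= p <= 31, so that
   x < 2^31 and x*x fits in 64 bits. Returns 1 if M(p) is prime, else 0. */
int lucas_lehmer(unsigned p)
{
	uint64_t m, x = 4;                   /* LL initial seed */
	assert(p >= 3 && p <= 31);
	m = (1ULL << p) - 1;
	for (unsigned i = 0; i < p - 2; ++i)
		x = (x * x + m - 2) % m;         /* x^2 - 2 mod m; adding m keeps it non-negative */
	return x == 0;
}
```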
- ======================
- [5]: Fermat-number primality testing:
- -f {+int}
- Performs a base-3 Pe'pin test on the Fermat number F(num) = 2^(2^num) + 1.
- If desired this can be invoked together with the -fft option, as for the Mersenne-number
- self-tests (see notes about the -m flag). Note that not all FFT lengths supported for -m
- are available for -f; the supported ones are of form k × 2^n, where k is an odd integer
- in the set [1,7,15,63].
- Optimal radix sets and timings are written to a fermat.cfg file. Performs as many iterations
- as specified via the -iters flag (which is required if -f is invoked).
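- Likewise, the base-3 Pe'pin test behind -f declares F(m) prime iff 3^((F(m)-1)/2) == -1
- (mod F(m)), i.e. after 2^m - 1 squarings of the seed 3. A toy C version for m <= 5 (F(5) =
- 2^32+1 is the first composite case) uses a double-and-add modular multiply so no product
- overflows 64 bits. The names pepin() and mulmod_u64() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* a*b mod n by binary double-and-add; valid for n < 2^63, so sums never overflow. */
uint64_t mulmod_u64(uint64_t a, uint64_t b, uint64_t n)
{
	uint64_t r = 0;
	a %= n;
	while (b) {
		if (b & 1) {
			r += a;
			if (r >= n) r -= n;
		}
		a += a;
		if (a >= n) a -= n;
		b >>= 1;
	}
	return r;
}

/* Toy Pe'pin test of F(m) = 2^(2^m) + 1 for 1 <= m <= 5. Returns 1 if prime. */
int pepin(unsigned m)
{
	uint64_t f, x = 3;                   /* Pe'pin-test initial seed */
	assert(m >= 1 && m <= 5);
	f = (1ULL << (1u << m)) + 1;
	for (uint64_t i = 0; i < (1ULL << m) - 1; ++i)
		x = mulmod_u64(x, x, f);         /* after the loop x = 3^(2^(2^m - 1)) */
	return x == f - 1;                   /* == -1 (mod F(m)) */
}
```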
- ======================
- [6]: Residue shift:
- -shift {+int}
- Number of bits by which to shift the initial seed (= iteration-0 residue). This initial
- shift count is doubled (modulo the binary exponent of the modulus being tested) each
- iteration; for Fermat-number tests the mod-doubling is further augmented by addition of a
- random bit, in order to keep the shift count from landing on 0 after (Fermat-number index)
- iterations and remaining 0. (Cf. https://mersenneforum.org/showthread.php?p=582525#post582525)
- Savefile residues are rightward-shifted by the current shift count
- before being written to the file; thus savefiles contain the unshifted residue, and
- separately the current shift count, which the program uses to leftward-shift the
- savefile residue when the program is restarted from interrupt.
- The shift count is a 32-bit unsigned int; any modulus having > 2^32 bits (thus using an
- FFT length of 256M or larger) requires 0 shift.
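- Why the shift count doubles each iteration can be seen in a toy C model mod M(p) for p <= 31:
- since 2^p == 1 (mod M(p)), a shift by s bits is a circular rotation of the p-bit residue, and
- squaring a residue shifted by s gives the squared residue shifted by 2s (mod p). Undoing the
- shift is a rotation by p-s. This sketch (hypothetical function names) ignores the LL "-2" term
- and the Fermat-mod random-bit variant.

```c
#include <assert.h>
#include <stdint.h>

/* x * 2^s mod M(p) for reduced x (0 <= x < M(p)), 2 < p <= 31: because
   2^p == 1 (mod M(p)), this is a left-rotation of the p-bit representation. */
uint64_t rotl_p(uint64_t x, unsigned s, unsigned p)
{
	uint64_t m;
	assert(p > 2 && p <= 31);
	m = (1ULL << p) - 1;
	assert(x < m);
	s %= p;
	if (s == 0)
		return x;
	return ((x << s) | (x >> (p - s))) & m;
}

/* Plain modular squaring x^2 mod M(p). */
uint64_t sqrmod_p(uint64_t x, unsigned p)
{
	return (x * x) % ((1ULL << p) - 1);
}

/* Shift-count update for a Mersenne-mod test: double it modulo the exponent p. */
unsigned next_shift(unsigned s, unsigned p)
{
	return (unsigned)((2ULL * s) % p);
}
```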
- ======================
- [7]: Probable-primality testing mode:
- -prp [(+int)base]
- This flag takes at most one (optional) numeric argument, and invokes PRP-test mode for the specified
- Mersenne-number self-test(s). This means that instead of running the rigorous Lucas-Lehmer
- primality test, a "pure-squarings-modified Fermat-PRP test" (cf. next paragraph) is run
- instead, to either a base specified via the optional numeric argument following the -prp
- flag, or to a default base of 3.
- The standard Fermat-PRP test of a number N consists of selecting a base b coprime to N and
- checking whether b^(N-1) == 1 (mod N), in which case N is a probable prime (PRP) to base b.
- For a Mersenne number N = M(p) = 2^p-1, N-1 has a binary representation 2^p-2 = 111...110,
- (p-1) binary ones followed by a single 0. Starting with initial seed x = b (which must be
- coprime to M(p) and further not equal to a power of 2, since due to their binary form all
- M(p) pass a Fermat-PRP test to base 2^k, whether M(p) is prime or not), the standard
- Fermat-PRP test is implemented as a left-to-right binary modular powering consisting of (p-2)
- iterations of form x := b*x^2 (a squaring and scalar multiply-by-b, mod M(p)) plus a final
- mod-squaring x := x^2 (mod M(p)), with M(p) being a probable-prime to base b if the result == 1.
- However, the rigorous-as-far-as-we-know "Gerbicz error check" requires a pure sequence
- of mod squarings, thus we replace the standard Fermat-PRP test with a modified one whereby
- we check whether b^(N+1) == b^2 (mod N). For N = M(p) we have N+1 = 2^p, thus this variant
- requires (p-1) pure mod-squarings. Any self-tests done in PRP mode will do the first (iters)
- of these.
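- The two PRP formulations described above can be checked against each other in a toy C sketch
- for p <= 31. fermat_prp() follows the left-to-right powering just described; squaring_prp()
- simply squares the seed b p times to obtain b^(2^p) = b^(N+1) and compares with b^2, making no
- claim about how Mlucas indexes its own iterations. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* a*b mod M(p), valid for a, b < 2^31. */
uint64_t mulmod_mp(uint64_t a, uint64_t b, unsigned p)
{
	return (a * b) % ((1ULL << p) - 1);
}

/* Standard Fermat-PRP b^(N-1) == 1 (mod N), N = M(p): N-1 is (p-1) ones then a 0,
   so from x = b do p-2 steps x := b*x^2, then one plain squaring. */
int fermat_prp(unsigned p, uint64_t b)
{
	uint64_t x;
	assert(p > 2 && p <= 31);
	b %= (1ULL << p) - 1;
	x = b;
	for (unsigned i = 0; i < p - 2; ++i)
		x = mulmod_mp(b, mulmod_mp(x, x, p), p);
	x = mulmod_mp(x, x, p);
	return x == 1;
}

/* Pure-squaring variant b^(N+1) == b^2 (mod N), with N+1 = 2^p. */
int squaring_prp(unsigned p, uint64_t b)
{
	uint64_t x;
	assert(p > 2 && p <= 31);
	b %= (1ULL << p) - 1;
	x = b;
	for (unsigned i = 0; i < p; ++i)
		x = mulmod_mp(x, x, p);
	return x == mulmod_mp(b, b, p);
}
```

- With b = 3, both agree that M(13) and M(31) are PRPs and that M(11) = 2047 = 23*89 is not; with
- b = 2, M(11) passes both, as the text predicts for bases that are powers of 2.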
- ======================
- [8]: Iteration-number setting:
- -iters {+int}
- Do {+int} self-test iterations of the type determined by the modulus-related options:
- -s/-m means Lucas-Lehmer test iterations with initial seed 4, -f means Pe'pin-test
- squarings with initial seed 3.
- ======================
- [9]: P-1 Factoring:
- [9a]: Setting maximum-percentage of system free-RAM to use for p-1 stage 2 work per instance:
- -maxalloc {+int}
- Maximum percentage of available system RAM to use per instance. Must be >= 10. Default = 90;
- user-specified values greater than 90 are allowed, but only recommended if the program is
- underestimating system free RAM for some reason (use the free-RAM field in the first few
- lines of Linux 'top' output to check this), or if on a system whose amount of RAM allows
- only a modest number of stage 2 buffers (less than 100, say), the default allows a buffer
- count which is just below the next-larger allowable one. The buffer counts usable by Mlucas
- (as of this writing, v20.x.y) are multiples of 24 and of 40; if you consult the table titled
- "With small-prime relocation" inside the long comment block preceding the pm1_bigstep_size()
- function in the pm1.c source file, you will see that the '%modmul' numbers (measured relative
- to stage 2 run with the minimum 24-buffers and without small-prime relocation, which was a
- late-added optimization to v20) for 24 and 40 buffers differ by 6%, which is appreciable.
- Thus if at start of stage 2 you see something like "Available memory allows up to 39
- Stage 2 residue-sized memory buffers. Using 24 Stage 2 memory buffers" in the console output,
- I suggest "kill -9"ing the process and restarting with '-maxalloc 100', seeing if that gives
- you 40 buffers, then just keeping an eye on the 'top' output to check for swapping-to-disc
- once the stage 2 inits have finished and prime-processing starts.
- On the other hand '% modmul' counts for 40 and 48 buffers differ by less than 1%, so
- it's not worth trying to force the higher value to be used. (The next significant gain to
- be had is at 72 = 3*24 buffers.)
- Under MacOS the default is 50% of available (not free) RAM.
- If the system is swapping between RAM and HD/SSD during stage 2, as evidenced by
- free-RAM dropping to near-0 and 'kswapd' entries appearing among the CPU-%-sorted entries
- of the 'top' output, the value needs to be lowered. If the default is leaving plenty
- of available RAM and the resulting buffer count is under 100, it is worth trying to force
- a higher fraction to be used, by invoking -maxalloc [some value > 50].
- -pm1_s2_nbuf {+int}
- Since available RAM fluctuates depending on current load, this flag alternatively
- allows the user to set the maximum number of p-1 stage 2 buffers to use per instance.
- Currently, the number of stage 2 buffers must be a multiple of 24 or 40; if the user-
- set maximum value is not such, the largest such multiple <= the user-specified value
- is used for stage 2 work. For stage 2 restarts there is an added constraint related
- to small-prime relocation, namely that if stage 2 was begun with a multiple of 24 or
- 40 buffers, the restart-value must also be a multiple of the same base-count, 24 or 40.
- Said constraint will be automatically enforced. If the resulting buffer count exhausts
- available memory, performance will suffer due to system memory-swapping, thus this flag
- should only be invoked by users who know what they are doing.
- Only one of the 2 flags may be set via the command line.
- These 2 flags are only important in the context of stage 2 of p-1 factoring, which
- will be done automatically before a Lucas-Lehmer primality or probable-primality-test
- if the GIMPS assignment in question indicates that some p-1 effort is warranted.
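- The -pm1_s2_nbuf rounding rule above can be sketched in C as follows (hypothetical function
- name; the actual Mlucas logic may differ in details). restart_base = 0 denotes a fresh stage 2,
- while 24 or 40 restricts the result to multiples of the base count the interrupted stage 2 was
- begun with.

```c
#include <assert.h>

/* Largest multiple of 24 or 40 not exceeding user_max (or only of restart_base
   for a stage 2 restart). Returns 0 if no allowed buffer count fits. */
unsigned s2_nbuf_round(unsigned user_max, unsigned restart_base)
{
	unsigned m24, m40;
	assert(restart_base == 0 || restart_base == 24 || restart_base == 40);
	m24 = (user_max / 24) * 24;
	m40 = (user_max / 40) * 40;
	if (restart_base == 24)
		return m24;
	if (restart_base == 40)
		return m40;
	return (m24 > m40) ? m24 : m40;
}
```

- E.g. a maximum of 39 yields 24, matching the console message quoted under -maxalloc, and 47
- yields 40.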
- [9b]: How to run stage 2 continuation for a given stage 1 bound:
- On completion of an initial p-1 run with stage bounds B1 and B2 and a contiguous stage 2
- (i.e. one which covered the interval [B1,B2]), the program leaves the respective p-1 residues
- in a pair of savefiles named p[exp].s1 and p[exp].s2, respectively. The former of these may
- be re-used for one or more deeper stage 2 interval runs, in distributed fashion (multiple
- program instances, each of which processes a given stage 2 interval so as to cover a desired
- expanded prime interval), if desired. Say the original run used a worktodo.ini assignment
- Pminus1={aid,}k,b,n,c,B1,B2
- where {aid} is a 32-hexit assignment ID (which may be omitted or filled with 'n/a' - the
- quotes are for emphasis only, they must not appear in the actual assignment - if the user
- wishes to create such an assignment for him-or-herself). Here, k,b,n,c define the desired
- modulus k*b^n+c (for prime-exponent Mersenne number M(p) = 2^p-1, k = 1, b = 2, n = p, c = -1;
- for Fermat number F[m] = 2^(2^m)+1, k = 1, b = 2, n = 2^m, c = +1) and B1 and B2 are the p-1
- stage bounds. If the original used a Pfactor assignment:
- Pfactor={aid},k,b,n,c,TF_BITS,ll_tests_saved_if_factor_found
- then you must read the resulting p-1 stage bounds from the run log file, p[exp].stat .
- Once you have the original-run stage bounds in hand, for a desired stage 2 continuation
- interval with bounds [B2_start, B2], the proper assignment syntax is
- Pminus1={aid},k,b,n,c,B1,B2,TF_BITS,B2_start[,known_factors]
- where any known factors should be given as a comma-separated list bookended with double
- quotes, e.g. known factors f1 and f2 appear as "f1,f2". These will not be reported if
- found in any of the GCD steps which test for a factor having been found, and the run
- will only exit if a new factor (one not appearing in the known_factors list) is found.
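- A hypothetical C helper that formats such a continuation assignment for a Mersenne number,
- assuming the k,b,n,c fields describe the modulus k*b^n+c as in Prime95-style worktodo entries
- (so M(p) is 1,2,p,-1). The exponent and bounds in the usage example are placeholders, not
- recommended values.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "Pminus1={aid},1,2,p,-1,B1,B2,TF_BITS,B2_start[,"known_factors"]" to buf.
   known_factors is the bare comma-separated list (or NULL for none); the quotes
   are added here. Returns the snprintf() result, or -1 if the list contains a quote. */
int format_pm1_continuation(char *buf, size_t len, const char *aid, unsigned long p,
	unsigned long long b1, unsigned long long b2, unsigned tf_bits,
	unsigned long long b2_start, const char *known_factors)
{
	assert(buf != NULL && aid != NULL);
	if (known_factors == NULL)
		return snprintf(buf, len, "Pminus1=%s,1,2,%lu,-1,%llu,%llu,%u,%llu",
			aid, p, b1, b2, tf_bits, b2_start);
	if (strchr(known_factors, '"') != NULL)
		return -1;
	return snprintf(buf, len, "Pminus1=%s,1,2,%lu,-1,%llu,%llu,%u,%llu,\"%s\"",
		aid, p, b1, b2, tf_bits, b2_start, known_factors);
}
```

- E.g. with aid "n/a", p = 1000003, B1 = 1000000, B2 = 100000000, TF_BITS = 63 and
- B2_start = 30000000 this produces Pminus1=n/a,1,2,1000003,-1,1000000,100000000,63,30000000 .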
- ======================
- [10]: Setting threadcount and CPU core affinity:
- Note: As of this writing (v20.x.y) setting core affinity is not effective when running
- on Windows Subsystem for Linux (WSL), presumably due to virtualization. Processes
- will core-hop, negatively impacting efficiency. Also, on WSL, core count is limited
- to 64 due to MS Windows' "processor group" construct. In practice it can be even less:
- on a 68-core, 4-way-hyperthreaded Xeon Phi 7250, for example, a running Mlucas instance
- and the WSL session in which it runs will occupy at most 16 cores x their 4 hyperthreads,
- or 4 cores x their 4 hyperthreads, depending on which processor group is allocated.
- (Obsolescent - not recommended, please use the -cpu flag instead:)
- -nthread {int}
- For multithread-enabled builds, run with this many threads.
- If the user does not specify a thread count, the default is to run single-threaded
- with that thread's affinity set to logical core 0.
- AFFINITY: The code will attempt to set the affinity of the resulting threads
- 0:n-1 to the same-indexed logical processor cores. Whether these are distinct physical
- cores is entirely up to the CPU vendor's core-numbering scheme: e.g. Intel numbers its
- cores such that they are, but AMD does not. For this reason as of v17 this option is deprecated in favor of
- the -cpu flag, whose usage is detailed below, with the online README page providing
- guidance for the core-numbering schemes of popular CPU vendors.
- If n exceeds the available number of logical processor cores (call it #cpu), the
- program will halt with an error message.
- For greater control over affinity setting, use the -cpu option, which supports two
- distinct core-specification syntaxes (which may be mixed together), as follows:
- (Recommended variant of affinity setting:)
- -cpu {lo[:hi[:incr]]}
- (All args {int} here.) Set thread/CPU affinity. NOTE: This flag and -nthread are
- mutually exclusive! If -cpu is used, the threadcount is inferred from the numeric-argument
- triplet which follows. If only the 'lo' argument of the triplet is supplied, it means
- "run single-threaded with affinity to logical core {lo}." (Note that in the absence
- of hyperthreading, logical and physical cores are the same.)
- If the increment (third) argument of the triplet is omitted, it defaults to incr = 1.
- The CPU set encoded by the integer-triplet argument to -cpu corresponds to the
- values of the integer loop index i in the C-loop for(i = lo; i <= hi; i += incr),
- excluding the loop-exit value of i. Thus '-cpu 0:3' and '-cpu 0:3:1' are both
- exactly equivalent to '-nthread 4', whereas '-cpu 0:6:2' and '-cpu 0:7:2' both
- specify affinity setting to logical cores 0,2,4,6, assuming said cores exist.
- Lastly, note that no whitespace is permitted within the colon-separated numeric field.
- -cpu {triplet0[,triplet1,...]} This is simply an extended version of the above affinity-
- setting syntax in which each of the comma-separated 'triplet' subfields is in the above
- form and, analogously to the one-triplet-only version, no whitespace is permitted within
- the colon-and-comma-separated numeric field. Thus '-cpu 0:3,8:11' and '-cpu 0:3:1,8:11:1'
- both specify an 8-threaded run with affinity set to logical core quartets 0-3 and 8-11,
- whereas '-cpu 0:3:2,8:11:2' means run 4-threaded on cores 0,2,8,10. As described for the
- -nthread option, it is an error for any core index to exceed the available number of logical
- processor cores.
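- The triplet semantics above map directly onto the C loop quoted there. A hypothetical parser,
- parse_cpu_arg(), might expand a -cpu argument into its list of logical-core indices as below;
- treating hi < lo or incr < 1 as errors is an assumption, and checking the indices against the
- system's actual logical-core count is left to the caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Expands "lo[:hi[:incr]][,lo[:hi[:incr]]...]" into cores[] (at most max entries).
   Returns the resulting thread count, or -1 on a syntax error or overflow of cores[]. */
int parse_cpu_arg(const char *arg, long *cores, int max)
{
	const char *p = arg;
	char *end;
	int n = 0;
	assert(arg != NULL && cores != NULL && max > 0);
	for (;;) {
		long lo, hi, incr = 1;
		if (*p < '0' || *p > '9')        /* rejects whitespace, signs, empty fields */
			return -1;
		lo = hi = strtol(p, &end, 10);   /* 'lo' alone means a single core */
		if (*end == ':') {
			p = end + 1;
			if (*p < '0' || *p > '9') return -1;
			hi = strtol(p, &end, 10);
			if (*end == ':') {
				p = end + 1;
				if (*p < '0' || *p > '9') return -1;
				incr = strtol(p, &end, 10);
			}
		}
		if (hi < lo || incr < 1)
			return -1;
		/* Same set as for(i = lo; i <= hi; i += incr), written so i cannot overflow. */
		for (long i = lo; ; i += incr) {
			if (n == max) return -1;
			cores[n++] = i;
			if (hi - i < incr) break;
		}
		if (*end == '\0')
			return n;
		if (*end != ',')
			return -1;
		p = end + 1;
	}
}
```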
- ======================
- [11]: User control options in mlucas.ini:
- Any mlucas.ini file in the user's run directory will be checked at program (re)start
- for the supported options listed below. Each such option is assumed to be formatted as
- [ws][optname][ws][=][ws][value][ws][comment]
- with [ws] denoting whitespace and [value] a numeric value parseable and representable
- as an IEEE-754 64-bit double-precision float. The numeric value may be followed by anything,
- so long as it does not affect parsing of the preceding numeric entry. For instance, mlucas.ini
- on a low-RAM system might contain
- CheckInterval = 10000 /* I'm using the space right of the value for a C-style comment. */
- LowMem = 1 // But one could also use C++ comment format...
- InterimGCD = 0 # Or bash-shell and Python-style comment format...
- LowMem = 2 Or no comment delimiter at all. Note that the program uses the first entry
- matching a given option name, so the second LowMem entry here will be ignored.
- This specifies savefile-writes for LL/PRP/p-1 every 10000 iterations, and allows PRP-testing
- but excludes p-1 stage 2. The InterimGCD option setting is moot in this case since it only
- applies to p-1 stage 2.
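- A minimal C sketch of reading one option from text in the mlucas.ini layout described above,
- with first-match-wins semantics. ini_lookup() is a hypothetical name, and the real parser may
- treat edge cases (e.g. a missing '=' or an unparseable value) differently; here such lines are
- simply skipped.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Looks up option 'name' in text laid out as [ws][optname][ws][=][ws][value][ws][comment].
   The first matching line whose value parses as a double wins; anything after the
   value is ignored. Returns 1 and stores the value in *out on success, else 0. */
int ini_lookup(const char *text, const char *name, double *out)
{
	size_t len;
	const char *line = text;
	assert(text != NULL && name != NULL && out != NULL);
	len = strlen(name);
	assert(len > 0);
	while (*line) {
		const char *eol = strchr(line, '\n');
		const char *p = line;
		if (eol == NULL)
			eol = line + strlen(line);
		while (*p == ' ' || *p == '\t')
			++p;
		if (strncmp(p, name, len) == 0) {
			p += len;                    /* a longer name such as "LowMemory" fails the '=' test */
			while (*p == ' ' || *p == '\t')
				++p;
			if (*p == '=') {
				char *end;
				double v = strtod(p + 1, &end);
				if (end != p + 1 && end <= eol) {   /* value must be on this line */
					*out = v;
					return 1;
				}
			}
		}
		line = (*eol == '\n') ? eol + 1 : eol;
	}
	return 0;
}
```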
- Note that the option-name and comment formats in mlucas.ini differ from those in local.ini -
- the latter is read and updated by the several primenet.py work-management scripts described
- on the online Mlucas README.html file, thus the flag names therein should be all-lowercase
- and comments must conform to Python rules (https://docs.python.org/3/library/configparser.html):
- "Configuration files may include comments, prefixed by specific characters (# and ;).
- Comments may appear on their own in an otherwise empty line, or may be entered in lines
- holding values or section names. In the latter case, they need to be preceded by a
- whitespace character to be recognized as a comment. (For backwards compatibility,
- only ; starts an inline comment, while # does not.)"
- o FREQUENCY OF SAVEFILE WRITES: The default frequency is threadcount-dependent: every 10000
- iterations for <= 4 threads; every 100,000 iterations for more than 4 threads. (Note that
- as of v20, if the exponent ratio in the "p[ = ***]/pmax_rec" informational line printed at
- run-start is > 0.97, the program will set ITERS_BETWEEN_CHECKPOINTS to the smaller of any
- user-set value or 10000, irrespective of the threadcount.)
- At run-start, you will see this captured in the informational terminal output (which gets
- piped to nohup.out if you prefix your instance invocation with "nohup", as is recommended):
- Set affinity for the following 4 cores: 0.1.2.3.
- NTHREADS = 4
- Setting ITERS_BETWEEN_CHECKPOINTS = 10000.
- For p-1 factoring, slightly different terminology is used for .stat-file entries documenting
- savefile-writes: "S1 bit = ..." reflects which bit of the p-1 stage 1 small-prime-powers
- product (whose bitlength is roughly 1.4x the stage 1 bound B1) has been reached in stage 1,
- and "S2 at q = ..." reflects which stage 2 primes have been processed (up to the prime less
- than and nearest the printed value). However, in all three cases the underlying savefile
- frequency is based on the same
- metric: the number of mod-M(p) multiplies done during the current task. Thus p-1 stage 1
- checkpoints will occur at roughly the same wall-clock frequency as for LL and PRP-test ones;
- the stage 2 savefile-update frequency will be perhaps 10-20% slower, reflecting the fact that
- while LL/PRP/stage-1 all do chains of in-place mod-M(p) "autosquarings", p-1 stage 2 does
- 2-input mod-M(p) multiplies, each of which requires 2x more data to stream between the CPU
- and the cache+memory subsystem.
- The ITERS_BETWEEN_CHECKPOINTS value can be customized by adding a "CheckInterval = [value]"
- line to one's mlucas.ini file, but note that there are constraints on the value related to
- the Gerbicz-checking done for PRP tests. Specifically, the CheckInterval value must be a
- multiple of 1000 and must divide 1 million. Violation of these constraints will trigger an
- assertion-exit if a PRP-test is attempted.
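- The CheckInterval constraint can be expressed as a one-line predicate (hypothetical name).
- Values satisfying it include 1000, 10000 and 100000; 2500 (not a multiple of 1000) and 3000
- (does not divide 1 million) are rejected.

```c
#include <assert.h>

/* 1 if ci is a positive multiple of 1000 that divides 1,000,000, else 0. */
int check_interval_ok(unsigned long ci)
{
	return ci > 0 && ci % 1000 == 0 && 1000000UL % ci == 0;
}
```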
- o RUNNING ON LOW-MEMORY SYSTEMS: The LowMem option provides two supported low-memory run
- modes for low-RAM systems:
- LowMem = 1 allows PRP-testing but excludes p-1 stage 2. For a given exponent, this
- mode will use 2-4x as much working memory for PRP-tests as it does for LL-tests.
- LowMem = 2 excludes both PRP-testing and p-1 stage 2. This is the minimum-memory option.
- For QA-testing purposes, this allows self-tests to some modest number of iterations to
- be done up to the maximum supported FFT length of 512M on systems with 8GB of memory.
- o DISABLING INTERIM GCDs IN P-1 STAGE 2: Set InterimGCD = 0 in mlucas.ini. This will cause
- the program to wait until any p-1 stage 2 is finished to take a GCD (check for a factor),
- irrespective of the depth of the stage.
- ======================
- [12]: Savefile format and creation:
- Mlucas creates savefiles (a.k.a. "checkpoint" files) at regular intervals (see section [11]
- for how to override the default interval) to permit a safe restart in case a run is
- interrupted for any reason, such as the host machine going down or a program crash. All
- savefile data are stored in bytewise minimum-size and endian-independent form, according to the
- schema implemented in the [read|write]_ppm1_savefiles function pair in the Mlucas.c source file.
- (Cf. the long comment "Set of functions to Read/Write full-length residue data in bytewise format"
- preceding said set of functions.)
- Each savefile write is reflected in the latest-progress summary lines of the run logfile (.stat file).
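- The following is a generic illustration of what "bytewise minimum-size and endian-independent"
- storage means; it does NOT reproduce the schema in [read|write]_ppm1_savefiles, and the function
- names are hypothetical. A p-bit residue held in 64-bit words is written as exactly ceil(p/8)
- bytes, least-significant byte first, using shifts rather than a raw memory copy, so the byte
- order on disk does not depend on the host's endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch, NOT the Mlucas savefile schema: serialize a p-bit
   residue held in 64-bit words (least-significant word first) into exactly
   ceil(p/8) bytes, least-significant byte first. Bytes are extracted with
   shifts, so the output is identical on little- and big-endian hosts. */
static size_t residue_to_bytes(const uint64_t *words, uint32_t p, uint8_t *out)
{
	size_t nbytes = ((size_t)p + 7) / 8;
	for (size_t i = 0; i < nbytes; i++)
		out[i] = (uint8_t)(words[i / 8] >> (8 * (i % 8)));
	return nbytes;
}

/* Inverse: rebuild the 64-bit words from the byte stream. nwords must be
   ceil(p/64); bits above p in the top word are left zero. */
static void bytes_to_residue(const uint8_t *in, uint32_t p, uint64_t *words, size_t nwords)
{
	size_t nbytes = ((size_t)p + 7) / 8;
	memset(words, 0, nwords * sizeof *words);
	for (size_t i = 0; i < nbytes; i++)
		words[i / 8] |= (uint64_t)in[i] << (8 * (i % 8));
}
```

- In this form a full residue of M(p) occupies about p/8 bytes, e.g. roughly 12.5 MB for p near
- 100 million, which is why a PRP savefile holding two residues is about twice that size.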
- o PRP-test savefiles store both the current test residue and a Gerbicz error-check residue, and
- are thus roughly twice the size of LL-test and p-1 factoring savefiles.
- o LL, PRP-test and p-1 stage 1 residues are stored in redundant savefile pairs. For work on
- the Mersenne number M[exponent] = 2^[exponent] - 1, these files are named p[exponent] and
- q[exponent]. At the conclusion of a p-1 factoring run, the primary p[exponent] savefile is
- renamed p[exponent].s1. The reason for this is twofold:
- [1] So an ensuing LL or PRP-test of the same exponent (assuming a factor was not found),
- which uses the same-named savefile pair, does not overwrite the p-1 stage 1 data. "Why
- might I want to save those?" you ask. Here's why:
- [2] The stage 1 result in the .s1 file permits optional later stage 2 continuation runs:
- Say you ran p-1 on a low-memory system, such that only stage 1 was run. Later, you install
- more RAM, which allows for more efficient stage 2 work. At that point, it may make sense
- to rerun some exponents to deeper stage 2 bounds. See section [9b] for how to do this.
- o P-1 runs also store a file p[exponent].s1_prod encoding the precomputed small-prime-powers
- product used for the stage 1 left-to-right modular binary powering. This can save time on
- restart for large stage 1 bounds, say B1 = 5 million or larger, since as of this writing I have
- not made a special effort to optimize the computation of said product beyond use of 64-bit
- integer hardware multiply where available. (Speeding things up by replacing the O(n^2) "grammar
- school" multiply with a subquadratic one is a possible future enhancement.) For a given
- stage 1 bound B1, the s1_prod file will be roughly 0.2*B1 bytes in size.
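- The "roughly 0.2*B1 bytes" figure for the s1_prod file follows from the prime number theorem:
- the natural log of the product of the largest powers of all primes <= B1 is Chebyshev's
- psi(B1) ~ B1, so the product has about B1/ln(2) = 1.44*B1 bits, i.e. about 0.18*B1 bytes. The
- sketch below checks this numerically. It assumes the product is exactly that prime-power product
- (the actual Mlucas stage 1 exponent may include extra factors), and its function names are
- hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* log2(x) for x >= 1 without libm: integer part from the top set bit,
   fractional part by the classic repeated-squaring bit extraction. */
static double log2_u64(uint64_t x)
{
	int n = 0;
	while ((x >> n) > 1) n++;
	double y = (double)x / (double)((uint64_t)1 << n);  /* y in [1,2) */
	double result = n, bit = 0.5;
	for (int i = 0; i < 40; i++, bit *= 0.5) {
		y *= y;
		if (y >= 2.0) { y *= 0.5; result += bit; }
	}
	return result;
}

/* Estimated size in bytes of the product over primes p <= B1 of the largest
   power of p not exceeding B1 (an ASSUMED model of the stage 1 product).
   Uses a sieve of Eratosthenes; returns -1.0 if allocation fails. */
static double s1_prod_bytes_estimate(uint32_t B1)
{
	char *composite = calloc((size_t)B1 + 1, 1);
	if (!composite) return -1.0;
	double bits = 0.0;
	for (uint64_t p = 2; p <= B1; p++) {
		if (composite[p]) continue;
		for (uint64_t m = p * p; m <= B1; m += p) composite[m] = 1;
		uint64_t q = p;                  /* grow q to the largest power of p <= B1 */
		while (q <= B1 / p) q *= p;
		bits += log2_u64(q);
	}
	free(composite);
	return bits / 8.0;
}
```

- For B1 = 10 the product is 8*9*5*7 = 2520, about 1.41 bytes; for B1 = 10^6 the estimate comes
- out near 0.18*B1 bytes, so a B1 = 5 million run would produce an s1_prod file of roughly 0.9 MB.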
- ======================
- [13]: *** DON'Ts ***
- o DON'T omit actually *reading* - not 'skimming' - the latest version of the README.html.
- This is especially important for new users - Mlucas is not a one-or-two-click "do everything
- for me" program. Experienced users will still want to peruse the online readme page,
- especially for details about the latest releases.
- o DON'T skip the post-build self-test step.
- o DON'T run multiple Mlucas instances in a given run directory.
- ======================
- [14]: General troubleshooting:
- Please start by looking for posts about your issue in the mersenneforum.org thread for the
- Mlucas release you are using, in the Mlucas subforum at
- https://www.mersenneforum.org/forumdisplay.php?f=118.
- If you don't find anything, make a post describing your problem.
- [14a]: How to safely interrupt a running instance:
- Note that halting an Mlucas instance (as opposed to merely suspending it using 'kill -STOP', as described
- below) should produce a clean "at end of the current iteration, write savefiles and exit" for LL-test,
- PRP and p-1 stage 1 processing. For p-1 stage 2, the state machine involved is of sufficient complexity
- that I have not (yet) implemented a clean exit with savefile write: an interrupt will cause you to lose
- whatever work has been done since the last stage 2 savefile (p[expo].s2) write.
- To suspend all Mlucas instances running on a system, use 'killall -STOP Mlucas'; to resume, use
- 'killall -CONT Mlucas'. To instead terminate all instances, use 'killall Mlucas'.
- To suspend or halt just one or a subset of multiple running instances, use 'pidof Mlucas' to list
- their process IDs (pids). If more than one pid comes up, you first need to figure out which pids
- map to the desired subset. If you used a batch shell script to start multiple instances, the
- resulting pids should normally be in ascending numeric order, matching the launch order.
- Once you have identified the desired process IDs:
- [1] If you only wish to suspend processing and resume it later, use 'kill -STOP [pid]' to suspend
- and 'kill -CONT [pid]' to resume, where [pid] refers to the process ID. Use "man kill" for detailed
- info regarding the kill command and the flags it takes - note that it is not Mlucas but rather the
- OS which listens for this particular pair of signal types.
- [2] If you wish to kill the process(es), use 'Ctrl-c' for a foreground Mlucas process, which sends
- a SIGINT signal. For a background process, use 'kill [pid]'; the default signal sent by kill is
- SIGTERM, which is another of the signal types the program listens for.
- If for any reason you end up with an Mlucas instance which is not responding to the above kinds of
- interrupt signal, use the "kill it with fire" option, 'kill -9 [pid]'. If it's a run which you want
- to resume at some later time, you can minimize lost runtime by waiting until the current checkpoint
- interval completes, i.e. until the next savefile update occurs.
- For the current list of signal types which the program listens for, see the sig_handler() function near
- the top of Mlucas.c. As of this writing (v20.1.1), the program listens for INT, TERM, HUP, ALRM, USR1 and USR2.
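- The clean-exit behavior described above follows a common POSIX pattern: the signal handler only
- sets a flag, and the main loop checks it at a safe point (the end of an iteration) before writing
- savefiles and exiting. The sketch below is a hypothetical illustration of that pattern, NOT the
- Mlucas sig_handler(); the function names are made up and the savefile write is a placeholder.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Set by the handler, polled by the main loop. Hypothetical, not Mlucas code. */
static volatile sig_atomic_t stop_signal = 0;

static void on_signal(int sig)
{
	stop_signal = sig;   /* async-signal-safe: just record the signal and return */
}

/* Install on_signal for the signal types listed above. Returns 0 on success. */
static int install_handlers(void)
{
	const int sigs[] = { SIGINT, SIGTERM, SIGHUP, SIGALRM, SIGUSR1, SIGUSR2 };
	struct sigaction sa;
	memset(&sa, 0, sizeof sa);
	sa.sa_handler = on_signal;
	sigemptyset(&sa.sa_mask);
	for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
		if (sigaction(sigs[i], &sa, NULL) != 0) return -1;
	return 0;
}

/* Toy main loop: each pass stands in for one mod-M(p) iteration. Returns the
   iteration at which a stop was honored, or max_iters if none arrived. */
static long run_until_signalled(long max_iters)
{
	for (long it = 1; it <= max_iters; it++) {
		/* ... one iteration of work would go here ... */
		if (stop_signal) {
			/* a real program would write its savefiles here, then exit */
			return it;
		}
	}
	return max_iters;
}
```

- Note that SIGSTOP and SIGKILL cannot be caught or ignored by any process, which is why
- 'kill -STOP' and 'kill -9' take effect immediately, without any savefile write by the program.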
- ======================
- Last updated: 28 Nov 2021