gitdiffcore(7)
==============

NAME
----
gitdiffcore - Tweaking diff output

SYNOPSIS
--------
[verse]
'git diff' *

DESCRIPTION
-----------

The diff commands 'git diff-index', 'git diff-files', and 'git diff-tree'
can be told to manipulate differences they find in
unconventional ways before showing 'diff' output. The manipulation
is collectively called "diffcore transformation". This short note
describes what they are and how to use them to produce 'diff' output
that is easier to understand than the conventional kind.

The chain of operation
----------------------

The 'git diff-{asterisk}' family works by first comparing two sets of
files:

- 'git diff-index' compares contents of a "tree" object and the
  working directory (when `--cached` flag is not used) or a
  "tree" object and the index file (when `--cached` flag is
  used);

- 'git diff-files' compares contents of the index file and the
  working directory;

- 'git diff-tree' compares contents of two "tree" objects.

In all of these cases, the commands themselves first optionally limit
the two sets of files by any pathspecs given on their command lines,
and compare corresponding paths in the two resulting sets of files.

The pathspecs are used to limit the world diff operates in. They remove
the filepairs outside the specified sets of pathnames. For example, if the
input set of filepairs included:

------------------------------------------------
:100644 100644 bcd1234... 0123456... M junkfile
------------------------------------------------

but the command invocation was `git diff-files myfile`, then the
junkfile entry would be removed from the list because only "myfile"
is under consideration.
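
Here is a minimal Python sketch of this limiting step. It assumes plain pathspecs that name a file or a leading directory, and it ignores the glob and magic forms that real pathspecs support. The helper name `limit_by_pathspec` is hypothetical and not part of Git.

```python
def limit_by_pathspec(paths, pathspecs):
    """Keep only paths covered by at least one pathspec (hypothetical helper).

    Assumes plain pathspecs: a pathspec covers a path if it is the same
    path or one of its leading directories. No pathspecs means no limiting.
    """
    if not pathspecs:
        return list(paths)
    kept = []
    for path in paths:
        for spec in pathspecs:
            spec = spec.rstrip("/")
            if path == spec or path.startswith(spec + "/"):
                kept.append(path)
                break
    return kept


print(limit_by_pathspec(["junkfile", "myfile"], ["myfile"]))  # ['myfile']
```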

The result of comparison is passed from these commands to what is
internally called "diffcore", in a format similar to what is output
when the -p option is not used. For example:

------------------------------------------------
in-place edit  :100644 100644 bcd1234... 0123456... M file0
create         :000000 100644 0000000... 1234567... A file4
delete         :100644 000000 1234567... 0000000... D file5
unmerged       :000000 000000 0000000... 0000000... U file6
------------------------------------------------
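
Each line above holds two modes, two object names, a status letter and one or two paths. The following Python sketch splits such a line into its fields, starting from the colon and without the leading label such as "create". The `FilePair` and `parse_raw_line` names are hypothetical. The sketch splits on whitespace, so it would mis-handle paths that contain spaces; actual diff-raw output puts a tab before each path.

```python
from typing import NamedTuple, Optional, Tuple


class FilePair(NamedTuple):
    src_mode: str
    dst_mode: str
    src_sha: str
    dst_sha: str
    status: str              # one letter: M, A, D, U, R, C, ...
    score: Optional[int]     # 100 for "R100"; None for a bare letter
    paths: Tuple[str, ...]   # one path, or (source, destination) for R/C


def parse_raw_line(line: str) -> FilePair:
    """Parse one raw-style line such as ':100644 100644 abc... def... M file0'.

    Splits on whitespace for simplicity, so paths containing spaces are not
    handled; real diff-raw output puts a tab before each path.
    """
    if not line.startswith(":"):
        raise ValueError("raw lines start with ':'")
    fields = line[1:].split()
    if len(fields) < 6:
        raise ValueError("expected modes, object names, status and path(s)")
    src_mode, dst_mode, src_sha, dst_sha, status_field = fields[:5]
    score = int(status_field[1:]) if len(status_field) > 1 else None
    return FilePair(src_mode, dst_mode, src_sha, dst_sha,
                    status_field[0], score, tuple(fields[5:]))


pair = parse_raw_line(":100644 100644 0123456... 0123456... R100 fileX file0")
print(pair.status, pair.score, pair.paths)  # R 100 ('fileX', 'file0')
```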

The diffcore mechanism is fed a list of such comparison results
(each of which is called a "filepair", although at this point each
of them talks about a single file), and transforms such a list
into another list. There are currently 5 such transformations:

- diffcore-break
- diffcore-rename
- diffcore-merge-broken
- diffcore-pickaxe
- diffcore-order

These are applied in sequence. The set of filepairs that 'git diff-{asterisk}'
commands find is used as the input to diffcore-break, and
the output from diffcore-break is used as the input to the
next transformation. The final result is then passed to the
output routine, which generates either diff-raw format (see the Output
format sections of the manual for 'git diff-{asterisk}' commands) or
diff-patch format.
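
The chaining described above is function composition over lists of filepairs. In this Python sketch every stage except the last is a placeholder that returns its input unchanged. The last stage sorts by name as a toy stand-in for diffcore-order. None of these names come from Git. A stage whose option was not given, for example diffcore-break without -B, could simply be left out of the list.

```python
from typing import Callable, List

# A stage maps a list of filepairs to a new list of filepairs.
Stage = Callable[[list], list]


def run_diffcore(filepairs: list, stages: List[Stage]) -> list:
    """Feed each stage's output to the next stage, in order."""
    for stage in stages:
        filepairs = stage(filepairs)
    return filepairs


def unchanged(filepairs: list) -> list:
    # Placeholder for a transformation that is not sketched here.
    return list(filepairs)


def by_name(filepairs: list) -> list:
    # Toy stand-in for diffcore-order: plain alphabetical sorting.
    return sorted(filepairs)


# Hypothetical stand-ins, listed in the order this page gives.
STAGES = [unchanged,  # diffcore-break
          unchanged,  # diffcore-rename
          unchanged,  # diffcore-merge-broken
          unchanged,  # diffcore-pickaxe
          by_name]    # diffcore-order

print(run_diffcore(["file5", "file0", "file4"], STAGES))
# ['file0', 'file4', 'file5']
```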

diffcore-break: For Splitting Up Complete Rewrites
--------------------------------------------------

The first transformation in the chain is diffcore-break, and it is
controlled by the -B option to the 'git diff-{asterisk}' commands. It is
used to detect a filepair that represents a "complete rewrite" and
break such a filepair into two filepairs that represent delete and
create. For example, if the input contained this filepair:

------------------------------------------------
:100644 100644 bcd1234... 0123456... M file0
------------------------------------------------

and if it detects that the file "file0" is completely rewritten,
it changes it to:

------------------------------------------------
:100644 000000 bcd1234... 0000000... D file0
:000000 100644 0000000... 0123456... A file0
------------------------------------------------

For the purpose of breaking a filepair, diffcore-break examines
the extent of changes between the contents of the files before
and after modification (i.e. the contents that have "bcd1234..."
and "0123456..." as their SHA-1 content ID, in the above
example). The amount of original content deleted and the amount of
new material inserted are added together, and if the sum exceeds
the "break score", the filepair is broken into two. The break
score defaults to 50% of the size of the smaller of the original
and the result (i.e. if the edit shrinks the file, the size of
the result is used; if the edit lengthens the file, the size of
the original is used), and can be customized by giving a number
after the "-B" option (e.g. "-B75" to tell it to use 75%).
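
The break rule as worded above reduces to a single comparison. This Python sketch follows the prose on this page, not Git's C source, which may compute the threshold differently. Sizes are in whatever unit the change is measured in; the example assumes bytes. `should_break` is a hypothetical name.

```python
def should_break(deleted: int, inserted: int,
                 src_size: int, dst_size: int,
                 break_score: float = 0.5) -> bool:
    """Decide whether a filepair counts as a complete rewrite.

    Follows the prose above: deletion plus insertion is compared with
    break_score times the size of the smaller of the two versions.
    """
    threshold = break_score * min(src_size, dst_size)
    return deleted + inserted > threshold


# A 100-byte file with 30 bytes replaced: 30 + 30 = 60 > 50, so it breaks.
print(should_break(30, 30, 100, 100))        # True
# The same edit under -B75: 60 is not above 75, so it stays a modification.
print(should_break(30, 30, 100, 100, 0.75))  # False
```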

diffcore-rename: For Detecting Renames and Copies
-------------------------------------------------

This transformation is used to detect renames and copies, and is
controlled by the -M option (to detect renames) and the -C option
(to detect copies as well) to the 'git diff-{asterisk}' commands. If the
input contained these filepairs:

------------------------------------------------
:100644 000000 0123456... 0000000... D fileX
:000000 100644 0000000... 0123456... A file0
------------------------------------------------

and the contents of the deleted file fileX are similar enough to
the contents of the created file file0, then rename detection
merges these filepairs and creates:

------------------------------------------------
:100644 100644 0123456... 0123456... R100 fileX file0
------------------------------------------------
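
Below is a rough Python sketch of rename pairing. It uses `difflib.SequenceMatcher` as a stand-in for Git's own similarity estimate. It picks each source greedily for each created file rather than choosing globally, and it reports the score as a percentage, as in "R100". `detect_renames` and `similarity` are hypothetical names.

```python
import difflib


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; Git has its own estimator."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def detect_renames(deleted: dict, created: dict, min_score: float = 0.5):
    """Pair created files with sufficiently similar deleted files.

    deleted and created map path -> contents. Each deleted file is used
    at most once; each created file takes the best unused source whose
    similarity is at least min_score (a greedy, per-destination choice).
    Returns (renames, leftover_deleted, leftover_created), where renames
    holds (source, destination, score-in-percent) tuples.
    """
    renames = []
    unused = dict(deleted)
    leftover_created = {}
    for dst, dst_text in created.items():
        best_src, best_score = None, min_score
        for src, src_text in unused.items():
            score = similarity(src_text, dst_text)
            if score >= best_score:
                best_src, best_score = src, score
        if best_src is None:
            leftover_created[dst] = dst_text
        else:
            renames.append((best_src, dst, round(best_score * 100)))
            del unused[best_src]
    return renames, unused, leftover_created


renames, _, _ = detect_renames({"fileX": "hello\nworld\n"},
                               {"file0": "hello\nworld\n"})
print(renames)  # [('fileX', 'file0', 100)]
```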

When the "-C" option is used, the original contents of modified files
and deleted files (and also unmodified files, if the
`--find-copies-harder` option is used) are considered as candidate
sources in a rename/copy operation. If the input contained
these filepairs, which talk about a modified file fileY and a newly
created file file0:

------------------------------------------------
:100644 100644 0123456... 1234567... M fileY
:000000 100644 0000000... bcd3456... A file0
------------------------------------------------

the original contents of fileY and the resulting contents of
file0 are compared, and if they are similar enough, they are
changed to:

------------------------------------------------
:100644 100644 0123456... 1234567... M fileY
:100644 100644 0123456... bcd3456... C100 fileY file0
------------------------------------------------

In both rename and copy detection, the same "extent of changes"
algorithm used in diffcore-break determines whether two
files are "similar enough". The threshold can be customized to
a similarity score other than the default of 50% by giving a
number after the "-M" or "-C" option (e.g. "-M8" to tell it to use
8/10 = 80%).
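
Both examples on this page, -B75 meaning 75% and -M8 meaning 80%, are consistent with reading the digits as a decimal fraction (0.75 and 0.8). The sketch below assumes that reading. The function name `parse_score` is hypothetical, and Git's own option parser may accept other forms.

```python
def parse_score(digits: str, default: float = 0.5) -> float:
    """Read the digits after -B, -M or -C as a decimal fraction.

    "75" -> 0.75 and "8" -> 0.8, matching the -B75 and -M8 examples on
    this page. An empty string means the option was given without a number.
    """
    if not digits:
        return default
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a score: {digits!r}")
    return int(digits) / 10 ** len(digits)


print(parse_score("8"))   # 0.8
print(parse_score("75"))  # 0.75
print(parse_score(""))    # 0.5
```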

Note that when the "-C" option is used with the `--find-copies-harder`
option, 'git diff-{asterisk}' commands feed unmodified filepairs to
the diffcore mechanism as well as modified ones. This lets the copy
detector consider unmodified files as copy source candidates at
the expense of making it slower. Without `--find-copies-harder`,
'git diff-{asterisk}' commands can detect copies only if the file that was
copied happened to have been modified in the same changeset.
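
The files that may act as sources therefore depend on the options given. The Python sketch below uses hypothetical labels for the kinds of filepairs rather than Git's internal representation. `source_candidates` is also a hypothetical name.

```python
def source_candidates(filepairs, find_copies=False, find_copies_harder=False):
    """Paths whose original contents may act as rename/copy sources.

    filepairs is a list of (kind, path), where kind is one of the
    hypothetical labels "deleted", "modified", "created" or "unmodified".
    Unmodified entries only reach diffcore with --find-copies-harder.
    """
    allowed = {"deleted"}                 # -M: only deleted files
    if find_copies:
        allowed.add("modified")           # -C: modified files as well
        if find_copies_harder:
            allowed.add("unmodified")     # --find-copies-harder: all files
    return [path for kind, path in filepairs if kind in allowed]


pairs = [("modified", "fileY"), ("created", "file0"),
         ("deleted", "fileX"), ("unmodified", "README")]
print(source_candidates(pairs))                    # ['fileX']
print(source_candidates(pairs, find_copies=True))  # ['fileY', 'fileX']
print(source_candidates(pairs, True, True))        # ['fileY', 'fileX', 'README']
```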

diffcore-merge-broken: For Putting Complete Rewrites Back Together
------------------------------------------------------------------

This transformation is used to merge filepairs broken by
diffcore-break, and not transformed into rename/copy by
diffcore-rename, back into a single modification. This always
runs when diffcore-break is used.

For the purpose of merging broken filepairs back, it uses a
different "extent of changes" computation from the ones used by
diffcore-break and diffcore-rename. It counts only the deletion
from the original, and does not count insertion. If you removed
only 10 lines from a 100-line document, even if you added 910
new lines to make a new 1000-line document, you did not do a
complete rewrite. diffcore-break breaks such a case in order to
help diffcore-rename consider such filepairs as candidates for
rename/copy detection, but if filepairs broken that way were not
matched with other filepairs to create a rename/copy, then this
transformation merges them back into the original
"modification".

The "extent of changes" parameter can be tweaked from the
default 80% (that is, unless more than 80% of the original
material is deleted, the broken pairs are merged back into a
single modification) by giving a second number to the -B option,
like these:

* -B50/60 (give 50% "break score" to diffcore-break, use 60%
  for diffcore-merge-broken).

* -B/60 (the same as above, since diffcore-break defaults to 50%).
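
The two-number form of -B and the merge-back rule can be sketched in Python as follows. The sketch assumes the digits are read as a decimal fraction, so "60" is 60%. Like the text above, it counts only deleted material. The function names are hypothetical.

```python
def parse_break_option(arg: str):
    """Split a -B argument such as "50/60" or "/60" into two fractions.

    Returns (break_score, merge_score), using the defaults described on
    this page (50% and 80%) for any part that is missing.
    """
    def frac(digits: str, default: float) -> float:
        # Assumed reading: "60" -> 0.6, "5" -> 0.5.
        return int(digits) / 10 ** len(digits) if digits else default

    break_part, _, merge_part = arg.partition("/")
    return frac(break_part, 0.5), frac(merge_part, 0.8)


def should_merge_back(deleted: int, src_size: int,
                      merge_score: float = 0.8) -> bool:
    """Merge a broken pair back unless more than merge_score was deleted."""
    return deleted <= merge_score * src_size


print(parse_break_option("50/60"))  # (0.5, 0.6)
print(parse_break_option("/60"))    # (0.5, 0.6)
print(should_merge_back(10, 100))   # True: only 10% of the original deleted
```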

Note that earlier implementations left a broken pair as separate
creation and deletion patches. This was an unnecessary hack, and
the latest implementation always merges all the broken pairs
back into modifications. The resulting patch output is, however,
formatted differently for easier review of such a complete
rewrite: it shows the entire contents of the old version
prefixed with '-', followed by the entire contents of the new
version prefixed with '+'.

diffcore-pickaxe: For Detecting Addition/Deletion of Specified String
---------------------------------------------------------------------

This transformation limits the set of filepairs to those that change
specified strings between the preimage and the postimage in a certain
way. The -S<block of text> and -G<regular expression> options are used to
specify different ways these strings are sought.

"-S<block of text>" detects filepairs whose preimage and postimage
have a different number of occurrences of the specified block of text.
By definition, it will not detect in-file moves. Also, when a
changeset moves a file wholesale without affecting the interesting
string, diffcore-rename kicks in as usual, and `-S` omits the filepair
(since the number of occurrences of that string didn't change in that
rename-detected filepair). When `-S` is used with `--pickaxe-regex`,
the <block of text> is treated as an extended POSIX regular expression
to match, instead of a literal string.
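
Because -S compares only occurrence counts, in-file moves go unnoticed. Below is a minimal Python sketch under that reading. `pickaxe_s` is a hypothetical name, and Python's `re` module stands in for POSIX extended regular expressions; the two dialects differ in details.

```python
import re


def pickaxe_s(preimage: str, postimage: str, needle: str,
              regex: bool = False) -> bool:
    """True if needle occurs a different number of times before and after.

    With regex=True, needle is a regular expression. Python's re module
    stands in for POSIX extended regular expressions; the dialects differ.
    """
    if regex:
        pattern = re.compile(needle)

        def count(text: str) -> int:
            return sum(1 for _ in pattern.finditer(text))
    else:
        def count(text: str) -> int:
            return text.count(needle)
    return count(preimage) != count(postimage)


before = "foo()\nbar()\n"
print(pickaxe_s(before, "bar()\nfoo()\n", "foo()"))  # False: in-file move
print(pickaxe_s(before, "bar()\n", "foo()"))         # True: one fewer foo()
```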

"-G<regular expression>" (mnemonic: grep) detects filepairs whose
textual diff has an added or a deleted line that matches the given
regular expression. This means that it will detect in-file (or what
rename detection considers the same file) moves, which is noise. The
implementation runs diff twice and greps, and this can be quite
expensive. To speed things up, binary files without textconv filters
are ignored.
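
-G instead inspects the added and deleted lines of a diff. This Python sketch uses `difflib.unified_diff`, whose hunks can differ from those produced by Git's diff algorithm. `pickaxe_g` is a hypothetical name. The sketch shows why an in-file move, which -S misses, is reported by -G.

```python
import difflib
import itertools
import re


def pickaxe_g(preimage: str, postimage: str, pattern: str) -> bool:
    """True if an added or deleted line of the textual diff matches pattern."""
    regex = re.compile(pattern)
    diff = difflib.unified_diff(preimage.splitlines(),
                                postimage.splitlines(), lineterm="")
    # The first two lines are the ---/+++ headers, so skip them.
    for line in itertools.islice(diff, 2, None):
        if line.startswith(("+", "-")) and regex.search(line[1:]):
            return True
    return False


before = "foo()\nbar()\n"
print(pickaxe_g(before, "bar()\nfoo()\n", "foo|bar"))  # True: moves show up
print(pickaxe_g(before, before, "foo"))                # False: no diff at all
```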

When `-S` or `-G` is used without `--pickaxe-all`, only filepairs
that match their respective criterion are kept in the output. When
`--pickaxe-all` is used, if even one filepair matches its respective
criterion in a changeset, the entire changeset is kept. This behavior
is designed to make reviewing changes in the context of the whole
changeset easier.
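
The default applies a filter to each filepair, while `--pickaxe-all` applies an all-or-nothing test to the whole changeset. A hypothetical Python helper that takes the match test as a function:

```python
def apply_pickaxe(filepairs, matches, pickaxe_all=False):
    """Filter one changeset's filepairs with a pickaxe predicate.

    Without pickaxe_all, keep only matching filepairs. With it, keep the
    whole changeset if any filepair matches, and drop everything otherwise.
    """
    if pickaxe_all:
        return list(filepairs) if any(matches(p) for p in filepairs) else []
    return [p for p in filepairs if matches(p)]


def touches_c(path):
    # Toy criterion standing in for an -S or -G match.
    return path.endswith(".c")


print(apply_pickaxe(["a.c", "b.h"], touches_c))                    # ['a.c']
print(apply_pickaxe(["a.c", "b.h"], touches_c, pickaxe_all=True))  # ['a.c', 'b.h']
```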

diffcore-order: For Sorting the Output Based on Filenames
---------------------------------------------------------

This is used to reorder the filepairs according to the user's
(or project's) taste, and is controlled by the -O option to the
'git diff-{asterisk}' commands.

This takes a text file each of whose lines is a shell glob
pattern. Filepairs that match a glob pattern on an earlier line
in the file are output before ones that match a later line, and
filepairs that do not match any glob pattern are output last.

As an example, a typical orderfile for the core Git would probably
look like this:

------------------------------------------------
README
Makefile
Documentation
*.h
*.c
t
------------------------------------------------
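
The ordering rule can be sketched in Python, with `fnmatch.fnmatchcase` standing in for Git's own glob matcher. The sketch also lets a pattern match any leading directory of a path. This is our reading of how entries such as `Documentation` and `t` in the example cover the files beneath them. The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def order_rank(path: str, patterns: list) -> int:
    """Index of the first pattern matching path or one of its leading dirs."""
    for rank, pattern in enumerate(patterns):
        candidate = path
        while candidate:
            if fnmatchcase(candidate, pattern):
                return rank
            if "/" not in candidate:
                break
            candidate = candidate.rsplit("/", 1)[0]  # try the parent dir
    return len(patterns)  # no match: sort after everything else


def order_filepairs(paths: list, patterns: list) -> list:
    # sorted() is stable, so paths with equal rank keep their input order.
    return sorted(paths, key=lambda p: order_rank(p, patterns))


orderfile = ["README", "Makefile", "Documentation", "*.h", "*.c", "t"]
paths = ["diff.c", "t/t4000-diff-format.sh", "diff.h",
         "Documentation/gitdiffcore.txt", "Makefile", "contrib/x.py", "README"]
print(order_filepairs(paths, orderfile))
# ['README', 'Makefile', 'Documentation/gitdiffcore.txt', 'diff.h',
#  'diff.c', 't/t4000-diff-format.sh', 'contrib/x.py']
```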

SEE ALSO
--------
linkgit:git-diff[1],
linkgit:git-diff-files[1],
linkgit:git-diff-index[1],
linkgit:git-diff-tree[1],
linkgit:git-format-patch[1],
linkgit:git-log[1],
linkgit:gitglossary[7],
link:user-manual.html[The Git User's Manual]

GIT
---
Part of the linkgit:git[1] suite