vqsort3.scm 11 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262
  1. ;;; The SRFI-32 sort package -- three-way quick sort -*- Scheme -*-
  2. ;;; Copyright (c) 2002 by Olin Shivers.
  3. ;;; This code is open-source; see the end of the file for porting and
  4. ;;; more copyright information.
  5. ;;; Olin Shivers 2002/7.
  6. ;;; (quick-sort3! c v [start end]) -> unspecific
  7. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
  8. ;;; Sort vector V[start,end) using three-way comparison function C:
  9. ;;; (c x y) < 0 => x<y
  10. ;;; (c x y) = 0 => x=y
  11. ;;; (c x y) > 0 => x>y
  12. ;;; That is, C acts as a sort of "subtraction" procedure; using - for the
  13. ;;; comparison function will cause numbers to be sorted into increasing order.
  14. ;;;
  15. ;;; This algorithm is more efficient than standard, two-way quicksort if there
  16. ;;; are many duplicate items in the data set and the comparison function is
  17. ;;; relatively expensive (e.g., comparing large strings). It is due to Jon
  18. ;;; Bentley & Doug McIlroy; I learned it from Bentley.
  19. ;;;
  20. ;;; The algorithm is a standard quicksort, but the partition loop is fancier,
  21. ;;; arranging the vector into a left part that is <, a middle region that is
  22. ;;; =, and a right part that is > the pivot. Here's how it is done:
  23. ;;; The partition loop divides the range being partitioned into five
  24. ;;; subranges:
  25. ;;; =======<<<<<<<<<?????????>>>>>>>=======
  26. ;;; where = marks a value that is equal the pivot, < marks a value that
  27. ;;; is less than the pivot, ? marks a value that hasn't been scanned, and
  28. ;;; > marks a value that is greater than the pivot. Let's consider the
  29. ;;; left-to-right scan. If it checks a ? value that is <, it keeps scanning.
  30. ;;; If the ? value is >, we stop the scan -- we are ready to start the
  31. ;;; right-to-left scan and then do a swap. But if the rightward scan checks
  32. ;;; a ? value that is =, we swap it *down* to the end of the initial chunk
  33. ;;; of ====='s -- we exchange it with the leftmost < value -- and then
  34. ;;; continue our rightward scan. The leftwards scan works in a similar
  35. ;;; fashion, scanning past > elements, stopping on a < element, and swapping
  36. ;;; up = elements. When we are done, we have a picture like this
  37. ;;; ========<<<<<<<<<<<<>>>>>>>>>>=========
  38. ;;; Then swap the = elements up into the middle of the vector to get
  39. ;;; this:
  40. ;;; <<<<<<<<<<<<=================>>>>>>>>>>
  41. ;;; Then recurse on the <'s and >'s. Work out all the tricky little
  42. ;;; boundary cases, and you're done.
  43. ;;;
  44. ;;; Other tricks that make this implementation industrial strength:
  45. ;;; - This quicksort makes some effort to pick the pivot well -- it uses the
  46. ;;; median of three elements as the partition pivot, so pathological n^2
  47. ;;; run time is much rarer (but not eliminated completely). If you really
  48. ;;; wanted to get fancy, you could use a random number generator to choose
  49. ;;; pivots. The key to this trick is that you only need to pick one random
  50. ;;; number for each *level* of recursion -- i.e. you only need (lg n) random
  51. ;;; numbers.
  52. ;;;
  53. ;;; - After the partition, we *recurse* on the smaller of the two pending
  54. ;;; regions, then *tail-recurse* (iterate) on the larger one. This guarantees
  55. ;;; we use no more than lg(n) stack frames, worst case.
  56. ;;;
  57. ;;; - There are two ways to finish off the sort.
  58. ;;; A. Recurse down to regions of size 10, then sort each such region using
  59. ;;; insertion sort.
  60. ;;; B. Recurse down to regions of size 10, then sort *the entire vector*
  61. ;;; using insertion sort.
  62. ;;; We do A. Each choice has a cost. Choice A has more overhead to invoke
  63. ;;; all the separate insertion sorts -- choice B only calls insertion sort
  64. ;;; once. But choice B will call the comparison function *more times* --
  65. ;;; it will unnecessarily compare elt 9 of one segment to elt 0 of the
  66. ;;; following segment. The overhead of choice A is linear in the length
  67. ;;; of the vector, but *otherwise independent of the algorithm's parameters*.
  68. ;;; I.e., it's a *fixed*, *small* constant factor. The cost of the extra
  69. ;;; comparisons made by choice B, however, is dependent on an externality:
  70. ;;; the comparison function passed in by the client. This can be made
  71. ;;; arbitrarily bad -- that is, the constant factor *isn't* fixed by the
  72. ;;; sort algorithm; instead, it's determined by the comparison function.
  73. ;;; If your comparison function is very, very slow, you want to eliminate
  74. ;;; every single one that you can. Choice A limits the potential badness,
  75. ;;; so that is what we do.
  76. (define (vector-quick-sort3! c v . maybe-start+end)
  77. (call-with-values
  78. (lambda () (vector-start+end v maybe-start+end))
  79. (lambda (start end)
  80. (%quick-sort3! c v start end))))
  81. (define (vector-quick-sort3 c v . maybe-start+end)
  82. (call-with-values
  83. (lambda () (vector-start+end v maybe-start+end))
  84. (lambda (start end)
  85. (let ((ans (make-vector (- end start))))
  86. (vector-portion-copy! ans v start end)
  87. (%quick-sort3! c ans 0 (- end start))
  88. ans))))
  89. ;;; %QUICK-SORT3! is not exported.
  90. ;;; Preconditions:
  91. ;;; V vector
  92. ;;; START END fixnums
  93. ;;; 0 <= START, END <= (vector-length V)
  94. ;;; If these preconditions are ensured by the cover functions, you
  95. ;;; can safely change this code to use unsafe fixnum arithmetic and vector
  96. ;;; indexing ops, for *huge* speedup.
  97. ;;;
  98. ;;; We bail out to insertion sort for small ranges; feel free to tune the
  99. ;;; crossover -- it's just a random guess. If you don't have the insertion
  100. ;;; sort routine, just kill that branch of the IF and change the recursion
  101. ;;; test to (< 1 (- r l)) -- the code is set up to work that way.
  102. (define (%quick-sort3! c v start end)
  103. (define (swap l r n) ; Little utility -- swap the N
  104. (if (> n 0)
  105. (let ((x (vector-ref v l)) ; outer pairs of the range [l,r).
  106. (r-1 (- r 1)))
  107. (vector-set! v l (vector-ref v r-1))
  108. (vector-set! v r-1 x)
  109. (swap (+ l 1) r-1 (- n 1)))))
  110. (define (sort3 v1 v2 v3)
  111. (call-with-values
  112. (lambda () (if (< (c v1 v2) 0) (values v1 v2) (values v2 v1)))
  113. (lambda (little big)
  114. (if (< (c big v3) 0)
  115. (values little big v3)
  116. (if (< (c little v3) 0)
  117. (values little v3 big)
  118. (values v3 little big))))))
  119. (define (elt< v1 v2)
  120. (negative? (c v1 v2)))
  121. (let recur ((l start) (r end)) ; Sort the range [l,r).
  122. (if (< 10 (- r l)) ; 10: the gospel according to Sedgewick.
  123. ;; Choose the median of V[l], V[r-1], and V[middle] for the pivot.
  124. ;; We do this by sorting these three elts; call the results LO, PIVOT
  125. ;; & HI. Put LO, PIVOT & HI where they should go in the vector. We
  126. ;; will kick off the partition step with one elt (PIVOT) in the left=
  127. ;; range, one elt (LO) in the < range, one elt (HI) in in the > range
  128. ;; & no elts in the right= range.
  129. (let* ((r-1 (- r 1)) ; Three handy
  130. (mid (quotient (+ l r) 2)) ; common
  131. (l+1 (+ l 1)) ; subexpressions
  132. (pivot (call-with-values
  133. (lambda ()
  134. (sort3 (vector-ref v l)
  135. (vector-ref v mid)
  136. (vector-ref v r-1)))
  137. (lambda (lo piv hi)
  138. (let ((tmp (vector-ref v l+1))) ; Put LO, PIV & HI
  139. (vector-set! v l piv) ; back into V
  140. (vector-set! v r-1 hi) ; where they belong,
  141. (vector-set! v l+1 lo)
  142. (vector-set! v mid tmp)
  143. piv))))) ; and return PIV as pivot.
  144. ;; Everything in these loops is driven by the invariants expressed
  145. ;; in the little pictures, the corresponding l,i,j,k,m,r indices,
  146. ;; & the associated ranges.
  147. ;; =======<<<<<<<<<?????????>>>>>>>======= (picture)
  148. ;; l i j k m r (indices)
  149. ;; [l,i) [i,j) [j,k] (k,m] (m,r) (ranges )
  150. (letrec ((lscan (lambda (i j k m) ; left-to-right scan
  151. (let lp ((i i) (j j))
  152. (if (> j k)
  153. (done i j m)
  154. (let* ((x (vector-ref v j))
  155. (sign (c x pivot)))
  156. (cond ((< sign 0) (lp i (+ j 1)))
  157. ((= sign 0)
  158. (if (< i j)
  159. (begin (vector-set! v j (vector-ref v i))
  160. (vector-set! v i x)))
  161. (lp (+ i 1) (+ j 1)))
  162. ((> sign 0) (rscan i j k m))))))))
  163. ;; =======<<<<<<<<<>????????>>>>>>>=======
  164. ;; l i j k m r
  165. ;; [l,i) [i,j) j (j,k] (k,m] (m,r)
  166. (rscan (lambda (i j k m) ; right-to-left scan
  167. (let lp ((k k) (m m))
  168. (if (<= k j)
  169. (done i j m)
  170. (let* ((x (vector-ref v k))
  171. (sign (c x pivot)))
  172. (cond ((> sign 0) (lp (- k 1) m))
  173. ((= sign 0)
  174. (if (< k m)
  175. (begin (vector-set! v k (vector-ref v m))
  176. (vector-set! v m x)))
  177. (lp (- k 1) (- m 1)))
  178. ((< sign 0) ; Swap j & k & lscan.
  179. (vector-set! v k (vector-ref v j))
  180. (vector-set! v j x)
  181. (lscan i (+ j 1) (- k 1) m))))))))
  182. ;; =======<<<<<<<<<<<<<>>>>>>>>>>>=======
  183. ;; l i j m r
  184. ;; [l,i) [i,j) [j,m] (m,r)
  185. (done (lambda (i j m)
  186. (let ((num< (- j i))
  187. (num> (+ 1 (- m j)))
  188. (num=l (- i l))
  189. (num=r (- (- r m) 1)))
  190. (swap l j (min num< num=l)) ; Swap ='s into
  191. (swap j r (min num> num=r)) ; the middle.
  192. ;; Recur on the <'s and >'s. Recurring on the
  193. ;; smaller range and iterating on the bigger
  194. ;; range ensures O(lg n) stack frames, worst case.
  195. (cond ((<= num< num>)
  196. (recur l (+ l num<))
  197. (recur (- r num>) r))
  198. (else
  199. (recur (- r num>) r)
  200. (recur l (+ l num<))))))))
  201. ;; To repeat: We kick off the partition step with one elt (PIVOT)
  202. ;; in the left= range, one elt (LO) in the < range, one elt (HI)
  203. ;; in the > range & no elts in the right= range.
  204. (lscan l+1 (+ l 2) (- r 2) r-1)))
  205. ;; Small segment => punt to insert sort.
  206. ;; Use the dangerous subprimitive.
  207. (%vector-insert-sort! elt< v l r))))
  208. ;;; Copyright
  209. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
  210. ;;; This code is
  211. ;;; Copyright (c) 1998 by Olin Shivers.
  212. ;;; The terms are: You may do as you please with this code, as long as
  213. ;;; you do not delete this notice or hold me responsible for any outcome
  214. ;;; related to its use.
  215. ;;;
  216. ;;; Blah blah blah.
  217. ;;; Code tuning & porting
  218. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
  219. ;;; - The quicksort recursion bottoms out in a call to an insertion sort
  220. ;;; routine, %INSERT-SORT!. But you could even punt this and go with pure
  221. ;;; recursion in a pinch.
  222. ;;;
  223. ;;; This code is *tightly* bummed as far as I can go in portable Scheme.
  224. ;;;
  225. ;;; The internal primitive %QUICK-SORT! that does the real work can be
  226. ;;; converted to use unsafe vector-indexing and fixnum-specific arithmetic ops
  227. ;;; *if* you alter the two small cover functions to enforce the invariants.
  228. ;;; This should provide *big* speedups. In fact, all the code bumming I've
  229. ;;; done pretty much disappears in the noise unless you have a good compiler
  230. ;;; and also can dump the vector-index checks and generic arithmetic -- so
  231. ;;; I've really just set things up for you to exploit.
  232. ;;;
  233. ;;; The optional-arg parsing, defaulting, and error checking is done with a
  234. ;;; portable R4RS macro. But if your Scheme has a faster mechanism (e.g.,
  235. ;;; Chez), you should definitely port over to it. Note that argument defaulting
  236. ;;; and error-checking are interleaved -- you don't have to error-check
  237. ;;; defaulted START/END args to see if they are fixnums that are legal vector
  238. ;;; indices for the corresponding vector, etc.