This repository is meant to hold all of my machine learning algorithm implementations written in GNU Guile or Scheme.
zelphir.kaltstahl 459e0564a1 remove todo, usage of car is OK here | 4 years ago | |
---|---|---|
old-racket-code | 5 years ago | |
scripts | 5 years ago | |
test | 4 years ago | |
utils | 4 years ago | |
.gitignore | 5 years ago | |
LICENSE | 7 years ago | |
README.org | 5 years ago | |
columns.csv | 7 years ago | |
data-point.scm | 5 years ago | |
data_banknote_authentication.csv | 7 years ago | |
dataset.scm | 4 years ago | |
decision-tree.scm | 4 years ago | |
metrics.scm | 5 years ago | |
notes.org | 4 years ago | |
prediction.scm | 5 years ago | |
pruning.scm | 4 years ago | |
split-quality-measure.scm | 5 years ago | |
todo.org | 4 years ago | |
tree.scm | 4 years ago | |
utils.scm | 5 years ago |
You can run the tests by running the script run-tests.bash in the scripts/ directory as follows:
# from the root directory of this project:
bash scripts/run-tests.bash
This example is outdated: it was written for the older Racket code and has not yet been ported to Guile.
(define shuffled-dataset (shuffle dataset))
(define small-dataset
  (data-range shuffled-dataset
              0
              ;; take only a fifth of the data to make this example run faster
              (exact-floor (/ (dataset-length shuffled-dataset)
                              5))))
;; be sure to collect all garbage, apparently this should be called thrice
(collect-garbage)
(collect-garbage)
(collect-garbage)
;; requires a ~time~ macro
(time
 ;; ~for/list~ -- a Racketism, needs to be rewritten
 (for/list ([i (in-range 1)])
   (mean
    (evaluate-algorithm #:dataset (shuffle dataset)
                        #:n-folds 10
                        #:feature-column-indices (list 0 1 2 3)
                        #:label-column-index 4
                        #:max-depth 5
                        #:min-data-points 24
                        #:min-data-points-ratio 0.02
                        #:min-impurity-split (expt 10 -7)
                        #:stop-at-no-impurity-improvement #t
                        #:random-seed 0))))
;; be sure to collect all garbage, apparently this should be called thrice
(collect-garbage)
(collect-garbage)
(collect-garbage)
(time
 ;; ~for/list~ -- a Racketism, needs to be rewritten
 (for/list ([i (in-range 1)])
   ;; run with the whole dataset as an example, no random seed
   (define tree (fit #:train-data dataset
                     #:feature-column-indices (list 0 1 2 3)
                     #:label-column-index 4
                     #:max-depth 5
                     #:min-data-points 12
                     #:min-data-points-ratio 0.02
                     #:min-impurity-split (expt 10 -7)
                     #:stop-at-no-impurity-improvement #t))
   'done))
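The comments in the example above name the Racket-only pieces (~time~, ~for/list~, ~collect-garbage~, ~exact-floor~). A minimal sketch of how they might be replaced in GNU Guile follows. The helper names time, repeat-collect and exact-floor are defined here purely for illustration and are assumptions, not part of this repository's code; only get-internal-real-time, internal-time-units-per-second, gc and define-syntax-rule are Guile built-ins.

```scheme
;; Hypothetical Guile stand-ins for the Racket-specific forms used above.
;; None of these helpers come from the repository itself.

;; Stand-in for Racket's `time`: evaluate EXPR once, print the elapsed
;; wall-clock time in seconds, and return the value of EXPR.
(define-syntax-rule (time expr)
  (let* ((start (get-internal-real-time))
         (result expr)
         (end (get-internal-real-time)))
    (format #t "elapsed: ~a s~%"
            (exact->inexact (/ (- end start)
                               internal-time-units-per-second)))
    result))

;; Stand-in for (for/list ([i (in-range n)]) body ...):
;; call PROC with 0, 1, ..., n-1 in order and collect the results.
(define (repeat-collect n proc)
  (let loop ((i 0) (acc '()))
    (if (= i n)
        (reverse acc)
        (loop (+ i 1) (cons (proc i) acc)))))

;; Stand-in for Racket's `exact-floor`: floor X and return an exact integer.
(define (exact-floor x)
  (inexact->exact (floor x)))

;; Guile's counterpart to Racket's (collect-garbage) is (gc).
(gc)

;; Usage on a toy computation instead of the decision tree code:
(define results
  (time (repeat-collect 3 (lambda (i) (* i i)))))
;; results is now the list (0 1 4)
```

With helpers like these, the calls to evaluate-algorithm and fit could be wrapped as (time (repeat-collect 1 (lambda (i) ...))), assuming the Guile versions of those procedures accept the same keyword arguments as the Racket ones, which the example above does not confirm.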