Library to construct a confusion matrix and retrieve statistical information from it. https://rubygems.org/gems/confusion_matrix

Peter Lane 3cb11e7d8d Updated demo.rb to use default label 1 year ago


= Confusion Matrix

Install from {RubyGems}[https://rubygems.org/gems/confusion_matrix/]:

  > gem install confusion_matrix

source:: https://notabug.org/peterlane/confusion-matrix-ruby/

== Description

A confusion matrix is used in data mining as a summary of the performance of a
classification algorithm. Each row represents the _actual_ class of an
instance, and each column represents the _predicted_ class of that instance,
i.e. the class it was classified as. The number at each (row, column)
position is the total number of instances of actual class "row" which were
predicted to fall in class "column".
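
As a minimal sketch of this layout, independent of the gem itself, the counts can be tallied from (actual, predicted) pairs into a nested hash indexed by row and then column. The sample pairs and the variable names are illustrative assumptions, not part of the library.

```ruby
# Hypothetical sample data: each pair is [actual, predicted].
pairs = [
  [:pos, :pos], [:pos, :neg], [:neg, :neg], [:neg, :pos], [:pos, :pos]
]

# counts[actual][predicted] holds the number of instances in that cell.
# Missing rows are created on demand; missing cells default to 0.
counts = Hash.new { |rows, actual| rows[actual] = Hash.new(0) }

pairs.each do |actual, predicted|
  counts[actual][predicted] += 1
end

puts counts[:pos][:pos] # instances labelled pos and classified pos
puts counts[:pos][:neg] # instances labelled pos but classified neg
```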

A two-class example is:

    Classified     Classified   |
     Positive       Negative    | Actual
  ------------------------------+------------
        a              b        | Positive
        c              d        | Negative

Here the value:

a:: is the number of true positives (those labelled positive and classified positive)
b:: is the number of false negatives (those labelled positive but classified negative)
c:: is the number of false positives (those labelled negative but classified positive)
d:: is the number of true negatives (those labelled negative and classified negative)

From this table we can calculate statistics like:

true_positive_rate:: a/(a+b)
positive precision:: a/(a+c)
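
The arithmetic can be sketched in plain Ruby, without the gem. Here a/(a+b) is the true positive rate (also called recall), and a/(a+c) is the fraction of instances classified positive that really are positive (usually called precision). The helper names and the cell counts are hypothetical, chosen only for illustration.

```ruby
# Illustrative cell counts, laid out as in the two-class table:
a = 10 # true positives
b = 3  # false negatives
c = 5  # false positives
d = 20 # true negatives (not needed for these two ratios)

# Hypothetical helpers, not the gem's API.
# Fraction of actual positives that were classified positive.
def true_positive_rate(a, b)
  a.to_f / (a + b)
end

# Fraction of instances classified positive that are actually positive.
def positive_precision(a, c)
  a.to_f / (a + c)
end

puts true_positive_rate(a, b).round(4)
puts positive_precision(a, c).round(4)
```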

The implementation supports confusion matrices with more than two
classes, and hence most statistics are calculated with reference to a
named class. When more than two classes are in use, the statistics
are calculated as if the named class were positive and all the other
classes were grouped together as negative.

For example, in a three-class example:

    Classified     Classified     Classified  |
       Red            Blue          Green     | Actual
  --------------------------------------------+------------
        a              b              c       | Red
        d              e              f       | Blue
        g              h              i       | Green

We can calculate:

true_red_rate:: a/(a+b+c)
red precision:: a/(a+d+g)
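
One way to picture the grouping is to collapse the multi-class table into two-class counts for the named class. The sketch below does this in plain Ruby; the +one_vs_rest+ helper and the sample counts are assumptions made for illustration and are not the gem's implementation.

```ruby
# Illustrative counts, indexed as matrix[actual][predicted].
matrix = {
  red:   { red: 5, blue: 1, green: 0 },
  blue:  { red: 2, blue: 6, green: 1 },
  green: { red: 1, blue: 0, green: 4 }
}

# Hypothetical helper: treat `label` as positive, everything else as negative.
def one_vs_rest(matrix, label)
  tp = matrix[label][label]
  fn = matrix[label].values.sum - tp               # rest of the label's row
  fp = matrix.values.sum { |row| row[label] } - tp # rest of the label's column
  total = matrix.values.sum { |row| row.values.sum }
  { tp: tp, fn: fn, fp: fp, tn: total - tp - fn - fp }
end

counts = one_vs_rest(matrix, :red)

# a/(a+b+c): share of actual Red instances classified as Red.
true_red_rate = counts[:tp].to_f / (counts[:tp] + counts[:fn])
# a/(a+d+g): share of instances classified as Red that are actually Red.
red_precision = counts[:tp].to_f / (counts[:tp] + counts[:fp])

puts counts
puts true_red_rate.round(4)
puts red_precision.round(4)
```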

== Example

The following example creates a simple two-class confusion matrix,
prints a few statistics and displays the table.

  require 'confusion_matrix'

  # Create a matrix with two pre-defined labels.
  cm = ConfusionMatrix.new :pos, :neg

  # Arguments to add_for are (actual, predicted[, count]).
  cm.add_for(:pos, :pos, 10)
  3.times { cm.add_for(:pos, :neg) }
  20.times { cm.add_for(:neg, :neg) }
  5.times { cm.add_for(:neg, :pos) }

  puts "Precision: #{cm.precision}"
  puts "Recall: #{cm.recall}"
  puts "MCC: #{cm.matthews_correlation}"
  puts
  puts(cm.to_s)

Output:

  Precision: 0.6666666666666666
  Recall: 0.7692307692307693
  MCC: 0.5524850114241865

  Predicted |
  pos neg   | Actual
  ----------+-------
   10   3   | pos
    5  20   | neg
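
These figures can be checked by hand from the four counts in the table. The sketch below uses the standard textbook definitions of precision, recall and the Matthews correlation coefficient instead of calling the gem, so it is an independent check under that assumption and not a copy of the library's code. The variable names are illustrative.

```ruby
# Counts read from the printed table, with :pos as the positive class.
tp = 10 # actual pos, predicted pos
fn = 3  # actual pos, predicted neg
fp = 5  # actual neg, predicted pos
tn = 20 # actual neg, predicted neg

precision = tp.to_f / (tp + fp)
recall    = tp.to_f / (tp + fn)

# Matthews correlation coefficient:
# (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn))
denominator = Math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
mcc = (tp * tn - fp * fn) / denominator

puts "Precision: #{precision.round(4)}"
puts "Recall: #{recall.round(4)}"
puts "MCC: #{mcc.round(4)}"
```

Rounded to four decimal places, the values agree with the output shown above.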