One Rule (OneR) Machine Learning Classification in under One Minute


Here I give a very short introduction on how to use the OneR Machine Learning package for the hurried, so buckle up!

You can also watch this video which goes through the following example step-by-step:

After installing the OneR package from CRAN load it

library(OneR)

Use the famous Iris dataset and determine optimal bins for numeric data:

data <- optbin(iris)

Build a model with the best predictor:

model <- OneR(data, verbose = TRUE)
## 
##     Attribute    Accuracy
## 1 * Petal.Width  96%     
## 2   Petal.Length 95.33%  
## 3   Sepal.Length 74.67%  
## 4   Sepal.Width  55.33%  
## ---
## Chosen attribute due to accuracy
## and ties method (if applicable): '*'

Show the learned rules and model diagnostics:

summary(model)
## 
## Call:
## OneR(data = data, verbose = TRUE)
## 
## Rules:
## If Petal.Width = (0.0976,0.791] then Species = setosa
## If Petal.Width = (0.791,1.63]   then Species = versicolor
## If Petal.Width = (1.63,2.5]     then Species = virginica
## 
## Accuracy:
## 144 of 150 instances classified correctly (96%)
## 
## Contingency table:
##             Petal.Width
## Species      (0.0976,0.791] (0.791,1.63] (1.63,2.5] Sum
##   setosa               * 50            0          0  50
##   versicolor              0         * 48          2  50
##   virginica               0            4       * 46  50
##   Sum                    50           52         48 150
## ---
## Maximum in each column: '*'
## 
## Pearson's Chi-squared test:
## X-squared = 266.35, df = 4, p-value < 2.2e-16

Plot model diagnostics:

plot(model)

Use the model to predict data:

prediction <- predict(model, data)

Evaluate the prediction statistics:

eval_model(prediction, data)
## 
## Confusion matrix (absolute):
##             Actual
## Prediction   setosa versicolor virginica Sum
##   setosa         50          0         0  50
##   versicolor      0         48         4  52
##   virginica       0          2        46  48
##   Sum            50         50        50 150
## 
## Confusion matrix (relative):
##             Actual
## Prediction   setosa versicolor virginica  Sum
##   setosa       0.33       0.00      0.00 0.33
##   versicolor   0.00       0.32      0.03 0.35
##   virginica    0.00       0.01      0.31 0.32
##   Sum          0.33       0.33      0.33 1.00
## 
## Accuracy:
## 0.96 (144/150)
## 
## Error rate:
## 0.04 (6/150)
## 
## Error rate reduction (vs. base rate):
## 0.94 (p-value < 2.2e-16)

Please note that the very good accuracy of 96% is reached effortlessly.

“Petal.Width” is identified as the attribute with the highest predictive value. The cut points of the intervals are found automatically (via the included optbin function). The results are three very simple, yet accurate, rules to predict the respective species.

The nearly perfect separation of the areas in the diagnostic plot gives a good indication of the model’s ability to separate the different species.

The whole code of this post:

library(OneR)
data <- optbin(iris)
model <- OneR(data, verbose = TRUE)
summary(model)
plot(model)
prediction <- predict(model, data)
eval_model(prediction, data)

More sophisticated examples will follow in upcoming posts… so stay tuned!

Update: For more examples see e.g. this post: OneR – Fascinating Insights through Simple Rules

I would love to hear about your experiences with the OneR package. Please drop a line or two in the comments – Thank you!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.