---
title: "Naive Bayes Classifier - Predicting Bug Covering by Ranking"
author: "Christian Medeiros Adriano"
date: "August 9, 2017"
output: html_document
---
```{r setup, include=FALSE}
source("C://Users//chris//OneDrive//Documentos//GitHub//ML_VotingAggregation//aggregateAnswerOptionsPerQuestion.R");
summaryTable <- runMain();
library(class);
library(gmodels);
library(caret);
library(e1071)
```
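<p>runMain() comes from the sourced script above. If that script is unavailable, a synthetic stand-in with the two columns the later chunks rely on (rankingVote and bugCovering) keeps the document runnable; the values below are illustrative assumptions, not the real data.</p>
```{r mock.summaryTable, eval=FALSE}
# Hypothetical stand-in for runMain(): one row per code fragment, with its
# YES-vote ranking and a flag marking whether it covers a bug (assumed shape).
summaryTable <- data.frame(
  rankingVote = sample(1:30, 100, replace = TRUE),
  bugCovering = sample(c(TRUE, FALSE), 100, replace = TRUE, prob = c(0.2, 0.8))
)
```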
## The goal of the study
<p>
Evaluate bug prediction based on a ranking of YES votes. For a more detailed explanation, please see the
<a href="http://rpubs.com/christian_adriano/knn_cv_ranking_buggy_codefragments">previous analysis</a>.</p>
## Data preparation
<p>I need to guarantee that some examples (i.e., failing methods)
do not dominate the training or testing sets. To do that, I need a
close-to-equal proportion of examples in both sets, which I obtain by
scrambling the data.</p>
```{r dataprep}
set.seed(9850);
g <- runif(nrow(summaryTable));           # one random value per row
summaryTable <- summaryTable[order(g), ]; # reorder the rows by the random values
```
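<p>The same shuffle can be written more directly with sample(), which permutes the row indices without the auxiliary runif vector; both approaches yield an equivalent random reordering.</p>
```{r dataprep.alternative, eval=FALSE}
# Equivalent one-line shuffle: sample(n) returns a random permutation of 1..n
summaryTable <- summaryTable[sample(nrow(summaryTable)), ]
```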
```{r naiveBayes.data.ranking, include=FALSE}
# sample() with floor(nrow * 1) selects every row, so the entire table goes
# into the training set and the held-out test set is empty; the model is
# evaluated on the full data below, with validation handled by 10-fold CV.
sub = sample(nrow(summaryTable), floor(nrow(summaryTable) * 1))
train = summaryTable[sub,]
test = summaryTable[-sub,]
xTrain = train[,"rankingVote"];
yTrain = as.factor(train$bugCovering);
xTest = test[,"rankingVote"]
yTest = as.factor(test$bugCovering);
xS = summaryTable[,"rankingVote"]
# Name the column explicitly so yS$bugCovering works in confusionMatrix below
yS = data.frame(bugCovering = as.factor(summaryTable[,"bugCovering"]));
```
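<p>If a genuine held-out test set were desired, caret's createDataPartition gives a stratified split that preserves the bugCovering class proportions the data-preparation step aims for. A minimal sketch; the 0.7 ratio is an illustrative assumption:</p>
```{r stratified.split, eval=FALSE}
# Stratified 70/30 split that keeps the bugCovering class ratio in both sets
inTrain <- createDataPartition(as.factor(summaryTable$bugCovering),
                               p = 0.7, list = FALSE)
train <- summaryTable[inTrain, ]
test  <- summaryTable[-inTrain, ]
```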
## Build the model
<p><code>nb.model = train(xTrain, yTrain, 'nb', trControl = trainControl(method = 'cv', number = 10));</code></p>
```{r naiveBayes.model.ranking, include=FALSE}
# Fit a naive Bayes classifier ('nb') with 10-fold cross-validation via caret
nb.model = train(xTrain,yTrain,'nb',trControl=trainControl(method='cv',number=10));
```
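<p>Because the chunk above is hidden, nothing about the fit is shown. Printing the fitted object reports the cross-validated accuracy and kappa for each tuning candidate, a quick way to inspect the model:</p>
```{r inspect.model, eval=FALSE}
print(nb.model)    # resampling summary: accuracy and kappa per tuning candidate
nb.model$results   # the same metrics as a data frame
```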
## Test the model
```{r test.naiveBayes.ranking, echo=FALSE}
# Predict on the full data set; the returned object carries both the predicted
# class and the posterior probabilities, so keep only the class column.
nb.pred <- predict(nb.model$finalModel, xS);
nb.pred.df <- data.frame(nb.pred);
confusionMatrix(nb.pred.df$class, yS$bugCovering);
```
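<p>To make the false-negative comparison in the conclusions explicit, the confusionMatrix result can be stored and its per-class statistics inspected. A minimal sketch:</p>
```{r class.metrics, eval=FALSE}
cm <- confusionMatrix(nb.pred.df$class, yS$bugCovering)
cm$table                    # raw counts; false negatives sit off the diagonal
cm$byClass["Sensitivity"]   # low sensitivity means many missed bug-covering cases
```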
## Conclusions
<p>Compared with the k-nearest neighbor classifier with cross-validation from the class package, Naive Bayes produced more false negatives. Naive Bayes produced the same results as kNN from the caret package.</p>