{"id":6109,"date":"2023-02-27T16:29:00","date_gmt":"2023-02-27T16:29:00","guid":{"rendered":"https:\/\/www.goodacademic.com\/blog\/questions\/train-analysis\/"},"modified":"2023-02-27T16:29:00","modified_gmt":"2023-02-27T16:29:00","slug":"train-analysis","status":"publish","type":"questions","link":"https:\/\/www.goodacademic.com\/blog\/questions\/train-analysis\/","title":{"rendered":"Train Analysis"},"content":{"rendered":"<div class=\"col-sm-12 messageContent\">\n <p><b>Learning Goal: <\/b>I&#8217;m working on a data analytics exercise and need the explanation and answer to help me learn.<\/p>\n<p><strong>Please answer all questions in the Word file and provide all related R code.<\/strong><\/p>\n<p><strong>HELP: Reading the data<\/strong><\/p>\n<p><em># Reading in the data<\/em><br \/>train.x &lt;- read.table(file = &quot;train.txt&quot;, header = FALSE)<br \/>train.y &lt;- read.table(file = &quot;train_id.txt&quot;, header = FALSE)<br \/>test.x &lt;- read.table(file = &quot;test.txt&quot;, header = FALSE)<br \/>test.y &lt;- read.table(file = &quot;test_id.txt&quot;, header = FALSE)<\/p>\n<p><em># Combining the data<\/em><br \/>train.dt &lt;- cbind(train.y, train.x)<br \/>test.dt &lt;- cbind(test.y, test.x)<\/p>\n<p><em># Assigning names to the columns<\/em><br \/>colnames(train.dt) &lt;- c(&quot;Y&quot;, paste(&quot;X&quot;, 1:4, sep = &quot;&quot;))<br \/>colnames(test.dt) &lt;- c(&quot;Y&quot;, paste(&quot;X&quot;, 1:4, sep = &quot;&quot;))<\/p>\n<p><em># Recoding target Y to the standard (0, 1) coding (this is optional)<\/em><br \/>train.dt$Y &lt;- ifelse(train.dt$Y == 1, 0, 1)<br \/>test.dt$Y &lt;- ifelse(test.dt$Y == 1, 0, 1)<\/p>\n<p>Sample Code:<\/p>\n<h3>Illustration with Stock Market data<\/h3>\n<p>Attached Files:<\/p>\n<ul>\n<li><a href=\"https:\/\/ualearn.blackboard.com\/bbcswebdav\/pid-8585489-dt-content-rid-101245828_1\/xid-101245828_1\" target=\"_blank\" rel=\"noopener\"><img 
src=\"https:\/\/learn.content.blackboardcdn.com\/3900.58.0-rel.36+3daac77\/images\/ci\/ng\/cal_year_event.gif\" alt=\"File\"> Illustration with Stock Market data.pdf<\/a> (820.749 KB)<\/li>\n<\/ul>\n<p><strong>### Discriminant Analysis<\/strong><br \/><strong>### Comparing Logistic Regression, LDA and QDA<\/strong><br \/><strong>### Package ISLR<\/strong><br \/>install.packages(&quot;ISLR&quot;)<br \/>library(&quot;ISLR&quot;)<\/p>\n<p>head(Smarket)<br \/>class(Smarket) <\/p>\n<p><strong># verify that the data is stored as a data.frame <\/strong><br \/><strong># attach the data so its columns can be referenced by name <\/strong><br \/>attach(Smarket)<\/p>\n<p><strong># checking dependence of observations<\/strong><br \/>plot(Lag1[1:1249] ~ Lag1[2:1250])<\/p>\n<p><strong># splitting the data<\/strong><br \/>train &lt;- subset(Smarket, Year &lt; 2005)<br \/>test &lt;- subset(Smarket, Year == 2005)<\/p>\n<p><strong>### Logistic Regression (full model) <\/strong><br \/>log.reg &lt;- glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume, data = train, family = &quot;binomial&quot;)<br \/>summary(log.reg)<\/p>\n<p><strong># predictions<\/strong><br \/>pred.log.reg &lt;- predict(log.reg, test, type = &quot;response&quot;)<\/p>\n<p><strong># confusion matrix<\/strong><br \/><strong># ifelse(condition, value if the condition is TRUE, value if it is FALSE)<\/strong><br \/>table(test$Direction, ifelse(pred.log.reg &gt; 0.5, &quot;Up&quot;, &quot;Down&quot;))<\/p>\n<p><strong># accuracy rate<\/strong><br \/>accu.log &lt;- mean(ifelse(pred.log.reg &gt; 0.5, &quot;Up&quot;, &quot;Down&quot;) == test$Direction)<br \/>accu.log<\/p>\n<p><strong># misclassification rate <\/strong><br \/>misc.log &lt;- 1 - accu.log<br \/>misc.log<\/p>\n<p><strong># another way to check the error rate (i.e. 
misc rate) <\/strong><br \/># mean(ifelse(pred.log.reg &gt; 0.5, &quot;Up&quot;, &quot;Down&quot;) != test$Direction)<\/p>\n<p><strong>#### LDA model (full)<\/strong><br \/>library(MASS)<br \/>lda.model &lt;- lda(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume, data = train)<br \/>lda.model<\/p>\n<p><strong># predictions (predict() for lda objects returns class, posterior and x)<\/strong><br \/>pred.lda &lt;- predict(lda.model, test)<\/p>\n<p><strong># confusion matrix<\/strong><br \/>table(test$Direction, pred.lda$class)<\/p>\n<p><strong># accuracy rate<\/strong><br \/>accu.lda &lt;- mean(pred.lda$class == test$Direction)<br \/>accu.lda<\/p>\n<p><strong># misclassification rate <\/strong><br \/>misc.lda &lt;- 1 - accu.lda<br \/>misc.lda<\/p>\n<p><strong>#### QDA model (full)<\/strong><br \/>qda.model &lt;- qda(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume, data = train)<br \/>qda.model<\/p>\n<p><strong># predictions<\/strong><br \/>pred.qda &lt;- predict(qda.model, test)<\/p>\n<p><strong># confusion matrix<\/strong><br \/>table(test$Direction, pred.qda$class)<\/p>\n<p><strong># accuracy rate<\/strong><br \/>accu.qda &lt;- mean(pred.qda$class == test$Direction)<br \/>accu.qda<\/p>\n<p><strong># misclassification rate<\/strong> <br \/>misc.qda &lt;- 1 - accu.qda<br \/>misc.qda<\/p>\n<p><strong>###########################<\/strong><br \/><strong>###### Reduced Models #######<\/strong><br \/><strong>###########################<\/strong><\/p>\n<p><strong>### Logistic Regression (reduced)<\/strong><br \/>log.reg.reduced &lt;- glm(Direction ~ Lag1 + Lag2, data = train, family = &quot;binomial&quot;)<br \/>summary(log.reg.reduced)<\/p>\n<p><strong># predictions<\/strong><br \/>pred.log.reg.reduced &lt;- predict(log.reg.reduced, test, type = &quot;response&quot;)<\/p>\n<p><strong># confusion matrix<\/strong><br \/>table(test$Direction, ifelse(pred.log.reg.reduced &gt; 0.5, &quot;Up&quot;, &quot;Down&quot;))<\/p>\n<p><strong># accuracy 
rate<\/strong><br \/>accu.log &lt;- mean(ifelse(pred.log.reg.reduced &gt; 0.5, &quot;Up&quot;, &quot;Down&quot;) == test$Direction)<br \/>accu.log<\/p>\n<p><strong># misclassification rate<\/strong> <br \/>misc.log &lt;- 1 - accu.log<br \/>misc.log<\/p>\n<p><strong>#### LDA model (reduced)<\/strong><br \/>lda.reduced &lt;- lda(Direction ~ Lag1 + Lag2, data = train)<br \/>lda.reduced<\/p>\n<p><strong># predictions<\/strong><br \/>pred.lda.reduced &lt;- predict(lda.reduced, test)<\/p>\n<p><strong># confusion matrix<\/strong><br \/>table(test$Direction, pred.lda.reduced$class)<\/p>\n<p><strong># accuracy rate<\/strong><br \/>accu.lda &lt;- mean(pred.lda.reduced$class == test$Direction)<br \/>accu.lda<\/p>\n<p><strong># misclassification rate <\/strong><br \/>misc.lda &lt;- 1 - accu.lda<br \/>misc.lda<\/p>\n<p><strong>#### QDA model (reduced)<\/strong><br \/>qda.reduced &lt;- qda(Direction ~ Lag1 + Lag2, data = train)<br \/>qda.reduced<\/p>\n<p><strong># predictions<\/strong><br \/>pred.qda.reduced &lt;- predict(qda.reduced, test)<\/p>\n<p><strong># confusion matrix<\/strong><br \/>table(test$Direction, pred.qda.reduced$class)<\/p>\n<p><strong># accuracy rate<\/strong><br \/>accu.qda &lt;- mean(pred.qda.reduced$class == test$Direction)<br \/>accu.qda<\/p>\n<p><strong># misclassification rate<\/strong><br \/>misc.qda &lt;- 1 - accu.qda<br \/>misc.qda<\/p>\n<p><strong>#### ROC plots<\/strong><br \/>install.packages(&quot;ROCR&quot;)<br \/>library(&quot;ROCR&quot;)<\/p>\n<p>pred_LM &lt;- prediction(pred.log.reg.reduced, test$Direction)<br \/>LM &lt;- performance(pred_LM, measure = &quot;tpr&quot;, x.measure = &quot;fpr&quot;)<\/p>\n<p>pred_LDA &lt;- prediction(pred.lda.reduced$posterior[,2], test$Direction)<br \/>LDA &lt;- performance(pred_LDA, measure = &quot;tpr&quot;, x.measure = &quot;fpr&quot;)<\/p>\n<p>pred_QDA &lt;- prediction(pred.qda.reduced$posterior[,2], 
test$Direction)<br \/>QDA &lt;- performance(pred_QDA, measure = &quot;tpr&quot;, x.measure = &quot;fpr&quot;)<\/p>\n<p>plot(LM, col = &quot;black&quot;)<br \/>plot(LDA, add = TRUE, col = &quot;orange&quot;)<br \/>plot(QDA, add = TRUE, col = &quot;blue&quot;)<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Learning Goal: I&#8217;m working on a data analytics exercise and need the explanation and answer to help me learn. Please answer all questions in the word file and provide all related R code. HELP: Reading the data # Bringing the datatrain.x &lt;- read.table(file = &#8220;train.txt&#8221;, header = FALSE)train.y &lt;- read.table(file = &#8220;train_id.txt&#8221;, header = FALSE)test.x [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"closed","template":"","meta":[],"disciplines":[735],"paper_types":[],"tagged":[],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.goodacademic.com\/blog\/wp-json\/wp\/v2\/questions\/6109"}],"collection":[{"href":"https:\/\/www.goodacademic.com\/blog\/wp-json\/wp\/v2\/questions"}],"about":[{"href":"https:\/\/www.goodacademic.com\/blog\/wp-json\/wp\/v2\/types\/questions"}],"author":[{"embeddable":true,"href":"https:\/\/www.goodacademic.com\/blog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.goodacademic.com\/blog\/wp-json\/wp\/v2\/comments?post=6109"}],"version-history":[{"count":0,"href":"https:\/\/www.goodacademic.com\/blog\/wp-json\/wp\/v2\/questions\/6109\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.goodacademic.com\/blog\/wp-json\/wp\/v2\/media?parent=6109"}],"wp:term":[{"taxonomy":"disciplines","embeddable":true,"href":"https:\/\/www.goodacademic.com\/blog\/wp-json\/wp\/v2\/disciplines?post=6109"},{"taxonomy":"paper_types","embeddable":true,"href":"https:\/\/www.goodacademic.com\/blog\/wp-json\/wp\/v2\/paper_types?post=6109"},{"taxonomy":"tagged","embeddable":true,"href":"https:\/\/www.goodacade
mic.com\/blog\/wp-json\/wp\/v2\/tagged?post=6109"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}