#LyX 1.6.7 created this file. For more info see http://www.lyx.org/
\lyxformat 345
\begin_document
\begin_header
\textclass article
\use_default_options true
\language english
\inputencoding auto
\font_roman default
\font_sans default
\font_typewriter default
\font_default_family default
\font_sc false
\font_osf false
\font_sf_scale 100
\font_tt_scale 100

\graphics default
\paperfontsize default
\use_hyperref false
\papersize default
\use_geometry false
\use_amsmath 1
\use_esint 1
\cite_engine basic
\use_bibtopic false
\paperorientation portrait
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\defskip medskip
\quotes_language english
\papercolumns 1
\papersides 1
\paperpagestyle default
\tracking_changes false
\output_changes false
\author "" 
\author "" 
\end_header

\begin_body

\begin_layout Standard
John Hancock
\end_layout

\begin_layout Standard
CEN 6405
\end_layout

\begin_layout Standard
Mid Term Exam
\end_layout

\begin_layout Standard
May 30th, 2014
\end_layout

\begin_layout Part*
Midterm Exam
\end_layout

\begin_layout Section*
1
\end_layout

\begin_layout Subsection*
(a)
\end_layout

\begin_layout Standard
\begin_inset Formula \[
Y_{i}=\beta_{0}+\beta_{1}X_{1i}+\beta_{2}X_{2i}+\beta_{3}X_{1i}^{2}+\epsilon_{i}\]

\end_inset


\end_layout

\begin_layout Standard
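This is a linear regression model.
 Although 
\begin_inset Formula $Y_{i}$
\end_inset

 depends on both 
\begin_inset Formula $X_{1i}$
\end_inset

 and 
\begin_inset Formula $X_{1i}^{2}$
\end_inset

, the model is linear in the parameters 
\begin_inset Formula $\beta_{0}$
\end_inset

, 
\begin_inset Formula $\beta_{1}$
\end_inset

, 
\begin_inset Formula $\beta_{2}$
\end_inset

, and 
\begin_inset Formula $\beta_{3}$
\end_inset

, and linearity in the parameters is all that a linear regression model requires.
\end_layout

\begin_layout Standard
With the new variable 
\begin_inset Formula $X_{3i}=X_{1i}^{2}$
\end_inset

, the model becomes
\begin_inset Formula \[
Y_{i}=\beta_{0}+\beta_{1}X_{1i}+\beta_{2}X_{2i}+\beta_{3}X_{3i}+\epsilon_{i}\]

\end_inset

 which is a multiple linear regression model with three predictors.
 That 
\begin_inset Formula $X_{3i}$
\end_inset

 is computed from 
\begin_inset Formula $X_{1i}$
\end_inset

 does not make the model nonlinear; it only means that the two predictors
 may be correlated.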
\end_layout

\begin_layout Subsection*
(b)
\end_layout

\begin_layout Standard
\begin_inset Formula \[
Y_{i}=\epsilon_{i}\exp\left(\beta_{0}+\beta_{1}X_{1i}+\beta_{2}X_{2i}^{2}\right)\]

\end_inset

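 is intrinsically linear.
 Assuming the multiplicative error 
\begin_inset Formula $\epsilon_{i}$
\end_inset

 is positive, so that its logarithm is defined, we can take the natural logarithm
 of both sides: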
\end_layout

\begin_layout Standard
\begin_inset Formula \[
\ln\left(Y_{i}\right)=\ln\left(\epsilon_{i}\exp\left(\beta_{0}+\beta_{1}X_{1i}+\beta_{2}X_{2i}^{2}\right)\right)\]

\end_inset


\begin_inset Formula \[
=\ln\left(\epsilon_{i}\right)+\ln\left(\exp\left(\beta_{0}+\beta_{1}X_{1i}+\beta_{2}X_{2i}^{2}\right)\right)\]

\end_inset


\end_layout

\begin_layout Standard
\begin_inset Formula \[
=\ln\left(\epsilon_{i}\right)+\beta_{0}+\beta_{1}X_{1i}+\beta_{2}X_{2i}^{2}\]

\end_inset


\end_layout

\begin_layout Standard
With the new variables 
\begin_inset Formula $Y_{i}^{\prime}=\ln\left(Y_{i}\right)$
\end_inset

 and 
\begin_inset Formula $X_{2i}^{\prime}=X_{2i}^{2}$
\end_inset

, and the new error term 
\begin_inset Formula $\epsilon_{i}^{\prime}=\ln\left(\epsilon_{i}\right)$
\end_inset

, the model becomes: 
\begin_inset Formula \[
Y_{i}^{\prime}=\beta_{0}+\beta_{1}X_{1i}+\beta_{2}X_{2i}^{\prime}+\epsilon_{i}^{\prime}\]

\end_inset


\end_layout
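
\begin_layout Standard
After the transformed model is fitted by least squares, predictions on the
 original scale are obtained by inverting the logarithm.
 As a sketch, write 
\begin_inset Formula $b_{0}$
\end_inset

, 
\begin_inset Formula $b_{1}$
\end_inset

, 
\begin_inset Formula $b_{2}$
\end_inset

 for the estimates of 
\begin_inset Formula $\beta_{0}$
\end_inset

, 
\begin_inset Formula $\beta_{1}$
\end_inset

, 
\begin_inset Formula $\beta_{2}$
\end_inset

. This notation is introduced here and is not given in the exam.
\begin_inset Formula \[
\hat{Y}_{i}^{\prime}=b_{0}+b_{1}X_{1i}+b_{2}X_{2i}^{2},\qquad\hat{Y}_{i}=\exp\left(\hat{Y}_{i}^{\prime}\right)\]

\end_inset

 If 
\begin_inset Formula $\epsilon_{i}^{\prime}$
\end_inset

 is normally distributed with mean 0, 
\begin_inset Formula $\hat{Y}_{i}$
\end_inset

 estimates the median of 
\begin_inset Formula $Y_{i}$
\end_inset

 rather than its mean.
\end_layout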

\begin_layout Subsection*
(c)
\end_layout

\begin_layout Standard
\begin_inset Formula \[
Y_{i}=\beta_{0}+\log\left(\beta_{1}X_{1i}\right)+\beta_{2}X_{2i}+\epsilon_{i}\]

\end_inset

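 is intrinsically linear.
 Assuming 
\begin_inset Formula $\beta_{1}>0$
\end_inset

 and 
\begin_inset Formula $X_{1i}>0$
\end_inset

 so that the logarithms are defined, 
\begin_inset Formula $\log\left(\beta_{1}X_{1i}\right)=\log\left(\beta_{1}\right)+\log\left(X_{1i}\right)$
\end_inset

, so we can rewrite the model as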
\begin_inset Formula \[
Y_{i}=\beta_{0}+\log\left(\beta_{1}\right)+\log\left(X_{1i}\right)+\beta_{2}X_{2i}+\epsilon_{i}\]

\end_inset


\end_layout

\begin_layout Standard
With the new constant 
\begin_inset Formula $\beta_{0}^{\prime}=\beta_{0}+\log\left(\beta_{1}\right)$
\end_inset

 and the new variable 
\begin_inset Formula $X_{1i}^{\prime}=\log\left(X_{1i}\right)$
\end_inset

, the model becomes:
\begin_inset Formula \[
Y_{i}=\beta_{0}^{\prime}+X_{1i}^{\prime}+\beta_{2}X_{2i}+\epsilon_{i}\]

\end_inset


\end_layout
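
\begin_layout Standard
In the rewritten model, 
\begin_inset Formula $\log\left(X_{1i}\right)$
\end_inset

 enters with a known coefficient of 1. As a result, 
\begin_inset Formula $\beta_{0}$
\end_inset

 and 
\begin_inset Formula $\beta_{1}$
\end_inset

 cannot be estimated separately. Only the combined constant 
\begin_inset Formula $\beta_{0}+\log\left(\beta_{1}\right)$
\end_inset

 and 
\begin_inset Formula $\beta_{2}$
\end_inset

 can be estimated.
 One way to carry out the fit, sketched here, is to move the known term to
 the left-hand side and regress the new response on 
\begin_inset Formula $X_{2i}$
\end_inset

 alone:
\begin_inset Formula \[
Y_{i}-\log\left(X_{1i}\right)=\left(\beta_{0}+\log\left(\beta_{1}\right)\right)+\beta_{2}X_{2i}+\epsilon_{i}\]

\end_inset


\end_layout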

\begin_layout Section*
2
\end_layout

\begin_layout Subsection*
(a) 
\end_layout

\begin_layout Standard
We assume that 
\begin_inset Quotes eld
\end_inset

MS
\begin_inset Quotes erd
\end_inset

 in the given ANOVA table stands for 
\begin_inset Quotes eld
\end_inset

mean square,
\begin_inset Quotes erd
\end_inset

 that 
\begin_inset Quotes eld
\end_inset

df
\begin_inset Quotes erd
\end_inset

 stands for 
\begin_inset Quotes eld
\end_inset

degrees of freedom,
\begin_inset Quotes erd
\end_inset

 that 1803.3 is the mean square of the regression (
\begin_inset Formula $MSR$
\end_inset

), and that 0.8175 is the mean square of the error (
\begin_inset Formula $MSE$
\end_inset

).
 Then the computed 
\begin_inset Formula $F$
\end_inset

-value from the ANOVA table given is 
\begin_inset Formula \[
\frac{MSR}{MSE}=\frac{1803.3}{0.8175}\approx2205.87\]

\end_inset


\end_layout

\begin_layout Standard
To test whether a regression relation exists, we compare this value with
 the quantile of the 
\begin_inset Formula $F$
\end_inset

 distribution for 
\begin_inset Formula $\alpha=0.05$
\end_inset

 with 
\begin_inset Formula $n=3$
\end_inset

 numerator and 
\begin_inset Formula $m=20$
\end_inset

 denominator degrees of freedom. These are the degrees of freedom the given
 ANOVA table shows for the model and for the error, respectively.
 From the 
\begin_inset Formula $F$
\end_inset

 distribution table, 
\begin_inset Formula $F_{\left[0.95;3,20\right]}\approx3.10$
\end_inset

.
 The computed 
\begin_inset Formula $F$
\end_inset

-value is far greater than this table value. The regression therefore explains
 a significant part of the variation at the 0.05 significance level, and
 we conclude that a regression relation exists.
\end_layout
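
\begin_layout Standard
For reference, consider a model with an intercept and 
\begin_inset Formula $k$
\end_inset

 predictors fitted to 
\begin_inset Formula $N$
\end_inset

 observations. These symbols are introduced here for illustration. The hypotheses
 behind the 
\begin_inset Formula $F$
\end_inset

-test are
\begin_inset Formula \[
H_{0}:\beta_{1}=\beta_{2}=\cdots=\beta_{k}=0\qquad\textrm{versus}\qquad H_{1}:\beta_{j}\neq0\textrm{ for at least one }j\]

\end_inset

 and 
\begin_inset Formula $H_{0}$
\end_inset

 is rejected at level 
\begin_inset Formula $\alpha$
\end_inset

 when
\begin_inset Formula \[
F=\frac{MSR}{MSE}>F_{\left[1-\alpha;k,N-k-1\right]}\]

\end_inset

 Here 
\begin_inset Formula $k=3$
\end_inset

 and 
\begin_inset Formula $N-k-1=20$
\end_inset

, which matches the degrees of freedom in the given ANOVA table.
 The intercept 
\begin_inset Formula $\beta_{0}$
\end_inset

 is not part of 
\begin_inset Formula $H_{0}$
\end_inset

.
\end_layout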

\begin_layout Subsection*
(b)
\end_layout

\begin_layout Standard
The conclusion that a regression relation exists does not mean that the
 software engineer can skip screening the independent variables.
 The 
\begin_inset Formula $F$
\end_inset

-test only shows that at least one predictor is useful.
 The ratio 
\begin_inset Formula $MSR/MSE$
\end_inset

 can be significant even when the confidence intervals for some individual
 parameters 
\begin_inset Formula $\beta_{j}$
\end_inset

 include 0. In that case, the corresponding variables are not significant
 at the chosen level and could be dropped from the model.
\end_layout
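
\begin_layout Standard
As a sketch of that check, the following notation is introduced here and
 is not given in the exam.
 Let 
\begin_inset Formula $b_{j}$
\end_inset

 be the estimate of 
\begin_inset Formula $\beta_{j}$
\end_inset

 and 
\begin_inset Formula $s_{b_{j}}$
\end_inset

 its standard error, for a model with 
\begin_inset Formula $k$
\end_inset

 predictors fitted to 
\begin_inset Formula $N$
\end_inset

 observations. The 
\begin_inset Formula $100\left(1-\alpha\right)\%$
\end_inset

 confidence interval for 
\begin_inset Formula $\beta_{j}$
\end_inset

 is then
\begin_inset Formula \[
b_{j}\mp t_{\left[1-\alpha/2;N-k-1\right]}s_{b_{j}}\]

\end_inset

 With the 20 error degrees of freedom in the given ANOVA table and 
\begin_inset Formula $\alpha=0.05$
\end_inset

, the quantile is 
\begin_inset Formula $t_{\left[0.975;20\right]}\approx2.086$
\end_inset

.
 If the interval contains 0, the corresponding predictor is a candidate
 for removal.
\end_layout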

\begin_layout Standard
If the 
\begin_inset Formula $F$
\end_inset

-test had failed to show that a regression relation exists, we could not
 reject the hypothesis that all of the slope parameters 
\begin_inset Formula $\beta_{1}$
\end_inset

, 
\begin_inset Formula $\beta_{2}$
\end_inset

, 
\begin_inset Formula $\beta_{3}$
\end_inset

 are 0. The intercept 
\begin_inset Formula $\beta_{0}$
\end_inset

 is not part of this hypothesis.
 In that case, screening the independent variables would be a waste of time,
 because there would be no evidence that any of them helps predict the response.
\end_layout

\begin_layout Section*
3
\end_layout

\begin_layout Subsection*
(a)
\end_layout

\begin_layout Standard
The largest value of 
\begin_inset Formula $SSR$
\end_inset

 in the given table, 5409.89, is for the model with 
\begin_inset Formula $x_{1}$
\end_inset

, 
\begin_inset Formula $x_{2}$
\end_inset

, and 
\begin_inset Formula $x_{3}$
\end_inset

.
 Every model is fitted to the same data, so the total sum of squares 
\begin_inset Formula $SST$
\end_inset

 is the same for all of them. Because 
\begin_inset Formula $R^{2}=SSR/SST$
\end_inset

, the 
\begin_inset Formula $R^{2}$
\end_inset

 value is therefore largest for this model.
 By the 
\begin_inset Formula $R^{2}$
\end_inset

 criterion, 
\begin_inset Formula $x_{1}$
\end_inset

, 
\begin_inset Formula $x_{2}$
\end_inset

, 
\begin_inset Formula $x_{3}$
\end_inset

 is the best set of independent variables for predicting 
\begin_inset Formula $y$
\end_inset

.
\end_layout
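
\begin_layout Standard
The ranking argument can be written out with the usual decomposition of
 the total variation for a least-squares fit with an intercept:
\begin_inset Formula \[
SST=SSR+SSE,\qquad R^{2}=\frac{SSR}{SST}=1-\frac{SSE}{SST}\]

\end_inset

 Because 
\begin_inset Formula $SST$
\end_inset

 depends only on the responses 
\begin_inset Formula $y$
\end_inset

, ranking the models by largest 
\begin_inset Formula $SSR$
\end_inset

, by largest 
\begin_inset Formula $R^{2}$
\end_inset

, or by smallest 
\begin_inset Formula $SSE$
\end_inset

 gives the same order.
\end_layout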

\begin_layout Subsection*
(b)
\end_layout

\begin_layout Standard
The observer misses the point that, without screening, we might select
 a model with a lower 
\begin_inset Formula $R^{2}$
\end_inset

 than one of the models we did not examine.
 Such a model would leave more of the variation in 
\begin_inset Formula $y$
\end_inset

 unexplained, attributing it to error, than the best model does.
 Therefore, we should screen.
\end_layout
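
\begin_layout Standard
The cost of exhaustive screening grows quickly with the number of candidate
 variables.
 Let 
\begin_inset Formula $k$
\end_inset

 be the number of candidate independent variables, a symbol introduced
 here for illustration. The number of non-empty subsets to compare is
\begin_inset Formula \[
\sum_{j=1}^{k}\binom{k}{j}=2^{k}-1\]

\end_inset

 For example, 
\begin_inset Formula $k=3$
\end_inset

 variables such as 
\begin_inset Formula $x_{1}$
\end_inset

, 
\begin_inset Formula $x_{2}$
\end_inset

, 
\begin_inset Formula $x_{3}$
\end_inset

 give 7 candidate models, while 
\begin_inset Formula $k=10$
\end_inset

 gives 1023.
\end_layout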

\begin_layout Section*
4
\end_layout

\begin_layout Subsection*
(a)
\end_layout

\end_body
\end_document