



MULTIVARIATE NORMAL INFERENCE FOR HETEROGENEOUS SAMPLES AND AN APPLICATION TO META ANALYSIS

By

LICHI LIN

Bachelor of Science in Mathematics
Chung-Yuan Christian University
Chung-Li, Tao-Yuan
1990

Master of Science in Statistics
National Central University
Chung-Li, Tao-Yuan
1993

Submitted to the Faculty of the Graduate College of the Oklahoma State University in partial fulfillment of the requirements for the Degree of DOCTOR OF PHILOSOPHY, July, 2012

Dissertation Approved:

Dr. Ibrahim A. Ahmad, Dissertation Adviser
Dr. Carla L. Goad
Dr. Lan Zhu
Dr. Tieming Liu
Dr. Sheryl A. Tucker, Dean of the Graduate College

TABLE OF CONTENTS

I. INTRODUCTION AND LITERATURE REVIEW .... 1
  1.1 Introduction .... 1
  1.2 Homogeneous Mean Model for Single Population .... 2
    1.2.1 Inferences Concerning the Mean Vector When Covariance Matrix Is Unstructured .... 3
    1.2.2 Inferences Concerning the Mean Vector When Covariance Matrix Has Compound Symmetry Structure .... 5
    1.2.3 Inferences Concerning the Mean Vector When Covariance Matrix Is Circulant .... 7
    1.2.4 Inferences Concerning the Mean Vector When Covariance Matrix Is Block Compound Symmetry .... 9
    1.2.5 Inferences Concerning Both Means and Covariance Matrices .... 11
  1.3 Homogeneous Mean Models for k Populations with k ≥ 2 .... 13
  1.4 Meta Analysis .... 14
  1.5 Proposed Heterogeneous Means Models .... 16
II. ONE-SAMPLE INFERENCE .... 18
  2.1 Introduction and Preliminary Cases .... 18
    2.1.1 Inference for μ When Σ Is Known .... 20
    2.1.2 Inference for μ When Σ Is Unknown without Pattern .... 21
    2.1.3 Inference for μ When Σ = σ²V, σ² Unknown, V Known .... 22
  2.2 Mainstream: Inference for μ When Σ Has Compound Symmetry Structure and the C_i Are Circulant .... 25
    2.2.1 Maximum Likelihood Estimators .... 25
    2.2.2 Hypothesis Testing for H_0: μ = μ_0 Using LR Test .... 28
    2.2.3 Properties and Useful Results for ML Estimators .... 42
    2.2.4 Hypothesis Testing for H_0: μ = μ_0 Using Approximate χ² Test .... 63
  2.3 Simulation Study for Misuse of Homogeneous Mean Models .... 64
III. MULTI-SAMPLE INFERENCE .... 67
  3.1 Introduction .... 67
  3.2 Likelihood Ratio Test for Two-Sample Case .... 69
    3.2.1 Estimation Under H_0: μ_x = μ_y .... 69
    3.2.2 Estimation Under H_a: μ_x ≠ μ_y .... 70
    3.2.3 Likelihood Ratio Test for Testing H_0: μ_x = μ_y .... 71
  3.3 Approximate χ² Test for H_0: μ_x = μ_y .... 82
  3.4 LRT for k-Sample Case .... 85
    3.4.1 Estimation Under H_0: μ_1 = ... = μ_k .... 86
    3.4.2 Estimation Under H_a: μ_i ≠ μ_j for Some i ≠ j .... 87
    3.4.3 Likelihood Ratio Test for Testing H_0: μ_1 = ... = μ_k .... 88
IV. APPLICATION TO META ANALYSIS .... 99
  4.1 Introduction and Preliminary Univariate Case .... 99
  4.2 Fixed Effect Model .... 100
  4.3 Random Effects Model .... 103
    4.3.1 Two-Stage Method .... 103
    4.3.2 One-Stage Method .... 105
    4.3.3 One-Stage Method – Simulation Study .... 109
V. CONCLUSIONS AND FUTURE WORK .... 115
  5.1 Conclusions .... 115
  5.2 Future Work .... 118
REFERENCES .... 120
APPENDICES .... 123
  A.1 .... 123
  A.2 .... 124
  A.3 .... 126
  A.4 .... 127

LIST OF TABLES

Table 1 .... 66

LIST OF FIGURES

Figure 1 .... 60
Figure 2 .... 61
Figure 3 .... 62
Figure 4 .... 62
Figure 5 .... 111
Figure 6 .... 112
Figure 7 .... 113
Figure 8 .... 114

CHAPTER I

INTRODUCTION AND LITERATURE REVIEW

1.1 INTRODUCTION

If X_1, ..., X_n is a sample from a normal population, the usual point estimator of the population mean is the sample mean X̄_n. However, if the collected data violate the "identically distributed" assumption, that is, if each X_i has a heterogeneous mean, estimating the "population mean" no longer makes sense unless those means are structured. Sometimes researchers believe that their sample comes from a single population with a common constant mean when it does not, and they wish to test whether the "population mean" equals a specified value θ_0 without realizing that their data have been polluted by some known or unknown mechanism. The chance of rejection is then affected by the degree to which the data are polluted. It is therefore necessary to model the disturbance of the data caused by external or internal mechanisms and to carry out inference for the parameter of interest.

For example, let X_i, i = 1, ..., n, be a random sample, independently, normally distributed with heterogeneous means C_i θ, i = 1, ..., n, and common variance σ², where the constants C_1, ..., C_n are known; that is, X_i ~ ind. N(C_i θ, σ²). Although each X_i has a different mean, there is still an "underlying" mean θ hidden in this model. Once θ is estimated, each mean C_i θ, i = 1, ..., n, is obtained. This model is in fact a linear regression model through the origin. In this univariate case the model is easy to estimate, but when it is extended to the multivariate case the matrices C_i become troublesome.
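As a quick numerical illustration of the univariate model just described (with illustrative constants C_i, θ, and σ), the MLE of θ under normality coincides with the least-squares slope through the origin, θ̂ = ∑ C_i X_i / ∑ C_i²:

```python
import numpy as np

rng = np.random.default_rng(0)

# Known constants C_1, ..., C_n and an underlying mean theta (illustrative values).
C = rng.uniform(0.5, 2.0, size=200)
theta_true = 3.0
sigma = 1.5

# X_i ~ N(C_i * theta, sigma^2), independently: heterogeneous means, common variance.
X = theta_true * C + rng.normal(0.0, sigma, size=C.size)

# MLE of theta = least-squares slope of the regression through the origin.
theta_hat = np.sum(C * X) / np.sum(C**2)

# Once theta is estimated, each heterogeneous mean C_i * theta is recovered.
fitted_means = C * theta_hat
print(theta_hat)
```

With n = 200 observations, θ̂ lands close to the underlying θ even though no two observations share a mean.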
A special case of interest is that in which each C_i is a square matrix. The remainder of this chapter reviews the literature on inference for multivariate homogeneous mean models. Section 1.2 covers the single normal population: Subsection 1.2.1 reviews inferences concerning the mean vector when the covariance matrix is unstructured; Subsections 1.2.2 to 1.2.4 concern inferences for the means when the covariance matrices are patterned; and Subsection 1.2.5 concerns inferences about both the means and the covariance matrices. Section 1.3 treats inference for the multivariate homogeneous mean model for k normal populations with k ≥ 2. Section 1.4 gives a brief review of meta analysis. Section 1.5 formally introduces the proposed model under the multivariate normal setting and outlines the contents of the later chapters.

1.2 HOMOGENEOUS MEAN MODEL FOR SINGLE POPULATION

The p-dimensional multivariate normal model has mean μ and covariance matrix Σ. The basic statistical problem is to estimate the parameters from a sample of n observations X_1, ..., X_n from the normal distribution with homogeneous mean μ and homogeneous covariance matrix Σ. The maximum likelihood estimator of μ is the sample mean, and the maximum likelihood estimator of Σ is proportional to the matrix of sample variances and covariances. The sample covariance matrix is defined by

S = (1/(n−1)) ∑_{i=1}^n (X_i − X̄)(X_i − X̄)′,   (1.1)

where X̄ = (1/n) ∑_{i=1}^n X_i. S is unbiased for estimating Σ and follows the Wishart distribution W((1/(n−1)) Σ, n−1).

1.2.1 Inferences Concerning the Mean Vector When Covariance Matrix Is Unstructured

Tests of the hypothesis that the mean μ equals a specified vector μ_0 have been discussed in many multivariate analysis textbooks (e.g., Anderson 2003 and Rencher 1998), both for the case that Σ is known and for the case that Σ is unknown and unstructured.
Since √n (X̄ − μ) is distributed according to N(0, Σ), the quadratic form n(X̄ − μ)′ Σ^{−1} (X̄ − μ) has a central chi-square distribution with p degrees of freedom for the case that Σ is known. For the case that Σ is unknown and unstructured, the likelihood of the homogeneous mean model given observations x_1, ..., x_n is

L(μ, Σ | x_1, ..., x_n) = ∏_{i=1}^n (2π)^{−p/2} |Σ|^{−1/2} exp{−(1/2)(x_i − μ)′ Σ^{−1} (x_i − μ)}
= (2π)^{−np/2} |Σ|^{−n/2} exp{−(1/2) ∑_{i=1}^n (x_i − μ)′ Σ^{−1} (x_i − μ)},   (1.2)

and the corresponding log likelihood is

log L(μ, Σ | x_1, ..., x_n) = constant − (n/2) log|Σ| − (1/2) ∑_{i=1}^n (x_i − μ)′ Σ^{−1} (x_i − μ),

where log is the logarithm taken to base e. Let T² = n(X̄ − μ_0)′ S^{−1} (X̄ − μ_0). In the rest of this subsection, the following theorem concerning the Hotelling T² distribution is stated, and the likelihood ratio test of the hypothesis H_0: μ = μ_0 is developed based on the T² statistic (Anderson 2003).

Theorem 1.1 (Anderson 2003) Let X_1, ..., X_n be a sample from N(μ, Σ), and define T² = n(X̄ − μ_0)′ S^{−1} (X̄ − μ_0). The distribution of [(n − p)/((n − 1)p)] T² is noncentral F with p and n − p degrees of freedom and noncentrality parameter n(μ − μ_0)′ Σ^{−1} (μ − μ_0). If μ = μ_0, the F distribution is central.

Since the T² statistic follows Hotelling's T² distribution, a generalization of Student's t distribution, a confidence region for the mean vector can be derived from the T² statistic. The likelihood ratio for testing H_0: μ = μ_0 is

λ = max_Σ L(μ_0, Σ) / max_{μ,Σ} L(μ, Σ) = L(μ_0, Σ̂_0) / L(μ̂, Σ̂),   (1.3)

where

μ̂ = X̄,   Σ̂ = (1/n) ∑_{i=1}^n (X_i − μ̂)(X_i − μ̂)′,   and   Σ̂_0 = (1/n) ∑_{i=1}^n (X_i − μ_0)(X_i − μ_0)′.   (1.4)

Thus (1.3) becomes
λ = [(2π)^{−np/2} |Σ̂_0|^{−n/2} exp{−(1/2) ∑_{i=1}^n (x_i − μ_0)′ Σ̂_0^{−1} (x_i − μ_0)}] / [(2π)^{−np/2} |Σ̂|^{−n/2} exp{−(1/2) ∑_{i=1}^n (x_i − x̄)′ Σ̂^{−1} (x_i − x̄)}]
= (|Σ̂| / |Σ̂_0|)^{n/2} exp{−(1/2) tr[Σ̂_0^{−1} ∑_{i=1}^n (x_i − μ_0)(x_i − μ_0)′] + (1/2) tr[Σ̂^{−1} ∑_{i=1}^n (x_i − x̄)(x_i − x̄)′]}
= (|Σ̂| / |Σ̂_0|)^{n/2} exp{−(1/2) tr(n I_p) + (1/2) tr(n I_p)}
= (|Σ̂| / |Σ̂_0|)^{n/2}.

Replacing Σ̂ and Σ̂_0 using (1.4), λ^{2/n} becomes

λ^{2/n} = |Σ̂| / |Σ̂_0| = |∑_{i=1}^n (X_i − X̄)(X_i − X̄)′| / |∑_{i=1}^n (X_i − μ_0)(X_i − μ_0)′|.

Further, to derive the likelihood ratio criterion, the following corollary is required.

Corollary 1.1 (Anderson 2003) For C nonsingular, |C + y y′| = |C| (1 + y′ C^{−1} y).

Defining A = ∑_{i=1}^n (X_i − X̄)(X_i − X̄)′ and using Corollary 1.1, we have

λ^{2/n} = |A| / |A + n(X̄ − μ_0)(X̄ − μ_0)′| = 1 / [1 + n(X̄ − μ_0)′ A^{−1} (X̄ − μ_0)] = 1 / [1 + T²/(n − 1)],

where T² is defined in Theorem 1.1. Thus the likelihood ratio test of H_0: μ = μ_0 has rejection region {x_1, ..., x_n : T² ≥ C_0}, where C_0 = [(n − 1)p/(n − p)] F_{p, n−p}(1 − α) is such that P(T² ≥ C_0 | H_0) = α, the significance level of the test.

1.2.2 Inferences Concerning the Mean Vector When Covariance Matrix Has Compound Symmetry Structure

Define the p-variate mean vector μ = (μ_1, ..., μ_p)′. Wilks (1946) derived the exact likelihood ratio criterion for testing H_0: equality of the p entries of the mean vector μ, that is, H_0: μ = γ 1_p, where γ is an unknown real number and 1_p is a p×1 vector with all entries equal to 1, when the covariance matrix Σ has the compound symmetry structure defined in (1.5). This can be done when the likelihood ratio criterion, also derived in the same paper, for testing H_0: Σ has compound symmetry versus H_a: Σ is unstructured does not have a significantly small value. The compound symmetry covariance matrix is of the form

Σ = σ² [ 1  ρ  ⋯  ρ
         ρ  1  ⋯  ρ
         ⋮  ⋮  ⋱  ⋮
         ρ  ρ  ⋯  1 ],   (1.5)

where σ² > 0 and −1/(p − 1) < ρ < 1 to ensure positive definiteness of the compound symmetry covariance structure of Σ.
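The positive definiteness condition on (1.5) can be checked numerically: the eigenvalues of a compound symmetry matrix are σ²(1 + (p−1)ρ), occurring once, and σ²(1 − ρ), repeated p − 1 times, which is why −1/(p−1) < ρ < 1 is required. A minimal sketch with illustrative parameter values:

```python
import numpy as np

def compound_symmetry(p, sigma2, rho):
    """Covariance matrix of the form (1.5): equal variances sigma2,
    equal covariances sigma2 * rho."""
    return sigma2 * ((1 - rho) * np.eye(p) + rho * np.ones((p, p)))

p, sigma2, rho = 4, 2.0, 0.3     # illustrative values
Sigma = compound_symmetry(p, sigma2, rho)

# Eigenvalues: sigma2 * (1 - rho) with multiplicity p - 1, and
# sigma2 * (1 + (p-1) rho) with multiplicity 1; all positive iff
# -1/(p-1) < rho < 1.
eigvals = np.linalg.eigvalsh(Sigma)
print(sorted(eigvals))  # -> approximately [1.4, 1.4, 1.4, 3.8]
```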
This structure assumes that the p variances are all equal and that all pairs of variates share the common intraclass correlation ρ. Geisser (1963) derived the likelihood ratio test of H_0: μ = μ_0, where μ_0 is a known constant vector, when the underlying covariance matrix has the compound symmetry structure shown in (1.5). In that paper, the likelihood ratio test statistic L for testing H_0: μ = μ_0 under the covariance structure (1.5) is of the form

L = [1 + F_{p−1,(n−1)(p−1)} / (n−1)]^{−1} [1 + F_{1,n−1} / (n−1)]^{−1},   (1.6)

L = [1 + χ²_{p−1} / χ²_{(p−1)(n−1)}]^{−1} [1 + χ²_1 / χ²_{n−1}]^{−1},   (1.7)

or

L = B_1 B_2,   (1.8)

where F_{p−1,(n−1)(p−1)} and F_{1,n−1} are independent F random variables with the degrees of freedom indicated in the subscripts, and χ²_{p−1}, χ²_1, χ²_{(p−1)(n−1)}, and χ²_{n−1} are independent chi-square random variables with the corresponding degrees of freedom shown in the subscripts. B_1 and B_2 are independent beta variables, Beta((1/2)(n−1)(p−1), (1/2)(p−1)) and Beta((1/2)(n−1), 1/2), respectively, based on the following property of beta random variables.

Property of beta random variables (Bailey 1992): Let U and V be independent, U ~ χ²(m), V ~ χ²(n). Then U/(U + V) ~ Beta(m/2, n/2).

The rth raw moment of L is easily calculated, and approximations to the distribution of the product have been studied by Tukey and Wilks (1946), so that finding approximate critical values for the test is feasible. The hypothesis H_0: μ = μ_0 is rejected when L is sufficiently small.

1.2.3 Inferences Concerning the Mean Vector When Covariance Matrix Is Circulant

A circulant matrix of order p, or circulant for short, is a p×p square matrix of the form

A = (a_ij) = [ a_0      a_1      ⋯  a_{p−1}
               a_{p−1}  a_0      ⋯  a_{p−2}
               ⋮        ⋮            ⋮
               a_1      a_2      ⋯  a_0     ].   (1.9)

The elements of each row of the matrix A are identical to those of the previous row, but moved one position to the right and wrapped around, so that the last element of the previous row becomes the first element of the current row.
Note that the whole circulant is evidently determined by its first row, so we may denote the circulant A in (1.9) by A = circ(a_0, a_1, ..., a_{p−1}). Thus A is a p×p circulant if and only if a_ij = a_{(j−i) mod p}, where

(j − i) mod p = j − i       when j ≥ i,
                p + j − i   when j < i.

For more details about circulant matrices, refer to Davis (1979) and Graybill (1983). If a positive definite covariance matrix is circulant, it must also be symmetric. Examples of circulant covariance matrices σ² circ(1, ρ_1, ρ_2, ..., ρ_2, ρ_1) with p = 4 and p = 5 are, respectively,

σ² [ 1    ρ_1  ρ_2  ρ_1
     ρ_1  1    ρ_1  ρ_2
     ρ_2  ρ_1  1    ρ_1
     ρ_1  ρ_2  ρ_1  1   ]   and   σ² [ 1    ρ_1  ρ_2  ρ_2  ρ_1
                                       ρ_1  1    ρ_1  ρ_2  ρ_2
                                       ρ_2  ρ_1  1    ρ_1  ρ_2
                                       ρ_2  ρ_2  ρ_1  1    ρ_1
                                       ρ_1  ρ_2  ρ_2  ρ_1  1   ],

satisfying ρ_j = ρ_{p−j} for the symmetric circulant covariance matrix Σ of the form

Σ = σ² circ(1, ρ_1, ρ_2, ..., ρ_2, ρ_1).   (1.10)

If ρ_1 = ρ_2 = ... = ρ_{p−1} in (1.10), the covariance matrix reduces to the compound symmetric form defined in (1.5). Olkin and Press (1969) found the MLEs of the mean μ and covariance matrix Σ, and derived exact likelihood ratio criteria for testing equality of the p entries of the mean vector μ, and for testing that the mean vector μ equals zero, when the covariance matrix Σ has a circulant structure. Their derivations for estimation and testing begin with the transformations Y = n^{1/2} Γ X̄ and V = Γ S Γ′, where X̄ and S are the sample mean and sample covariance matrix as defined in (1.1), and Γ is an orthogonal matrix that transforms the circulant covariance matrix Σ to diagonal form. Note that Y and V are independent. They also derived likelihood ratio tests and asymptotic approximations of the test statistics for means and covariance matrices. They simultaneously tested (i) that the mean vector μ is zero and the covariance matrix is circulant, and (ii) that the p entries of the mean vector μ are all equal and the covariance matrix is circulant, each against the general alternative that all the entries of μ are real numbers and the covariance matrix is positive definite.
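A classical property underlying such diagonalizations is that every circulant is diagonalized by the discrete Fourier transform, so the eigenvalues of a symmetric circulant covariance matrix are the (real) DFT of its first row. A minimal sketch, with illustrative values of σ², ρ_1, ρ_2:

```python
import numpy as np

def circulant(first_row):
    """Build circ(a_0, ..., a_{p-1}) as in (1.9): each row is the previous
    one shifted one position to the right, wrapping around."""
    a = np.asarray(first_row, dtype=float)
    return np.array([np.roll(a, k) for k in range(a.size)])

# Symmetric circulant covariance with p = 5: rho_j = rho_{p-j} as in (1.10).
sigma2, r1, r2 = 2.0, 0.4, 0.1   # illustrative values
Sigma = sigma2 * circulant([1.0, r1, r2, r2, r1])

# Any circulant is diagonalized by the DFT; the eigenvalues are the DFT of
# the first row, and they are real here because the matrix is symmetric.
eig_from_dft = np.fft.fft(Sigma[0]).real
eig_direct = np.linalg.eigvalsh(Sigma)
print(np.allclose(sorted(eig_from_dft), sorted(eig_direct)))  # -> True
```

This is also a quick way to verify positive definiteness of a proposed circulant covariance: all DFT values of the first row must be positive.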
1.2.4 Inferences Concerning the Mean Vector When Covariance Matrix Is Block Compound Symmetry

The estimation and testing problems for block compound symmetry arising from multivariate normal distributions were first studied by Votaw (1948). He proposed twelve hypotheses and tested them using the likelihood ratio method. The six hypotheses for one sample are introduced in Subsection 1.2.5; the other six hypotheses for k samples (k ≥ 2) are stated in Section 1.3. A more recent paper on estimation and testing concerning means and covariance matrices under a block compound symmetry covariance structure is Szatrowski (1982). In his paper, two types of covariance structures are considered: block compound symmetry of type I (BCSI) and block compound symmetry of type II (BCSII). The problem of testing H_0: μ = μ_0, given that the covariance matrix has block compound symmetry structure, is also considered, with estimation and testing based on the maximum likelihood method. Null distributions of likelihood ratio statistics of the form

λ^{2/n} = |Σ̂_Ω| / |Σ̂_ω|

are simplified for some special cases of Votaw's six hypotheses for a single population, where Ω is the parameter space under the alternative hypothesis, ω is the parameter space under the null hypothesis, Σ̂_Ω is the MLE of the covariance matrix under the alternative hypothesis, and Σ̂_ω is the MLE of the covariance matrix under the null hypothesis. The moments of λ^{2/n} are obtained under the null, and approximate null distributions of −2 log λ are found using Box's approximation (1949).

A BCSI assumption can be illustrated by the following example. Suppose a standard test score in college calculus is a random variable X_1 with mean μ_1, and there is a set of three alternative tests, X_2, X_3, and X_4, with means μ_2, μ_3, and μ_4, respectively. The vector X = (X_1, X_2, X_3, X_4)′ then forms a 4×1 normal random vector with mean μ = (μ_1, μ_2, μ_3, μ_4)′.
Under the block compound symmetry of type I (BCSI) assumption, the covariance structure is of the form

Σ = [ A  C  C  C
      C  B  D  D
      C  D  B  D
      C  D  D  B ].   (1.11)

The hypothesis of interest is the interchangeability of the variables X_2, X_3, and X_4. It is equivalent to the hypothesis that the vector X has mean μ = (μ_1, μ_2, μ_2, μ_2)′ and covariance structure of the form (1.11); that is, the random vectors (X_1, X_2, X_3, X_4)′, (X_1, X_2, X_4, X_3)′, (X_1, X_3, X_2, X_4)′, (X_1, X_3, X_4, X_2)′, (X_1, X_4, X_2, X_3)′, and (X_1, X_4, X_3, X_2)′ all have the same distribution. For a more general case, consider b distinct standard tests and h sets of alternative tests, each of which measures n_i abilities; that is, X is partitioned into b + h subsets and forms a p-variate random vector with p = b + ∑_{i=1}^h n_i. Under the BCSI assumption, within each subset of variates the means are equal, the variances are equal, and the covariances are equal, and between any two distinct subsets of variates the covariances are equal.

In regard to the BCSII assumption, consider the following example. Assume there are two types of tests of cognitive abilities, each of which measures verbal (V) and thinking (T) ability. The two types of test scores are assumed to form a 4×1 multivariate normal random vector Y = (Y_1, Y_2, Y_3, Y_4)′ with mean μ = (μ_1, μ_2, μ_3, μ_4)′, where Y_1 and Y_2 are scores of verbal ability for type I and type II tests, respectively, and Y_3 and Y_4 are scores of thinking ability for type I and type II tests, respectively. Under the compound symmetry of type II (CSII) assumption, the mean of Y reduces to μ = (μ_1, μ_1, μ_3, μ_3)′, and the covariance matrix is of the form

Σ = [ A  C  E  F
      C  A  F  E
      E  F  B  D
      F  E  D  B ].   (1.12)

The hypothesis of interest would be μ_1 = μ_2, μ_3 = μ_4, and that the covariance matrix has the BCSII structure shown in (1.12).
Equivalently, it is the test of simultaneous interchangeability of the two types of measures for verbal and thinking abilities. For example, the distributions of (Y_1, Y_2, Y_3, Y_4)′ and (Y_2, Y_1, Y_4, Y_3)′ are the same, but the distributions of (Y_1, Y_2, Y_3, Y_4)′ and (Y_2, Y_1, Y_3, Y_4)′ are not. These kinds of tests can also be applied in medical research, especially to repeated measurements data (Crowder & Hand 1990) when comparing the effects of treatment and control groups (Morrison, 1972). For a more general case, one can consider n types of tests and h types of measures of cognitive abilities, so that Y is an nh-variate random vector.

1.2.5 Inferences Concerning Both Means and Covariance Matrices

Wilks (1946) used the likelihood ratio test to test the hypothesis that a normal p-variate distribution has a complete symmetry covariance structure as shown in (1.5) against the hypothesis that the covariance matrix is unstructured. In the same paper, he also derived the LRT for simultaneously testing that μ = γ 1_p and Σ is compound symmetric against the general alternative that all the entries of μ are real numbers and the covariance matrix is positive definite.

Votaw (1948) first studied the problem of estimating and testing for block compound symmetry in data arising from multivariate normal distributions. He extended Wilks' result by considering a normal p-variate random vector that can be partitioned into q mutually independent subsets, of which b subsets contain exactly one variate each and the remaining q − b = h subsets (h ≥ 1) contain n_1, ..., n_h variates, respectively, where n_α ≥ 2, α = 1, ..., h, and b + n_1 + ... + n_h = p. Let (1_b, n_1, ..., n_h) denote such a partition of the p-variate random vector. Without loss of generality, assume n_1 ≤ ... ≤ n_h. A special case is b = 0. Section 1.2.4 has given a brief introduction to the block compound symmetry assumptions of type I and type II.
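The BCSI structure (1.11) from Section 1.2.4 and its interchangeability interpretation can be illustrated numerically: any permutation of the last three coordinates of X leaves a matrix of the form (1.11) unchanged. The entries A, B, C, D below are illustrative values:

```python
import numpy as np

def bcsi_4x4(A, B, C, D):
    """Block compound symmetry of type I as in (1.11): one standard variate
    plus a subset of three interchangeable variates."""
    S = np.full((4, 4), D)
    S[0, 0] = A
    S[0, 1:] = S[1:, 0] = C
    np.fill_diagonal(S[1:, 1:], B)
    return S

Sigma = bcsi_4x4(A=2.0, B=1.5, C=0.4, D=0.6)   # illustrative entries

# Interchangeability of X2, X3, X4: permuting the last three coordinates
# leaves the covariance matrix unchanged.
perm = [0, 2, 3, 1]
print(np.array_equal(Sigma[np.ix_(perm, perm)], Sigma))  # -> True
```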
In his paper, Votaw (1948) proposed six null hypotheses for testing the means, the covariances, or both, based on a single sample: 1) H_1(mvc), 2) H_1(vc), 3) H_1(m), 4) H_1′(mvc), 5) H_1′(vc), and 6) H_1′(m). Hypotheses 1–3 are for BCSI assumptions and the remaining three are for BCSII assumptions. The null hypotheses 1, 2, 4, and 5 are tested against the alternative hypothesis that the means are real numbers and the covariance matrix is positive definite. The statements of the six hypotheses are as follows:

H_1(mvc) is the hypothesis that within each subset of variates the means are equal, the variances are equal, and the covariances are equal, and that between any two distinct subsets of variates the covariances are equal.

H_1(vc) is the hypothesis that within each subset of variates the variances are equal and the covariances are equal, and that between any two distinct subsets of variates the covariances are equal.

H_1(m) is the hypothesis that within each subset of variates the means are equal, given that the variances are equal and the covariances are equal and that between any two distinct subsets the covariances are equal.

H_1′(mvc) is the hypothesis that within each subset of variates the means are equal, the variances are equal, and the covariances are equal, and that between any two distinct subsets of variates the diagonal covariances are equal and the off-diagonal covariances are equal.

H_1′(vc) is the hypothesis that within each subset of variates the variances are equal and the covariances are equal, and that between any two distinct subsets of variates the diagonal covariances are equal and the off-diagonal covariances are equal.
H_1′(m) is the hypothesis that within each subset of variates the means are equal, given that the variances are equal and the covariances are equal and that between any two distinct subsets of variates the diagonal covariances are equal and the off-diagonal covariances are equal.

Votaw derived the likelihood ratio for each hypothesis. He also developed an explicit expression for the likelihood ratio criterion for each hypothesis and found its rth moment and approximate distribution when the corresponding hypothesis is true. Olkin and Press (1969) considered the problems of 1) testing the null hypothesis that Σ has complete symmetry versus the alternative that Σ is circulant; 2) testing the null hypothesis that Σ = σ²I versus the alternative that Σ is circulant; and 3) testing the null hypothesis that Σ is circulant versus the alternative that Σ is positive definite.

1.3 HOMOGENEOUS MEAN MODELS FOR k POPULATIONS WITH k ≥ 2

Votaw (1948) tested the following hypotheses based on k samples: 1*) H_k(MVC | mvc), 2*) H_k(VC | mvc), 3*) H_k(M | mVC), 4*) H_k′(MVC | mvc), 5*) H_k′(VC | mvc), and 6*) H_k′(M | mVC). Hypotheses 1*–3* are for BCSI assumptions and the remaining three are for BCSII assumptions. The statements of the six hypotheses are as follows:

H_k(MVC | mvc) is the hypothesis that k normal p-variate distributions are the same, given that they all satisfy H_1(mvc), which is introduced in Section 1.2.5.

H_k(VC | mvc) is the hypothesis that k normal p-variate distributions have the same variance-covariance matrix, given that they all satisfy H_1(mvc).

H_k(M | mVC) is the hypothesis that k normal p-variate distributions are the same, given that they all satisfy H_1(mvc) and that they all have the same variance-covariance matrix.

H_k′(MVC | mvc) is the hypothesis that k normal p-variate distributions are the same, given that they all satisfy H_1′(mvc), which is introduced in Section 1.2.5.
H_k′(VC | mvc) is the hypothesis that k normal p-variate distributions have the same variance-covariance matrix, given that they all satisfy H_1′(mvc).

H_k′(M | mVC) is the hypothesis that k normal p-variate distributions are the same, given that they all satisfy H_1′(mvc) and that they all have the same variance-covariance matrix.

For each of the above six hypotheses, Votaw developed the likelihood ratio test by deriving an explicit expression for the likelihood ratio criterion, L = λ² for hypotheses 1*–4* and L = λ^{2/N} for the remaining two, where λ is the likelihood ratio and N is the total sample size over all k populations. He also found the rth moment and approximate distribution for each test hypothesis. Geisser (1963) compared the means of k p-variate normal populations, under the assumption that the k populations share a common compound (complete) symmetry covariance structure, using a multivariate analysis of variance approach implemented by means of the information criterion (Chapter 9, Kullback 1959).

1.4 META ANALYSIS

Meta analysis has been widely used to synthesize results from systematic reviews of reliable research in many fields. There has been massive growth in applications of meta analysis in areas such as medical research, health care, education (Glass, 1976), criminal justice, and social policy. See Kulinskaya et al. (2008) and Sutton et al. (2000) for detailed accounts of meta analysis; recent developments are summarized by Sutton and Higgins (2008).

A fixed effect model is used to combine treatment or parameter estimates when no heterogeneity between the study results is assumed. In practice, point estimates of parameters from different studies are almost always different. If the differences among the point estimates are due only to sampling error, that is, if the source of variation between studies is random variation, a fixed effect model can be used.
Sometimes researchers prefer to believe that the true unknown parameters vary from one study to the next: the studies represent a random sample of parameters drawn from a specific distribution. In this situation, a random effects model is used in the analysis.

The standard fixed effect model in meta analysis assumes k independent studies, each of which reports an estimate θ̂^(i) of a common parameter θ. Each estimate θ̂^(i) is assumed independently, normally distributed as

θ̂^(i) ~ N(θ, σ_i²/n_i),   i = 1, ..., k,   (1.13)

where n_i is the sample size of the ith study and σ_i² is the underlying variance parameter for the ith study. Given (θ̂^(i), σ_i², n_i), the ML estimator of θ and its variance are, respectively,

θ̃ = [∑_{i=1}^k (n_i/σ_i²) θ̂^(i)] / [∑_{i=1}^k (n_i/σ_i²)],   (1.14)

and

Var(θ̃) = 1 / ∑_{i=1}^k (n_i/σ_i²).   (1.15)

Now consider the multivariate model with k independent samples, the ith of which, X_{i1}, ..., X_{in_i}, is from an MVN_p(μ, Σ_i) population, i = 1, ..., k. Suppose μ is the parameter vector of interest. The ML estimator of μ based on the ith sample is

μ̂^(i) = X̄_i,   i = 1, ..., k.   (1.16)

Here we assume Σ_i is known for all i = 1, ..., k. The μ̂^(i) are independent and

μ̂^(i) ~ MVN(μ, (1/n_i) Σ_i),   i = 1, ..., k.   (1.17)

Given (μ̂^(i), Σ_i, n_i) for the k studies, the ML estimator of μ and its variance-covariance matrix based on the k independent samples are, respectively,

μ̃ = (∑_{i=1}^k n_i Σ_i^{−1})^{−1} ∑_{i=1}^k n_i Σ_i^{−1} μ̂^(i),   (1.18)

Cov(μ̃) = (∑_{i=1}^k n_i Σ_i^{−1})^{−1}.   (1.19)

Statistical inference is based on the fact that

(μ̃ − μ)′ (∑_{i=1}^k n_i Σ_i^{−1}) (μ̃ − μ) ~ χ²_p.   (1.20)

Applications of the proposed heterogeneous means normal model to random and fixed effects meta analysis will be developed and presented in Chapter 4. The proposed models will be stated in the next section.
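The fixed effect pooling in (1.18)–(1.20) is a weighted combination with matrix weights n_i Σ_i^{−1}; a minimal sketch follows, in which the study estimates, covariance matrices, and sample sizes are all made-up illustrative values:

```python
import numpy as np

# Illustrative: k = 3 studies, each reporting mu_hat_i with known Sigma_i and n_i.
mu_hats = [np.array([1.0, 2.0]), np.array([1.2, 1.8]), np.array([0.9, 2.1])]
Sigmas  = [np.eye(2), 2.0 * np.eye(2), np.array([[1.0, 0.3], [0.3, 1.0]])]
ns      = [30, 50, 40]

# Fixed effect pooling (1.18)-(1.19): weight each estimate by n_i * Sigma_i^{-1}.
W = [n * np.linalg.inv(S) for n, S in zip(ns, Sigmas)]
W_sum = sum(W)
mu_tilde = np.linalg.solve(W_sum, sum(Wi @ m for Wi, m in zip(W, mu_hats)))
cov_tilde = np.linalg.inv(W_sum)   # covariance of the pooled estimator

# Chi-square statistic (1.20) for testing mu = mu_0, here with mu_0 = (1, 2)'.
mu0 = np.array([1.0, 2.0])
chi2 = (mu_tilde - mu0) @ W_sum @ (mu_tilde - mu0)
print(mu_tilde, chi2)
```

Under H_0 the statistic would be referred to a χ² distribution with p = 2 degrees of freedom.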
1.5 PROPOSED HETEROGENEOUS MEANS MODELS

Consider an independent sample X_1, ..., X_M such that X_i ~ MVN_p(μ_i, Σ), where μ_i = C_i μ for all i = 1, ..., M, and both μ and Σ are unknown. The matrices C_i are p×p for all i = 1, ..., M, and the covariance matrix Σ is positive definite. Some further restrictions will be placed on the C_i later when necessary. The likelihood function is

L(μ, Σ | x_1, ..., x_M) = ∏_{i=1}^M (2π)^{−p/2} |Σ|^{−1/2} exp{−(1/2)(x_i − C_i μ)′ Σ^{−1} (x_i − C_i μ)}
= (2π)^{−Mp/2} |Σ|^{−M/2} exp{−(1/2) ∑_{i=1}^M (x_i − C_i μ)′ Σ^{−1} (x_i − C_i μ)}.   (1.21)

The covariance matrix Σ is patterned so that the maximum likelihood estimator (MLE) of the vector μ does not involve the ML estimator of Σ. Based on the likelihood function for a given sample, inferences for one-sample and multi-sample data are presented in Chapters 2 and 3, respectively. The likelihood ratio test of H_0: μ = μ_0 in the one-sample case is derived explicitly under some constraints on the matrices C_i and the covariance matrix Σ: in particular, each C_i is assumed circulant, and Σ is assumed compound (complete) symmetric of the form (1.5). The distributions of the MLEs of the intraclass correlation ρ and variance σ², namely ρ̂ and σ̂², are obtained, and the behavior of ρ̂ is investigated in terms of its mean and standard deviation in a simulation study. For the two-sample and multi-sample cases, the likelihood ratio test of H_0: μ_1 = ... = μ_k is derived exactly, assuming an equal compound symmetry covariance matrix across the k populations. A large-sample χ² test is obtained for each of the one-sample and two-sample cases. An application of the proposed model to meta analysis is developed in Chapter 4. In traditional meta analysis, the sample from each study is assumed independently, identically distributed, whereas this is not the case for a sample from the proposed model.
In Chapter 4, applications of the proposed model to fixed and random effects models for multivariate meta analysis (Jackson et al., 2011; Nam et al., 2003) with continuous outcomes will be developed and presented. Since the outcome measures in the proposed model are non-comparative continuous, the one-stage method for individual patient/participant data (IPD) random effects models suggested by Higgins et al. (2001) is used to investigate the heterogeneity of the effects (parameters) among the studies.

CHAPTER II
ONE-SAMPLE INFERENCE

2.1 INTRODUCTION AND PRELIMINARY CASES

Consider an independent sample of size M, X_1, \ldots, X_M, with X_i \sim MVN_p(\mu_i, \Sigma), where \mu_i = C_i\mu for all i = 1, \ldots, M, and both \mu and \Sigma are unknown. The matrices C_i are p \times p for all i = 1, \ldots, M, and the covariance matrix \Sigma is positive definite. Some further restrictions on the C_i will be considered later when necessary. The likelihood function is already shown in (1.21); thus the log likelihood function is

\log L(\mu, \Sigma \mid x_1, \ldots, x_M) = \text{constant} - \frac{M}{2}\log|\Sigma| - \frac{1}{2}\sum_{i=1}^{M}(x_i - C_i\mu)'\Sigma^{-1}(x_i - C_i\mu),   (2.1)

where the constant is -(Mp/2)\log(2\pi). For simplicity, \log L(\mu, \Sigma \mid x_1, \ldots, x_M) will be written as \log L(\mu, \Sigma \mid x) from now on. Let

Q = \sum_{i=1}^{M}(x_i - C_i\mu)'\Sigma^{-1}(x_i - C_i\mu).

Our goal is to find the MLEs for \mu and \Sigma. We start by rewriting the log likelihood in (2.1) so that maximizing \log L(\mu, \Sigma \mid x), or equivalently minimizing Q with respect to \mu, becomes easier. Q can be expressed as

Q = \sum_{i=1}^{M}\operatorname{tr}\big[\Sigma^{-1}(x_i - C_i\mu)(x_i - C_i\mu)'\big] = \operatorname{tr}\big[\Sigma^{-1}V\big],

where V = \sum_{i=1}^{M}(x_i - C_i\mu)(x_i - C_i\mu)'.
Define

\hat\mu = \Big(\sum_{i=1}^{M}C_i'\Sigma^{-1}C_i\Big)^{-1}\sum_{i=1}^{M}C_i'\Sigma^{-1}X_i;

then, writing x_i - C_i\mu = (x_i - C_i\hat\mu) + C_i(\hat\mu - \mu), V can be expressed as

V = A + \sum_{i=1}^{M}(x_i - C_i\hat\mu)(\hat\mu - \mu)'C_i' + \sum_{i=1}^{M}C_i(\hat\mu - \mu)(x_i - C_i\hat\mu)' + \sum_{i=1}^{M}C_i(\hat\mu - \mu)(\hat\mu - \mu)'C_i',

where A = \sum_{i=1}^{M}(x_i - C_i\hat\mu)(x_i - C_i\hat\mu)'. Hence

Q = \operatorname{tr}(\Sigma^{-1}A) + \sum_{i=1}^{M}(\hat\mu - \mu)'C_i'\Sigma^{-1}C_i(\hat\mu - \mu),

where the cross terms drop out because, by the definition of \hat\mu,

\operatorname{tr}\Big[\Sigma^{-1}\sum_{i=1}^{M}(x_i - C_i\hat\mu)(\hat\mu - \mu)'C_i'\Big] = (\hat\mu - \mu)'\sum_{i=1}^{M}C_i'\Sigma^{-1}(x_i - C_i\hat\mu) = 0,

and likewise \operatorname{tr}\big[\Sigma^{-1}\sum_{i=1}^{M}C_i(\hat\mu - \mu)(x_i - C_i\hat\mu)'\big] = 0. Therefore the log likelihood becomes

\log L(\mu, \Sigma \mid x) = \text{constant} - \frac{M}{2}\log|\Sigma| - \frac{1}{2}\Big[\operatorname{tr}(\Sigma^{-1}A) + \sum_{i=1}^{M}(\hat\mu - \mu)'C_i'\Sigma^{-1}C_i(\hat\mu - \mu)\Big],   (2.2)

with A and \hat\mu as above. We can use the log likelihood in (2.2) to find the MLEs for \mu and/or \Sigma under various specified conditions.

2.1.1 Inference for \mu When \Sigma Is Known

From the log likelihood in (2.2), the third term on the right-hand side is the only one involving \mu. If \sum_{i=1}^{M}C_i'\Sigma^{-1}C_i is a positive definite matrix, the minimum of Q with respect to \mu occurs at

\hat\mu = \Big(\sum_{i=1}^{M}C_i'\Sigma^{-1}C_i\Big)^{-1}\sum_{i=1}^{M}C_i'\Sigma^{-1}X_i,   (2.3)

which is the MLE of \mu, a linear combination of the X_i's. Note that \hat\mu is normally distributed with mean

E(\hat\mu) = \Big(\sum_{i=1}^{M}C_i'\Sigma^{-1}C_i\Big)^{-1}\sum_{i=1}^{M}C_i'\Sigma^{-1}E(X_i) = \Big(\sum_{i=1}^{M}C_i'\Sigma^{-1}C_i\Big)^{-1}\Big(\sum_{i=1}^{M}C_i'\Sigma^{-1}C_i\Big)\mu = \mu,

and covariance matrix \operatorname{Cov}(\hat\mu) obtained in the following way.
Since \hat\mu satisfies the identity \big(\sum_{i=1}^{M}C_i'\Sigma^{-1}C_i\big)\hat\mu = \sum_{i=1}^{M}C_i'\Sigma^{-1}X_i, taking the covariance of both sides yields

\Big(\sum_i C_i'\Sigma^{-1}C_i\Big)\operatorname{Cov}(\hat\mu)\Big(\sum_i C_i'\Sigma^{-1}C_i\Big) = \operatorname{Cov}\Big(\sum_i C_i'\Sigma^{-1}X_i\Big) = \sum_i C_i'\Sigma^{-1}\Sigma\,\Sigma^{-1}C_i = \sum_i C_i'\Sigma^{-1}C_i,

so that, since \sum_i C_i'\Sigma^{-1}C_i is positive definite,

\operatorname{Cov}(\hat\mu) = \Big(\sum_{i=1}^{M}C_i'\Sigma^{-1}C_i\Big)^{-1}.

Hence \hat\mu is normally distributed as

\hat\mu \sim MVN_p\Big(\mu, \Big(\sum_{i=1}^{M}C_i'\Sigma^{-1}C_i\Big)^{-1}\Big),   (2.4)

which leads to the result

(\hat\mu - \mu)'\Big(\sum_{i=1}^{M}C_i'\Sigma^{-1}C_i\Big)(\hat\mu - \mu) \sim \chi^2_p.

Therefore, for testing H_0: \mu = \mu_0 we reject H_0 if

(\hat\mu - \mu_0)'\Big(\sum_{i=1}^{M}C_i'\Sigma^{-1}C_i\Big)(\hat\mu - \mu_0) > \chi^2_{\alpha, p}.

2.1.2 Inference for \mu When \Sigma Is Unknown and Unpatterned

When \Sigma is unknown, the MLE of \mu has the same form as in (2.3) with \Sigma replaced by \hat\Sigma, the MLE of \Sigma. Hence the MLE of \mu is

\hat\mu = \Big(\sum_{i=1}^{M}C_i'\hat\Sigma^{-1}C_i\Big)^{-1}\sum_{i=1}^{M}C_i'\hat\Sigma^{-1}X_i,   (2.5)

where, based on a result of Anderson (2003, Lemma 3.2.2, p. 69) in connection with (2.2), the MLE of \Sigma is

\hat\Sigma = \frac{1}{M}\sum_{i=1}^{M}(X_i - C_i\hat\mu)(X_i - C_i\hat\mu)'.   (2.6)

Note that the expression for \hat\mu in (2.5) involves \hat\Sigma. Recall that in the iid case, with \mu_i = \mu for all i, the MLE of \mu does not involve \Sigma at all. In general there are no explicit solutions for \hat\mu and \hat\Sigma, and the equations in (2.5) and (2.6) must be solved iteratively. The approximation

(\hat\mu - \mu)'\Big(\sum_{i=1}^{M}C_i'\hat\Sigma^{-1}C_i\Big)(\hat\mu - \mu) \xrightarrow{D} \chi^2_p

(Crowder and Hand, 1990) is still attainable, so that H_0: \mu = \mu_0 can be tested asymptotically. Nevertheless, to remove \hat\Sigma from (2.5) so that the MLEs \hat\mu and \hat\Sigma can be obtained explicitly, we consider a patterned covariance matrix \Sigma, with details on inference for \mu covered in Section 2.2. Before doing so, let us consider another structure of \Sigma in the next subsection.
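The known-\Sigma test of Subsection 2.1.1 is straightforward to compute; the following is a minimal sketch assuming NumPy, with illustrative (not the dissertation's) names. It forms \hat\mu from (2.3) and returns the quadratic form to be compared with \chi^2_{\alpha,p}.

```python
import numpy as np

def wald_stat(X, C, Sigma, mu0):
    """Statistic for H0: mu = mu0 when Sigma is known (Section 2.1.1):
    (mu_hat - mu0)' (sum_i C_i' Sigma^{-1} C_i) (mu_hat - mu0),
    which is chi^2_p under H0; reject when it exceeds chi^2_{alpha,p}."""
    Sinv = np.linalg.inv(Sigma)
    W = sum(Ci.T @ Sinv @ Ci for Ci in C)      # = Cov(mu_hat)^{-1}, cf. (2.4)
    mu_hat = np.linalg.solve(W, sum(Ci.T @ Sinv @ xi for Ci, xi in zip(C, X)))
    d = mu_hat - np.asarray(mu0)
    return float(d @ W @ d)
```

When every C_i = I_p this reduces to the familiar one-sample statistic M(\bar X - \mu_0)'\Sigma^{-1}(\bar X - \mu_0).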
2.1.3 Inference for \mu When \Sigma = \sigma^2 V, \sigma^2 Unknown, V Known

Recall that X_i \sim MVN_p(\mu_i, \Sigma). In this subsection we consider the case \Sigma = \sigma^2 V, where \sigma^2 > 0 is an unknown constant and V is a known positive definite matrix, so that \mu and \sigma^2 are the only unknown parameters. The maximum likelihood estimator of \mu is

\hat\mu = \Big(\sum_{i=1}^{M}C_i'V^{-1}C_i\Big)^{-1}\sum_{i=1}^{M}C_i'V^{-1}X_i.

To find the MLE of \sigma^2, consider the log likelihood function first. Writing \tau = \sigma^2, the log likelihood function is

\log L(\mu, \tau \mid x) = \text{constant} - \frac{Mp}{2}\log\tau - \frac{1}{2\tau}\sum_{i=1}^{M}(x_i - C_i\mu)'V^{-1}(x_i - C_i\mu),

which yields

\frac{\partial \log L}{\partial \tau} = -\frac{Mp}{2\tau} + \frac{1}{2\tau^2}\sum_{i=1}^{M}(x_i - C_i\mu)'V^{-1}(x_i - C_i\mu).

Setting this equal to zero and solving for \tau, the MLE of \sigma^2 is

\hat\sigma^2 = \frac{1}{Mp}\sum_{i=1}^{M}(X_i - C_i\hat\mu)'V^{-1}(X_i - C_i\hat\mu).

Since \hat\mu is a linear combination of the X_i's, its distribution is

\hat\mu \sim MVN_p\Big(\mu, \sigma^2\Big(\sum_{i=1}^{M}C_i'V^{-1}C_i\Big)^{-1}\Big).

Next, the distribution of Mp\hat\sigma^2/\sigma^2 can be shown to be \chi^2 with p(M-1) degrees of freedom, and \hat\mu and \hat\sigma^2 are independent. To proceed, partition the quantity \sum_{i=1}^{M}(X_i - C_i\mu)'V^{-1}(X_i - C_i\mu) by writing X_i - C_i\mu = (X_i - C_i\hat\mu) + C_i(\hat\mu - \mu):

\sum_{i=1}^{M}(X_i - C_i\mu)'V^{-1}(X_i - C_i\mu) = \sum_{i=1}^{M}(X_i - C_i\hat\mu)'V^{-1}(X_i - C_i\hat\mu) + (\hat\mu - \mu)'\Big(\sum_{i=1}^{M}C_i'V^{-1}C_i\Big)(\hat\mu - \mu) + 2(\hat\mu - \mu)'\sum_{i=1}^{M}C_i'V^{-1}(X_i - C_i\hat\mu).

The last term equals zero due to the fact that \sum_{i=1}^{M}C_i'V^{-1}(X_i - C_i\hat\mu) = 0 by the definition of \hat\mu. Therefore \frac{1}{\sigma^2}\sum_{i=1}^{M}(X_i - C_i\mu)'V^{-1}(X_i - C_i\mu) equals

\frac{1}{\sigma^2}\sum_{i=1}^{M}(X_i - C_i\hat\mu)'V^{-1}(X_i - C_i\hat\mu) + \frac{1}{\sigma^2}(\hat\mu - \mu)'\Big(\sum_{i=1}^{M}C_i'V^{-1}C_i\Big)(\hat\mu - \mu).

We can show that the two terms above are independent by showing that each pair \hat\mu and X_i - C_i\hat\mu, i = 1, \ldots, M, is independent.
Since both \hat\mu and X_k - C_k\hat\mu are normally distributed, we can show that they are statistically independent by showing that their covariance matrix is zero. Indeed,

\operatorname{Cov}(\hat\mu, X_k - C_k\hat\mu) = \operatorname{Cov}(\hat\mu, X_k) - \operatorname{Cov}(\hat\mu, \hat\mu)C_k' = \sigma^2\Big(\sum_i C_i'V^{-1}C_i\Big)^{-1}C_k'V^{-1}V - \sigma^2\Big(\sum_i C_i'V^{-1}C_i\Big)^{-1}C_k' = 0.

This implies that \sum_{i=1}^{M}(X_i - C_i\hat\mu)'V^{-1}(X_i - C_i\hat\mu)/\sigma^2 and (\hat\mu - \mu)'\big(\sum_i C_i'V^{-1}C_i\big)(\hat\mu - \mu)/\sigma^2 are statistically independent. In addition, using the result on sums of independent chi-square random variables (Bain & Engelhardt, 1992, page 284), we have

\frac{1}{\sigma^2}(\hat\mu - \mu)'\Big(\sum_{i=1}^{M}C_i'V^{-1}C_i\Big)(\hat\mu - \mu) \sim \chi^2_p,

implying that

\frac{Mp\,\hat\sigma^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^{M}(X_i - C_i\hat\mu)'V^{-1}(X_i - C_i\hat\mu) \sim \chi^2_{(M-1)p}.

Therefore, under H_0: \mu = \mu_0, we have

\frac{M-1}{Mp\,\hat\sigma^2}\,(\hat\mu - \mu_0)'\Big(\sum_{i=1}^{M}C_i'V^{-1}C_i\Big)(\hat\mu - \mu_0) \sim F_{p,(M-1)p},

which can be used for testing H_0: \mu = \mu_0.

2.2 MAINSTREAM: INFERENCE FOR \mu WHEN \Sigma HAS COMPOUND SYMMETRY STRUCTURE AND THE C_i ARE CIRCULANT

2.2.1 Maximum Likelihood Estimators

Three conditions are considered before deriving the MLEs of the unknown parameters; the theory developed in Section 2.2 rests on the assumptions stated below.

Condition (1). If C_i'\Sigma^{-1} = \Sigma^{-1}C_i' for all i, the MLE of \mu in (2.3) reduces to

\hat\mu = \Big(\sum_{i=1}^{M}C_i'C_i\Big)^{-1}\sum_{i=1}^{M}C_i'X_i.   (2.7)

Condition (2). To guarantee C_i'\Sigma^{-1} = \Sigma^{-1}C_i' in Condition (1), we assume that C_i is a circulant matrix for every i and that \Sigma has a compound symmetry structure. The following theorem will be applied to this condition.

Theorem 2.0 (Schott, 1997, Theorem 7.58, page 303): Suppose that A and B are m \times m circulant matrices. Then they commute; that is, AB = BA.
Let \Sigma have the structure

\Sigma = \sigma^2\big[(1-\rho)I_p + \rho J_p\big],   (2.8)

a rewriting of the covariance matrix defined in (1.5), where -1/(p-1) < \rho < 1 ensures positive definiteness of \Sigma. Note that the eigenvalues of \Sigma in (2.8) are \sigma^2[1+(p-1)\rho] with multiplicity 1 and \sigma^2(1-\rho) with multiplicity p-1. Thus \Sigma is a symmetric circulant matrix, and we say \Sigma has compound symmetry, as introduced in Subsection 1.2.2. For each i = 1, \ldots, M, if C_i is also a circulant matrix, then C_i'\Sigma^{-1} = \Sigma^{-1}C_i', which results in the reduced form of \hat\mu shown in (2.7).

Working with the log likelihood function in (2.1) and \Sigma of the form (2.8), we may obtain the MLEs of \rho and \sigma^2. First note that the determinant and inverse of \Sigma are, respectively,

|\Sigma| = (\sigma^2)^p(1-\rho)^{p-1}\big[1+(p-1)\rho\big],

and

\Sigma^{-1} = \frac{1}{\sigma^2(1-\rho)}\Big[I_p - \frac{\rho}{1+(p-1)\rho}J_p\Big]

(cf. Graybill, 1983, Theorem 8.3.4, page 190). Writing \tau = \sigma^2, the log likelihood function in (2.1) becomes

\log L(\mu, \rho, \tau \mid x) = \text{constant} - \frac{Mp}{2}\log\tau - \frac{M(p-1)}{2}\log(1-\rho) - \frac{M}{2}\log\big[1+(p-1)\rho\big] - \frac{1}{2\tau(1-\rho)}\sum_{i=1}^{M}(x_i - C_i\mu)'(x_i - C_i\mu) + \frac{\rho}{2\tau(1-\rho)[1+(p-1)\rho]}\sum_{i=1}^{M}(x_i - C_i\mu)'J_p(x_i - C_i\mu).   (2.9)

Let B_1 = \sum_{i=1}^{M}(x_i - C_i\hat\mu)'(x_i - C_i\hat\mu) and B_2 = \sum_{i=1}^{M}(x_i - C_i\hat\mu)'J_p(x_i - C_i\hat\mu), where J_p is the p \times p matrix with all elements equal to 1. To find the maximum likelihood estimators of \rho and \tau = \sigma^2, take the first partial derivatives of (2.9) with respect to \tau and \rho:

\frac{\partial \log L}{\partial \tau} = -\frac{Mp}{2\tau} + \frac{1}{2\tau^2(1-\rho)}\Big[B_1 - \frac{\rho B_2}{1+(p-1)\rho}\Big],

\frac{\partial \log L}{\partial \rho} = \frac{M(p-1)}{2(1-\rho)} - \frac{M(p-1)}{2[1+(p-1)\rho]} - \frac{B_1}{2\tau(1-\rho)^2} + \frac{[1+(p-1)\rho^2]\,B_2}{2\tau(1-\rho)^2[1+(p-1)\rho]^2}.

Setting \partial\log L/\partial\tau = 0 and \partial\log L/\partial\rho = 0 and solving, we have

\hat\tau = \frac{1}{Mp(1-\hat\rho)}\Big[B_1 - \frac{\hat\rho B_2}{1+(p-1)\hat\rho}\Big]   (2.10)

and

\frac{Mp(p-1)\hat\rho}{(1-\hat\rho)[1+(p-1)\hat\rho]} = \frac{1}{\hat\tau(1-\hat\rho)^2}\Big[B_1 - \frac{[1+(p-1)\hat\rho^2]\,B_2}{[1+(p-1)\hat\rho]^2}\Big].   (2.11)

Inserting \hat\tau from (2.10) into (2.11) and solving for \hat\rho yields, after simplification, 1+(p-1)\hat\rho = B_2/B_1, that is,

\hat\rho = \frac{1}{p-1}\Big(\frac{B_2}{B_1} - 1\Big).   (2.12)

Substituting \hat\rho from (2.12) into (2.10): since 1+(p-1)\hat\rho = B_2/B_1, we have \hat\rho B_2/[1+(p-1)\hat\rho] = \hat\rho B_1, so the MLE of \sigma^2 is

\hat\sigma^2 = \hat\tau = \frac{B_1(1-\hat\rho)}{Mp(1-\hat\rho)} = \frac{B_1}{Mp}.

Hence we arrive at the following lemma.

Lemma 2.1: Let X_1, \ldots, X_M be independent with X_i \sim N_p(\mu_i, \Sigma), where \mu_i = C_i\mu for all i = 1, \ldots, M, each C_i is circulant, and \Sigma = \sigma^2[(1-\rho)I_p + \rho J_p] as defined in (2.8), so that C_i'\Sigma^{-1} = \Sigma^{-1}C_i'. Then the MLEs of \mu, \sigma^2, and \rho are, respectively,

\hat\mu = \Big(\sum_{i=1}^{M}C_i'C_i\Big)^{-1}\sum_{i=1}^{M}C_i'X_i, \qquad \hat\sigma^2 = \frac{B_1}{Mp}, \qquad \hat\rho = \frac{1}{p-1}\Big(\frac{B_2}{B_1} - 1\Big),

where B_1 = \sum_{i=1}^{M}(X_i - C_i\hat\mu)'(X_i - C_i\hat\mu) and B_2 = \sum_{i=1}^{M}(X_i - C_i\hat\mu)'J_p(X_i - C_i\hat\mu).

2.2.2 Hypothesis Testing for H_0: \mu = \mu_0 Using the LR Test

In this subsection, the likelihood ratio test is derived for H_0: \mu = \mu_0. The restriction C_i'\Sigma^{-1} = \Sigma^{-1}C_i' for all i = 1, \ldots, M remains in force, and we also assume \Sigma = \sigma^2 R, where R = (1-\rho)I_p + \rho J_p, with both \sigma^2 and \rho unknown. The following theorem states the likelihood ratio test for H_0: \mu = \mu_0 under these assumptions.

Theorem 2.1: Let X_1, \ldots, X_M be independent with X_i \sim N_p(\mu_i, \Sigma), where \mu_i = C_i\mu for all i = 1, \ldots, M, each C_i is circulant, and \Sigma = \sigma^2[(1-\rho)I_p + \rho J_p] as defined in (2.8).
The likelihood ratio test for testing H_0: \mu = \mu_0 rejects H_0 if W \le C_\alpha, where C_\alpha is such that P(W \le C_\alpha \mid H_0) = \alpha, and W is defined as

W = \frac{\Big[\sum_{i=1}^{M}(X_i - C_i\hat\mu)'(X_i - C_i\hat\mu) - \frac{1}{p}\sum_{i=1}^{M}(X_i - C_i\hat\mu)'J_p(X_i - C_i\hat\mu)\Big]^{p-1}\,\sum_{i=1}^{M}(X_i - C_i\hat\mu)'J_p(X_i - C_i\hat\mu)}{\Big[\sum_{i=1}^{M}(X_i - C_i\mu_0)'(X_i - C_i\mu_0) - \frac{1}{p}\sum_{i=1}^{M}(X_i - C_i\mu_0)'J_p(X_i - C_i\mu_0)\Big]^{p-1}\,\sum_{i=1}^{M}(X_i - C_i\mu_0)'J_p(X_i - C_i\mu_0)},

where \hat\mu is defined in (2.7).

Proof: The likelihood ratio for testing H_0: \mu = \mu_0 is

\Lambda = \frac{\max_{\theta\in\omega_0} L(\mu, \Sigma)}{\max_{\theta\in\Omega} L(\mu, \Sigma)},

where \omega_0 = \{(\mu_0, \Sigma) \mid \Sigma \text{ is p.d.}\} and \Omega = \{(\mu, \Sigma) \mid \Sigma \text{ is p.d.}\}. This ratio can be developed as

\Lambda = \frac{|\hat\Sigma_0|^{-M/2}\exp\{-\frac12\sum_i(X_i - C_i\mu_0)'\hat\Sigma_0^{-1}(X_i - C_i\mu_0)\}}{|\hat\Sigma|^{-M/2}\exp\{-\frac12\sum_i(X_i - C_i\hat\mu)'\hat\Sigma^{-1}(X_i - C_i\hat\mu)\}} = \Big(\frac{|\hat\Sigma|}{|\hat\Sigma_0|}\Big)^{M/2}\exp\Big\{-\frac12\operatorname{tr}\Big[\hat\Sigma_0^{-1}\sum_i(X_i - C_i\mu_0)(X_i - C_i\mu_0)'\Big] + \frac12\operatorname{tr}\Big[\hat\Sigma^{-1}\sum_i(X_i - C_i\hat\mu)(X_i - C_i\hat\mu)'\Big]\Big\}.

Showing (Appendix A.1) that

\operatorname{tr}\Big[\hat\Sigma_0^{-1}\sum_i(X_i - C_i\mu_0)(X_i - C_i\mu_0)'\Big] = Mp \quad\text{and}\quad \operatorname{tr}\Big[\hat\Sigma^{-1}\sum_i(X_i - C_i\hat\mu)(X_i - C_i\hat\mu)'\Big] = Mp,

we obtain

\Lambda = \Big(\frac{|\hat\Sigma|}{|\hat\Sigma_0|}\Big)^{M/2} = \Big(\frac{(\hat\sigma^2)^p(1-\hat\rho)^{p-1}[1+(p-1)\hat\rho]}{(\hat\sigma_0^2)^p(1-\hat\rho_0)^{p-1}[1+(p-1)\hat\rho_0]}\Big)^{M/2},   (2.13)

where

\hat\sigma_0^2 = \frac{1}{Mp}\sum_i(X_i - C_i\mu_0)'(X_i - C_i\mu_0), \qquad \hat\rho_0 = \frac{1}{p-1}\Big[\frac{\sum_i(X_i - C_i\mu_0)'J_p(X_i - C_i\mu_0)}{\sum_i(X_i - C_i\mu_0)'(X_i - C_i\mu_0)} - 1\Big],

\hat\sigma^2 = \frac{1}{Mp}\sum_i(X_i - C_i\hat\mu)'(X_i - C_i\hat\mu), \qquad \hat\rho = \frac{1}{p-1}\Big[\frac{\sum_i(X_i - C_i\hat\mu)'J_p(X_i - C_i\hat\mu)}{\sum_i(X_i - C_i\hat\mu)'(X_i - C_i\hat\mu)} - 1\Big].

Using these expressions for \hat\sigma_0^2, \hat\rho_0, \hat\sigma^2, and \hat\rho in (2.13), we obtain the likelihood ratio test statistic W = \Lambda^{2/M} as stated in this theorem (details shown in Appendix A.2). The proof is complete.

Although the likelihood ratio has been derived in Theorem 2.1, the null distribution of W in Theorem 2.1 has not been derived yet.
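Under the conditions of Lemma 2.1 everything is in closed form, so the MLEs and the likelihood ratio statistic W of Theorem 2.1 can be computed directly. The following is a minimal sketch assuming NumPy; the function and variable names are illustrative, not from the dissertation.

```python
import numpy as np

def cs_mles_and_W(X, C, mu0):
    """Closed-form MLEs of Lemma 2.1 and the LR statistic W of Theorem 2.1.
    X: list of p-vectors; C: list of p x p circulant matrices; mu0: p-vector.
    B1, B2 use residuals at mu_hat; B1_0, B2_0 use residuals at mu0."""
    M, p = len(X), X[0].shape[0]
    Jp = np.ones((p, p))
    # mu_hat = (sum C_i'C_i)^{-1} sum C_i'X_i, the reduced form (2.7)
    Q = sum(Ci.T @ Ci for Ci in C)
    mu_hat = np.linalg.solve(Q, sum(Ci.T @ xi for Ci, xi in zip(C, X)))

    def b1b2(mu):
        r = [xi - Ci @ mu for xi, Ci in zip(X, C)]
        return sum(ri @ ri for ri in r), sum(ri @ Jp @ ri for ri in r)

    B1, B2 = b1b2(mu_hat)
    sigma2_hat = B1 / (M * p)                    # Lemma 2.1
    rho_hat = (B2 / B1 - 1) / (p - 1)            # Lemma 2.1
    B1_0, B2_0 = b1b2(np.asarray(mu0))
    # W of Theorem 2.1; small values of W are evidence against H0
    W = ((B1 - B2 / p) ** (p - 1) * B2) / ((B1_0 - B2_0 / p) ** (p - 1) * B2_0)
    return mu_hat, sigma2_hat, rho_hat, W
```

Note that when mu0 equals the unrestricted MLE the numerator and denominator coincide and W = 1, its maximum under the restriction W \le 1.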
Define

B_1^0 = \sum_{i=1}^{M}(X_i - C_i\mu_0)'(X_i - C_i\mu_0) \quad\text{and}\quad B_2^0 = \sum_{i=1}^{M}(X_i - C_i\mu_0)'J_p(X_i - C_i\mu_0);

B_1 and B_2 have been defined in Lemma 2.1. Hence W can be expressed as

W = \frac{\big(B_1 - \frac{1}{p}B_2\big)^{p-1}\,B_2}{\big(B_1^0 - \frac{1}{p}B_2^0\big)^{p-1}\,B_2^0}.

Under the null hypothesis H_0: \mu = \mu_0, the exact, asymptotic, or approximate distribution of W is of great interest. To find the exact null distribution of W, the following propositions are needed.

Proposition 2.1: Under H_0: \mu = \mu_0, B_1^0 - \frac{1}{p}B_2^0 is distributed as a chi-square random variable with M(p-1) degrees of freedom times the constant \sigma^2(1-\rho); that is,

B_1^0 - \frac{1}{p}B_2^0 \overset{d}{=} \sigma^2(1-\rho)\,\chi^2_{M(p-1)}.

In addition, \frac{1}{M}\big(B_1^0 - \frac{1}{p}B_2^0\big) is strongly convergent to the constant \sigma^2(p-1)(1-\rho); that is,

\frac{1}{M}\Big(B_1^0 - \frac{1}{p}B_2^0\Big) \xrightarrow{wp1} \sigma^2(p-1)(1-\rho).

Proof: Under H_0, the distribution of X_i - C_i\mu_0 is N_p(0, \Sigma), where \Sigma = \sigma^2[(1-\rho)I_p + \rho J_p]. It follows from Box (1954) that the quantities

Q_i = (X_i - C_i\mu_0)'\Big(I_p - \frac{1}{p}J_p\Big)(X_i - C_i\mu_0), \quad i = 1, \ldots, M,

are independently, identically distributed as \sum_{j=1}^{p}\lambda_j\chi^2_{1j}, where the \chi^2_{1j} are independent chi-square random variables with 1 degree of freedom and the \lambda_j are the latent roots of

P_1 = \Sigma\Big(I_p - \frac{1}{p}J_p\Big) = \sigma^2\big[(1-\rho)I_p + \rho J_p\big]\Big(I_p - \frac{1}{p}J_p\Big) = \sigma^2(1-\rho)\Big(I_p - \frac{1}{p}J_p\Big).   (2.14)

Hence B_1^0 - \frac{1}{p}B_2^0 = \sum_{i=1}^{M}Q_i is distributed as the sum of M independent \sum_j\lambda_j\chi^2_{1j} random variables. Because I_p - \frac{1}{p}J_p is symmetric idempotent, its latent roots are 0's and 1's; in fact they are 1 with multiplicity p-1 and 0 with multiplicity 1. Therefore the latent roots of P_1 are \sigma^2(1-\rho) with multiplicity p-1 and 0 with multiplicity 1, so that

Q_i \overset{iid}{\sim} \sigma^2(1-\rho)\,\chi^2_{p-1}   (2.15)

for all i = 1, \ldots, M, implying

B_1^0 - \frac{1}{p}B_2^0 \overset{d}{=} \sigma^2(1-\rho)\,\chi^2_{M(p-1)}.

Moreover, by the SLLN in connection with (2.15),

\frac{1}{M}\Big(B_1^0 - \frac{1}{p}B_2^0\Big) \xrightarrow{wp1} E(Q_1) = \sigma^2(p-1)(1-\rho).

The proof is complete.

Proposition 2.2: Under H_0: \mu = \mu_0, B_2^0 is distributed as a chi-square random variable with M degrees of freedom times the constant \sigma^2 p[1+(p-1)\rho]; that is,

B_2^0 \overset{d}{=} \sigma^2 p\big[1+(p-1)\rho\big]\chi^2_M.

In addition, \frac{1}{M}B_2^0 is strongly convergent to the constant \sigma^2 p[1+(p-1)\rho]; that is, \frac{1}{M}B_2^0 \xrightarrow{wp1} \sigma^2 p[1+(p-1)\rho].

Proof: Recall that B_2^0 = \sum_{i=1}^{M}(X_i - C_i\mu_0)'J_p(X_i - C_i\mu_0), a sum of iid quadratic forms. First,

(X_i - C_i\mu_0)'J_p(X_i - C_i\mu_0) \overset{d}{=} \sum_{j=1}^{p}\lambda_j\chi^2_{1j},   (2.16)

where the \lambda_j are the latent roots of

P_2 = \Sigma J_p = \sigma^2\big[(1-\rho)I_p + \rho J_p\big]J_p = \sigma^2\big[1+(p-1)\rho\big]J_p.

The latent roots of J_p are p with multiplicity 1 and 0 with multiplicity p-1, so the \lambda_j are \sigma^2 p[1+(p-1)\rho] with multiplicity 1 and 0 with multiplicity p-1. Hence (2.16) becomes

(X_i - C_i\mu_0)'J_p(X_i - C_i\mu_0) \overset{d}{=} \sigma^2 p\big[1+(p-1)\rho\big]\chi^2_1,

implying

B_2^0 \overset{d}{=} \sigma^2 p\big[1+(p-1)\rho\big]\chi^2_M,   (2.17)

and, by the SLLN,

\frac{1}{M}B_2^0 \xrightarrow{wp1} \sigma^2 p\big[1+(p-1)\rho\big]E(\chi^2_1) = \sigma^2 p\big[1+(p-1)\rho\big].

The proof is complete.

Proposition 2.3: B_1 - \frac{1}{p}B_2 is distributed as a chi-square random variable with (M-1)(p-1) degrees of freedom times the constant \sigma^2(1-\rho); that is,

B_1 - \frac{1}{p}B_2 \overset{d}{=} \sigma^2(1-\rho)\,\chi^2_{(M-1)(p-1)}.

Proof: Note that E(\hat\mu) = \mu. Expanding X_i - C_i\mu = (X_i - C_i\hat\mu) + C_i(\hat\mu - \mu), the quantity \sum_{i=1}^{M}(X_i - C_i\mu)'(I_p - \frac{1}{p}J_p)(X_i - C_i\mu) can be written as

B_1 - \frac{1}{p}B_2 + (\hat\mu - \mu)'\sum_{i=1}^{M}C_i'\Big(I_p - \frac{1}{p}J_p\Big)C_i(\hat\mu - \mu) + 2(\hat\mu - \mu)'\sum_{i=1}^{M}C_i'\Big(I_p - \frac{1}{p}J_p\Big)(X_i - C_i\hat\mu).

Because both C_i and I_p - \frac{1}{p}J_p are circulant matrices, they commute, so C_i'(I_p - \frac{1}{p}J_p) = (I_p - \frac{1}{p}J_p)C_i'. In connection with the fact that \sum_{i=1}^{M}C_i'(X_i - C_i\hat\mu) = 0, we have

(\hat\mu - \mu)'\sum_{i=1}^{M}C_i'\Big(I_p - \frac{1}{p}J_p\Big)(X_i - C_i\hat\mu) = (\hat\mu - \mu)'\Big(I_p - \frac{1}{p}J_p\Big)\sum_{i=1}^{M}C_i'(X_i - C_i\hat\mu) = 0,

implying

\sum_{i=1}^{M}(X_i - C_i\mu)'\Big(I_p - \frac{1}{p}J_p\Big)(X_i - C_i\mu) = B_1 - \frac{1}{p}B_2 + (\hat\mu - \mu)'\Big(\sum_{i=1}^{M}C_i'C_i\Big)\Big(I_p - \frac{1}{p}J_p\Big)(\hat\mu - \mu).   (2.18)

From Proposition 2.1 we have

\sum_{i=1}^{M}(X_i - C_i\mu)'\Big(I_p - \frac{1}{p}J_p\Big)(X_i - C_i\mu) \overset{d}{=} \sigma^2(1-\rho)\,\chi^2_{M(p-1)}.

We also need the distribution of the second term of the last expression in (2.18). Because \hat\mu - \mu \sim MVN\big(0, (\sum_i C_i'C_i)^{-1}\Sigma\big), the quadratic form

(\hat\mu - \mu)'\Big(\sum_{i=1}^{M}C_i'C_i\Big)\Big(I_p - \frac{1}{p}J_p\Big)(\hat\mu - \mu)

is distributed as \sum_{j=1}^{p}\lambda_j\chi^2_{1j}, where the \lambda_j are the latent roots of the matrix

P_3 = \Big(\sum_i C_i'C_i\Big)^{-1}\Sigma\Big(\sum_i C_i'C_i\Big)\Big(I_p - \frac{1}{p}J_p\Big) = \Sigma\Big(I_p - \frac{1}{p}J_p\Big) = P_1,

since \Sigma and \sum_i C_i'C_i commute, with P_1 defined in (2.14). Hence

(\hat\mu - \mu)'\Big(\sum_i C_i'C_i\Big)\Big(I_p - \frac{1}{p}J_p\Big)(\hat\mu - \mu) \overset{d}{=} \sigma^2(1-\rho)\,\chi^2_{p-1}.

Note that B_1 - \frac{1}{p}B_2 and this quadratic form are independent chi-square random variables, since B_1 - \frac{1}{p}B_2 is a function of the X_i - C_i\hat\mu's, and X_i - C_i\hat\mu and \hat\mu are independent due to the fact that \operatorname{Cov}(X_i - C_i\hat\mu, \hat\mu) = 0. In addition, using the result on sums of independent chi-square random variables, we have

B_1 - \frac{1}{p}B_2 \overset{d}{=} \sigma^2(1-\rho)\,\chi^2_{(M-1)(p-1)}.

The proof is complete.

Proposition 2.4: B_2 is distributed as a chi-square random variable with M-1 degrees of freedom times the constant \sigma^2 p[1+(p-1)\rho]; that is,

B_2 \overset{d}{=} \sigma^2 p\big[1+(p-1)\rho\big]\chi^2_{M-1}.

Proof: Note that E(\hat\mu) = \mu. Using the fact that \sum_{i=1}^{M}C_i'(X_i - C_i\hat\mu) = 0 and the commutativity of the C_i with J_p, we have the decomposition

\sum_{i=1}^{M}(X_i - C_i\mu)'J_p(X_i - C_i\mu) = B_2 + (\hat\mu - \mu)'\Big(\sum_{i=1}^{M}C_i'C_i\Big)J_p(\hat\mu - \mu).

The second term of the expression above is distributed as \sum_{j=1}^{p}\lambda_j\chi^2_{1j}, where the \lambda_j are the latent roots of the matrix

P_4 = \Big(\sum_i C_i'C_i\Big)^{-1}\Sigma\Big(\sum_i C_i'C_i\Big)J_p = \Sigma J_p = P_2,

where P_2 was defined in the proof of Proposition 2.2. Hence

(\hat\mu - \mu)'\Big(\sum_i C_i'C_i\Big)J_p(\hat\mu - \mu) \overset{d}{=} \sigma^2 p\big[1+(p-1)\rho\big]\chi^2_1.

Since B_2 and this quadratic form are independent, and from Proposition 2.2

\sum_{i=1}^{M}(X_i - C_i\mu)'J_p(X_i - C_i\mu) \overset{d}{=} \sigma^2 p\big[1+(p-1)\rho\big]\chi^2_M,

B_2 is distributed as \sigma^2 p[1+(p-1)\rho]\chi^2_{M-1}. The proof of Proposition 2.4 is complete.

The following proposition can be used to show the independence of B_1 - \frac{1}{p}B_2 and B_2 required when finding the exact null distribution of the likelihood ratio test statistic W stated in Theorem 2.1.

Proposition 2.5: Let Y_i = X_i - C_i\hat\mu, with \hat\mu = (\sum_{i=1}^{M}C_i'C_i)^{-1}\sum_{i=1}^{M}C_i'X_i, and let A = I_p - \frac{1}{p}J_p, B = \frac{1}{p}J_p, S_A = \sum_{i=1}^{M}Y_i'AY_i, and S_B = \sum_{i=1}^{M}Y_i'BY_i. Then Y_i'AY_i and Y_j'BY_j are independent for all i and j, and hence S_A and S_B are independent.

Remark 2.1: Since Y_i is a linear combination of X = \operatorname{vec}(X_1, \ldots, X_M), it can be expressed as Y_i = M_iX, where M_i is the p \times Mp block matrix

M_i = \big(-C_iQ^{-1}C_1', \ \ldots, \ I_p - C_iQ^{-1}C_i', \ \ldots, \ -C_iQ^{-1}C_M'\big), \qquad Q = \sum_{i=1}^{M}C_i'C_i,

with the block I_p - C_iQ^{-1}C_i' in the ith position. Rewriting S_A and S_B as S_A = \sum_i X'M_i'AM_iX and S_B = \sum_i X'M_i'BM_iX, where X \sim MVN_{Mp}\big(\operatorname{vec}(C_1\mu, \ldots, C_M\mu),\, I_M \otimes \Sigma\big), it would suffice for the independence of S_A and S_B to show that

\Big(\sum_{i=1}^{M}M_i'AM_i\Big)(I_M \otimes \Sigma)\Big(\sum_{i=1}^{M}M_i'BM_i\Big) = 0,   (2.19)

where \Sigma has the compound symmetry structure \Sigma = \sigma^2[(1-\rho)I_p + \rho J_p] and is circulant.
The calculation of the matrix \big(\sum_i M_i'AM_i\big)(I_M \otimes \Sigma)\big(\sum_i M_i'BM_i\big) is complicated, so instead we prove Proposition 2.5 by first showing that Y_i'AY_i and Y_j'BY_j are independent for all i and j.

Proof of Proposition 2.5: Because both A and B are symmetric and idempotent, we may rewrite

Y_i'AY_i = Y_i'A'AY_i = (AY_i)'(AY_i), \qquad Y_j'BY_j = Y_j'B'BY_j = (BY_j)'(BY_j).

Note that Y_i'AY_i and Y_j'BY_j are the squared lengths of AY_i and BY_j, respectively, so we only have to show that AY_i and BY_j are independent. Consider the distribution of the random vector

\begin{pmatrix} AY_i \\ BY_j \end{pmatrix} = \begin{pmatrix} AM_i \\ BM_j \end{pmatrix}X,

where M_i is defined in Remark 2.1. This is a linear combination of the normal vector X and hence is normal, so showing \operatorname{Cov}(AY_i, BY_j) = 0 implies that AY_i and BY_j are independent normal random vectors, and the proof is done. Since B is symmetric,

\operatorname{Cov}(AY_i, BY_j) = A\operatorname{Cov}(Y_i, Y_j)B' = A\operatorname{Cov}(Y_i, Y_j)B.

Thus it suffices to show that \operatorname{Cov}(Y_i, Y_j) is a circulant matrix: then A, B, and \operatorname{Cov}(Y_i, Y_j) commute, implying

A\operatorname{Cov}(Y_i, Y_j)B = \operatorname{Cov}(Y_i, Y_j)AB = 0,

using the fact that AB = \big(I_p - \frac{1}{p}J_p\big)\frac{1}{p}J_p = 0. To show that \operatorname{Cov}(Y_i, Y_j) is circulant, we use a direct computation:

\operatorname{Cov}(Y_i, Y_j) = \operatorname{Cov}(X_i - C_i\hat\mu,\, X_j - C_j\hat\mu) = \operatorname{Cov}(X_i, X_j) - \operatorname{Cov}(X_i, \hat\mu)C_j' - C_i\operatorname{Cov}(\hat\mu, X_j) + C_i\operatorname{Var}(\hat\mu)C_j' = \operatorname{Cov}(X_i, X_j) - \Sigma C_iQ^{-1}C_j' - C_iQ^{-1}C_j'\Sigma + C_i\operatorname{Var}(\hat\mu)C_j',

where Q = \sum_{i=1}^{M}C_i'C_i. Note that \operatorname{Cov}(X_i, X_j) = \Sigma if i = j and 0 otherwise.
Also, from Section 2.3.1 we have \operatorname{Var}(\hat\mu) = Q^{-1}\Sigma, and the fact that \Sigma, the C_i, and Q are circulant matrices implies that their inverses and transposes are also circulant, so the required commutativity holds. Hence \operatorname{Cov}(Y_i, Y_j) becomes

\operatorname{Cov}(Y_i, Y_j) = \begin{cases} (I_p - C_iQ^{-1}C_i')\Sigma, & \text{if } i = j, \\ -\,C_iQ^{-1}C_j'\Sigma, & \text{if } i \ne j. \end{cases}

Therefore \operatorname{Cov}(Y_i, Y_j) is circulant. The proof of Proposition 2.5 is complete.

Now it is time to state and prove the following main result using Propositions 2.1-2.5.

Theorem 2.2: The likelihood ratio test for testing H_0: \mu = \mu_0 in Theorem 2.1 rejects H_0 if W \le C_\alpha, where C_\alpha is such that P(W \le C_\alpha \mid H_0) = \alpha, and W is expressed as

W = \frac{\big(B_1 - \frac{1}{p}B_2\big)^{p-1}\,B_2}{\big(B_1^0 - \frac{1}{p}B_2^0\big)^{p-1}\,B_2^0} = \Big(\frac{B}{A}\Big)^{p-1}\frac{D}{C},

where B_1 = \sum_i(X_i - C_i\hat\mu)'(X_i - C_i\hat\mu) and B_2 = \sum_i(X_i - C_i\hat\mu)'J_p(X_i - C_i\hat\mu), B_1^0 = \sum_i(X_i - C_i\mu_0)'(X_i - C_i\mu_0) and B_2^0 = \sum_i(X_i - C_i\mu_0)'J_p(X_i - C_i\mu_0), and

A = B_1^0 - \frac{1}{p}B_2^0, \qquad B = B_1 - \frac{1}{p}B_2, \qquad C = B_2^0, \qquad D = B_2.

Furthermore, under H_0: \mu = \mu_0, W is distributed as the random variable

\Big(1 + \frac{1}{M-1}F^{*}\Big)^{-(p-1)}\Big(1 + \frac{1}{M-1}F^{**}\Big)^{-1},

where F^{*} and F^{**} are independent and distributed as F_{p-1,(M-1)(p-1)} and F_{1,M-1} random variables, respectively.

Proof: Recall from the proofs of Propositions 2.1-2.4 that, under H_0,

A = B + R,   (2.20)

where R = (\hat\mu - \mu_0)'\big(\sum_i C_i'C_i\big)\big(I_p - \frac{1}{p}J_p\big)(\hat\mu - \mu_0). Also

C = D + S,   (2.21)

where S = (\hat\mu - \mu_0)'\big(\sum_i C_i'C_i\big)J_p(\hat\mu - \mu_0). If we can show that B, R, D, and S are mutually independent, then combined with Facts (7), (8), (9), and (10) below, the proof is done. Note that Facts (1)-(6), establishing pairwise independence among B, R, D, and S, are sufficient for their mutual independence.
The facts needed to prove this theorem are shown below:

(1) B and R are independent,
(2) D and S are independent,
(3) B and D are independent (Proposition 2.5),
(4) B and S are independent,
(5) R and D are independent,
(6) R and S are independent,
(7) B = \sum_i(X_i - C_i\hat\mu)'\big(I_p - \frac{1}{p}J_p\big)(X_i - C_i\hat\mu) \overset{d}{=} \sigma^2(1-\rho)\,\chi^2_{(M-1)(p-1)},
(8) R = (\hat\mu - \mu_0)'\big(\sum_i C_i'C_i\big)\big(I_p - \frac{1}{p}J_p\big)(\hat\mu - \mu_0) \overset{d}{=} \sigma^2(1-\rho)\,\chi^2_{p-1},
(9) D = \sum_i(X_i - C_i\hat\mu)'J_p(X_i - C_i\hat\mu) \overset{d}{=} \sigma^2 p\big[1+(p-1)\rho\big]\chi^2_{M-1},
(10) S = (\hat\mu - \mu_0)'\big(\sum_i C_i'C_i\big)J_p(\hat\mu - \mu_0) \overset{d}{=} \sigma^2 p\big[1+(p-1)\rho\big]\chi^2_1.

First, Facts (1), (2), (4), and (5) hold because X_i - C_i\hat\mu and \hat\mu are independent for each i. Fact (6) is true because \big(I_p - \frac{1}{p}J_p\big)J_p = 0. Fact (3) is the result of Proposition 2.5. Facts (7) and (9) are direct results of Propositions 2.3 and 2.4, respectively, and Facts (8) and (10) are shown in the proofs of Propositions 2.3 and 2.4, respectively. Hence the mutual independence of R, S, B, and D, in connection with the expression

W = \Big(\frac{B}{B+R}\Big)^{p-1}\frac{D}{D+S},

fulfills the proof of Theorem 2.2.

2.2.3 Properties and Useful Results about the ML Estimators

In addition to the likelihood ratio test for testing H_0: \mu = \mu_0, the null distribution of the statistic

(\hat\mu - \mu_0)'\big[\widehat{\operatorname{Var}}(\hat\mu)\big]^{-1}(\hat\mu - \mu_0)   (2.22)

also draws our attention. The exact null distribution of (2.22) is not easy to obtain, but we may at least find its asymptotic distribution. First note that

\widehat{\operatorname{Var}}(\hat\mu) = \Big(\sum_{i=1}^{M}C_i'C_i\Big)^{-1}\hat\Sigma, \qquad \hat\Sigma^{-1} = \frac{1}{\hat\sigma^2(1-\hat\rho)}\Big[I_p - \frac{\hat\rho}{1+(p-1)\hat\rho}J_p\Big].

The quadratic form (2.22) can be phrased as

\frac{1}{\hat\sigma^2}(\hat\mu - \mu_0)'\Big(\sum_{i=1}^{M}C_i'\hat R^{-1}C_i\Big)(\hat\mu - \mu_0),   (2.23)

where

\hat R^{-1} = \frac{1}{1-\hat\rho}\Big[I_p - \frac{\hat\rho}{1+(p-1)\hat\rho}J_p\Big].

The following propositions are helpful for developing an approximate distribution of the statistic in (2.22) under the hypothesis H_0: \mu = \mu_0.
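The plug-in quadratic form (2.22)-(2.23) can be evaluated directly from the closed-form estimators of Lemma 2.1. The following is a minimal sketch assuming NumPy; the function and variable names are illustrative, not from the dissertation.

```python
import numpy as np

def wald_cs(X, C, mu0):
    """Plug-in statistic (2.22)/(2.23) for H0: mu = mu0 under the
    compound-symmetry model:
    (1/sigma2_hat) (mu_hat - mu0)' (sum_i C_i' Rinv C_i) (mu_hat - mu0),
    with Rinv = (1/(1-rho))[I - rho/(1+(p-1)rho) J] evaluated at rho_hat."""
    M, p = len(X), X[0].shape[0]
    Jp, Ip = np.ones((p, p)), np.eye(p)
    Q = sum(Ci.T @ Ci for Ci in C)
    mu_hat = np.linalg.solve(Q, sum(Ci.T @ xi for Ci, xi in zip(C, X)))
    r = [xi - Ci @ mu_hat for xi, Ci in zip(X, C)]
    B1 = sum(ri @ ri for ri in r)
    B2 = sum(ri @ Jp @ ri for ri in r)
    sigma2_hat = B1 / (M * p)                       # Lemma 2.1
    rho_hat = (B2 / B1 - 1) / (p - 1)               # Lemma 2.1
    Rinv = (Ip - rho_hat / (1 + (p - 1) * rho_hat) * Jp) / (1 - rho_hat)
    d = mu_hat - np.asarray(mu0)
    W_mat = sum(Ci.T @ Rinv @ Ci for Ci in C)
    return float(d @ W_mat @ d) / sigma2_hat
```

The statistic is zero when mu0 equals the unrestricted MLE, and its approximate null distribution is the subject of Subsection 2.2.4.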
Details of the derivation of the approximate null distribution of (2.22) will be shown in Subsection 2.2.4. Before deriving the approximate null distribution of the statistic in (2.22), let us first look at the following proposition about the MLE of 2 . Proposition 2.6: Let M i i i T i i B 1 1 (X C μˆ ) (X C μˆ ) . Then 1 2 1 ˆ B Mp is the MLE of 2 , and 2 2 ˆ 1 ˆˆ M M is an unbiased estimator for 2 . In addition the following results hold. a) 2 1 2 ) 1 1 ( 1 (ˆ ) M B Mp E E , and 2 1 2 ( 1) 1 ) ˆˆ ( B M p E E . 44 b) ) 1 [1 ( 1) ] ( 2( 1) ( ˆ ) 2 4 2 2 M p O M p M V , and ) 1 [1 ( 1) ] ( ( 1) 2 ) ˆˆ ( 2 2 4 M p O M p V . c) Both 2 ˆ and 2 ˆˆ are consistent estimators of 2 ; that is, 2 2 ˆ p and 2 2 ˆˆ p . Proof: Recall that the MLE of μ is M i i T i M i i T i 1 1 1 μˆ C C C X . To find E(B1) , ( ˆ ) 2 E , ( 1 ) 2 E B , and ( ˆ ) 2 V , recall that B1 can be expressed as 1 ( ) ( ) ( ˆ ) ( ' )( ˆ ), 1 1 X C μ X C μ μ μ C C μ μ M i i i M i i i T i i B and the following result (cf. Christenson (2002), Theorem 1.3.2) is needed. If E(Y) μ and Cov(Y) V then E(Y'AY) tr(AV) μ'Aμ. So we have ( ) ( ) ( 1) , [ ( ) 0] ( ' )( ' ) 0 ( 1) ( ) ( ) ( ˆ ) ( ' )( ˆ ) 2 1 1 1 1 1 1 Mtr tr M p tr tr E B E E M i i i M i i i M i M i i i M i i i T i i Σ Σ Σ C C C C Σ X C μ X C μ μ μ C C μ μ which implies that 2 2 ) 1 1 (1 1 (ˆ ) M B Mp E E . Next, ( ˆ ) ( ' )( ˆ ) 2 , 2 ( ) ( ) ( ˆ ) ( ' )( ˆ ) ( 1 ) ( ) ( ) 2 1 1 1 2 1 2 E A B C E E B E M i i i M i i i M i i i T i i M i i i T i i μ μ C C μ μ X C μ X C μ μ μ C C μ μ X C μ X C μ 45 where A, B, and C are, respectively, given by 2 ( ) ( ) 2 ( ), 2 ( ) ( ) ( ) ( ) ( ) ( ) (Neudecker & Magnus (1979),Theorem 4.2) ( ) ( ) 2 2 2 2 2 2 2 1 2 2 1 Σ Σ Σ Σ Σ X C μ X C μ X C μ X C μ X C μ X C μ X C μ X C μ M tr tr M M tr M tr Mtr E E E A E j j T i i j j T i i i j M i i i T i i M i i i T i i ( ) ( ) ( ˆ ) ( ' )( ˆ ) 1 1 X C μ X C μ μ μ C C μ μ M i i i M i i i T i i B E , and ( ) 2 ( ). 
( ˆ ) ( ' )( ˆ ) 2 2 2 1 Σ Σ μ μ C C μ μ tr tr C E M i i i Let us attend to the representation of B. Define M i i i 1 'C C Q , we have M i i T i 1 1 μˆ Q C X . Thus the quadratic form (μˆ μ)Q(μˆ μ) can be rewritten as: ( ) ( ). ( ) ( ) ( ) ( ) ( is symmetric, circulant) ( ˆ ) ( ˆ ) ( ) ( ) 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M i M j j j T i j T i i M i i T i M i i T i T M i i T i M i i T i M i i T i T M i i T i M i i T i T M i i T i C C C C X μ C Q C X μ C X C μ Q C X C μ C X Qμ Q C X Qμ Q μ μ Q μ μ Q C X Q Qμ Q Q C X Q Qμ So B can be expressed as 46 ( ) ( ) ( ) ( ) . ( ) ( ) ( ) ( ) 1 1 1 1 1 1 1 1 M i M j M k j j T i j T k k i i T k k M i M j j j T i j T i i M k k k T k k E C C B E C C X C μ X C μ X μ C Q C X μ X C μ X C μ X μ C Q C X μ (2.24) Consider the term in (2.24): ( ) ( ) ( ) ( ) 1 X C μ X C μ X μ C Q C X μ j j T i j T k k i i T k k E C C . (2.25) To calculate (2.25), the results of Magnus (1979) can be applied to the following two cases. Case 1: i j , i j k Σ C Q C Σ Σ C Q C Σ X C μ X C μ X C μ C Q C X C μ ( ) 2 ( ) ( ) ( ) ( ) ( ) 1 1 1 T k k T k k k k T k k T k k k k T k k tr tr tr E (2.26) For i, j such that i j k Σ C Q C Σ X C μ X C μ X C μ C Q C X C μ X C μ X C μ X C μ C Q C X C μ ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) 1 1 1 T i i i i T i i T k k i i T k k i i T i i T k k i i T k k tr tr E E E (2.27) Case 2: i j , In this case, only one of i and j equal to k, or neither of them equal to k. For these two scenarios, (2.25) is equal to zero. That is, ( ) ( ) ( ) ( ) 0 1 X C μ X C μ X μ C Q C X μ j j T i j T k k i i T k k E C C . (2.28) Thus (2.24) becomes ( ) 2 ( ) ( 1) ( ) 2 ( ) 2 2 2 2 2 B trΣ tr Σ M trΣ M trΣ tr Σ (2.29) So we have 47 ( 1) [ ] 2( 1) [1 ( 1) ] , ( 1) 2( 1) ( ) 2 ( ) 2 2 ( ) ( ) 2 ( ) ( 1 ) 2 2 2 2 2 4 2 2 2 2 2 2 2 2 2 2 2 M p M p p M tr M tr M tr M tr M tr tr tr tr E B A B C Σ Σ Σ Σ Σ Σ Σ Σ and 2( 1) [1 ( 1) ] . 
2( 1) ( ) ( 1) 2( 1) ( ) ( 1) ( 1) ( 1 ) ( 1) 2 4 2 2 2 2 2 2 2 2 M p p M tr M tr M tr M tr V B E B E B Σ Σ Σ Σ Hence we have [1 ( 1) ] , 1 ) 2 1 1 ( 1 [( ˆ ) ] 2 4 2 2 4 2 2 2 2 2 p M p M M p M B E E yielding ). 1 [1 ( 1) ] ( 2( 1) ) 1 [1 ( 1) ] (1 1 ) 2 1 (1 ( ˆ ) [( ˆ ) ] [ ˆ ] 2 4 2 2 4 2 4 2 2 4 2 2 2 2 2 M p O M p M M p M p M M V E E Therefore we have that 2 2 ˆ p . The proof is complete. Remark 2.2: Proposition 2.6 (a) and (b) can be shown more effortlessly by using the results about the distribution of B1 which will be stated in Theorem 2.3 later in this subsection. Theorem 2.3 (a) states that B1 is distributed as the quantity 2 ( 1)( 1) 2 (1 ) M p + 2 ( 1) 2 [1 ( 1) ] M p , where 2 (M1)( p1) and 2 (M1) are independent chisquared random variables with (M 1)( p 1) and (M 1) degrees of freedom, respectively. Hence the results 48 ( 1) , ( 1)( 1)(1 ) ( 1)[1 ( 1) ] ( ) [ (1 ) [1 ( 1) ] ] 2 2 2 2 ( 1) 2 2 ( 1)( 1) 2 1 M p M p M p E B E p M p M and 2 4 4 2 2 2 4 2 4 2 ( 1) 2 2 ( 1)( 1) 2 1 2( 1) [1 ( 1) ] 2( 1) {( 1)(1 ) [1 ( 1) ] } 2( 1)( 1)(1 ) 2( 1)[1 ( 1) ] ( ) [ (1 ) [1 ( 1) ] ] M p p M p p M p M p Var B Var p M p M obtained from Theorem 2.3 (a) are exactly the results of Proposition 2.6 (a) and (b), respectively. The following proposition is helpful to prove Theorem 2.3 (a). Proposition 2.7: (a) If p j j ij iid i A X 1 ~ , i 1,...,M , where ij X are independent 2 random variables with 1 degree of freedom. Then p j j M M d i i A 1 2 1 . (b) If 2 2 2 ( 1) 2 (1 ) [1 ( 1) ] M p M d A p , where 2 M( p1) and 2 M are independent, 2 1 2 2 1 2 (1 ) [1 ( 1) ] C p p d , where 2 p1 and 2 1 are independent, A BC , where B and C are independent, then B is distributed as the quantity 2 1 2 2 ( 1)( 1) 2 (1 ) [1 ( 1) ] M p M p . Proof: (a) Let j Y be independent chisquared random variables with M degrees of freedom. 
The moment generating function of Σ_{i=1}^M A_i is

M_{ΣA_i}(t) = Π_{i=1}^M E[exp(t Σ_{j=1}^p λ_j X_ij)] = Π_{i=1}^M Π_{j=1}^p (1 − 2λ_j t)^{−1/2} = Π_{j=1}^p (1 − 2λ_j t)^{−M/2},

which is the moment generating function of the random variable Σ_{j=1}^p λ_j Y_j.

(b) Since B and C are independent, the moment generating function of A can be expressed as M_A(t) = M_{B+C}(t) = M_B(t)M_C(t). The mgf of A is

M_A(t) = [1 − 2σ²(1 − ρ)t]^{−M(p−1)/2} [1 − 2σ²(1 + (p − 1)ρ)t]^{−M/2}.

The mgf of C is

M_C(t) = [1 − 2σ²(1 − ρ)t]^{−(p−1)/2} [1 − 2σ²(1 + (p − 1)ρ)t]^{−1/2}.

Thus the mgf of B is

M_B(t) = M_A(t)/M_C(t) = [1 − 2σ²(1 − ρ)t]^{−(M−1)(p−1)/2} [1 − 2σ²(1 + (p − 1)ρ)t]^{−(M−1)/2},

which is the mgf of a σ²(1 − ρ)χ²_{(M−1)(p−1)} + σ²[1 + (p − 1)ρ]χ²_{M−1} random variable, where χ²_{(M−1)(p−1)} and χ²_{M−1} are independent chi-squared random variables with (M − 1)(p − 1) and M − 1 degrees of freedom, respectively. The proof is complete.

Proposition 2.7 will be used to prove the following theorem.

Theorem 2.3:
(a) B1 = Σ_{i=1}^M (X_i − C_iμ̂)'(X_i − C_iμ̂) is distributed as the quantity σ²(1 − ρ)χ²_{(M−1)(p−1)} + σ²[1 + (p − 1)ρ]χ²_{M−1}, where χ²_{(M−1)(p−1)} and χ²_{M−1} are independent chi-squared random variables with (M − 1)(p − 1) and (M − 1) degrees of freedom, respectively.
(b) B1 has an approximate gχ²_{h(M−1)} distribution, where
(1) g = σ²[(p − 1)(1 − ρ)² + [1 + (p − 1)ρ]²]/p, and
(2) h = p²/[(p − 1)(1 − ρ)² + [1 + (p − 1)ρ]²].

Proof: (a) Recall that B1 can be expressed as

B1 = Σ_{i=1}^M (X_i − C_iμ)'(X_i − C_iμ) − (μ̂ − μ)'(Σ_{i=1}^M C_i'C_i)(μ̂ − μ).   (2.30)

The first term of the last expression in (2.30) has the same distribution as a sum of M independent random variables Σ_{j=1}^p λ_j χ²_{1,j}, where the χ²_{1,j} are independent chi-square random variables with 1 degree of freedom and the λ_j are the eigenvalues of Σ = σ²[(1 − ρ)I_p + ρJ_p]. The eigenvalues of Σ are σ²(1 − ρ) with multiplicity p − 1 and σ²[1 + (p − 1)ρ] with multiplicity 1. Thus each summand of Σ_{i=1}^M (X_i − C_iμ)'(X_i − C_iμ) is distributed as σ²(1 − ρ)χ²_{p−1} + σ²[1 + (p − 1)ρ]χ²_1; that is,

Σ_{i=1}^M (X_i − C_iμ)'(X_i − C_iμ) =d σ²(1 − ρ)χ²_{M(p−1)} + σ²[1 + (p − 1)ρ]χ²_M.

Similarly for the second term of the last expression in (2.30): since μ̂ − μ is distributed as N(0, (Σ_{i=1}^M C_i'C_i)⁻¹Σ), the quadratic form (μ̂ − μ)'(Σ_{i=1}^M C_i'C_i)(μ̂ − μ) is distributed as the quantity Σ_{j=1}^p λ_j χ²_{1,j}, where the λ_j are the eigenvalues of (Σ_{i=1}^M C_i'C_i)(Σ_{i=1}^M C_i'C_i)⁻¹Σ = Σ; hence it is distributed like the quantity σ²(1 − ρ)χ²_{p−1} + σ²[1 + (p − 1)ρ]χ²_1. Since B1 and (μ̂ − μ)'(Σ_{i=1}^M C_i'C_i)(μ̂ − μ) are independent, Proposition 2.7 (b) shows that B1 is distributed as the quantity σ²(1 − ρ)χ²_{(M−1)(p−1)} + σ²[1 + (p − 1)ρ]χ²_{M−1}, where χ²_{(M−1)(p−1)} and χ²_{M−1} are independent chi-squared random variables with (M − 1)(p − 1) and (M − 1) degrees of freedom, respectively. The proof of part (a) is complete.

(b) Box (1954, Theorem 3.1) showed that g = Σ_{j=1}^p λ_j² / Σ_{j=1}^p λ_j and h = (Σ_{j=1}^p λ_j)² / Σ_{j=1}^p λ_j² are chosen so that the distribution of Σ_{j=1}^p λ_j χ²_{1,j} has the same first two moments as gχ²_h. Since B1 is distributed like a sum of M − 1 independent random variables Σ_{j=1}^p λ_j χ²_{1,j}, B1 has an approximate gχ²_{h(M−1)} distribution. The proof of Theorem 2.3 (b) is complete.

Corollary 2.1: The test statistic for testing H0: σ² = σ0² is

Mpσ̂²/σ0² = (1/σ0²) Σ_{i=1}^M (X_i − C_iμ̂)'(X_i − C_iμ̂).

Under H0, Mpσ̂²/σ0² is approximately distributed as ĝχ²_{ĥ(M−1)}, where

ĝ = [(p − 1)(1 − ρ̂)² + [1 + (p − 1)ρ̂]²]/p, and ĥ = p²/[(p − 1)(1 − ρ̂)² + [1 + (p − 1)ρ̂]²].
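The corollary's test can be sketched numerically. The code below is a minimal illustration, assuming the reconstructed reading of the corollary above (statistic Mpσ̂²/σ0² referred to a ĝχ²_{ĥ(M−1)} distribution); the function name and the Monte-Carlo p-value computation are implementation conveniences, not from the source.

```python
import numpy as np

def sigma2_test(X, C, sigma2_0, rng, n_mc=200_000):
    """Approximate test of H0: sigma^2 = sigma2_0 under the model
    X_i ~ N_p(C_i mu, sigma^2[(1-rho)I_p + rho J_p]).  A sketch of
    Corollary 2.1 as reconstructed here; the chi-square tail probability
    is approximated by Monte Carlo instead of a table lookup."""
    M, p = X.shape
    J = np.ones((p, p))
    Q = sum(Ci.T @ Ci for Ci in C)                      # sum C_i'C_i
    mu_hat = np.linalg.solve(Q, sum(C[i].T @ X[i] for i in range(M)))
    R = np.array([X[i] - C[i] @ mu_hat for i in range(M)])   # residuals
    B1 = float(sum(r @ r for r in R))
    B2 = float(sum(r @ J @ r for r in R))
    sigma2_hat = B1 / (M * p)
    rho_hat = (B2 / B1 - 1) / (p - 1)                   # MLE of rho
    w = (p - 1) * (1 - rho_hat) ** 2 + (1 + (p - 1) * rho_hat) ** 2
    g_hat, h_hat = w / p, p ** 2 / w                    # Box moment matching
    stat = M * p * sigma2_hat / sigma2_0
    # upper-tail p-value of stat/g_hat against chi2 with h_hat*(M-1) df
    pval = float((rng.chisquare(h_hat * (M - 1), n_mc) >= stat / g_hat).mean())
    return stat, rho_hat, pval

rng = np.random.default_rng(0)
M, p = 50, 4
C = np.stack([np.eye(p)] * M)                           # simplest circulant design
Sigma = (1 - 0.3) * np.eye(p) + 0.3 * np.ones((p, p))   # sigma^2 = 1, rho = 0.3
X = rng.multivariate_normal(np.zeros(p), Sigma, size=M)
stat, rho_hat, pval = sigma2_test(X, C, 1.0, rng)
```

Note that ρ̂ automatically lands in the admissible interval (−1/(p − 1), 1), consistent with Remark 2.3 below.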
Proof: It follows directly from Theorem 2.3.

The maximum likelihood estimator of ρ is biased, but its approximate mean is ρ, and its approximate variance can also be obtained. Some results about the maximum likelihood estimator of ρ are shown in the next proposition.

Proposition 2.8: The MLE of ρ, namely ρ̂ = (1/(p − 1))(B2/B1 − 1), where B1 = Σ_{i=1}^M (X_i − C_iμ̂)'(X_i − C_iμ̂) and B2 = Σ_{i=1}^M (X_i − C_iμ̂)'J_p(X_i − C_iμ̂), has the following properties:

(a) Corr(B1, B2) = [1 + (p − 1)ρ] / √{(p − 1)(1 − ρ)² + [1 + (p − 1)ρ]²},
(b) E(ρ̂) ≈ ρ and V(ρ̂) ≈ (2/((M − 1)p(p − 1)))(1 − ρ)²[1 + (p − 1)ρ]², and
(c) ρ̂ → ρ in probability.

Proof: Recall that

B1 = Σ_{i=1}^M (X_i − C_iμ)'(X_i − C_iμ) − (μ̂ − μ)'Q(μ̂ − μ),   (2.31)

B2 = Σ_{i=1}^M (X_i − C_iμ)'J_p(X_i − C_iμ) − (μ̂ − μ)'QJ_p(μ̂ − μ),   (2.32)

and

(μ̂ − μ)'Q(μ̂ − μ) = Σ_{i=1}^M Σ_{j=1}^M (X_i − C_iμ)'C_i Q⁻¹ C_j'(X_j − C_jμ),   (2.33)

where Q = Σ_{i=1}^M C_i'C_i. Similarly,

(μ̂ − μ)'QJ_p(μ̂ − μ) = Σ_{i=1}^M Σ_{j=1}^M (X_i − C_iμ)'C_i Q⁻¹ C_j'J_p(X_j − C_jμ).   (2.34)

Note that C_i, Q⁻¹, C_j', and J_p commute with each other since all of them are circulant matrices. The commutation property will be used when necessary in the calculations. So Cov(B1, B2) can be expressed as

Cov(B1, B2) = Cov(Σ_i (X_i − C_iμ)'(X_i − C_iμ), Σ_i (X_i − C_iμ)'J_p(X_i − C_iμ))
            − Cov(Σ_i (X_i − C_iμ)'(X_i − C_iμ), (μ̂ − μ)'QJ_p(μ̂ − μ))
            − Cov((μ̂ − μ)'Q(μ̂ − μ), Σ_i (X_i − C_iμ)'J_p(X_i − C_iμ))
            + Cov((μ̂ − μ)'Q(μ̂ − μ), (μ̂ − μ)'QJ_p(μ̂ − μ))
            = D − E − F + G.

The derivations of D, E, F, and G are shown below. For independent terms (i ≠ j) the covariances vanish, and for a normal vector z ~ N(0, V) we have Cov(z'Az, z'Bz) = 2tr(AVBV). Hence, using ΣJ_p = σ²[1 + (p − 1)ρ]J_p and tr(J_pΣ²) = pσ⁴[1 + (p − 1)ρ]²,

D = Σ_{i=1}^M 2tr(ΣJ_pΣ) = 2M tr(J_pΣ²) = 2Mpσ⁴[1 + (p − 1)ρ]².

For G, since μ̂ − μ ~ N(0, Q⁻¹Σ),

G = Cov((μ̂ − μ)'Q(μ̂ − μ), (μ̂ − μ)'QJ_p(μ̂ − μ)) = 2tr(QQ⁻¹Σ QJ_p Q⁻¹Σ) = 2tr(ΣJ_pΣ) = 2pσ⁴[1 + (p − 1)ρ]².

Expanding (2.33) and (2.34) term by term and dropping the zero covariances (i ≠ j, or i = j ≠ k) in the same way as in the derivation of (2.24) gives

E = F = 2tr(ΣJ_pΣ) = 2pσ⁴[1 + (p − 1)ρ]².

Therefore, Cov(B1, B2) and Corr(B1, B2) are, respectively,

Cov(B1, B2) = D − E − F + G = 2(M − 1)tr(ΣJ_pΣ) = 2(M − 1)pσ⁴[1 + (p − 1)ρ]²,   (2.35)

and

Corr(B1, B2) = Cov(B1, B2)/√{Var(B1)Var(B2)}
             = 2(M − 1)pσ⁴[1 + (p − 1)ρ]² / √{2(M − 1)σ⁴[(p − 1)(1 − ρ)² + [1 + (p − 1)ρ]²] · 2(M − 1)p²σ⁴[1 + (p − 1)ρ]²}
             = [1 + (p − 1)ρ] / √{(p − 1)(1 − ρ)² + [1 + (p − 1)ρ]²}.

Finally, we may compute the approximate mean and variance of B2/B1 using the first-order Taylor series in two variables of f(x, y) = x/y, y ≠ 0. Hence we have

E(B2/B1) ≈ E(B2)/E(B1) = (M − 1)tr(J_pΣ)/[(M − 1)tr(Σ)] = 1 + (p − 1)ρ,   (2.36)

and

V(B2/B1) ≈ [E(B2)/E(B1)]² [V(B2)/E(B2)² − 2Cov(B1, B2)/(E(B1)E(B2)) + V(B1)/E(B1)²]
         = (2(p − 1)/((M − 1)p))(1 − ρ)²[1 + (p − 1)ρ]²,   (2.37)

implying that E(ρ̂) ≈ ρ and V(ρ̂) = V(B2/B1)/(p − 1)² ≈ (2/((M − 1)p(p − 1)))(1 − ρ)²[1 + (p − 1)ρ]² → 0, so that ρ̂ → ρ in probability. The proof is complete.

The following theorem states the exact distribution of the MLE of ρ.
Theorem 2.4: The MLE of ρ, say ρ̂ = (1/(p − 1))(B2/B1 − 1), with B1 = Σ_{i=1}^M (X_i − C_iμ̂)'(X_i − C_iμ̂) and B2 = Σ_{i=1}^M (X_i − C_iμ̂)'J_p(X_i − C_iμ̂), is distributed as the quantity

(1/(p − 1)) [ p / (1 + ((1 − ρ)(p − 1)/[1 + (p − 1)ρ]) F_{(M−1)(p−1), M−1}) − 1 ].

Remark 2.3: ρ̂ is between −1/(p − 1) and 1 since the ratio B2/B1 is between 0 and p. To show this, first we have B2/B1 > 0, implying ρ̂ > −1/(p − 1), since B1 > 0 and B2 > 0 for nonzero vectors X_i − C_iμ̂. Secondly, consider the identity

p x_i'x_i = x_i'J_p x_i + x_i'(pI_p − J_p)x_i.

Since the three quantities x_i'x_i, x_i'J_p x_i, and x_i'(pI_p − J_p)x_i are all nonnegative for nonzero vectors x_i, the inequality Σ_{i=1}^M x_i'J_p x_i ≤ p Σ_{i=1}^M x_i'x_i holds, and it implies that Σ_i x_i'J_p x_i / Σ_i x_i'x_i ≤ p. Hence ρ̂ ≤ 1.

Proof of Theorem 2.4: Recall from (2.31) and (2.32) that

B1 = Σ_{i=1}^M (X_i − C_iμ)'(X_i − C_iμ) − (μ̂ − μ)'Q(μ̂ − μ)

and

B2 = Σ_{i=1}^M (X_i − C_iμ)'J_p(X_i − C_iμ) − (μ̂ − μ)'QJ_p(μ̂ − μ),

where Q = Σ_{i=1}^M C_i'C_i. And we have B1 = [B1 − (1/p)B2] + (1/p)B2. Since from Proposition 2.5 B1 − (1/p)B2 and (1/p)B2 are independent, and from Propositions 2.3 and 2.4 B1 − (1/p)B2 and (1/p)B2 are distributed as σ²(1 − ρ)χ²_{(M−1)(p−1)} and σ²[1 + (p − 1)ρ]χ²_{M−1} random variables, respectively, we have

B1/B2 = ([B1 − (1/p)B2] + (1/p)B2)/B2 = (1/p)[1 + (B1 − (1/p)B2)/((1/p)B2)],

which is distributed as a (1/p)[1 + ((1 − ρ)(p − 1)/[1 + (p − 1)ρ]) F_{(M−1)(p−1), M−1}] random variable, where F_{(M−1)(p−1), M−1} is an F random variable with (M − 1)(p − 1) and M − 1 as the numerator and denominator degrees of freedom, respectively. Thus ρ̂ = (1/(p − 1))(B2/B1 − 1) is distributed as the quantity

(1/(p − 1)) [ p / (1 + ((1 − ρ)(p − 1)/[1 + (p − 1)ρ]) F_{(M−1)(p−1), M−1}) − 1 ].

The proof is complete.

For the rest of this subsection, a simulation study is performed to investigate the behavior of ρ̂ based on the distribution of ρ̂ obtained from Theorem 2.4.
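Draws from the distribution in Theorem 2.4 can be generated directly from its F-representation. The sketch below assumes the reconstruction of the theorem given above (the constant c is taken from that reading); it also illustrates the slight downward bias of ρ̂ for ρ > 0 reported in the simulation summaries that follow.

```python
import numpy as np

def rho_hat_draws(rho, p, M, n, rng):
    """Sample from the claimed exact law of the MLE of rho: a monotone
    transform of an F_{(M-1)(p-1), M-1} variable (Theorem 2.4 as
    reconstructed here -- treat the constant c as an assumption)."""
    d1, d2 = (M - 1) * (p - 1), M - 1
    F = (rng.chisquare(d1, n) / d1) / (rng.chisquare(d2, n) / d2)
    c = (1 - rho) * (p - 1) / (1 + (p - 1) * rho)
    return (p / (1 + c * F) - 1) / (p - 1)

rng = np.random.default_rng(1)
p, M, rho = 3, 50, 0.4
draws = rho_hat_draws(rho, p, M, 200_000, rng)
lo, hi, mean = draws.min(), draws.max(), draws.mean()
```

All mass lies in the admissible interval (−1/(p − 1), 1), and for ρ = 0.4 the Monte-Carlo mean sits slightly below ρ, in line with finding (2) below.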
Figure 1 and Figure 2 show the expectation and the standard deviation of ρ̂, the MLE of ρ, for each value ρ ∈ (−1/(p − 1), 1), via a simulation study with various combinations of dimensions p = 2, 3, 4, 5, 6, 7 and sample sizes M = 2, 3, 5, 10, 20, 50, 100. Note that the starting points of ρ on the x-axis differ across p values since the restriction on ρ is ρ > −1/(p − 1), due to the requirement of a positive definite compound symmetry covariance matrix. Summarizing the information provided by Figure 1 and Figure 2, we have the following results:

About the expectation of ρ̂:
(1) When ρ = 0, the MLE ρ̂ is unbiased. This can also be verified by looking at the distribution of ρ̂ stated in Theorem 2.4 for the special case ρ = 0. With ρ = 0, ρ̂ is distributed like the random variable (1/(p − 1))[p·Beta(α, β) − 1], where α = (M − 1)/2, β = (M − 1)(p − 1)/2, and Beta(α, β) is a beta random variable. Therefore ρ̂ is unbiased since

E(ρ̂) = (1/(p − 1))[p·E Beta(α, β) − 1] = (1/(p − 1))[p·(1/p) − 1] = 0.

(2) When ρ is close to one of the end points −1/(p − 1) and 1, ρ̂ tends to be unbiased. Otherwise, when ρ < 0, ρ̂ overestimates ρ; when ρ > 0, ρ̂ underestimates ρ.
(3) When the sample size M increases, ρ̂ becomes more accurate. Indeed, from the results of Proposition 2.8, ρ̂ converges in probability to ρ.

About the standard deviation of ρ̂:
(1) When p = 2, the standard deviation of ρ̂, as a function of ρ, looks like an upside-down bathtub when M is small. When the sample size increases, the bathtub shape becomes flatter.
(2) When p > 2, the bathtub shape is not symmetric and shrinks to the right.
(3) Basically, with fixed p and ρ, the standard deviation decreases when the sample size increases.

Figures 3 and 4 illustrate the simulated probability density functions for the MLE of ρ for the cases p = 2 and p = 3, respectively. Various sample sizes M = 2, 5, 20, and 40 are considered for each figure.
Summarizing the information provided by these two figures, we have the following results about the probability density function of ρ̂:
(1) With fixed p, when the sample size is very small (M = 2), the probability density function is bimodal. Otherwise it is unimodal.
(2) With fixed p, when the sample size becomes larger, the pdf of ρ̂ becomes more concentrated and symmetric.
(3) With fixed sample size, when ρ is less than 0, the pdf is skewed to the right; otherwise it is skewed to the left.
(4) With fixed sample size, when ρ is more extreme, the pdf of ρ̂ is steeper.

[Figure 1: simulated E(ρ̂) and SD(ρ̂) versus ρ for p = 2, 3, 4, with curves for M = 2, 3, 5, 10, 20, 50, 100.]

[Figure 2: simulated E(ρ̂) and SD(ρ̂) versus ρ for p = 5, 6, 7, with curves for M = 2, 3, 5, 10, 20, 50, 100.]

[Figure 3: simulated pdf of the MLE of ρ for p = 2 and M = 2, 5, 20, 40, based on 1,000,000 draws per setting.]

[Figure 4: simulated pdf of the MLE of ρ for p = 3 and M = 2, 5, 20, 40, based on 1,000,000 draws per setting.]

2.2.4 Hypothesis Testing for H0: μ = μ0 Using an Approximate χ² Test

Using the results from Subsection 2.2.3 that σ̂² → σ² and ρ̂ → ρ in probability, we arrive at the following approximation theorem, which can be used to test the hypothesis H0: μ = μ0.

Theorem 2.5: (μ̂ − μ)'[V̂ar(μ̂)]⁻¹(μ̂ − μ) →d χ²_p.

Proof: Recall that

V̂ar(μ̂) = (Σ_{i=1}^M C_i'C_i)⁻¹Σ̂, with Σ̂ = σ̂²[(1 − ρ̂)I_p + ρ̂J_p] and Σ̂⁻¹ = (1/(σ̂²(1 − ρ̂)))[I_p − (ρ̂/(1 + (p − 1)ρ̂))J_p].

Also we have the expression
(μ̂ − μ)'[V̂ar(μ̂)]⁻¹(μ̂ − μ) = (1/(σ̂²(1 − ρ̂))) (μ̂ − μ)'(Σ_{i=1}^M C_i'C_i)[I_p − (ρ̂/(1 + (p − 1)ρ̂))J_p](μ̂ − μ)
  = (σ²(1 − ρ)/(σ̂²(1 − ρ̂))) · (1/(σ²(1 − ρ))) (μ̂ − μ)'(Σ_{i=1}^M C_i'C_i)[I_p − (ρ̂/(1 + (p − 1)ρ̂))J_p](μ̂ − μ).

Since σ̂²(1 − ρ̂)/(σ²(1 − ρ)) → 1 and ρ̂/(1 + (p − 1)ρ̂) → ρ/(1 + (p − 1)ρ) in probability, we have by Slutsky's theorem that

(μ̂ − μ)'[V̂ar(μ̂)]⁻¹(μ̂ − μ) →d (μ̂ − μ)'(Σ_{i=1}^M C_i'C_i)Σ⁻¹(μ̂ − μ),

which follows a χ²_p distribution. The proof is complete.

2.3 SIMULATION STUDY FOR MISUSE OF HOMOGENEOUS MEAN MODELS

In this section, the power of two test procedures for H0: μ = μ0, each corresponding to the same hypothesis but a different model setting, is compared, for the purpose of showing that the usual test procedure for testing H0: μ = μ0 is not appropriate when the data are contaminated in a way ignored by researchers. In each simulation, a sample of independent bivariate normal data X_1, …, X_m, m = 100, is generated from MVN₂(C_iμ, Σ), where C_i = I₂ + C_i0 and

C_i0 = [ a_i0  b_i0 ; b_i0  a_i0 ].

Note that C_i0 is (symmetric) circulant, and thus so is C_i. The two likelihood ratio tests, denoted by LRTCμ and LRTμ, are stated below:

- LRTCμ: LRT for testing H0: μ = μ0 in the homogeneous mean model X_i ~ N₂(C_iμ, Σ), and
- LRTμ: LRT for testing H0: μ = μ0 in the heterogeneous means model X_i ~ N₂(μ, Σ), where μ and Σ are unknown but Σ has compound symmetry structure.

Recall from Theorem 2.2 that the test statistic for LRTCμ is

LRTCμ statistic = [(pB1 − B2)^{p−1} B2] / [(pB1⁰ − B2⁰)^{p−1} B2⁰],

where B1 = Σ_{i=1}^M (X_i − C_iμ̂)'(X_i − C_iμ̂), B2 = Σ_{i=1}^M (X_i − C_iμ̂)'J_p(X_i − C_iμ̂), B1⁰ = Σ_{i=1}^M (X_i − C_iμ0)'(X_i − C_iμ0), B2⁰ = Σ_{i=1}^M (X_i − C_iμ0)'J_p(X_i − C_iμ0), and μ̂ = (Σ_{i=1}^M C_i'C_i)⁻¹ Σ_{i=1}^M C_i'X_i. When C_i0 = 0 for all i, the two test statistics are the same.
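As a concrete sketch, the LRTCμ statistic can be computed as follows. The closed form used here, ((pB1 − B2)^{p−1} B2)/((pB1⁰ − B2⁰)^{p−1} B2⁰), is the reconstruction of the Theorem 2.2 statistic adopted in this section (an assumption of the sketch); the design matrices in the demo mimic the b_i0 = a_i0, a_i0 = .02(.02) scenario of the simulation study.

```python
import numpy as np

def lrt_C_mu_stat(X, C, mu0):
    """LRT statistic for H0: mu = mu0 in the homogeneous-mean model
    X_i ~ N_p(C_i mu, Sigma) with compound-symmetric Sigma.  By
    construction the statistic lies in (0, 1]; small values reject H0."""
    M, p = X.shape
    J = np.ones((p, p))
    Q = sum(Ci.T @ Ci for Ci in C)
    mu_hat = np.linalg.solve(Q, sum(C[i].T @ X[i] for i in range(M)))

    def B_pair(mu):
        R = [X[i] - C[i] @ mu for i in range(M)]
        return float(sum(r @ r for r in R)), float(sum(r @ J @ r for r in R))

    B1, B2 = B_pair(mu_hat)          # unrestricted (uses the MLE of mu)
    B1_0, B2_0 = B_pair(mu0)         # restricted to H0
    return ((p * B1 - B2) ** (p - 1) * B2) / ((p * B1_0 - B2_0) ** (p - 1) * B2_0)

rng = np.random.default_rng(2)
m, p = 100, 2
mu0 = np.array([30.0, 10.0])
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
# contaminated design: C_i = I + C_i0 with a_i0 = b_i0 = .02*i
C = np.stack([np.eye(p) + 0.02 * (i + 1) * np.ones((p, p)) for i in range(m)])
X = np.stack([rng.multivariate_normal(C[i] @ mu0, Sigma) for i in range(m)])
stat_h0 = lrt_C_mu_stat(X, C, mu0)   # data generated under H0
```

Because the unrestricted likelihood is at least as large as the restricted one, the statistic never exceeds 1.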
Under H0: μ = μ0, both test statistics are distributed as the random variable stated in Theorem 2.2. We reject the null hypothesis when the test statistic is sufficiently small. The simulation study is described as follows.

Data: Data are generated from MVN₂(C_iμ0, Σ), where C_i = I₂ + C_i0, and μ0, Σ, and C_i0 are shown in the first four columns of Table 1.

Hypotheses: Both tests correspond to the hypothesis of interest H0: μ = μ0.

Tests and critical value: The two likelihood ratio tests are performed on the generated data. The critical value for the two tests is the same since the null distributions of the two test statistics coincide. As we can see in Theorem 2.2, the null distribution of the test statistic of LRTCμ does not depend on the matrices C_i.

Number of simulations: The number of LRT values used to compute the empirical alpha of the test LRTCμ, or the rejection probability of the test LRTμ, is 10000.

Interpretation of the simulation study: Column 4 of Table 1 shows the diagonal elements a_i0 of the matrices C_i0. For instance, a_i0 = −.99(.02) means that the first value of a_i0 is a_10 = −.99, which then increases by 0.02 for each one-unit increase of i. As denoted in column 5 of Table 1, the value (probability) in each cell is the empirical α for the test LRTCμ given data generated from the heterogeneous means models. All the values in column 5 are close to 0.05, the significance level specified for the test, as expected. On the other hand, since the data are contaminated, adopting the test LRTμ does not make sense and is not appropriate. If we nevertheless treat the generated data as coming from the homogeneous mean model MVN₂(μ, Σ), the rejection probability for each scenario is shown in column 6 of Table 1. As we can see, the values in this column vary from one scenario to another: some reach a probability of 1 and some are less than 0.05.
Generally, when the contamination of the data becomes more severe, that is, when the matrices C_i0 move away from the zero matrix at a faster rate, the rejection probability is larger. Under the scenario a_i0 = −.99(.02), where the C_i start close to the zero matrix, all three rejection probabilities are less than 0.05 and one of them is even 0. Lastly, the two rejection probabilities in column 6 equal 1 even when the data suffer only slight contamination (a_i0 = .001(.001) with b_i0 = a_i0, for both of the two cases of Σ).

TABLE 1: Result of simulation study for misuse of homogeneous mean model. Throughout, μ0 = (30, 10)' and C_i = I₂ + C_i0, i = 1, …, m, with C_i0 = [a_i0  b_i0; b_i0  a_i0]. Each cell gives the empirical α of LRTCμ / the rejection probability of LRTμ for testing H0: μ = μ0.

Σ = [1 .5; .5 1]:
  a_i0 = .02(.02):   b_i0 = a_i0: .055 / 1;   b_i0 = 0: .045 / 1;   b_i0 = −a_i0: .051 / 1
  a_i0 = −.99(.02):  b_i0 = a_i0: .049 / .015;   b_i0 = 0: .043 / 0;   b_i0 = −a_i0: .047 / .014
  b_i0 = a_i0:   a_i0 = .00001(.00001): .050 / .0528;   a_i0 = .0001(.0001): .06 / .407;   a_i0 = .001(.001): .047 / 1
Σ = [1 .2; .2 1]:
  b_i0 = a_i0:   a_i0 = .00001(.00001): .056 / .057;   a_i0 = .0001(.0001): .053 / .18;   a_i0 = .001(.001): .046 / 1

CHAPTER III

MULTISAMPLE INFERENCE

3.1 INTRODUCTION

In this chapter, we move on to inference for the multisample case when the heterogeneous means models are adopted. Two-sample inference is the starting point. Consider two independent samples

X_1, …, X_M ~ MVN_p(μ_{x_i}, Σ_x), μ_{x_i} = C_iμ_x, for all i = 1, …, M,

and

Y_1, …, Y_N ~ MVN_p(v_j, Σ_y), v_j = D_jμ_y, for all j = 1, …, N.

Both C_i and D_j are known p × p matrices. The hypotheses of interest are

H0: μ_x = μ_y versus Ha: μ_x ≠ μ_y.

The likelihood function is

L(μ_x, μ_y, Σ_x, Σ_y) = constant · |Σ_x|^{−M/2}|Σ_y|^{−N/2} exp{ −(1/2)[ Σ_{i=1}^M (x_i − C_iμ_x)'Σ_x⁻¹(x_i − C_iμ_x) + Σ_{j=1}^N (y_j − D_jμ_y)'Σ_y⁻¹(y_j − D_jμ_y) ] }.

The corresponding log likelihood function is

log L(μ_x, μ_y, Σ_x, Σ_y) = constant − (M/2) log|Σ_x| − (N/2) log|Σ_y|
  − (1/2)[ Σ_{i=1}^M (x_i − C_iμ_x)'Σ_x⁻¹(x_i − C_iμ_x) + Σ_{j=1}^N (y_j − D_jμ_y)'Σ_y⁻¹(y_j − D_jμ_y) ].   (3.1)

First consider the simple case where both Σ_x and Σ_y are known. The MLEs for μ_x and μ_y are, respectively,

μ̂_x = (Σ_{i=1}^M C_i'Σ_x⁻¹C_i)⁻¹ Σ_{i=1}^M C_i'Σ_x⁻¹X_i,  μ̂_y = (Σ_{j=1}^N D_j'Σ_y⁻¹D_j)⁻¹ Σ_{j=1}^N D_j'Σ_y⁻¹Y_j.

μ̂_x and μ̂_y are independent and

μ̂_x ~ MVN_p(μ_x, (Σ_{i=1}^M C_i'Σ_x⁻¹C_i)⁻¹),  μ̂_y ~ MVN_p(μ_y, (Σ_{j=1}^N D_j'Σ_y⁻¹D_j)⁻¹),

so that

μ̂_x − μ̂_y ~ MVN_p(μ_x − μ_y, (Σ_{i=1}^M C_i'Σ_x⁻¹C_i)⁻¹ + (Σ_{j=1}^N D_j'Σ_y⁻¹D_j)⁻¹).

Define the statistic

T_0 = (μ̂_x − μ̂_y)'[(Σ_{i=1}^M C_i'Σ_x⁻¹C_i)⁻¹ + (Σ_{j=1}^N D_j'Σ_y⁻¹D_j)⁻¹]⁻¹(μ̂_x − μ̂_y).

Under the null hypothesis H0: μ_x = μ_y, T_0 ~ χ²_p. Thus we reject H0 if T_0 > χ²_{p,α}.

For the case where Σ_x and Σ_y are unknown but equal, a likelihood approach is used to test H0: μ_x = μ_y in Section 3.2. In Section 3.3, the asymptotic χ² test for testing H0: μ_x = μ_y is derived. Finally, in Section 3.4 the LR test for the two-sample case is extended to the k-sample case and the exact distribution of the LRT statistic for H0: μ_1 = … = μ_k is derived.

3.2 LIKELIHOOD RATIO TEST FOR TWO-SAMPLE CASE

In this section, the case Σ_x = Σ_y = Σ unknown is considered. We also assume that Σ has compound symmetry with the form in (2.8), and that C_i, D_j, and Σ commute with each other; that is, C_i'Σ⁻¹ = Σ⁻¹C_i' and D_j'Σ⁻¹ = Σ⁻¹D_j' for all i and j. Before deriving the likelihood ratio test for H0: μ_x = μ_y, it is necessary to find the MLEs of the parameters under the null and alternative hypotheses separately.

3.2.1 Estimation Under H0: μ_x = μ_y

Assume that μ_x = μ_y = μ0 under H0.
Using the same technique as in the one-sample case, the MLE of μ0, say μ̂0, can be derived as

μ̂0 = (Σ_{i=1}^M C_i'Σ̂0⁻¹C_i + Σ_{j=1}^N D_j'Σ̂0⁻¹D_j)⁻¹ (Σ_{i=1}^M C_i'Σ̂0⁻¹X_i + Σ_{j=1}^N D_j'Σ̂0⁻¹Y_j),

where Σ̂0 is the MLE for Σ under H0. Since C_i'Σ̂0⁻¹ = Σ̂0⁻¹C_i' and D_j'Σ̂0⁻¹ = Σ̂0⁻¹D_j' for all i and j, μ̂0 reduces to

μ̂0 = (Σ_{i=1}^M C_i'C_i + Σ_{j=1}^N D_j'D_j)⁻¹ (Σ_{i=1}^M C_i'X_i + Σ_{j=1}^N D_j'Y_j).   (3.2)

Therefore Σ̂0 can be obtained using the reduced log likelihood function

log L(μ̂0, Σ) = constant − ((M + N)/2) log|Σ|
  − (1/2)[ Σ_{i=1}^M (x_i − C_iμ̂0)'Σ⁻¹(x_i − C_iμ̂0) + Σ_{j=1}^N (y_j − D_jμ̂0)'Σ⁻¹(y_j − D_jμ̂0) ].   (3.3)

The MLE for Σ under H0 is thus

Σ̂0 = σ̂0²[(1 − ρ̂0)I_p + ρ̂0J_p],

where

σ̂0² = (B1⁽⁰⁾ + E1⁽⁰⁾)/((M + N)p),  ρ̂0 = (1/(p − 1))[(B2⁽⁰⁾ + E2⁽⁰⁾)/(B1⁽⁰⁾ + E1⁽⁰⁾) − 1],   (3.4)

where

B1⁽⁰⁾ = Σ_{i=1}^M (X_i − C_iμ̂0)'(X_i − C_iμ̂0),  B2⁽⁰⁾ = Σ_{i=1}^M (X_i − C_iμ̂0)'J_p(X_i − C_iμ̂0),
E1⁽⁰⁾ = Σ_{j=1}^N (Y_j − D_jμ̂0)'(Y_j − D_jμ̂0),  E2⁽⁰⁾ = Σ_{j=1}^N (Y_j − D_jμ̂0)'J_p(Y_j − D_jμ̂0).   (3.5)

3.2.2 Estimation Under Ha: μ_x ≠ μ_y

Under Ha: μ_x ≠ μ_y, the log likelihood function is

log L(μ_x, μ_y, Σ) = constant − ((M + N)/2) log|Σ|
  − (1/2)[ Σ_{i=1}^M (x_i − C_iμ_x)'Σ⁻¹(x_i − C_iμ_x) + Σ_{j=1}^N (y_j − D_jμ_y)'Σ⁻¹(y_j − D_jμ_y) ].

Using a similar approach as in Section 2.2.3, the MLEs for μ_x, μ_y, and Σ are, respectively,

μ̂_x = (Σ_{i=1}^M C_i'C_i)⁻¹ Σ_{i=1}^M C_i'X_i,  μ̂_y = (Σ_{j=1}^N D_j'D_j)⁻¹ Σ_{j=1}^N D_j'Y_j,

and

Σ̂ = σ̂²[(1 − ρ̂)I_p + ρ̂J_p],

where

σ̂² = (B1⁽ᵃ⁾ + E1⁽ᵃ⁾)/((M + N)p),  ρ̂ = (1/(p − 1))[(B2⁽ᵃ⁾ + E2⁽ᵃ⁾)/(B1⁽ᵃ⁾ + E1⁽ᵃ⁾) − 1],   (3.6)

where

B1⁽ᵃ⁾ = Σ_{i=1}^M (X_i − C_iμ̂_x)'(X_i − C_iμ̂_x),  B2⁽ᵃ⁾ = Σ_{i=1}^M (X_i − C_iμ̂_x)'J_p(X_i − C_iμ̂_x),
E1⁽ᵃ⁾ = Σ_{j=1}^N (Y_j − D_jμ̂_y)'(Y_j − D_jμ̂_y),  E2⁽ᵃ⁾ = Σ_{j=1}^N (Y_j − D_jμ̂_y)'J_p(Y_j − D_jμ̂_y).
(3.7)

3.2.3 Likelihood Ratio Test for Testing H0: μ_x = μ_y

Subsections 3.2.1 and 3.2.2 derived the MLEs of the parameters under both the null and alternative hypotheses. The likelihood ratio test can now be developed. The likelihood ratio is

λ = max_{θ∈Θ0} L(θ) / max_{θ∈Θ} L(θ)
  = [ (2π)^{−(M+N)p/2} |Σ̂0|^{−(M+N)/2} exp{ −(1/2)[Σ_i (x_i − C_iμ̂0)'Σ̂0⁻¹(x_i − C_iμ̂0) + Σ_j (y_j − D_jμ̂0)'Σ̂0⁻¹(y_j − D_jμ̂0)] } ]
  / [ (2π)^{−(M+N)p/2} |Σ̂|^{−(M+N)/2} exp{ −(1/2)[Σ_i (x_i − C_iμ̂_x)'Σ̂⁻¹(x_i − C_iμ̂_x) + Σ_j (y_j − D_jμ̂_y)'Σ̂⁻¹(y_j − D_jμ̂_y)] } ],

where θ = (μ_x, μ_y, Σ), Θ = {(μ_x, μ_y, Σ) : Σ = σ²[(1 − ρ)I_p + ρJ_p]}, and Θ0 = {(μ_x, μ_y, Σ) : μ_x = μ_y, Σ = σ²[(1 − ρ)I_p + ρJ_p]}. Hence the results (from Appendix A.3)

Σ_{i=1}^M (x_i − C_iμ̂0)'Σ̂0⁻¹(x_i − C_iμ̂0) + Σ_{j=1}^N (y_j − D_jμ̂0)'Σ̂0⁻¹(y_j − D_jμ̂0) = (M + N)p

and

Σ_{i=1}^M (x_i − C_iμ̂_x)'Σ̂⁻¹(x_i − C_iμ̂_x) + Σ_{j=1}^N (y_j − D_jμ̂_y)'Σ̂⁻¹(y_j − D_jμ̂_y) = (M + N)p

imply that the likelihood ratio reduces to

λ^{2/(M+N)} = |Σ̂|/|Σ̂0| = [σ̂^{2p}(1 − ρ̂)^{p−1}(1 + (p − 1)ρ̂)] / [σ̂0^{2p}(1 − ρ̂0)^{p−1}(1 + (p − 1)ρ̂0)]
 = { [(B1⁽ᵃ⁾ + E1⁽ᵃ⁾) − (1/p)(B2⁽ᵃ⁾ + E2⁽ᵃ⁾)]^{p−1} (B2⁽ᵃ⁾ + E2⁽ᵃ⁾) } / { [(B1⁽⁰⁾ + E1⁽⁰⁾) − (1/p)(B2⁽⁰⁾ + E2⁽⁰⁾)]^{p−1} (B2⁽⁰⁾ + E2⁽⁰⁾) }.

Thus we arrive at the following theorem.

Theorem 3.1: The likelihood ratio test for testing H0: μ_x = μ_y rejects H0 if L ≤ C, where C is such that P(L ≤ C | H0) = α, and L = λ^{2/(M+N)} is defined as

L = { [(B1⁽ᵃ⁾ + E1⁽ᵃ⁾) − (1/p)(B2⁽ᵃ⁾ + E2⁽ᵃ⁾)]^{p−1} (B2⁽ᵃ⁾ + E2⁽ᵃ⁾) } / { [(B1⁽⁰⁾ + E1⁽⁰⁾) − (1/p)(B2⁽⁰⁾ + E2⁽⁰⁾)]^{p−1} (B2⁽⁰⁾ + E2⁽⁰⁾) },

where λ is the likelihood ratio and B1⁽ᵃ⁾, B2⁽ᵃ⁾, E1⁽ᵃ⁾, E2⁽ᵃ⁾ and B1⁽⁰⁾, B2⁽⁰⁾, E1⁽⁰⁾, E2⁽⁰⁾ are defined in (3.7) and (3.5), respectively.

To derive the null distribution of L, the following propositions are needed.

Proposition 3.1: Under H0: μ_x = μ_y (= μ0), (B1⁽⁰⁾ + E1⁽⁰⁾) − (1/p)(B2⁽⁰⁾ + E2⁽⁰⁾) is distributed as the quantity σ²(1 − ρ)χ²_{(M+N−1)(p−1)}.
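Proposition 3.1 lends itself to a quick simulation check. The sketch below restricts attention to the illustrative special case C_i = D_j = I_p (an assumption of the sketch, not the general circulant design) and compares the Monte-Carlo mean of the quantity with the mean σ²(1 − ρ)(M + N − 1)(p − 1) of the claimed distribution.

```python
import numpy as np

def pooled_within_stat(X, Y):
    """(B1+E1) - (B2+E2)/p evaluated at the pooled MLE mu0_hat, for the
    special case C_i = D_j = I_p, so mu0_hat is the pooled sample mean."""
    M, p = X.shape
    J = np.ones((p, p))
    mu0_hat = (X.sum(axis=0) + Y.sum(axis=0)) / (M + Y.shape[0])
    R = np.vstack([X - mu0_hat, Y - mu0_hat])           # pooled residuals
    B1E1 = np.einsum('ij,ij->', R, R)                   # sum r_i' r_i
    B2E2 = np.einsum('ij,jk,ik->', R, J, R)             # sum r_i' J_p r_i
    return B1E1 - B2E2 / p

rng = np.random.default_rng(3)
p, M, N, s2, rho = 4, 6, 5, 2.0, 0.3
Sigma = s2 * ((1 - rho) * np.eye(p) + rho * np.ones((p, p)))
vals = np.array([pooled_within_stat(
    rng.multivariate_normal(np.zeros(p), Sigma, M),
    rng.multivariate_normal(np.zeros(p), Sigma, N)) for _ in range(4000)])
target_mean = s2 * (1 - rho) * (M + N - 1) * (p - 1)    # mean of the claimed law
```

The quantity is nonnegative by construction, and its Monte-Carlo mean should agree with the chi-square mean above to within simulation error.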
Proof: First rewrite (B1⁽⁰⁾ + E1⁽⁰⁾) − (1/p)(B2⁽⁰⁾ + E2⁽⁰⁾) as

[B1⁽⁰⁾ − (1/p)B2⁽⁰⁾] + [E1⁽⁰⁾ − (1/p)E2⁽⁰⁾]
 = Σ_{i=1}^M (X_i − C_iμ0)'(I_p − (1/p)J_p)(X_i − C_iμ0) + Σ_{j=1}^N (Y_j − D_jμ0)'(I_p − (1/p)J_p)(Y_j − D_jμ0)
   − (μ̂0 − μ0)'(Σ_{i=1}^M C_i'C_i + Σ_{j=1}^N D_j'D_j)(I_p − (1/p)J_p)(μ̂0 − μ0),   (3.8)

where μ̂0 = (Σ_{i=1}^M C_i'C_i + Σ_{j=1}^N D_j'D_j)⁻¹(Σ_{i=1}^M C_i'X_i + Σ_{j=1}^N D_j'Y_j). Appendix A.4 shows that under H0: μ_x = μ_y (= μ0) we have

μ̂0 ~ N(μ0, (Σ_{i=1}^M C_i'C_i + Σ_{j=1}^N D_j'D_j)⁻¹Σ).

Thus the quadratic form

(μ̂0 − μ0)'(Σ_{i=1}^M C_i'C_i + Σ_{j=1}^N D_j'D_j)(I_p − (1/p)J_p)(μ̂0 − μ0)   (3.9)

is distributed as the quantity Σ_{j=1}^p λ_j χ²_{1,j}, where the λ_j are the latent roots of the matrix P1 defined in (2.14). Using the results in the proof of Proposition 2.3, the expression in (3.9) is distributed as a σ²(1 − ρ)χ²_{p−1} random variable, and the random variable

Σ_{i=1}^M (X_i − C_iμ0)'(I_p − (1/p)J_p)(X_i − C_iμ0) + Σ_{j=1}^N (Y_j − D_jμ0)'(I_p − (1/p)J_p)(Y_j − D_jμ0)

is distributed as a chi-square random variable with (M + N)(p − 1) degrees of freedom times the constant σ²(1 − ρ). Hence, using the result on sums of independent chi-square random variables, it follows that (B1⁽⁰⁾ + E1⁽⁰⁾) − (1/p)(B2⁽⁰⁾ + E2⁽⁰⁾) is distributed as the random variable σ²(1 − ρ)χ²_{(M+N−1)(p−1)}. The proof is complete.

Proposition 3.2: Under H0: μ_x = μ_y (= μ0), B2⁽⁰⁾ + E2⁽⁰⁾ is distributed as the quantity pσ²[1 + (p − 1)ρ]χ²_{M+N−1}.

Proof:

B2⁽⁰⁾ + E2⁽⁰⁾ = Σ_{i=1}^M (X_i − C_iμ̂0)'J_p(X_i − C_iμ̂0) + Σ_{j=1}^N (Y_j − D_jμ̂0)'J_p(Y_j − D_jμ̂0)
 = Σ_{i=1}^M (X_i − C_iμ0)'J_p(X_i − C_iμ0) + Σ_{j=1}^N (Y_j − D_jμ0)'J_p(Y_j − D_jμ0)
   − (μ̂0 − μ0)'(Σ_{i=1}^M C_i'C_i + Σ_{j=1}^N D_j'D_j)J_p(μ̂0 − μ0)  (by Appendix A.4).

Under H0: μ_x = μ_y (= μ0), referring to the proof of Proposition 2.4,

Σ_{i=1}^M (X_i − C_iμ0)'J_p(X_i − C_iμ0) + Σ_{j=1}^N (Y_j − D_jμ0)'J_p(Y_j − D_jμ0) =d pσ²[1 + (p − 1)ρ]χ²_{M+N}   (3.10)

and

(μ̂0 − μ0)'(Σ_{i=1}^M C_i'C_i + Σ_{j=1}^N D_j'D_j)J_p(μ̂0 − μ0) =d pσ²[1 + (p − 1)ρ]χ²_1,   (3.11)

which together imply B2⁽⁰⁾ + E2⁽⁰⁾ =d pσ²[1 + (p − 1)ρ]χ²_{M+N−1}, by using the result on sums of two independent chi-square random variables. The proof is complete.

Proposition 3.3: (B1⁽ᵃ⁾ + E1⁽ᵃ⁾) − (1/p)(B2⁽ᵃ⁾ + E2⁽ᵃ⁾) is distributed as the quantity σ²(1 − ρ)χ²_{(M+N−2)(p−1)}.

Proof: Note that E(μ̂_x) = μ_x and E(μ̂_y) = μ_y. So we have

(B1⁽ᵃ⁾ + E1⁽ᵃ⁾) − (1/p)(B2⁽ᵃ⁾ + E2⁽ᵃ⁾) = [B1⁽ᵃ⁾ − (1/p)B2⁽ᵃ⁾] + [E1⁽ᵃ⁾ − (1/p)E2⁽ᵃ⁾]
 = Σ_{i=1}^M (X_i − C_iμ_x)'(I_p − (1/p)J_p)(X_i − C_iμ_x) − (μ̂_x − μ_x)'(Σ_{i=1}^M C_i'C_i)(I_p − (1/p)J_p)(μ̂_x − μ_x)
 + Σ_{j=1}^N (Y_j − D_jμ_y)'(I_p − (1/p)J_p)(Y_j − D_jμ_y) − (μ̂_y − μ_y)'(Σ_{j=1}^N D_j'D_j)(I_p − (1/p)J_p)(μ̂_y − μ_y).

Applying Proposition 2.3 we have that

B1⁽ᵃ⁾ − (1/p)B2⁽ᵃ⁾ =d σ²(1 − ρ)χ²_{(M−1)(p−1)} and E1⁽ᵃ⁾ − (1/p)E2⁽ᵃ⁾ =d σ²(1 − ρ)χ²_{(N−1)(p−1)}.

Since B1⁽ᵃ⁾ − (1/p)B2⁽ᵃ⁾ and E1⁽ᵃ⁾ − (1/p)E2⁽ᵃ⁾ are independent, we have

(B1⁽ᵃ⁾ + E1⁽ᵃ⁾) − (1/p)(B2⁽ᵃ⁾ + E2⁽ᵃ⁾) =d σ²(1 − ρ)χ²_{(M+N−2)(p−1)}.

The proof is complete.

Proposition 3.4: B2⁽ᵃ⁾ + E2⁽ᵃ⁾ is distributed as the random variable pσ²[1 + (p − 1)ρ]χ²_{M+N−2}.
Proof: 76 N j p j j y T j j y M i p i i x T i i x a a B E 1 1 ( ) ( ) 2 2 (X C μˆ ) J (X C μˆ ) (Y D μˆ ) J (Y D μˆ ) Applying Proposition 2.4, ( ) ( ) 2 2 a a B E is distributed as the sum of two independent random variables 2 1 2 [1 ( 1) ] M p p and 2 1 2 [1 ( 1) ] N p p . Therefore 2 2 ( ) ( ) 2 2 2 [1 ( 1) ] M N d a a B E p p . The proof is complete. Now we arrive at the following theorem. Theorem 3.2: The likelihood ratio test statistic in Theorem 3.1 for testing x y H : μ μ 0 is L defined as: A C B D B E B E p B E B E B E p B E L p p p a a p a a a a 1 1 (0) (0) 1 (0) (0) (0) (0) ( ) ( ) 1 ( ) ( ) ( ) ( ) ( 2 2 ) ( 2 2 ) 1 ( 1 1 ) ( 2 2 ) ( 2 2 ) 1 ( 1 1 ) , where ( 2 2 ) 1 ( 1 1 ) (0) (0) (0) (0) B E p A B E , ( 2 2 ) 1 ( 1 1 ) (a) (a) (a) (a) B E p B B E , (0) (0) C B2 E2 , and ( ) ( ) 2 2 a a D B E . (a) B and D are distributed respectively as the following: 2 ( 2)( 1) 2 (1 ) M N p d B and 2 2 2 [1 ( 1) ] M N d D p p . Under x y H : μ μ 0 , A and C are distributed respectively as the following: 2 ( 1)( 1) 2 (1 ) M N p d A and 2 1 2 [1 ( 1) ] M N d C p p . (b) AB, B, CD, and D are mutually independent weighted chisquare random variables. 77 (c) Furthermore, under x y H : μ μ 0 , L is distributed as the random variable ** 1 * 2 1 1 2 1 1 1 F M N F M N p , where * F and ** F are independent and distributed like p1, (MN2)( p1) F , and 1,MN2 F , respectively. Proof of (a): Results are obtained directly from Propositions 3.1 to 3.4. Proof of (b) and (c): First rewrite A and C as follows. A can be expressed as , )( ˆ ) 1 )( ˆ ) ( ˆ ) ( 1 ( ˆ ) ( 1 0 0 1 0 0 B R p p A N j p p j j T j j M i p p i i T i i X C μ I J X C μ Y D μ I J Y D μ where B and R are, respectively, N j p p j j y T j j y M i p p i i x T i i x p p B 1 1 )( ˆ ) 1 )( ˆ ) ( ˆ ) ( 1 (X C μˆ ) (I J X C μ Y D μ I J Y D μ and )( ˆ ˆ ). 
1 ( ˆ ˆ ) ( )( ˆ ˆ ) 1 ( ˆ ˆ ) ( 0 1 0 0 1 0 μ μ D D I J μ μ μ μ C C I J μ μ p p y N j j T j T y p p x M i i T i T x p p R (3.12) Similarly, C can be expressed as , ( ˆ ) ( ˆ ) ( ˆ ) ( ˆ ) 1 0 0 1 0 0 D S C N j p j j T j j M i p i i T i i X C μ J X C μ Y D μ J Y D μ where 78 N j p j j y T j j y M i p i i x T i i x D 1 1 (X C μˆ ) J (X C μˆ ) (Y D μˆ ) J (Y D μˆ ) and ( ˆ ˆ ) ( ˆ ˆ ). ( ˆ ˆ ) ( ˆ ˆ ) 0 1 0 0 1 0 μ μ D D J μ μ μ μ C C J μ μ p y N j j T j T y p x M i i T i T x S (3.13) Some other facts necessary to prove (b) are stated below. (1) B and R are independent (2) D and S are independent (3) 2 1 2 (1 ) p d R and 2 1 2 S p [1 ( p 1) ] d . (4) B and D are independent (5) B and S are independent (6) R and D are independent (7) R and S are independent Facts (1) and (2) are true because both B and D are functions of i i x X C μˆ and j j y Y D μˆ for all i 1,...,M , j 1,..., N , also R and S are functions of x μˆ and y μˆ since 0 μˆ in (3.2) can be expressed as a linear combination of x μˆ and y μˆ as follows: ˆ ( ) [ ˆ ˆ ] * * 1 * * 0 x y μ C D C μ D μ , (3.14) where M i i T i 1 * C C C , N j j T j 1 * D D D , M i i T x i 1 * 1 ) ( ˆ X C C μ and N j j T y j 1 * 1 μˆ (D ) D Y . Combining the facts that i i x X C μˆ and x μˆ are independent as well as j j y Y D μˆ and y μˆ are independent, Facts (1) and (2) are shown. 79 Fact (3) can be shown using the results in part (a) in conjunction with Facts (1) and (2), and the result about sum of independent chisquare random variables. More clearly, the results 2 ( 1)( 1) 2 (1 ) M N p d A and 2 ( 2)( 1) 2 (1 ) M N p d B combined with Fact (1) imply 2 1 2 (1 ) p d R . In addition, the results 2 1 2 [1 ( 1) ] M N d C p p and 2 2 2 [1 ( 1) ] M N d D p p in connection with fact (2) implies 2 1 2 S p [1 ( p 1) ] d . Fact (4) can be shown by applying Proposition 2.5. 2 ) 1 ( 1 (a) (a) B p B and ( ) 2 a B are independent, 2 ) 1 ( 1 (a) (a) E p E and ( ) 2 a E are independent as well. 
As a matter of fact, 2 ) 1 ( 1 (a) (a) B p B , ( ) 2 a B , 2 ) 1 ( 1 (a) (a) E p E and ( ) 2 a E are mutually independent so fact (4) is shown. Facts (5) and (6) are true using the same argument when Facts (1) and (2) were shown. To show Fact (7), it is necessary to rewrite R and S in (3.12) and (3.13), respectively. In (3.12) the two terms on the righthand side can be expressed respectively as )( ˆ ), 1 2( ˆ ˆ ) ( )( ˆ ) 1 )( ˆ ) ( ˆ ) ( 1 ( ˆ ) ( )( ˆ ˆ ) 1 ( ˆ ˆ ) ( 0 0 * 0 0 0 * 0 0 0 * 0 0 * 0 μ μ C I J μ μ μ μ C I J μ μ μ μ C I J μ μ μ μ C I J μ μ p p T x p p T p p x T x p p x T x p p p p and 80 )( ˆ ). 1 )( ˆ ) 2( ˆ ˆ ) ( 1 ( ˆ ) ( )( ˆ ) 1 )( ˆ ˆ ) ( ˆ ) ( 1 ( ˆ ˆ ) ( 0 0 * 0 0 0 * 0 0 0 * 0 0 * 0 μ μ D I J μ μ μ μ D I J μ μ μ μ D I J μ μ μ μ D I J μ μ p p T p p y T p p y T p p y y T y p p p p We should note that )( ˆ ) 0 1 )( ˆ ) ( ˆ ˆ ) ( 1 ( ˆ ˆ ) ( 0 0 * 0 0 0 * 0 μ μ C I J μ μ μ μ D I J μ μ p p T p p y T x p p by substituting (3.14) into the lefthand side of the above equation. Therefore, R becomes )]( ˆ ). 1 ( ˆ ) [( )( )]( ˆ ) 1 ( ˆ ) [ ( )]( ˆ ) 1 ( ˆ ) [ ( 0 0 * * 0 0 0 * 0 0 * 0 μ μ C D I J μ μ μ μ D I J μ μ μ μ C I J μ μ p p T p p y T y p p x T x p p p R (3.15) Likewise, S can be written as ( ˆ ) [( ) ]( ˆ ). ( ˆ ) [ ]( ˆ ) ( ˆ ) [ ]( ˆ ) 0 0 * * 0 0 0 * 0 0 * 0 μ μ C D J μ μ μ μ C J μ μ μ μ D J μ μ p T p y T p x y T x S (3.16) Since 0 0 μˆ μ can be written as ( ) [ ( ˆ ) ( ˆ )] , ˆ ( ) ( ˆ ˆ ) ( ) ( ) 0 * 0 * * 1 * 0 * * 1 * * * * 1 * * 0 0 C D C μ μ D μ μ μ μ C D C μ D μ C D C D μ x y x y the las
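The decompositions $A=B+R$ and $C=D+S$ above are exact algebraic identities (they rely on the circulant $\mathbf{C}_i$, $\mathbf{D}_j$ commuting with $\mathbf{J}_p$), so they hold for any data and can be spot-checked numerically. The sketch below is mine, not the dissertation's: the helper names are invented, and the designs are arbitrary random circulants.

```python
import numpy as np

def make_circulant(first_row):
    # A_ij = a_{(j-i) mod p}: each row is the previous row shifted right.
    a = np.asarray(first_row, float)
    p = len(a)
    return np.array([[a[(j - i) % p] for j in range(p)] for i in range(p)])

rng = np.random.default_rng(0)
p, M, N = 4, 5, 6
Jbar = np.ones((p, p)) / p           # (1/p) J_p
K = np.eye(p) - Jbar                 # I_p - (1/p) J_p

# Known circulant design matrices and arbitrary observations.
Cs = [make_circulant(rng.normal(size=p)) for _ in range(M)]
Ds = [make_circulant(rng.normal(size=p)) for _ in range(N)]
X = [rng.normal(size=p) for _ in range(M)]
Y = [rng.normal(size=p) for _ in range(N)]

Cstar = sum(C.T @ C for C in Cs)     # C* = sum_i C_i' C_i
Dstar = sum(D.T @ D for D in Ds)     # D* = sum_j D_j' D_j
mu_x = np.linalg.solve(Cstar, sum(C.T @ x for C, x in zip(Cs, X)))
mu_y = np.linalg.solve(Dstar, sum(D.T @ y for D, y in zip(Ds, Y)))
# (3.14): the pooled estimate under H0 is a combination of mu_x and mu_y.
mu_0 = np.linalg.solve(Cstar + Dstar, Cstar @ mu_x + Dstar @ mu_y)

def qsum(obs, designs, mu, Mmat):
    # sum of quadratic forms (z - G mu)' Mmat (z - G mu)
    return sum((z - G @ mu) @ Mmat @ (z - G @ mu) for z, G in zip(obs, designs))

A = qsum(X, Cs, mu_0, K) + qsum(Y, Ds, mu_0, K)
B = qsum(X, Cs, mu_x, K) + qsum(Y, Ds, mu_y, K)
R = (mu_x - mu_0) @ Cstar @ K @ (mu_x - mu_0) + (mu_y - mu_0) @ Dstar @ K @ (mu_y - mu_0)
C_ = qsum(X, Cs, mu_0, Jbar) + qsum(Y, Ds, mu_0, Jbar)
D_ = qsum(X, Cs, mu_x, Jbar) + qsum(Y, Ds, mu_y, Jbar)
S = (mu_x - mu_0) @ Cstar @ Jbar @ (mu_x - mu_0) + (mu_y - mu_0) @ Dstar @ Jbar @ (mu_y - mu_0)

print(abs(A - (B + R)), abs(C_ - (D_ + S)))  # both ~ 0 up to rounding
```

The same script is a convenient sanity check when implementing the test statistic $L=B^{p-1}D/(A^{p-1}C)$, since $A$ and $C$ never need to be recomputed from scratch once $B$, $D$, $R$, and $S$ are available.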
Title  Multivariate Normal Inference for Heterogeneous Samples and an Application to Meta Analysis 
Date  2012-07-01 
Author  Lin, Li-Chi 
Keywords  circulant matrices, compound symmetry, heterogeneous means, likelihood ratio testing, maximum likelihood estimation, multivariate normal 
Department  Statistics 
Document Type  
Full Text Type  Open Access 
Abstract  When extending likelihood inference for the normal distribution to heterogeneous samples, one discovers that this is easily done in the univariate case but is prohibitive in the multivariate case. In the current work, exact maximum likelihood estimates of the core mean and the covariance matrix are obtained for samples whose means differ but share a core parameter vector, with a covariance matrix that is unknown but structured. The celebrated Hotelling T-square statistic is then generalized to this case, and its exact null distribution is derived; the approximate chi-square test is obtained as well. Next, we derive analogous results in the k-sample situation. The generalized Hotelling T-square statistic developed here allows us to proceed to testing hypotheses in one-way multivariate ANOVA when samples are heterogeneous. A cutting-edge application of this work is that it introduces, for the first time, multivariate meta analysis approaches for multivariate heterogeneous data. 
Note  Dissertation 
Rights  © Oklahoma Agricultural and Mechanical Board of Regents 
Transcript  MULTIVARIATE NORMAL INFERENCE FOR HETEROGENEOUS SAMPLES AND AN APPLICATION TO META ANALYSIS By LI-CHI LIN Bachelor of Science in Mathematics Chung Yuan Christian University Chung-Li, Tao-Yuan 1990 Master of Science in Statistics National Central University Chung-Li, Tao-Yuan 1993 Submitted to the Faculty of the Graduate College of the Oklahoma State University in partial fulfillment of the requirements for the Degree of DOCTOR OF PHILOSOPHY July, 2012

Dissertation Approved: Dr. Ibrahim A. Ahmad, Dissertation Adviser; Dr. Carla L. Goad; Dr. Lan Zhu; Dr. Tieming Liu; Dr. Sheryl A. Tucker, Dean of the Graduate College

TABLE OF CONTENTS

Chapter Page
I. INTRODUCTION AND LITERATURE REVIEW ... 1
1.1 Introduction ... 1
1.2 Homogeneous Mean Model for Single Population ... 2
1.2.1 Inferences Concerning the Mean Vector When Covariance Matrix Is Unstructured ... 3
1.2.2 Inferences Concerning the Mean Vector When Covariance Matrix Has Compound Symmetry Structure ... 5
1.2.3 Inferences Concerning the Mean Vector When Covariance Matrix Is Circulant ... 7
1.2.4 Inferences Concerning the Mean Vector When Covariance Matrix Is Block Compound Symmetry ... 9
1.2.5 Inferences Concerning Both Means and Covariance Matrices ... 11
1.3 Homogeneous Mean Models for k Populations with k ≥ 2 ... 13
1.4 Meta Analysis ... 14
1.5 Proposed Heterogeneous
Means Models ... 16
II. ONE-SAMPLE INFERENCE ... 18
2.1 Introduction and Preliminary Cases ... 18
2.1.1 Inference for μ When Σ Is Known ... 20
2.1.2 Inference for μ When Σ Is Unknown without Pattern ... 21
2.1.3 Inference for μ When Σ = σ²V, σ² Unknown, V Known ... 22
2.2 Mainstream: Inference for μ When Σ Has Compound Symmetry Structure and the C_i Are Circulant ... 25
2.2.1 Maximum Likelihood Estimators ... 25
2.2.2 Hypothesis Testing for H_0: μ = μ_0 Using the LR Test ... 28
2.2.3 Properties and Useful Results for ML Estimators ... 42
2.2.4 Hypothesis Testing for H_0: μ = μ_0 Using an Approximate χ² Test ... 63
2.3 Simulation Study for Misuse of Homogeneous Mean Models ... 64
III.
MULTI-SAMPLE INFERENCE ... 67
3.1 Introduction ... 67
3.2 Likelihood Ratio Test for Two-Sample Case ... 69
3.2.1 Estimation Under H_0: μ_x − μ_y = 0 ... 69
3.2.2 Estimation Under H_a: μ_x ≠ μ_y ... 70
3.2.3 Likelihood Ratio Test for Testing H_0: μ_x − μ_y = 0 ... 71
3.3 Approximate χ² Test for H_0: μ_x − μ_y = 0 ... 82
3.4 LRT for k-Sample Case ... 85
3.4.1 Estimation Under H_0: μ_1 = ... = μ_k ... 86
3.4.2 Estimation Under H_a: μ_i ≠ μ_j for Some i ≠ j ... 87
3.4.3 Likelihood Ratio Test for Testing H_0: μ_1 = ... = μ_k ... 88
IV. APPLICATION TO META ANALYSIS ... 99
4.1 Introduction and Preliminary Univariate Case ... 99
4.2 Fixed Effect Model ... 100
4.3 Random Effects Model ... 103
4.3.1 Two-Stage Method ... 103
4.3.2 One-Stage Method ... 105
4.3.3 One-Stage Method – Simulation Study ... 109
V.
CONCLUSIONS AND FUTURE WORK ... 115
5.1 Conclusions ... 115
5.2 Future Work ... 118
REFERENCES ... 120
APPENDICES ... 123
A.1 ... 123
A.2 ... 124
A.3 ... 126
A.4 ... 127

LIST OF TABLES
Table Page
Table 1 ... 66

LIST OF FIGURES
Figure Page
Figure 1 ... 60
Figure 2 ... 61
Figure 3 ... 62
Figure 4 ... 62
Figure 5 ... 111
Figure 6
... 112
Figure 7 ... 113
Figure 8 ... 114

CHAPTER I
INTRODUCTION AND LITERATURE REVIEW

1.1 INTRODUCTION

If $X_1,\dots,X_n$ is a sample from a normal population, then to estimate the population mean the usual point estimator is the sample mean $\bar{X}_n$. However, if the collected data violate the "identically distributed" assumption, that is, if each $X_i$ has a heterogeneous mean, estimating the "population mean" no longer makes sense, except when those means are structured. Sometimes researchers believe that their collected sample is from a single population with a common constant mean when it is not, and they want to test whether the "population mean" equals a specified value $\mu_0$ without realizing that their data have been polluted by some known or unknown mechanism. Hence the chance of rejection is affected by the degree to which the data are polluted. It is therefore necessary to model the disturbance of the data caused by external or internal mechanisms and carry out inference for the parameter of interest. For example, let a random sample $X_i$, $i=1,\dots,n$, be independently, normally distributed with heterogeneous means $C_i\mu$, $i=1,\dots,n$, and common variance $\sigma^2$. Let $C_1,\dots,C_n$ be known, and assume that $X_i \overset{ind}{\sim} N(C_i\mu,\sigma^2)$, $i=1,\dots,n$. Although each $X_i$ has a different mean, there is still an "underlying" mean $\mu$ hidden in this model. Once $\mu$ is estimated, each mean $C_i\mu$, $i=1,\dots,n$, is obtained. Actually, this model is a linear regression model through the origin. For this univariate case the model is very easy to estimate, while when extending it to the multivariate case, the matrices $\mathbf{C}_i$ become troublesome.
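Since the univariate model is regression through the origin, the MLE of $\mu$ is $\hat\mu=\sum_i C_iX_i/\sum_i C_i^2$. A quick illustrative sketch (the data and names here are invented, not taken from the dissertation):

```python
import numpy as np

rng = np.random.default_rng(1)
n, mu_true, sigma = 200, 2.5, 1.0
C = rng.uniform(0.5, 2.0, size=n)       # known constants C_1, ..., C_n
X = rng.normal(C * mu_true, sigma)      # X_i ~ N(C_i * mu, sigma^2), independently

# MLE of mu: least squares through the origin.
mu_hat = np.sum(C * X) / np.sum(C * C)
print(mu_hat)  # close to mu_true
```

Each fitted mean is then simply `C[i] * mu_hat`, which is the point of the model: one underlying parameter generates all $n$ heterogeneous means.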
A special case of interest is to assume each $\mathbf{C}_i$ is a square matrix. For the remainder of this chapter, a review of the literature on inference for multivariate homogeneous mean models for a single normal population is given in Section 1.2, as follows: Subsection 1.2.1 reviews inferences concerning the mean vector when the covariance matrix is unstructured. Subsections 1.2.2 to 1.2.4 concern inferences about the means assuming that the covariance matrices are patterned. Finally, Subsection 1.2.5 concerns inferences about both the means and the covariance matrices. Section 1.3 covers inference for multivariate homogeneous mean models for $k$ normal populations with $k\ge 2$. Section 1.4 gives a brief review of meta analysis. Section 1.5 formally introduces the proposed model under the multivariate normal setting and gives an overall introduction to the contents of later chapters.

1.2 HOMOGENEOUS MEAN MODEL FOR SINGLE POPULATION

The $p$-dimensional multivariate normal model has mean $\boldsymbol\mu$ and covariance matrix $\boldsymbol\Sigma$. The basic statistical problem is to estimate the parameters from a sample of $n$ observations $\mathbf{X}_1,\dots,\mathbf{X}_n$ from the normal distribution with homogeneous mean $\boldsymbol\mu$ and homogeneous covariance matrix $\boldsymbol\Sigma$. The maximum likelihood estimator of $\boldsymbol\mu$ is just the sample mean, and the maximum likelihood estimator of $\boldsymbol\Sigma$ is proportional to the matrix of sample variances and sample covariances. The sample covariance matrix is defined by

$$\mathbf{S}=\frac{1}{n-1}\sum_{i=1}^{n}(\mathbf{X}_i-\bar{\mathbf{X}})(\mathbf{X}_i-\bar{\mathbf{X}})', \quad (1.1)$$

where $\bar{\mathbf{X}}=\frac{1}{n}\sum_{i=1}^{n}\mathbf{X}_i$; $\mathbf{S}$ is unbiased for estimating $\boldsymbol\Sigma$ and follows the Wishart distribution $W\big(\frac{1}{n-1}\boldsymbol\Sigma,\,n-1\big)$.

1.2.1 Inferences Concerning the Mean Vector When Covariance Matrix Is Unstructured

Tests of the mean $\boldsymbol\mu$ equal to a specified vector $\boldsymbol\mu_0$ have been discussed in many multivariate analysis textbooks (e.g., Anderson 2003 and Rencher 1998) for the cases that $\boldsymbol\Sigma$ is known and that $\boldsymbol\Sigma$ is unknown and unstructured.
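The sample covariance in (1.1) can be computed either from centered deviations or from the uncentered cross-product matrix, since $\sum_i(\mathbf{X}_i-\bar{\mathbf{X}})(\mathbf{X}_i-\bar{\mathbf{X}})'=\sum_i\mathbf{X}_i\mathbf{X}_i'-n\bar{\mathbf{X}}\bar{\mathbf{X}}'$. A small sketch checking the two forms agree (data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 3
X = rng.normal(size=(n, p))
xbar = X.mean(axis=0)

# (1.1): S = (1/(n-1)) * sum_i (X_i - xbar)(X_i - xbar)'
S = sum(np.outer(x - xbar, x - xbar) for x in X) / (n - 1)
# Equivalent uncentered form.
S2 = (X.T @ X - n * np.outer(xbar, xbar)) / (n - 1)
print(np.abs(S - S2).max())  # ~ 0
```

Either form yields a symmetric positive semidefinite matrix, as a covariance estimate must be.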
Since $\sqrt{n}(\bar{\mathbf{X}}-\boldsymbol\mu)$ is distributed according to $N(\mathbf{0},\boldsymbol\Sigma)$, it follows that $n(\bar{\mathbf{X}}-\boldsymbol\mu)'\boldsymbol\Sigma^{-1}(\bar{\mathbf{X}}-\boldsymbol\mu)$ has a central chi-square distribution with $p$ degrees of freedom for the case that $\boldsymbol\Sigma$ is known. For the case that $\boldsymbol\Sigma$ is unknown and unstructured, the likelihood of the homogeneous mean model given observations $\mathbf{x}_1,\dots,\mathbf{x}_n$ is

$$L(\boldsymbol\mu,\boldsymbol\Sigma\,|\,\mathbf{x}_1,\dots,\mathbf{x}_n)=\prod_{i=1}^{n}(2\pi)^{-p/2}|\boldsymbol\Sigma|^{-1/2}\exp\{-\tfrac{1}{2}(\mathbf{x}_i-\boldsymbol\mu)'\boldsymbol\Sigma^{-1}(\mathbf{x}_i-\boldsymbol\mu)\}=(2\pi)^{-np/2}|\boldsymbol\Sigma|^{-n/2}\exp\{-\tfrac{1}{2}\sum_{i=1}^{n}(\mathbf{x}_i-\boldsymbol\mu)'\boldsymbol\Sigma^{-1}(\mathbf{x}_i-\boldsymbol\mu)\}, \quad (1.2)$$

and the corresponding log likelihood is

$$\log L(\boldsymbol\mu,\boldsymbol\Sigma\,|\,\mathbf{x}_1,\dots,\mathbf{x}_n)=\text{constant}-\tfrac{n}{2}\log|\boldsymbol\Sigma|-\tfrac{1}{2}\sum_{i=1}^{n}(\mathbf{x}_i-\boldsymbol\mu)'\boldsymbol\Sigma^{-1}(\mathbf{x}_i-\boldsymbol\mu),$$

where $\log$ is the logarithm taken to base $e$. Let $T^2=n(\bar{\mathbf{X}}-\boldsymbol\mu_0)'\mathbf{S}^{-1}(\bar{\mathbf{X}}-\boldsymbol\mu_0)$. For the rest of this subsection, the following theorem concerning the Hotelling $T^2$ distribution is stated, and the likelihood ratio test for the hypothesis $H_0:\boldsymbol\mu=\boldsymbol\mu_0$ is developed based on the $T^2$ statistic (Anderson 2003).

Theorem 1.1 (Anderson 2003) Let $\mathbf{X}_1,\dots,\mathbf{X}_n$ be a sample from $N(\boldsymbol\mu,\boldsymbol\Sigma)$, and define $T^2=n(\bar{\mathbf{X}}-\boldsymbol\mu_0)'\mathbf{S}^{-1}(\bar{\mathbf{X}}-\boldsymbol\mu_0)$. The distribution of $\frac{n-p}{(n-1)p}T^2$ is noncentral $F$ with $p$ and $n-p$ degrees of freedom and noncentrality parameter $n(\boldsymbol\mu-\boldsymbol\mu_0)'\boldsymbol\Sigma^{-1}(\boldsymbol\mu-\boldsymbol\mu_0)$. If $\boldsymbol\mu=\boldsymbol\mu_0$, then the $F$-distribution is central.

Since the $T^2$ statistic follows Hotelling's $T^2$ distribution, which is the generalized version of Student's $t$ distribution, the confidence region of the mean vector can be derived on the basis of the $T^2$ statistic. The likelihood ratio for testing $H_0:\boldsymbol\mu=\boldsymbol\mu_0$ is

$$\lambda=\frac{\max_{\boldsymbol\Sigma}L(\boldsymbol\mu_0,\boldsymbol\Sigma)}{\max_{\boldsymbol\mu,\boldsymbol\Sigma}L(\boldsymbol\mu,\boldsymbol\Sigma)}=\frac{L(\boldsymbol\mu_0,\hat{\boldsymbol\Sigma}_0)}{L(\hat{\boldsymbol\mu},\hat{\boldsymbol\Sigma})}, \quad (1.3)$$

where

$$\hat{\boldsymbol\mu}=\bar{\mathbf{X}},\qquad \hat{\boldsymbol\Sigma}=\frac{1}{n}\sum_{i=1}^{n}(\mathbf{X}_i-\bar{\mathbf{X}})(\mathbf{X}_i-\bar{\mathbf{X}})',\qquad \hat{\boldsymbol\Sigma}_0=\frac{1}{n}\sum_{i=1}^{n}(\mathbf{X}_i-\boldsymbol\mu_0)(\mathbf{X}_i-\boldsymbol\mu_0)'. \quad (1.4)$$

Thus (1.3) becomes
$$\lambda=\frac{(2\pi)^{-np/2}|\hat{\boldsymbol\Sigma}_0|^{-n/2}\exp\{-\tfrac{1}{2}\sum_{i=1}^{n}(\mathbf{x}_i-\boldsymbol\mu_0)'\hat{\boldsymbol\Sigma}_0^{-1}(\mathbf{x}_i-\boldsymbol\mu_0)\}}{(2\pi)^{-np/2}|\hat{\boldsymbol\Sigma}|^{-n/2}\exp\{-\tfrac{1}{2}\sum_{i=1}^{n}(\mathbf{x}_i-\bar{\mathbf{x}})'\hat{\boldsymbol\Sigma}^{-1}(\mathbf{x}_i-\bar{\mathbf{x}})\}}=\frac{|\hat{\boldsymbol\Sigma}_0|^{-n/2}\exp\{-\tfrac{1}{2}\operatorname{tr}(n\mathbf{I}_p)\}}{|\hat{\boldsymbol\Sigma}|^{-n/2}\exp\{-\tfrac{1}{2}\operatorname{tr}(n\mathbf{I}_p)\}}=\Big(\frac{|\hat{\boldsymbol\Sigma}|}{|\hat{\boldsymbol\Sigma}_0|}\Big)^{n/2}.$$

Replacing $\hat{\boldsymbol\Sigma}$ and $\hat{\boldsymbol\Sigma}_0$ using (1.4), $\lambda^{2/n}$ becomes

$$\lambda^{2/n}=\frac{|\hat{\boldsymbol\Sigma}|}{|\hat{\boldsymbol\Sigma}_0|}=\frac{\big|\sum_{i=1}^{n}(\mathbf{X}_i-\bar{\mathbf{X}})(\mathbf{X}_i-\bar{\mathbf{X}})'\big|}{\big|\sum_{i=1}^{n}(\mathbf{X}_i-\boldsymbol\mu_0)(\mathbf{X}_i-\boldsymbol\mu_0)'\big|}.$$

Further, to derive the likelihood ratio criterion, the following corollary is required.

Corollary 1.1 (Anderson 2003) For $\mathbf{C}$ nonsingular,

$$|\mathbf{C}+\mathbf{y}\mathbf{y}'|=|\mathbf{C}|\,(1+\mathbf{y}'\mathbf{C}^{-1}\mathbf{y}).$$

Defining $\mathbf{A}=\sum_{i=1}^{n}(\mathbf{X}_i-\bar{\mathbf{X}})(\mathbf{X}_i-\bar{\mathbf{X}})'$ and using Corollary 1.1, we have

$$\lambda^{2/n}=\frac{|\mathbf{A}|}{\big|\mathbf{A}+n(\bar{\mathbf{X}}-\boldsymbol\mu_0)(\bar{\mathbf{X}}-\boldsymbol\mu_0)'\big|}=\frac{1}{1+n(\bar{\mathbf{X}}-\boldsymbol\mu_0)'\mathbf{A}^{-1}(\bar{\mathbf{X}}-\boldsymbol\mu_0)}=\frac{1}{1+T^2/(n-1)},$$

where $T^2$ is defined in Theorem 1.1. Thus the likelihood ratio test for $H_0:\boldsymbol\mu=\boldsymbol\mu_0$ has rejection region $\{\mathbf{x}_1,\dots,\mathbf{x}_n : T^2\ge C_0\}$, where $C_0=\frac{(n-1)p}{n-p}F_{p,n-p}(1-\alpha)$ is such that $P(T^2\ge C_0\,|\,H_0)=\alpha$, the significance level of the test.

1.2.2 Inferences Concerning the Mean Vector When Covariance Matrix Has Compound Symmetry Structure

Define the $p$-variate mean vector $\boldsymbol\mu=(\mu_1,\dots,\mu_p)'$. Wilks (1946) derived the exact likelihood ratio criterion for testing $H_0$: equality of the $p$ entries of the mean vector $\boldsymbol\mu$, i.e., $H_0:\boldsymbol\mu=\xi\mathbf{1}_p$, where $\xi$ is an unknown real number and $\mathbf{1}_p$ is a $p\times 1$ vector with all entries equal to 1, when the covariance matrix $\boldsymbol\Sigma$ has the compound symmetry structure defined in (1.5). This could be done when the likelihood ratio criterion, also derived in the same paper, for testing $H_0$: $\boldsymbol\Sigma$ has compound symmetry vs. $H_a$: $\boldsymbol\Sigma$ is unstructured, does not have a significantly small value. The compound symmetry covariance matrix is of the form

$$\boldsymbol\Sigma=\begin{pmatrix}\sigma^2&\rho\sigma^2&\cdots&\rho\sigma^2\\ \rho\sigma^2&\sigma^2&\cdots&\rho\sigma^2\\ \vdots&\vdots&\ddots&\vdots\\ \rho\sigma^2&\rho\sigma^2&\cdots&\sigma^2\end{pmatrix}, \quad (1.5)$$

where $\sigma^2>0$ and $-\frac{1}{p-1}<\rho<1$ to ensure positive definiteness of the compound symmetry covariance structure of $\boldsymbol\Sigma$.
This structure assumes that the $p$ unknown variances are all equal, linked through the common intraclass correlation. Geisser (1963) derived the likelihood ratio test for testing $H_0:\boldsymbol\mu=\boldsymbol\mu_0$, where $\boldsymbol\mu_0$ is a known constant, when the underlying covariance matrix has the compound symmetry structure shown in (1.5). In that paper, the likelihood ratio test statistic $L$ for testing $H_0:\boldsymbol\mu=\boldsymbol\mu_0$ under the covariance structure in (1.5) is of the form

$$L=\Big(\frac{1}{1+\frac{1}{n-1}F_{p-1,(n-1)(p-1)}}\Big)^{p-1}\frac{1}{1+\frac{1}{n-1}F_{1,n-1}}, \quad (1.6)$$

$$L=\Big(\frac{\chi^2_{(n-1)(p-1)}}{\chi^2_{(n-1)(p-1)}+\chi^2_{p-1}}\Big)^{p-1}\frac{\chi^2_{n-1}}{\chi^2_{n-1}+\chi^2_{1}}, \quad (1.7)$$

or

$$L=B_1^{p-1}B_2, \quad (1.8)$$

where $F_{p-1,(n-1)(p-1)}$ and $F_{1,n-1}$ are independent $F$ random variables with the degrees of freedom indicated in the subscripts, and $\chi^2_{p-1}$, $\chi^2_{1}$, $\chi^2_{(p-1)(n-1)}$, and $\chi^2_{n-1}$ are independent chi-square random variables with the corresponding degrees of freedom shown in the subscripts. $B_1$ and $B_2$ are independent beta variables, $\mathrm{Beta}\big(\tfrac{1}{2}(n-1)(p-1),\tfrac{1}{2}(p-1)\big)$ and $\mathrm{Beta}\big(\tfrac{1}{2}(n-1),\tfrac{1}{2}\big)$, respectively, based on the following property of beta random variables.

Property of beta random variables (Bailey 1992): Let $U$ and $V$ be independent, $U\sim\chi^2(m)$, $V\sim\chi^2(n)$. Then $\frac{U}{U+V}\sim\mathrm{Beta}\big(\tfrac{m}{2},\tfrac{n}{2}\big)$.

The $r$th raw moment of $L$ can be calculated easily, and approximations to the distribution of the product have been studied by Tukey and Wilks (1946), so that finding approximate critical values for the test is feasible. The hypothesis $H_0:\boldsymbol\mu=\boldsymbol\mu_0$ is rejected when $L$ is sufficiently small.

1.2.3 Inferences Concerning the Mean Vector When Covariance Matrix Is Circulant

A circulant matrix of order $p$, or circulant in short, is a $p\times p$ square matrix of the form

$$\mathbf{A}=(a_{ij})=\begin{pmatrix}a_0&a_1&\cdots&a_{p-1}\\ a_{p-1}&a_0&\cdots&a_{p-2}\\ \vdots&\vdots&\ddots&\vdots\\ a_1&a_2&\cdots&a_0\end{pmatrix}. \quad (1.9)$$

The elements of each row of the matrix $\mathbf{A}$ are identical to those of the previous row, but are moved one position to the right and wrapped around, such that the last element of the previous row becomes the first element of the current row.
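The indexing rule behind (1.9) is easy to realize in code. Below is a small sketch of mine (the helper name is invented) that builds a circulant from its first row and checks the row-shift property:

```python
import numpy as np

def make_circulant(first_row):
    # (1.9): A_ij = a_{(j-i) mod p}; each row repeats the previous row
    # shifted one position to the right, wrapping around.
    a = np.asarray(first_row)
    p = len(a)
    return np.array([[a[(j - i) % p] for j in range(p)] for i in range(p)])

A = make_circulant([0, 1, 2, 3])
# Each row is the previous row rotated right by one position.
for i in range(1, 4):
    assert (A[i] == np.roll(A[i - 1], 1)).all()

# A circulant is symmetric exactly when a_k = a_{p-k}; e.g. [0, 1, 2, 1].
Asym = make_circulant([0, 1, 2, 1])
print((Asym == Asym.T).all())  # True
```

The whole matrix is determined by its first row, which is why the compact notation `circ(a_0, ..., a_{p-1})` used below suffices.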
Note that the whole circulant is evidently determined by its first row. We may also denote the circulant $\mathbf{A}$ in (1.9) by $\mathbf{A}=\mathrm{circ}(a_0,a_1,\dots,a_{p-1})$. So $\mathbf{A}$ is a $p\times p$ circulant if and only if $a_{ij}=a_{(j-i)|p}$, where $(j-i)|p$ is defined as

$$(j-i)|p=\begin{cases}j-i&\text{when } i\le j,\\ p+j-i&\text{when } i>j.\end{cases}$$

For more details about circulant matrices, refer to Davis (1979) and Graybill (1983). If a positive definite covariance matrix is circulant, it must also be symmetric. Examples of circulant covariance matrices with $p=4$ and $p=5$ are, respectively,

$$\sigma^2\begin{pmatrix}1&\rho_1&\rho_2&\rho_1\\ \rho_1&1&\rho_1&\rho_2\\ \rho_2&\rho_1&1&\rho_1\\ \rho_1&\rho_2&\rho_1&1\end{pmatrix}\quad\text{and}\quad \sigma^2\begin{pmatrix}1&\rho_1&\rho_2&\rho_2&\rho_1\\ \rho_1&1&\rho_1&\rho_2&\rho_2\\ \rho_2&\rho_1&1&\rho_1&\rho_2\\ \rho_2&\rho_2&\rho_1&1&\rho_1\\ \rho_1&\rho_2&\rho_2&\rho_1&1\end{pmatrix},$$

satisfying $\rho_j=\rho_{p-j}$ for the symmetric circulant covariance matrix $\boldsymbol\Sigma$ of the form

$$\boldsymbol\Sigma=\sigma^2\,\mathrm{circ}(1,\rho_1,\rho_2,\dots,\rho_2,\rho_1). \quad (1.10)$$

If $\rho_1=\cdots=\rho_{p-1}$ in (1.10), the covariance matrix is the compound symmetric matrix defined in (1.5). Olkin and Press (1969) found the MLEs of the mean $\boldsymbol\mu$ and covariance matrix $\boldsymbol\Sigma$ and derived the exact likelihood ratio criteria for testing equality of the $p$ entries of the mean vector $\boldsymbol\mu$, and for testing that the mean vector $\boldsymbol\mu$ equals zero, when the covariance matrix $\boldsymbol\Sigma$ has a circulant structure. Their derivations for estimation and testing start by making the transformations $\mathbf{Y}=n^{1/2}\boldsymbol\Gamma\bar{\mathbf{X}}$ and $\mathbf{V}=\boldsymbol\Gamma\mathbf{S}\boldsymbol\Gamma'$, where $\bar{\mathbf{X}}$ and $\mathbf{S}$ are the sample mean and sample covariance matrix as defined in (1.1), and $\boldsymbol\Gamma$ is orthogonal such that it transforms the circulant covariance matrix $\boldsymbol\Sigma$ to diagonal form. Note that $\mathbf{Y}$ and $\mathbf{V}$ are independent. They also derived the likelihood ratio tests and asymptotic approximations of the test statistics for means and covariance matrices. They simultaneously tested (i) that the mean vector $\boldsymbol\mu$ is zero and the covariance matrix is circulant, and (ii) that the $p$ entries of the mean vector $\boldsymbol\mu$ are all equal and the covariance matrix is circulant, both against the general alternative that all the entries of $\boldsymbol\mu$ are real numbers and the covariance matrix is positive definite.
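The diagonalization that drives the Olkin and Press transformation rests on a classical fact: all circulants share the same (Fourier) eigenbasis, and the eigenvalues of a circulant are the discrete Fourier transform of its first row. A small check of mine (helper names invented), including the compound symmetry special case whose two distinct eigenvalues, $\sigma^2(1-\rho)$ with multiplicity $p-1$ and $\sigma^2[1+(p-1)\rho]$ with multiplicity 1, reappear throughout Chapters 2 and 3:

```python
import numpy as np

def make_circulant(first_row):
    # A_ij = a_{(j-i) mod p}
    a = np.asarray(first_row, float)
    p = len(a)
    return np.array([[a[(j - i) % p] for j in range(p)] for i in range(p)])

# Symmetric circulant covariance with p = 5 (rho_j = rho_{p-j}).
sigma2, r1, r2 = 2.0, 0.3, 0.1
Sigma = sigma2 * make_circulant([1.0, r1, r2, r2, r1])

# Eigenvalues of a circulant = DFT of its first row (real here, by symmetry).
eig_fft = np.fft.fft(Sigma[0]).real
eig_np = np.linalg.eigvalsh(Sigma)
print(sorted(eig_fft), sorted(eig_np))  # the two lists agree

# Compound symmetry is the special case rho_1 = ... = rho_{p-1}:
rho, p = 0.4, 5
CS = sigma2 * make_circulant([1.0] + [rho] * (p - 1))
eig_cs = np.linalg.eigvalsh(CS)
# eigenvalues: sigma^2*(1-rho) (multiplicity p-1) and sigma^2*(1+(p-1)*rho)
```

The orthogonal matrix $\boldsymbol\Gamma$ in the Olkin and Press construction plays the role of this common eigenbasis (a real version of the Fourier matrix).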
1.2.4 Inferences Concerning the Mean Vector When Covariance Matrix Is Block Compound Symmetry

The estimation and testing problems for block compound symmetry arising from multivariate normal distributions were first studied by Votaw (1948). He proposed twelve hypotheses and tested them using the likelihood ratio method. An introduction to the six hypotheses for one sample is given in Subsection 1.2.5; the other six hypotheses for $k$ samples ($k\ge 2$) are stated in Section 1.3. A more recent paper on estimation and testing concerning means and covariance matrices under a block compound symmetry covariance structure is Szatrowski (1982). In his paper, two types of covariance structures, block compound symmetry of type I (BCSI) and block compound symmetry of type II (BCSII), were considered, along with the problem of testing $H_0:\boldsymbol\mu=\boldsymbol\mu_0$ given that the covariance matrix has a block compound symmetry structure. Estimation and testing were based on the maximum likelihood method. Null distributions of likelihood ratio statistics of the form $\lambda^{2/n}=|\hat{\boldsymbol\Sigma}_\Omega|/|\hat{\boldsymbol\Sigma}_\omega|$ were simplified for some special cases of Votaw's six hypotheses for a single population, where $\Omega$ is the parameter space under the alternative hypothesis, $\omega$ is the parameter space under the null hypothesis, $\hat{\boldsymbol\Sigma}_\Omega$ is the MLE of the covariance matrix under the alternative hypothesis, and $\hat{\boldsymbol\Sigma}_\omega$ is the MLE of the covariance matrix under the null hypothesis. The moments of $\lambda^{2/n}$ were also obtained under the null, and approximate null distributions of $-2\log\lambda$ were found using Box's approximation (1949).

A BCSI assumption can be illustrated by the following example. Suppose that a standard test score in college calculus is a random variable $X_1$ with mean $\mu_1$. There is a set of three other alternative tests, namely $X_2$, $X_3$, and $X_4$, with means $\mu_2$, $\mu_3$, and $\mu_4$, respectively. So the vector $\mathbf{X}=(X_1,X_2,X_3,X_4)'$ forms a $4\times 1$ normal random vector with mean $\boldsymbol\mu=(\mu_1,\mu_2,\mu_3,\mu_4)'$.
Under the block compound symmetry of type I (BCSI) assumption, the covariance structure is of the form

$$\boldsymbol\Sigma=\begin{pmatrix}A&C&C&C\\ C&B&D&D\\ C&D&B&D\\ C&D&D&B\end{pmatrix}. \quad (1.11)$$

The hypothesis of interest is the interchangeability of the variables $X_2$, $X_3$, and $X_4$. It is equivalent to the hypothesis that the vector $\mathbf{X}$ has mean $\boldsymbol\mu=(\mu_1,\mu_2,\mu_2,\mu_2)'$ and the covariance structure is of the form in (1.11). That is, the random vectors $(X_1,X_2,X_3,X_4)'$, $(X_1,X_2,X_4,X_3)'$, $(X_1,X_3,X_2,X_4)'$, $(X_1,X_3,X_4,X_2)'$, $(X_1,X_4,X_2,X_3)'$, and $(X_1,X_4,X_3,X_2)'$ have the same distribution. For a more general case, consider $b$ distinct standard tests and $h$ sets of alternative tests, each of which measures $n_i$ abilities. That is, $\mathbf{X}$ is partitioned into $b+h$ subsets and forms a $p=b+\sum_{i=1}^{h}n_i$ variate random vector. Under the BCSI assumption, within each subset of variates the means are equal, the variances are equal, and the covariances are equal; between any two distinct subsets of variates, the covariances are equal.

In regard to the BCSII assumption, we may consider the following example. Assume that there are two types of tests of cognitive abilities. Each type of cognitive test measures the abilities of verbal (V) and thinking (T). So the two types of test scores are assumed to be a multivariate $4\times 1$ normal random vector $\mathbf{Y}=(Y_1,Y_2,Y_3,Y_4)'$ with mean $\boldsymbol\mu=(\mu_1,\mu_2,\mu_3,\mu_4)'$, where $Y_1$ and $Y_2$ are scores of verbal ability for type I and type II tests, respectively, and $Y_3$ and $Y_4$ are scores of thinking ability for type I and type II tests, respectively. Under the compound symmetry of type II (CSII) assumption, the mean of $\mathbf{Y}$ reduces to $\boldsymbol\mu=(\mu_1,\mu_1,\mu_3,\mu_3)'$, and the covariance matrix is of the form

$$\boldsymbol\Sigma=\begin{pmatrix}A&C&E&F\\ C&A&F&E\\ E&F&B&D\\ F&E&D&B\end{pmatrix}. \quad (1.12)$$

The test of the hypothesis of interest would be $\mu_1=\mu_2$, $\mu_3=\mu_4$, and that the covariance matrix has the BCSII structure shown in (1.12).
Or, equivalently, it is the test of simultaneous interchangeability of the two types of measures for verbal and thinking abilities. For example, the distributions of $(Y_1,Y_2,Y_3,Y_4)'$ and $(Y_2,Y_1,Y_4,Y_3)'$ are the same, but the distributions of $(Y_1,Y_2,Y_3,Y_4)'$ and $(Y_2,Y_1,Y_3,Y_4)'$ are not the same. These kinds of tests can also be applied in medical research, especially to repeated measurements data (Crowder & Hand 1990) when comparing the effects of treatment and control groups (Morrison, 1972). For a more general case, one can consider $n$ types of tests and $h$ types of measures of cognitive abilities, such that $\mathbf{Y}$ is an $nh$-variate random vector.

1.2.5 Inferences Concerning Both Means and Covariance Matrices

Wilks (1946) tested, by the likelihood ratio method, the hypothesis that a normal $p$-variate distribution has the complete symmetry covariance structure shown in (1.5) versus the hypothesis that the covariance matrix is unstructured. In this paper, he also derived the LRT for simultaneously testing that $\boldsymbol\mu=\xi\mathbf{1}_p$ and $\boldsymbol\Sigma$ is compound symmetric, against the general alternative that all the entries of $\boldsymbol\mu$ are real numbers and the covariance matrix is positive definite. Votaw (1948) first studied the problem of estimating and testing for block compound symmetry in data arising from multivariate normal distributions. He extended Wilks' result by considering a normal $p$-variate random vector which can be partitioned into $q$ mutually independent subsets, of which $b$ subsets contain exactly one variate each and the remaining $q-b=h$ subsets ($h\ge 1$) contain $n_1,\dots,n_h$ variates, respectively, where $n_\alpha\ge 2$, $\alpha=1,\dots,h$, and $b+n_1+\cdots+n_h=p$. Let $(1^b,n_1,\dots,n_h)$ denote such a partition of the $p$-variate random vector. A special case is $b=0$. Without loss of
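The interchangeability statements above can be phrased as invariance of the covariance matrix under the relevant permutations, $P\boldsymbol\Sigma P'=\boldsymbol\Sigma$. A sketch of mine, with arbitrary numbers standing in for the letters in (1.11) and (1.12):

```python
import numpy as np

def perm_matrix(perm):
    # Row i of P picks out coordinate perm[i]: (P y)_i = y_{perm[i]}.
    P = np.zeros((len(perm), len(perm)))
    for i, j in enumerate(perm):
        P[i, j] = 1.0
    return P

A_, B_, C_, D_, E_, F_ = 4.0, 3.0, 1.0, 0.5, 0.7, 0.2

# BCSI structure (1.11): X_2, X_3, X_4 are interchangeable.
bcsi = np.array([[A_, C_, C_, C_],
                 [C_, B_, D_, D_],
                 [C_, D_, B_, D_],
                 [C_, D_, D_, B_]])
P_23 = perm_matrix([0, 2, 1, 3])            # swap X_2 and X_3 only
print(np.allclose(P_23 @ bcsi @ P_23.T, bcsi))   # True

# BCSII structure (1.12): only the simultaneous swap (Y1<->Y2, Y3<->Y4)
# preserves the structure; swapping one pair alone does not.
bcsii = np.array([[A_, C_, E_, F_],
                  [C_, A_, F_, E_],
                  [E_, F_, B_, D_],
                  [F_, E_, D_, B_]])
P_sim = perm_matrix([1, 0, 3, 2])           # swap both test types at once
P_one = perm_matrix([1, 0, 2, 3])           # swap verbal scores only
print(np.allclose(P_sim @ bcsii @ P_sim.T, bcsii),   # True
      np.allclose(P_one @ bcsii @ P_one.T, bcsii))   # False
```

Together with the corresponding equalities among the means, these invariances are exactly the distributional interchangeability statements in the text.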
generality, assume $n_1\le\cdots\le n_h$. In his paper, Votaw (1948) proposed six null hypotheses for testing the means or covariances or both, based on a single sample. These hypotheses are: 1) $H_1(mvc)$, 2) $H_1(vc)$, 3) $H_1(m)$, 4) $H_1(mvc')$, 5) $H_1(vc')$, and 6) $H_1(m')$. Hypotheses 1–3 are for BCSI assumptions and the remaining three are for BCSII assumptions. The null hypotheses 1, 2, 4, and 5 are tested against the alternative hypothesis that the means are real numbers and the covariance matrix is positive definite. The statements of the above six hypotheses are as follows:

$H_1(mvc)$ is the hypothesis that within each subset of variates the means are equal, the variances are equal, and the covariances are equal, and that between any two distinct subsets of variates the covariances are equal.

$H_1(vc)$ is the hypothesis that within each subset of variates the variances are equal and the covariances are equal, and that between any two distinct subsets of variates the covariances are equal.

$H_1(m)$ is the hypothesis that within each subset of variates the means are equal, given that the variances are equal and the covariances are equal and that between any two distinct subsets the covariances are equal.

$H_1(mvc')$ is the hypothesis that within each subset of variates the means are equal, the variances are equal, and the covariances are equal, and that between any two distinct subsets of variates the diagonal covariances are equal and the off-diagonal covariances are equal.

$H_1(vc')$ is the hypothesis that within each subset of variates the variances are equal and the covariances are equal, and that between any two distinct subsets of variates the diagonal covariances are equal and the off-diagonal covariances are equal.
$H_1(m')$ is the hypothesis that within each subset of variates the means are equal, given that the variances are equal and the covariances are equal and that between any two distinct subsets of variates the diagonal covariances are equal and the off-diagonal covariances are equal.

Votaw derived the likelihood ratio for each hypothesis. In his paper, he also developed an explicit expression of the likelihood ratio criterion for each hypothesis and found its $r$th moment and approximate distribution when the corresponding hypothesis is true. Olkin and Press (1969) considered the problems of 1) testing the null hypothesis that $\boldsymbol\Sigma$ has complete symmetry versus the alternative hypothesis that $\boldsymbol\Sigma$ is a circulant; 2) testing the null hypothesis that $\boldsymbol\Sigma=\sigma^2\mathbf{I}$ versus the alternative hypothesis that $\boldsymbol\Sigma$ is a circulant; and 3) testing the null hypothesis that $\boldsymbol\Sigma$ is a circulant versus the alternative hypothesis that $\boldsymbol\Sigma$ is positive definite.

1.3 HOMOGENEOUS MEAN MODELS FOR $k$ POPULATIONS WITH $k\ge 2$

Votaw (1948) tested the following hypotheses based on $k$ samples: 1*) $H_k(MVC\,|\,mvc)$, 2*) $H_k(VC\,|\,mvc)$, 3*) $H_k(M\,|\,mVC)$, 4*) $H_k(MVC\,|\,mvc')$, 5*) $H_k(VC\,|\,mvc')$, and 6*) $H_k(M\,|\,mVC')$. Hypotheses 1*–3* are for BCSI assumptions and the remaining three are for BCSII assumptions. The statements of the above six hypotheses are as follows:

$H_k(MVC\,|\,mvc)$ is the hypothesis that $k$ normal $p$-variate distributions are the same, given that they all satisfy $H_1(mvc)$, which is introduced in Section 1.2.5.

$H_k(VC\,|\,mvc)$ is the hypothesis that $k$ normal $p$-variate distributions have the same variance-covariance matrix, given that they all satisfy $H_1(mvc)$.

$H_k(M\,|\,mVC)$ is the hypothesis that $k$ normal $p$-variate distributions are the same, given that they all satisfy $H_1(mvc)$ and that they all have the same variance-covariance matrix.

$H_k(MVC\,|\,mvc')$ is the hypothesis that $k$ normal $p$-variate distributions are the same, given that they all satisfy $H_1(mvc')$, which is introduced in Section 1.2.5.
$H'(VC \mid mvc)_k$ is the hypothesis that the $k$ normal $p$-variate distributions have the same variance-covariance matrix, given that they all satisfy $H_1'(mvc)$.

$H'(M \mid mVC)_k$ is the hypothesis that the $k$ normal $p$-variate distributions are the same, given that they all satisfy $H_1'(mvc)$ and that they all have the same variance-covariance matrix.

For each of the above six hypotheses, Votaw developed the likelihood ratio test by deriving an explicit expression for the likelihood ratio criterion: $L = \lambda^2$ for hypotheses 1*-4* and $L = \lambda^{2/N}$ for the remaining two, where $\lambda$ is the likelihood ratio and $N$ is the total sample size over all $k$ populations. He also found the $r$th moment and approximate distribution of each test statistic.

Geisser (1963) compared the means of $k$ $p$-variate normal populations, under the assumption that the $k$ populations share a common compound (complete) symmetry covariance structure, using a multivariate analysis of variance approach implemented through the information criterion (Chapter 9, Kullback 1959).

1.4 META ANALYSIS

Meta analysis has been widely used to synthesize results from systematic reviews of reliable research in many fields. There has been massive growth in the application of meta analysis to areas such as medical research, health care, education (Glass, 1976), criminal justice, and social policy. See Kulinskaya et al. (2008) and Sutton et al. (2000) for a detailed account of meta analysis; recent developments are summarized by Sutton and Higgins (2008).

One uses a fixed effect model to combine treatment or parameter estimates when no heterogeneity between the study results is assumed. In practice, point estimates of parameters from different studies are almost always different. If the differences among the point estimates are due only to sampling error, that is, if the source of between-study variation is random variation, a fixed effect model is appropriate.
Sometimes researchers prefer to believe that the true unknown parameters vary from one study to the next: the studies represent a random sample of the parameters that could have been observed, drawn from a specific distribution. In this situation a random effects model is used in the analysis.

The standard fixed effect model in meta-analysis assumes $k$ independent studies, each of which reports an estimate $\hat\theta^{(i)}$ of a common parameter $\theta$. Each estimate $\hat\theta^{(i)}$ is assumed independently, normally distributed as
$$\hat\theta^{(i)} \sim N\!\left(\theta, \frac{\sigma_i^2}{n_i}\right), \quad i = 1, \dots, k, \qquad (1.13)$$
where $n_i$ is the sample size of the $i$th study and $\sigma_i^2$ is the underlying variance parameter for the $i$th study. Given $(\hat\theta^{(i)}, \sigma_i^2, n_i)$, the ML estimator for $\theta$ and its variance are, respectively,
$$\tilde\theta = \frac{\sum_{i=1}^k \frac{n_i}{\sigma_i^2}\,\hat\theta^{(i)}}{\sum_{i=1}^k \frac{n_i}{\sigma_i^2}}, \qquad (1.14)$$
and
$$\operatorname{Var}(\tilde\theta) = \left(\sum_{i=1}^k \frac{n_i}{\sigma_i^2}\right)^{-1}. \qquad (1.15)$$

Now consider multivariate models with $k$ independent samples, each of which, $X_{i1}, \dots, X_{in_i}$, is from an $MVN_p(\mu, \Sigma_i)$ population for $i = 1, \dots, k$. Suppose $\mu$ is the parameter vector of interest. The ML estimator for $\mu$ based on the $i$th sample is
$$\hat\mu^{(i)} = \bar X_i, \quad i = 1, \dots, k. \qquad (1.16)$$
Here we assume $\Sigma_i$ is known for all $i = 1, \dots, k$. The $\hat\mu^{(i)}$'s are independent and
$$\hat\mu^{(i)} \sim MVN_p\!\left(\mu, \frac{1}{n_i}\Sigma_i\right), \quad i = 1, \dots, k. \qquad (1.17)$$
Given $(\hat\mu^{(i)}, \Sigma_i, n_i)$ for the $k$ studies, the ML estimator for $\mu$ and its variance-covariance matrix based on the $k$ independent samples are, respectively,
$$\tilde\mu = \left(\sum_{i=1}^k n_i \Sigma_i^{-1}\right)^{-1} \sum_{i=1}^k n_i \Sigma_i^{-1} \hat\mu^{(i)}, \qquad (1.18)$$
$$\operatorname{Cov}(\tilde\mu) = \left(\sum_{i=1}^k n_i \Sigma_i^{-1}\right)^{-1}. \qquad (1.19)$$
Statistical inferences are based on the fact that
$$(\tilde\mu - \mu)' \left(\sum_{i=1}^k n_i \Sigma_i^{-1}\right) (\tilde\mu - \mu) \sim \chi^2_p. \qquad (1.20)$$

Applications of the proposed heterogeneous means normal model to random and fixed effects meta analysis will be developed and presented in Chapter 4. The proposed models are stated in the next section.
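As a numerical illustration, the pooled estimator (1.18), its covariance (1.19), and the quadratic form (1.20) amount to a few lines of linear algebra. The following sketch is not part of the dissertation; the function names are ours:

```python
import numpy as np

def pooled_mu(mu_hats, Sigmas, ns):
    """Fixed effect pooling of study-level estimates, eqs. (1.18)-(1.19)."""
    # Precision weights n_i * Sigma_i^{-1}
    Ws = [n * np.linalg.inv(S) for n, S in zip(ns, Sigmas)]
    total = np.sum(Ws, axis=0)                 # sum_i n_i Sigma_i^{-1}
    cov = np.linalg.inv(total)                 # eq. (1.19)
    mu = cov @ np.sum([W @ m for W, m in zip(Ws, mu_hats)], axis=0)  # eq. (1.18)
    return mu, cov

def chi2_stat(mu_tilde, mu0, Sigmas, ns):
    """Quadratic form of eq. (1.20); chi^2_p under mu = mu0."""
    total = np.sum([n * np.linalg.inv(S) for n, S in zip(ns, Sigmas)], axis=0)
    d = mu_tilde - mu0
    return float(d @ total @ d)
```

With equal study covariances and sample sizes, the pooled estimate reduces to the simple average of the study estimates, as expected.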
1.5 PROPOSED HETEROGENEOUS MEANS MODELS

Consider an independent sample $X_1, \dots, X_M$ such that $X_i \sim MVN_p(\mu_i, \Sigma)$, where $\mu_i = C_i\mu$ for all $i = 1, \dots, M$, and both $\mu$ and $\Sigma$ are unknown. The matrices $C_i$ are $p \times p$ for all $i = 1, \dots, M$ and the covariance matrix $\Sigma$ is positive definite. Further restrictions will be placed on the $C_i$ later when necessary. The likelihood function is
$$L(\mu, \Sigma \mid x_1, \dots, x_M) = (2\pi)^{-Mp/2} |\Sigma|^{-M/2} \exp\left\{-\frac{1}{2} \sum_{i=1}^M (x_i - C_i\mu)^T \Sigma^{-1} (x_i - C_i\mu)\right\}. \qquad (1.21)$$

The covariance matrix $\Sigma$ is patterned so that the maximum likelihood estimator (MLE) of the $\mu$ vector does not involve the ML estimator of $\Sigma$. Based on the likelihood function for a given sample, inferences for one-sample and multi-sample data are presented in Chapters 2 and 3, respectively. The likelihood ratio test of $H_0: \mu = \mu_0$ for the one-sample case is derived explicitly under some constraints on the matrices $C_i$ and the covariance matrix $\Sigma$. In particular, $C_i$ is assumed circulant for all $i$, and $\Sigma$ is assumed to have the compound (complete) symmetry form in (1.5). The distributions of the MLEs of the intraclass correlation $\rho$ and the variance $\sigma^2$, namely $\hat\rho$ and $\hat\sigma^2$, are obtained, and the behavior of $\hat\rho$ is investigated in terms of its mean and standard deviation by a simulation study. For the two-sample and multi-sample cases, the likelihood ratio test of $H_0: \mu_1 = \dots = \mu_k$ is derived exactly, assuming an equal compound symmetry covariance matrix for the $k$ populations. A large-sample $\chi^2$ test is obtained for each of the one-sample and two-sample cases. An application of the proposed model to meta analysis is developed in Chapter 4. In traditional meta analysis the sample from each study is assumed independently, identically distributed, while under the proposed model this is not the case.
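To make the model concrete, the following sketch (ours, not the dissertation's; the helper names are illustrative) builds circulant design matrices $C_i$, a compound symmetry covariance as in (1.5)/(2.8), and one sample from the proposed model:

```python
import numpy as np

def circulant(first_row):
    """Build a circulant matrix from its first row (each row is a cyclic shift)."""
    first_row = np.asarray(first_row, dtype=float)
    return np.array([np.roll(first_row, k) for k in range(len(first_row))])

def cs_cov(p, sigma2, rho):
    """Compound symmetry covariance: sigma^2 [(1-rho) I_p + rho J_p]."""
    return sigma2 * ((1 - rho) * np.eye(p) + rho * np.ones((p, p)))

def sample_model(Cs, mu, Sigma, rng):
    """Draw X_i ~ MVN_p(C_i mu, Sigma) for each design matrix C_i."""
    return np.array([rng.multivariate_normal(C @ mu, Sigma) for C in Cs])
```

A compound symmetry matrix is itself circulant, so any circulant $C_i$ commutes with it, which is the structural fact exploited throughout Chapter 2.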
In Chapter 4, applications of the proposed model to fixed and random effects models for multivariate meta analysis (Jackson et al., 2011; Nam et al., 2003) of continuous outcomes will be developed and presented. Since the outcome measures in the proposed model are non-comparative continuous measures, the one-stage method for individual patient/participant data (IPD) random effects models suggested by Higgins et al. (2001) is used to investigate the heterogeneity of the effects (parameters) among several studies.

CHAPTER II

ONE-SAMPLE INFERENCE

2.1 INTRODUCTION AND PRELIMINARY CASES

Consider an independent sample of size $M$, $X_1, \dots, X_M$, with $X_i \sim MVN_p(\mu_i, \Sigma)$, where $\mu_i = C_i\mu$ for all $i = 1, \dots, M$, and both $\mu$ and $\Sigma$ are unknown. The matrices $C_i$ are $p \times p$ for all $i = 1, \dots, M$ and the covariance matrix $\Sigma$ is positive definite. Further restrictions will be placed on the $C_i$ later when necessary. The likelihood function was shown in (1.21); thus the log likelihood function is
$$\log L(\mu, \Sigma \mid x_1, \dots, x_M) = \text{constant} - \frac{M}{2}\log|\Sigma| - \frac{1}{2}\sum_{i=1}^M (x_i - C_i\mu)^T \Sigma^{-1} (x_i - C_i\mu). \qquad (2.1)$$
For simplicity, $\log L(\mu, \Sigma \mid x_1, \dots, x_M)$ will be written $\log L(\mu, \Sigma \mid x)$ from now on. Let $Q = \sum_{i=1}^M (x_i - C_i\mu)^T \Sigma^{-1} (x_i - C_i\mu)$. Our goal is to find the MLEs of $\mu$ and $\Sigma$. We start by rewriting the log likelihood in (2.1) so that maximizing $\log L(\mu, \Sigma \mid x)$, or equivalently minimizing $Q$ with respect to $\mu$, becomes easier. $Q$ can be expressed as
$$Q = \sum_{i=1}^M \operatorname{tr}\left[\Sigma^{-1} (x_i - C_i\mu)(x_i - C_i\mu)^T\right] = \operatorname{tr}[\Sigma^{-1} V],$$
where $V = \sum_{i=1}^M (x_i - C_i\mu)(x_i - C_i\mu)^T$.
Define
$$\hat\mu = \left(\sum_{i=1}^M C_i^T \Sigma^{-1} C_i\right)^{-1} \sum_{i=1}^M C_i^T \Sigma^{-1} X_i;$$
then $V$ can be expressed as
$$V = \sum_{i=1}^M (x_i - C_i\hat\mu + C_i\hat\mu - C_i\mu)(x_i - C_i\hat\mu + C_i\hat\mu - C_i\mu)^T = A + \sum_{i=1}^M (C_i\hat\mu - C_i\mu)(C_i\hat\mu - C_i\mu)^T + \sum_{i=1}^M (x_i - C_i\hat\mu)(C_i\hat\mu - C_i\mu)^T + \sum_{i=1}^M (C_i\hat\mu - C_i\mu)(x_i - C_i\hat\mu)^T,$$
where $A = \sum_{i=1}^M (x_i - C_i\hat\mu)(x_i - C_i\hat\mu)^T$. Hence
$$Q = \operatorname{tr}[\Sigma^{-1} A] + \operatorname{tr}\left[\Sigma^{-1} \sum_{i=1}^M (C_i\hat\mu - C_i\mu)(C_i\hat\mu - C_i\mu)^T\right],$$
where the second equality is justified by
$$\operatorname{tr}\left[\Sigma^{-1}\sum_{i=1}^M (x_i - C_i\hat\mu)(C_i\hat\mu - C_i\mu)^T\right] = (\hat\mu - \mu)^T \sum_{i=1}^M C_i^T \Sigma^{-1}(x_i - C_i\hat\mu) = 0,$$
and likewise $\operatorname{tr}\left[\Sigma^{-1}\sum_{i=1}^M (C_i\hat\mu - C_i\mu)(x_i - C_i\hat\mu)^T\right] = 0$. Hence $Q$ can be expressed as
$$Q = \operatorname{tr}[\Sigma^{-1} A] + (\hat\mu - \mu)^T \left[\sum_{i=1}^M C_i^T \Sigma^{-1} C_i\right] (\hat\mu - \mu).$$
Therefore the log likelihood becomes
$$\log L(\mu, \Sigma \mid x) = \text{constant} - \frac{M}{2}\log|\Sigma| - \frac{1}{2}\operatorname{tr}[\Sigma^{-1} A] - \frac{1}{2}(\hat\mu - \mu)^T \left[\sum_{i=1}^M C_i^T \Sigma^{-1} C_i\right](\hat\mu - \mu), \qquad (2.2)$$
where $A = \sum_{i=1}^M (x_i - C_i\hat\mu)(x_i - C_i\hat\mu)^T$ and $\hat\mu = \left(\sum_{i=1}^M C_i^T\Sigma^{-1}C_i\right)^{-1}\sum_{i=1}^M C_i^T\Sigma^{-1}x_i$. We can use the log likelihood in (2.2) to find the MLEs of $\mu$ and/or $\Sigma$ under specified conditions.

2.1.1 Inference for $\mu$ When $\Sigma$ Is Known

From the log likelihood in (2.2), the last term on the right-hand side is the only one involving $\mu$. If $\sum_{i=1}^M C_i^T \Sigma^{-1} C_i$ is a positive definite matrix, the minimum of $Q$ with respect to $\mu$ occurs at
$$\hat\mu = \left(\sum_{i=1}^M C_i^T \Sigma^{-1} C_i\right)^{-1} \sum_{i=1}^M C_i^T \Sigma^{-1} X_i, \qquad (2.3)$$
which is the MLE of $\mu$, a linear combination of the $X_i$'s. Note that $\hat\mu$ is normally distributed with mean
$$E(\hat\mu) = \left(\sum_{i=1}^M C_i^T \Sigma^{-1} C_i\right)^{-1} \sum_{i=1}^M C_i^T \Sigma^{-1} E(X_i) = \left(\sum_{i=1}^M C_i^T \Sigma^{-1} C_i\right)^{-1} \left(\sum_{i=1}^M C_i^T \Sigma^{-1} C_i\right)\mu = \mu,$$
and covariance matrix $\operatorname{Cov}(\hat\mu)$ obtained in the following way.
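For a known $\Sigma$, the estimator (2.3) is a generalized least squares computation. A sketch (ours; the function name is illustrative):

```python
import numpy as np

def mle_mu_known_sigma(Xs, Cs, Sigma):
    """MLE of mu from eq. (2.3) and its covariance from eq. (2.4)."""
    Sinv = np.linalg.inv(Sigma)
    # Information matrix: sum_i C_i' Sigma^{-1} C_i
    info = np.sum([C.T @ Sinv @ C for C in Cs], axis=0)
    rhs = np.sum([C.T @ Sinv @ x for C, x in zip(Cs, Xs)], axis=0)
    cov = np.linalg.inv(info)        # Cov(mu_hat), eq. (2.4)
    return cov @ rhs, cov
```

In the iid special case $C_i = I_p$, $\Sigma = I_p$, this reduces to the sample mean with covariance $\Sigma/M$, which gives a quick sanity check.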
Since $\hat\mu$ satisfies the identity $\left(\sum_{i=1}^M C_i^T\Sigma^{-1}C_i\right)\hat\mu = \sum_{i=1}^M C_i^T\Sigma^{-1}X_i$, taking covariances on both sides yields
$$\left(\sum_{i=1}^M C_i^T\Sigma^{-1}C_i\right)\operatorname{Cov}(\hat\mu)\left(\sum_{i=1}^M C_i^T\Sigma^{-1}C_i\right) = \operatorname{Cov}\left(\sum_{i=1}^M C_i^T\Sigma^{-1}X_i\right) = \sum_{i=1}^M C_i^T\Sigma^{-1}\Sigma\,\Sigma^{-1}C_i = \sum_{i=1}^M C_i^T\Sigma^{-1}C_i,$$
so that, since $\sum_{i=1}^M C_i^T\Sigma^{-1}C_i$ is positive definite,
$$\operatorname{Cov}(\hat\mu) = \left(\sum_{i=1}^M C_i^T\Sigma^{-1}C_i\right)^{-1}.$$
Hence $\hat\mu$ is normally distributed as
$$\hat\mu \sim MVN_p\!\left(\mu, \left(\sum_{i=1}^M C_i^T\Sigma^{-1}C_i\right)^{-1}\right), \qquad (2.4)$$
which leads to the result
$$(\hat\mu - \mu)^T \left(\sum_{i=1}^M C_i^T\Sigma^{-1}C_i\right)(\hat\mu - \mu) \sim \chi^2_p.$$
Therefore, for testing $H_0: \mu = \mu_0$ we reject $H_0$ if
$$(\hat\mu - \mu_0)^T \left(\sum_{i=1}^M C_i^T\Sigma^{-1}C_i\right)(\hat\mu - \mu_0) > \chi^2_{p,\alpha}.$$

2.1.2 Inference for $\mu$ When $\Sigma$ Is Unknown without Pattern

When $\Sigma$ is unknown, the MLE of $\mu$ has the same form as (2.3) with $\Sigma$ replaced by $\hat\Sigma$, the MLE of $\Sigma$. Hence the MLE of $\mu$ is
$$\hat\mu = \left(\sum_{i=1}^M C_i^T\hat\Sigma^{-1}C_i\right)^{-1}\sum_{i=1}^M C_i^T\hat\Sigma^{-1}X_i, \qquad (2.5)$$
where $\hat\Sigma$ is the MLE of $\Sigma$. Based on a result of Anderson (2003, Lemma 3.2.2, p. 69) in connection with (2.2), the MLE of $\Sigma$ is
$$\hat\Sigma = \frac{1}{M}\sum_{i=1}^M (X_i - C_i\hat\mu)(X_i - C_i\hat\mu)^T. \qquad (2.6)$$
Note that the expression for $\hat\mu$ in (2.5) involves $\hat\Sigma$. Recall that in the iid case, where $\mu_i = \mu$ for all $i$, the MLE of $\mu$ does not involve $\Sigma$ at all. In general there are no explicit solutions for $\hat\mu$ and $\hat\Sigma$, and equations (2.5) and (2.6) must be solved iteratively. The approximation
$$(\hat\mu - \mu)^T \left(\sum_{i=1}^M C_i^T\hat\Sigma^{-1}C_i\right)(\hat\mu - \mu) \xrightarrow{D} \chi^2_p$$
(Crowder and Hand, 1990) is still attainable, so that $H_0: \mu = \mu_0$ can be tested asymptotically. Nevertheless, to remove $\hat\Sigma$ from (2.5) so that the MLEs $\hat\mu$ and $\hat\Sigma$ can be obtained explicitly, we consider a patterned covariance matrix $\Sigma$, with details about inference for $\mu$ covered in Section 2.2. Before doing so, let us consider another structure for $\Sigma$ in the next subsection.
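The iterative solution of (2.5) and (2.6) can be sketched as a simple fixed-point scheme. This is our illustration, not the dissertation's algorithm; the starting value and tolerance are our choices:

```python
import numpy as np

def mle_unstructured(Xs, Cs, n_iter=100, tol=1e-10):
    """Alternate eq. (2.5) for mu_hat and eq. (2.6) for Sigma_hat until convergence."""
    M, p = Xs.shape
    Sigma = np.eye(p)                      # starting value (our choice)
    mu = np.zeros(p)
    for _ in range(n_iter):
        Sinv = np.linalg.inv(Sigma)
        info = np.sum([C.T @ Sinv @ C for C in Cs], axis=0)
        rhs = np.sum([C.T @ Sinv @ x for C, x in zip(Cs, Xs)], axis=0)
        mu_new = np.linalg.solve(info, rhs)                  # eq. (2.5)
        R = Xs - np.array([C @ mu_new for C in Cs])
        Sigma_new = (R.T @ R) / M                            # eq. (2.6)
        done = np.allclose(mu_new, mu, atol=tol) and np.allclose(Sigma_new, Sigma, atol=tol)
        mu, Sigma = mu_new, Sigma_new
        if done:
            break
    return mu, Sigma
```

In the iid case ($C_i = I_p$) the scheme converges after one step to the sample mean and the usual ML covariance, matching the remark above.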
2.1.3 Inference for $\mu$ When $\Sigma = \sigma^2 V$, $\sigma^2$ Unknown, $V$ Known

Recall that $X_i \sim MVN_p(\mu_i, \Sigma)$. In this subsection we consider the case $\Sigma = \sigma^2 V$, where $\sigma^2 > 0$ is an unknown constant and $V$ is a known positive definite matrix. So $\mu$ and $\sigma^2$ are the only unknown parameters. The maximum likelihood estimator of $\mu$ is
$$\hat\mu = \left(\sum_{i=1}^M C_i^T V^{-1} C_i\right)^{-1}\sum_{i=1}^M C_i^T V^{-1} X_i.$$
To find the MLE of $\sigma^2$, consider the log likelihood function. Writing $\tau = \sigma^2$, the log likelihood is
$$\ln L(\mu, \tau \mid x) = \text{constant} - \frac{Mp}{2}\ln\tau - \frac{1}{2\tau}\sum_{i=1}^M (x_i - C_i\mu)^T V^{-1}(x_i - C_i\mu),$$
which yields
$$\frac{\partial \ln L}{\partial \tau} = -\frac{Mp}{2\tau} + \frac{1}{2\tau^2}\sum_{i=1}^M (x_i - C_i\mu)^T V^{-1}(x_i - C_i\mu).$$
Setting this equal to zero and solving for $\tau$, the MLE of $\sigma^2$ is
$$\hat\sigma^2 = \frac{1}{Mp}\sum_{i=1}^M (X_i - C_i\hat\mu)^T V^{-1}(X_i - C_i\hat\mu).$$
Since $\hat\mu$ is a linear combination of the $X_i$'s, its distribution is
$$\hat\mu \sim MVN_p\!\left(\mu, \sigma^2\left(\sum_{i=1}^M C_i^T V^{-1}C_i\right)^{-1}\right).$$
Next, $Mp\hat\sigma^2/\sigma^2$ can be shown to follow a $\chi^2$ distribution with $p(M-1)$ degrees of freedom; we may also show that $\hat\mu$ and $\hat\sigma^2$ are independent. To proceed, partition the quantity $\sum_{i=1}^M (X_i - C_i\mu)^T V^{-1}(X_i - C_i\mu)$:
$$\sum_{i=1}^M (X_i - C_i\mu)^T V^{-1}(X_i - C_i\mu) = \sum_{i=1}^M (X_i - C_i\hat\mu)^T V^{-1}(X_i - C_i\hat\mu) + (\hat\mu - \mu)^T \sum_{i=1}^M C_i^T V^{-1}C_i\, (\hat\mu - \mu) + 2(\hat\mu - \mu)^T \sum_{i=1}^M C_i^T V^{-1}(X_i - C_i\hat\mu).$$
We need to note that the cross term $(\hat\mu - \mu)^T\sum_{i=1}^M C_i^T V^{-1}(X_i - C_i\hat\mu)$ equals zero, due to the fact that $\sum_{i=1}^M C_i^T V^{-1}(X_i - C_i\hat\mu) = 0$. Therefore $\frac{1}{\sigma^2}\sum_{i=1}^M (X_i - C_i\mu)^T V^{-1}(X_i - C_i\mu)$ equals
$$\frac{1}{\sigma^2}\sum_{i=1}^M (X_i - C_i\hat\mu)^T V^{-1}(X_i - C_i\hat\mu) + \frac{1}{\sigma^2}(\hat\mu - \mu)^T\sum_{i=1}^M C_i^T V^{-1}C_i\,(\hat\mu - \mu).$$
We can show that the two terms are independent by showing that each pair $\hat\mu$ and $X_i - C_i\hat\mu$, for $i = 1, \dots, M$, is independent.
Since both $\hat\mu$ and $X_i - C_i\hat\mu$ are normally distributed, it suffices to show that their covariance matrix is zero. Writing $G = \sum_{i=1}^M C_i^T V^{-1} C_i$, so that $\operatorname{Cov}(\hat\mu) = \sigma^2 G^{-1}$,
$$\operatorname{Cov}(\hat\mu, X_k - C_k\hat\mu) = \operatorname{Cov}(\hat\mu, X_k) - \operatorname{Cov}(\hat\mu, \hat\mu)C_k^T = G^{-1}C_k^T V^{-1}\operatorname{Cov}(X_k, X_k) - \sigma^2 G^{-1}C_k^T = \sigma^2 G^{-1}C_k^T V^{-1}V - \sigma^2 G^{-1}C_k^T = 0.$$
This implies that $\frac{1}{\sigma^2}\sum_{i=1}^M (X_i - C_i\hat\mu)^T V^{-1}(X_i - C_i\hat\mu)$ and $\frac{1}{\sigma^2}(\hat\mu - \mu)^T\sum_{i=1}^M C_i^T V^{-1}C_i\,(\hat\mu - \mu)$ are statistically independent. In addition, using the result on the sum of two independent chi-square random variables (Bain and Engelhardt 1992, page 284), we have
$$\frac{1}{\sigma^2}(\hat\mu - \mu)^T \sum_{i=1}^M C_i^T V^{-1}C_i\, (\hat\mu - \mu) \sim \chi^2_p,$$
implying that
$$\frac{Mp\,\hat\sigma^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^M (X_i - C_i\hat\mu)^T V^{-1}(X_i - C_i\hat\mu) \sim \chi^2_{(M-1)p}.$$
Therefore, under $H_0: \mu = \mu_0$, we have
$$\frac{M-1}{Mp\,\hat\sigma^2}\,(\hat\mu - \mu_0)^T \left(\sum_{i=1}^M C_i^T V^{-1}C_i\right)(\hat\mu - \mu_0) \sim F_{p,(M-1)p},$$
which can be used for testing $H_0: \mu = \mu_0$.

2.2 MAINSTREAM: INFERENCE FOR $\mu$ WHEN $\Sigma$ HAS COMPOUND SYMMETRY STRUCTURE AND THE $C_i$ ARE CIRCULANT

2.2.1 Maximum Likelihood Estimators

Three conditions are considered before deriving the MLEs of the unknown parameters. The theory developed later in Section 2.2 is based on the assumptions stated below.

Condition (1). If $C_i^T\Sigma^{-1} = \Sigma^{-1}C_i^T$ for all $i$, the MLE of $\mu$ in (2.3) reduces to
$$\hat\mu = \left(\sum_{i=1}^M C_i^T C_i\right)^{-1}\sum_{i=1}^M C_i^T X_i. \qquad (2.7)$$

Condition (2). To guarantee $C_i^T\Sigma^{-1} = \Sigma^{-1}C_i^T$ in Condition (1), we assume that $C_i$ is a circulant matrix for every $i$ and that $\Sigma$ has a compound symmetry structure. The following theorem applies to this condition.

Theorem 2.0 (Schott (1997), Theorem 7.58, page 303): Suppose that $A$ and $B$ are $m \times m$ circulant matrices. Then their product commutes; that is, $AB = BA$.
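The $F$ test just derived depends on the data only through $\hat\mu$ and $\hat\sigma^2$. A hedged sketch (function name ours) that returns the statistic and its reference degrees of freedom:

```python
import numpy as np

def f_test_sigma2V(Xs, Cs, V, mu0):
    """F statistic of Subsection 2.1.3 for H0: mu = mu0 with Sigma = sigma^2 V.

    Returns the statistic (M-1)/(M p sigma2_hat) * quadratic form, which is
    F_{p, (M-1)p} distributed under H0, together with those degrees of freedom.
    """
    M, p = Xs.shape
    Vinv = np.linalg.inv(V)
    info = np.sum([C.T @ Vinv @ C for C in Cs], axis=0)   # sum_i C_i' V^-1 C_i
    rhs = np.sum([C.T @ Vinv @ x for C, x in zip(Cs, Xs)], axis=0)
    mu_hat = np.linalg.solve(info, rhs)
    resid = Xs - np.array([C @ mu_hat for C in Cs])
    sigma2_hat = sum(r @ Vinv @ r for r in resid) / (M * p)   # MLE of sigma^2
    d = mu_hat - mu0
    F = (M - 1) * float(d @ info @ d) / (M * p * sigma2_hat)
    return F, (p, (M - 1) * p)
```

By construction the statistic is zero when $\mu_0 = \hat\mu$ and grows as $\mu_0$ moves away from $\hat\mu$.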
Let $\Sigma$ have the structure
$$\Sigma = \sigma^2[(1-\rho)I_p + \rho J_p], \qquad (2.8)$$
rewriting the covariance matrix defined in (1.5), where $-\frac{1}{p-1} < \rho < 1$ to ensure positive definiteness of $\Sigma$. Note that the eigenvalues of $\Sigma$ in (2.8) are $\sigma^2[1 + (p-1)\rho]$ with multiplicity 1 and $\sigma^2(1-\rho)$ with multiplicity $p-1$. Thus $\Sigma$ is a symmetric circulant matrix, and we say $\Sigma$ has compound symmetry, as introduced in Subsection 1.2.2. For each $i = 1, \dots, M$, if $C_i$ is also a circulant matrix, then $C_i^T\Sigma^{-1} = \Sigma^{-1}C_i^T$, which yields the reduced form of $\hat\mu$ shown in (2.7).

Working with the log likelihood function in (2.1) and $\Sigma$ of the form (2.8), we may obtain the MLEs of $\rho$ and $\sigma^2$. First note that the determinant and inverse of $\Sigma$ are, respectively,
$$|\Sigma| = (\sigma^2)^p (1-\rho)^{p-1}[1 + (p-1)\rho], \qquad \Sigma^{-1} = \frac{1}{\sigma^2(1-\rho)}\left[I_p - \frac{\rho}{1 + (p-1)\rho}J_p\right]$$
(cf. Graybill, 1983, Theorem 8.34, page 190). Writing $\tau = \sigma^2$, the log likelihood function in (2.1) becomes
$$\log L(\mu, \tau, \rho \mid x) = \text{constant} - \frac{Mp}{2}\log\tau - \frac{M(p-1)}{2}\log(1-\rho) - \frac{M}{2}\log[1 + (p-1)\rho] - \frac{1}{2\tau(1-\rho)}\sum_{i=1}^M (x_i - C_i\mu)^T(x_i - C_i\mu) + \frac{\rho}{2\tau(1-\rho)[1+(p-1)\rho]}\sum_{i=1}^M (x_i - C_i\mu)^T J_p (x_i - C_i\mu). \qquad (2.9)$$

Let $B_1 = \sum_{i=1}^M (x_i - C_i\hat\mu)^T(x_i - C_i\hat\mu)$ and $B_2 = \sum_{i=1}^M (x_i - C_i\hat\mu)^T J_p (x_i - C_i\hat\mu)$, where $J_p$ is the $p \times p$ square matrix with all elements equal to 1. To find the maximum likelihood estimators of $(\tau, \rho) = (\sigma^2, \rho)$, we take the first partial derivative of the log likelihood in (2.9) with respect to $\tau$ and $\rho$ separately:
$$\frac{\partial \log L(\hat\mu, \tau, \rho \mid x)}{\partial\tau} = -\frac{Mp}{2\tau} + \frac{B_1}{2\tau^2(1-\rho)} - \frac{\rho B_2}{2\tau^2(1-\rho)[1+(p-1)\rho]}$$
and
$$\frac{\partial \log L(\hat\mu, \tau, \rho \mid x)}{\partial\rho} = \frac{M(p-1)}{2(1-\rho)} - \frac{M(p-1)}{2[1+(p-1)\rho]} - \frac{B_1}{2\tau(1-\rho)^2} + \frac{[1+(p-1)\rho^2]B_2}{2\tau(1-\rho)^2[1+(p-1)\rho]^2}.$$
Setting $\partial\log L(\hat\mu, \tau, \rho \mid x)/\partial\tau = 0$ and solving for $\tau$, we have
$$\hat\tau = \frac{1}{Mp}\left[\frac{B_1}{1-\hat\rho} - \frac{\hat\rho B_2}{(1-\hat\rho)[1+(p-1)\hat\rho]}\right]. \qquad (2.10)$$
The second likelihood equation, $\partial\log L(\hat\mu, \tau, \rho \mid x)/\partial\rho = 0$, gives
$$\frac{M(p-1)}{1-\hat\rho} - \frac{M(p-1)}{1+(p-1)\hat\rho} - \frac{B_1}{\hat\tau(1-\hat\rho)^2} + \frac{[1+(p-1)\hat\rho^2]B_2}{\hat\tau(1-\hat\rho)^2[1+(p-1)\hat\rho]^2} = 0. \qquad (2.11)$$
Note that $\hat\tau$ in (2.10) can also be expressed as
$$\hat\tau = \frac{1}{Mp}\cdot\frac{[1+(p-1)\hat\rho]B_1 - \hat\rho B_2}{(1-\hat\rho)[1+(p-1)\hat\rho]}.$$
Inserting $\hat\tau$ from (2.10) into (2.11) and solving for $\hat\rho$ yields, after simplification,
$$\hat\rho = \frac{1}{p-1}\left(\frac{B_2}{B_1} - 1\right). \qquad (2.12)$$
Substituting $\hat\rho$ from (2.12) into (2.10), and noting that $1+(p-1)\hat\rho = B_2/B_1$, the MLE of $\sigma^2$ is
$$\hat\sigma^2 = \hat\tau = \frac{B_1}{Mp}.$$
Hence we arrive at the following lemma.

Lemma 2.1: Let $X_1, \dots, X_M \sim N_p(\mu_i, \Sigma)$, where $\mu_i = C_i\mu$ for all $i = 1, \dots, M$, $C_i$ is circulant, and $\Sigma = \sigma^2[(1-\rho)I_p + \rho J_p]$ as defined in (2.8), so that $C_i^T\Sigma^{-1} = \Sigma^{-1}C_i^T$. Then the MLEs of $\mu$, $\sigma^2$, and $\rho$ are, respectively,
$$\hat\mu = \left(\sum_{i=1}^M C_i^T C_i\right)^{-1}\sum_{i=1}^M C_i^T X_i, \qquad \hat\sigma^2 = \frac{B_1}{Mp}, \qquad \hat\rho = \frac{1}{p-1}\left(\frac{B_2}{B_1} - 1\right),$$
where $B_1 = \sum_{i=1}^M (X_i - C_i\hat\mu)^T(X_i - C_i\hat\mu)$ and $B_2 = \sum_{i=1}^M (X_i - C_i\hat\mu)^T J_p (X_i - C_i\hat\mu)$.

2.2.2 Hypothesis Testing for $H_0: \mu = \mu_0$ Using the LR Test

In this subsection the likelihood ratio test is derived for $H_0: \mu = \mu_0$. The restriction $C_i^T\Sigma^{-1} = \Sigma^{-1}C_i^T$ for all $i = 1, \dots, M$ remains valid here, and we also assume $\Sigma = \sigma^2 R$, where $R = (1-\rho)I_p + \rho J_p$, with both $\sigma^2$ and $\rho$ unknown. The following theorem states the likelihood ratio test for $H_0: \mu = \mu_0$ under these assumptions.

Theorem 2.1: Let $X_1, \dots, X_M \sim N_p(\mu_i, \Sigma)$, where $\mu_i = C_i\mu$ for all $i = 1, \dots, M$, $C_i$ is circulant, and $\Sigma = \sigma^2[(1-\rho)I_p + \rho J_p]$ as defined in (2.8).
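Lemma 2.1 translates directly into code. A sketch (ours; names illustrative), exploiting the identities $r^T r = \sum_j r_j^2$ and $r^T J_p r = (\sum_j r_j)^2$:

```python
import numpy as np

def mle_compound_symmetry(Xs, Cs):
    """MLEs of (mu, sigma^2, rho) from Lemma 2.1, with C_i circulant and
    Sigma compound symmetric."""
    M, p = Xs.shape
    info = np.sum([C.T @ C for C in Cs], axis=0)           # sum_i C_i' C_i
    rhs = np.sum([C.T @ x for C, x in zip(Cs, Xs)], axis=0)
    mu_hat = np.linalg.solve(info, rhs)                    # eq. (2.7)
    resid = Xs - np.array([C @ mu_hat for C in Cs])
    B1 = float(np.sum(resid * resid))                      # sum_i r_i' r_i
    B2 = float(np.sum(resid.sum(axis=1) ** 2))             # sum_i r_i' J_p r_i
    sigma2_hat = B1 / (M * p)
    rho_hat = (B2 / B1 - 1.0) / (p - 1.0)
    return mu_hat, sigma2_hat, rho_hat
```

Since $B_2 \le p B_1$ by the Cauchy-Schwarz inequality, the returned $\hat\rho$ never exceeds 1, consistent with the parameter space of (2.8).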
The likelihood ratio test of $H_0: \mu = \mu_0$ rejects $H_0$ if $W \le C_\alpha$, where $C_\alpha$ is such that $P(W \le C_\alpha \mid H_0) = \alpha$, and $W$ is defined as
$$W = \frac{\left[\sum_{i=1}^M (X_i - C_i\hat\mu)^T(X_i - C_i\hat\mu) - \frac{1}{p}\sum_{i=1}^M (X_i - C_i\hat\mu)^T J_p(X_i - C_i\hat\mu)\right]^{p-1}\,\sum_{i=1}^M (X_i - C_i\hat\mu)^T J_p(X_i - C_i\hat\mu)}{\left[\sum_{i=1}^M (X_i - C_i\mu_0)^T(X_i - C_i\mu_0) - \frac{1}{p}\sum_{i=1}^M (X_i - C_i\mu_0)^T J_p(X_i - C_i\mu_0)\right]^{p-1}\,\sum_{i=1}^M (X_i - C_i\mu_0)^T J_p(X_i - C_i\mu_0)},$$
where $\hat\mu$ is defined in (2.7).

Proof: The likelihood ratio for testing $H_0: \mu = \mu_0$ is
$$\lambda = \frac{\max_{\theta \in \omega_0} L(\mu, \Sigma)}{\max_{\theta \in \Omega} L(\mu, \Sigma)},$$
where $\omega_0 = \{(\mu, \Sigma) \mid \mu = \mu_0,\ \Sigma \text{ is p.d.}\}$ and $\Omega = \{(\mu, \Sigma) \mid \Sigma \text{ is p.d.}\}$. This can be further derived as
$$\lambda = \frac{(2\pi)^{-Mp/2}|\hat\Sigma_0|^{-M/2}\exp\left\{-\frac{1}{2}\sum_{i=1}^M (X_i - C_i\mu_0)^T\hat\Sigma_0^{-1}(X_i - C_i\mu_0)\right\}}{(2\pi)^{-Mp/2}|\hat\Sigma|^{-M/2}\exp\left\{-\frac{1}{2}\sum_{i=1}^M (X_i - C_i\hat\mu)^T\hat\Sigma^{-1}(X_i - C_i\hat\mu)\right\}} = \left(\frac{|\hat\Sigma|}{|\hat\Sigma_0|}\right)^{M/2}\exp\left\{-\frac{1}{2}\operatorname{tr}\Big[\hat\Sigma_0^{-1}\sum_i(X_i - C_i\mu_0)(X_i - C_i\mu_0)^T\Big] + \frac{1}{2}\operatorname{tr}\Big[\hat\Sigma^{-1}\sum_i(X_i - C_i\hat\mu)(X_i - C_i\hat\mu)^T\Big]\right\}.$$
Showing (Appendix A.1) that
$$\operatorname{tr}\Big[\hat\Sigma_0^{-1}\sum_{i=1}^M (X_i - C_i\mu_0)(X_i - C_i\mu_0)^T\Big] = Mp \quad \text{and} \quad \operatorname{tr}\Big[\hat\Sigma^{-1}\sum_{i=1}^M (X_i - C_i\hat\mu)(X_i - C_i\hat\mu)^T\Big] = Mp,$$
we obtain
$$\lambda = \left(\frac{|\hat\Sigma|}{|\hat\Sigma_0|}\right)^{M/2} = \left\{\frac{(\hat\sigma^2)^p(1-\hat\rho)^{p-1}[1+(p-1)\hat\rho]}{(\hat\sigma_0^2)^p(1-\hat\rho_0)^{p-1}[1+(p-1)\hat\rho_0]}\right\}^{M/2}, \qquad (2.13)$$
where
$$\hat\sigma_0^2 = \frac{1}{Mp}\sum_{i=1}^M (X_i - C_i\mu_0)^T(X_i - C_i\mu_0), \qquad \hat\rho_0 = \frac{1}{p-1}\left[\frac{\sum_{i=1}^M (X_i - C_i\mu_0)^T J_p(X_i - C_i\mu_0)}{\sum_{i=1}^M (X_i - C_i\mu_0)^T(X_i - C_i\mu_0)} - 1\right],$$
$$\hat\sigma^2 = \frac{1}{Mp}\sum_{i=1}^M (X_i - C_i\hat\mu)^T(X_i - C_i\hat\mu), \qquad \hat\rho = \frac{1}{p-1}\left[\frac{\sum_{i=1}^M (X_i - C_i\hat\mu)^T J_p(X_i - C_i\hat\mu)}{\sum_{i=1}^M (X_i - C_i\hat\mu)^T(X_i - C_i\hat\mu)} - 1\right].$$
Using these expressions for $\hat\sigma_0^2$, $\hat\rho_0$, $\hat\sigma^2$, and $\hat\rho$ in (2.13), we obtain the likelihood ratio test as stated in this theorem (details shown in Appendix A.2). The proof is complete.

Although the likelihood ratio has been derived in Theorem 2.1, the null distribution of $W$ in Theorem 2.1 is still not derived yet.
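The statistic $W$ of Theorem 2.1 depends on the data only through the four quadratic forms appearing in its numerator and denominator. A sketch (ours; the function name is illustrative):

```python
import numpy as np

def lr_statistic_W(Xs, Cs, mu0):
    """Likelihood ratio statistic W of Theorem 2.1; small W is evidence
    against H0: mu = mu0."""
    M, p = Xs.shape
    info = np.sum([C.T @ C for C in Cs], axis=0)
    rhs = np.sum([C.T @ x for C, x in zip(Cs, Xs)], axis=0)
    mu_hat = np.linalg.solve(info, rhs)          # eq. (2.7)

    def b1_b2(center):
        # B1 = sum r_i' r_i, B2 = sum r_i' J_p r_i for residuals about `center`
        R = Xs - np.array([C @ center for C in Cs])
        return float(np.sum(R * R)), float(np.sum(R.sum(axis=1) ** 2))

    B1, B2 = b1_b2(mu_hat)       # unrestricted fit
    B10, B20 = b1_b2(mu0)        # fit restricted to mu = mu0
    return ((B1 - B2 / p) ** (p - 1) * B2) / ((B10 - B20 / p) ** (p - 1) * B20)
```

Since $W = \lambda^{2/M}$ and $\lambda \le 1$, the statistic lies in $(0, 1]$ and equals 1 when $\mu_0 = \hat\mu$.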
Define
$$B_1^0 = \sum_{i=1}^M (X_i - C_i\mu_0)^T(X_i - C_i\mu_0), \qquad B_2^0 = \sum_{i=1}^M (X_i - C_i\mu_0)^T J_p(X_i - C_i\mu_0);$$
$B_1$ and $B_2$ have been defined in Lemma 2.1. Hence $W$ can be expressed as
$$W = \frac{(pB_1 - B_2)^{p-1}B_2}{(pB_1^0 - B_2^0)^{p-1}B_2^0} = \frac{\left(B_1 - \frac{1}{p}B_2\right)^{p-1}B_2}{\left(B_1^0 - \frac{1}{p}B_2^0\right)^{p-1}B_2^0}.$$
Under the null hypothesis $H_0: \mu = \mu_0$, the exact, asymptotic, or approximate distribution of $W$ is of great interest. To find the exact null distribution of $W$, the following propositions are needed.

Proposition 2.1: Under $H_0: \mu = \mu_0$, $B_1^0 - \frac{1}{p}B_2^0$ is distributed as a chi-square random variable with $M(p-1)$ degrees of freedom times the constant $\sigma^2(1-\rho)$; that is,
$$B_1^0 - \frac{1}{p}B_2^0 \stackrel{d}{=} \sigma^2(1-\rho)\,\chi^2_{M(p-1)}.$$
In addition, $\frac{1}{M}\left(B_1^0 - \frac{1}{p}B_2^0\right)$ is strongly convergent to the constant $\sigma^2(p-1)(1-\rho)$; that is,
$$\frac{1}{M}\left(B_1^0 - \frac{1}{p}B_2^0\right) \xrightarrow{wp1} \sigma^2(p-1)(1-\rho).$$

Proof: Under $H_0: \mu = \mu_0$, the distribution of $X_i - C_i\mu_0$ is $N_p(0, \Sigma)$, where $\Sigma = \sigma^2[(1-\rho)I_p + \rho J_p]$. It follows from Box (1954) that the quantities
$$Q_i = (X_i - C_i\mu_0)^T\left(I_p - \frac{1}{p}J_p\right)(X_i - C_i\mu_0), \quad i = 1, \dots, M,$$
are independently, identically distributed as $\sum_{j=1}^p \lambda_j\chi^2_{1j}$, where the $\lambda_j$'s are the latent roots of
$$P_1 = \Sigma\left(I_p - \frac{1}{p}J_p\right) = \sigma^2[(1-\rho)I_p + \rho J_p]\left(I_p - \frac{1}{p}J_p\right) = \sigma^2(1-\rho)\left(I_p - \frac{1}{p}J_p\right), \qquad (2.14)$$
and the $\chi^2_{1j}$'s are independent chi-square random variables with 1 degree of freedom. Hence
$$B_1^0 - \frac{1}{p}B_2^0 = \sum_{i=1}^M (X_i - C_i\mu_0)^T\left(I_p - \frac{1}{p}J_p\right)(X_i - C_i\mu_0) = \sum_{i=1}^M Q_i$$
is distributed as a sum of $M$ independent $\sum_{j=1}^p \lambda_j\chi^2_{1j}$ random variables. Because $I_p - \frac{1}{p}J_p$ is symmetric idempotent, its latent roots are 0's and 1's; in fact, they are 1 with multiplicity $p-1$ and 0 with multiplicity 1. Therefore the latent roots of $P_1$ are $\sigma^2(1-\rho)$ with multiplicity $p-1$ and 0 with multiplicity 1. So we have
$$Q_i \stackrel{iid}{\sim} \sigma^2(1-\rho)\,\chi^2_{p-1} \qquad (2.15)$$
for all $i = 1, \dots, M$, implying that
$$B_1^0 - \frac{1}{p}B_2^0 \stackrel{d}{=} \sigma^2(1-\rho)\sum_{i=1}^M \chi^2_{p-1,i} \stackrel{d}{=} \sigma^2(1-\rho)\,\chi^2_{M(p-1)}.$$
Moreover, based on the SLLN in connection with (2.15), we have
$$\frac{1}{M}\left(B_1^0 - \frac{1}{p}B_2^0\right) = \frac{1}{M}\sum_{i=1}^M Q_i \xrightarrow{wp1} E(Q_1) = \sigma^2(1-\rho)(p-1).$$
The proof is complete.

Proposition 2.2: Under $H_0: \mu = \mu_0$, $B_2^0$ is distributed as a chi-square random variable with $M$ degrees of freedom times the constant $\sigma^2 p[1 + (p-1)\rho]$; that is,
$$B_2^0 \stackrel{d}{=} \sigma^2 p[1 + (p-1)\rho]\,\chi^2_M.$$
In addition, $\frac{1}{M}B_2^0$ is strongly convergent to the constant $\sigma^2 p[1 + (p-1)\rho]$; that is,
$$\frac{1}{M}B_2^0 \xrightarrow{wp1} \sigma^2 p[1 + (p-1)\rho].$$

Proof: Recall that $B_2^0 = \sum_{i=1}^M (X_i - C_i\mu_0)^T J_p (X_i - C_i\mu_0)$, where the $(X_i - C_i\mu_0)^T J_p (X_i - C_i\mu_0)$'s are iid random variables. First,
$$(X_i - C_i\mu_0)^T J_p (X_i - C_i\mu_0) \stackrel{d}{=} \sum_{j=1}^p \lambda_j\chi^2_{1j}, \qquad (2.16)$$
where the $\lambda_j$'s are the latent roots of
$$P_2 = \Sigma J_p = \sigma^2[(1-\rho)I_p + \rho J_p]J_p = \sigma^2[1 + (p-1)\rho]J_p.$$
Note that the latent roots of $J_p$ are $p$ with multiplicity 1 and 0 with multiplicity $p-1$. So the $\lambda_j$'s are $\sigma^2 p[1 + (p-1)\rho]$ with multiplicity 1 and 0 with multiplicity $p-1$. Hence (2.16) becomes
$$(X_i - C_i\mu_0)^T J_p (X_i - C_i\mu_0) \stackrel{d}{=} \sigma^2 p[1 + (p-1)\rho]\,\chi^2_1,$$
implying that
$$B_2^0 \stackrel{d}{=} \sigma^2 p[1 + (p-1)\rho]\,\chi^2_M, \qquad (2.17)$$
and
$$\frac{1}{M}B_2^0 = \frac{1}{M}\sum_{i=1}^M (X_i - C_i\mu_0)^T J_p(X_i - C_i\mu_0) \xrightarrow{wp1} \sigma^2 p[1 + (p-1)\rho]\,E(\chi^2_1) = \sigma^2 p[1 + (p-1)\rho].$$
The proof is complete.

Proposition 2.3: $B_1 - \frac{1}{p}B_2$ is distributed as a chi-square random variable with $(M-1)(p-1)$ degrees of freedom times the constant $\sigma^2(1-\rho)$; that is,
$$B_1 - \frac{1}{p}B_2 \stackrel{d}{=} \sigma^2(1-\rho)\,\chi^2_{(M-1)(p-1)}.$$

Proof: Assume that $E(\hat\mu) = \mu$. Then $\sum_{i=1}^M (X_i - C_i\mu)^T(I_p - \frac{1}{p}J_p)(X_i - C_i\mu)$ can be written as
$$\sum_{i=1}^M (X_i - C_i\mu)^T\left(I_p - \frac{1}{p}J_p\right)(X_i - C_i\mu) = B_1 - \frac{1}{p}B_2 + (\hat\mu - \mu)^T\sum_{i=1}^M C_i^T\left(I_p - \frac{1}{p}J_p\right)C_i\,(\hat\mu - \mu) + 2(\hat\mu - \mu)^T\sum_{i=1}^M C_i^T\left(I_p - \frac{1}{p}J_p\right)(X_i - C_i\hat\mu).$$
Because both $C_i$ and $I_p - \frac{1}{p}J_p$ are circulant matrices, they commute, so $C_i^T(I_p - \frac{1}{p}J_p) = (I_p - \frac{1}{p}J_p)C_i^T$.
In connection with the fact that $\sum_{i=1}^M C_i^T(X_i - C_i\hat\mu) = 0$, we have
$$(\hat\mu - \mu)^T\sum_{i=1}^M C_i^T\left(I_p - \frac{1}{p}J_p\right)(X_i - C_i\hat\mu) = 0,$$
implying that
$$\sum_{i=1}^M (X_i - C_i\mu)^T\left(I_p - \frac{1}{p}J_p\right)(X_i - C_i\mu) = B_1 - \frac{1}{p}B_2 + (\hat\mu - \mu)^T\left(\sum_{i=1}^M C_i^T C_i\right)\left(I_p - \frac{1}{p}J_p\right)(\hat\mu - \mu). \qquad (2.18)$$
From Proposition 2.1 we have
$$\sum_{i=1}^M (X_i - C_i\mu)^T\left(I_p - \frac{1}{p}J_p\right)(X_i - C_i\mu) \stackrel{d}{=} \sigma^2(1-\rho)\,\chi^2_{M(p-1)}.$$
We also need the distribution of the second term of the last expression in (2.18). Because $\hat\mu \sim MVN\!\left(\mu, \left(\sum_{i=1}^M C_i^T C_i\right)^{-1}\Sigma\right)$, the quadratic form
$$(\hat\mu - \mu)^T\left(\sum_{i=1}^M C_i^T C_i\right)\left(I_p - \frac{1}{p}J_p\right)(\hat\mu - \mu)$$
is distributed as the quantity $\sum_{j=1}^p \lambda_j\chi^2_{1j}$, where the $\lambda_j$'s are the latent roots of the matrix
$$P_3 = \left(\sum_{i=1}^M C_i^T C_i\right)^{-1}\Sigma\left(\sum_{i=1}^M C_i^T C_i\right)\left(I_p - \frac{1}{p}J_p\right).$$
Note that $\Sigma$ and $\sum_{i=1}^M C_i^T C_i$ commute, so $P_3 = \Sigma\left(I_p - \frac{1}{p}J_p\right) = P_1$, where $P_1$ is defined in (2.14). Hence
$$(\hat\mu - \mu)^T\left(\sum_{i=1}^M C_i^T C_i\right)\left(I_p - \frac{1}{p}J_p\right)(\hat\mu - \mu) \stackrel{d}{=} \sigma^2(1-\rho)\,\chi^2_{p-1}.$$
Note that $B_1 - \frac{1}{p}B_2$ and $(\hat\mu - \mu)^T\left(\sum C_i^T C_i\right)\left(I_p - \frac{1}{p}J_p\right)(\hat\mu - \mu)$ are independent chi-square random variables, since $B_1 - \frac{1}{p}B_2$ is a function of the $X_i - C_i\hat\mu$'s, and $X_i - C_i\hat\mu$ and $\hat\mu$ are independent due to the fact that $\operatorname{Cov}(X_i - C_i\hat\mu, \hat\mu) = 0$. In addition, using the result on the sum of two independent chi-square random variables, we have
$$B_1 - \frac{1}{p}B_2 \stackrel{d}{=} \sigma^2(1-\rho)\,\chi^2_{(M-1)(p-1)}.$$
The proof is complete.

Proposition 2.4: $B_2$ is distributed as a chi-square random variable with $M-1$ degrees of freedom times the constant $\sigma^2 p[1 + (p-1)\rho]$; that is,
$$B_2 \stackrel{d}{=} \sigma^2 p[1 + (p-1)\rho]\,\chi^2_{M-1}.$$

Proof: Assume that $E(\hat\mu) = \mu$. Using the fact that $\sum_{i=1}^M C_i^T(X_i - C_i\hat\mu) = 0$, we have
the expression for $B_2$ that
$$\sum_{i=1}^M (X_i - C_i\mu)^T J_p(X_i - C_i\mu) = B_2 + (\hat\mu - \mu)^T\left(\sum_{i=1}^M C_i^T C_i\right)J_p\,(\hat\mu - \mu),$$
where $B_2 = \sum_{i=1}^M (X_i - C_i\hat\mu)^T J_p(X_i - C_i\hat\mu)$. The second term $(\hat\mu - \mu)^T\left(\sum_{i=1}^M C_i^T C_i\right)J_p(\hat\mu - \mu)$ of the last expression above is distributed as the quantity $\sum_{j=1}^p \lambda_j\chi^2_{1j}$, where the $\lambda_j$'s are the latent roots of the matrix
$$P_4 = \left(\sum_{i=1}^M C_i^T C_i\right)^{-1}\Sigma\left(\sum_{i=1}^M C_i^T C_i\right)J_p = \Sigma J_p = P_2,$$
where $P_2$ was previously defined in the proof of Proposition 2.2. Hence
$$(\hat\mu - \mu)^T\left(\sum_{i=1}^M C_i^T C_i\right)J_p(\hat\mu - \mu) \stackrel{d}{=} \sigma^2 p[1 + (p-1)\rho]\,\chi^2_1.$$
Since $B_2$ and $(\hat\mu - \mu)^T\left(\sum_{i=1}^M C_i^T C_i\right)J_p(\hat\mu - \mu)$ are independent, and we have from Proposition 2.2 that
$$\sum_{i=1}^M (X_i - C_i\mu)^T J_p(X_i - C_i\mu) \stackrel{d}{=} \sigma^2 p[1 + (p-1)\rho]\,\chi^2_M,$$
$B_2$ is distributed as the quantity $\sigma^2 p[1 + (p-1)\rho]\,\chi^2_{M-1}$. The proof of Proposition 2.4 is complete.

The following proposition can be used to show the independence of $B_1 - \frac{1}{p}B_2$ and $B_2$, which is required when finding the exact null distribution of the likelihood ratio test statistic $W$ stated in Theorem 2.1.

Proposition 2.5: Let $Y_i = X_i - C_i\hat\mu$, $\hat\mu = \left(\sum_{i=1}^M C_i^T C_i\right)^{-1}\sum_{i=1}^M C_i^T X_i$, $A = I_p - \frac{1}{p}J_p$, $B = \frac{1}{p}J_p$, $S_A = \sum_{i=1}^M Y_i^T A Y_i$, and $S_B = \sum_{i=1}^M Y_i^T B Y_i$. Then $Y_i^T A Y_i$ and $Y_j^T B Y_j$ are independent for all $i$ and $j$, and hence $S_A$ and $S_B$ are independent.

Remark 2.1: Since $Y_i$ is a linear combination of $X = \operatorname{vec}(X_1, \dots, X_M)$, it can be expressed as $Y_i = M_i X$, where $M_i$ is a $p \times Mp$ matrix with the structure
$$M_i = \left(-C_i Q^{-1}C_1^T, \ \dots, \ -C_i Q^{-1}C_{i-1}^T, \ I_p - C_i Q^{-1}C_i^T, \ -C_i Q^{-1}C_{i+1}^T, \ \dots, \ -C_i Q^{-1}C_M^T\right),$$
where $Q = \sum_{i=1}^M C_i^T C_i$. Rewriting both $S_A$ and $S_B$, we have
$$S_A = \sum_{i=1}^M (M_i X)^T A (M_i X) = X^T\left(\sum_{i=1}^M M_i^T A M_i\right)X, \qquad S_B = \sum_{i=1}^M (M_i X)^T B (M_i X) = X^T\left(\sum_{i=1}^M M_i^T B M_i\right)X,$$
where $X$ is distributed as multivariate normal $MVN_{Mp}(\operatorname{vec}(C_1\mu, \dots, C_M\mu), I_M \otimes \Sigma)$. To show that $S_A$ and $S_B$ are independent, it suffices to show that
$$\left(\sum_{i=1}^M M_i^T A M_i\right)(I_M \otimes \Sigma)\left(\sum_{i=1}^M M_i^T B M_i\right) = 0, \qquad (2.19)$$
where $\Sigma$ has compound symmetry with the structure $\Sigma = \sigma^2[(1-\rho)I_p + \rho J_p]$ and is circulant.
The calculation of the matrix product $\left(\sum_i M_i^T A M_i\right)(I_M \otimes \Sigma)\left(\sum_i M_i^T B M_i\right)$ is complicated, so another way to prove Proposition 2.5 is to show first that $Y_i^T A Y_i$ and $Y_j^T B Y_j$ are independent for all $i$ and $j$.

Proof of Proposition 2.5: Because both $A$ and $B$ are symmetric and idempotent, we may rewrite $Y_i^T A Y_i$ and $Y_j^T B Y_j$ respectively as
$$Y_i^T A Y_i = Y_i^T A A Y_i = (AY_i)^T(AY_i), \qquad Y_j^T B Y_j = Y_j^T B B Y_j = (BY_j)^T(BY_j).$$
Note that $Y_i^T A Y_i$ and $Y_j^T B Y_j$ are the squared lengths of $AY_i$ and $BY_j$, respectively, so we only have to show that $AY_i$ and $BY_j$ are independent. Consider the distribution of the random vector
$$\begin{pmatrix} AY_i \\ BY_j \end{pmatrix} = \begin{pmatrix} AM_i \\ BM_j \end{pmatrix}X,$$
where $M_i$ is defined in Remark 2.1. This is a linear combination of the normal vector $X$, so it is normal. Thus showing $\operatorname{Cov}(AY_i, BY_j) = 0$ implies that $AY_i$ and $BY_j$ are independent normal random vectors, and the proof is done. Since $B$ is symmetric, we have $\operatorname{Cov}(AY_i, BY_j) = A\operatorname{Cov}(Y_i, Y_j)B^T = A\operatorname{Cov}(Y_i, Y_j)B$. Thus it suffices to show that $\operatorname{Cov}(Y_i, Y_j)$ is a circulant matrix, for then $A$, $B$, and $\operatorname{Cov}(Y_i, Y_j)$ commute, implying
$$A\operatorname{Cov}(Y_i, Y_j)B = \operatorname{Cov}(Y_i, Y_j)AB = 0,$$
using the fact that $AB = 0$.

To show that $\operatorname{Cov}(Y_i, Y_j)$ is a circulant matrix, we may use a direct proof. We have
$$\operatorname{Cov}(Y_i, Y_j) = \operatorname{Cov}(X_i - C_i\hat\mu, X_j - C_j\hat\mu) = \operatorname{Cov}(X_i, X_j) - \operatorname{Cov}(X_i, \hat\mu)C_j^T - C_i\operatorname{Cov}(\hat\mu, X_j) + C_i\operatorname{Var}(\hat\mu)C_j^T = \operatorname{Cov}(X_i, X_j) - \Sigma C_i Q^{-1}C_j^T - C_i Q^{-1}C_j^T\Sigma + C_i\operatorname{Var}(\hat\mu)C_j^T,$$
where $Q = \sum_{i=1}^M C_i^T C_i$. Note that $\operatorname{Cov}(X_i, X_j) = \Sigma$ if $i = j$ and $0$ otherwise.
Also from Section 2.3.1 we have $\operatorname{Var}(\hat\mu) = Q^{-1}\Sigma$, and the fact that $\Sigma$, $C_i$, and $Q$ are circulant matrices implies that their inverses and transposes are also circulant, so commutability holds. Hence $\operatorname{Cov}(Y_i, Y_j)$ becomes
$$\operatorname{Cov}(Y_i, Y_j) = \begin{cases} \Sigma(I_p - C_i Q^{-1}C_i^T), & \text{if } i = j, \\ -\Sigma C_i Q^{-1}C_j^T, & \text{if } i \ne j. \end{cases}$$
Therefore $\operatorname{Cov}(Y_i, Y_j)$ is circulant. The proof of Proposition 2.5 is complete.

Now it is time to state and prove the following main result, using Propositions 2.1-2.5.

Theorem 2.2: The likelihood ratio test of $H_0: \mu = \mu_0$ in Theorem 2.1 rejects $H_0$ if $W \le C_\alpha$, where $C_\alpha$ is such that $P(W \le C_\alpha \mid H_0) = \alpha$, and $W$ is expressed as
$$W = \frac{\left(B_1 - \frac{1}{p}B_2\right)^{p-1}B_2}{\left(B_1^0 - \frac{1}{p}B_2^0\right)^{p-1}B_2^0} = \left(\frac{B}{A}\right)^{p-1}\frac{D}{C},$$
where
$$B_1 = \sum_{i=1}^M (X_i - C_i\hat\mu)^T(X_i - C_i\hat\mu), \qquad B_2 = \sum_{i=1}^M (X_i - C_i\hat\mu)^T J_p(X_i - C_i\hat\mu),$$
$$B_1^0 = \sum_{i=1}^M (X_i - C_i\mu_0)^T(X_i - C_i\mu_0), \qquad B_2^0 = \sum_{i=1}^M (X_i - C_i\mu_0)^T J_p(X_i - C_i\mu_0),$$
$$A = B_1^0 - \frac{1}{p}B_2^0, \qquad B = B_1 - \frac{1}{p}B_2, \qquad C = B_2^0, \qquad D = B_2.$$
Furthermore, under $H_0: \mu = \mu_0$, $W$ is distributed as the random variable
$$\left(1 + \frac{1}{M-1}F^*\right)^{-(p-1)}\left(1 + \frac{1}{M-1}F^{**}\right)^{-1},$$
where $F^*$ and $F^{**}$ are independent and distributed as $F_{p-1,(M-1)(p-1)}$ and $F_{1,M-1}$ random variables, respectively.

Proof: Recall from the proofs of Propositions 2.1-2.4 that
$$A = B + R, \qquad (2.20)$$
where $R = (\hat\mu - \mu_0)^T\left(\sum_{i=1}^M C_i^T C_i\right)\left(I_p - \frac{1}{p}J_p\right)(\hat\mu - \mu_0)$. Also,
$$C = D + S, \qquad (2.21)$$
where $S = (\hat\mu - \mu_0)^T\left(\sum_{i=1}^M C_i^T C_i\right)J_p(\hat\mu - \mu_0)$. If we can show that $B$, $R$, $D$, and $S$ are mutually independent, then combined with the following Facts (7), (8), (9), and (10), the proof is done. Note that Facts (1)-(6), which establish pairwise independence among $B$, $R$, $D$, and $S$, are sufficient for showing mutual independence among them.
The facts needed to prove this theorem are shown below:
(1) $B$ and $R$ are independent;
(2) $D$ and $S$ are independent;
(3) $B$ and $D$ are independent (Proposition 2.5);
(4) $B$ and $S$ are independent;
(5) $R$ and $D$ are independent;
(6) $R$ and $S$ are independent;
(7) $B = \sum_{i=1}^M (X_i - C_i\hat\mu)^T\left(I_p - \frac{1}{p}J_p\right)(X_i - C_i\hat\mu) \stackrel{d}{=} \sigma^2(1-\rho)\,\chi^2_{(M-1)(p-1)}$;
(8) $R = (\hat\mu - \mu_0)^T\left(\sum_{i=1}^M C_i^T C_i\right)\left(I_p - \frac{1}{p}J_p\right)(\hat\mu - \mu_0) \stackrel{d}{=} \sigma^2(1-\rho)\,\chi^2_{p-1}$;
(9) $D = \sum_{i=1}^M (X_i - C_i\hat\mu)^T J_p(X_i - C_i\hat\mu) \stackrel{d}{=} \sigma^2 p[1+(p-1)\rho]\,\chi^2_{M-1}$;
(10) $S = (\hat\mu - \mu_0)^T\left(\sum_{i=1}^M C_i^T C_i\right)J_p(\hat\mu - \mu_0) \stackrel{d}{=} \sigma^2 p[1+(p-1)\rho]\,\chi^2_1$.

First, Facts (1), (2), (4), and (5) hold because $X_i - C_i\hat\mu$ and $\hat\mu$ are independent for each $i$. Fact (6) is true because $\left(I_p - \frac{1}{p}J_p\right)\frac{1}{p}J_p = 0$. Fact (3) is the result of Proposition 2.5. Facts (7) and (9) are direct results of Propositions 2.3 and 2.4, respectively. Facts (8) and (10) are shown in the proofs of Propositions 2.3 and 2.4, respectively. Hence the result that $R$, $S$, $B$, and $D$ are independent, in connection with the expression
$$W = \left(\frac{B}{B+R}\right)^{p-1}\frac{D}{D+S} = \left(1 + \frac{R}{B}\right)^{-(p-1)}\left(1 + \frac{S}{D}\right)^{-1},$$
fulfills the proof of Theorem 2.2.

2.2.3 Properties and Useful Results about ML Estimators

In addition to the likelihood ratio test for testing $H_0: \mu = \mu_0$, the null distribution of the statistic
$$(\hat\mu - \mu_0)^T\left[\widehat{\operatorname{Var}}(\hat\mu)\right]^{-1}(\hat\mu - \mu_0) \qquad (2.22)$$
also draws our attention. The exact null distribution of (2.22) is not easy to obtain, but we may at least find its asymptotic distribution. First note that
$$\widehat{\operatorname{Var}}(\hat\mu) = \left(\sum_{i=1}^M C_i^T C_i\right)^{-1}\hat\Sigma, \qquad \hat\Sigma^{-1} = \frac{1}{\hat\sigma^2(1-\hat\rho)}\left[I_p - \frac{\hat\rho}{1+(p-1)\hat\rho}J_p\right].$$
The quadratic form (2.22) can be phrased as
$$\frac{1}{\hat\sigma^2}(\hat\mu - \mu_0)^T\left(\sum_{i=1}^M C_i^T \hat R^{-1} C_i\right)(\hat\mu - \mu_0), \qquad (2.23)$$
where $\hat R^{-1} = \frac{1}{1-\hat\rho}\left[I_p - \frac{\hat\rho}{1+(p-1)\hat\rho}J_p\right]$. The following propositions are helpful for developing an approximate distribution of the statistic in (2.22) under the hypothesis $H_0: \mu = \mu_0$; details of the derivation of the approximate null distribution of (2.22) will be shown in Subsection 2.2.4. Before deriving the approximate null distribution of the statistic in (2.22), let us first look at the following proposition about the MLE of $\sigma^2$.
Details of the derivation of the approximate null distribution of (2.22) will be shown in Subsection 2.2.4. Before deriving it, let us first look at the following proposition about the MLE of σ².

Proposition 2.6: Let B₁ = Σ_{i=1}^M (X_i − C_iμ̂)^T(X_i − C_iμ̂). Then σ̂² = B₁/(Mp) is the MLE of σ², and σ̂̂² = Mσ̂²/(M−1) is an unbiased estimator of σ². In addition the following results hold.

a) E(σ̂²) = E(B₁)/(Mp) = (1 − 1/M)σ², and E(σ̂̂²) = E(B₁)/[(M−1)p] = σ².

b) V(σ̂²) = (2σ⁴/(Mp))[1 + (p−1)ρ²] + O(1/M²), and V(σ̂̂²) = (2σ⁴/[(M−1)p])[1 + (p−1)ρ²].

c) Both σ̂² and σ̂̂² are consistent estimators of σ²; that is, σ̂² →_p σ² and σ̂̂² →_p σ².

Proof: Recall that the MLE of μ is μ̂ = (Σ_{i=1}^M C_i^TC_i)⁻¹ Σ_{i=1}^M C_i^TX_i. To find E(B₁), E(σ̂²), E(B₁²), and V(σ̂²), recall that B₁ can be expressed as

B₁ = Σ_{i=1}^M (X_i − C_iμ)^T(X_i − C_iμ) − (μ̂ − μ)^T(Σ_{i=1}^M C_i^TC_i)(μ̂ − μ),

and the following result (cf. Christensen (2002), Theorem 1.3.2) is needed: if E(Y) = μ and Cov(Y) = V, then E(Y^TAY) = tr(AV) + μ^TAμ. So we have

E(B₁) = Σ_{i=1}^M E[(X_i − C_iμ)^T(X_i − C_iμ)] − E[(μ̂ − μ)^T(Σ C_i^TC_i)(μ̂ − μ)]
     = M tr(Σ) − tr[(Σ C_i^TC_i)(Σ C_i^TC_i)⁻¹Σ] = (M−1) tr(Σ),

which implies that E(σ̂²) = E(B₁)/(Mp) = (1 − 1/M)σ². Next,

E(B₁²) = E{ [ Σ(X_i − C_iμ)^T(X_i − C_iμ) − (μ̂ − μ)^T(Σ C_i^TC_i)(μ̂ − μ) ]² } = A − 2B + C,

where A, B, and C are, respectively,

A = E{ [Σ_{i=1}^M (X_i − C_iμ)^T(X_i − C_iμ)]² }
  = Σ_i E{ [(X_i − C_iμ)^T(X_i − C_iμ)]² } + Σ_{i≠j} E[(X_i − C_iμ)^T(X_i − C_iμ)] E[(X_j − C_jμ)^T(X_j − C_jμ)]
  = M[2 tr(Σ²) + (tr Σ)²] + M(M−1)(tr Σ)²  (Neudecker & Magnus (1979), Theorem 4.2)
  = 2M tr(Σ²) + M²(tr Σ)²,

B = E{ [Σ_{i=1}^M (X_i − C_iμ)^T(X_i − C_iμ)] (μ̂ − μ)^T(Σ C_i^TC_i)(μ̂ − μ) }, and
C = E{ [(μ̂ − μ)^T(Σ C_i^TC_i)(μ̂ − μ)]² } = 2 tr(Σ²) + (tr Σ)².

Let us attend to the representation of B. Define Q = Σ_{i=1}^M C_i^TC_i; then μ̂ = Q⁻¹ Σ_{i=1}^M C_i^TX_i. Thus the quadratic form (μ̂ − μ)^TQ(μ̂ − μ) can be rewritten (Q being symmetric and circulant) as

(μ̂ − μ)^TQ(μ̂ − μ) = Σ_{i=1}^M Σ_{j=1}^M (X_i − C_iμ)^T C_iQ⁻¹C_j^T (X_j − C_jμ).

So B can be expressed as

B = Σ_{k=1}^M Σ_{i=1}^M Σ_{j=1}^M E[ (X_k − C_kμ)^T(X_k − C_kμ) (X_i − C_iμ)^T C_iQ⁻¹C_j^T (X_j − C_jμ) ]. (2.24)

Consider the generic term in (2.24):

E[ (X_k − C_kμ)^T(X_k − C_kμ) (X_i − C_iμ)^T C_iQ⁻¹C_j^T (X_j − C_jμ) ]. (2.25)

To calculate (2.25), the results of Magnus (1979) can be applied to the following two cases.

Case 1: i = j. For i = j = k,

E[ (X_k − C_kμ)^T(X_k − C_kμ) (X_k − C_kμ)^T C_kQ⁻¹C_k^T (X_k − C_kμ) ] = 2 tr(C_kQ⁻¹C_k^TΣ²) + tr(C_kQ⁻¹C_k^TΣ) tr(Σ). (2.26)

For i = j ≠ k, independence gives

E[ (X_k − C_kμ)^T(X_k − C_kμ) ] E[ (X_i − C_iμ)^T C_iQ⁻¹C_i^T (X_i − C_iμ) ] = tr(Σ) tr(C_iQ⁻¹C_i^TΣ). (2.27)

Case 2: i ≠ j. In this case only one of i and j can equal k, or neither of them equals k. In both scenarios (2.25) is equal to zero; that is,

E[ (X_k − C_kμ)^T(X_k − C_kμ) (X_i − C_iμ)^T C_iQ⁻¹C_j^T (X_j − C_jμ) ] = 0, i ≠ j. (2.28)

Thus, summing (2.26) over k and (2.27) over i ≠ k and using Σ_k tr(C_kQ⁻¹C_k^TΣ) = tr(Σ) and Σ_k tr(C_kQ⁻¹C_k^TΣ²) = tr(Σ²), (2.24) becomes

B = 2 tr(Σ²) + (tr Σ)² + (M−1)(tr Σ)² = 2 tr(Σ²) + M(tr Σ)². (2.29)

So we have

E(B₁²) = A − 2B + C = 2(M−1) tr(Σ²) + (M−1)²(tr Σ)² = 2(M−1)pσ⁴[1 + (p−1)ρ²] + (M−1)²p²σ⁴,

and
V(B₁) = E(B₁²) − [E(B₁)]² = 2(M−1) tr(Σ²) = 2(M−1)pσ⁴[1 + (p−1)ρ²].

Hence we have

E[(σ̂²)²] = E(B₁²)/(M²p²) = (2(M−1)σ⁴/(M²p))[1 + (p−1)ρ²] + ((M−1)²/M²)σ⁴,

yielding

V(σ̂²) = E[(σ̂²)²] − [E(σ̂²)]² = (2(M−1)σ⁴/(M²p))[1 + (p−1)ρ²] = (2σ⁴/(Mp))[1 + (p−1)ρ²] + O(1/M²).

Therefore we have that σ̂² →_p σ². The proof is complete.

Remark 2.2: Proposition 2.6 (a) and (b) can be shown more effortlessly by using the results about the distribution of B₁, which will be stated in Theorem 2.3 later in this subsection. Theorem 2.3 (a) states that B₁ is distributed as the quantity

σ²(1−ρ) χ²_{(M−1)(p−1)} + σ²[1+(p−1)ρ] χ²_{M−1},

where χ²_{(M−1)(p−1)} and χ²_{M−1} are independent chi-squared random variables with (M−1)(p−1) and (M−1) degrees of freedom, respectively. Hence the results

E(B₁) = σ²{ (M−1)(p−1)(1−ρ) + (M−1)[1+(p−1)ρ] } = (M−1)pσ²

and

Var(B₁) = 2(M−1)(p−1)σ⁴(1−ρ)² + 2(M−1)σ⁴[1+(p−1)ρ]² = 2(M−1)pσ⁴[1 + (p−1)ρ²]

obtained from Theorem 2.3 (a) are exactly the results of Proposition 2.6 (a) and (b), respectively.

The following proposition is helpful to prove Theorem 2.3 (a).

Proposition 2.7:
(a) If A_i ~iid Σ_{j=1}^p λ_j X_{ij}, i = 1, …, M, where the X_{ij} are independent χ² random variables with 1 degree of freedom, then Σ_{i=1}^M A_i =_d Σ_{j=1}^p λ_j χ²_{M,j}, where the χ²_{M,j} are independent chi-squared random variables with M degrees of freedom.
(b) If A =_d σ²(1−ρ)χ²_{M(p−1)} + σ²[1+(p−1)ρ]χ²_M, where χ²_{M(p−1)} and χ²_M are independent; C =_d σ²(1−ρ)χ²_{p−1} + σ²[1+(p−1)ρ]χ²₁, where χ²_{p−1} and χ²₁ are independent; and A = B + C, where B and C are independent; then B is distributed as the quantity σ²(1−ρ)χ²_{(M−1)(p−1)} + σ²[1+(p−1)ρ]χ²_{M−1}.

Proof: (a) Let Y_j be independent chi-squared random variables with M degrees of freedom.
The moment generating function of Σ_{i=1}^M A_i is

M_{ΣA_i}(t) = Π_{i=1}^M E[e^{tA_i}] = Π_{i=1}^M Π_{j=1}^p E[e^{tλ_jX_{ij}}] = Π_{j=1}^p (1 − 2λ_j t)^{−M/2} = Π_{j=1}^p E[e^{tλ_jY_j}],

which is the moment generating function of the random variable Σ_{j=1}^p λ_j Y_j.

(b) Since B and C are independent, the mgf of A can be expressed as M_A(t) = M_{B+C}(t) = M_B(t)M_C(t). The mgf of A is

M_A(t) = [1 − 2σ²(1−ρ)t]^{−M(p−1)/2} [1 − 2σ²(1+(p−1)ρ)t]^{−M/2},

and the mgf of C is

M_C(t) = [1 − 2σ²(1−ρ)t]^{−(p−1)/2} [1 − 2σ²(1+(p−1)ρ)t]^{−1/2}.

Thus the mgf of B is

M_B(t) = M_A(t)/M_C(t) = [1 − 2σ²(1−ρ)t]^{−(M−1)(p−1)/2} [1 − 2σ²(1+(p−1)ρ)t]^{−(M−1)/2},

which is the mgf of the random variable σ²(1−ρ)χ²_{(M−1)(p−1)} + σ²[1+(p−1)ρ]χ²_{M−1}, where χ²_{(M−1)(p−1)} and χ²_{M−1} are independent chi-squared random variables with (M−1)(p−1) and M−1 degrees of freedom, respectively. The proof is complete.

Proposition 2.7 will be used to prove the following theorem.

Theorem 2.3:
(a) B₁ = Σ_{i=1}^M (X_i − C_iμ̂)^T(X_i − C_iμ̂) is distributed as the quantity σ²(1−ρ)χ²_{(M−1)(p−1)} + σ²[1+(p−1)ρ]χ²_{M−1}, where χ²_{(M−1)(p−1)} and χ²_{M−1} are independent chi-squared random variables with (M−1)(p−1) and (M−1) degrees of freedom, respectively.
(b) B₁ has an approximate g χ²_{h(M−1)} distribution, where
(1) g = σ²{ (p−1)(1−ρ)² + [1+(p−1)ρ]² }/p, and
(2) h = p² / { (p−1)(1−ρ)² + [1+(p−1)ρ]² }.

Proof: (a) Recall that B₁ can be expressed as
B₁ = Σ_{i=1}^M (X_i − C_iμ)^T(X_i − C_iμ) − (μ̂ − μ)^T(Σ_{i=1}^M C_i^TC_i)(μ̂ − μ). (2.30)

The first term of the last expression in (2.30) has the same distribution as a sum of M independent random variables Σ_{j=1}^p λ_j χ²_{1,j}, where the χ²_{1,j} are independent chi-squared random variables with 1 degree of freedom and the λ_j are the eigenvalues of Σ = σ²[(1−ρ)I_p + ρJ_p]. The eigenvalues of Σ are σ²(1−ρ) with multiplicity p−1 and σ²[1+(p−1)ρ] with multiplicity 1. Thus each summand is distributed as σ²(1−ρ)χ²_{p−1} + σ²[1+(p−1)ρ]χ²₁; that is,

Σ_{i=1}^M (X_i − C_iμ)^T(X_i − C_iμ) =_d σ²(1−ρ)χ²_{M(p−1)} + σ²[1+(p−1)ρ]χ²_M.

Similarly for the second term of the last expression in (2.30): since μ̂ − μ is distributed as N(0, (Σ C_i^TC_i)⁻¹Σ), the form (μ̂ − μ)^T(Σ C_i^TC_i)(μ̂ − μ) is distributed as Σ_{j=1}^p λ_j χ²_{1,j}, where the λ_j are the eigenvalues of (Σ C_i^TC_i)(Σ C_i^TC_i)⁻¹Σ = Σ; that is, as σ²(1−ρ)χ²_{p−1} + σ²[1+(p−1)ρ]χ²₁.

Since B₁ and (μ̂ − μ)^T(Σ C_i^TC_i)(μ̂ − μ) are independent, based on Proposition 2.7, B₁ is distributed as the quantity σ²(1−ρ)χ²_{(M−1)(p−1)} + σ²[1+(p−1)ρ]χ²_{M−1}, where χ²_{(M−1)(p−1)} and χ²_{M−1} are independent chi-squared random variables with (M−1)(p−1) and (M−1) degrees of freedom, respectively. The proof of part (a) is complete.

(b) Box (1954, Theorem 3.1) showed that g = Σ_j λ_j² / Σ_j λ_j and h = (Σ_j λ_j)² / Σ_j λ_j² are chosen so that the distribution of Σ_{j=1}^p λ_j χ²_{1,j} has the same first two moments as g χ²_h. Since B₁ is distributed as a sum of M−1 independent random variables Σ_{j=1}^p λ_j χ²_{1,j}, B₁ has an approximate g χ²_{h(M−1)} distribution. The proof of Theorem 2.3 (b) is complete.

Corollary 2.1: The test statistic for testing H₀: σ² = σ₀² is

σ̂² = (1/(Mp)) Σ_{i=1}^M (X_i − C_iμ̂)^T(X_i − C_iμ̂).

Under H₀, Mp σ̂²/σ₀² ~̇ ĝ χ²_{ĥ(M−1)}, where

ĝ = { (p−1)(1−ρ̂)² + [1+(p−1)ρ̂]² }/p, and ĥ = p² / { (p−1)(1−ρ̂)² + [1+(p−1)ρ̂]² }.
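The constants g and h in Theorem 2.3(b) are chosen by moment matching, which can be verified with a quick arithmetic check (the specific p, M, σ², ρ values below are arbitrary illustrations):

```python
from math import isclose

p, M = 5, 12
sigma2, rho = 1.7, 0.25
a = sigma2 * (1 - rho)              # eigenvalue of Sigma, multiplicity p - 1
b = sigma2 * (1 + (p - 1) * rho)    # eigenvalue of Sigma, multiplicity 1

# Exact moments of B1 = a*chi2_{(M-1)(p-1)} + b*chi2_{M-1}  (Theorem 2.3(a))
mean_exact = a * (M - 1) * (p - 1) + b * (M - 1)
var_exact = 2 * a**2 * (M - 1) * (p - 1) + 2 * b**2 * (M - 1)

# Box-type constants of Theorem 2.3(b)
S = (p - 1) * (1 - rho)**2 + (1 + (p - 1) * rho)**2
g = sigma2 * S / p
h = p**2 / S

# Moments of the approximating g * chi2_{h(M-1)} random variable
mean_approx = g * h * (M - 1)
var_approx = 2 * g**2 * h * (M - 1)

print(isclose(mean_exact, mean_approx), isclose(var_exact, var_approx))
```

Both comparisons print True: the approximating g χ²_{h(M−1)} variable reproduces E(B₁) = (M−1)pσ² and Var(B₁) = 2(M−1)pσ⁴[1+(p−1)ρ²] exactly.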
Proof: It follows directly from Theorem 2.3.

The maximum likelihood estimator of ρ is biased, but its approximate mean is ρ, and the approximate variance can also be obtained. Some results about the maximum likelihood estimator of ρ are shown in the next proposition.

Proposition 2.8: The MLE of ρ, namely ρ̂ = (1/(p−1))(B₂/B₁ − 1), where B₁ = Σ_{i=1}^M (X_i − C_iμ̂)^T(X_i − C_iμ̂) and B₂ = Σ_{i=1}^M (X_i − C_iμ̂)^T J_p (X_i − C_iμ̂), has the following properties:

a) Corr(B₁, B₂) = [1+(p−1)ρ] / √( p[1+(p−1)ρ²] ),
b) E(ρ̂) ≈ (1/(p−1))[E(B₂)/E(B₁) − 1] = ρ and V(ρ̂) ≈ 2(1−ρ)²[1+(p−1)ρ]² / [(M−1)p(p−1)], and
c) ρ̂ → ρ in probability.

Proof: Recall that

B₁ = Σ_{i=1}^M (X_i − C_iμ)^T(X_i − C_iμ) − (μ̂ − μ)^TQ(μ̂ − μ), (2.31)
B₂ = Σ_{i=1}^M (X_i − C_iμ)^T J_p (X_i − C_iμ) − (μ̂ − μ)^TQJ_p(μ̂ − μ), (2.32)

and

(μ̂ − μ)^TQ(μ̂ − μ) = Σ_{i=1}^M Σ_{j=1}^M (X_i − C_iμ)^T C_iQ⁻¹C_j^T (X_j − C_jμ), (2.33)

where Q = Σ_{i=1}^M C_i^TC_i. Similarly,

(μ̂ − μ)^TQJ_p(μ̂ − μ) = Σ_{i=1}^M Σ_{j=1}^M (X_i − C_iμ)^T C_iQ⁻¹J_pC_j^T (X_j − C_jμ). (2.34)

Note that C_i, Q⁻¹, C_j^T, and J_p commute with each other since all of them are circulant matrices; this commutation property will be used when necessary in the calculations. So Cov(B₁, B₂) can be expressed as

Cov(B₁, B₂) = Cov( Σ_i(X_i − C_iμ)^T(X_i − C_iμ), Σ_i(X_i − C_iμ)^TJ_p(X_i − C_iμ) )
  − Cov( Σ_i(X_i − C_iμ)^T(X_i − C_iμ), (μ̂ − μ)^TQJ_p(μ̂ − μ) )
  − Cov( (μ̂ − μ)^TQ(μ̂ − μ), Σ_i(X_i − C_iμ)^TJ_p(X_i − C_iμ) )
  + Cov( (μ̂ − μ)^TQ(μ̂ − μ), (μ̂ − μ)^TQJ_p(μ̂ − μ) )
  = D − E − F + G.

The derivations of D, E, F, and G are shown below. For D, the covariance between terms with i ≠ j is zero, so

D = Σ_{i=1}^M Cov( (X_i − C_iμ)^T(X_i − C_iμ), (X_i − C_iμ)^TJ_p(X_i − C_iμ) ) = 2M tr(J_pΣ²) = 2Mpσ⁴[1+(p−1)ρ]².
Next, using μ̂ − μ ~ N(0, Q⁻¹Σ) and the commutation of circulant matrices,

G = Cov( (μ̂ − μ)^TQ(μ̂ − μ), (μ̂ − μ)^TQJ_p(μ̂ − μ) ) = 2 tr( Q·Q⁻¹Σ·QJ_p·Q⁻¹Σ ) = 2 tr(J_pΣ²).

For E, expand (μ̂ − μ)^TQJ_p(μ̂ − μ) by (2.34); the covariance vanishes unless i = j = k, so only those terms survive:

E = Cov( Σ_k(X_k − C_kμ)^T(X_k − C_kμ), Σ_iΣ_j(X_i − C_iμ)^TC_iQ⁻¹J_pC_j^T(X_j − C_jμ) )
  = Σ_{k=1}^M 2 tr( Σ C_kQ⁻¹J_pC_k^T Σ ) = 2 tr( Σ Q⁻¹J_p (Σ_k C_k^TC_k) Σ ) = 2 tr(J_pΣ²).
Similarly,

F = Cov( (μ̂ − μ)^TQ(μ̂ − μ), Σ_{i=1}^M (X_i − C_iμ)^TJ_p(X_i − C_iμ) ) = 2 tr(J_pΣ²).

So D, E, F, and G are, respectively,

D = 2M tr(J_pΣ²) = 2Mpσ⁴[1+(p−1)ρ]² and E = F = G = 2 tr(J_pΣ²) = 2pσ⁴[1+(p−1)ρ]².

Therefore, Cov(B₁, B₂) and Corr(B₁, B₂) are, respectively,

Cov(B₁, B₂) = D − E − F + G = 2(M−1) tr(J_pΣ²) = 2(M−1)pσ⁴[1+(p−1)ρ]², (2.35)

and, using Var(B₁) = 2(M−1)pσ⁴[1+(p−1)ρ²] and Var(B₂) = 2(M−1)p²σ⁴[1+(p−1)ρ]²,

Corr(B₁, B₂) = Cov(B₁, B₂)/√(Var(B₁)Var(B₂)) = [1+(p−1)ρ] / √( p[1+(p−1)ρ²] ).

Finally, we may compute the approximate mean and variance of B₂/B₁ using the first-order Taylor series in two variables of f(x, y) = x/y, y ≠ 0. Hence we have

E(B₂/B₁) ≈ E(B₂)/E(B₁) = (M−1)pσ²[1+(p−1)ρ] / [(M−1)pσ²] = 1 + (p−1)ρ, (2.36)

and

V(B₂/B₁) ≈ [E(B₂)/E(B₁)]² { V(B₂)/[E(B₂)]² + V(B₁)/[E(B₁)]² − 2Cov(B₁,B₂)/[E(B₁)E(B₂)] }
        = 2(p−1)(1−ρ)²[1+(p−1)ρ]² / [(M−1)p], (2.37)

so that V(ρ̂) = V(B₂/B₁)/(p−1)² ≈ 2(1−ρ)²[1+(p−1)ρ]² / [(M−1)p(p−1)], implying that ρ̂ → ρ in probability. The proof is complete.

The following theorem states the exact distribution of the MLE of ρ.
Theorem 2.4: The MLE of ρ, say ρ̂ = (1/(p−1))(B₂/B₁ − 1), with B₁ = Σ_{i=1}^M (X_i − C_iμ̂)^T(X_i − C_iμ̂) and B₂ = Σ_{i=1}^M (X_i − C_iμ̂)^T J_p (X_i − C_iμ̂), is distributed as the quantity

(1/(p−1)) { p [ 1 + ((1−ρ)(p−1)/[1+(p−1)ρ]) F_{(M−1)(p−1),M−1} ]⁻¹ − 1 }.

Remark 2.3: ρ̂ lies between −1/(p−1) and 1, since the ratio B₂/B₁ is between 0 and p. To show this, first note that 0 ≤ B₂/B₁ implies ρ̂ ≥ −1/(p−1), since B₁ > 0 and B₂ ≥ 0 for nonzero vectors X_i − C_iμ̂. Secondly, consider the identity

x_i^T x_i = (1/p) x_i^T J_p x_i + x_i^T (I_p − (1/p)J_p) x_i.

Since all three quantities x_i^T x_i, (1/p)x_i^T J_p x_i, and x_i^T (I_p − (1/p)J_p) x_i are nonnegative for nonzero vectors x_i, the inequality Σ_{i=1}^M (1/p) x_i^T J_p x_i ≤ Σ_{i=1}^M x_i^T x_i holds, and it implies Σ_i x_i^T J_p x_i / Σ_i x_i^T x_i ≤ p. Hence ρ̂ ≤ 1.

Proof of Theorem 2.4: Recall from (2.31) and (2.32) the expressions for B₁ and B₂, where Q = Σ_{i=1}^M C_i^TC_i. We have B₁ = [B₁ − (1/p)B₂] + (1/p)B₂. Since, from Proposition 2.5, B₁ − (1/p)B₂ and (1/p)B₂ are independent, and from Propositions 2.3 and 2.4 they are distributed as σ²(1−ρ)χ²_{(M−1)(p−1)} and σ²[1+(p−1)ρ]χ²_{M−1} random variables, respectively, we have

B₁/B₂ = ( [B₁ − (1/p)B₂] + (1/p)B₂ ) / B₂ = (1/p) { [B₁ − (1/p)B₂] / [(1/p)B₂] + 1 },

which is distributed as the random variable (1/p){ 1 + ((1−ρ)(p−1)/[1+(p−1)ρ]) F_{(M−1)(p−1),M−1} }, where F_{(M−1)(p−1),M−1} is an F random variable with (M−1)(p−1) and M−1 as the numerator and denominator degrees of freedom, respectively. Thus ρ̂ = (1/(p−1))(B₂/B₁ − 1) is distributed as the quantity

(1/(p−1)) { p [ 1 + ((1−ρ)(p−1)/[1+(p−1)ρ]) F_{(M−1)(p−1),M−1} ]⁻¹ − 1 }.

The proof is complete. For the rest of this subsection, a simulation study is performed to investigate the behavior of ρ̂, based on the distribution of ρ̂ obtained from Theorem 2.4.
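The Theorem 2.4 representation makes such a simulation straightforward: rather than generating data, one can draw ρ̂ directly from its exact distribution. A minimal sketch (numpy assumed; the p, M, and ρ values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def rho_hat_draws(p, M, rho, n=100_000):
    """Draw rho_hat via Theorem 2.4:
    rho_hat = (1/(p-1)) * (p / (1 + c*F) - 1),
    with F ~ F_{(M-1)(p-1), M-1} and c = (1-rho)*(p-1) / (1+(p-1)*rho)."""
    d1, d2 = (M - 1) * (p - 1), M - 1
    F = (rng.chisquare(d1, n) / d1) / (rng.chisquare(d2, n) / d2)
    c = (1 - rho) * (p - 1) / (1 + (p - 1) * rho)
    return (p / (1 + c * F) - 1) / (p - 1)

draws = rho_hat_draws(p=3, M=20, rho=0.4)
print(draws.mean(), draws.std())   # empirical E(rho_hat) and SD(rho_hat)
```

Since F > 0, every draw falls strictly inside (−1/(p−1), 1), in line with Remark 2.3.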
Figures 1 and 2 show the expectation and the standard deviation of ρ̂, the MLE of ρ, for each value ρ ∈ (−1/(p−1), 1), via a simulation study with various combinations of dimensions p = 2, 3, 4, 5, 6, 7 and sample sizes M = 2, 3, 5, 10, 20, 50, 100. Note that the starting points of ρ on the x-axis differ across p values, since the restriction ρ > −1/(p−1) is required for a positive definite compound symmetry covariance matrix. Summarizing the information provided by Figures 1 and 2, we have the following results.

About the expectation of ρ̂:
(1) When ρ = 0, the MLE ρ̂ is unbiased. This can also be verified from the distribution of ρ̂ stated in Theorem 2.4 for the special case ρ = 0. With ρ = 0, ρ̂ is distributed as the random variable (1/(p−1))[ p·Beta(α, β) − 1 ], where α = (M−1)/2, β = (M−1)(p−1)/2, and Beta(α, β) is the beta random variable. Therefore ρ̂ is unbiased since E(ρ̂) = (1/(p−1))[ p·α/(α+β) − 1 ] = (1/(p−1))[ p·(1/p) − 1 ] = 0.
(2) When ρ is close to one of the end points −1/(p−1) and 1, ρ̂ tends to be unbiased. Otherwise, when ρ < 0, ρ̂ overestimates ρ; when ρ > 0, ρ̂ underestimates ρ.
(3) When the sample size M increases, ρ̂ becomes more accurate. Indeed, from the results of Proposition 2.8, ρ̂ converges in probability to ρ.

About the standard deviation of ρ̂:
(1) When p = 2, the standard deviation of ρ̂ as a function of ρ looks like an upside-down bathtub when M is small. When the sample size increases, the bathtub shape becomes flatter.
(2) When p > 2, the bathtub shape is not symmetric and shrinks to the right.
(3) Basically, with p and ρ fixed, the standard deviation decreases when the sample size increases.

Figures 3 and 4 illustrate the simulated probability density functions of the MLE of ρ for the cases p = 2 and p = 3, respectively. The sample sizes 2, 5, 20, and 40 are considered for each figure.
Summarizing the information provided by these two figures, we have the following results about the probability density function of ρ̂:
(1) With p fixed, when the sample size is very small (M = 2), the probability density function is bimodal; otherwise it is unimodal.
(2) With p fixed, when the sample size becomes larger, the pdf of ρ̂ becomes more concentrated and symmetric.
(3) With the sample size fixed, when ρ is less than 0 the pdf is skewed to the right; otherwise it is skewed to the left.
(4) With the sample size fixed, the more extreme ρ is, the steeper the pdf of ρ̂.

Figure 1: Simulated E(ρ̂) and SD(ρ̂) versus ρ for p = 2, 3, 4 and M = 2, 3, 5, 10, 20, 50, 100.

Figure 2: Simulated E(ρ̂) and SD(ρ̂) versus ρ for p = 5, 6, 7 and M = 2, 3, 5, 10, 20, 50, 100.

Figure 3: Simulated pdf of the MLE of ρ for p = 2 and M = 2, 5, 20, 40 (N = 1,000,000 draws per curve).

Figure 4: Simulated pdf of the MLE of ρ for p = 3 and M = 2, 5, 20, 40 (N = 1,000,000 draws per curve).

2.2.4 Hypothesis Testing for H₀: μ = μ₀ Using an Approximate χ² Test

Using the results from Subsection 2.2.3 that σ̂² →_p σ² and ρ̂ →_p ρ, we arrive at the following approximation theorem, which can be used to test the hypothesis H₀: μ = μ₀.

Theorem 2.5: (μ̂ − μ₀)^T [Var̂(μ̂)]⁻¹ (μ̂ − μ₀) →_d χ²_p.

Proof: Recall that

[Var̂(μ̂)]⁻¹ = Σ̂⁻¹ ( Σ_{i=1}^M C_i^TC_i ), with Σ̂⁻¹ = σ̂⁻² [ (1−ρ̂)⁻¹ I_p − ρ̂ {(1−ρ̂)[1+(p−1)ρ̂]}⁻¹ J_p ].

Also we have the expression
(μ̂ − μ₀)^T [Var̂(μ̂)]⁻¹ (μ̂ − μ₀)
 = σ̂⁻²(1−ρ̂)⁻¹ (μ̂ − μ₀)^T (Σ_{i=1}^M C_i^TC_i)(μ̂ − μ₀)
 − σ̂⁻² ρ̂ {(1−ρ̂)[1+(p−1)ρ̂]}⁻¹ (μ̂ − μ₀)^T (Σ_{i=1}^M C_i^TC_i) J_p (μ̂ − μ₀).

Since σ̂²(1−ρ̂)/[σ²(1−ρ)] →_p 1 and ρ̂/[1+(p−1)ρ̂] − ρ/[1+(p−1)ρ] →_p 0, we have by Slutsky's theorem that

(μ̂ − μ₀)^T [Var̂(μ̂)]⁻¹ (μ̂ − μ₀) →_d (μ̂ − μ₀)^T Σ⁻¹ (Σ_{i=1}^M C_i^TC_i)(μ̂ − μ₀),

which follows a χ²_p distribution. The proof is complete.

2.3 SIMULATION STUDY FOR MISUSE OF HOMOGENEOUS MEAN MODELS

In this section, the power under H₀: μ = μ₀ of two test procedures, each corresponding to the same hypothesis but a different model setting, is compared in order to show that the usual test procedure for H₀: μ = μ₀ is not appropriate when the data are polluted for some reason that researchers ignore. In each simulation, a sample of independent bivariate normal data X₁, …, X_m, m = 100, is generated from MVN(C_iμ₀, Σ), where C_i = I₂ + C_{i0} and

C_{i0} = [ a_{i0}, b_{i0}; b_{i0}, a_{i0} ].

Note that C_{i0} is (symmetric) circulant, and thus so is C_i. The two likelihood ratio tests, denoted LRTCμ and LRTμ, are stated below:

- LRTCμ: the LRT for testing H₀: μ = μ₀ under the heterogeneous means model X_i ~ N(C_iμ, Σ), and
- LRTμ: the LRT for testing H₀: μ = μ₀ under the homogeneous mean model X_i ~ N(μ, Σ),

where μ and Σ are unknown but Σ has compound symmetry structure. Recall from Theorem 2.2 that the test statistic for LRTCμ is

LRTCμ statistic = [ (pB₁ − B₂)/(pB₁⁰ − B₂⁰) ]^{p−1} (B₂/B₂⁰),

where B₁ = Σ_i (X_i − C_iμ̂)^T(X_i − C_iμ̂), B₂ = Σ_i (X_i − C_iμ̂)^T J_p (X_i − C_iμ̂), B₁⁰ = Σ_i (X_i − C_iμ₀)^T(X_i − C_iμ₀), B₂⁰ = Σ_i (X_i − C_iμ₀)^T J_p (X_i − C_iμ₀), and μ̂ = (Σ_i C_i^TC_i)⁻¹ Σ_i C_i^TX_i. When C_{i0} = 0 for all i, the two test statistics are the same.
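One replication of this comparison can be sketched as follows (numpy assumed; the perturbation pattern b_{i0} = a_{i0} = 0.02i mirrors one of the simulation scenarios, and the data are generated with the null mean, so small values of the statistic count against a true null):

```python
import numpy as np

rng = np.random.default_rng(2)
p, m = 2, 100
mu0 = np.array([30.0, 10.0])
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])

# C_i = I_2 + C_i0 with symmetric circulant C_i0 (here b_i0 = a_i0 = 0.02*i)
a = 0.02 * np.arange(1, m + 1)
Cs = [np.eye(p) + ai * np.ones((p, p)) for ai in a]
X = np.array([rng.multivariate_normal(Ci @ mu0, Sigma) for Ci in Cs])

Q = sum(Ci.T @ Ci for Ci in Cs)
mu_hat = np.linalg.solve(Q, sum(Ci.T @ Xi for Ci, Xi in zip(Cs, X)))

def B12(mu):
    R = X - np.array([Ci @ mu for Ci in Cs])
    return np.sum(R * R), sum(r.sum() ** 2 for r in R)   # (B1, B2)

B1, B2 = B12(mu_hat)
B1_0, B2_0 = B12(mu0)
# LRTCmu statistic of Theorem 2.2; H0 is rejected for small values
W = ((p * B1 - B2) / (p * B1_0 - B2_0)) ** (p - 1) * (B2 / B2_0)
print(round(W, 4))
```

Repeating this over many generated samples and comparing W with the simulated null quantile gives the empirical α reported in the study.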
Under H₀: μ = μ₀, both test statistics are distributed as the random variable stated in Theorem 2.2. We reject the null hypothesis when the test statistic is sufficiently small. The simulation study is described as follows.

Data: Data are generated from N(C_iμ₀, Σ), where C_i = I₂ + C_{i0}, and μ₀, Σ, and C_{i0} are shown in the first four columns of Table 1.

Hypotheses: Both tests correspond to the hypothesis of interest H₀: μ = μ₀.

Tests and critical value: The two likelihood ratio tests are performed on the generated data. The critical value is the same for both tests, since their null distributions coincide; as seen in Theorem 2.2, the null distribution of the LRTCμ statistic does not depend on the matrices C_i.

Number of simulations: The number of LRT values used to compute the empirical α of the test LRTCμ, or the rejection probability of the test LRTμ, is 10000.

Interpretation of the simulation study: Column 4 of Table 1 shows the diagonal elements a_{i0} of the matrices C_{i0}. For instance, a_{i0} = −.99(.02) means that the first value is a_{10} = −.99, and a_{i0} then increases by 0.02 for each one-unit increase in i. As denoted in column 5 of Table 1, the value (probability) in each cell is the empirical α of the test LRTCμ given the data generated from the heterogeneous means models. All the values in column 5 are close to 0.05, the significance level specified for the test, as expected. On the other hand, since the data are polluted, adopting the test LRTμ does not make sense and is not appropriate. If we still consider that the
Generally, when the pollution of the data becomes more severe, that is when matrices i0 C is far away from zero matrix with a faster rate, the rejection probability is larger. Under the scenario C 0 i0 , all the three rejection probabilities are less than 0.05 and one of them is even 0. Lastly, the two rejection probabilities of column 6 are 1 even when data suffer only slight contamination ( i0 a =.001(.001) and i0 i0 b a for both of the two cases about Σ ). TABLE 1: Result of simulation study for misuse of homogeneous mean model (1) (2) (3) (4) (5) (6) μ0 Σ 0 0 0 0 0 i i i i i b a a b C Ci = I2 + Ci0 i=1,…, m Values of i0 a Testing 0 0 H : μ μ LRTCμ (Empirical α) LRTμ (Rejection Probability) 30 10 .5 1 1 .5 i0 i0 b a i0 a =.02(.02) .055 1 0 0 i b .045 1 i0 i0 b a .051 1 i0 i0 b a ΣCi0 = 0 i0 a =  .99(.02) .049 .015 0 0 i b .043 0 i0 i0 b a .047 .014 i0 i0 b a .00001(.00001) 0 i a .050 .0528 i0 a =.0001(.0001) .06 .407 =.001(.001) .047 1 .2 1 1 .2 i0 a =.00001(.00001) .056 .057 i0 a =.0001(.0001) .053 .18 =.001(.001) .046 1 i0 a i0 i0 b a i0 a 67 CHAPTER III MULTISAMPLE INFERENCE 3.1 INTRODUCTION In this chapter, we move on to the inference for multisample case when the heterogeneous means models are adopted. Twosample inference will be the starting point. Consider two independent samples M p i x i i x X ,..., X ~ MVN (μ ,Σ ), μ C μ 1 , for all i 1,...,M , and N p j y j j y Y ,...,Y ~ MVN (v , Σ ), v D μ 1 , for all j 1,...,N . Both i C and j D are known p p matrices. The hypotheses of interest are x y H : μ μ 0 versus a x y H : μ μ . The likelihood function is ( ) ( ) . 
L(μ_x, μ_y, Σ_x, Σ_y) = constant · |Σ_x|^{−M/2} |Σ_y|^{−N/2} exp{ −(1/2) Σ_{i=1}^M (x_i − C_iμ_x)^T Σ_x⁻¹ (x_i − C_iμ_x) − (1/2) Σ_{j=1}^N (y_j − D_jμ_y)^T Σ_y⁻¹ (y_j − D_jμ_y) }.

The corresponding log likelihood function is

log L(μ_x, μ_y, Σ_x, Σ_y) = constant − (M/2) log|Σ_x| − (N/2) log|Σ_y| − (1/2) Σ_{i=1}^M (x_i − C_iμ_x)^T Σ_x⁻¹ (x_i − C_iμ_x) − (1/2) Σ_{j=1}^N (y_j − D_jμ_y)^T Σ_y⁻¹ (y_j − D_jμ_y). (3.1)

First consider the simple case where both Σ_x and Σ_y are known. The MLEs for μ_x and μ_y are, respectively,

μ̂_x = ( Σ_{i=1}^M C_i^TΣ_x⁻¹C_i )⁻¹ Σ_{i=1}^M C_i^TΣ_x⁻¹X_i, μ̂_y = ( Σ_{j=1}^N D_j^TΣ_y⁻¹D_j )⁻¹ Σ_{j=1}^N D_j^TΣ_y⁻¹Y_j.

μ̂_x and μ̂_y are independent, and

μ̂_x ~ MVN_p( μ_x, (Σ_{i=1}^M C_i^TΣ_x⁻¹C_i)⁻¹ ), μ̂_y ~ MVN_p( μ_y, (Σ_{j=1}^N D_j^TΣ_y⁻¹D_j)⁻¹ ),

so that

μ̂_x − μ̂_y ~ MVN_p( μ_x − μ_y, (Σ_{i=1}^M C_i^TΣ_x⁻¹C_i)⁻¹ + (Σ_{j=1}^N D_j^TΣ_y⁻¹D_j)⁻¹ ).

Define the statistic

T₀ = (μ̂_x − μ̂_y)^T [ (Σ_{i=1}^M C_i^TΣ_x⁻¹C_i)⁻¹ + (Σ_{j=1}^N D_j^TΣ_y⁻¹D_j)⁻¹ ]⁻¹ (μ̂_x − μ̂_y).

Under the null hypothesis H₀: μ_x = μ_y, T₀ ~ χ²_p. Thus we reject H₀ if T₀ > χ²_{p,α}.

For the case where Σ_x and Σ_y are unknown but equal, a likelihood approach is used to test H₀: μ_x = μ_y in Section 3.2. In Section 3.3, the asymptotic χ² test for testing H₀: μ_x = μ_y is derived. Finally, in Section 3.4 the LR test for the two-sample case is extended to the k-sample case, and the exact distribution of the LRT statistic for H₀: μ₁ = ⋯ = μ_k is derived.

3.2 LIKELIHOOD RATIO TEST FOR TWO-SAMPLE CASE

In this section, the case Σ_x = Σ_y = Σ unknown is considered. We also assume that Σ has compound symmetry with the form in (2.8), and that C_i, D_j, and Σ commute with each other; that is, C_i^TΣ⁻¹ = Σ⁻¹C_i^T and D_j^TΣ⁻¹ = Σ⁻¹D_j^T for all i and j. Before deriving the likelihood ratio test for H₀: μ_x = μ_y, it is necessary to find the MLEs of the parameters under the null and alternative hypotheses separately.

3.2.1 Estimation Under H₀: μ_x = μ_y

Assume that μ_x = μ_y = μ₀ under H₀.
Using the same technique as in the one-sample case, the MLE of μ₀, say μ̂₀, can be derived as

μ̂₀ = ( Σ_{i=1}^M C_i^TΣ̂₀⁻¹C_i + Σ_{j=1}^N D_j^TΣ̂₀⁻¹D_j )⁻¹ ( Σ_{i=1}^M C_i^TΣ̂₀⁻¹X_i + Σ_{j=1}^N D_j^TΣ̂₀⁻¹Y_j ),

where Σ̂₀ is the MLE of Σ under H₀. Since C_i^TΣ̂₀⁻¹ = Σ̂₀⁻¹C_i^T and D_j^TΣ̂₀⁻¹ = Σ̂₀⁻¹D_j^T for all i and j, μ̂₀ reduces to

μ̂₀ = ( Σ_{i=1}^M C_i^TC_i + Σ_{j=1}^N D_j^TD_j )⁻¹ ( Σ_{i=1}^M C_i^TX_i + Σ_{j=1}^N D_j^TY_j ). (3.2)

Therefore Σ̂₀ can be obtained using the reduced log likelihood function

log L(μ̂₀, Σ) = constant − ((M+N)/2) log|Σ| − (1/2) Σ_{i=1}^M (x_i − C_iμ̂₀)^TΣ⁻¹(x_i − C_iμ̂₀) − (1/2) Σ_{j=1}^N (y_j − D_jμ̂₀)^TΣ⁻¹(y_j − D_jμ̂₀). (3.3)

The MLE of Σ under H₀ is thus Σ̂₀ = σ̂₀²[(1−ρ̂₀)I_p + ρ̂₀J_p], where

σ̂₀² = (B₁⁽⁰⁾ + E₁⁽⁰⁾) / [(M+N)p], ρ̂₀ = (1/(p−1)) [ (B₂⁽⁰⁾ + E₂⁽⁰⁾)/(B₁⁽⁰⁾ + E₁⁽⁰⁾) − 1 ], (3.4)

where

B₁⁽⁰⁾ = Σ_{i=1}^M (X_i − C_iμ̂₀)^T(X_i − C_iμ̂₀), B₂⁽⁰⁾ = Σ_{i=1}^M (X_i − C_iμ̂₀)^T J_p (X_i − C_iμ̂₀),
E₁⁽⁰⁾ = Σ_{j=1}^N (Y_j − D_jμ̂₀)^T(Y_j − D_jμ̂₀), E₂⁽⁰⁾ = Σ_{j=1}^N (Y_j − D_jμ̂₀)^T J_p (Y_j − D_jμ̂₀). (3.5)

3.2.2 Estimation Under H_a: μ_x ≠ μ_y

Under H_a: μ_x ≠ μ_y, the log likelihood function is

log L(μ_x, μ_y, Σ) = constant − ((M+N)/2) log|Σ| − (1/2) Σ_{i=1}^M (x_i − C_iμ_x)^TΣ⁻¹(x_i − C_iμ_x) − (1/2) Σ_{j=1}^N (y_j − D_jμ_y)^TΣ⁻¹(y_j − D_jμ_y).

Using an approach similar to that shown in Section 2.2.3, the MLEs of μ_x, μ_y, and Σ are, respectively,

μ̂_x = (Σ_{i=1}^M C_i^TC_i)⁻¹ Σ_{i=1}^M C_i^TX_i, μ̂_y = (Σ_{j=1}^N D_j^TD_j)⁻¹ Σ_{j=1}^N D_j^TY_j,

and Σ̂ = σ̂²[(1−ρ̂)I_p + ρ̂J_p], where

σ̂² = (B₁⁽ᵃ⁾ + E₁⁽ᵃ⁾) / [(M+N)p], ρ̂ = (1/(p−1)) [ (B₂⁽ᵃ⁾ + E₂⁽ᵃ⁾)/(B₁⁽ᵃ⁾ + E₁⁽ᵃ⁾) − 1 ], (3.6)

where
(3.7) 3.2.3 Likelihood Ratio Test for Testing x y H : μ μ 0 Subsections 3.2.1 and 3.2.2 derived the MLE’s for parameters under both null and alternative hypotheses. The likelihood ratio test can now be developed. The likelihood ratio is , ( ˆ ) ˆ ( ˆ ) ( ˆ ) ˆ ( ˆ ) 2 1 exp ( ˆ ) ˆ ( ˆ ) ( ˆ ) ˆ ( ˆ ) 2 1 exp (2 )  ˆ  (2 )  ˆ  max ( , , ) max ( , , ) 1 1 1 1 1 1 0 1 0 0 0 1 0 0 2 2 ( ) 2 0 2 ( ) 0 M i N j j j y T i i x j j y T i i x M i N j j j T i i j j T i i M N p M N M N p M N x y x y L L x C μ Σ x C μ y D μ Σ y D μ x C μ Σ x C μ y D μ Σ y D μ Σ Σ μ μ Σ μ μ Σ θ θ where θ (μ , μ , Σ) x y , {( , , )  [(1 ) ]} 2 x y p p μ μ Σ Σ I J , and {( , , )  , [(1 ) ]} 2 0 x y x y p p μ μ Σ μ μ Σ I J . Hence the results (from Appendix A.3) M N p M i N j j j T i i j j T i i ( ˆ ) ˆ ( ˆ ) ( ˆ ) ˆ ( ˆ ) ( ) 1 1 0 1 0 0 0 1 0 0 x C μ Σ x C μ y D μ Σ y D μ and M N p M i N j j j y T i i x j j y T i i x ( ˆ ) ˆ ( ˆ ) ( ˆ ) ˆ ( ˆ ) ( ) 1 1 1 1 x C μ Σ x C μ y D μ Σ y D μ imply that the likelihood ratio is 72 . ( 2 2 ) ( 2 2 ) 1 ( 1 1 ) ( 2 2 ) ( 2 2 ) 1 ( 1 1 ) ( ˆ ) (1 ˆ ) [1 ( 1) ˆ ] ( ˆ ) (1 ˆ ) [1 ( 1) ˆ ]  ˆ   ˆ  2 (0) (0) 1 (0) (0) (0) (0) ( ) ( ) 1 ( ) ( ) ( ) ( ) 2 0 1 0 2 0 2 2 1 0 M N p a a p a a a a M N p p p p M N B E B E p B E B E B E p B E p p Σ Σ Thus we arrive at the following theorem. Theorem 3.1: The likelihood ratio test for testing x y H : μ μ 0 is to reject 0 H if , L C where C is such that (  ) 0 P L C H , and L is defined as: , ( 2 2 ) ( 2 2 ) 1 ( 1 1 ) ( 2 2 ) ( 2 2 ) 1 ( 1 1 ) (0) (0) 1 (0) (0) (0) (0) ( ) ( ) 1 ( ) ( ) ( ) ( ) 2 /( ) B E B E p B E B E B E p B E L p a a p a a a a M N where is the likelihood ratio and ( ) 1 a B , ( ) 2 a B , (0) B1 , and (0) B2 are defined in (3.5) and (3.7). To show the null distribution of L , the following propositions are needed. Proposition 3.1: Under : ( ) 0 0 μ μ μ x y H , ( 2 2 ) 1 ( 1 1 ) (0) (0) (0) (0) B E p B E is distributed as the quantity 2 ( 1)( 1) 2 (1 ) M N p . 
Proof: First rewrite $\left(B_1^{(0)} + E_1^{(0)}\right) - \frac{1}{p}\left(B_2^{(0)} + E_2^{(0)}\right)$ as
\[
\left[B_1^{(0)} - \frac{1}{p}B_2^{(0)}\right] + \left[E_1^{(0)} - \frac{1}{p}E_2^{(0)}\right]
= \sum_{i=1}^{M}(X_i - C_i\hat{\mu}_0)^T\left(I_p - \frac{1}{p}J_p\right)(X_i - C_i\hat{\mu}_0) + \sum_{j=1}^{N}(Y_j - D_j\hat{\mu}_0)^T\left(I_p - \frac{1}{p}J_p\right)(Y_j - D_j\hat{\mu}_0),
\]
where
\[
\hat{\mu}_0 = \left(\sum_{i=1}^{M} C_i^T C_i + \sum_{j=1}^{N} D_j^T D_j\right)^{-1}\left(\sum_{i=1}^{M} C_i^T X_i + \sum_{j=1}^{N} D_j^T Y_j\right).
\]
Appendix A.4 shows that under $H_0: \mu_x = \mu_y\ (= \mu_0)$ we have
\[
\hat{\mu}_0 \sim N_p\left(\mu_0,\ \left(\sum_{i=1}^{M} C_i^T C_i + \sum_{j=1}^{N} D_j^T D_j\right)^{-1}\Sigma\right).
\]
Thus the quadratic form
\[
(\hat{\mu}_0 - \mu_0)^T\left(\sum_{i=1}^{M} C_i^T C_i + \sum_{j=1}^{N} D_j^T D_j\right)\left(I_p - \frac{1}{p}J_p\right)(\hat{\mu}_0 - \mu_0) \tag{3.9}
\]
is distributed as the quantity $\sum_{j=1}^{p} \lambda_j \chi^2_{1,j}$, where the $\lambda_j$'s are the latent roots of $P_1$ defined in (2.14). Using the results in the proof of Proposition 2.3, the expression in (3.9) is distributed as a $\sigma^2(1-\rho)\chi^2_{p-1}$ random variable, and the random variable
\[
\sum_{i=1}^{M}(X_i - C_i\mu_0)^T\left(I_p - \frac{1}{p}J_p\right)(X_i - C_i\mu_0) + \sum_{j=1}^{N}(Y_j - D_j\mu_0)^T\left(I_p - \frac{1}{p}J_p\right)(Y_j - D_j\mu_0)
\]
is distributed as a chi-square random variable with $(M+N)(p-1)$ degrees of freedom times the constant $\sigma^2(1-\rho)$. Hence, from the decomposition
\[
\left(B_1^{(0)} + E_1^{(0)}\right) - \frac{1}{p}\left(B_2^{(0)} + E_2^{(0)}\right)
= \sum_{i=1}^{M}(X_i - C_i\mu_0)^T\left(I_p - \frac{1}{p}J_p\right)(X_i - C_i\mu_0) + \sum_{j=1}^{N}(Y_j - D_j\mu_0)^T\left(I_p - \frac{1}{p}J_p\right)(Y_j - D_j\mu_0)
- (\hat{\mu}_0 - \mu_0)^T\left(\sum_{i=1}^{M} C_i^T C_i + \sum_{j=1}^{N} D_j^T D_j\right)\left(I_p - \frac{1}{p}J_p\right)(\hat{\mu}_0 - \mu_0), \tag{3.8}
\]
together with the result on sums of independent chi-square random variables, it follows that $\left(B_1^{(0)} + E_1^{(0)}\right) - \frac{1}{p}\left(B_2^{(0)} + E_2^{(0)}\right)$ is distributed as the random variable $\sigma^2(1-\rho)\chi^2_{(M+N-1)(p-1)}$. The proof is complete.

Proposition 3.2: Under $H_0: \mu_x = \mu_y\ (= \mu_0)$, $B_2^{(0)} + E_2^{(0)}$ is distributed as the quantity $p\sigma^2[1+(p-1)\rho]\chi^2_{M+N-1}$.
Proof: By Appendix A.4,
\[
B_2^{(0)} + E_2^{(0)} = \sum_{i=1}^{M}(X_i - C_i\hat{\mu}_0)^T J_p (X_i - C_i\hat{\mu}_0) + \sum_{j=1}^{N}(Y_j - D_j\hat{\mu}_0)^T J_p (Y_j - D_j\hat{\mu}_0)
\]
\[
= \sum_{i=1}^{M}(X_i - C_i\mu_0)^T J_p (X_i - C_i\mu_0) + \sum_{j=1}^{N}(Y_j - D_j\mu_0)^T J_p (Y_j - D_j\mu_0) - (\hat{\mu}_0 - \mu_0)^T\left(\sum_{i=1}^{M} C_i^T C_i + \sum_{j=1}^{N} D_j^T D_j\right)J_p(\hat{\mu}_0 - \mu_0).
\]
Under $H_0: \mu_x = \mu_y\ (= \mu_0)$, referring to the proof of Proposition 2.4,
\[
\sum_{i=1}^{M}(X_i - C_i\mu_0)^T J_p (X_i - C_i\mu_0) + \sum_{j=1}^{N}(Y_j - D_j\mu_0)^T J_p (Y_j - D_j\mu_0) \overset{d}{=} p\sigma^2[1+(p-1)\rho]\chi^2_{M+N} \tag{3.10}
\]
and
\[
(\hat{\mu}_0 - \mu_0)^T\left(\sum_{i=1}^{M} C_i^T C_i + \sum_{j=1}^{N} D_j^T D_j\right)J_p(\hat{\mu}_0 - \mu_0) \overset{d}{=} p\sigma^2[1+(p-1)\rho]\chi^2_{1}, \tag{3.11}
\]
which imply that $B_2^{(0)} + E_2^{(0)} \overset{d}{=} p\sigma^2[1+(p-1)\rho]\chi^2_{M+N-1}$ by the result on sums of two independent chi-square random variables. The proof is complete.

Proposition 3.3: $\left(B_1^{(a)} + E_1^{(a)}\right) - \frac{1}{p}\left(B_2^{(a)} + E_2^{(a)}\right)$ is distributed as the quantity $\sigma^2(1-\rho)\chi^2_{(M+N-2)(p-1)}$.

Proof: Note that $E(\hat{\mu}_x) = \mu_x$ and $E(\hat{\mu}_y) = \mu_y$. So we have
\[
\left(B_1^{(a)} + E_1^{(a)}\right) - \frac{1}{p}\left(B_2^{(a)} + E_2^{(a)}\right) = \left[B_1^{(a)} - \frac{1}{p}B_2^{(a)}\right] + \left[E_1^{(a)} - \frac{1}{p}E_2^{(a)}\right]
\]
\[
= \sum_{i=1}^{M}(X_i - C_i\hat{\mu}_x)^T\left(I_p - \frac{1}{p}J_p\right)(X_i - C_i\hat{\mu}_x) + \sum_{j=1}^{N}(Y_j - D_j\hat{\mu}_y)^T\left(I_p - \frac{1}{p}J_p\right)(Y_j - D_j\hat{\mu}_y)
\]
\[
= \left[\sum_{i=1}^{M}(X_i - C_i\mu_x)^T\left(I_p - \frac{1}{p}J_p\right)(X_i - C_i\mu_x) - (\hat{\mu}_x - \mu_x)^T\left(\sum_{i=1}^{M} C_i^T C_i\right)\left(I_p - \frac{1}{p}J_p\right)(\hat{\mu}_x - \mu_x)\right]
\]
\[
+ \left[\sum_{j=1}^{N}(Y_j - D_j\mu_y)^T\left(I_p - \frac{1}{p}J_p\right)(Y_j - D_j\mu_y) - (\hat{\mu}_y - \mu_y)^T\left(\sum_{j=1}^{N} D_j^T D_j\right)\left(I_p - \frac{1}{p}J_p\right)(\hat{\mu}_y - \mu_y)\right].
\]
Applying Proposition 2.3 we have that
\[
B_1^{(a)} - \frac{1}{p}B_2^{(a)} \overset{d}{=} \sigma^2(1-\rho)\chi^2_{(M-1)(p-1)} \quad \text{and} \quad E_1^{(a)} - \frac{1}{p}E_2^{(a)} \overset{d}{=} \sigma^2(1-\rho)\chi^2_{(N-1)(p-1)}.
\]
Since $B_1^{(a)} - \frac{1}{p}B_2^{(a)}$ and $E_1^{(a)} - \frac{1}{p}E_2^{(a)}$ are independent, we have
\[
\left(B_1^{(a)} + E_1^{(a)}\right) - \frac{1}{p}\left(B_2^{(a)} + E_2^{(a)}\right) \overset{d}{=} \sigma^2(1-\rho)\chi^2_{(M+N-2)(p-1)}.
\]
The proof is complete.

Proposition 3.4: $B_2^{(a)} + E_2^{(a)}$ is distributed as the random variable $p\sigma^2[1+(p-1)\rho]\chi^2_{M+N-2}$.
Proof:
\[
B_2^{(a)} + E_2^{(a)} = \sum_{i=1}^{M}(X_i - C_i\hat{\mu}_x)^T J_p (X_i - C_i\hat{\mu}_x) + \sum_{j=1}^{N}(Y_j - D_j\hat{\mu}_y)^T J_p (Y_j - D_j\hat{\mu}_y).
\]
Applying Proposition 2.4, $B_2^{(a)} + E_2^{(a)}$ is distributed as the sum of the two independent random variables $p\sigma^2[1+(p-1)\rho]\chi^2_{M-1}$ and $p\sigma^2[1+(p-1)\rho]\chi^2_{N-1}$. Therefore
\[
B_2^{(a)} + E_2^{(a)} \overset{d}{=} p\sigma^2[1+(p-1)\rho]\chi^2_{M+N-2}.
\]
The proof is complete.

Now we arrive at the following theorem.

Theorem 3.2: The likelihood ratio test statistic in Theorem 3.1 for testing $H_0: \mu_x = \mu_y$ is $L$, defined as
\[
L = \frac{\left(B_2^{(a)} + E_2^{(a)}\right)\left[\left(B_1^{(a)} + E_1^{(a)}\right) - \frac{1}{p}\left(B_2^{(a)} + E_2^{(a)}\right)\right]^{p-1}}{\left(B_2^{(0)} + E_2^{(0)}\right)\left[\left(B_1^{(0)} + E_1^{(0)}\right) - \frac{1}{p}\left(B_2^{(0)} + E_2^{(0)}\right)\right]^{p-1}} = \frac{B^{p-1} D}{A^{p-1} C},
\]
where
\[
A = \left(B_1^{(0)} + E_1^{(0)}\right) - \frac{1}{p}\left(B_2^{(0)} + E_2^{(0)}\right), \qquad B = \left(B_1^{(a)} + E_1^{(a)}\right) - \frac{1}{p}\left(B_2^{(a)} + E_2^{(a)}\right),
\]
\[
C = B_2^{(0)} + E_2^{(0)}, \qquad D = B_2^{(a)} + E_2^{(a)}.
\]
(a) $B$ and $D$ are distributed, respectively, as
\[
B \overset{d}{=} \sigma^2(1-\rho)\chi^2_{(M+N-2)(p-1)} \quad \text{and} \quad D \overset{d}{=} p\sigma^2[1+(p-1)\rho]\chi^2_{M+N-2}.
\]
Under $H_0: \mu_x = \mu_y$, $A$ and $C$ are distributed, respectively, as
\[
A \overset{d}{=} \sigma^2(1-\rho)\chi^2_{(M+N-1)(p-1)} \quad \text{and} \quad C \overset{d}{=} p\sigma^2[1+(p-1)\rho]\chi^2_{M+N-1}.
\]
(b) $A-B$, $B$, $C-D$, and $D$ are mutually independent weighted chi-square random variables.
(c) Furthermore, under $H_0: \mu_x = \mu_y$, $L$ is distributed as the random variable
\[
\left[1 + \frac{1}{M+N-2}F^{*}\right]^{-(p-1)}\left[1 + \frac{1}{M+N-2}F^{**}\right]^{-1},
\]
where $F^{*}$ and $F^{**}$ are independent and distributed like $F_{p-1,\,(M+N-2)(p-1)}$ and $F_{1,\,M+N-2}$, respectively.

Proof of (a): The results are obtained directly from Propositions 3.1 to 3.4.

Proof of (b) and (c): First rewrite $A$ and $C$ as follows. $A$ can be expressed as
\[
A = \sum_{i=1}^{M}(X_i - C_i\hat{\mu}_x)^T\left(I_p - \frac{1}{p}J_p\right)(X_i - C_i\hat{\mu}_x) + \sum_{j=1}^{N}(Y_j - D_j\hat{\mu}_y)^T\left(I_p - \frac{1}{p}J_p\right)(Y_j - D_j\hat{\mu}_y) + R = B + R,
\]
where $B$ and $R$ are, respectively,
\[
B = \sum_{i=1}^{M}(X_i - C_i\hat{\mu}_x)^T\left(I_p - \frac{1}{p}J_p\right)(X_i - C_i\hat{\mu}_x) + \sum_{j=1}^{N}(Y_j - D_j\hat{\mu}_y)^T\left(I_p - \frac{1}{p}J_p\right)(Y_j - D_j\hat{\mu}_y)
\]
and
\[
R = (\hat{\mu}_x - \hat{\mu}_0)^T\left(\sum_{i=1}^{M} C_i^T C_i\right)\left(I_p - \frac{1}{p}J_p\right)(\hat{\mu}_x - \hat{\mu}_0) + (\hat{\mu}_y - \hat{\mu}_0)^T\left(\sum_{j=1}^{N} D_j^T D_j\right)\left(I_p - \frac{1}{p}J_p\right)(\hat{\mu}_y - \hat{\mu}_0). \tag{3.12}
\]
Similarly, $C$ can be expressed as $C = D + S$, where
\[
D = \sum_{i=1}^{M}(X_i - C_i\hat{\mu}_x)^T J_p (X_i - C_i\hat{\mu}_x) + \sum_{j=1}^{N}(Y_j - D_j\hat{\mu}_y)^T J_p (Y_j - D_j\hat{\mu}_y)
\]
and
\[
S = (\hat{\mu}_x - \hat{\mu}_0)^T\left(\sum_{i=1}^{M} C_i^T C_i\right)J_p(\hat{\mu}_x - \hat{\mu}_0) + (\hat{\mu}_y - \hat{\mu}_0)^T\left(\sum_{j=1}^{N} D_j^T D_j\right)J_p(\hat{\mu}_y - \hat{\mu}_0). \tag{3.13}
\]
Some other facts necessary to prove (b) are stated below.
(1) $B$ and $R$ are independent.
(2) $D$ and $S$ are independent.
(3) $R \overset{d}{=} \sigma^2(1-\rho)\chi^2_{p-1}$ and $S \overset{d}{=} p\sigma^2[1+(p-1)\rho]\chi^2_{1}$.
(4) $B$ and $D$ are independent.
(5) $B$ and $S$ are independent.
(6) $R$ and $D$ are independent.
(7) $R$ and $S$ are independent.
Facts (1) and (2) are true because both $B$ and $D$ are functions of $X_i - C_i\hat{\mu}_x$ and $Y_j - D_j\hat{\mu}_y$ for all $i = 1, \ldots, M$, $j = 1, \ldots, N$, while $R$ and $S$ are functions of $\hat{\mu}_x$ and $\hat{\mu}_y$, since $\hat{\mu}_0$ in (3.2) can be expressed as a linear combination of $\hat{\mu}_x$ and $\hat{\mu}_y$ as follows:
\[
\hat{\mu}_0 = (C^{*} + D^{*})^{-1}\left[C^{*}\hat{\mu}_x + D^{*}\hat{\mu}_y\right], \tag{3.14}
\]
where $C^{*} = \sum_{i=1}^{M} C_i^T C_i$, $D^{*} = \sum_{j=1}^{N} D_j^T D_j$, $\hat{\mu}_x = (C^{*})^{-1}\sum_{i=1}^{M} C_i^T X_i$, and $\hat{\mu}_y = (D^{*})^{-1}\sum_{j=1}^{N} D_j^T Y_j$. Combining the facts that $X_i - C_i\hat{\mu}_x$ and $\hat{\mu}_x$ are independent, and that $Y_j - D_j\hat{\mu}_y$ and $\hat{\mu}_y$ are independent, Facts (1) and (2) are shown.
Fact (3) can be shown using the results in part (a) in conjunction with Facts (1) and (2), and the result about sums of independent chi-square random variables. More precisely, the results $A \overset{d}{=} \sigma^2(1-\rho)\chi^2_{(M+N-1)(p-1)}$ and $B \overset{d}{=} \sigma^2(1-\rho)\chi^2_{(M+N-2)(p-1)}$ combined with Fact (1) imply $R \overset{d}{=} \sigma^2(1-\rho)\chi^2_{p-1}$. In addition, the results $C \overset{d}{=} p\sigma^2[1+(p-1)\rho]\chi^2_{M+N-1}$ and $D \overset{d}{=} p\sigma^2[1+(p-1)\rho]\chi^2_{M+N-2}$ in connection with Fact (2) imply $S \overset{d}{=} p\sigma^2[1+(p-1)\rho]\chi^2_{1}$.
Fact (4) can be shown by applying Proposition 2.5: $B_1^{(a)} - \frac{1}{p}B_2^{(a)}$ and $B_2^{(a)}$ are independent, and $E_1^{(a)} - \frac{1}{p}E_2^{(a)}$ and $E_2^{(a)}$ are independent as well. As a matter of fact, $B_1^{(a)} - \frac{1}{p}B_2^{(a)}$, $B_2^{(a)}$, $E_1^{(a)} - \frac{1}{p}E_2^{(a)}$, and $E_2^{(a)}$ are mutually independent, so Fact (4) is shown.
Facts (5) and (6) are true by the same argument used when Facts (1) and (2) were shown.
To show Fact (7), it is necessary to rewrite $R$ and $S$ in (3.12) and (3.13), respectively. In (3.12) the two terms on the right-hand side can be expressed, respectively, as
\[
(\hat{\mu}_x - \hat{\mu}_0)^T C^{*}\left(I_p - \frac{1}{p}J_p\right)(\hat{\mu}_x - \hat{\mu}_0)
= (\hat{\mu}_x - \mu_0)^T C^{*}\left(I_p - \frac{1}{p}J_p\right)(\hat{\mu}_x - \mu_0) - 2(\hat{\mu}_x - \hat{\mu}_0)^T C^{*}\left(I_p - \frac{1}{p}J_p\right)(\hat{\mu}_0 - \mu_0) - (\hat{\mu}_0 - \mu_0)^T C^{*}\left(I_p - \frac{1}{p}J_p\right)(\hat{\mu}_0 - \mu_0)
\]
and
\[
(\hat{\mu}_y - \hat{\mu}_0)^T D^{*}\left(I_p - \frac{1}{p}J_p\right)(\hat{\mu}_y - \hat{\mu}_0)
= (\hat{\mu}_y - \mu_0)^T D^{*}\left(I_p - \frac{1}{p}J_p\right)(\hat{\mu}_y - \mu_0) - 2(\hat{\mu}_y - \hat{\mu}_0)^T D^{*}\left(I_p - \frac{1}{p}J_p\right)(\hat{\mu}_0 - \mu_0) - (\hat{\mu}_0 - \mu_0)^T D^{*}\left(I_p - \frac{1}{p}J_p\right)(\hat{\mu}_0 - \mu_0).
\]
We should note that
\[
(\hat{\mu}_x - \hat{\mu}_0)^T C^{*}\left(I_p - \frac{1}{p}J_p\right)(\hat{\mu}_0 - \mu_0) + (\hat{\mu}_y - \hat{\mu}_0)^T D^{*}\left(I_p - \frac{1}{p}J_p\right)(\hat{\mu}_0 - \mu_0) = 0
\]
by substituting (3.14) into the left-hand side of the above equation. Therefore, $R$ becomes
\[
R = (\hat{\mu}_x - \mu_0)^T C^{*}\left(I_p - \frac{1}{p}J_p\right)(\hat{\mu}_x - \mu_0) + (\hat{\mu}_y - \mu_0)^T D^{*}\left(I_p - \frac{1}{p}J_p\right)(\hat{\mu}_y - \mu_0) - (\hat{\mu}_0 - \mu_0)^T (C^{*} + D^{*})\left(I_p - \frac{1}{p}J_p\right)(\hat{\mu}_0 - \mu_0). \tag{3.15}
\]
Likewise, $S$ can be written as
\[
S = (\hat{\mu}_x - \mu_0)^T C^{*} J_p (\hat{\mu}_x - \mu_0) + (\hat{\mu}_y - \mu_0)^T D^{*} J_p (\hat{\mu}_y - \mu_0) - (\hat{\mu}_0 - \mu_0)^T (C^{*} + D^{*}) J_p (\hat{\mu}_0 - \mu_0). \tag{3.16}
\]
Since $\hat{\mu}_0 - \mu_0$ can be written as
\[
\hat{\mu}_0 - \mu_0 = (C^{*} + D^{*})^{-1}\left[C^{*}\hat{\mu}_x + D^{*}\hat{\mu}_y\right] - \mu_0 = (C^{*} + D^{*})^{-1}\left[C^{*}(\hat{\mu}_x - \mu_0) + D^{*}(\hat{\mu}_y - \mu_0)\right],
\]
the last
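The null distribution in Theorem 3.2(c) can be checked numerically. The following Monte Carlo sketch (not part of the dissertation) simulates the statistic $L$ under $H_0$, assuming identity design matrices $C_i = D_j = I_p$ and hypothetical values of $p$, $M$, $N$, $\sigma^2$, and $\rho$, and compares its distribution with the representation $[1 + F^{*}/(M+N-2)]^{-(p-1)}[1 + F^{**}/(M+N-2)]^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, M, N = 3, 10, 12
sigma2, rho = 1.5, 0.4                          # hypothetical parameter values
Sigma = sigma2 * ((1 - rho) * np.eye(p) + rho * np.ones((p, p)))

def lr_stat(X, Y):
    """L = B^(p-1) D / (A^(p-1) C) of Theorem 3.2, with C_i = D_j = I_p."""
    m, n = X.shape[0], Y.shape[0]
    mux, muy = X.mean(axis=0), Y.mean(axis=0)
    mu0 = (m * mux + n * muy) / (m + n)          # pooled MLE, eq. (3.2)
    b1 = lambda r: np.sum(r ** 2)                # sum_i r_i^T r_i
    b2 = lambda r: np.sum(r.sum(axis=1) ** 2)    # sum_i r_i^T J_p r_i
    r0 = [X - mu0, Y - mu0]                      # residuals under H_0
    ra = [X - mux, Y - muy]                      # residuals under H_a
    A = sum(b1(r) for r in r0) - sum(b2(r) for r in r0) / p
    B = sum(b1(r) for r in ra) - sum(b2(r) for r in ra) / p
    C = sum(b2(r) for r in r0)
    D = sum(b2(r) for r in ra)
    return (B ** (p - 1) * D) / (A ** (p - 1) * C)

reps, mu = 4000, np.zeros(p)
Ls = np.array([lr_stat(rng.multivariate_normal(mu, Sigma, M),
                       rng.multivariate_normal(mu, Sigma, N))
               for _ in range(reps)])

# Theorem 3.2(c) representation via two independent F variates
Fs = rng.f(p - 1, (M + N - 2) * (p - 1), reps)
Fss = rng.f(1, M + N - 2, reps)
Lf = (1 + Fs / (M + N - 2)) ** (-(p - 1)) * (1 + Fss / (M + N - 2)) ** (-1)
```

If the theorem holds, the empirical distribution of `Ls` should agree closely with that of `Lf`, e.g. in their means and lower quantiles, which is what a critical value $C$ for the test would be read from.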


