



ADVANCED RETINAL IMAGING: FEATURE EXTRACTION, 2D REGISTRATION, AND 3D RECONSTRUCTION

By THITIPORN CHANWIMALUANG
Bachelor of Engineering, Chulalongkorn University, Bangkok, Thailand, 1995
Master of Science, Pennsylvania State University, State College, PA, 2001

Submitted to the Faculty of the Graduate College of Oklahoma State University in partial fulfillment of the requirements for the Degree of DOCTOR OF PHILOSOPHY, December, 2006

COPYRIGHT © By THITIPORN CHANWIMALUANG, December, 2006

Dissertation Approved:
Guoliang Fan, Ph.D., Dissertation Advisor
Gary Yen, Ph.D.
Daqing Piao, Ph.D.
Sundar Madihally, Ph.D.
Gordon Emslie, Ph.D., Dean of the Graduate College

ACKNOWLEDGMENTS

First, I would like to express my sincere thanks to my advisor, Professor Guoliang Fan, for his guidance and support throughout my Ph.D. study. He has taught me not only technical knowledge but also attitudes toward research. I am also grateful to Professor Gary Yen, who is my committee chair and a co-PI on this project. I would like to thank Professor Daqing Piao and Professor Sundar Madihally for serving on my dissertation committee. My thanks also go to Professor Stephen R. Fransen, an ophthalmologist at the University of Oklahoma Health Sciences Center, for providing ETDRS retinal images and his medical advice. I also want to thank the current and previous members of my research group, the Visual Computing and Image Processing Laboratory (VCIPL), for their help and friendship. Most important of all, I want to express my deepest gratitude to my family: my parents, my sisters, and my grandmother, who have always had faith in me. I cannot thank them enough for their unconditional love, support, patience, and understanding.
To my grandmother, who is 81 years old, always concerned about my health, and wishing to see me come home with my success: grandma, I will come home soon, and you are part of my success. To my parents, who have taught me honesty, perseverance, and merit: you are my good examples and part of my accomplishment. To my sisters, who are willing to listen to every one of my silly complaints: you are my encouragement.

This work was supported by an OHRS award for project number HR0333 from the Oklahoma Center for the Advancement of Science and Technology (OCAST).

TABLE OF CONTENTS

1 Introduction
1.1 Motivations and Objectives
1.2 Significance and Background
1.3 NIH's ETDRS Protocols
1.4 Technical Challenges
1.5 Research Flow
1.6 Original Contributions
1.6.1 Feature Extraction and Correspondence Selection
1.6.2 2D Retinal Image Registration
1.6.3 3D Retinal Surface Reconstruction
1.7 Outline
2 Literature Reviews
2.1 Overview of Retinal Image Analysis Research
2.2 Structure Segmentation
2.2.1 Anatomical Structure Segmentation
2.2.2 Pathological Structure Segmentation
2.3 2D Image Registration
2.3.1 View-Based Registration
2.3.2 Temporal-Based Registration
2.3.3 Modal-Based Registration
2.4 3D Reconstruction
2.4.1 3D Surface Reconstruction
2.4.2 Local Depth Reconstruction
2.5 Image Quality Assessment (IQA)
2.6 Image Classification
3 Motion Parameters and Motion Estimation
3.1 Introduction
3.2 Transformations of 2D/2D
3.3 Transformations of 3D/3D
3.4 Transformations of 3D/2D
3.4.1 Image Formation
3.4.2 Camera Models
3.5 Affine Structure from Motion
3.5.1 Two-View Geometry
3.5.2 Multiple-View Geometry
3.6 Conclusions
4 Feature Extraction
4.1 Introduction
4.1.1 Entropy Thresholding
4.1.2 Blood Vessel Extraction
4.2 Preliminaries
4.3 Proposed Framework
4.3.1 Vascular Tree Extraction
4.3.2 Bifurcation/Crossover Detection
4.4 Experimental Analysis
4.4.1 Thresholding Algorithm
4.4.2 Blood Vessel Extraction Evaluation
4.5 Conclusions
5 2D Retinal Image Registration
5.1 Introduction
5.1.1 Literature Reviews
5.1.2 Methodology
5.2 Preliminaries
5.2.1 NIH ETDRS Protocol
5.2.2 Image Quality Assessment (IQA): Field Definition
5.2.3 Global and Local Entropy
5.2.4 Area-Based Retinal Image Registration
5.2.5 Feature-Based Retinal Image Registration
5.3 Proposed Framework
5.3.1 Vascular Tree Extraction
5.3.2 Translation Estimation
5.3.3 Affine/Quadratic Transformations
5.4 Experimental Analysis
5.4.1 Vascular Tree Extraction
5.4.2 IQA Simulation
5.4.3 Registration Performance Evaluation
5.4.4 Discussions of the Proposed Algorithm
5.5 Conclusions
6 3D Retinal Surface Reconstruction
6.1 Introduction
6.2 Affine Camera for 3D Retinal Surface Reconstruction
6.3 Correspondence Selection
6.4 Proposed Framework
6.4.1 Lens Distortion Removal
6.4.2 Initial Retinal Affine Surface
6.4.3 Affine Bundle Adjustment
6.4.4 Euclidean Reconstruction of Retinal Surface
6.4.5 Point-Based Surface Approximation
6.5 Experimental Analysis
6.5.1 Surface Approximation on Synthesized Data
6.5.2 Surface Approximation on Retinal Images
6.6 Conclusions
7 Constrained Optimization for 3D Surface Reconstruction
7.1 Introduction
7.2 Related Works
7.3 Constrained Optimization for 3D Reconstruction
7.3.1 Constrained Affine Bundle Adjustment (CABA)
7.3.2 Affine Bundle Adjustment with Lens Distortion Update
7.3.3 Constrained Affine Bundle Adjustment with Lens Distortion Update (CABA-LDU)
7.4 Experimental Analysis
7.4.1 Surface Approximation on Synthesized Data
7.4.2 Surface Approximation on Retinal Images
7.4.3 More Discussions
7.5 Conclusions
8 Conclusions and Future Works
BIBLIOGRAPHY

LIST OF TABLES

3.1 2D Transformation Models
3.2 3D Transformation Models
4.1 Threshold values of three different approaches for twenty retinal images
5.1 Field Coverage Specification
5.2 The Transformation Models for 2D Retinal Registration
5.3 The energy concentration of binary image-based ECC vs. the logical operation XOR
5.4 Ideal vertical/horizontal displacements
5.5 The registration error and overlaps between fields

LIST OF FIGURES

1.1 Human retina.
1.2 Diabetic retinopathy. (a) Normal vision. (b) The same scene viewed by a person with diabetic retinopathy.
1.3 Different stages of diabetic retinopathy. (a) An example of a normal retinal image. (b) A retinal image with microaneurysms. (c) Proliferative diabetic retinopathy with fragile newly grown blood vessels. (d) A retinal image with some types of scar tissue. (http://www.inoveon.com)
1.4 Architecture of the proposed research.
1.5 Outline of the dissertation.
2.1 Anatomical structures in a human retina.
2.2 View-based registration. See a more detailed discussion in Chapter 5.
2.3 Temporal-based registration [1].
2.4 Modal-based registration [2].
2.5 3D retinal surface reconstruction [3].
2.6 3D retinal surface reconstruction [4].
2.7 Optic nerve's depth reconstruction [5].
2.8 Overview of retinal image analysis research. The gray-shaded boxes refer to the topics involved in this work.
3.1 Results from 2D coordinate transformation.
3.2 Results from 3D coordinate transformation.
3.3 Pinhole camera geometry. The camera coordinate system is aligned with the world coordinate system.
3.4 One-dimensional image formation. Orthographic projection is along the z-axis. Perspective projection is along the principal-ray direction. Weak perspective is a combination of perspective and orthographic: a point is first projected along the z-axis to a plane Z = d, followed by perspective projection from that plane.
3.5 The concatenated image (CI) space.
4.1 The gray-level profile of the cross section of a blood vessel.
4.2 Illustration of 12 matched filter kernels along different directions, where σ = 2.0.
4.3 (a) The computation of the new co-occurrence matrix. (b) The original co-occurrence matrix in a normalized log scale (261.63). (c) The modified co-occurrence matrix in a normalized log scale (9.00).
4.4 Four quadrants of a co-occurrence matrix.
4.5 (a) An original retinal image. (b) Matched filtering result. (c) Local entropy thresholding result. (d) Vascular tree.
4.6 (a) One-pixel-wide vascular tree. (b) One-pixel-wide vascular tree with intersections and crossovers overlaid on the gray-scaled image.
4.7 (a) An original image. (b) Our thresholding result (threshold = 101). (c) Local entropy thresholding result (threshold = 75). (d) Cross entropy result (threshold = 112).
4.8 (a) An original image. (b) Our thresholding result (threshold = 136). (c) Local entropy thresholding result (threshold = 98). (d) Cross entropy result (threshold = 253).
4.9 First row: entropy plots of the image shown in Fig. 4.10. Second row: entropy plots of the image shown in Fig. 4.11. (a),(d) Local entropy. (b),(e) Relative entropy. (c),(f) Modified local entropy.
4.10 (a) An original retinal image. (b) A matched filtering result. (c) A ground-truth segmentation. (d) The local entropy thresholding result. (e) The relative entropy thresholding result. (f) The modified local entropy thresholding result.
4.11 (a) An original retinal image. (b) A matched filtering result. (c) A ground-truth segmentation. (d) The local entropy thresholding result. (e) The relative entropy thresholding result. (f) The modified local entropy thresholding result.
4.12 Performance of our approach versus other methods: Staal's algorithm [6, 7], Jiang's algorithm [8], and Hoover's algorithm [9].
4.13 (a),(b) Examples of normal retinal images. (c),(d) Hand-labeled ground-truth images. (e),(f) Our segmentation results.
4.14 (a),(b) Examples of retinal images with obscure blood vessels. (c),(d) Hand-labeled ground-truth images. (e),(f) Our segmentation results.
4.15 (a),(b) Examples of retinal images with lesions. (c),(d) Hand-labeled ground-truth images. (e),(f) Our segmentation results.
5.1 ETDRS seven standard fields (right/left eyes) (http://eyephoto.ophth.wisc.edu/Photograp)
5.2 The flowchart of the proposed algorithm (LPCs: landmark point correspondences; SPCs: sampling point correspondences).
5.3 Vascular tree and centerline extraction of fields 2 and 4 of Julie's retinal images. (a) Match-filtered image: field 2; (b) Match-filtered image: field 4; (c) Binary image: field 2; (d) Binary image: field 4.
5.4 Vascular tree and centerline extraction of fields 2 and 4 of Julie's retinal images. (a) Thinned binary image: field 2; (b) Thinned binary image: field 4; (c) Crossover/bifurcation points: field 2; (d) Crossover/bifurcation points: field 4.
5.5 Sample plots of binary image-based ECC vs. the logical operation XOR at every possible translation in the coarsest scale (downsampled by 16). (a) The binary image-based ECC of Julie's fields 1/2; (b) The binary image-based ECC of Julie's fields 2/3; (c) The logical operation XOR of Julie's fields 1/2; (d) The logical operation XOR of Julie's fields 2/3.
5.6 The overlaps of two possible translation models (left and right columns) that produce similar values of ECC peaks for fields 1/7.
5.7 An example of the sampling process. (a) A grid is placed on the thinned binary vascular tree; (b) The sampling points.
5.8 Hand-labeled ground truths (top) and the extracted vascular trees using the proposed algorithm (bottom) of field 1 (a) and field 2 (b).
5.9 Vertical/horizontal displacements of six ETDRS image pairs.
5.10 The failure rates of translation estimation for six ETDRS field pairs.
5.11 Error comparisons for consecutive transformation models.
5.12 The failure rates of 468 image pairs which proceed from the translation model to the affine and quadratic models based on LPCs only.
5.13 The failure rates (after manual validation) of affine/quadratic models (using both LPCs and SPCs) and the proposed algorithm for six field pairs.
5.14 Two examples of using the quadratic transformation on fields 2/5 and fields 1/6 with insufficient LPCs, where the registration errors are 0.4 and 1.4.
5.15 A translation model-based registration result where the vertical/horizontal displacements in fields 1/7 are depicted. The registration errors are as follows: fields 1/2 = 15.81, fields 1/6 = 22.06, fields 1/7 = 13.83, fields 2/3 = 12.60, fields 2/4 = 21.56, and fields 2/5 = 18.97. The median error is 17.39.
5.16 The final registration result of an ETDRS image set. The quadratic transformation is applied to fields 2/3. The affine model is applied to fields 1/2, 1/6, 1/7, 2/4, and 2/5. The registration errors are as follows: fields 1/2: 2.00, fields 1/6: 0.52, fields 1/7: 1.23, fields 2/3: 1.66, fields 2/4: 2.78, and fields 2/5: 1.49. The median error is 1.58. The sampling points (SPCs) are involved in fields 1/6, 2/3, and 2/4.
5.17 (a) An example of a poor-quality retinal image pair with no LPC in the overlap. (b) An example of a retinal image pair with pathology.
6.1 The structure from motion (SFM) problem.
6.2 The flowchart of the proposed algorithm.
6.3 Retinal images are obtained from a fundus camera, which is composed of an actual camera and a digital camera attached to the fundus camera.
6.4 The sparse Jacobian matrix for affine bundle adjustment, comprising 3 cameras and 4 points.
6.5 The sparse Hessian matrix, comprising 3 cameras and 4 points.
6.6 3D point clouds: (a) Top view. (b) Side view.
6.7 The setup of four synthetic cameras. The point cloud is constructed on a spherical surface with a spreading angle of 90°.
6.8 Four images generated by the four cameras shown in Fig. 6.7.
6.9 The errors of spreading angles (%) versus noise variances. The overall spreading angle of the original synthetic partial sphere is 90°.
6.10 The errors between reconstructed points and the surface approximation versus noise variances.
6.11 3D surface reconstruction on synthetic data. (a) A partial synthetic spherical shape. (b) Affine reconstruction of a partial synthetic spherical shape. (c) Euclidean reconstruction of a partial synthetic spherical shape.
6.12 The errors versus 2D spatial locations of the initial point selection. (a) Without affine bundle adjustment. (b) With affine bundle adjustment.
6.13 (a) An example of a grid pattern. (b) A lens distortion-free grid pattern.
6.14 (a) An example of a retinal image. (b) A lens distortion-free image.
6.15 Two sets of retinal images with marked point correspondences. The first and second rows show stereo pairs of field 1. The third and fourth rows show stereo pairs of field 2.
6.16 The 3D retinal reconstruction results with the retinal image mapped onto sphere surfaces.
6.17 The errors between reconstructed 3D points and the approximated spherical surface without/with radial distortion removal (a) and without/with affine bundle adjustment (with the radial distortion removed) (b).
7.1 Retinal images are obtained from a fundus camera, which is composed of an actual camera and a digital camera attached to the fundus camera [3].
7.2 An original partial sphere with spreading angle = 90°.
7.3 Lens distortion-free synthesized data with noise: the errors between reconstructed points and the surface approximation versus noise variances.
7.4 Lens distortion-free synthesized data with noise: the errors of spreading angles in percentage versus noise variances. The overall spreading angle of the original synthetic partial sphere is 90°.
7.5 Noise-free synthesized data with lens distortion coefficients set to kc = [0.03, 0.08, 0.02, 0.01, 0.02], tested on the CABA-LDU optimization procedure: a plot of the average errors between the approximated surface and the 3D points versus the number of synthesized points.
7.6 Synthesized data with both noise and lens distortion: the errors between reconstructed points and the surface approximation versus noise variances.
7.7 Noise-free synthesized data with lens distortion coefficients kc = [0.03, 0.08, 0.02, 0.01, 0.02]: (a) No optimization. (b) With ABA. (c) With CABA. (d) With CABA-LDU.
7.8 Four images generated by the four synthesized cameras: blue crosses are the original image points and red dots are the image points with noise (variance = 3).
7.9 Four images generated by the four synthesized cameras: blue crosses are the original image points and red dots are the image points with lens distortions (coefficients = [0.2, 0.3, 0.1, 0.1, 0.1]).
7.10 Noise-free synthesized data with lens distortion coefficients set to kc = [0.03, 0.08, 0.02, 0.01, 0.02]: (a) No optimization versus ABA. (b) ABA versus CABA. (c) CABA versus CABA-LDU.
7.11 Retinal images: (a) No optimization versus ABA. (b) ABA versus CABA. (c) CABA versus CABA-LDU.
7.12 The 3D retinal reconstruction results with the retinal image mapped onto sphere surfaces.

CHAPTER 1

Introduction

1.1 Motivations and Objectives

Diabetes is the leading cause of blindness among working-age Americans, and many patients with vision-threatening diabetic retinopathy remain asymptomatic until blindness occurs [10].
The great majority of this blindness can be prevented with proper eye examination and treatment by ophthalmologists, who rely on the results of a randomized clinical trial, the Early Treatment Diabetic Retinopathy Study (ETDRS), to guide their treatment of patients with diabetes [11]. ETDRS requires sets of retinal images to be captured from different fields of an eye. Because ophthalmologists rely on multiple retinal images for disease diagnosis and evaluation, these images need to cover a required area of the retina, and they need to meet the image quality criteria defined by the ETDRS protocol. Each set of fundus photographs should be assessed for quality before the patient leaves the imaging center. A photographer is required to decide whether a particular image set meets the three ETDRS image quality assessment (IQA) requirements: (1) clarity and focus; (2) field definition; and (3) stereo effect. A software tool to standardize and certify image quality is in demand. Several types of visual models, including graphical retinal mapping, 3D retinal surface reconstruction, and 2D retinal registration, can (1) assist ophthalmologists in diagnosing, analyzing, and evaluating the disease; (2) facilitate clinical studies; and (3) be used as a spatial map during laser surgical procedures. Furthermore, much respected research in psychology has observed that a person can always put a smile on his or her face, but the eyes usually reveal his or her true feelings. We agree with this statement completely, considering that the eyes are the only place where we can, without any invasive medical techniques, directly see inside a human body. We can observe, in real time, the retina's blood vessels, which are significant indicators of not only eye-related diseases but various other diseases as well. In this work, advanced retinal imaging approaches have been developed according to the aforementioned needs, as follows.
Objective 1: To develop an efficient feature extraction algorithm that can effectively segment blood vessels and select reliable point correspondences for later processing.

Objective 2: To develop a robust 2D retinal image registration algorithm that can register the seven ETDRS retinal images together and perform image quality assessment of field coverage.

Objective 3: To study an accurate 3D retinal surface reconstruction algorithm that can recover the 3D shape of the retina from ETDRS retinal images.

Figure 1.1: Human retina.

1.2 Significance and Background

Diabetic retinopathy is a complication of diabetes [12]. The vision of a person with diabetic retinopathy is shown in Fig. 1.2(b). The disease affects the blood vessels inside the retina, the area lying at the back of the eyeball, as shown in Fig. 1.1. In accordance with [12, 11, 13], in the earliest stage of the disease the tiny blood vessels, or capillaries, become thinner and weaker, and eventually they leak blood (microaneurysms), as illustrated in Fig. 1.3(b). A patient's sight at this stage is still good, but an ophthalmologist can detect and note the abnormalities in the retina. As the disease progresses, some blood vessels become blocked. This triggers the retina to grow new blood vessels, which are abnormal, fragile, and bleed easily, as shown in Fig. 1.3(c). In the later stage of the disease, new blood vessels continue to grow, as does scar tissue, as shown in Fig. 1.3(d). Ultimately, the retina may become detached from the eye. The National Eye Institute (NEI) recommends that everyone with diabetes have a comprehensive eye exam at least once a year, because diabetic retinopathy has no early warning symptoms or signs. According to [14], there are 20.8 million people in the United States, or 7% of the population, who have diabetes. While an estimated 14.6 million have been diagnosed with diabetes, unfortunately 6.2 million people (nearly one-third) are unaware that they have the disease.
Failure to undergo universally recommended annual eye examinations is the primary cause of this continued loss of sight. If detected early, the majority of severe vision loss from diabetic retinopathy can be prevented with proper examination and treatment by ophthalmologists. Ophthalmologists primarily rely on the results of the randomized clinical studies called ETDRS to guide their treatment of patients with diabetes.

Figure 1.2: Diabetic retinopathy. (a) Normal vision. (b) The same scene viewed by a person with diabetic retinopathy.

1.3 NIH's ETDRS Protocols

The Early Treatment Diabetic Retinopathy Study (ETDRS) implemented standardized retinal imaging, classification, and severity staging for diabetic retinopathy, as well as proving the therapeutic benefit of laser photocoagulation surgery in preventing vision loss [15]. This multicenter, randomized clinical trial was designed to evaluate the treatment of patients with nonproliferative or early proliferative diabetic retinopathy. A total of 3,711 patients were recruited and followed for a minimum of 4 years to provide long-term information on the risks and benefits of the treatments under study. The study demonstrated a statistically significant reduction in severe visual loss for eyes with early treatment [11]. The ETDRS also developed an internationally recognized disease severity scale indicating the risk for diabetic retinopathy [16]. ETDRS protocols have become the "gold standard" for evaluating diabetic retinopathy [17] and diabetic macular edema [18].

1.4 Technical Challenges

Figure 1.3: Different stages of diabetic retinopathy. (a) An example of a normal retinal image. (b) A retinal image with microaneurysms. (c) Proliferative diabetic retinopathy with fragile newly grown blood vessels. (d) A retinal image with some types of scar tissue. (http://www.inoveon.com)

In this work, we have addressed the following retinal image analysis issues: (1) image quality assessment (IQA) in terms of field coverage, which can help in grading and support the grader training processes; (2) several types of visual models, i.e., a graphical retinal map, 2D retinal registration, and a 3D reconstructed retinal surface, which can greatly assist ophthalmologists in diagnosing and evaluating the disease, significantly facilitate clinical studies, and assist in laser surgical procedures by serving as spatial maps. The technical challenges associated with retinal image analysis are:

Feature extraction and correspondence selection: Retinal images may be poorly focused or blurred due to inappropriate image acquisition conditions. Image quality variability is the major challenge for feature extraction in retinal images. Specifically:
- Poor lighting conditions can cause glare or introduce artificial effects in retinal images.
- Retinal images are usually dominated by a homogeneous red color with different shades, so background and foreground are difficult to distinguish.
- Nonuniform illumination leads to inconsistent intensity of both blood vessels and background, within an image and across images.

2D retinal image registration: In addition to the challenges for feature extraction and correspondence selection, there are further difficulties for ETDRS-based retinal image registration:
- Large homogeneous or textureless areas in retinal images complicate the registration process, providing insufficient information for area-based approaches and inadequate features for feature-based approaches.
- The overlaps between multi-field images are relatively small. This presents another major difficulty: it is not reliable to estimate a transformation for whole images based on small overlap regions.
- The curved retinal surface and the camera motion across ETDRS fields require high-order nonlinear transformations for image registration.
3D retinal surface reconstruction: The 2D registration results are the prerequisite for 3D retinal surface reconstruction, where we face the same challenges as in 2D registration, as well as some new problems:
- The fundus camera parameters are unknown. Camera calibration has to be done prior to 3D reconstruction.
- The virtual lens effect from the human cornea and the lens distortion from the fundus camera have to be taken into account for 3D reconstruction.

1.5 Research Flow

In order to give readers a better understanding of the materials and subjects investigated in this work, we use this section to present the general ideas covered in this dissertation. The architecture of our system can be decomposed into a three-layer hierarchy, as illustrated in Fig. 1.4.

Figure 1.4: Architecture of the proposed research.

Feature correspondences provide input information for both 2D retinal image registration and 3D retinal surface reconstruction. In the reverse direction, knowledge and information regarding displacement, structure, and motion further improve correspondence accuracy. A similar relationship exists between 2D registration and 3D surface reconstruction: displacements from the 2D model supply additional input information for 3D surface reconstruction, while on the other hand the geometric shape can considerably improve the 2D registration results. In the 2D registration layer, three
Although the most relaxed solution in projective space is projective structure, we start with affine structure, because we use an affine camera. Once the affine structure is obtained, affine bundle adjustment is employed to jointly refine all parameters. Then, extra metric information is used to upgrade the structure to Euclidean structure. A brief explanation of each layer is given next.

Feature Extraction and Correspondence Selection: The first problem can be divided into two subproblems, feature extraction and feature matching/correspondence. Let us start with the first subproblem, feature extraction. We extract features because they reduce the amount of data to be processed. Since "not all information is created equal" [19], what are the most suitable features for matching? Points are used most of the time. Lines or blobs are also good features, since they provide more information than points. Good features should contain as much distinguishable information as possible. The next questions are: what are the best approaches to extract features, and, once features are obtained, how do we match them across images? Although there are studies of image correlation [20, 21, 22], feature extraction and correspondence are largely image-dependent problems: certain algorithms are good for some specific types of images but perform poorly on others.

2D Image Registration: Registration is the problem of how to make two or more images coincide. The two images are often taken at different times, viewpoints, modes, or resolutions. Additionally, the image plane and the world plane are often not parallel. Hence, it is impossible to simply overlay two images. To register two images, an "optimal" transformation model has to be identified. Numerous algorithms have been proposed on this topic.
These methods differ in many aspects: (1) feature-based methods versus area-based methods; (2) batch methods versus RANSAC-like methods; (3) low-level methods (e.g., optical flow, autocorrelation) versus shape-based methods (e.g., template matching); (4) spatial domain versus frequency domain.

3D Surface Reconstruction: Visual reconstruction is the process of recovering a 3D scene or model from multiple images. It is usually referred to as the structure-from-motion (SFM) problem. The process typically recovers objects' 3D shapes, the cameras' poses (positions and orientations), and the cameras' internal parameters (focal lengths, principal points, and skew factors). Many camera models exist. Perspective projection is the standard; however, other projections, e.g. affine and orthographic, sometimes prove more useful and practical for a distant camera. The main difference between projections is the required level of calibration. The 3D reconstruction problem has been extensively studied and is currently a very active research topic.

1.6 Original Contributions

Before the outline of each chapter is given, we summarize the main contributions of this work. This dissertation is based on a journal paper and four conference papers. There were specific motivations and contributions at the time each topic was investigated, and the specific nature of retinal images led to novel solutions for each problem.

1.6.1 Feature Extraction and Correspondence Selection

The vascular tree and its bifurcation/crossover points are used as features and correspondences, for two reasons: (1) the vascular tree spans the whole retina, hence it exists in every retinal image; (2) bifurcation/crossover points offer more distinguishable information than the homogeneous areas throughout the retina.
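For illustration, once a vessel map has been thinned to a one-pixel-wide skeleton, candidate bifurcations/crossovers can be flagged by a simple neighbour count. This is a minimal sketch, not the window-based probing approach used later in this work; the function name and the three-neighbour rule are illustrative:

```python
import numpy as np

def junction_candidates(skeleton):
    """Flag skeleton pixels with three or more skeleton neighbours.

    `skeleton` is a 2D 0/1 array (one-pixel-wide vessel map). A pixel on a
    straight segment has two neighbours; a bifurcation or crossover has three
    or more. Borders are assumed empty, so the np.roll wrap-around is harmless.
    """
    s = skeleton.astype(np.uint8)
    # Count the 8-neighbours of every pixel by summing shifted copies.
    neighbours = sum(
        np.roll(np.roll(s, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    return (s == 1) & (neighbours >= 3)
```

On a real skeleton this over-fires near junctions (pixels diagonally adjacent to a branch also see three neighbours), which is one reason a window-based probing step is still needed to consolidate detections.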
Because of the non-uniform intensity of both background and foreground, as well as the low contrast, multidirectional matched filters are employed to enhance the contrast of blood vessels against the background. Next, a new definition of the co-occurrence matrix is proposed. Then, a new thresholding method based on the idea of local entropy is introduced to extract blood vessels from the background. Because all blood vessels should be connected, length filtering is used to remove small separated regions. Finally, bifurcation/crossover points are detected by a morphological thinning operation and a window-based probing approach. The simulation results suggest that the proposed framework is efficient and robust. Additionally, our false positive rates are lower than those of other computationally expensive techniques, while the true positive rates are comparable.

A new co-occurrence matrix definition: The new definition takes into account both the image's spatial structure and its noise. Simulation results on several matched-filter images demonstrate good performance in terms of robustness and accuracy.

A new thresholding algorithm: The proposed equation slightly resembles both the local entropy thresholding and the relative entropy (or cross entropy) thresholding methods. The simulation results on various matched-filter images exhibit better performance in terms of robustness and accuracy.

1.6.2 2D Retinal Image Registration

Because area-based and feature-based methods have their own strengths and limitations, in this work we combine both registration approaches, together with other decision-making criteria, to obtain the advantages each method offers and reach the optimal solution. To achieve robustness and efficiency, a hierarchical technique (translation, affine, and quadratic) is incorporated. Because of the non-uniform intensity within an image, binary mutual information is proposed for translation estimation.
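As a sketch of this idea (illustrative code, not this dissertation's implementation, which additionally uses a multiscale search), binary mutual information compares two thresholded vessel masks over their overlap, and the translation is taken as the integer shift that maximizes it:

```python
import numpy as np

def binary_mi(a, b):
    """Mutual information (in bits) between two equal-size binary masks."""
    a, b = a.astype(bool).ravel(), b.astype(bool).ravel()
    mi = 0.0
    for va in (False, True):
        for vb in (False, True):
            p_ab = np.mean((a == va) & (b == vb))   # joint probability
            p_a, p_b = np.mean(a == va), np.mean(b == vb)
            if p_ab > 0:
                mi += p_ab * np.log2(p_ab / (p_a * p_b))
    return mi

def estimate_translation(fixed, moving, search=8):
    """Exhaustive search for the shift (dy, dx) maximizing binary MI,
    computed only over the overlapping region of the two masks."""
    h, w = fixed.shape
    best, best_mi = (0, 0), -np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            f = fixed[max(0, dy):min(h, h + dy), max(0, dx):min(w, w + dx)]
            m = moving[max(0, -dy):min(h, h - dy), max(0, -dx):min(w, w - dx)]
            if f.size == 0:
                continue
            mi = binary_mi(f, m)
            if mi > best_mi:
                best_mi, best = mi, (dy, dx)
    return best
```

Because the masks are binary, the joint histogram has only four cells, which makes the measure cheap and insensitive to the intensity inconsistencies that plague gray-scale similarity measures.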
The binary measure demonstrates better robustness than traditional gray-scale mutual information. In addition, a multiscale searching strategy is applied to avoid a large combinatorial search space. Sampling point correspondences are introduced when bifurcation/crossover points are inadequate. An iterative closest point algorithm is used to refine the feature points as well as the transformation models. Furthermore, the two parameters characterizing the displacements along the vertical and horizontal directions in the translation model, which indicate the relative positions of each field, can be used for image quality assessment (IQA) with regard to field-coverage definition.

A hybrid registration method: We combine both area-based and feature-based registration methods to obtain the advantages each method offers. A translation model is estimated through binary mutual information, while higher-order models are approximated through feature-based methods.

Binary mutual information: A binary mutual information measure is proposed. It demonstrates better robustness than traditional gray-scale mutual information.

Empirical conditions: The conditions effectively combine all the techniques into one flow, in which the algorithm is able (1) to adaptively select an appropriate transformation model, (2) to determine whether sampling point correspondences have to be involved, and, more importantly, (3) to reject invalid registered pairs.

Image quality assessment (IQA): We address image quality assessment in terms of field coverage using the two parameters characterizing the displacements along the vertical and horizontal directions in the translation model.

1.6.3 3D Retinal Surface Reconstruction

In this research, we assume a weak-perspective camera for two reasons: (1) the ETDRS imaging standard specifies a 30° field of view for each eye (a narrow field of view); (2) each retinal image has small depth variation.
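Under such a weak-perspective assumption, an initial structure can be recovered by rank-3 factorization of the centred measurement matrix of tracked points, in the spirit of the classical Tomasi–Kanade method (a sketch; the affine factorization actually derived in Chapter 6 may differ in detail):

```python
import numpy as np

def affine_factorization(W):
    """Factor a 2F x N measurement matrix W (F views, N tracked points) into
    affine motion M (2F x 3), shape S (3 x N), and translations t, so that
    W ~= M @ S + t. The decomposition is unique only up to a 3x3 affine
    ambiguity, which metric (Euclidean) constraints must resolve later."""
    t = W.mean(axis=1, keepdims=True)            # per-row image centroids
    U, d, Vt = np.linalg.svd(W - t, full_matrices=False)
    sq = np.sqrt(d[:3])                          # split top-3 singular values
    M = U[:, :3] * sq                            # stacked affine cameras
    S = sq[:, None] * Vt[:3]                     # affine 3D shape
    return M, S, t
```

The appeal of this initialization, as noted in the text, is that it uses all views and all feature points simultaneously; affine bundle adjustment then refines M and S jointly by minimizing reprojection error.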
We derive an affine camera model and give a mathematical proof of the affine condition. Affine structure from motion has been investigated, and an affine factorization method is used for the initial reconstruction, because the approach can accommodate multiple images and make use of all feature points. An affine bundle adjustment based on a nonlinear optimization technique is used to refine the affine shape and affine cameras. Then, a Euclidean constraint is applied to correct both the affine shape and the cameras into Euclidean space up to a similarity. Next, we take into account the eyeball's geometric constraint in order to generate denser points for the surface. We assume that the eyeball is approximately a sphere, and a point-based sphere fitting method is introduced.

The condition for the affine camera: We give a mathematical derivation of the affine camera from a standard linear camera, as well as its condition.

Affine bundle adjustment: Inspired by projective bundle adjustment, we introduce affine bundle adjustment to refine all parameters, affine shape and affine cameras, simultaneously.

A point-based spherical fitting method: We introduce a linear approach to solve a nonlinear surface approximation problem.

Constrained affine bundle adjustment with lens distortion updates: We introduce an optimization process which incorporates a geometrically meaningful definition and lens distortion into a cost function. We propose a constrained optimization algorithm in affine space rather than the traditional Euclidean space. The procedure optimizes all of the parameters, the camera parameters, 3D points, the physical shape of the retinal surface, and the lens distortion, simultaneously.

1.7 Outline

The organization of this dissertation is illustrated in Fig. 1.5. The motivation and significance of this research, as well as relevant materials and subjects, are presented in this chapter.
Chapter 2 reviews the state-of-the-art research on retinal image analysis and categorizes it into five different areas: structure segmentation, image registration, 3D reconstruction, image quality assessment (IQA), and image classification.

Chapter 3 reviews the mathematical background of the proposed research; the materials presented there serve as the foundation of later chapters. Specifically, 2D retinal image registration, addressed in Chapter 5, requires 2D/2D transformations, while 3D retinal reconstruction, discussed in Chapter 6, involves a camera projection (3D/2D transformations) and 3D registration (3D/3D transformations).

Chapter 4 deals with the first layer of the architecture, feature extraction. Features are significantly important in motion estimation techniques because they are the input to the algorithms; however, most existing works neglect this part and assume that features are available. We propose a feature extraction algorithm for retinal images. Bifurcations/crossovers are used as features, and a new thresholding algorithm based on our definition of the co-occurrence matrix is proposed. As a result, the vascular tree, an important structure for indicating many diseases, is also extracted.

Chapter 5 considers 2D retinal image registration, the problem of 2D/2D transformations. Both linear and nonlinear models are incorporated to account for motions and distortions. A hybrid method is introduced in order to take advantage of what different methods have to offer, along with other decision-making criteria. Binary mutual information is proposed for translation estimation, and a hierarchical technique (translation, affine, quadratic) is incorporated.

Chapter 6 addresses 3D retinal surface reconstruction. To generate a 3D scene from 2D images, camera projection (3D/2D transformation) techniques are investigated. We choose an affine camera as the representative model for the fundus camera.
We provide a proof to justify the use of the affine camera. An affine bundle adjustment based on a nonlinear optimization technique is established to refine the affine shape and affine cameras, and a point-based spherical approximation is introduced.

The objective of Chapter 7 is to solve the problem in an optimal way, that is, to estimate the structure and camera parameters simultaneously by minimizing a physically meaningful cost function. An optimization procedure, called constrained affine bundle adjustment with lens distortion updates, is proposed to improve the algorithm's accuracy and robustness.

Chapter 8 states future work and concludes the dissertation.

Figure 1.5: Outline of the dissertation (Chapter 1: Introduction; Chapter 2: Literature Reviews; Chapter 3: Motion Parameters & Estimation, covering 2D/2D, 3D/2D, and 3D/3D transformations; Chapter 4: Feature Extraction; Chapter 5: 2D Registration; Chapter 6: 3D Surface Reconstruction; Chapter 7: Constrained Optimization for 3D Surface Reconstruction; Chapter 8: Conclusions & Future Works).

CHAPTER 2

Literature Reviews

2.1 Overview of Retinal Image Analysis Research

Computer-assisted retinal image analysis can be used for many purposes: (1) helping ophthalmologists diagnose and evaluate eye-related diseases; (2) patient screening and grading of disease severity; (3) facilitating clinical studies; (4) quantifying retinal image quality; and (5) assisting in laser surgical procedures. Numerous techniques have been investigated and developed for retinal image analysis; the objective of this chapter is to categorize, characterize, and review these algorithms. Before we go into more detail, we note that there are several retinal image types, of which two are widely used: (1) normal retinal (fundus) images; and (2) fluorescein angiogram (FA) images.
An FA image is obtained by injecting a special dye, called fluorescein, into a vein in the patient's arm. The dye moves quickly to the blood vessels inside the eyes, so that the blood vessels become more prominent against the background (high contrast). From an image analysis point of view, therefore, FA images are generally less complicated to process than normal fundus images. Here, we categorize retinal image research into five main groups: (1) structure segmentation; (2) 2D registration; (3) 3D reconstruction; (4) image quality assessment (IQA); and (5) image classification. The flowchart is illustrated in Fig. 2.8, where the gray-shaded boxes refer to the topics involved in this work.

2.2 Structure Segmentation

Image segmentation is one of the fundamental problems in computer vision and many other research areas. Many subsequent tasks, e.g., feature extraction, pattern recognition, image retrieval, image registration, image compression, and image classification, rely on the quality of the segmentation process. There is no universal theory as to what is the best approach for image segmentation; segmentation is largely an image-dependent problem. A good segmentation, however, is one in which each segmented region is uniform with respect to some semantic characteristics. For the retinal image case, we classify segmentation into two main categories: (1) anatomical structures, which include the blood vessels, optic nerve, and fovea; (2) pathological structures, i.e., lesions, which are abnormal structures. The goal is to detect and present the location of important structures in the retina, as well as to find correspondences across retinal images. Here, we focus only on blood vessel detection, since our main tasks are registration and reconstruction and blood vessels are used as features.

2.2.1 Anatomical Structure Segmentation

Two significant anatomical structures present in a retina are the blood vessels and the optic nerve, as shown in Fig. 2.1.
Figure 2.1: Anatomical structures in a human retina (optic disc, fovea, blood vessels).

Besides all the medical motivations mentioned above, a segmentation of the vascular tree seems to be the most appropriate representation for image registration applications, for two reasons: (1) the vascular tree spans the whole retina, hence it exists in every retinal image; (2) bifurcation/crossover points offer more distinguishable information than the other, homogeneous areas throughout the retina. A variety of approaches have been proposed for vascular segmentation [9, 8, 6, 23, 24, 25, 26, 27, 28, 29, 30, 31]. Here, we arrange them into two main categories, supervised and unsupervised techniques; the major difference is that supervised techniques require training data. Although, generally speaking, supervised methods should yield better segmentation results because of the additional knowledge in the training database, most of the proposed algorithms belong to the unsupervised category. This is because hand-labeling vessels is a tedious task that takes more than a couple of hours for just one retinal image, and is therefore not practical for real applications.

Supervised Approaches. Manually labeled images are required for training. To the best of our knowledge, only four retinal blood vessel segmentation papers [25, 32, 6, 7] belong to this group. Sinthanayothin et al. [25, 32] used a multilayer perceptron for blood vessel classification, with backpropagation as the training approach. An advantage is its ability to deal with nonlinear classification; the limitations are that it is difficult to see what is going on inside the hidden layers, and the training process must be repeated every time new features are incorporated. They partitioned each retinal image into 10 × 10-pixel subimages. There were 25094 subimages in total, of which 5/6 of the data were hand-labeled ground-truth images used as their training set.
The rest was used for validation. Staal et al. [6, 7] used ridge detection to locate candidate blood vessel segments. Ridges are defined as points where the image has an extremum in the direction of the largest surface gradient [33, 6]; that direction is the eigenvector of the Hessian matrix corresponding to the largest absolute eigenvalue. After the ridges were found, several feature sets and decision-making criteria were applied to create convex sets and to classify pixels as vessel or not: ridge pixels were grouped and convex sets were formed; the image was partitioned into patches based on the convex sets, with every pixel assigned to the closest convex set; then feature sets were formed and a kNN classifier was employed. Twenty hand-labeled ground-truth images were used as a training set, and twenty images were used for validation. Although good segmentation results were reported, the algorithm is computationally expensive, and hand-labeled ground-truth images are mandatory. As the name suggests, approaches in this group need hand-labeled ground-truth images. Hand-labeling the vascular tree is not practical in real applications, since it consumes a great deal of time; this is a significant drawback of the algorithms in this group.

Unsupervised Approaches. No ground-truth information is provided. A variety of unsupervised approaches have been proposed; we further divide them based on the proposed techniques.

- Window-Based. A window refers to a pattern that is used to transform an image; the process is usually followed by a binarization technique. Pinz et al. [26] used local gradient maxima: because they occur at the boundaries of the vessels, the significant edges along these boundaries were extracted. The grouping process searched for a partner for each edge which satisfied certain criteria, such as opposite gradient direction and spatial proximity. Only the vascular centerlines could be detected.
FA images were used in their research. Zana et al. [27] also used FA images, and therefore defined a vessel as a bright, piecewise-linearly connected pattern. Opening morphological filters with linear structuring elements were used, each structuring element being 15 pixels long (one every 15°). The sum of top-hats on the filtered image brightened all blood vessels (the linear parts) and reduced small bright noise. To remove non-vessel parts, the principal curvature was computed using a Laplacian, followed by a morphological opening.

- Classification-Based. The first step is to divide an image into different regions; multiple rules are then applied to classify pixels in each region as vessel or not. Hoover et al. [9] used the twelve 16 × 16-pixel matched filters proposed in [34] to map the vascular tree. A set of criteria was tested to determine the threshold of the probe region and, ultimately, to decide if the area being probed was a blood vessel. Since the matched filter response (MFR) image was probed in a spatially adaptive way, different thresholds were applied throughout the image when mapping blood vessels. Jiang et al. [8] used a verification-based multiple-threshold probing framework. A retinal image was probed at different threshold values; at a particular threshold, a Euclidean distance transform was performed, and the vessel candidates were pruned by means of the distance map to retain only centerline pixels. Finally, the blood vessels were reconstructed at the particular threshold.

- Tracking-Based. Zhou et al. [24] defined each vessel segment by three attributes: direction, width, and center point. The density distribution of the cross section of a blood vessel was estimated using a Gaussian-shaped function. Individual segments were identified using a search procedure which kept track of the center of the vessel and made decisions about the future path of the vessel based on certain vessel properties. This method required that the beginning and ending search points be manually selected with a cursor.
FA images were used in their research. Can et al. [35] used a tracing method based on adaptive exploratory processing of an image. The algorithm explored the image along a grid to seek local gray-level minima using a directional low-pass template. Intersections between the grid and the local minima were labeled as seed points, along with their orientations. The seed points were then used for tracing, which was repeated for 16 directions, following the strongest edge. Chutatape et al. [36] used second-order derivative Gaussian matched filters to locate center points, widths, and orientations. An extended Kalman filter was then employed for the optimal linear estimation of the next possible location; the algorithm began from the circumference of the optic disc. Wu et al. [37] divided blood vessels into large and small. The blood vessels were enhanced with a matched filter, a Gabor standard-deviation filter was used to distinguish large from small vessels, and a 2D Gaussian filter was used for tracing, with different forward and backward verification rules for large and small vessels.

For window-based techniques, the performance of the algorithms largely depends on the thresholding technique; to our surprise, most papers in this group do not focus on the thresholding algorithm. For classification-based techniques, multiple threshold values are applied to different regions, which leads to several decision-making criteria that themselves require numerous threshold values to select an appropriate threshold for a particular region. For tracking-based techniques, an initial point needs to be identified; moreover, only the centerline, not the whole branch of the vascular tree, can be extracted.

We have addressed the blood vessel extraction issue and have proposed a new thresholding technique. More detail on our algorithm can be found in Chapter 4.
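To give a flavour of the co-occurrence/entropy idea, here is a simplified sketch of local-entropy thresholding following the classical formulation (Chapter 4's co-occurrence definition and entropy criterion differ in detail):

```python
import numpy as np

def cooccurrence(img):
    """256 x 256 co-occurrence counts of right- and down-neighbour grey-level
    pairs; `img` must be an integer-valued image in [0, 255]."""
    t = np.zeros((256, 256))
    a = np.concatenate([img[:, :-1].ravel(), img[:-1, :].ravel()])
    b = np.concatenate([img[:, 1:].ravel(), img[1:, :].ravel()])
    np.add.at(t, (a, b), 1)
    return t

def local_entropy_threshold(img):
    """Choose the threshold s maximizing the sum of the entropies of the
    background-background and foreground-foreground quadrants of the
    co-occurrence matrix (a classical local-entropy criterion)."""
    t = cooccurrence(img)
    best_s, best_h = 0, -np.inf
    for s in range(255):
        h = 0.0
        # BB quadrant: both pixels <= s; FF quadrant: both pixels > s.
        for quad in (t[:s + 1, :s + 1], t[s + 1:, s + 1:]):
            total = quad.sum()
            if total > 0:
                p = quad[quad > 0] / total
                h -= (p * np.log2(p)).sum()
        if h > best_h:
            best_h, best_s = h, s
    return best_s
```

Because the criterion is built on neighbouring-pixel pairs rather than the grey-level histogram alone, it responds to the spatial structure of the vessel map, which is the property the dissertation's new co-occurrence definition also exploits.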
2.2.2 Pathological Structure Segmentation

Since our research does not focus on extracting lesions, and due to limited time, we will not go into detail on this subject; we include it here for the sake of completeness. Basically, the algorithms can be divided into the same two main groups, supervised and unsupervised, as the anatomical structure extraction algorithms.

2.3 2D Image Registration

Image registration is a process that tries to make two or more images coincide. To register two images, an optimal transformation must be identified. Image registration is used for many purposes, e.g., to integrate information from different images, to detect changes in images taken at different times, and for object recognition. There are two major types of distortion in an image: (1) misalignment between images; (2) distortion caused by the camera (the world plane and image plane are not parallel). Hence, it is impossible to simply overlay two images. Several retina-related registration methods have been proposed [38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51]. Reviews of general registration can be found in [52, 53]; several reviews of medical image registration can be found in [54, 55, 56, 57]. We categorize 2D retinal image registration based on two criteria, types and techniques. As for the first criterion, types, we classify registration into three main categories: (1) view-based or mosaic registration; (2) temporal-based registration; and (3) modal-based registration. Regarding the second criterion, techniques, we group registration techniques into two main categories, feature-based and area-based. Feature-based methods require a set of feature correspondences, which can be points, lines, or blobs, in order to find the optimal registration model. Area-based methods deal with images without trying to locate salient features; the algorithms are based on pixel intensities and certain objective functions.
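A typical area-based objective function is the zero-mean normalized cross-correlation between overlapping patches — a minimal sketch (the metric itself is standard; this dissertation uses mutual-information-based criteria instead):

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two same-size patches.

    Returns a value in [-1, 1]: +1 for patches identical up to a positive
    affine intensity change, about 0 when uncorrelated. A constant patch
    has zero variance, so we return 0 rather than divide by zero."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```

Note that a large textureless region drives the denominator toward zero and makes the score unstable, which is exactly one of the degradation factors for area-based methods noted below.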
Typically, two major factors may degrade the performance of area-based methods: (1) non-consistent/non-uniform contrast within an image; and (2) large homogeneous/textureless areas. The performance of feature-based methods largely depends on sufficient and/or reliable correspondences, especially when the overlapping part of an image pair is very limited or when there are mismatched correspondences. We first group the 2D retinal registration research by type: (1) view-based or mosaic registration; (2) temporal-based registration; and (3) modal-based registration. Each type can be further divided into feature-based and area-based techniques.

2.3.1 View-Based Registration

This category concerns retinal images taken from different views and how to integrate these images into one view, as shown in Fig. 2.2. According to the ETDRS standard, fourteen images are taken per eye: seven images from different areas of the retina and their corresponding stereo pairs. Integrating all of the images into one piece can serve as a spatial map during surgical procedures and can assist ophthalmologists in evaluating disease.

Figure 2.2: View-based registration. See a more detailed discussion in Chapter 5.

Can et al. [38, 39] proposed a hierarchical feature-based approach. The similarity matrix for all possible correspondences was computed based on the orientations of the vascular centerlines, and the similarity measure was converted to a prior probability. The transformation was estimated in a hierarchical way, from the zeroth-order model to the first-order model and finally to the second-order model. Stewart et al. [43] proposed a feature-based approach called the dual-bootstrap iterative closest point algorithm. The approach starts from one or more initial, low-order estimates that are only accurate in small image regions called bootstrap regions.
In each bootstrap region, the method iteratively refines the transformation estimate, expands the bootstrap region, and tests whether a higher-order model can be used. The method requires accurate initialization of at least one point correspondence. High success rates were reported in [43]. We have addressed the view-based registration issue [58, 59]; more detail on our algorithm can be found in Chapter 5.

2.3.2 Temporal-Based Registration

Retinal images from the same patient are acquired at different times. Temporal-based registration can greatly help ophthalmologists examine the progress of treatment or disease. Fang et al. [60] introduced an area-based affine elastic model for temporal registration. A thinned vascular tree was extracted using morphological, linear filtering, and region growing techniques, and an energy function was defined to elastically register two images based on the binary vascular tree. Ritter et al. [1] used an area-based technique employing mutual information as the similarity criterion, with simulated annealing as the search technique. A simulation result from Ritter et al. [1] is shown in Fig. 2.3.

Figure 2.3: Temporal-based registration [1].

2.3.3 Modal-Based Registration

Retinal images are acquired from different sensors: normal mode, FA mode, and other modes. Different modal images offer distinctive information, and ophthalmologists need to combine the diverse information from the different modes in order to correctly diagnose and evaluate diseases. In [2], registration between two modes, FA and red-free (RF), was studied. Matsopoulos et al. [2] used an area-based technique. A coarse vessel segmentation was performed in both the FA and RF images. The measure of match (MoM), which is similar to a logical operation, was proposed as the objective function, and a genetic algorithm was chosen as the optimization technique. Three transformations, affine, bilinear, and projective, were included. A simulation result from [2] is shown in Fig. 2.4.

Figure 2.4: Modal-based registration [2].

Matsopoulos et al. [61] later proposed a feature-based method for modal-based registration. Vascular centerlines and their bifurcation points were extracted, and correspondences were identified using a neural-network-based approach, self-organizing maps (SOM). An affine transformation was estimated in a least-mean-square sense. Zana et al. [40] used a feature-based technique: landmark points were extracted and labeled with vessel orientations; an angle-based invariant was computed to give a probability for two points to match; then a Bayesian Hough transform was used to sort the transformations by their respective likelihoods, and the most likely transformation was chosen for registration.

2.4 3D Reconstruction

3D reconstruction is a process that recovers a 3D scene or model from multiple images. It is usually referred to as the structure-from-motion (SFM) problem. The process usually recovers objects' 3D shapes, the cameras' poses (positions and orientations), and the cameras' internal parameters (focal lengths, principal points, and skew factors). Many camera models exist. Perspective projection is the standard, while affine and orthographic projections are widely used for distant cameras. Stereo techniques, e.g. the cepstrum, are also commonly used for depth estimation because of their simplicity: they require only a stereo pair and no camera calibration. Although the general SFM problem has been extensively studied, surprisingly little research has been published on 3D retinal reconstruction. We categorize retina-related 3D reconstruction into two main categories, 3D surface reconstruction and local depth reconstruction.

2.4.1 3D Surface Reconstruction

3D surface reconstruction refers to global depth reconstruction, where we consider the curvature of the eyeball. Deguchi et al. [3, 62] modelled both the fundus camera and the eye lens with a single lens.
They utilized the fact that the fundus has a spherical shape and that the image of a sphere through the eye lens results in a quadratic surface. They then calibrated the camera using the two-plane method to obtain the quadratic surface, and the eye lens parameters were identified to recover the fundus's spherical surface. The simulation result from [3] is shown in Fig. 2.5.

Figure 2.5: 3D retinal surface reconstruction [3].

Choe et al. [4] used PCA-based directional filters to extract candidate seed points (Y features). A gradient descent was employed to model the Y features and match pairs of features. A plane-and-parallax approach was employed to estimate the epipolar geometry, because a near-planar retinal surface can obstruct traditional fundamental matrix estimation. The stereo pair is rectified, and a Parzen-window-based mutual information measure is then used to generate a dense disparity map. The simulation result from [4] is shown in Fig. 2.6. In this work, we have also addressed the 3D surface reconstruction issue using a different approach; more detail on our algorithm can be found in Chapter 6.

2.4.2 Local Depth Reconstruction

Local depth reconstruction refers to recovering the depth of individual objects, e.g., the optic nerve and lesions, inside the retina. Mitra et al. [63] proposed the use of the power cepstrum to find the disparity between a stereo pair; depth was then calculated by a simple triangulation method. Corona et al. [5] extended the idea from Mitra et al. [63] and proposed a framework that combined the power cepstrum and cross-correlation techniques to extract the optic nerve's depth from a stereo pair. A b-spline was then employed to generate smooth optic nerve surfaces. The simulation result from [5] is shown in Fig. 2.7.

2.5 Image Quality Assessment (IQA)

Image quality assessment plays a significant role in digital image processing research, and many efforts have been made to quantify image quality [64, 65, 66, 67].
In the retinal image case, because ophthalmologists rely on retinal images for disease diagnosis and evaluation, the images need to meet image quality criteria. Each set of fundus photographs should be assessed for quality before the photographs are sent to ophthalmologists. A photographer should be able to decide whether a particular image set meets the three ETDRS requirements:

1. Clarity and focus. This is an obvious requirement in IQA. Focus is defined as sharpness, meaning the transition between background and foreground is sharp. Clarity is defined as image contrast.

Figure 2.6: 3D retinal surface reconstruction [4].

2. Field definition. Images need to cover the required areas of the retina. The positions of the key anatomical structures, the optic nerve and the fovea, in the images are also necessary.

3. Stereo effect. Depth can be perceived only if the displacement between images is within an acceptable range.

A software tool to standardize and certify image quality is in demand to assist photographers. Photographers could take a second shot immediately if necessary, rather than calling the patient back for another visit. Such a tool should quantify an objective image quality measure that correlates with perceived quality. A limited number of works have been published on retinal image quality assessment. Lee et al. [68] used a template intensity histogram derived from 20 good-quality images; its base, or width, was employed as an indicator of contrast. The quality of a target image was evaluated by convolving its histogram with the template histogram.

Figure 2.7: Optic nerve's depth reconstruction [5].

Lalonde et al. [69] suggested the use of two criteria, the distribution of edge magnitudes and the local distribution of pixel intensities, to assess images into three groups: good, fair, and poor. Awawdeh et al. [70] proposed the use of the power cepstrum for stereo quality assessment. The two images of the stereo pair are added together.
Then, a DCT-based power cepstrum is applied to estimate the displacements, which serve as indicators of stereo quality. In this work, we have also addressed the field definition issue [58, 59], which is a by-product of 2D image registration. More detail of our method can be found in Chapter 5. To the best of our knowledge, there is no other previous work on the field definition topic.

2.6 Image Classification

The term image classification refers to the process of assigning a specific class to a pixel. In the retinal image case, it describes anatomical and pathological classification. Since our research does not focus on image classification, and due to limited time, we will not go into detail on this subject; we include it here for the sake of completeness.

Figure 2.8: Overview of retinal image analysis research. The gray-shaded boxes refer to the topics involved in this work. (The chart branches into structure segmentation (anatomical, pathological), 2D registration (viewpoint, temporal, modal), 3D reconstruction (surface, local depth), image quality assessment (focus/clarity, field definition, stereo effect), and image classification.)

CHAPTER 3
Motion Parameters and Motion Estimation

3.1 Introduction

A motion model is needed in order to infer information from multiple images. Suitable motion parameters for representing the possible motions of the camera have to be chosen before motion estimation. The motion parameters are nothing but the parameters of a coordinate transformation. There are three major types of motion parameters: (1) 2D/2D, (2) 3D/3D, and (3) 3D/2D. In the 2D/2D case, they map an image coordinate (x, y) to another image coordinate (x', y'). 2D retinal image registration requires 2D/2D transformations. In the 3D/2D case, they map a world coordinate (X, Y, Z) to an image coordinate (x, y). The 3D/2D transformation is generally referred to as camera projection.
In order to perform 3D retinal surface reconstruction, the 3D/2D transformations (or camera projections) need to be identified. In the 3D/3D case, a world coordinate (X, Y, Z) is mapped to another world coordinate (X', Y', Z'). After the 3D retinal surface is approximated, we need to fuse all of the retinal surfaces into one single view; this task involves the 3D/3D transformations. A variety of other tasks, e.g. 3D reconstruction, 2D registration, video segmentation, object tracking, and site monitoring, need motion estimation as well. Transformations of the 2D/2D and 3D/3D cases are reviewed in Sections 3.2 and 3.3. The 3D/2D mapping, camera projection, is discussed in Section 3.4.

3.2 Transformations of 2D/2D

The 2D/2D transformations are mappings between image coordinate systems. Several commonly used coordinate transformations are listed in Table 3.1.

Figure 3.1: Results from 2D coordinate transformations (translation, rigid, affine, projective, quadratic).

In Table 3.1, $t_x$ and $t_y$ are translation parameters in the horizontal and vertical directions respectively, $s$ is a scaling factor, $r_{11}, \ldots, r_{22}$ are rotation parameters, and $\theta_{11}, \ldots, \theta_{26}$ are general motion parameters. A translation model is the simplest one; it can only handle displacements between images. A rigid model can further deal with translation, rotation, and scaling between images. Neither of them can handle images with distortions. Absolute distance and area are preserved; they are called rigid invariants. An affine model includes translation, rotation, scaling, and shearing distortion. Parallelism and relative distance along parallel directions are affine invariants. A projective model is the general linear transformation describing translation, rotation, scaling, shearing distortion, keystoning distortion, and chirping distortion. The projective invariants are a ratio of ratios (cross ratio) of distances, and linearity.
Chirping is the effect of increasing or decreasing spatial frequency with respect to spatial location [71]. Keystoning is the effect of converging lines [72]. The result of each transformation is illustrated in Fig. 3.1. A quadratic model is a nonlinear transformation, accounting for translation, rotation, scaling, shearing, keystoning, chirping, and nonlinear distortions. Nonlinear distortions include several components, e.g. radial and tangential distortions.

3.3 Transformations of 3D/3D

3D/3D transformations are mappings between two 3D world coordinate systems. Several commonly used coordinate transformations are listed in Table 3.2. In Table 3.2, $t_x$, $t_y$ and $t_z$ are translation parameters, $r_{11}, \ldots, r_{33}$ are elements of rotation matrices, $s$ is a scaling factor, and $\theta_{11}, \ldots, \theta_{44}$ are general motion parameters. A projective model is the most general case; every point in the projective space is treated equally. As in the 2D/2D case, the cross ratio is preserved in the projective space. When the plane at infinity is identified, the projective space becomes an affine space; hence, points on the plane at infinity are affine invariants, as are parallelism and relative distance. Once the absolute conic is identified, the affine space becomes a similarity (or metric) space. Assume $(X, Y, Z, T)$ represents a homogeneous coordinate; then $X^2 + Y^2 + Z^2 = 0,\ T = 0$ is known as the absolute conic [73], and the absolute conic is preserved in the similarity space. Without the scaling effect, the similarity space becomes a Euclidean space. Absolute distance and volume are Euclidean invariants. The result of each transformation is illustrated in Fig. 3.2.

Figure 3.2: Results from 3D coordinate transformations (Euclidean, similarity, affine, projective).

3.4 Transformations of 3D/2D

A camera is a mapping between a 3D world coordinate system and a 2D image coordinate system.
Therefore, the 3D/2D transformation is generally referred to as a camera projection. Camera motion has been one of the most important subjects in computer vision research. Motion parameters can be estimated from two or more images, depending on the type of projection. Before camera motions are described, it is necessary to consider the process of image formation.

3.4.1 Image Formation

A pinhole model is the basic camera model for the central projection of points in a scene onto an image plane, as depicted in Fig. 3.3.

Figure 3.3: Pinhole camera geometry (camera center C, principal axis, focal length f). The camera coordinate system is aligned with the world coordinate system.

Under a pinhole camera model, a point in space with coordinates $\hat{M} = (X, Y, Z)^T$ is mapped to a point on the image plane $\hat{m} = (x, y)$. Then we have the relationship

$$x = \frac{fX}{Z}, \qquad y = \frac{fY}{Z}, \tag{3.1}$$

where $f$ represents the focal length of the camera. Using homogeneous coordinates with principal point offset $(c_x, c_y)$ and skew $s$ of the image's pixels, the central projection can be represented as

$$\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} f_x & s & c_x & 0 \\ 0 & f_y & c_y & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}. \tag{3.2}$$

In general, the camera coordinate frame and the world coordinate frame do not coincide. The equation becomes

$$\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} R & -Rt \\ \mathbf{0}_3^T & 1 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}, \tag{3.3}$$

where $R$ is a $3 \times 3$ rotation matrix and $t = [t_x, t_y, t_z]^T$ is a translation vector. The equation can be simplified to

$$m = K[R \mid -Rt]M = PM, \tag{3.4}$$

where $m$ and $M$ denote image and world homogeneous coordinates respectively. $K$ captures the effect on the projection of the intrinsic parameters, $[R \mid -Rt]$ are called the extrinsic parameters, and $P$ is the camera projection matrix.
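As a concrete illustration of Eqs. 3.2-3.4, the projection pipeline can be sketched in a few lines of NumPy. The helper names and the sample intrinsics below are our own, not from the dissertation:

```python
import numpy as np

def projection_matrix(K, R, t):
    """P = K [R | -Rt] (Eq. 3.4): maps homogeneous world points to the image."""
    return K @ np.hstack([R, -R @ t.reshape(3, 1)])

def project(P, M):
    """Central projection of a world point M = (X, Y, Z) to a pixel (x, y)."""
    m = P @ np.append(M, 1.0)        # homogeneous image coordinates
    return m[:2] / m[2]              # perspective division

# Illustrative intrinsics (fx, fy, principal point), not measured values.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                        # camera frame aligned with world frame
t = np.zeros(3)                      # camera at the world origin
P = projection_matrix(K, R, t)
```

With $R = I$ and $t = 0$, a point on the optical axis projects to the principal point, e.g. `project(P, np.array([0.0, 0.0, 2.0]))` gives `[320., 240.]`, and moving the point one unit in $X$ at depth $Z = 2$ shifts the pixel by $f_x / 2$.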
3.4.2 Camera Models

Three camera models widely used in computer vision, the perspective, weak-perspective, and orthographic models, and the two motion transformations associated with them, projective and affine, are discussed in this section. Readers may find the terms a bit confusing. This is due to the fact that two scientific fields, photogrammetry and mathematics, use different terms to describe the same transformations. For instance, people in photogrammetry use the term perspective projection to describe a general linear camera, which happens to have the same form as the projective transformation used in mathematics.

Perspective Projection. The projective transformation is the most general case. The projective transform, or perspective camera, can be represented mathematically as

$$T_{projective} = \begin{pmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} R_1^T & -R_1^T t \\ R_2^T & -R_2^T t \\ R_3^T & -R_3^T t \end{pmatrix} = \begin{pmatrix} f_x R_1^T + s R_2^T + c_x R_3^T & f_x D_x + s D_y + c_x D_z \\ f_y R_2^T + c_y R_3^T & f_y D_y + c_y D_z \\ R_3^T & D_z \end{pmatrix}, \tag{3.5}$$

where $R_i^T$ is the $i$th row of the rotation matrix $R$, and $D_x = -R_1^T t$, $D_y = -R_2^T t$, $D_z = -R_3^T t$. Image and world coordinates are related by

$$\hat{m} = \begin{pmatrix} \dfrac{(f_x R_1^T + s R_2^T)M + f_x D_x + s D_y}{R_3^T M + D_z} \\[2ex] \dfrac{f_y R_2^T M + f_y D_y}{R_3^T M + D_z} \end{pmatrix} + \begin{pmatrix} c_x \\ c_y \end{pmatrix}, \tag{3.6}$$

where $\hat{m}$ denotes non-homogeneous image coordinates.

Weak-Perspective Projection. A weak-perspective camera can be represented in mathematical form as

$$T_{affine} = \begin{pmatrix} f_x R_1^T + s R_2^T & f_x D_x + s D_y + c_x D_z \\ f_y R_2^T & f_y D_y + c_y D_z \\ \mathbf{0}_3^T & D_z \end{pmatrix}. \tag{3.7}$$

Equation 3.7 has the same form as a general affine transform; hence, it is termed an affine projection. Some call it an affine camera or weak-perspective camera. Image and world coordinates are then related by

$$\hat{m} = \begin{pmatrix} \dfrac{(f_x R_1^T + s R_2^T)M + f_x D_x + s D_y}{D_z} \\[2ex] \dfrac{f_y R_2^T M + f_y D_y}{D_z} \end{pmatrix} + \begin{pmatrix} c_x \\ c_y \end{pmatrix}. \tag{3.8}$$

More detail, including the mathematical derivation of the affine camera, can be found in Section 6.2.

Orthographic Projection. Orthographic projection is the simplest case of a camera projection.
It projects points along the $z$-axis. The projection can be represented as

$$T_{orthographic} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{3.9}$$

The three projection models are illustrated in Fig. 3.4. An orthographic projection has five degrees of freedom: three parameters for the rotation matrix and two for the displacements. It is suitable for the case where the world plane and the image plane are parallel. An affine camera has eight degrees of freedom, corresponding to the eight nonzero elements of its matrix. It is appropriate for a distant camera or a camera with a large focal length. A general projective camera has eleven degrees of freedom, defined up to an arbitrary scale. It is a general definition for any linear camera.

3.5 Affine Structure from Motion

The affine structure-from-motion theorem was first proposed by Koenderink and Van Doorn [74]. They showed that two distinct views are enough to reconstruct a scene up to an arbitrary affine transformation without camera calibration, and they suggested the use of a local coordinate frame (LCF). Later, their algorithm was refined by Quan et al. [75], Demy et al. [76], and Shapiro [77]. Then, Tomasi and Kanade [78] proposed an affine factorization method, which eliminates the use of the LCF and instead utilizes the entire set of points. This section reviews affine structure from motion. Furthermore, we categorize affine structure from motion into two main categories: two-view geometry and multiple-view geometry.

Figure 3.4: One-dimensional image formation. Orthographic projection is along the $z$-axis; perspective projection is along the principal ray direction. Weak perspective is a combination of the two: a point is first projected along the $z$-axis onto the plane $Z = d$, followed by perspective projection from that plane.

3.5.1 Two-View Geometry

Geometric Approach: Local Coordinate Frame. Assume $n$ general points in a scene.
Four non-coplanar scene points $M_i$, $i \in \{0, \ldots, 3\}$, with $M_0$ being a reference point, can be considered as defining a 3D affine basis. Define the axis vectors $E_i = M_i - M_0$, $i \in \{1, 2, 3\}$, which are called the local coordinate frame (LCF). Any point in the scene can then be expressed as

$$M_i = M_0 + \alpha_i E_1 + \beta_i E_2 + \gamma_i E_3, \quad i \in \{1, \ldots, n-1\}, \tag{3.10}$$

where $\alpha$, $\beta$, and $\gamma$ are affine-invariant coordinates. The proof is given next. Under a 3D affine transformation,

$$M'_i = R M_i + T, \tag{3.11}$$

where $M'$ is the new world position, $R$ is a $3 \times 3$ matrix, and $T$ is a 3-vector. Then,

$$M'_i - M'_0 = R(M_i - M_0) = \alpha_i R E_1 + \beta_i R E_2 + \gamma_i R E_3 = \alpha_i E'_1 + \beta_i E'_2 + \gamma_i E'_3. \tag{3.12}$$

Equation 3.12 demonstrates that $\alpha$, $\beta$, and $\gamma$ are indeed affine-invariant coordinates; the affine coordinates are independent of the frame. Let us extend the idea to the image plane under affine projection:

$$m = AM + d, \qquad m' = A'M' + d', \tag{3.13}$$

where $m$ and $m'$ are 2-vector image coordinates in the first and second views respectively, $A$ and $A'$ are general $2 \times 3$ matrices, and $d$ and $d'$ are general $2 \times 1$ vectors. From Equations 3.11, 3.12, and 3.13, and because the differences of vectors eliminate the additive terms $T$ and $d$, we get

$$m_i - m_0 = \alpha_i e_1 + \beta_i e_2 + \gamma_i e_3, \qquad m'_i - m'_0 = \alpha_i e'_1 + \beta_i e'_2 + \gamma_i e'_3, \tag{3.14}$$

where $e_i = A E_i$ and $e'_i = A' R E_i$, $i \in \{1, 2, 3\}$. Although it seems redundant to represent image coordinates with three basis vectors, the extra basis vector allows the 3D affine coordinates $(\alpha, \beta, \gamma)$ to be computed. The solution may be obtained by minimizing a cost function in a least-mean-square sense.

Algebraic Approach: Fundamental Matrix. Given two affine views, the relationship between them can be defined as follows [79, 80]:

$$a x'_i + b y'_i + c x_i + d y_i + e = 0, \tag{3.15}$$

where $m_i = [x_i, y_i, 1]^T$, $m'_i = [x'_i, y'_i, 1]^T$, and $a, b, c, d, e$ are unknown parameters. This equation is termed the affine epipolar constraint. Rearranging Equation 3.15 in matrix form,

$$m'^T_i F_A m_i = \begin{pmatrix} x'_i & y'_i & 1 \end{pmatrix} \begin{pmatrix} 0 & 0 & a \\ 0 & 0 & b \\ c & d & e \end{pmatrix} \begin{pmatrix} x_i \\ y_i \\ 1 \end{pmatrix} = 0. \tag{3.16}$$

$F_A$ is called the affine fundamental matrix, which is an algebraic representation of the affine epipolar geometry.
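Because the affine epipolar constraint of Eqs. 3.15-3.16 is linear in the five unknowns, $F_A$ can be estimated from a handful of correspondences as the smallest right singular vector of the design matrix. A hedged NumPy sketch (the function names are ours):

```python
import numpy as np

def fit_affine_fundamental(m, m_prime):
    """Estimate F_A from n >= 4 point correspondences via Eq. 3.15.

    Each correspondence gives one equation a x' + b y' + c x + d y + e = 0;
    the parameter vector (a, b, c, d, e) is the right singular vector of the
    n x 5 design matrix associated with the smallest singular value.
    """
    x, y = m[:, 0], m[:, 1]
    xp, yp = m_prime[:, 0], m_prime[:, 1]
    D = np.column_stack([xp, yp, x, y, np.ones(len(x))])
    _, _, Vt = np.linalg.svd(D)
    a, b, c, d, e = Vt[-1]
    return np.array([[0.0, 0.0, a],
                     [0.0, 0.0, b],
                     [c,   d,   e]])

def epipolar_residual(FA, m, m_prime):
    """Per-point |m'^T F_A m| for homogeneous points (Eq. 3.16)."""
    mh = np.column_stack([m, np.ones(len(m))])
    mph = np.column_stack([m_prime, np.ones(len(m_prime))])
    return np.abs(np.einsum('ij,jk,ik->i', mph, FA, mh))
```

Since Eq. 3.15 is homogeneous, $F_A$ is recovered only up to scale; with noisy correspondences one would normalize the image coordinates before the SVD.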
The epipolar geometry is the geometry between two views.

3.5.2 Multiple-View Geometry

Local Coordinate Frame. The local coordinate frame explained in Section 3.5.1 can be extended to accommodate multiple-view geometry. Assume $n$ features appear in $f$ distinct views. Then

$$W = LS, \tag{3.17}$$

where $W$ is a $2f \times (n-1)$ matrix containing the observations, $L$ is a $2f \times 3$ matrix containing the affine basis, and $S$ is an unknown $3 \times (n-1)$ matrix containing the affine structure.

Affine Factorization. Tomasi and Kanade [78] proposed an affine factorization method that eliminates the use of the LCF and utilizes the whole set of point correspondences. Suppose there are $f$ affine views and $n$ point correspondences in each view:

$$\hat{m}_i = A_i \hat{M} + d_i, \quad i \in \{1, \ldots, f\}, \tag{3.18}$$

where $\hat{m}$ and $\hat{M}$ denote image and world non-homogeneous coordinates respectively, $A_i$ is an arbitrary $2 \times 3$ matrix, and $d_i$ represents any $2 \times 1$ vector. In the same spirit as the LCF, we need to select one point as the origin, here the center of mass. Equation 3.18 becomes

$$\Delta\hat{m}_i = A_i \Delta\hat{M}, \quad i \in \{1, \ldots, f\}, \qquad W = LS, \tag{3.19}$$

where $W$ denotes a $2f \times n$ matrix containing the set of 2D point correspondences expressed with respect to the center of mass, $S$ denotes a $3 \times n$ matrix containing the affine shape, and $L$ denotes a $2f \times 3$ matrix. By the rank theorem, $W$ is at most rank three. Singular value decomposition (SVD) is used to factorize $W = U\Sigma V^T$; $L$ and $S$ are then obtained from the left and right singular vectors corresponding to the three greatest singular values,

$$L = U_3, \qquad S = \Sigma_3 V_3^T. \tag{3.20}$$

Concatenated Image Space. Shapiro [77] provided insight into the geometrical meaning of the affine factorization method of Tomasi and Kanade [78] in terms of the concatenated image space (CI space). For simplicity, assume there are two views.
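The factorization of Eqs. 3.19-3.20 reduces to a centered, rank-3-truncated SVD. A minimal sketch, assuming noise-free affine views and an already-assembled measurement matrix (the function name is ours):

```python
import numpy as np

def affine_factorize(W):
    """Tomasi-Kanade affine factorization (Eqs. 3.19-3.20).

    W is the 2f x n measurement matrix of stacked image coordinates.
    Each row is centered on its mean (the center-of-mass origin), then
    W = U S V^T is truncated to rank 3: L = U_3 holds the camera motion
    (affine basis) and S = Sigma_3 V_3^T the affine shape.
    """
    Wc = W - W.mean(axis=1, keepdims=True)     # subtract the center of mass
    U, s, Vt = np.linalg.svd(Wc, full_matrices=False)
    L = U[:, :3]                               # 2f x 3 motion
    S = np.diag(s[:3]) @ Vt[:3]                # 3 x n affine shape
    return L, S
```

As the text notes, the recovery is only up to an arbitrary affine transformation: for any invertible $3 \times 3$ matrix $Q$, $(LQ)(Q^{-1}S)$ reproduces the same measurements.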
If $L$ in Equation 3.19 is decomposed into three columns, $L = [l_1 \mid l_2 \mid l_3]$, then Equation 3.19 can be rewritten as

$$w_i = \Delta X_i l_1 + \Delta Y_i l_2 + \Delta Z_i l_3. \tag{3.21}$$

Shapiro [77] indicated that "the 4-dimensional $w_i$ is a linear combination of the three column vectors, and lies on a hyperplane in the 4-dimensional CI space. This hyperplane is spanned by the three columns of $L_i$ and its orientation depends solely on the motion of the camera, while the distribution of points within the hyperplane depends solely on the scene structure (as shown in Fig. 3.5)."

Figure 3.5: The concatenated image (CI) space.

If the images were noise-free, then $w_i$ would lie exactly on the hyperplane $\pi$. In practice, noise displaces $w_i$ from the hyperplane, and the hyperplane that best fits the $w_i$ must be identified. If the noise distribution is assumed to be zero-mean, isotropic, and Gaussian, then the maximum likelihood estimate of the optimal hyperplane is found by minimizing

$$\sum_{i=0}^{n} (w_i - L s_i)^T \Lambda_{w_i}^{-1} (w_i - L s_i). \tag{3.22}$$

Because the points are independent of each other, we simply assume $\Lambda_{w_i} = \sigma^2 I$ for all $i$. In this case, the above equation becomes

$$\min_{L, s} \sum_{i=0}^{n} \| w_i - L s_i \|^2. \tag{3.23}$$

This is exactly what affine factorization does. The CI space and the affine coordinate frame give insight into the physical concepts of the affine factorization method: the camera motion can be represented by a hyperplane and its orientation, while the 3D points, in the ideal case, lie on that hyperplane.

3.6 Conclusions

In this chapter, we have reviewed the mathematical background of the proposed research; the material presented here serves as the theoretical foundation of later chapters. Specifically, we will address 2D retinal image registration in Chapter 5, which requires 2D/2D transformations, while 3D retinal reconstruction is discussed in Chapter 6, which involves camera projection (3D/2D transformations) and 3D registration (3D/3D transformations).
Also, as the prerequisite of all the geometric transformations discussed above, feature points or correspondences have to be extracted first; that is the focus of the next chapter.

Table 3.1: 2D Transformation Models

Translation (linear, DOF 2):
$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$

Rigid (linear, DOF 4):
$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} s r_{11} & s r_{12} & t_x \\ s r_{21} & s r_{22} & t_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$

Affine (linear, DOF 6):
$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} \theta_{11} & \theta_{12} & t_x \\ \theta_{21} & \theta_{22} & t_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$

Projective (linear, DOF 8):
$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \\ \theta_{31} & \theta_{32} & \theta_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$

Quadratic (nonlinear, DOF 12):
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \theta_{11} & \theta_{12} & \theta_{13} & \theta_{14} & \theta_{15} & \theta_{16} \\ \theta_{21} & \theta_{22} & \theta_{23} & \theta_{24} & \theta_{25} & \theta_{26} \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \\ x^2 \\ y^2 \\ xy \end{pmatrix}$$

Table 3.2: 3D Transformation Models

Euclidean (DOF 6):
$$\begin{pmatrix} X' \\ Y' \\ Z' \\ 1 \end{pmatrix} = \begin{pmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$

Similarity (DOF 7):
$$\begin{pmatrix} X' \\ Y' \\ Z' \\ 1 \end{pmatrix} = \begin{pmatrix} s r_{11} & s r_{12} & s r_{13} & t_x \\ s r_{21} & s r_{22} & s r_{23} & t_y \\ s r_{31} & s r_{32} & s r_{33} & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$

Affine (DOF 12):
$$\begin{pmatrix} X' \\ Y' \\ Z' \\ 1 \end{pmatrix} = \begin{pmatrix} \theta_{11} & \theta_{12} & \theta_{13} & t_x \\ \theta_{21} & \theta_{22} & \theta_{23} & t_y \\ \theta_{31} & \theta_{32} & \theta_{33} & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$

Projective (DOF 15):
$$\begin{pmatrix} X' \\ Y' \\ Z' \\ 1 \end{pmatrix} = \begin{pmatrix} \theta_{11} & \theta_{12} & \theta_{13} & \theta_{14} \\ \theta_{21} & \theta_{22} & \theta_{23} & \theta_{24} \\ \theta_{31} & \theta_{32} & \theta_{33} & \theta_{34} \\ \theta_{41} & \theta_{42} & \theta_{43} & \theta_{44} \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$

CHAPTER 4
Feature Extraction

4.1 Introduction

We want to extract features because features reduce the amount of data that needs to be processed. Since "not all information is created equal" [19], what are the most suitable features for matching? Points are used most of the time. Lines or blobs are also good features, since they provide more information than points. Good features should contain as much distinguishable information as possible. The next question is: what are the best approaches to extract features?
Feature extraction is a largely image-dependent problem: certain algorithms are good for some specific types of images but perform poorly on others. Several lines of research have been devoted entirely to this topic. In this research, the vascular tree and its bifurcation/crossover points are used as features and correspondences, for the two following reasons: (1) the vascular tree spans the whole retina, hence it exists in every retinal image; (2) bifurcation/crossover points offer more distinguishable information than the homogeneous areas throughout the retina. The automatic detection of blood vessels in retinal images can help physicians in diagnosing ocular diseases, patient screening, clinical studies, etc. Information about blood vessels in retinal images can be used in grading disease severity or as part of the process of automated disease diagnosis. Blood vessel appearance can provide information on pathological changes caused by diseases including diabetes, hypertension, and arteriosclerosis. The most effective treatment for many eye-related diseases is early detection through regular screenings. The organization of this chapter is as follows. The importance of blood vessel detection in medical applications is given in this section. Currently available methods for the detection of blood vessels are reviewed in Section 4.1.2. Section 4.3 then describes the implementation of the proposed algorithm. Simulation results and the performance of our algorithm are presented in Section 4.4.

4.1.1 Entropy Thresholding

Pun [81, 82] was the first to adopt Shannon's information theory [83] in image thresholding applications. In Pun's thresholding method, entropy was defined by only the gray-level histogram; Pun considered only one probability distribution. Kapur et al.
[84] later extended Pun's idea by considering two probability distributions, one for the object and one for the background, to binarize an image into foreground and background. Those two methods, however, did not take the spatial relationship, or image structure, into their entropy: two images with identical histograms will always result in the same threshold value. Pal et al. [85] were the first to propose local entropy thresholding for image thresholding applications, which takes the spatial distribution, or image structure, into consideration. Instead of a gray-level histogram, Pal et al. [85, 86] proposed a two-dimensional histogram called a co-occurrence matrix, whose elements represent the spatial distribution in an image. Chang et al. [87] proposed a relative entropy (cross entropy) approach for image thresholding applications. The method uses the Kullback-Leibler distance to find a threshold value that minimizes the mismatch between a gray-scale image and its binarized version. In this work, we combine the concepts proposed by Pal et al. [85, 86] and Chang et al. [87] and propose a new thresholding method. We also give a new definition for the co-occurrence matrix.

4.1.2 Blood Vessel Extraction

There are many previous works on extracting blood vessels in retinal images, which can be found in Chapter 2, the literature review. Here we mention a few of the methods that were considered state-of-the-art at the time they were proposed and that have been cited by almost every retinal vascular tree extraction paper. An efficient piecewise threshold probing technique was proposed by Hoover et al. [9], where the matched-filter-response (MFR) image was used for mapping the vascular tree. A set of criteria was tested to determine the threshold of the probe region, and ultimately to decide whether the area being probed was a blood vessel.
Since the MFR image was probed in a spatially adaptive way, different thresholds could be applied throughout the image for mapping blood vessels. Jiang et al. [8] used a verification-based multiple threshold probing framework. A retinal image was probed at different threshold values. At a particular threshold, the Euclidean distance transform was performed; vessel candidates were then pruned by means of the distance map to retain only centerline pixels. Finally, blood vessels were reconstructed at that particular threshold. Staal et al. [6] used a supervised method. Twenty hand-labeled ground-truth images were used as a training set and twenty images were used for validation. Ridge detection was employed to locate candidate blood vessel segments. Ridge pixels were then grouped to form convex sets, and the image was partitioned into patches based on the convex sets, with every pixel assigned to the convex set to which it was closest. Next, feature sets were formed and a kNN classifier was employed.

4.2 Preliminaries

Entropy is defined as a function of the state probabilities [83]. Theorem: Let $p(s_i)$ be the probability of a sequence $s_i$ of gray levels of length $q$, where a sequence of length $q$ is defined as a permutation of $q$ gray levels. Define

$$H^{(q)} = -\frac{1}{q} \sum_i p(s_i) \log_2 p(s_i), \tag{4.1}$$

where the summation is taken over all gray-level sequences of length $q$. Then $H^{(q)}$ is a monotonically decreasing function of $q$, and $\lim_{q \to \infty} H^{(q)} = H$, the entropy of the image. For different values of $q$, we get various orders of entropy. In the case of an image, the global entropy $H^{(1)}$ is not an adequate measure, since image pixel intensities are not independent of each other. Different images with identical histograms will always yield the same value of global entropy, because the spatial distribution is not taken into account in its computation.
The local entropy $H^{(2)}$ [85], on the other hand, can better distinguish two images in terms of their spatial structures, because it considers the dependency of image pixel intensities. The local entropy is defined as

$$H^{(2)} = -\frac{1}{2} \sum_i \sum_j p_{ij} \log_2 p_{ij}, \tag{4.2}$$

where $p_{ij}$ is the probability of co-occurrence of the gray levels $i$ and $j$. The co-occurrence matrix is an $L \times L$ matrix ($L$ is the number of intensity levels) [85]; it indicates the transitions of pixel intensities between adjacent pixels. The original co-occurrence matrix proposed by Pal et al. [85, 86] is asymmetric, considering the horizontally right and vertically lower transitions. $t_{ij}$ is defined as

$$t_{ij} = \sum_{l=1}^{P} \sum_{k=1}^{Q} \delta, \tag{4.3}$$

where $\delta = 1$ if ($I(l,k) = i$ and $I(l, k+1) = j$) or ($I(l,k) = i$ and $I(l+1, k) = j$), and $\delta = 0$ otherwise. Therefore, the local entropy can preserve the structural details of an image: two images with identical histograms but different spatial distributions will result in different local entropies (and hence different threshold values).

4.3 Proposed Framework

In this work, we propose a framework to efficiently locate and extract blood vessels in ocular fundus images. The proposed algorithm is composed of four steps: matched filtering, modified local entropy thresholding, length filtering, and bifurcation/crossover point detection. Compared with the method in [24], our proposed algorithm does not involve human intervention. Since our algorithm automatically estimates one optimal threshold value, it requires less computation than the methods in [9], [8], and [6]. In addition, our method does not require additional information such as a training database, which is mandatory in [6].

4.3.1 Vascular Tree Extraction

Because of the non-uniform intensity of both background and foreground, as well as the low contrast, multi-directional matched filters are employed. We observe three interesting properties of the blood vessels in retinal images.
(1) The two edges of a vessel always run parallel to each other. Such objects may be represented by piecewise linear directed segments of finite width. The gradient directions of these two edge elements are 180° apart, and hence they are sometimes referred to as "anti-parallels". (2) The contrast between vessels and other retinal surfaces is very low, and blood vessels appear darker relative to the background. A sample gray-level profile of a blood vessel along the direction perpendicular to its length is plotted in Fig. 4.1. It was observed that vessels never have ideal step edges. Although the intensity profile varies by a small amount from vessel to vessel, it can be approximated by a Gaussian curve. (3) Although the width of a vessel decreases as it travels radially outward from the optic disc, such a change in vessel caliber is gradual. The widths of the vessels are found to lie within a range of 36-180 μm. For our initial calculation, we assume that all the blood vessels in the images are of equal width.

Figure 4.1: The gray-level profile of the cross section of a blood vessel (intensity versus pixel position).

Matched Filter. In [34], the gray-level profile of the cross section of a blood vessel is approximated by a Gaussian-shaped curve, and the concept of matched filter detection is used to detect piecewise linear segments of blood vessels in retinal images. Blood vessels usually have poor local contrast, so a two-dimensional matched filter kernel is designed to be convolved with the original image in order to enhance them. A prototype matched filter kernel is expressed as

$$f(x, y) = -\exp\left(-\frac{x^2}{2\sigma^2}\right), \quad \text{for } |y| \le L/2, \tag{4.4}$$

where $L$ is the length of the segment over which the vessel is assumed to have a fixed orientation. The parameter $L$ is chosen to be 9 pixels, and the direction of the vessel is assumed to be aligned along the $y$-axis. Because a vessel may be oriented at any angle, the kernel needs to be rotated to all possible orientations.
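A kernel bank along the lines of Eq. 4.4 can be sketched as follows; for illustration we use square 15x15 kernels and subtract the mean over each kernel's support so that uniform regions give zero response. The function names and the single-pixel response helper are ours:

```python
import numpy as np

def matched_filter_bank(sigma=2.0, length=9, n_angles=12, size=15):
    """Bank of rotated Gaussian matched-filter kernels (Eq. 4.4).

    Within the support |y'| <= length/2 the kernel is -exp(-x'^2/(2 sigma^2)),
    where (x', y') are pixel offsets rotated into the kernel frame; each
    kernel is made zero-mean over its support.
    """
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    bank = []
    for k in range(n_angles):
        theta = k * np.pi / n_angles              # 15-degree steps for 12 angles
        xr = xs * np.cos(theta) + ys * np.sin(theta)
        yr = -xs * np.sin(theta) + ys * np.cos(theta)
        support = np.abs(yr) <= length / 2.0
        kern = np.zeros((size, size))
        kern[support] = -np.exp(-xr[support] ** 2 / (2.0 * sigma ** 2))
        kern[support] -= kern[support].mean()     # zero response on flat areas
        bank.append(kern)
    return bank

def response_at(image, r, c, bank):
    """Maximum filter response at pixel (r, c) over all orientations."""
    half = bank[0].shape[0] // 2
    patch = image[r - half:r + half + 1, c - half:c + half + 1]
    return max(float(np.sum(k * patch)) for k in bank)
```

A dark vertical line on a bright background then yields its strongest response from the vertically oriented kernel, while flat regions respond with approximately zero; in the full pipeline, convolving the image with every kernel and keeping the per-pixel maximum produces the MFR image.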
Assuming an angular resolution of 15°, twelve different kernels are constructed to span all possible orientations (Fig. 4.2). The set of twelve 16x15-pixel kernels is applied by convolution with the fundus image, and at each pixel only the maximum of the responses is retained. For example, given the retinal image in Fig. 4.5(a), which has low contrast between blood vessels and background, its MFR version is shown in Fig. 4.5(b), where we can see the blood vessels are significantly enhanced.

Figure 4.2: Illustration of the 12 matched filter kernels along different directions, where $\sigma = 2.0$.

Modified Local Entropy Thresholding Algorithm. As the second step, the MFR image is processed by a proper thresholding scheme, which can be used to distinguish between enhanced vessel segments and the background. An efficient entropy-based thresholding algorithm that takes into account the spatial distribution of gray levels is used, because image pixel intensities are not independent of each other. Specifically, we implement a thresholding technique that is a blend of local entropy thresholding [85] and relative (or cross) entropy thresholding [87]. In a matched-filtered retinal image, the enhanced blood vessels are very sparse compared with the uniform background. This leads to a highly peaky co-occurrence matrix with a low entropy, which is not appropriate for local entropy thresholding. Moreover, local entropy thresholding aims to maximize the local entropy of foreground and background without considering the unbalanced proportion between them. Therefore, blood vessels extracted by local entropy thresholding are usually not complete, and some detailed structures are missed. We made two modifications to improve the results of blood vessel extraction. First, we develop a smoothed co-occurrence matrix to increase the entropy and reduce the peak in the co-occurrence matrix. The co-occurrence matrix of an image shows the intensity transitions between adjacent pixels.
The original co-occurrence matrix is asymmetric, considering only the horizontally right and vertically lower transitions. We want to add a jittering effect to the co-occurrence matrix that tends to keep a similar spatial structure but with less variation; i.e., T = [t_ij] of size N×N is computed as follows (Fig. 4.3(a)):

For every pixel (l, k) in an image I:
    i = I(l, k);
    j = I(l, k+1);
    d = I(l+1, k+1);
    t_id = t_id + 1;
End

Fig. 4.3(b) and (c) compare the original co-occurrence matrix and the modified one for a typical match-filtered image. The two matrices still share a similar structure, which is important for a valid thresholding result. Moreover, the latter has a higher entropy with a much smaller standard deviation, which is more desirable for local entropy thresholding.

Figure 4.3: (a) The computation of the new co-occurrence matrix. (b) The original co-occurrence matrix in a normalized log scale (σ = 261.63). (c) The modified co-occurrence matrix in a normalized log scale (σ = 9.00).

One may wonder whether the modified co-occurrence matrix still represents the original spatial structure well. In fact, considering a smooth area where j and d are very close or identical, the computation in Fig. 4.3(a) implicitly introduces a certain low-pass filtering effect and some structured noise to the co-occurrence matrix. Second, we want to preserve the complete vascular tree structure after thresholding by modifying the original threshold selection criterion. The threshold selected by the local entropy criterion maximizes the local entropy of foreground and background without considering the small proportion of the foreground relative to the background. Therefore, we propose to select the optimal threshold that maximizes the local entropy of the binarized image, which tends to retain an appropriate foreground/background ratio: the larger the local entropy, the more balanced the ratio between foreground and background.
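The first modification, the smoothed co-occurrence matrix, can be sketched as follows. The `diagonal=True` branch implements our reading of the (garbled) update rule above, pairing each pixel with its lower-right neighbor d; `diagonal=False` gives a conventional horizontal pairing for comparison.

```python
import numpy as np

def cooccurrence_matrix(img, levels=256, diagonal=True):
    """Co-occurrence matrix T = [t_ij] of an integer-valued image.

    diagonal=True: the 'jittered' variant (our reading of the text),
    pairing I(l, k) with its lower-right neighbor I(l+1, k+1).
    diagonal=False: horizontal pairs only, for brevity; the text's
    original definition also counts vertically lower transitions.
    """
    img = np.asarray(img, dtype=np.intp)
    T = np.zeros((levels, levels), dtype=np.int64)
    if diagonal:
        a, b = img[:-1, :-1], img[1:, 1:]   # (i, d) pairs
    else:
        a, b = img[:, :-1], img[:, 1:]      # (i, j) pairs
    np.add.at(T, (a.ravel(), b.ravel()), 1)  # accumulate transition counts
    return T
```

In smooth regions the diagonal neighbor d nearly equals the horizontal neighbor j, so the jittered matrix preserves the overall transition structure while spreading the sharp peak, exactly the low-pass effect discussed above.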
If s, 0 ≤ s ≤ L−1, is a threshold, then s partitions the co-occurrence matrix into four quadrants, namely A, B, C, and D (Fig. 4.4). We define the local entropy of the binary image due to the foreground and the background as

H_b^(2)(s) = −P_A log2 P_A − P_C log2 P_C,   (4.5)

where P_A and P_C are the probability sums in quadrants A and C, respectively. The s corresponding to the maximum of H_b^(2)(s) is used as the optimal threshold for blood vessel segmentation.

Figure 4.4: Four quadrants of a co-occurrence matrix.

As shown in Fig. 4.10, the modified local entropy thresholding algorithm better preserves detailed blood vessels compared with the original one. For the MFR image shown in Fig. 4.5(b), the entropy-based thresholding result is shown in Fig. 4.5(c), where the blood vessels are clearly segmented from the background.

Figure 4.5: (a) An original retinal image. (b) Matched filtering result. (c) Local entropy thresholding result. (d) Vascular tree.

Length Filtering. As seen in Fig. 4.5(c), there are still some misclassified pixels in the image. Here we want to produce a clean and complete vascular tree structure by removing these misclassified pixels. Length filtering removes isolated pixels using the concept of connected-pixel labeling described in [88]. Connected regions correspond to individual objects, so we first need to identify the separate connected regions. The binary image is simply an array of '1's and '0's; length filtering isolates the individual objects using the eight-connected neighborhood and label propagation. We assume the binary image contains pixels with value '1' on objects and '0' on the background.

1. Set the current label L = 1;
2. Scan in a raster order from the top left to the bottom right;
3. If encountering a pixel '1', check its neighbors in the upper-left half of its 8-connected neighborhood (W, NW, N, NE);
   If exactly one of them has a label, set the current pixel's label to that value;
   Else if more than one label is represented in the pixel's half-neighborhood, note these labels as "equivalent";
   Else set the current pixel's label to the current label and increment the current label;
4. Go to step 2;
5. Relabel all "equivalent" labelled pixels to the same value.

Once the algorithm completes, only the resulting classes that exceed a certain number of pixels, e.g., 250, are labeled as blood vessels. Classes that are not labeled as blood vessels are eliminated. Fig. 4.5(d) shows the result after length filtering based on Fig. 4.5(c), where a clean vascular tree is obtained.

Figure 4.6: (a) One-pixel-wide vascular tree. (b) One-pixel-wide vascular tree with intersections and crossovers overlaid on the gray-scale image.

4.3.2 Bifurcation/crossover Detection

Vascular intersections and crossovers are the most appropriate representative features because (1) the vascular tree spans the whole retina and hence exists in every retinal image; and (2) bifurcation/crossover points offer more distinguishable information than the otherwise homogeneous areas throughout the retina. If the vascular tree is one pixel wide, the branching points can be detected and characterized efficiently from it. Morphological thinning is applied to the vascular tree in order to obtain the one-pixel-wide vascular tree shown in Fig. 4.6(a). To save computational time, a 3×3 neighborhood window is used to probe for branching points: if the number of vascular-tree pixels in the window is greater than 3, the center is a branch point. An 11×11 neighborhood is then applied at each detected branching point in order to eliminate small intersections. We consider only the boundary pixels of the 11×11 square: if the number of vascular-tree pixels on the boundary is greater than 2, we mark the point as an intersection/crossover. Fig. 4.6(b) presents the vascular tree with the intersections and crossovers.
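The length-filtering and branch-point steps above can be sketched as follows. This is an illustrative version: breadth-first labeling stands in for the raster-scan label propagation, and the 11×11 boundary pruning is omitted. The minimum component size of 250 pixels follows the text.

```python
import numpy as np
from collections import deque

def length_filter(binary, min_size=250):
    """Keep only 8-connected components with at least min_size pixels."""
    binary = np.asarray(binary, dtype=bool)
    out = np.zeros_like(binary)
    seen = np.zeros_like(binary)
    H, W = binary.shape
    for r0 in range(H):
        for c0 in range(W):
            if binary[r0, c0] and not seen[r0, c0]:
                # Grow one connected component by breadth-first search.
                comp, q = [(r0, c0)], deque([(r0, c0)])
                seen[r0, c0] = True
                while q:
                    r, c = q.popleft()
                    for dr in (-1, 0, 1):
                        for dc in (-1, 0, 1):
                            rr, cc = r + dr, c + dc
                            if (0 <= rr < H and 0 <= cc < W
                                    and binary[rr, cc] and not seen[rr, cc]):
                                seen[rr, cc] = True
                                comp.append((rr, cc))
                                q.append((rr, cc))
                if len(comp) >= min_size:
                    for r, c in comp:
                        out[r, c] = True
    return out

def branch_points(skeleton):
    """Candidate bifurcations on a one-pixel-wide skeleton: vessel pixels
    whose 3x3 window contains more than 3 vessel pixels (as in the text).
    Border pixels and the 11x11 pruning step are skipped for brevity."""
    skel = np.asarray(skeleton, dtype=int)
    H, W = skel.shape
    pts = []
    for r in range(1, H - 1):
        for c in range(1, W - 1):
            if skel[r, c] and skel[r - 1:r + 2, c - 1:c + 2].sum() > 3:
                pts.append((r, c))
    return pts
```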
4.4 Experimental Analysis

4.4.1 Thresholding Algorithm

Figure 4.7: (a) An original image. (b) Our thresholding result (threshold = 101). (c) Local entropy thresholding result (threshold = 75). (d) Cross entropy result (threshold = 112).

Figure 4.8: (a) An original image. (b) Our thresholding result (threshold = 136). (c) Local entropy thresholding result (threshold = 98). (d) Cross entropy result (threshold = 253).

A co-occurrence matrix is a representation of the spatial relationships in an image; each of its elements describes the transition of intensities between adjacent pixels. Our proposed thresholding algorithm works well on any MFR image because an MFR image has a narrower range of intensity changes than a normal gray-scale image. In MFR images, therefore, the original co-occurrence matrix definition produces very distinct peaks within that narrow range, which can undermine the power of a statistical method. Our method, on the other hand, spreads out the weight and reduces the standard deviation, which results in a better threshold selection. We also compare our proposed algorithm with two other entropy-based thresholding methods, namely local entropy thresholding [85, 86] and relative (or cross) entropy thresholding [87], on different types of images, including retinal images. Fig. 4.7 and Fig. 4.8 illustrate examples of simulation results on normal gray-scale images for our proposed algorithm versus the other two methods; each algorithm's performance depends on the image. Table 4.1 lists the numerical results of the three approaches on MFR retinal images. Twenty retinal images provided by Hoover [89] are used to test the three entropy-based thresholding algorithms. The entropy plots of the three algorithms are illustrated in Fig. 4.9, in which our algorithm always provides a distinct peak regardless of the image type. Examples of simulation results are shown in Fig.
4.10 for a normal retinal image and Fig. 4.11 for a retinal image with lesions. While the performance of cross entropy thresholding and our algorithm are comparable on a normal image, our algorithm is more robust to lesions. On the MFR image, it is quite obvious that our algorithm performs better than the other two entropy-based methods.

4.4.2 Blood Vessel Extraction Evaluation

We use the set of twenty retinal images provided by Hoover [9] because both fundus and ground-truth images are available online [89], and other state-of-the-art algorithms [9, 8, 6] use the Hoover database to evaluate their performance. Therefore, not only can we evaluate and analyze the performance of the proposed framework, but we can also impartially compare its performance with others. The performance of blood vessel extraction is evaluated using the true positive and false positive rates, as in [9]. Any pixel that was hand-labeled as vessel and that the algorithm labeled as vessel is counted as a true positive. Any pixel that was hand-labeled as non-vessel but that the algorithm labeled as vessel is counted as a false positive. The true positive rate is computed by normalizing the number of true positives by the total number of hand-labeled vessel pixels. The false positive rate is computed by normalizing the number of false positives by the total number of hand-labeled non-vessel pixels.

Figure 4.9: First row: entropy plots of the image shown in Fig. 4.10. Second row: entropy plots of the image shown in Fig. 4.11. (a),(d) Local entropy. (b),(e) Relative entropy. (c),(f) Modified local entropy.
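The two rates defined above can be computed directly from a predicted binary vessel map and its hand-labeled ground truth; a minimal sketch:

```python
import numpy as np

def vessel_rates(pred, truth):
    """True/false positive rates as defined above:
    TPR = TP / (# hand-labeled vessel pixels),
    FPR = FP / (# hand-labeled non-vessel pixels)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.logical_and(pred, truth).sum()     # predicted vessel, truly vessel
    fp = np.logical_and(pred, ~truth).sum()    # predicted vessel, truly background
    return tp / truth.sum(), fp / (~truth).sum()
```

Sweeping the segmentation threshold and plotting (FPR, TPR) pairs produces curves like the one in Fig. 4.12.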
Specifically, we classify the retinal images into three categories: normal retinal images, abnormal retinal images with lesions, and retinal images with obscure blood vessel appearance. Fig. 4.13 shows example simulation results for normal retinal images, Fig. 4.14 for images with obscure blood vessel appearance, and Fig. 4.15 for abnormal retinal images with lesions. In order to evaluate the performance of our algorithm, we compare our simulation results with the state-of-the-art results obtained by Hoover et al. [9], Jiang et al. [8], and Staal et al. [6, 7], and with the hand-labeled ground-truth segmentations. The numerical performance of the proposed framework and the other methods is shown in Fig. 4.12. As predicted, the best quantitative performance is achieved on the normal image category. The numerical performances on the obscure-blood-vessel and lesion categories are comparable. In the lesion category, although the proposed algorithm misdetects lesions as blood vessels, the false positive rate of the lesion group is lower than and/or comparable to that of the other approaches, because lesions account for only a small portion of an image. Although the segmentation performance decreases in the presence of lesions, the additionally detected lesions are actually beneficial for 2D registration and 3D surface reconstruction, because they serve as additional candidate features. The overall performance of our algorithm is comparable to that of other, computationally more expensive, algorithms.

Figure 4.10: (a) An original retinal image. (b) A matched filtering result. (c) A ground-truth segmentation. (d) The local entropy thresholding result. (e) The relative entropy thresholding result. (f) The modified local entropy thresholding result.
While the other three methods require multiple threshold values and various decision-making criteria, we believe a single threshold yields better segmentation results. If an image is partitioned into multiple small regions and one particular small region is considered in isolation, it is difficult even for human eyes to distinguish between foreground (blood vessel) and background in that region. This remark may not be so apparent on the Hoover database, but on high-resolution images, e.g., the ETDRS database, the observation is quite obvious, since background patterns that highly resemble blood vessel structures occur all over the retinal images.

Figure 4.11: (a) An original retinal image. (b) A matched filtering result. (c) A ground-truth segmentation. (d) The local entropy thresholding result. (e) The relative entropy thresholding result. (f) The modified local entropy thresholding result.

4.5 Conclusions

We have introduced an efficient and robust algorithm for blood vessel detection in ocular fundus images, based on a modified co-occurrence matrix used for local entropy thresholding. The proposed method retains computational simplicity and, at the same time, achieves good simulation results. Additionally, our false positive rates are lower than those of other, computationally more expensive, techniques, while the true positive rates are comparable.

Figure 4.12: Performance of our approach versus other methods: Staal's algorithm [6, 7], Jiang's algorithm [8], and Hoover's algorithm [9].
Table 4.1: Threshold values of three different approaches for twenty retinal images

Retina   Our Approach   Cross Entropy Thresholding   Local Entropy Thresholding
0001     121            106                          184
0002     82             98                           123
0003     121            101                          150
0004     69             76                           139
0005     90             107                          128
0044     63             74                           109
0077     82             92                           114
0081     86             96                           138
0082     89             93                           122
0139     110            121                          161
0162     64             68                           86
0163     89             95                           111
0235     89             98                           117
0236     89             97                           123
0239     97             106                          134
0240     87             98                           132
0255     81             86                           125
0291     113            119                          145
0319     83             92                           116
0324     93             105                          122

Figure 4.13: (a),(b) Examples of normal retinal images. (c),(d) Hand-labeled ground-truth images. (e),(f) Our segmentation results.

Figure 4.14: (a),(b) Examples of obscure-blood-vessel retinal images. (c),(d) Hand-labeled ground-truth images. (e),(f) Our segmentation results.

Figure 4.15: (a),(b) Examples of retinal images with lesions. (c),(d) Hand-labeled ground-truth images. (e),(f) Our segmentation results.

CHAPTER 5

2D Retinal Image Registration

5.1 Introduction

Registration is the problem of bringing two or more images into alignment. The two images are often taken at different times, from different viewpoints, in different modes, or at different resolutions. Additionally, the image plane and the world plane are often not parallel. Hence, it is impossible to simply overlay two images. To register two images, an "optimal" transformation model has to be identified. Numerous algorithms have been proposed on this topic, and these methods differ in many aspects: (1) feature-based methods versus area-based methods; (2) batch methods versus RANSAC-like methods; (3) low-level methods, e.g., optical flow and autocorrelation, versus shape-based methods, e.g., template matching; (4) spatial-domain versus frequency-domain methods. In medical imaging, disease diagnosis and treatment planning are often supported by multiple images acquired from the same patient.
Image registration techniques, hence, are needed to integrate the information gained from several images into a comprehensive understanding. The organization of this chapter is as follows. The importance of 2D retinal registration in medical applications is given in this section. Currently available methods for retinal registration are reviewed in Section 5.1.1. Our methodology is given in Section 5.1.2. Section 5.2 provides the relevant technical background. Section 5.3 then describes the implementation of the proposed algorithm. Simulation results and the performance of our algorithm are presented in Section 5.4.

5.1.1 Literature Reviews

Image registration is a fundamental problem in several image processing and computer vision applications [52, 53]. A broad range of image registration methods have been proposed for different medical imaging applications, including retinal image registration. Various criteria, e.g., modalities, dimensionalities, and the elasticity of the transformation, have been used to categorize registration methods [52, 53, 55, 54]. Typically, retinal image registration techniques are classified into feature-based and area-based methods.

Area-based techniques are generally based on pixel intensities and optimize certain objective functions, such as least mean square error, cross-correlation, phase correlation, or mutual information [90, 2, 1, 91, 92, 93, 94]. In retinal image registration, area-based approaches are often used in multimodal or temporal registration applications. In [1], mutual information was used as the similarity measure and simulated annealing was employed as the search technique. In [2], the measure of match (MOM) was proposed as the objective function and a genetic algorithm was chosen as the optimization technique. Nevertheless, the search space of transformation models (affine, bilinear, and projective) is huge.
The greater the geometric distortion between the image pair, the more complicated the search space. Typically, two major factors may degrade the performance of area-based methods: non-consistent/non-uniform contrast within an image and large homogeneous/textureless areas.

Feature-based methods are somewhat similar to manual registration [38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51]. These approaches assume that point correspondences are available in both images, and the registration is performed by maximizing a similarity measure computed from the correspondences. In [40], the bifurcation points of the vascular tree, also called landmark points, were labeled with the orientations of the surrounding vessels. An angle-based invariant was then computed to give a probability for every pair of matching points, after which the Bayesian Hough transform was used to sort the transformations according to their respective likelihoods. In [38], the similarity matrix for all possible correspondences was computed based on the orientations of the vascular centerlines, and the similarity measure was converted to a prior probability. The transformation was estimated hierarchically, from the zeroth-order model to the first-order model and finally to the second-order model. Nonetheless, sufficient feature points have to be available. In [43], the dual-bootstrap iterative closest point (dual-bootstrap ICP) algorithm was introduced. The approach starts from one or more initial, low-order estimates that are accurate only in small image regions, called bootstrap regions. In each bootstrap region, the method iteratively refines the transformation estimate, expands the bootstrap region, and tests whether a higher-order model can be used. The method requires an accurate initialization of at least one point correspondence. High success rates were reported in [43].
The performance of feature-based methods largely depends on sufficient and/or reliable correspondences, especially when the overlapping part of an image pair is very limited or when there are mismatched correspondences.

5.1.2 Methodology

In this work, we study retinal image registration in the context of the National Institutes of Health (NIH) Early Treatment Diabetic Retinopathy Study (ETDRS) standard protocol [11]. The ETDRS protocol defines seven 30° fields of each retina with specific field coverage. A robust ETDRS image registration algorithm is required to (1) assess image quality in terms of ETDRS field coverage, and (2) support ETDRS-based disease staging. Three major challenges are present. First, the small overlaps between adjacent fields lead to inadequate landmark points (crossovers and bifurcations) for feature-based methods. Second, the contrast and intensity distributions within an image are not spatially uniform or consistent, which can deteriorate the performance of area-based techniques. Third, high-resolution ETDRS images contain large homogeneous nonvascular/textureless regions, which cause difficulties for both feature-based and area-based techniques.

Because area-based and feature-based methods have their own strengths and limitations, in this work we combine both registration approaches, along with other decision-making criteria, to exploit the advantages each method offers and to obtain the best solution. To achieve robustness and efficiency, a hierarchical technique, estimating translation, affine, and quadratic models in turn, is incorporated. Binary mutual information is proposed for translation estimation; it demonstrates better robustness than traditional gray-scale mutual information. In addition, a multiscale search strategy is applied to avoid a large combinatorial search space.
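As a simple illustration (not the dissertation's exact implementation, which adds entropy-based peak selection and a multiscale search over candidate translations), the mutual information between two overlapping binary vessel maps can be computed as:

```python
import numpy as np

def binary_mutual_information(a, b):
    """Mutual information (in bits) between two binary images of equal
    size, I(A; B) = sum_ab p(a,b) log2( p(a,b) / (p(a) p(b)) ).
    With binary vessel maps there are only four joint states."""
    a = np.asarray(a, dtype=bool).ravel()
    b = np.asarray(b, dtype=bool).ravel()
    n = a.size
    mi = 0.0
    for va in (False, True):
        for vb in (False, True):
            p_ab = np.logical_and(a == va, b == vb).sum() / n
            if p_ab > 0:
                p_a = (a == va).sum() / n
                p_b = (b == vb).sum() / n
                mi += p_ab * np.log2(p_ab / (p_a * p_b))
    return mi
```

A translation estimate is then the shift whose overlapping region maximizes this score.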
Furthermore, the two parameters characterizing the displacements along the vertical and horizontal directions in the translation model, which suggest the relative positions of each field, can be used for IQA with respect to the field-coverage definition.

There are three major steps in the proposed algorithm. First, binary vascular trees are extracted from the retinal images using a modified co-occurrence matrix for the local entropy-based thresholding method [95, 96]. Second, the zeroth-order translation is estimated by maximizing the mutual information of the binary image pair (area-based); specifically, a local entropy-based peak selection scheme and a multiresolution search strategy are developed to improve the accuracy and efficiency of the translation estimation. Third, we use two types of features, landmark points and sampling points, for higher-order transformation estimation. Sampling points, which are acquired by imposing a grid onto the thinned vascular tree, are introduced only when the landmark points do not meet certain criteria. Simulation results on 504 pairs of ETDRS retinal images show the effectiveness and robustness of the proposed registration algorithm.

5.2 Preliminaries

5.2.1 NIH ETDRS Protocol

Figure 5.1: ETDRS seven standard fields (right/left eyes) (http://eyephoto.ophth.wisc.edu/Photographers.html)

The importance of the ETDRS protocols and the challenges in their implementation call for an automated software tool for image quality assessment (IQA). The retinal images need to meet the image quality criteria defined by the ETDRS protocol. Each set of fundus photographs should be assessed for quality before the photographs are sent to the coordinating center. A photographer is required to decide whether a particular image set meets the three ETDRS requirements: (1) clarity and focus; (2) field definition; and (3) stereo effect. In this work, we focus on field definition.
The evaluation of field definition involves (1) the horizontal and vertical displacements between image fields; and (2) the verification of the relative positions of key features, i.e., the optic disc and the fovea, in an image. The images are captured sequentially starting from field 1, and the required coverage defines the acceptable overlapping regions. The ETDRS imaging standard specifies seven stereoscopic 30° fields of each eye, as defined in Table 5.1 and illustrated in Fig. 5.1. Field 1 is centered on the optic disc, and field 2 is centered on the center of the macula. The overlapping parts of fields 1 and 2 (fields 1/2), as well as of fields 2/3, are roughly 50% of the image size. For the other fields, the overlapping parts are typically less than 25%. It is worth mentioning that the displacements are not always consistent and depend on patient cooperation and photographer skill.

5.2.2 Image Quality Assessment (IQA): Field Definition

The ETDRS protocol specifies seven stereoscopic 30° fields of each eye, as defined in Table 5.1 and Fig. 5.1. The overlap of field pair 1 and 2 (or fields 1/2)¹, as well as that of fields 2/3, is roughly 50% of the image size. For the other field pairs, the overlaps are typically less than 25%. It is worth mentioning that the field displacements are not always consistent and depend on patient cooperation and the photographer's skill. The importance of the ETDRS protocol and the challenges in its practical implementation call for automated software tools for image quality assessment (IQA) that check the relative positions, i.e., the horizontal/vertical displacements, of every image pair according to Table 5.1. By comparing the offset To, i.e., the difference between the desired vertical/horizontal displacements and the actual ones, with the diameter of the optic disc (DD), an image pair is categorized as good (To < 1/2 DD), fair (1/2 DD ≤ To ≤ 1 DD), or poor (To > 1 DD). Therefore, the IQA of the ETDRS field definition boils down to a problem of image registration followed by displacement verification.
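The good/fair/poor categorization above is a simple threshold rule on the offset To relative to the optic disc diameter; a minimal sketch (function and argument names are ours):

```python
def field_definition_grade(offset, disc_diameter):
    """ETDRS field-definition grade from the registration offset To:
    good if To < 1/2 DD, fair if 1/2 DD <= To <= 1 DD, poor if To > 1 DD."""
    if offset < 0.5 * disc_diameter:
        return "good"
    if offset <= disc_diameter:
        return "fair"
    return "poor"
```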
We will briefly review some technical background on retinal image registration in the following.

¹The notation "fields 1/2" indicates that field 1 is the fixed image (the model image) and field 2 is the image being mapped to it (the distorted image).

5.2.3 Global and Local Entropy

The global entropy, or simply the entropy, of an n-state system is defined as a function of the state probabilities [83],

H = −Σ_{i=1}^{n} p_i log2 p_i,   (5.1)

where p_i is the probability of state i. In the case of an image, this entropy is not an adequate measure, since image pixel intensities are not independent of each other. Different images with identical histograms will always yield the same entropy value, because the spatial distribution is not taken into account in the global entropy computation. The local entropy [85], on the other hand, can better distinguish two images in terms of their spatial structures, because it considers the dependency of image pixel intensities. The local entropy is defined as

H^(2) = −Σ_i Σ_j p_ij log2 p_ij,   (5.2)

where p_ij is the probability of co-occurrence of the intensity levels i and j. The co-occurrence matrix is an L×L matrix (L is the number of intensity levels) [85]; it indicates the transition of pixel intensities between adjacent pixels. The local entropy, therefore, can indicate the spatial structure/patt
Title  Advanced Retinal Imaging: Feature Extraction, Two-Dimensional Registration, and Three-Dimensional Reconstruction 
Date  2006-12-01 
Author  Chanwimaluang, Thitiporn 
Department  Electrical Engineering 
Full Text Type  Open Access 
Abstract  In this dissertation, we have studied feature extraction and multiple view geometry in the context of retinal imaging. Specifically, this research involves three components: feature extraction, 2D registration, and 3D reconstruction. First, the problem of feature extraction is investigated. Features are critically important to motion estimation techniques because they are the input to those algorithms. We have proposed a feature extraction algorithm for retinal images, with bifurcations/crossovers used as features, and a modified local entropy thresholding algorithm based on a new definition of the co-occurrence matrix. Then, we consider 2D retinal image registration, i.e., the problem of 2D-to-2D transformation. Both linear and nonlinear models are incorporated to account for motions and distortions. A hybrid registration method is introduced in order to take advantage of what both feature-based and area-based methods have to offer, along with relevant decision-making criteria. Area-based binary mutual information is proposed for translation estimation, and a feature-based hierarchical registration technique, which involves the affine and quadratic transformations, is developed. After that, the 3D retinal surface reconstruction problem is addressed. To generate a 3D scene from 2D images, camera projection (3D-to-2D transformation) techniques are investigated, and we choose an affine camera model for 3D retinal reconstruction. We introduce a constrained optimization procedure that incorporates a geometric penalty function and lens distortion into the cost function; the procedure optimizes all of the parameters simultaneously: the camera parameters, the 3D points, the physical shape of the human retina, and the lens distortion. A point-based spherical fitting method is then introduced. The proposed retinal imaging techniques will pave the path to a comprehensive visual 3D retinal model for many medical applications. 
Note  Dissertation 
Rights  © Oklahoma Agricultural and Mechanical Board of Regents 
. . 111 vii 6.4.1 Lens Distortion Removal . . . . . . . . . . . . . . . . . . . . . 113 6.4.2 Initial Retinal A ne Surface . . . . . . . . . . . . . . . . . . . 116 6.4.3 A ne Bundle Adjustment . . . . . . . . . . . . . . . . . . . . 117 6.4.4 Euclidean Reconstruction of Retinal Surface . . . . . . . . . . 120 6.4.5 PointBased Surface Approximation . . . . . . . . . . . . . . . 121 6.5 Experimental Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 124 6.5.1 Surface Approximation on Synthesized Data . . . . . . . . . . 125 6.5.2 Surface Approximation on Retinal Images . . . . . . . . . . . 127 6.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 7 Constrained Optimization for 3D Surface Reconstruction 136 7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 7.2 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138 7.3 Constrained Optimization for 3D Reconstruction . . . . . . . . . . . 139 7.3.1 Constrained A ne Bundle Adjustment (CABA) . . . . . . . . 139 7.3.2 A ne Bundle Adjustment with Lens Distortion Update . . . . 140 7.3.3 Constrained A ne Bundle Adjustment with Lens Distortion Update (CABALDU) . . . . . . . . . . . . . . . . . . . . . . 141 7.4 Experimental Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 142 7.4.1 Surface Approximation on Synthesized Data . . . . . . . . . . 143 7.4.2 Surface Approximation on Retinal Images . . . . . . . . . . . 146 7.4.3 More Discussions . . . . . . . . . . . . . . . . . . . . . . . . . 148 7.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 8 Conclusions and Future Works 154 BIBLIOGRAPHY 157 viii LIST OF TABLES Table Page 3.1 2D Transformation Models . . . . . . . . . . . . . . . . . . . . . . . 47 3.2 3D Transformation Models . . . . . . . . . . . . . . . . . . . . . . . 
48 4.1 Threshold values of three di erent approaches for twenty retinal images 68 5.1 Field Coverage Speci cation . . . . . . . . . . . . . . . . . . . . . . . 101 5.2 The Transformation Models For 2D Retinal Registration . . . . . . . 102 5.3 The energy concentration of binary imagebased ECC vs. logical op eration XOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 5.4 Ideal vertical/horizontal displacements . . . . . . . . . . . . . . . . . 103 5.5 The registration error and overlaps between elds. . . . . . . . . . . . 103 ix LIST OF FIGURES Figure Page 1.1 Human retina. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2 Diabetic retinopathy. (a) Normal vision. (b) Same scene viewed by a person with diabetic retinopathy. . . . . . . . . . . . . . . . . . . . . 4 1.3 Di erent stages of diabetic retinopathy. (a) An example of normal reti nal image. (b) A retinal image with microaneurysms. (c) Proliferated diabetic retinopathy with fragile newly grown blood vessels. (d) A retina image with some types of scar tissues. (http://www.inoveon.com) 5 1.4 Architecture of the proposed research. . . . . . . . . . . . . . . . . . 7 1.5 Outline of the dissertation. . . . . . . . . . . . . . . . . . . . . . . . . 15 2.1 Anatomical structures in a human retina. . . . . . . . . . . . . . . . . 18 2.2 Viewbased registration. See more detail discussion in Chapter 5. . . 24 2.3 Temporalbased registration [1]. . . . . . . . . . . . . . . . . . . . . . 25 2.4 Modalbased registration [2]. . . . . . . . . . . . . . . . . . . . . . . . 26 2.5 3D retinal surface reconstruction [3]. . . . . . . . . . . . . . . . . . . 28 2.6 3D retinal surface reconstruction [4]. . . . . . . . . . . . . . . . . . . 29 2.7 Optic nerve's depth reconstruction [5]. . . . . . . . . . . . . . . . . . 30 2.8 Overview of retinal image analysis research. The grayshaded boxes refer to the topics involved in this work. . . . . . . . . . . . . . . . . 
32 3.1 Results from 2D coordinate transformation. . . . . . . . . . . . . . . 34 3.2 Results from 3D coordinate transformation. . . . . . . . . . . . . . . 36 x 3.3 Pinhole camera geometry. Camera coordinate system is aligned with world coordinate system. . . . . . . . . . . . . . . . . . . . . . . . . . 37 3.4 Onedimensional image formation. Orthographic is projected along zaxis. Perspective is projected along principal ray direction. Weak perspective is a combination between perspective and orthographic. A point, rst, is projected along zaxis to a plane Z = d. Then, perspective projection from the plane. . . . . . . . . . . . . . . . . . . 41 3.5 The concatenated image (CI) space. . . . . . . . . . . . . . . . . . . . 45 4.1 The graylevel pro le of the cross section of a blood vessel. . . . . . . 54 4.2 Illustration of 12 matched lter kernels along di erent directions where = 2:0. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 4.3 (a) The computation of the new cooccurrence matrix. (b) The original cooccurrence matrix in a normalized log scale ( = 261:63). (c) The a modi ed cooccurrence matrix in a normalized log scale = 9:00). 57 4.4 Four quadrants of a cooccurrence matrix. . . . . . . . . . . . . . . . 58 4.5 (a) An original retinal image. (b) Matched ltering result. (c) Local entropy thresholding result. (d) Vascular tree. . . . . . . . . . . . . . 59 4.6 (a) Onepixel wide vascular tree. (b) Onepixel wide vascular tree with intersections and crossovers overlaying on grayscaled image. . . . . . 60 4.7 (a) An original image. (b) Our thresholding result (threshold = 101). (c) Local entropy thresholding result (threshold = 75). (d) Cross en tropy result (threshold = 112). . . . . . . . . . . . . . . . . . . . . . . 61 4.8 (a) An original image. (b) Our thresholding result (threshold = 136). (c) Local entropy thresholding result (threshold = 98). (d) Cross en tropy result (threshold = 253). . . . . . . . . . . . . . . . . . . . . . . 
62 xi 4.9 First row: entropy plots of an image shown in Fig. 4.10. Second row: entropy plots of an image shown in Fig. 4.11. (a),(d) Local entropy. (b),(e) Relative entropy. (c),(f) Modi ed local entropy. . . . . . . . . 64 4.10 (a) An original retinal image. (b) A matched ltering result. (c) A groundtruth segmentation. (d) The local entropy thresholding result. (e) The relative entropy thresholding result. (f) The modi ed local entropy threshold. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 4.11 (a) An original retinal image. (b) A matched ltering result. (c) A groundtruth segmentation. (d) The local entropy thresholding result. (e) The relative entropy results. (f) The modi ed local entropy threshold. 66 4.12 Performance of our approach versus other methods, Staal's algorithm [6, 7], Jiang's algorithm [8], and Hoover's algorithm [9]. . . . . . . . 67 4.13 (a),(b) Examples of normal retinal images. (c),(d) Handlabeled ground truth images. (e),(f) Our segmentation results. . . . . . . . . . . . . . 69 4.14 (a),(b) Examples of obscure bloodvessel retinal images. (c),(d) Hand labeled groundtruth images. (e),(f) Our segmentation results. . . . . 70 4.15 (a),(b) Examples of retinal images with lesions. (c),(d) Handlabeled groundtruth images. (e),(f) Our segmentation results. . . . . . . . . 71 5.1 ETDRS sevenstandard elds (right/left eyes) (http://eyephoto.ophth.wisc.edu/Photograp5.2 The owchart of the proposed algorithm (LPCs: landmark point cor respondences and SPCs: sampling point correspondences.) . . . . . . 79 5.3 Vascular tree and the centerline extraction of elds 2 and elds 4 of Julie's retinal images. (a) Match ltered image: eld 2; (b) Match ltered image: eld 4; (c) Binary image: eld 2; (d) Binary image: eld 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 xii 5.4 Vascular tree and the centerline extraction of elds 2 and elds 4 of Julie's retinal images. 
(a) Thinned binary image: eld 2; (b) Thinned binary image: eld 4; (c) Crossover/bifurcation points: eld 2; (d) Crossover/bifurcation points: eld 4. . . . . . . . . . . . . . . . . . . 81 5.5 Sample plots of binary imagebased ECC vs. logical operation, XOR, at every possible translation in the coarsest scale (downsampled by 16). (a) The binary imagebased ECC of Julie's elds 1/2; (b) The binary imagebased ECC of Julie's elds 2/3 (c) The logical operation XOR of Julie's elds 1/2; (d) The logical operation XOR of Julie's elds 2/3. 84 5.6 The overlaps of two possible translation models (left column and right column) that produce similar values of ECC peaks for elds 1/7. . . 86 5.7 An example of the sampling process. (Left) A grid is placed on the thinned binary vascular tree; (b) The sampling points. . . . . . . . . 90 5.8 Handlabeled groundtruths (top) and the extracted vascular trees using the proposed algorithm (bottom) of eld 1 (a) and eld 2 (b). . . . . 92 5.9 Vertical/horizontal displacements of six ETDRS image pairs. . . . . . 93 5.10 The failure rates of translation estimation for six ETDRS eld pairs. . 95 5.11 Error comparisons for consecutive transformation models. . . . . . . . 95 5.12 The failure rates of 468 image pairs which proceed from the translation model to a ne and quadratic models based on LPCs only. . . . . . . 96 5.13 The failure rates (after manual validation) of a ne/quadratic models (using both LPCs and SPCs) and the proposed algorithm for six eld pairs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 5.14 Two examples of using the quadratic transformation on elds 2/5 and elds 1/6 with insu cient LPCs, where the registration errors are 0.4 and 1.4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 xiii 5.15 A translation modelbased registration result where the vertical/horizontal displacements in elds 1/7 are depicted. 
The registration errors are as follows: elds 1/2 = 15:81, elds 1/6 = 22:06, elds 1/7 = 13:83, elds 2/3 = 12:60, elds 2/4 = 21:56, and elds 2/5 = 18:97. The median error is 17:39. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 5.16 The nal registration result of an ETDRS image set. The quadratic transformation is applied to elds 2/3. The a ne model is applied to elds 1/2, 1/6, 1/7, 2/4, and 2/5. The registration errors are given as as follows: elds 1/2: 2:00, elds 1/6: 0:52, elds 1/7: 1:23, elds 2/3: 1:66, elds 2/4: 2:78, and elds 2/5: 1:49. The median error is 1:58. The sampling points (SPCs) are involved in elds 1/6, 2/3 and elds 2/4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 5.17 (a) An example of a poor quality retinal image pair with no LPC in the overlap. (b) An example of a retinal image pair with pathology. 106 6.1 The structure from motion (SFM) problem. . . . . . . . . . . . . . . 109 6.2 The owchart of the proposed algorithm. . . . . . . . . . . . . . . . . 112 6.3 Retinal images are obtained from a fundus camera which composes of an actual camera and a digital camera attached to a fundus camera. . 113 6.4 The sparse Jacobian matrix for a ne bundle adjustment comprises 3 cameras and 4 points. . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 6.5 The sparse Hessian matrix comprises 3 cameras and 4 points. . . . . . 119 6.6 3D point clouds: (a) Top view. (b) Side view. . . . . . . . . . . . . . 122 6.7 The set up of four synthetic cameras. Point cloud is constructed on a spherical surface with spreading angle of 90o. . . . . . . . . . . . . . . 123 6.8 Four images generated by the four cameras shown in Fig. 6.7. . . . . 124 6.9 The errors of spreading angles (%) versus noise variances. The overall spreading angle of an original synthetic partial sphere is 90o. . . . . . 126 xiv 6.10 The errors between reconstructed points and surface approximation versus noise variances. . . . . . . . . . . . . . 
. . . . . . . . . . . . . 127 6.11 3D surface reconstruction on synthetic data. (a) A partial synthetic spherical shape. (b) A ne reconstruction of a partial synthetic spheri cal shape. (c) Euclidean reconstruction of a partial synthetic spherical shape. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 6.12 The errors versus 2D spatial locations of the initial point selection. (a) Without a ne bundle adjustment. (b) A ne bundle adjustment is involved. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130 6.13 (a) An example of a grid pattern. (b) A lens distortionfree grid pattern.131 6.14 (a) An example of a retinal image. (b) A lens distortionfree image. . 132 6.15 Two sets of retinal images with marked point correspondences. First and second rows show stereo pairs of eld 1. Third and fourth rows show stereo pairs of eld 2. . . . . . . . . . . . . . . . . . . . . . . . . 133 6.16 The 3D retinal reconstruction results with the retinal image mapped onto sphere surfaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 6.17 The errors between reconstructed 3D points and the approximated spherical surface without/with radial distortion removal (a) and with out/with a ne bundle adjustment (with the radial distortion removed) (b). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 7.1 Retinal images are obtained from a fundus camera which composes of an actual camera and a digital camera attached to a fundus camera [3]. 137 7.2 An original partial sphere with spreading angle = 90o . . . . . . . . . 143 7.3 Lens distortionfree synthesized data with noise: the errors between reconstructed points and surface approximation versus noise variances. 143 xv 7.4 Lens distortionfree synthesized data with noise: the errors of spreading angles in percentage versus noise variances. The overall spreading angle of an original synthetic partial sphere is 90o. . . . . . . . . . . . . . . 
144 7.5 Noisefree synthesized data with lens distortion coe cients set to kc = [0:03; 0:08; 0:02; 0:01; 0:02]: Tested on CABALDU optimiza tion procedure. A plot between average errors between approximated surface and 3D points versus the number of synthesized points. . . . 145 7.6 Synthesized data with both noise and lens distortion: the errors be tween reconstructed points and surface approximation versus noise variances. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 7.7 Noisefree synthesized data with lens distortion coe cients kc = [0:03; 0:08; 0:02; 0:01; 0:02]: (a) No optimization. (b) With ABA. (c) With CABA. (d) With CABALDU. . . . . . . . . . . . . . . . . . . 147 7.8 Four images generated by the four synthesized cameras: blue crosses ( ) are original images and red dots ( ) are images with noise (variance = 3). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 7.9 Four images generated by the four synthesized cameras: blue crosses ( ) are original images and red dots ( ) are images with lens distortions (coe cients = [0:2;0:3;0:1; 0:1; 0:1]). . . . . . . . . . . . . . . . . 150 7.10 Noisefree synthesized data with lens distortion coe cients set to kc = [0:03; 0:08; 0:02; 0:01; 0:02]: (a) No optimization versus ABA. (b) ABA versus CABA. (c) CABA versus CABALDU. . . . . . . . . 151 7.11 Retinal Images: (a) No optimization versus ABA. (b) ABA versus CABA. (c) CABA versus CABALDU. . . . . . . . . . . . . . . . . . 152 7.12 The 3D retinal reconstruction results with the retinal image mapped onto sphere surfaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 xvi CHAPTER 1 Introduction 1.1 Motivations and Objectives Diabetes is the leading cause of blindness among workingage Americans, and many patients with visionthreatening diabetic retinopathy remain asymptomatic until blind ness occurs [10]. 
The great majority of this blindness can be prevented with proper eye examination and treatment by ophthalmologists, who rely on the results of a randomized clinical trial, the Early Treatment Diabetic Retinopathy Study (ETDRS), to guide their treatment of patients with diabetes [11]. ETDRS requires sets of retinal images to be captured from different fields of an eye. Because ophthalmologists rely on multiple retinal images for disease diagnosis and evaluation, these images need to cover a required area of the retina and must meet the image quality criteria defined by the ETDRS protocol. Each set of fundus photographs should be assessed for quality before the patient leaves the imaging center: a photographer is required to decide whether a particular image set meets the three ETDRS image quality assessment (IQA) requirements of (1) clarity and focus, (2) field definition, and (3) stereo effect. A software tool to standardize and certify image quality is therefore in demand. Several types of visual models, including graphical retinal mapping, 3D retinal surface reconstruction, and 2D retinal registration, can (1) assist ophthalmologists in diagnosing, analyzing, and evaluating the disease; (2) facilitate clinical studies; and (3) be used as a spatial map during laser surgical procedures. Furthermore, many respected researchers in psychology have observed that a person can always put a smile on his or her face, but the eyes usually reveal true feelings. We agree with this statement completely, considering that the eyes are the only place where we can directly see inside a human body without any invasive medical techniques. We can observe, in real time, the retina's blood vessels, which are significant indicators of not only eye-related diseases but also various other diseases. In this work, advanced retinal imaging approaches have been developed according to the aforementioned needs as follows.
Objective 1: To develop an efficient feature extraction algorithm that can effectively segment blood vessels and select reliable point correspondences for later processing.

Objective 2: To develop a robust 2D retinal image registration algorithm that can register seven ETDRS retinal images together and perform image quality assessment on field coverage.

Objective 3: To study an accurate 3D retinal surface reconstruction algorithm that can recover the 3D shape of the retina from ETDRS retinal images.

Figure 1.1: Human retina.

1.2 Significance and Background

Diabetic retinopathy is a complication of diabetes [12]. The vision of a person with diabetic retinopathy is shown in Fig. 1.2(b). The disease affects blood vessels inside the retina, an area lying at the back of the eyeball as shown in Fig. 1.1. In accordance with [12, 11, 13], in the earliest stage of the disease the tiny blood vessels, or capillaries, become thinner and weaker, and eventually they leak blood (microaneurysms), as illustrated in Fig. 1.3(b). A patient's sight at this stage is still good, but an ophthalmologist can detect and notice the abnormalities in the retina. As the disease progresses, some blood vessels become blocked. This triggers the retina to grow new blood vessels, which are abnormal, fragile, and bleed easily, as shown in Fig. 1.3(c). In the later stage of the disease, new blood vessels continue to grow, along with scar tissue, as shown in Fig. 1.3(d). Ultimately, the retina will detach from the eye. The National Eye Institute (NEI) recommends that everyone with diabetes have a comprehensive eye exam at least once a year, because diabetic retinopathy has no early warning symptoms or signs. According to [14], there are 20.8 million people in the United States, or 7% of the population, who have diabetes. While an estimated 14.6 million have been diagnosed with diabetes, unfortunately, 6.2 million people (or nearly one-third) are unaware that they have the disease.
Failure to undergo universally recommended annual eye examinations is the primary cause of this continued loss of sight. If detected early, the majority of severe vision loss from diabetic retinopathy can be prevented with proper examination and treatment by ophthalmologists. Ophthalmologists primarily rely on the results of a randomized clinical study called ETDRS to guide their treatment of patients with diabetes.

Figure 1.2: Diabetic retinopathy. (a) Normal vision. (b) Same scene viewed by a person with diabetic retinopathy.

1.3 NIH's ETDRS Protocols

The Early Treatment Diabetic Retinopathy Study (ETDRS) implemented standardized retinal imaging, classification, and severity staging for diabetic retinopathy, as well as proving the therapeutic benefit of laser photocoagulation surgery in preventing vision loss [15]. This multicenter, randomized clinical trial was designed to evaluate treatment of patients with nonproliferative or early proliferative diabetic retinopathy. A total of 3,711 patients were recruited to be followed for a minimum of 4 years to provide long-term information on the risks and benefits of the treatments under study. The study demonstrated a statistically significant reduction in severe visual loss for those eyes with early treatment [11]. The ETDRS also developed an internationally recognized disease severity scale indicating the risk for diabetic retinopathy [16]. ETDRS protocols have become the "gold standard" for evaluating diabetic retinopathy [17] and diabetic macular edema [18].

1.4 Technical Challenges

In this work, we have addressed the following retinal image analysis issues: (1) image quality assessment (IQA) in terms of field coverage, which can help in grading and support the grader training processes; (2) several types of visual models, i.e., a graphical retinal map, 2D retinal registration, and a 3D reconstructed retinal surface, which can greatly assist ophthalmologists in diagnosing and evaluating the disease, significantly facilitate clinical studies, and assist in laser surgical procedures by serving as spatial maps.

Figure 1.3: Different stages of diabetic retinopathy. (a) An example of a normal retinal image. (b) A retinal image with microaneurysms. (c) Proliferative diabetic retinopathy with fragile newly grown blood vessels. (d) A retinal image with some types of scar tissue. (http://www.inoveon.com)

The technical challenges associated with retinal image analysis are:

Feature extraction and correspondence selection. Retinal images may be poorly focused and blurred due to inappropriate image acquisition conditions. Image quality variability is the major challenge for feature extraction from retinal images. Specifically:

- Poor lighting conditions can cause glare or introduce artificial effects in retinal images.
- Retinal images are usually dominated by a homogeneous red color with different shades, so background and foreground are difficult to distinguish.
- Nonuniform illumination leads to inconsistent intensity of both blood vessels and background within an image and across images.

2D retinal image registration. In addition to the challenges for feature extraction and correspondence selection, there are also some difficulties specific to ETDRS-based retinal image registration:

- Large homogeneous or textureless areas in retinal images complicate the registration process: area-based approaches have insufficient information, and feature-based approaches have inadequate features.
- The overlaps between multi-field images are relatively small, which presents another major difficulty: it is not reliable to estimate the transformation for the whole images based on small overlap regions.
- The curved retinal surface and the camera motion across ETDRS fields require high-order nonlinear transformations for image registration.
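To make the high-order transformation concrete, the sketch below (illustrative Python, not the dissertation's implementation; the function name and parameter layout are our own) shows the 12-parameter quadratic model commonly used to accommodate a curved retinal surface, with the affine and translation models falling out as special cases of the same parameter matrix.

```python
import numpy as np

def quadratic_warp(points, theta):
    """Apply a 12-parameter quadratic transformation to 2D points.

    points : (N, 2) array of (x, y) coordinates.
    theta  : (2, 6) parameter matrix; each output coordinate is a linear
             combination of the monomial basis [x^2, xy, y^2, x, y, 1].
    """
    x, y = points[:, 0], points[:, 1]
    basis = np.stack([x**2, x * y, y**2, x, y, np.ones_like(x)], axis=1)  # (N, 6)
    return basis @ theta.T  # (N, 2) warped coordinates

# With the quadratic columns zeroed the model reduces to an affine map,
# and with only the last two columns active it reduces to a translation,
# mirroring the translation -> affine -> quadratic hierarchy.
identity_theta = np.array([[0, 0, 0, 1, 0, 0],
                           [0, 0, 0, 0, 1, 0]], dtype=float)
pts = np.array([[10.0, 20.0], [30.0, 40.0]])
assert np.allclose(quadratic_warp(pts, identity_theta), pts)
```

In a hierarchical scheme, the parameters estimated at one level seed the next: the translation fills the last column, the affine estimate fills the last three columns, and only then are the quadratic terms refined.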
3D retinal surface reconstruction. The 2D registration results are the prerequisite for 3D retinal surface reconstruction, where we face the same challenges as in 2D registration as well as some new problems:

- The fundus camera parameters are unknown, so camera calibration has to be done prior to 3D reconstruction.
- The virtual lens effect from the human cornea and the lens distortion from the fundus camera have to be taken into account for 3D reconstruction.

1.5 Research Flow

In order to give readers a better understanding of the materials and subjects investigated in this work, this section provides the general ideas covered in this dissertation. The architecture of our system can be decomposed into a three-layer hierarchy, as illustrated in Fig. 1.4.

Figure 1.4: Architecture of the proposed research.

Feature correspondences provide input information for both 2D retinal image registration and 3D retinal surface reconstruction. In the opposite direction, knowledge and information regarding displacement, structure, and motion further improve correspondence accuracy. A similar relationship exists between 2D registration and 3D surface reconstruction: displacements from the 2D model supply additional input information for 3D surface reconstruction, while the geometric shape can considerably improve the 2D registration results. In the 2D registration layer, three stratification sublayers are presented in order to improve the algorithm's robustness, efficiency, and accuracy. The least complex model, translation, is identified first. Then, information from the translation model is used as constraints for the affine model. Finally, the quadratic model is obtained with constraints from the affine model. A similar stratification approach is used in 3D surface reconstruction.
Although the most relaxed solution in projective space is the projective structure, we start with the affine structure because we use an affine camera. Once the affine structure is obtained, affine bundle adjustment is employed to jointly refine all parameters. Then, extra metric information is used to upgrade the structure to a Euclidean structure. A brief explanation of each layer is given next.

Feature Extraction and Correspondence Selection. The first problem can be divided into two subproblems: feature extraction and feature matching/correspondence. Let us start with the first subproblem, feature extraction. We want to extract features because features reduce the amount of data that needs to be processed. Since "not all information is created equal" [19], what are the most suitable features for matching? Points are used most of the time. Lines or blobs are also good features, since they provide more information than points. Good features should contain as much distinguishable information as possible. The next questions are: what are the best approaches to extract features, and, once features are obtained, how can they be matched across images? Although there are studies of image correlation [20, 21, 22], feature extraction and correspondence are largely image-dependent problems: certain algorithms are good for some specific types of images but perform poorly on others.

2D Image Registration. Registration is the problem of how to make two or more images coincide. Two images are often taken at different times, viewpoints, modes, or resolutions. Additionally, the image plane and the world plane are often not parallel. Hence, it is impossible to simply overlay two images. To register two images, an "optimal" transformation model has to be identified. Numerous algorithms have been proposed on this topic.
These methods differ in many aspects: (1) feature-based methods versus area-based methods; (2) batch methods versus RANSAC-like methods; (3) low-level methods, e.g., optical flow and autocorrelation, versus shape-based methods, e.g., template matching; (4) spatial domain versus frequency domain.

3D Surface Reconstruction. Visual reconstruction is a process that recovers a 3D scene or model from multiple images. It is usually referred to as the structure from motion (SFM) problem. The process typically recovers the objects' 3D shapes, the cameras' poses (positions and orientations), and the cameras' internal parameters (focal lengths, principal points, and skew factors). Many camera models are possible. Perspective projection is the standard; however, other projections, e.g., affine and orthographic, sometimes prove more useful and practical for a distant camera. The main difference between projections is the required level of calibration. The 3D reconstruction problem has been extensively studied and is currently a very active research topic.

1.6 Original Contributions

Before the outline of each chapter is given, we would like to summarize the main contributions we believe we have made in this work. This dissertation is based on a journal paper and four conference papers. There were certain motivations and contributions at the time each topic was investigated, and the specific nature of retinal images leads to novel solutions for each problem.

1.6.1 Feature Extraction and Correspondence Selection

The vascular tree and its bifurcation/crossover points are used as features and correspondences for the two following reasons: (1) the vascular tree spans the whole retina, hence it exists in every retinal image; (2) bifurcation/crossover points offer more distinguishable information than the homogeneous areas throughout the retina.
Because of the nonuniform intensity of both background and foreground as well as the low contrast, multidirectional matched filters are employed to enhance the contrast of blood vessels against the background. Next, a new definition of the co-occurrence matrix is proposed. Then, a new thresholding method based on the idea of local entropy is introduced to extract blood vessels from the background. Because all blood vessels should be connected, length filtering is used to remove small separated regions. Finally, bifurcation/crossover points are detected by a morphological thinning operation and a window-based probing approach. The simulation results suggest that the proposed framework is efficient and robust. Additionally, our false positive rates are lower than those of other computationally expensive techniques while the true positive rates are comparable.

A new co-occurrence matrix definition. The new definition takes into account both the spatial structure and the noise in an image. Simulation results on several matched-filter images demonstrate good performance in terms of robustness and accuracy.

A new thresholding algorithm. The proposed equation slightly resembles both the local entropy thresholding and the relative entropy (or cross entropy) thresholding methods. The simulation results on various matched-filter images exhibit better performance in terms of robustness and accuracy.

1.6.2 2D Retinal Image Registration

Because area-based and feature-based methods have their own strengths and limitations, in this work we combine both area-based and feature-based registration methods to gain the advantages each method offers, along with other decision-making criteria, in order to obtain the best solution. To achieve robustness and efficiency, a hierarchical technique (translation, affine, and quadratic) is incorporated. Because of nonuniform intensity within an image, binary mutual information is proposed for translation estimation.
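To make the local-entropy idea concrete, the following is a minimal Python sketch of co-occurrence-based entropy thresholding in the spirit of the classical method; it is not the new co-occurrence definition proposed in Chapter 4, and the function names are ours.

```python
import numpy as np

def cooccurrence(img, levels=256):
    """Gray-level co-occurrence counts over right and down neighbor pairs."""
    C = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(C, (img[:, :-1].ravel(), img[:, 1:].ravel()), 1)  # horizontal pairs
    np.add.at(C, (img[:-1, :].ravel(), img[1:, :].ravel()), 1)  # vertical pairs
    return C

def local_entropy_threshold(img):
    """Pick the threshold t maximizing the sum of the entropies of the
    background-background and foreground-foreground quadrants of C."""
    C = cooccurrence(img)
    best_t, best_h = 1, -np.inf
    for t in range(1, 256):
        h = 0.0
        for quad in (C[:t, :t], C[t:, t:]):  # BB and FF quadrants
            s = quad.sum()
            if s > 0:
                p = quad[quad > 0] / s
                h -= float((p * np.log2(p)).sum())
        if h > best_h:
            best_t, best_h = t, h
    return best_t
```

On a bimodal image the maximizing threshold falls between the two gray-level modes, since each quadrant then captures one cluster's pair distribution.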
It demonstrates better performance in terms of robustness compared with traditional grayscale mutual information. In addition, a multiscale searching strategy is applied to avoid a large combinatorial search space. Sampling point correspondences are introduced when bifurcation/crossover points are inadequate. An iterative closest point algorithm is used to refine feature points as well as transformation models. Furthermore, the two parameters characterizing the displacements along the vertical and horizontal directions in the translation model, suggesting the relative positions of each field, can be used for image quality assessment (IQA) with regard to the field coverage definition.

A hybrid registration method. We combine both area-based and feature-based registration methods to gain the advantages each method offers. A translation model is estimated through binary mutual information while higher-order models are approximated through feature-based methods.

Binary mutual information. A binary mutual information measure is proposed. It demonstrates better performance in terms of robustness compared with traditional grayscale mutual information.

Empirical conditions. The conditions effectively combine all the techniques into one flow where the algorithm is able (1) to adaptively select an appropriate transformation model, (2) to determine whether sampling point correspondences have to be involved, and, more importantly, (3) to reject invalid registered pairs.

Image quality assessment (IQA). We have addressed the issue of image quality assessment in terms of field coverage via the two parameters characterizing the displacements along the vertical and horizontal directions in the translation model.

1.6.3 3D Retinal Surface Reconstruction

In this research, we assume a weak-perspective camera for the two following reasons: (1) the ETDRS imaging standard specifies a 30° field of view for each eye (a narrow field of view); (2) each retinal image has small depth variation.
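The binary-mutual-information idea can be sketched as follows: binarize both images (e.g., into vessel maps), then score candidate translations by the mutual information of the 2x2 joint histogram. This is a toy exhaustive search, not the multiscale strategy of Chapter 5, and the helper names are ours.

```python
import numpy as np

def binary_mi(a, b):
    """Mutual information (bits) of two aligned binary images via a 2x2 joint histogram."""
    n = a.size
    joint = np.array([[np.sum(~a & ~b), np.sum(~a & b)],
                      [np.sum(a & ~b),  np.sum(a & b)]], dtype=float) / n
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)
    mi = 0.0
    for i in range(2):
        for j in range(2):
            if joint[i, j] > 0:
                mi += joint[i, j] * np.log2(joint[i, j] / (pa[i] * pb[j]))
    return mi

def best_translation(fixed, moving, search=5):
    """Exhaustive search over integer shifts; returns (dy, dx) maximizing binary MI."""
    best, best_mi = (0, 0), -np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            shifted = np.roll(np.roll(moving, dy, axis=0), dx, axis=1)
            mi = binary_mi(fixed, shifted)
            if mi > best_mi:
                best_mi, best = mi, (dy, dx)
    return best
```

The exhaustive loop makes the combinatorial cost of a naive search obvious, which is exactly what the multiscale strategy mentioned above is meant to avoid.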
We derive an affine camera model and give a mathematical proof of the affine condition. Affine structure from motion has been investigated, and an affine factorization method is used for initial reconstruction because the approach can accommodate multiple images and utilize all feature points. An affine bundle adjustment based on a nonlinear optimization technique is used to refine the affine shape and affine cameras. Then, a Euclidean constraint is invoked to correct both the affine shape and cameras into Euclidean space up to a similarity. Next, we take into account an eyeball's geometric constraint in order to generate denser points for the surface. We assume that an eyeball is approximately a sphere. A point-based sphere fitting method is introduced.

The condition for the affine camera. We have given a mathematical derivation of the affine camera from a standard linear camera as well as its condition.

Affine bundle adjustment. Inspired by projective bundle adjustment, we introduce affine bundle adjustment to refine all parameters, the affine shape and the affine cameras, simultaneously.

A point-based spherical fitting method. We introduce a linear approach to solve a nonlinear surface approximation problem.

Constrained affine bundle adjustment with lens distortion updates. We introduce an optimization process which incorporates a geometrically meaningful definition and lens distortion into a cost function. We propose a constrained optimization algorithm in an affine space rather than the traditional Euclidean space. The procedure optimizes all of the parameters (camera parameters, 3D points, the physical shape of the retinal surface, and lens distortion) simultaneously.

1.7 Outline

The organization of this dissertation is illustrated in Fig. 1.5. The motivation and significance of this research as well as relevant materials and subjects are presented in this chapter.
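A linear approach to sphere fitting rests on a standard algebraic trick: expanding (x-a)² + (y-b)² + (z-c)² = r² gives 2ax + 2by + 2cz + (r² - a² - b² - c²) = x² + y² + z², which is linear in the center (a, b, c) and the combined term d = r² - a² - b² - c². A minimal sketch in this spirit (the function name is ours, and Chapter 6 gives the actual formulation):

```python
import numpy as np

def fit_sphere(points):
    """Least-squares sphere fit via the linearization
    ||p||^2 = 2 p . c + d, where d = r^2 - ||c||^2."""
    P = np.asarray(points, dtype=float)
    A = np.hstack([2 * P, np.ones((len(P), 1))])  # unknowns: (cx, cy, cz, d)
    b = (P ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = sol[:3]
    radius = np.sqrt(sol[3] + center @ center)
    return center, radius
```

Because the system is linear, the fit needs no initialization and is a natural way to inject the spherical eyeball constraint before nonlinear refinement.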
Chapter 2 reviews the state-of-the-art research on retinal image analysis and categorizes it into five different areas: structure segmentation, image registration, 3D reconstruction, image quality assessment (IQA), and image classification.

Chapter 3 reviews the mathematical background of the proposed research; the materials presented there serve as the foundation of later chapters. Specifically, we will address 2D retinal image registration in Chapter 5, which requires 2D/2D transformations, and 3D retinal reconstruction in Chapter 6, which involves a camera projection (3D/2D transformations) and 3D registration (3D/3D transformations).

Chapter 4 deals with the first layer of the architecture, feature extraction. Features are significantly important in motion estimation techniques because they are the input to the algorithms. However, most published works neglect this part and assume features are available. We have proposed a feature extraction algorithm for retinal images. Bifurcations/crossovers are used as features. A new thresholding algorithm based on our definition of the co-occurrence matrix is proposed. As a result, the vascular tree, an important structure for indicating many diseases, is also extracted.

In Chapter 5, we consider 2D retinal image registration, which is a problem of 2D/2D transformations. Both linear and nonlinear models are incorporated to account for motions and distortions. A hybrid method is introduced in order to take advantage of what different methods have to offer, along with other decision-making criteria. Binary mutual information is proposed for translation estimation. A hierarchical technique (translation, affine, and quadratic) is incorporated.

In Chapter 6, the 3D retinal surface reconstruction issue is addressed. To generate a 3D scene from 2D images, camera projection (3D/2D transformation) techniques are investigated. We choose an affine camera as the representative model for a fundus camera.
We have provided our proof to justify the use of the affine camera. An affine bundle adjustment based on a nonlinear optimization technique is established to refine the affine shape and affine cameras. A point-based spherical approximation is introduced.

In Chapter 7, the objective is to solve the problem in an optimal way, which means estimating structure and camera parameters simultaneously by minimizing a physically meaningful cost function. An optimization procedure, called constrained affine bundle adjustment with lens distortion updates, is proposed to improve the algorithm's performance in terms of accuracy and robustness.

Chapter 8 states future work and concludes the dissertation.

[Figure 1.5: Outline of the dissertation.]

CHAPTER 2

Literature Reviews

2.1 Overview of Retinal Image Analysis Research

Computer-assisted retinal image analysis can be used for many purposes: (1) helping ophthalmologists diagnose and evaluate eye-related diseases; (2) patient screening and grading disease severity; (3) facilitating clinical studies; (4) quantifying retinal image quality; and (5) assisting in laser surgical procedures. Numerous techniques have been investigated and developed for retinal image analysis. The objective of this chapter is to categorize, characterize, and review these algorithms. Before we move further into detail, we would like to note that there are several retinal image types, of which two are widely used: (1) normal retinal (fundus) images; and (2) fluorescein angiogram (FA) images.
An FA image is obtained by injecting a special dye, called fluorescein, into a vein in the patient's arm. The dye moves quickly to the blood vessels inside the eyes, and as a result the blood vessels become more prominent against the background (high contrast). From an image analysis point of view, therefore, FA images are generally less complicated to process than normal fundus images. Here, we categorize retinal image-related research into five main groups: (1) structure segmentation; (2) 2D registration; (3) 3D reconstruction; (4) image quality assessment (IQA); and (5) image classification. The flowchart is illustrated in Fig. 2.8, where the gray-shaded boxes refer to the topics involved in this work.

2.2 Structure Segmentation

Image segmentation is one of the fundamental problems in computer vision and many other research areas. Many subsequent tasks, e.g., feature extraction, pattern recognition, image retrieval, image registration, image compression, and image classification, rely on the quality of the image segmentation process. There is no universal theory as to what is the best approach for image segmentation; segmentation is largely an image-dependent problem. However, in a good segmentation each segmented region should be uniform with respect to some semantic characteristics. For the retinal image case, we classify segmentation into two main categories: (1) anatomical structures, which include the blood vessels, optic nerve, and fovea; and (2) pathological structures, i.e., lesions, which are abnormal structures. The goal is to detect and present the location of important structures in the retina as well as to find correspondences across retinal images. Here, we focus only on blood vessel detection, since our main tasks are registration and reconstruction and blood vessels are used as features.

2.2.1 Anatomical Structure Segmentation

Two significant anatomical structures present in a retina are the blood vessels and the optic nerve, as shown in Fig. 2.1.
[Figure 2.1: Anatomical structures in a human retina: optic disc, fovea, blood vessels.]

Besides all the medical motivations mentioned above, a segmentation of the vascular tree seems to be the most appropriate representation for image registration applications, for the two following reasons: (1) the vascular tree spans the whole retina, hence it exists in every retinal image; (2) bifurcation/crossover points offer more distinguishable information than the other, homogeneous areas throughout the retina. A variety of approaches have been proposed for vascular segmentation [9, 8, 6, 23, 24, 25, 26, 27, 28, 29, 30, 31]. Here, we arrange them into two main categories, supervised and unsupervised techniques. The major difference is that supervised techniques require training data. Although, generally speaking, supervised methods should yield better segmentation results because of the additional knowledge in the training database, most of the proposed algorithms belong to the unsupervised category. This is due to the fact that hand-labeling vessels is a tedious task that takes more than a couple of hours for just one retinal image; it is therefore not practical for real applications.

Supervised Approaches. Manually labeled images are required for training. To the best of our knowledge, only four retinal blood vessel segmentation papers [25, 32, 6, 7] belong to this group. Sinthanayothin et al. [25, 32] used a multilayer perceptron for blood vessel classification with backpropagation as the training approach. An advantage is its ability to deal with nonlinear classification. The limitations are that it is difficult to see what is going on inside the hidden layers and that the training process must be repeated every time new features are incorporated. They partitioned each retinal image into 10x10-pixel subimages. There were in total 25094 subimages, of which 5/6 of the data were hand-labeled ground-truth images used as the training set.
The rest was used for validation. Staal et al. [6, 7] used ridge detection to locate candidate blood vessel segments. Ridges are defined as points where the image has an extremum in the direction of the largest surface gradient [33, 6]. The direction of the largest surface gradient is the eigenvector of the Hessian matrix corresponding to the largest absolute eigenvalue. After the ridges are found, several feature sets and decision-making criteria are applied to create convex sets as well as to classify pixels as vessel or not. Ridge pixels were grouped and convex sets were formed. After that, the image was partitioned into patches based on the convex sets: every pixel was assigned to the convex set to which it was closest. Next, feature sets were formed and a kNN classifier was employed. Twenty hand-labeled ground-truth images were used as the training set and twenty images were used for validation. Although good segmentation results are reported, the algorithm is computationally expensive and hand-labeled ground-truth images are mandatory.

As the name suggests, approaches in this group need hand-labeled ground-truth images. Hand-labeling the vascular tree is not practical in real applications since it consumes a great deal of time. This is a significant drawback of the algorithms in this group.

Unsupervised Approaches. No ground-truth information is provided. A variety of unsupervised approaches have been proposed. We further divide them based on the proposed techniques.

Window-Based. A window refers to a pattern that is used to transform an image. The process is usually followed by a binarization technique. Pinz et al. [26] used local gradient maxima: because they occur at the boundaries of the vessels, the significant edges along these boundaries were extracted. The grouping process searched for a partner for each edge which satisfied certain criteria such as opposite gradient direction and spatial proximity. Only the vascular centerlines could be detected.
FA images were used in their research. Zana et al. [27] also used FA images; they therefore defined a vessel as a bright, piecewise-linearly connected pattern. Opening morphological filters with linear structuring elements were used, each structuring element being 15 pixels long (every 15°). The sum of top-hats on the filtered image brightened all blood vessels (linear parts) and reduced small bright noise. In order to remove non-vessel parts, the principal curvature was computed using a Laplacian followed by a morphological opening.

Classification-Based. The first step is to divide an image into different regions. Then, multiple rules are applied to classify the pixels in each region as vessel or not vessel. Hoover et al. [9] used the twelve 16x16-pixel matched filters proposed in [34] to map the vascular tree. A set of criteria was tested to determine the threshold of the probed region, and ultimately to decide whether the area being probed was a blood vessel. Since the MFR image was probed in a spatially adaptive way, different thresholds were applied throughout the image for mapping blood vessels. Jiang et al. [8] used a verification-based multiple-threshold probing framework. A retinal image was probed at different threshold values. At a particular threshold, a Euclidean distance transform was performed. Then, vessel candidates were pruned by means of the distance map to retain only centerline pixels. Finally, the blood vessels were reconstructed from the particular threshold.

Tracking-Based. Zhou et al. [24] defined each vessel segment by three attributes: direction, width, and center point. The density distribution of the cross section of a blood vessel was estimated using a Gaussian-shaped function. Individual segments were identified using a search procedure which kept track of the center of the vessel and made decisions about the future path of the vessel based on certain vessel properties. This method required that the beginning and ending search points be manually selected with a cursor.
FA images were used in their research. Can et al. [35] used a tracing method based on adaptive exploratory processing of an image. The algorithm explored the image along a grid to seek local gray-level minima using a directional low-pass template. Intersections between the grid and local minima were labeled as seed points along with their orientations. The seed points were then used for tracing, which was repeated for 16 directions; the tracing followed the strongest edge. Chutatape et al. [36] used second-order derivative Gaussian matched filters to locate the center point, width, and orientation. Then an extended Kalman filter was employed for the optimal linear estimation of the next possible location. The algorithm began from the circumference of the optic disc. Wu et al. [37] divided blood vessels into large and small. Blood vessels were enhanced with a matched filter. A Gabor standard deviation filter was used to distinguish the large and small vessels. Then, a 2D Gaussian filter was used for tracing. Different rules for forward and backward verification were used for large and small vessels.

For window-based techniques, the performance of the algorithms largely depends on the thresholding technique. To our surprise, most papers in this group do not put a focus on the thresholding algorithm. For classification-based techniques, multiple threshold values are applied to different regions. This leads to several decision-making criteria which in themselves require numerous threshold values to select an appropriate threshold for a particular region. For tracking-based techniques, an initial point needs to be identified; moreover, only the centerline, not the whole branch of the vascular tree, can be extracted.

We have addressed the blood vessel extraction issue and have proposed a new thresholding technique. More details of our algorithm can be found in Chapter 4.
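Several of the methods above build on the Gaussian matched filter of [34]: a zero-mean kernel with a Gaussian cross-section is rotated through a set of orientations and the maximum response is kept. A minimal single-scale sketch, assuming vessels darker than the background (the parameter values and function names are illustrative, not those of [34]):

```python
import numpy as np
from scipy.ndimage import convolve

def matched_filter_kernel(sigma=2.0, length=9, angle_deg=0.0):
    """Oriented matched-filter kernel: negative Gaussian cross-section,
    made zero-mean within its support."""
    half = length // 2 + int(3 * sigma)
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    theta = np.deg2rad(angle_deg)
    u = xs * np.cos(theta) + ys * np.sin(theta)   # along the vessel
    v = -xs * np.sin(theta) + ys * np.cos(theta)  # across the vessel
    mask = np.abs(u) <= length / 2
    k = np.where(mask, -np.exp(-v**2 / (2 * sigma**2)), 0.0)
    k[mask] -= k[mask].mean()  # zero-mean, so flat regions give zero response
    return k

def vessel_response(img, n_angles=12):
    """Maximum response over n_angles oriented kernels (an MFR image)."""
    responses = [convolve(img.astype(float), matched_filter_kernel(angle_deg=a))
                 for a in np.linspace(0, 180, n_angles, endpoint=False)]
    return np.max(responses, axis=0)
```

The response image produced this way is exactly the kind of MFR image that the thresholding and probing schemes discussed above operate on.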
2.2.2 Pathological Structure Segmentation

Since our research does not focus on extracting lesions, and due to limited time, we will not go into detail on this subject; we include it here for the sake of completeness. Basically, the algorithms can be divided into the same two main groups, supervised and unsupervised, as the anatomical structure extraction algorithms.

2.3 2D Image Registration

Image registration is a process of bringing two or more images into spatial alignment. To register two images, an optimal transformation must be identified. Image registration is used for many purposes, e.g., to integrate information from different images, to detect changes in images taken at different times, and for object recognition. There are two major types of distortion in an image: (1) misalignment between images; (2) distortion caused by the camera (the world plane and image plane are not parallel). Hence, it is impossible to simply overlay two images. Several retina-related registration methods have been proposed [38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51]. Reviews of general registration can be found in [52, 53], and several reviews of medical image registration can be found in [54, 55, 56, 57]. We categorize 2D retinal image registration based on two criteria: types and techniques. As for the first criterion, types, we classify registration into three main categories: (1) view-based or mosaic registration; (2) temporal-based registration; and (3) modal-based registration. Regarding the second criterion, techniques, we group registration techniques into two main categories, feature-based and area-based. Feature-based methods require a set of feature correspondences, which can be points, lines, or blobs, in order to find the optimal registration model. Area-based methods deal with images without trying to locate salient features; the algorithms are based on pixel intensities and certain objective functions.
Typically, there are two major factors that may degrade the performance of area-based methods: (1) non-consistent/nonuniform contrast within an image; and (2) large homogeneous/textureless areas. The performance of feature-based methods largely depends on sufficient and/or reliable correspondences, especially when the overlapping part of an image pair is very limited or when there are mismatched correspondences. We first group the 2D retinal registration research based on types: (1) view-based or mosaic registration; (2) temporal-based registration; and (3) modal-based registration. Each type can be further divided into feature-based and area-based techniques.

2.3.1 View-Based Registration

This category concerns retinal images taken from different views and how to integrate these images into one view, as shown in Fig. 2.2.

[Figure 2.2: View-based registration. See more detailed discussion in Chapter 5.]

Fourteen images are taken per eye according to the ETDRS standard: seven images from different areas of the retina and their corresponding stereo pairs. Integrating all of the images into one piece can serve as a spatial map during surgical procedures and can assist ophthalmologists in evaluating disease. Can et al. [38, 39] proposed a hierarchical feature-based approach. The similarity matrix for all possible correspondences was computed based on the orientations of vascular centerlines, and the similarity measure was converted to a prior probability. The transformation was estimated in a hierarchical way, from the zeroth-order model to the first-order model and finally to the second-order model. Stewart et al. [43] proposed a feature-based approach called the dual-bootstrap iterative closest point algorithm. The approach started from one or more initial, low-order estimates that were only accurate in small image regions called bootstrap regions.
In each bootstrap region, the method iteratively refined the transformation estimate, expanded the bootstrap region, and tested whether a higher-order model could be used. The method required accurate initialization of at least one point correspondence. High success rates were reported in [43]. We have addressed the view-based registration issue [58, 59]; more details of our algorithm can be found in Chapter 5.

2.3.2 Temporal-Based Registration

Retinal images from the same patient are acquired at different times. Temporal-based registration can greatly help ophthalmologists examine the progress of a treatment or disease. Fang et al. [60] introduced an area-based affine elastic model for temporal registration. The thinned vascular tree was extracted using morphological, linear filter, and region growing techniques. An energy function was defined to elastically register two images based on the binary vascular tree. Ritter et al. [1] used an area-based technique employing mutual information as the similarity criterion. Simulated annealing was used as the search technique. A simulation result from Ritter et al. [1] is shown in Fig. 2.3.

[Figure 2.3: Temporal-based registration [1].]

2.3.3 Modal-Based Registration

Retinal images are acquired from different sensors: normal mode, FA mode, and other modes. Different modal images offer distinctive information. Ophthalmologists need to combine diverse information from different modes in order to correctly diagnose and evaluate diseases. In [2], registration between two modes, FA and red-free (RF), was studied. Matsopoulos et al. [2] used an area-based technique. A coarse vessel segmentation was performed in both FA and RF images. The measure of match (MoM), which is similar to a logical operation, was proposed as the objective function, and a genetic algorithm was chosen as the optimization technique. Three transformations, affine, bilinear, and projective, were included. A simulation result from [2] is shown in Fig.
2.4.

[Figure 2.4: Modal-based registration [2].]

Matsopoulos et al. [61] later proposed a feature-based method for modal-based registration. Vascular centerlines and their bifurcation points were extracted. Correspondences were identified using a neural network-based approach, self-organizing maps (SOM). An affine transformation was estimated in a least-mean-square sense. Zana et al. [40] used a feature-based technique. Landmark points were extracted and labeled with vessel orientations. An angle-based invariant was computed to give a probability for two points to match. Then, a Bayesian Hough transform was used to sort the transformations by their respective likelihoods. The most likely transformation was chosen for registration.

2.4 3D Reconstruction

3D reconstruction is the process of recovering a 3D scene or model from multiple images. It is usually referred to as the structure-from-motion (SFM) problem. The process usually recovers objects' 3D shapes, the cameras' poses (positions and orientations), and the cameras' internal parameters (focal lengths, principal points, and skew factors). Many camera models are possible. Perspective projection is the standard; affine and orthographic projections are widely used for distant cameras. Stereo techniques, e.g., cepstrum-based methods, are also commonly used for depth estimation because of their simplicity: they require only a stereo pair and do not need camera calibration. Although the general SFM problem has been extensively studied, surprisingly little research has been published on and dedicated to 3D retinal reconstruction. We categorize retina-related 3D reconstruction into two main categories: 3D surface reconstruction and local depth reconstruction.

2.4.1 3D Surface Reconstruction

3D surface reconstruction refers to global depth reconstruction, where we consider the curvature of the eyeball. Deguchi et al. [3, 62] modelled both the fundus camera and the eye lens with a single lens.
They utilized the fact that the fundus has a spherical shape and that the image of a sphere through the eye lens results in a quadratic surface. They then calibrated the camera using the two-plane method to obtain the quadratic surface, after which the eye lens parameters were identified to recover the fundus's spherical surface. A simulation result from [3] is shown in Fig. 2.5.

[Figure 2.5: 3D retinal surface reconstruction [3].]

Choe et al. [4] used PCA-based directional filters to extract candidate seed points (Y features). Gradient descent was employed to model the Y features and match pairs of features. A plane-and-parallax approach was employed to estimate the epipolar geometry, because a near-planar retinal surface can obstruct traditional fundamental matrix estimation. The stereo pair was rectified. Then, a Parzen window-based mutual information was used to generate a dense disparity map. A simulation result from [4] is shown in Fig. 2.6. In this work, we have also addressed the 3D surface reconstruction issue using a different approach. More details of our algorithm can be found in Chapter 6.

2.4.2 Local Depth Reconstruction

Local depth reconstruction refers to recovering the depth of individual objects, e.g., the optic nerve and lesions, inside the retina. Mitra et al. [63] proposed the use of the power cepstrum to find the disparity between a stereo pair; the depth was then calculated by a simple triangulation method. Corona et al. [5] extended the idea of Mitra et al. [63] and proposed a framework that combined the power cepstrum and cross-correlation techniques to extract the optic nerve's depth from a stereo pair. Then, B-splines were employed to generate smooth optic nerve surfaces. A simulation result from [5] is shown in Fig. 2.7.

2.5 Image Quality Assessment (IQA)

Image quality assessment plays a significant role in digital image processing research. Many efforts have been made to quantify image quality [64, 65, 66, 67].
In the retinal image case, because ophthalmologists rely on retinal images for disease diagnosis and evaluation, the retinal images need to meet image quality criteria. Each set of fundus photographs should be assessed for quality before the photographs are sent to ophthalmologists. A photographer should be able to decide whether a particular image set meets the three ETDRS requirements:

1. Clarity and focus. This is an obvious requirement in IQA. Focus is defined as sharpness, meaning that the transition between background and foreground is sharp. Clarity is defined as image contrast.

2. Field definition. Images need to cover the required areas of the retina. The positions of key anatomical structures, the optic nerve and fovea, in the images are also necessary.

3. Stereo effect. Depth can be perceived only if the displacement between images is within an acceptable range.

[Figure 2.6: 3D retinal surface reconstruction [4].]

[Figure 2.7: Optic nerve's depth reconstruction [5].]

A software tool to standardize and certify image quality is in demand to assist photographers. Photographers could take a second shot immediately if necessary, rather than calling the patient back for another visit. Such a tool should be able to quantify an objective image quality that correlates with perceived quality measurements. A limited number of works have been published on the issue of retinal image quality assessment. Lee et al. [68] used a template intensity histogram derived from 20 good-quality images. Its base, or width, was employed as an indicator of contrast. The quality of a target image was evaluated by convolving its histogram with the template histogram. Lalonde et al. [69] suggested the use of two criteria, the distribution of the edge magnitudes and the local distribution of the pixel intensities, to classify images into three groups: good, fair, and poor. Awawdeh et al. [70] proposed the use of the power cepstrum for stereo quality assessment. The stereo pair are added together.
Then, a DCT-based power cepstrum was applied to estimate the displacements, which were indicators of stereo quality. In this work, we have also addressed the field definition issue [58, 59], which is a byproduct of 2D image registration. More details of our method can be found in Chapter 5. To the best of our knowledge, there is no other previous work on the field definition topic.

2.6 Image Classification

The term image classification refers to a process used to assign a specific class to a pixel. In the retinal image case, it describes anatomical and pathological classification. Since our research does not focus on image classification, and due to limited time, we will not go into detail on this subject; we include it here for the sake of completeness.

[Figure 2.8: Overview of retinal image analysis research. The gray-shaded boxes refer to the topics involved in this work.]

CHAPTER 3

Motion Parameters And Motion Estimation

3.1 Introduction

A motion model is needed in order to infer information from multiple images. Suitable motion parameters for representing the possible motions of the camera have to be chosen before motion estimation. The motion parameters are nothing but the parameters of a coordinate transformation. There are three major types of motion parameters: (1) 2D/2D, (2) 3D/3D, and (3) 3D/2D. In the 2D/2D case, they map an image coordinate (x, y) to another image coordinate (x', y'). 2D retinal image registration requires 2D/2D transformations. In the 3D/2D case, they map a world coordinate (X, Y, Z) to an image coordinate (x, y). The 3D/2D transformation is generally referred to as camera projection.
In order to perform 3D retinal surface reconstruction, the 3D/2D transformations (or camera projection) need to be identified. In the 3D/3D case, a world coordinate (X, Y, Z) is mapped to another world coordinate (X', Y', Z'). After the 3D retinal surface is approximated, we need to fuse all of the retinal surfaces into one single view; this task involves the 3D/3D transformations. A variety of other tasks, e.g., 3D reconstruction, 2D registration, video segmentation, object tracking, and site monitoring, need motion estimation as well. Transformations of the 2D/2D case and the 3D/3D case are reviewed in Sections 3.2 and 3.3. The 3D/2D mapping, camera projection, is discussed in Section 3.4. 3.2 Transformations of 2D/2D The 2D/2D transformations are the mappings between image coordinate systems. Several commonly used coordinate transformations are listed in Table 3.1.

Figure 3.1: Results from 2D coordinate transformations (translation, rigid, affine, projective, quadratic).

From Table 3.1, t_x and t_y are translation parameters in the vertical and horizontal directions respectively, s is a scaling factor, r_{11}, ..., r_{22} are rotation parameters, and a_{11}, ..., a_{26} are general motion parameters. A translation model is the simplest one; it can only handle displacements between images. A rigid model can further deal with translation, rotation, and scaling between images. Neither of them can handle images with distortions. Absolute distance and area are preserved; they are called rigid invariants. An affine model includes translation, rotation, scaling, and shearing distortion. Parallelism and relative distance along parallel directions are affine invariants. A projective model is the general linear transformation describing translation, rotation, scaling, shearing distortion, keystoning distortion, and chirping distortion. The projective invariants are the ratio of ratios (cross ratio) of distances and linearity.
Chirping means the effect of increasing or decreasing spatial frequency with respect to spatial location [71]. Keystoning means the effect of converging lines [72]. The results of each transformation are illustrated in Fig. 3.1. A quadratic model is a nonlinear transformation, accounting for translation, rotation, scaling, shearing, keystoning, chirping, and nonlinear distortions. Nonlinear distortions include several components, e.g., radial distortions and tangential distortions. 3.3 Transformations of 3D/3D 3D/3D transformations are the mappings between two 3D world coordinate systems. Several commonly used coordinate transformations are listed in Table 3.2. From Table 3.2, t_x, t_y, and t_z are translation parameters, r_{11}, ..., r_{33} are elements of rotation matrices, s is a scaling factor, and a_{11}, ..., a_{44} are general motion parameters. A projective model is the most general case. Every point in the projective space is treated equally. As in the 2D/2D transformations, the cross ratio is preserved in the projective space. When the plane at infinity is identified, the projective space becomes an affine space. Hence, points and the plane at infinity are affine invariants. Parallelism and relative distance are other affine invariants. Once an absolute conic is identified, the affine space becomes a similarity (or metric) space. Assume (X, Y, Z, T) represents a homogeneous coordinate. Then X^2 + Y^2 + Z^2 = 0, T = 0 is known as the absolute conic [73]. Therefore, the absolute conic is preserved in the similarity space. Without the scaling effect, the similarity space becomes a Euclidean space. Absolute distance and volume are Euclidean invariants. The results of each transformation are illustrated in Fig. 3.2.

Figure 3.2: Results from 3D coordinate transformations (Euclidean, similarity, affine, projective).

3.4 Transformations of 3D/2D A camera is a mapping between a 3D world coordinate system and a 2D image coordinate system.
Therefore, the 3D/2D transformation is generally referred to as a camera projection. Camera motion has been one of the most important subjects in computer vision research. Motion parameters can be estimated from two or more images, depending on the type of projection. Before camera motions are described, it is necessary to consider the process of image formation. 3.4.1 Image Formation A pinhole model is a basic camera model for the central projection of points in a scene onto an image plane, as depicted in Fig. 3.3.

Figure 3.3: Pinhole camera geometry. The camera coordinate system is aligned with the world coordinate system.

Under a pinhole camera model, a point in space with coordinates $\hat{M} = (X, Y, Z)^T$ is mapped to a point on the image plane $\hat{m} = (x, y)$. Then we have the relationship

$$x = \frac{fX}{Z}, \quad y = \frac{fY}{Z}, \quad (3.1)$$

where $f$ represents the focal length of the camera. Using homogeneous coordinates with principal point offset $(c_x, c_y)$ and pixel skew $s$, the central projection can be represented as

$$\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} f_x & s & c_x & 0 \\ 0 & f_y & c_y & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}. \quad (3.2)$$

In general, the camera coordinate frame and the world coordinate frame do not coincide. The equation becomes

$$\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} R^T & -R^T t \\ 0_3^T & 1 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}, \quad (3.3)$$

where $R$ is a $3 \times 3$ rotation matrix and $t = [t_x, t_y, t_z]^T$ is a translation vector. The equation can be simplified to

$$m = K[R^T \mid -R^T t]M = PM, \quad (3.4)$$

where $m$ and $M$ denote image and world homogeneous coordinates respectively, $K$ represents the intrinsic parameters of the projection, $[R^T \mid -R^T t]$ contains the extrinsic parameters, and $P$ is the camera projection matrix.
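As a rough sketch of Eqs. 3.2-3.4, the projection matrix can be assembled and applied as follows. The intrinsic values (f_x, f_y, c_x, c_y) and the test point below are assumed for illustration only; with R = I and t = 0 the camera and world frames coincide.

```python
import numpy as np

# Intrinsic parameters K (illustrative values)
fx, fy, s, cx, cy = 800.0, 800.0, 0.0, 320.0, 240.0
K = np.array([[fx, s,  cx],
              [0., fy, cy],
              [0., 0., 1.]])

# Extrinsic parameters: rotation R and translation t
R = np.eye(3)
t = np.zeros(3)

# Projection matrix P = K [R^T | -R^T t]  (Eq. 3.4)
P = K @ np.hstack([R.T, (-R.T @ t).reshape(3, 1)])

# Project a homogeneous world point M onto the image plane
M = np.array([1.0, 0.5, 4.0, 1.0])
m = P @ M
x, y = m[0] / m[2], m[1] / m[2]      # dehomogenize to pixel coordinates
print(x, y)
```

The dehomogenization step at the end is exactly the division by Z that appears in Eq. 3.1.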
3.4.2 Camera Models Three camera models widely used in computer vision, the perspective, weak-perspective, and orthographic models, and the two motion transformations, projective and affine, associated with them are discussed in this section. Readers may find the terms a bit confusing. This is due to the fact that two scientific fields, photogrammetry and mathematics, use different terms to describe the same transformations. For instance, people in photogrammetry use the term perspective projection to describe a general linear camera, which happens to have the same format as a projective transformation used in the mathematical field. Perspective Projection A perspective projection is the most general case. The projective transform or perspective camera can be represented mathematically as

$$T_{projective} = \begin{pmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} R_1^T & -R_1^T t \\ R_2^T & -R_2^T t \\ R_3^T & -R_3^T t \end{pmatrix} = \begin{pmatrix} f_x R_1^T + s R_2^T + c_x R_3^T & f_x D_x + s D_y + c_x D_z \\ f_y R_2^T + c_y R_3^T & f_y D_y + c_y D_z \\ R_3^T & D_z \end{pmatrix}, \quad (3.5)$$

where $R_i^T$ is the $i$-th row of the rotation matrix $R$, and $D_x = -R_1^T t$, $D_y = -R_2^T t$, $D_z = -R_3^T t$. Image and world coordinates are related by

$$\hat{m} = \begin{pmatrix} \frac{(f_x R_1^T + s R_2^T)M + f_x D_x + s D_y}{R_3^T M + D_z} \\ \frac{f_y R_2^T M + f_y D_y}{R_3^T M + D_z} \end{pmatrix} + \begin{pmatrix} c_x \\ c_y \end{pmatrix}, \quad (3.6)$$

where $\hat{m}$ denotes non-homogeneous image coordinates. Weak-Perspective Projection A weak-perspective camera can be represented in mathematical form as

$$T_{affine} = \begin{pmatrix} f_x R_1^T + s R_2^T & f_x D_x + s D_y + c_x D_z \\ f_y R_2^T & f_y D_y + c_y D_z \\ 0_3^T & D_z \end{pmatrix}. \quad (3.7)$$

Equation 3.7 has a form similar to a general affine transform; hence, it is termed an affine projection. Some call it an affine camera or weak-perspective camera. Image and world coordinates are then related by

$$\hat{m} = \begin{pmatrix} \frac{(f_x R_1^T + s R_2^T)M + f_x D_x + s D_y}{D_z} \\ \frac{f_y R_2^T M + f_y D_y}{D_z} \end{pmatrix} + \begin{pmatrix} c_x \\ c_y \end{pmatrix}. \quad (3.8)$$

More detail, including the mathematical proof of an affine camera, can be found in Section 6.2. Orthographic Projection Orthographic projection is the simplest case of a camera projection.
It projects points along the z-axis. The projection can be represented as

$$T_{orthographic} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \quad (3.9)$$

The three projection models are illustrated in Fig. 3.4. An orthographic projection has five degrees of freedom: three parameters for a rotation matrix and two parameters for the displacements. It is suitable for the case where a world plane and an image plane are parallel. An affine camera has eight degrees of freedom, corresponding to the eight nonzero elements of its matrix. It is appropriate for a distant camera or a camera with a large focal length. A general projective camera has eleven degrees of freedom, defined up to an arbitrary scale. It is a general definition for any linear camera. 3.5 Affine Structure from Motion The affine structure from motion theorem was first proposed by Koenderink and Van Doorn [74]. They showed that two distinct views are enough to reconstruct a scene up to an arbitrary affine transformation without camera calibration. They suggested the use of a local coordinate frame (LCF). Later their algorithm was refined by Quan et al. [75], Demy et al. [76], and Shapiro [77]. Then, Tomasi and Kanade [78] proposed an affine factorization method which eliminates the use of the LCF and instead utilizes the entire set of points. This section reviews affine structure from motion.

Figure 3.4: One-dimensional image formation. Orthographic projection is along the z-axis. Perspective projection is along the principal ray direction. Weak perspective is a combination of perspective and orthographic: a point is first projected along the z-axis onto a plane Z = d, then projected perspectively from the plane.

Furthermore, we categorize affine structure from motion into two main categories, two-view geometry and multiple-view geometry. 3.5.1 Two View Geometry Geometric Approach: Local Coordinate Frame Assume n general points in a scene.
Four non-coplanar scene points $M_i$, $i \in \{0, \ldots, 3\}$, with $M_0$ being a reference point, can be considered as defining a 3D affine basis. Define the axis vectors $E_i = M_i - M_0$, $i \in \{1, 2, 3\}$, which are called the local coordinate frame (LCF). Any point in the scene can be expressed as follows:

$$M_i = M_0 + \alpha_i E_1 + \beta_i E_2 + \gamma_i E_3, \quad i \in \{1, \ldots, n-1\}, \quad (3.10)$$

where $\alpha$, $\beta$, and $\gamma$ are affine-invariant coordinates. The proof is given next. Under a 3D affine transformation,

$$M'_i = R M_i + T, \quad (3.11)$$

where $M'$ is the new world position, $R$ is a $3 \times 3$ matrix, and $T$ is a 3-vector. Then,

$$M'_i - M'_0 = R(M_i - M_0) = \alpha_i R E_1 + \beta_i R E_2 + \gamma_i R E_3 = \alpha_i E'_1 + \beta_i E'_2 + \gamma_i E'_3. \quad (3.12)$$

Equation 3.12 demonstrates that $\alpha$, $\beta$, and $\gamma$ are indeed affine-invariant coordinates; the affine coordinates are independent of the frame. Let us extend the idea to the image plane under affine projection:

$$m = A M + d, \quad m' = A' M' + d', \quad (3.13)$$

where $m$ and $m'$ are 2-vector image coordinates from the first and second views respectively, $A$ and $A'$ are general $2 \times 3$ matrices, and $d$ and $d'$ are general $2 \times 1$ vectors. From Equations 3.11, 3.12, and 3.13, since the differences of vectors eliminate the additive terms $T$ and $d$, we get

$$m_i - m_0 = \alpha_i e_1 + \beta_i e_2 + \gamma_i e_3, \quad m'_i - m'_0 = \alpha_i e'_1 + \beta_i e'_2 + \gamma_i e'_3, \quad (3.14)$$

where $e_i = A E_i$ and $e'_i = A' R E_i$, $i \in \{1, 2, 3\}$. Although it seems redundant to represent image coordinates with three basis vectors, the extra basis vector allows the 3D affine coordinates $(\alpha, \beta, \gamma)$ to be computed. The solution may be obtained by minimizing a cost function in a least-mean-square sense. Algebraic Approach: Fundamental Matrix Given two affine views, the relationship between them can be defined as follows [79, 80]:

$$a x'_i + b y'_i + c x_i + d y_i + e = 0, \quad (3.15)$$

where $m_i = [x_i, y_i, 1]^T$, $m'_i = [x'_i, y'_i, 1]^T$, and $a, b, c, d, e$ are unknown parameters. This equation is termed the affine epipolar constraint. Rearranging Equation 3.15 in matrix format,

$$m'^T_i F_A m_i = \begin{pmatrix} x'_i & y'_i & 1 \end{pmatrix} \begin{pmatrix} 0 & 0 & a \\ 0 & 0 & b \\ c & d & e \end{pmatrix} \begin{pmatrix} x_i \\ y_i \\ 1 \end{pmatrix} = 0. \quad (3.16)$$

$F_A$ is called the affine fundamental matrix, which is an algebraic representation of affine epipolar geometry.
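In practice, the five parameters (a, b, c, d, e) of the affine epipolar constraint can be recovered from point correspondences as the null vector of a design matrix. The sketch below uses synthetic affine cameras and hypothetical variable names, and assumes noise-free correspondences.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic affine cameras (2x3 matrices and 2-vector offsets, Eq. 3.13)
A1 = rng.normal(size=(2, 3)); d1 = rng.normal(size=2)
A2 = rng.normal(size=(2, 3)); d2 = rng.normal(size=2)

M = rng.uniform(-1, 1, size=(10, 3))   # 10 random 3D scene points
m = M @ A1.T + d1                       # view-1 image points
mp = M @ A2.T + d2                      # view-2 image points

# Each correspondence contributes one row [x', y', x, y, 1]; the affine
# epipolar parameters (a, b, c, d, e) form the null vector of this matrix.
W = np.column_stack([mp[:, 0], mp[:, 1], m[:, 0], m[:, 1], np.ones(len(M))])
_, _, Vt = np.linalg.svd(W)
abcde = Vt[-1]                          # right singular vector of the
                                        # smallest singular value

residual = np.abs(W @ abcde).max()      # ~0: the constraint holds for all points
print(residual)
```

The SVD-based null-vector solution is the same least-squares device used later for affine factorization; with noisy correspondences it returns the constraint that fits best rather than one that holds exactly.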
The epipolar geometry is the geometry between two views. 3.5.2 Multiple View Geometry Local Coordinate Frame The local coordinate frame explained in Section 3.5.1 can be extended to accommodate multiple-view geometry. Assume n features appear in f distinct views. Then

$$W = LS, \quad (3.17)$$

where $W$ is a $2f \times (n-1)$ matrix containing the observations, $L$ is a $2f \times 3$ matrix containing the affine basis, and $S$ is an unknown $3 \times (n-1)$ matrix containing the affine structure. Affine Factorization Tomasi and Kanade [78] proposed an affine factorization method which eliminates the use of the LCF and utilizes the whole set of point correspondences. Suppose there are f affine views and n point correspondences in each view:

$$\hat{m}_i = A_i \hat{M} + d_i, \quad i \in \{1, \ldots, f\}, \quad (3.18)$$

where $\hat{m}$ and $\hat{M}$ denote image and world non-homogeneous coordinates respectively, $A_i$ is an arbitrary $2 \times 3$ matrix, and $d_i$ is any $2 \times 1$ vector. With an idea similar to the LCF, we need to select one point as an origin, here the center of mass. Equation 3.18 becomes

$$\Delta\hat{m}_i = A_i \Delta\hat{M}, \quad i \in \{1, \ldots, f\}, \quad \text{or} \quad W = LS, \quad (3.19)$$

where $W$ denotes a $2f \times n$ matrix containing the set of 2D point correspondences with respect to the center of mass, $S$ denotes a $3 \times n$ matrix containing the affine shape, and $L$ denotes a $2f \times 3$ matrix. By the rank theorem, $W$ is at most rank three. Singular value decomposition (SVD) is used to factorize $W = U \Sigma V^T$. Therefore, $L$ and $S$ are obtained from the singular vectors corresponding to the three greatest singular values:

$$L = U_3, \quad S = \Sigma_3 V_3^T. \quad (3.20)$$

Concatenated Image Space Shapiro [77] provided insight into the geometrical meaning of the affine factorization method proposed by Tomasi and Kanade [78] in terms of the concatenated image space (CI space). For simplicity, assume there are two views.
If $L$ in Equation 3.19 is decomposed into three columns $L = [l_1 \mid l_2 \mid l_3]$, then Equation 3.19 can be rewritten as

$$w_i = \Delta X_i l_1 + \Delta Y_i l_2 + \Delta Z_i l_3. \quad (3.21)$$

Shapiro [77] indicated that "the 4-dimensional $w_i$ is a linear combination of the three column vectors, and lies on a hyperplane in the 4-dimensional CI space. This hyperplane is spanned by the three columns of $L_i$ and its orientation depends solely on the motion of the camera, while the distribution of points within the hyperplane depends solely on the scene structure (as shown in Fig. 3.5)."

Figure 3.5: The concatenated image (CI) space.

If the images were noise-free, then $w_i$ would lie exactly on the hyperplane $\Pi$. In practice, noise moves $w_i$ off the hyperplane $\Pi$, and the hyperplane that best fits the $w_i$ must be identified. If the noise distribution is assumed to be zero-mean, isotropic, and Gaussian, then the maximum likelihood estimate of the optimal hyperplane is found by minimizing

$$\sum_{i=0}^{n} (w_i - L s_i)^T \Sigma_{w_i}^{-1} (w_i - L s_i). \quad (3.22)$$

Because the points are independent of each other, we simply assume $\Sigma_{w_i} = \sigma^2 I$ for all $i$. In this case, the above criterion becomes

$$\min_{L, s} \sum_{i=0}^{n} (w_i - L s_i)^2. \quad (3.23)$$

This is exactly what affine factorization does. The CI space and the affine coordinate frame give insight into the physical concepts of the affine factorization method: the camera motion can be represented by a hyperplane and its orientation, while the 3D points, in the ideal case, should lie on the hyperplane. 3.6 Conclusions In this chapter, we have reviewed the mathematical background of the proposed research, and the materials presented here serve as the theoretical foundation of the later chapters. Specifically, we will address 2D retinal image registration in Chapter 5, which requires 2D/2D transformations, and 3D retinal reconstruction in Chapter 6, which involves a camera projection (3D/2D transformations) and 3D registration (3D/3D transformations).
Also, as the prerequisite of all the geometrical transformations discussed above, feature points or correspondences have to be extracted first, which is the focus of the next chapter.

Table 3.1: 2D Transformation Models

Translation (DOF 2):
$(x', y', 1)^T = \begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{pmatrix} (x, y, 1)^T$

Rigid (DOF 4):
$(x', y', 1)^T = \begin{pmatrix} s r_{11} & s r_{12} & t_x \\ s r_{21} & s r_{22} & t_y \\ 0 & 0 & 1 \end{pmatrix} (x, y, 1)^T$

Affine (DOF 6):
$(x', y', 1)^T = \begin{pmatrix} a_{11} & a_{12} & t_x \\ a_{21} & a_{22} & t_y \\ 0 & 0 & 1 \end{pmatrix} (x, y, 1)^T$

Projective (DOF 8):
$(x', y', 1)^T = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} (x, y, 1)^T$

Quadratic, nonlinear (DOF 12):
$(x', y')^T = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & a_{16} \\ a_{21} & a_{22} & a_{23} & a_{24} & a_{25} & a_{26} \end{pmatrix} (x, y, 1, x^2, y^2, xy)^T$

Table 3.2: 3D Transformation Models

Euclidean (DOF 6):
$(X', Y', Z', 1)^T = \begin{pmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix} (X, Y, Z, 1)^T$

Similarity (DOF 7):
$(X', Y', Z', 1)^T = \begin{pmatrix} s r_{11} & s r_{12} & s r_{13} & t_x \\ s r_{21} & s r_{22} & s r_{23} & t_y \\ s r_{31} & s r_{32} & s r_{33} & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix} (X, Y, Z, 1)^T$

Affine (DOF 12):
$(X', Y', Z', 1)^T = \begin{pmatrix} a_{11} & a_{12} & a_{13} & t_x \\ a_{21} & a_{22} & a_{23} & t_y \\ a_{31} & a_{32} & a_{33} & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix} (X, Y, Z, 1)^T$

Projective (DOF 15):
$(X', Y', Z', 1)^T = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{pmatrix} (X, Y, Z, 1)^T$

CHAPTER 4 Feature Extraction 4.1 Introduction We want to extract features because features reduce the amount of data that needs to be processed. Since "not all information is created equal" [19], what are the most suitable features for matching? Points are used most of the time. Lines or blobs are also good features since they provide more information compared to points. Good features should contain as much distinguishable information as possible. The next question is what are the best approaches to extract features?
Feature extraction is largely an image-dependent problem. Certain algorithms are good for some specific types of images but perform poorly on other image types. A number of studies have been devoted entirely to this topic. In this research, the vascular tree and its bifurcation/crossover points are used as features and correspondences, for the two following reasons: (1) the vascular tree spans the whole retina and hence exists in every retinal image; (2) bifurcation/crossover points offer more distinguishable information than the homogeneous areas throughout the retina. The automatic detection of blood vessels in retinal images can help physicians with diagnosing ocular diseases, patient screening, clinical studies, etc. Information about blood vessels in retinal images can be used in grading disease severity or as part of the process of automated diagnosis of diseases. Blood vessel appearance can provide information on pathological changes caused by diseases including diabetes, hypertension, and arteriosclerosis. The most effective treatment for many eye-related diseases is early detection through regular screenings. The organization of this chapter is as follows. The importance of blood vessel detection in medical applications is given in this section. Currently available methods for the detection of blood vessels are reviewed in Section 4.1.2. Then, Section 4.3 describes the implementation of the proposed algorithm. Simulation results and the performance of our algorithm are presented in Section 4.4. 4.1.1 Entropy Thresholding Pun [81, 82] was the first to adopt Shannon's information theory [83] in image thresholding applications. In Pun's thresholding method, entropy was defined from the gray-level histogram alone; Pun considered only one probability distribution. Kapur et al.
[84] later extended Pun's idea by considering two probability distributions, one for the object and one for the background, to binarize an image into foreground and background. Those two methods, however, did not take the spatial relationship or image structure into their entropy: two images with identical histograms will always result in the same threshold value. Pal et al. [85] were the first to propose local entropy thresholding for image thresholding applications, which takes the spatial distribution or image structure into consideration. Instead of a gray-level histogram, Pal et al. [85, 86] proposed a two-dimensional histogram called a co-occurrence matrix, whose elements represent the spatial distribution in an image. Chang et al. [87] proposed a relative entropy (cross entropy) approach for image thresholding applications. The method uses the Kullback-Leibler distance to find a threshold value that minimizes the mismatch between a gray-scale image and its binarized image. In this work, we combine the concepts proposed by Pal et al. [85, 86] and Chang et al. [87] and propose a new thresholding method. We also give a new definition of the co-occurrence matrix. 4.1.2 Blood Vessel Extraction There are many previous works on extracting blood vessels in retinal images, which can be found in Chapter 2 (literature review). Here we mention a few of these methods that were considered state-of-the-art at the time they were proposed and that have been cited by almost every retinal vascular tree extraction paper. An efficient piecewise threshold probing technique was proposed by Hoover et al. [9], where the matched filter-response (MFR) image was used for mapping the vascular tree. A set of criteria was tested to determine the threshold of the probe region, and ultimately to decide whether the area being probed was a blood vessel.
Since the MFR image was probed in a spatially adaptive way, different thresholds could be applied throughout the image for mapping blood vessels. Jiang et al. [8] used a verification-based multiple-threshold probing framework. A retinal image was probed at different threshold values. At a particular threshold, the Euclidean distance transform was performed; vessel candidates were then pruned by means of the distance map so as to retain only centerline pixels. Finally, blood vessels were reconstructed at the particular threshold. Staal et al. [6] used a supervised method. Twenty hand-labeled ground-truth images were used as a training set and twenty images were used for validation. Ridge detection was employed to locate candidate blood vessel segments. Then, ridge pixels were grouped and convex sets were formed. After that, the image was partitioned into patches based on the convex sets, every pixel being assigned to the convex set to which it was closest. Next, feature sets were formed and a kNN classifier was employed. 4.2 Preliminaries Entropy is defined as a function of the state probabilities [83]. Theorem: Let p(s_i) be the probability of a sequence s_i of gray levels of length q, where a sequence s_i of length q is defined as a permutation of q gray levels.
The local entropy or H(2) [85], on the other hand, can better distinguish two images in terms of their spatial structures, because it considers the dependency of image pixel intensities. The local entropy os de ned as H(2) = 1 2Xi Xj pij log2 pij ; (4.2) where pij is the probability of cooccurrence of the gray levels i and j. The co occurrence matrix is an L L dimensional matrix (L is the number of intensity levels) [85]. It indicates the transition of pixel intensities between adjacent pixels. The original cooccurrence matrix proposed by Pal [85, 86] is asymmetric by considering the horizontally right and vertically lower transitions. tij is de ned as follows: tij = P Xl=1 Q Xk=1 ; (4.3) where = 1 if 8>>><> >>: I(l; k) = i and I(l + 1; k) = j or I(l; k) = i and I(l + 1; k + 1) = j = 0 otherwise: Therefore, the local entropy can preserve the structure details of an image. Two 52 images with identical histograms but di erent spatial distribution will result in dif ferent local entropy (also di erent threshold values). 4.3 Proposed Framework In this paper, we propose a framework to e ciently locate and extract blood vessels in ocular fundus images. The proposed algorithm is composed of four steps, matched l tering, modi ed local entropy thresholding, length ltering, and bifurcation/crossover point detection. Compare with the method in [24], our proposed algorithm does not involve human intervention. Since our algorithm can automatically estimate one op timal threshold value, it requires less computational complexity compared with the methods in [9], [8], and [6]. In addition, our method doesn't require additional infor mation, training database as it is mandatory in [6]. 4.3.1 Vascular Tree Extraction Because of nonuniform intensity of both background and foreground as well as the low contrast, multidirectional match lters are employed. We observe three interesting properties of the blood vessels in retinal images. 
(1) Two edges of a vessel always run parallel to each other. Such objects may be represented by piecewise linear directed segments of finite width. The gradient directions of these two edge elements are 180° apart, and hence they are sometimes referred to as "anti-parallels". (2) The contrast between vessels and other retinal surfaces is very low. Blood vessels appear darker relative to the background. A sample gray-level profile of a blood vessel along the direction perpendicular to its length is plotted in Fig. 4.1. It is observed that vessels never have ideal step edges. Although the intensity profile varies by a small amount from vessel to vessel, it can be approximated by a Gaussian curve. (3) Although the width of a vessel decreases as it travels radially outward from the optic disc, such a change in vessel caliber is gradual. The widths of the vessels are found to lie within a range of 36-180 µm. For our initial calculation, we assume that all the blood vessels in the images are of equal width.

Figure 4.1: The gray-level profile of the cross section of a blood vessel.

Matched Filter. In [34], the gray-level profile of the cross section of a blood vessel is approximated by a Gaussian-shaped curve. The concept of matched filter detection is used to detect piecewise linear segments of blood vessels in retinal images. Blood vessels usually have poor local contrast. A two-dimensional matched filter kernel is designed to be convolved with the original image in order to enhance the blood vessels. The prototype matched filter kernel is expressed as

$$f(x, y) = -\exp\left(-\frac{x^2}{2\sigma^2}\right), \quad \text{for } |y| \le L/2, \quad (4.4)$$

where $L$ is the length of the segment for which the vessel is assumed to have a fixed orientation. The parameter $L$ is chosen to be equal to 9 pixels. Here the direction of the vessel is assumed to be aligned along the y-axis. Because a vessel may be oriented at any angle, the kernel needs to be rotated to all possible angles.
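The kernel construction just described, Eq. 4.4 rotated over a set of orientations, can be sketched as follows. The kernel size and the zero-mean normalization are illustrative choices for this sketch, not necessarily the exact implementation used in the experiments.

```python
import numpy as np

def matched_filter_kernels(sigma=2.0, L=9, n_angles=12, size=15):
    """Build Gaussian matched-filter kernels at n_angles orientations.

    A sketch of the construction described above: the prototype
    f(x, y) = -exp(-x^2 / (2 sigma^2)) for |y| <= L/2 is rotated in
    180/n_angles degree steps by rotating the coordinate grid.
    """
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    kernels = []
    for k in range(n_angles):
        theta = k * np.pi / n_angles        # 15-degree steps when n_angles = 12
        # rotated coordinates: u runs across the vessel, v along it
        u = xs * np.cos(theta) + ys * np.sin(theta)
        v = -xs * np.sin(theta) + ys * np.cos(theta)
        kern = -np.exp(-u**2 / (2 * sigma**2)) * (np.abs(v) <= L / 2)
        kern -= kern.mean()                 # zero-mean, so flat regions respond 0
        kernels.append(kern)
    return kernels

kernels = matched_filter_kernels()
print(len(kernels), kernels[0].shape)
```

Each kernel would then be convolved with the fundus image (e.g. with scipy.ndimage.convolve), keeping at every pixel the maximum response over all orientations, as described in the text.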
Assuming an angular resolution of 15°, twelve different kernels are constructed to span all possible orientations (Fig. 4.2). The set of twelve 16x15-pixel kernels is applied by convolution with a fundus image, and at each pixel only the maximum of their responses is retained. For example, given the retinal image in Fig. 4.5(a), which has low contrast between blood vessels and background, its MFR version is shown in Fig. 4.5(b), where we can see that the blood vessels are significantly enhanced.

Figure 4.2: Illustration of 12 matched filter kernels along different directions, where σ = 2.0.

Modified Local Entropy Thresholding Algorithm As the second step, the MFR image is processed by a proper thresholding scheme, which can be used to distinguish between enhanced vessel segments and the background. An efficient entropy-based thresholding algorithm, which takes into account the spatial distribution of gray levels, is used because image pixel intensities are not independent of each other. Specifically, we implement a thresholding technique that is a blend of local entropy thresholding [85] and relative (or cross) entropy thresholding [87]. In a matched-filtered retinal image, the enhanced blood vessels are very sparse compared with the uniform background. This leads to a highly peaked co-occurrence matrix with a low entropy, which is not appropriate for local entropy thresholding. Also, local entropy thresholding aims to maximize the local entropy of foreground and background without considering the unbalanced proportion between them. Therefore, blood vessels extracted by local entropy thresholding are usually incomplete, and some detailed structures are missed.
The original cooccurrence matrix is asymmetric by considering the horizontally right and vertically lower transitions. We want to add some jittering e ect to the cooccurrence matrix that tends to keep the similar spatial structure but with less variations, i.e., T = [ti;j ]N N is computed as follow (Fig. 4.3(a)). For every pixel (l; k) in an image I  i = I(l; k);  j = I(l; k + 1);  d = I(l + 1; k + 1);  tij = tid + 1; End Fig. 4.3(b) and (c) compare the original cooccurrence matrix and the modi ed one for a typical match ltered image. Two matrices still share a similar structure that is important for the valid thresholding result. Also, the latter one has more entropy with a much smaller standard deviation, which is more desirable for local entropy 56 i j d (a) (b) (c) Figure 4.3: (a) The computation of the new cooccurrence matrix. (b) The original cooccurrence matrix in a normalized log scale ( = 261:63). (c) The a modi ed cooccurrence matrix in a normalized log scale = 9:00). thresholding. One may wonder whether the modi ed cooccurrence matrix still well represent the original spatial structure. Actually, considering a smooth area where j and d are very close or identical, the computation in Fig. 4.3(a) implicitly introduces certain lowpass ltering e ect and some structured noise to the cooccurrence matrix. Second, we want to preserve the complete vascular tree structure after threshold ing by modifying the original threshold selection criterion. The threshold selected by the local entropy aims to maximize the local entropy of foreground and back ground without considering the small proportion of the foreground compared to the background. Therefore, we propose to select the optimal threshold that maximizes the local entropy of the binarized image that tends to retain an appropriate fore ground/background ratio. The larger the local entropy, the more balanced ratio be tween foreground and background. 
If $s$, $0 \le s \le L-1$, is a threshold, then $s$ partitions the co-occurrence matrix into four quadrants, namely A, B, C, and D (Fig. 4.4). We then define the local entropy of the binary image, due to the foreground and the background, as

$$H^{(2)}_b(s) = -P_A \log_2 P_A - P_C \log_2 P_C, \quad (4.5)$$

where $P_A$ and $P_C$ are the probability sums in quadrants A and C, respectively. The $s$ corresponding to the maximum of $H^{(2)}_b(s)$ is used as the optimal threshold for blood vessel segmentation.

Figure 4.4: Four quadrants of a co-occurrence matrix.

As shown in Fig. 4.10, the modified local entropy thresholding algorithm can better preserve detailed blood vessels compared with the original one. For the MFR image shown in Fig. 4.5(b), the entropy-based thresholding result is shown in Fig. 4.5(c), where we can see that blood vessels are clearly segmented from the background. Length Filtering. As seen in Fig. 4.5(c), there are still some misclassified pixels in the image. Here we want to produce a clean and complete vascular tree structure by removing misclassified pixels. Length filtering is used to remove isolated pixels by using the concept of connected-pixel labeling described in [88]. Connected regions correspond to individual objects, so we first need to identify separate connected regions. The binary image is simply an array of '1's and '0's. The length filtering isolates the individual objects by using the eight-connected neighborhood and label propagation. We assume the binary image contains pixels with value '1' on objects and '0' on the background. 1. Set the current label L = 1. 2. Scan in raster order from the top left to the bottom right. 3. On encountering a pixel '1', check its neighbors in the upper-left half of its eight-connected neighborhood (W, NW, N, NE).

Figure 4.5: (a) An original retinal image. (b) Matched filtering result. (c) Local entropy thresholding result. (d) Vascular tree.
If any of them has a label L ≥ 1, set the current pixel's label to that value; else if more than one label is represented in the pixel's half-neighborhood, note these labels as "equivalent"; else set the current pixel's label to the current label and increment the current label.
4. Go to step 2;
5. Relabel all "equivalent" labelled pixels to the same value.

Once the algorithm is completed, only the resulting classes that exceed a certain number of pixels, e.g., 250, are labeled as blood vessels. Classes that are not labeled as blood vessels are eliminated. Fig. 4.5(d) shows the result after length filtering based on Fig. 4.5(c), where a clean vascular tree is presented.

Figure 4.6: (a) One-pixel-wide vascular tree. (b) One-pixel-wide vascular tree with intersections and crossovers overlaid on the grayscale image.

4.3.2 Bifurcation/Crossover Detection

Vascular intersections and crossovers are the most appropriate representative features because (1) the vascular tree spans the whole retina, hence it exists in every retinal image; and (2) bifurcation/crossover points offer more distinguishable information than other homogeneous areas throughout the retina. If a vascular tree is one pixel wide, the branching points can be detected and characterized efficiently from the vascular tree. Morphological thinning is applied to the vascular tree in order to obtain a one-pixel-wide vascular tree, as shown in Fig. 4.6(a). To save computational time, a 3×3 neighborhood window is used to probe and find the branching points. If the number of vascular-tree pixels in the window is greater than 3, the center is a branch point. Then an 11×11 neighborhood is applied at each detected branching point in order to eliminate small intersections. We consider only the boundary pixels of the 11×11 square. If the number of vascular-tree pixels on the boundary is greater than 2, we mark the point as an intersection/crossover. Fig. 4.6(b) presents the vascular tree with the intersections and crossovers.
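The 3×3 probing step above can be sketched as follows; it is a minimal illustration assuming the count includes the center pixel itself (so a skeleton pixel with three or more skeleton neighbors is flagged), and it omits the 11×11 boundary test used to discard small clusters:

```python
import numpy as np

def branch_points(skel):
    """Probe each pixel of a one-pixel-wide skeleton with a 3x3
    window; if more than 3 skeleton pixels fall in the window
    (center included), flag the center as a branch-point candidate."""
    rows, cols = skel.shape
    pts = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            if skel[r, c] and skel[r - 1:r + 2, c - 1:c + 2].sum() > 3:
                pts.append((r, c))
    return pts
```

On a small hypothetical T-junction skeleton, only pixels at and next to the junction satisfy the count, which is why the follow-up 11×11 boundary check is needed to merge such adjacent candidates.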
4.4 Experimental Analysis

4.4.1 Thresholding Algorithm

Figure 4.7: (a) An original image. (b) Our thresholding result (threshold = 101). (c) Local entropy thresholding result (threshold = 75). (d) Cross entropy result (threshold = 112).

A co-occurrence matrix is a representation of the spatial relationships in an image.

Figure 4.8: (a) An original image. (b) Our thresholding result (threshold = 136). (c) Local entropy thresholding result (threshold = 98). (d) Cross entropy result (threshold = 253).

Each element in a co-occurrence matrix indicates the transition of intensities between adjacent pixels. Our proposed thresholding algorithm works well on any MFR image because an MFR image has a narrower range of intensity changes compared with a normal grayscale image. Therefore, on MFR images the original co-occurrence matrix definition produces very distinct peaks within this narrow range, which can undermine the power of a statistical method. Our method, on the other hand, spreads out the weight and reduces the standard deviation, which results in a better threshold selection. We also compare our proposed algorithm with two other entropy-based thresholding methods, namely local entropy thresholding [85, 86] and relative (or cross) entropy thresholding [87], on different types of images including retinal images. Fig. 4.7 and Fig. 4.8 illustrate examples of simulation results on normal grayscale images for our proposed algorithm versus the other two methods. The algorithms' performance depends on the image. Table 4.1 gives numerical results of the three approaches on MFR retinal images. Twenty retinal images provided by Hoover [89] are used to test the three entropy-based thresholding algorithms. Plots of the three entropy-based thresholding algorithms are illustrated in Fig. 4.9, in which our algorithm always provides a distinct peak regardless of the image type. Examples of simulation results are shown in Fig.
4.10 for a normal retinal image and Fig. 4.11 for a retinal image with lesions. While the performance of cross entropy and that of our algorithm are comparable on a normal image, our algorithm is more robust to lesions. On the MFR image, it is quite obvious that our algorithm performs better than the other two entropy-based methods.

4.4.2 Blood Vessel Extraction Evaluation

We use the set of twenty retinal images provided by Hoover [9] because both fundus and ground-truth images are available online [89] and other state-of-the-art algorithms [9, 8, 6] use the Hoover database to evaluate their performance. Therefore, not only can we evaluate and analyze the performance of the proposed framework, but we can also impartially compare the performance of the proposed algorithm with others. Performance of blood vessel extraction is evaluated using the true positive and false positive rates as in [9]. Any pixel that was hand-labeled as vessel and that the algorithm labeled as vessel is counted as a true positive. Any pixel that was hand-labeled as non-vessel and that the algorithm labeled as vessel is counted as a false positive. The true positive rate is calculated by normalizing the true positives by the total number of hand-labeled vessel pixels. The false positive rate is calculated by normalizing the false positives by the total number of hand-labeled non-vessel pixels.

Figure 4.9: First row: entropy plots of the image shown in Fig. 4.10. Second row: entropy plots of the image shown in Fig. 4.11. (a),(d) Local entropy. (b),(e) Relative entropy. (c),(f) Modified local entropy.
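The evaluation metric defined above is straightforward to compute from a predicted binary map and a hand-labeled one; the following is a small sketch with hypothetical array inputs:

```python
import numpy as np

def vessel_rates(pred, truth):
    """True/false positive rates as defined above: TPR normalizes
    correctly detected vessel pixels by all hand-labeled vessel
    pixels; FPR normalizes wrongly flagged pixels by all
    hand-labeled non-vessel pixels."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    tpr = tp / truth.sum()
    fpr = fp / (~truth).sum()
    return tpr, fpr
```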
Specifically, we classify retinal images into three categories: normal retinal images, abnormal retinal images with some lesions, and retinal images with obscure blood vessel appearance. Fig. 4.13 shows an example simulation result for a normal retinal image. Fig. 4.14 presents an example simulation result for an image with obscure blood vessel appearance. An example simulation result for an abnormal retinal image with some lesions is shown in Fig. 4.15. To evaluate the performance of our algorithm, we compare our simulation results with state-of-the-art results obtained from Hoover et al. [9], Jiang et al. [8], Staal et al. [6, 7], and hand-labeled ground-truth segmentations. The numerical performance of the proposed framework and the other methods is given in Fig. 4.12. The best quantitative performance belongs, as predicted, to the normal image category. The numerical performances of the obscure blood-vessel appearance and lesion image groups are comparable.

Figure 4.10: (a) An original retinal image. (b) A matched filtering result. (c) A ground-truth segmentation. (d) The local entropy thresholding result. (e) The relative entropy thresholding result. (f) The modified local entropy threshold.

In the lesion image category, although the proposed algorithm misdetects lesions as blood vessels, the false positive rate of the lesion image group is lower than or comparable to the other approaches. This is due to the fact that lesions only account for small portions of an image. Although segmentation performance decreases in the presence of lesions, the additionally detected lesions are actually good for 2D registration and 3D surface reconstruction because those lesions are used as additional candidate features. The overall performance of our algorithm is comparable to other computationally expensive algorithms.
While the other three methods require multiple threshold values and various decision-making criteria, we believe a single threshold yields better segmentation results. If an image is partitioned into multiple small regions and one particular small region is given, it is difficult even for human eyes to distinguish between foreground (blood vessels) and background in that small region.

Figure 4.11: (a) An original retinal image. (b) A matched filtering result. (c) A ground-truth segmentation. (d) The local entropy thresholding result. (e) The relative entropy result. (f) The modified local entropy threshold.

This remark may not be so apparent on the Hoover database, but on high-resolution images, e.g., the ETDRS database, the observation is quite obvious since there are background patterns that closely resemble blood vessel structures all over the retinal images.

4.5 Conclusions

We have introduced an efficient and robust algorithm for blood vessel detection in ocular fundus images with a modified co-occurrence matrix used for local entropy thresholding.

Figure 4.12: Performance of our approach versus other methods: Staal's algorithm [6, 7], Jiang's algorithm [8], and Hoover's algorithm [9].

The proposed method retains computational simplicity and, at the same time, achieves good simulation results. Additionally, our false positive rates are lower than those of other computationally expensive techniques while the true positive rates are comparable.
Table 4.1: Threshold values of three different approaches for twenty retinal images

Retina   Our Approach   Cross Entropy Thresholding   Local Entropy Thresholding
0001     121            106                          184
0002     82             98                           123
0003     121            101                          150
0004     69             76                           139
0005     90             107                          128
0044     63             74                           109
0077     82             92                           114
0081     86             96                           138
0082     89             93                           122
0139     110            121                          161
0162     64             68                           86
0163     89             95                           111
0235     89             98                           117
0236     89             97                           123
0239     97             106                          134
0240     87             98                           132
0255     81             86                           125
0291     113            119                          145
0319     83             92                           116
0324     93             105                          122

Figure 4.13: (a),(b) Examples of normal retinal images. (c),(d) Hand-labeled ground-truth images. (e),(f) Our segmentation results.

Figure 4.14: (a),(b) Examples of obscure blood-vessel retinal images. (c),(d) Hand-labeled ground-truth images. (e),(f) Our segmentation results.

Figure 4.15: (a),(b) Examples of retinal images with lesions. (c),(d) Hand-labeled ground-truth images. (e),(f) Our segmentation results.

CHAPTER 5

2D Retinal Image Registration

5.1 Introduction

Registration is the problem of how to make two or more images coincide. The two images are often taken at different times, viewpoints, modes, or resolutions. Additionally, the image plane and the world plane are often not parallel. Hence, it is impossible to simply overlay two images. To register two images, an "optimal" transformation model has to be identified. Numerous algorithms have been proposed on this topic. These methods differ in many aspects: (1) feature-based methods versus area-based methods; (2) batch methods versus RANSAC-like methods; (3) low-level methods, e.g., optical flow and autocorrelation, versus shape-based methods, e.g., template matching; (4) spatial domain versus frequency domain. In the case of medical imaging, disease diagnosis and treatment planning are often supported by multiple images acquired from the same patient.
Image registration techniques, hence, are needed to integrate the information gained from several images into a comprehensive understanding. The organization of this chapter is as follows. The importance of 2D retinal registration in medical applications is given in this section. Currently available methods for retinal registration are reviewed in Section 5.1.1. Our methodology is given in Section 5.1.2. Section 5.2 provides technical background. Then, Section 5.3 describes the implementation of the proposed algorithm. Simulation results and the performance of our algorithm are presented in Section 5.4.

5.1.1 Literature Reviews

Image registration is a fundamental problem in several image processing and computer vision applications [52, 53]. A broad range of image registration methods has been proposed for different medical imaging applications, including retinal image registration. Various criteria, e.g., modalities, dimensionalities, and elasticity of the transformation, have been proposed to categorize registration methods [52, 53, 55, 54]. Typically, retinal image registration techniques are classified as feature-based or area-based methods.

Area-based techniques are generally based on pixel intensities and certain optimized objective functions, such as least mean square error, cross-correlation, phase correlation, or mutual information [90, 2, 1, 91, 92, 93, 94]. In the case of retinal image registration, area-based approaches are often used in multimodal or temporal image registration applications. In [1], mutual information was used as a similarity measure and simulated annealing was employed as a searching technique. In [2], the measure of match (MOM) was proposed as an objective function and the genetic algorithm was chosen as the optimization technique. Nevertheless, the searching space of transformation models (affine, bilinear, and projective) is huge.
The greater the geometric distortion between the image pair, the more complicated the searching space. Typically, two major factors may degrade the performance of area-based methods: non-consistent/non-uniform contrast within an image and large homogeneous/textureless areas.

Feature-based methods are somewhat similar to manual registration [38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51]. The approach assumes that point correspondences are available in both images, and the registration is performed by maximizing a similarity measure computed from the correspondences. In [40], the bifurcation points of a vascular tree, also called landmark points, were labeled with surrounding vessel orientations. An angle-based invariant was then computed to give a probability for every pair of matching points. After that, the Bayesian Hough transform was used to sort the transformations according to their respective likelihoods. In [38], the similarity matrix for all possible correspondences was computed based on the orientations of vascular centerlines, and the similarity measure was converted to a prior probability. The transformation was estimated in a hierarchical way, from the zeroth-order model to the first-order model and finally to the second-order model. Nonetheless, sufficient feature points have to be available. In [43], the dual-bootstrap iterative closest point (dual-bootstrap ICP) algorithm was introduced. The approach started from one or more initial, low-order estimates that were only accurate in small image regions called bootstrap regions. In each bootstrap region, the method iteratively refined the transformation estimation, expanded the bootstrap region, and tested whether a higher-order model could be used. The method required accurate initialization of at least one point correspondence. High success rates were reported in [43].
The performance of feature-based methods largely depends on sufficient and/or reliable correspondences, especially when the overlapping part of an image pair is very limited or when there are mismatched correspondences.

5.1.2 Methodology

In this work, we study retinal image registration in the context of the National Institutes of Health (NIH) Early Treatment Diabetic Retinopathy Study (ETDRS) standard protocol [11]. The ETDRS protocol defines seven 30° fields of each retina with specific field coverage. A robust ETDRS image registration algorithm is required to (1) assess image quality in terms of ETDRS field coverage, and (2) support ETDRS-based disease staging. Three major challenges are present. First, small overlaps between adjacent fields lead to inadequate landmark points (crossovers and bifurcations) for feature-based methods. Second, the contrast and intensity distributions within an image are not spatially uniform or consistent, which can deteriorate the performance of area-based techniques. Third, high-resolution ETDRS images contain large homogeneous, non-vascular/textureless regions, which cause difficulties for both feature-based and area-based techniques.

Because area-based and feature-based methods have their own strengths and limitations, in this work we combine both registration approaches to gain the advantages each offers, along with other decision-making criteria, in order to obtain the best solution. To achieve robustness and efficiency, a hierarchical technique (translation, affine, and quadratic) is incorporated. Binary mutual information is proposed for translation estimation; it demonstrates better robustness compared with traditional grayscale mutual information. In addition, a multiscale searching strategy is applied to avoid a large combinatorial searching space.
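The binary mutual information mentioned above reduces to a 2×2 joint histogram over the two binary vessel maps; a minimal sketch of the similarity measure (not of the full translation search) might look like this:

```python
import numpy as np

def binary_mutual_information(a, b):
    """Mutual information of two same-sized binary vessel maps,
    computed from their 2x2 joint distribution. Maximizing this
    over candidate (dx, dy) shifts would give the zeroth-order
    translation estimate described above."""
    a = a.astype(bool).ravel()
    b = b.astype(bool).ravel()
    n = a.size
    mi = 0.0
    for va in (False, True):
        for vb in (False, True):
            p_joint = np.logical_and(a == va, b == vb).sum() / n
            p_a = (a == va).sum() / n
            p_b = (b == vb).sum() / n
            if p_joint > 0:
                mi += p_joint * np.log2(p_joint / (p_a * p_b))
    return mi
```

Identical maps give MI equal to the entropy of a single map, while independent maps give MI near zero, which is what makes the measure usable as a matching score.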
Furthermore, the two parameters characterizing the displacements along the vertical and horizontal directions in the translation model, which suggest the relative positions of each field, can be used for IQA regarding the field coverage definition. There are three major steps in the proposed algorithm. First, binary vascular trees are extracted from the retinal images using a modified co-occurrence matrix for local entropy-based thresholding [95, 96]. Next, the zeroth-order translation is estimated by maximizing mutual information on the binary image pair (area-based). Specifically, a local entropy-based peak selection scheme and a multiresolution searching strategy are developed to improve the accuracy and efficiency of translation estimation. Third, we use two types of features, landmark points and sampling points, for higher-order transformation estimation. Sampling points, which are acquired by imposing a grid onto the thinned vascular tree, are only introduced when the landmark points do not meet certain criteria. Simulation results on 504 pairs of ETDRS retinal images show the effectiveness and robustness of the proposed registration algorithm.

5.2 Preliminaries

5.2.1 NIH ETDRS Protocol

Figure 5.1: ETDRS seven standard fields (right/left eyes) (http://eyephoto.ophth.wisc.edu/Photographers.html)

The importance of the ETDRS protocols and the challenges in their implementation call for an automated software tool for image quality assessment (IQA). The retinal images need to meet the image quality criteria defined by the ETDRS protocol. Each set of fundus photographs should be assessed for quality before the photographs are sent to the coordinating center. A photographer is required to decide whether a particular image set meets the three ETDRS requirements: (1) clarity and focus; (2) field definition; and (3) stereo effect. In this work, we focus on field definition.
The evaluation of field definition involves (1) the horizontal and vertical displacements between image fields; and (2) the verification of the relative positions of key features, i.e., the optic disc and fovea, in an image. The images are captured sequentially starting from field 1, and the required coverage defines acceptable overlapping regions. The ETDRS imaging standard specifies seven stereoscopic 30° fields of each eye, as defined in Table 5.1 and illustrated in Fig. 5.1. Field 1 is centered on the optic disc. Field 2 is centered on the center of the macula. The overlapping parts of fields 1 and 2 (fields 1/2) as well as fields 2/3 are roughly 50% of the image size. For other fields, the overlapping parts are typically less than 25%. It is worth mentioning that the displacements are not always consistent and depend on patient cooperation and photographer skill.

5.2.2 Image Quality Assessment (IQA): Field Definition

The ETDRS protocol specifies seven stereoscopic 30° fields of each eye, as defined in Table 5.1 and Fig. 5.1. The overlap of field pairs 1 and 2 (or fields 1/2)¹ as well as that of fields 2/3 is roughly 50% of the image size. For other field pairs, the overlaps are typically less than 25%. It is worth mentioning that the field displacements are not always consistent and depend on patient cooperation and photographer skill. The importance of the ETDRS protocol and the challenges in its practical implementation call for automated software tools for image quality assessment (IQA) that check the relative positions, i.e., horizontal/vertical displacements, of every image pair according to Table 5.1. By comparing the offset T_o, i.e., the difference between the desired vertical/horizontal displacements and the actual ones, with the diameter of the optic disc (DD), an image pair is categorized as good (T_o < 1/2 DD), fair (1/2 DD ≤ T_o ≤ 1 DD), or poor (T_o > 1 DD). Therefore, the IQA of ETDRS field definition boils down to a problem of image registration followed by displacement verification.
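The three-way grading rule above maps directly to code; this small sketch takes the offset T_o and the optic-disc diameter DD in the same (arbitrary but consistent) units:

```python
def field_definition_grade(offset, disc_diameter):
    """Grade an image pair's field definition by comparing the
    offset T_o (desired minus actual displacement) against the
    optic-disc diameter DD, per the ETDRS-based criterion above."""
    if offset < 0.5 * disc_diameter:
        return "good"
    if offset <= disc_diameter:
        return "fair"
    return "poor"
```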
We will briefly review some technical background of retinal image registration in the following.

¹The notation "fields 1/2" indicates that field 1 is the fixed image (the model image) and field 2 is the image being mapped (the distorted image).

5.2.3 Global and Local Entropy

The global entropy, or simply the entropy, of an n-state system is defined as a function of the state probabilities [83],

H = -Σ_{i=1}^{n} p_i log2 p_i,    (5.1)

where p_i is the probability of state i. In the case of an image, the global entropy is not an adequate measure since image pixel intensities are not independent of each other. Different images with identical histograms will always yield the same entropy value, since the spatial distribution is not taken into account in the global entropy computation. The local entropy [85], on the other hand, can better distinguish two images in terms of their spatial structures, because it considers the dependency of image pixel intensities. The local entropy is defined as

H^(2) = -Σ_i Σ_j p_ij log2 p_ij,    (5.2)

where p_ij is the probability of co-occurrence of the intensity levels i and j. The co-occurrence matrix is an L × L matrix (L is the number of intensity levels) [85]. It indicates the transition of pixel intensities between adjacent pixels. The local entropy, therefore, can indicate the spatial structure/pattern
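The distinction between (5.1) and (5.2) is easy to demonstrate numerically; the following sketch uses two small hypothetical binary images with identical histograms, one blocky and one checkered, so their global entropies agree while their local entropies differ:

```python
import numpy as np

def global_entropy(img, levels=2):
    """Eq. (5.1): entropy of the intensity histogram alone."""
    p = np.bincount(img.ravel(), minlength=levels) / img.size
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def local_entropy(img, levels=2):
    """Eq. (5.2): entropy of the co-occurrence distribution over
    horizontal-right transitions (one of the transitions used in
    the standard co-occurrence matrix)."""
    t = np.zeros((levels, levels))
    for l in range(img.shape[0]):
        for k in range(img.shape[1] - 1):
            t[img[l, k], img[l, k + 1]] += 1
    p = t / t.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())
```

Because the co-occurrence distribution depends on pixel arrangement, rearranging pixels changes the local entropy even though the histogram, and hence the global entropy, is unchanged.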