

small (250x250 max)
medium (500x500 max)
Large
Extra Large
large ( > 500x500)
Full Resolution


NEW INFORMATION DISPERSAL TECHNIQUES FOR TRUSTWORTHY COMPUTING By Abhishek Parakh Bachelor of Technology in Electronics and Communication Engineering Dr. B. R. Ambedkar National Institute of Technology Jalandhar, Punjab, India 2005 Master of Science in Electrical Engineering Louisiana State University Baton Rouge, Louisiana, USA 2007 Submitted to the Faculty of the Graduate College of Oklahoma State University in partial fulfillment of the requirements for the Degree of DOCTOR OF PHILOSOPHY May, 2011 COPYRIGHT c By Abhishek Parakh May, 2011 NEW INFORMATION DISPERSAL TECHNIQUES FOR TRUSTWORTHY COMPUTING Dissertation Approved: Dr. Subhash C. Kak Dissertation Advisor Dr. Nohpill Park Dr. K. M. George Dr. Sohum Sohoni Dr. Mark Payton Dean of the Graduate College iii ACKNOWLEDGMENTS My deepest gratitude to my Ph.D. advisor, Dr. Subhash Kak, who has also been my mentor in several realms of life over the past six years. He gave me the freedom to explore my areas of interest and guided me through toughest of times during the course of my work. He has been very patient with me and I continue to look up to him. Dr. Nohpill Park and Dr. K.M. George have been my committee members and teachers. They taught me several core areas of computer science that have been very helpful in my research work. I have also worked as Teaching Assistant for them. While Dr George bore my foray into being a teaching assistant and helped me learn the skill, Dr. Park gave me the additional opportunity to guest lecture to his class which helped me hone my teaching skills. Dr. Sohum Sohoni, a member of my committee, comes from the Electrical and Computer Engineering Department and has been very helpful to me. He helped me organize the student conferences (TACS’09 and TACS’10) and has always been very cheerful and encouraging. I would also like to thank Dr. V. Sarangan and Dr. A. Raghuram (Mathematics Department), who were my committee members during the initial phases of my Ph.D. work, for being very supportive and enthusiastic towards my research projects. Lastly, I would like to thank my parents for their love and support and my friends for making my stay cheerful. My special thanks to the Hatfields (Gary and Melita) for being like family away from home. iv TABLE OF CONTENTS Chapter Page 1 INTRODUCTION 1 1.1 The Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2 ONLINE DATA STORAGE USING IMPLICIT SECURITY 9 2.1 Proposed Data Partitioning Scheme . . . . . . . . . . . . . . . . . . . 11 2.1.1 The (k, k) Data Partitioning Scheme . . . . . . . . . . . . . . 11 2.1.2 Choosing the Coefficients . . . . . . . . . . . . . . . . . . . . . 14 2.1.3 Introducing Redundancy . . . . . . . . . . . . . . . . . . . . . 15 2.1.4 Complexity of the proposed protocols . . . . . . . . . . . . . . 16 2.2 An alternate approach to partitioning . . . . . . . . . . . . . . . . . . 19 2.3 A stronger variation to the protocol modulo a composite number . . . 21 2.4 Addressing the Data Partitions . . . . . . . . . . . . . . . . . . . . . 22 2.5 Security by Key Distribution . . . . . . . . . . . . . . . . . . . . . . . 23 2.5.1 Key Partitioning in Galois Field 2m . . . . . . . . . . . . . . . 23 2.6 Partition redundancy v/s no redundancy . . . . . . . . . . . . . . . . 25 2.7 Application in Internet Voting Protocols . . . . . . . . . . . . . . . . 25 2.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3 A TREE BASED RECURSIVE SCHEME FOR SPACE EFFICIENT SECURE DISTRIBUTED DATA STORAGE 28 3.1 A Comparison Based (2, 3) Recursive Scheme . . . . . . . . . . . . . 32 3.2 Proposed koutofn Recursive Secret Sharing Scheme . . . . . . . . . 34 v 3.3 Proposed Computational Secret Sharing Scheme . . . . . . . . . . . . 39 3.4 On Security of the Proposed Schemes . . . . . . . . . . . . . . . . . . 45 3.5 Efficiency and height plots . . . . . . . . . . . . . . . . . . . . . . . . 46 3.6 Recursive hiding in threshold visual cryptography . . . . . . . . . . . 47 3.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 4 INTERNET VOTING PROTOCOL BASED ON IMPLICIT SECURITY 54 4.1 Our Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 4.2 The Proposed Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . 57 4.2.1 Registration Phase . . . . . . . . . . . . . . . . . . . . . . . . 58 4.2.2 Voting Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 4.2.3 Counting Phase . . . . . . . . . . . . . . . . . . . . . . . . . . 62 4.2.4 Recasting of ballot . . . . . . . . . . . . . . . . . . . . . . . . 64 4.3 Security of the proposed protocol . . . . . . . . . . . . . . . . . . . . 65 4.4 A Variation for Identification of Cheating Server/s . . . . . . . . . . . 67 4.4.1 Voting Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 4.4.2 Cheating Server Identification . . . . . . . . . . . . . . . . . . 68 4.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 5 CONCLUSIONS 71 BIBLIOGRAPHY 73 vi LIST OF FIGURES Figure Page 1.1 Illustration of conventional (explicit) data security (left) and illustration of the idea of implicit data security (IDA) (right). . . . . . . . . 2 2.1 Data partitioning and storage. . . . . . . . . . . . . . . . . . . . . . . 14 2.2 Illustration of key partitioning. . . . . . . . . . . . . . . . . . . . . . 24 3.1 Recursive hiding of smaller messages in the shares of larger messages 33 3.2 Illustration of recursion tree for example 1 (partial illustration) . . . . 34 3.3 Illustration of application of algorithm 1 for n = 4. . . . . . . . . . . 38 3.4 (a) Plot of possible values m can take as n and h vary. (b) Plot of new effective share sizes relative to the size of the original secret S versus m and n (only a few values are shown). . . . . . . . . . . . . . . . . . 40 3.5 (a) Plot of efficiency improvement factor as a function of n and h. (b) Plot of efficiency improvement factor as a function of n and h (larger values). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3.6 (a) Shows how the height h changes as a function of the number of pieces m and the number of players n. (b) Shows how the efficiency improvement factor changes as a function of arbitrary m and n. In both (a) and (b) each piece is taken to be of the minimum size possible (ex. a single digit when using a decimal base). . . . . . . . . . . . . . 44 3.7 Plot of maximum efficiency improvement factor with corresponding height h against m. Here n=2. . . . . . . . . . . . . . . . . . . . . . . 47 vii 3.8 Plot of maximum efficiency improvement factor with corresponding height h against m (a) n=3 (b) n=5. . . . . . . . . . . . . . . . . . . 48 3.9 Possible partitions for black and white pixels . . . . . . . . . . . . . . 49 3.10 Illustration of recursive hiding of secret images in shares of larger original image using a 2outof3 threshold scheme . . . . . . . . . . . . . 50 3.11 Illustration of the process of recursive hiding of secrets in shares of larger original image . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 3.12 Illustration of regeneration of smaller images from the shares hidden inside the shares of the original larger image . . . . . . . . . . . . . . 52 4.1 (a) Polynomial interpolation using vote (0, V ) and (i, ri), i = 1 to 8. (b) Lack of knowledge of (1, r1) and (2, r2) leaves the polynomial undetermined; hence the vote remains secure. . . . . . . . . . . . . . 66 viii CHAPTER 1 INTRODUCTION Embedded, mobile and cloud computing applications are a pervasive feature of the emerging information society. In such a world, with users connecting with intelligent devices and interacting with services available through the Internet and joining the network at short notice while uploading or downloading large volumes of data, it is essential to guarantee that the network is selfaware, selfadjusting, and selfhealing with respect to malicious and natural threats. This is especially so in critical applications such as electronic medical records, pharmaceutical supply chains, electronic ballots, and records related to electronic commerce. The assessment of risk and prediction of failures are some of the challenges of working in the field of trustworthy computing. One way to deal with risk and potential failure of the system is to include the adversary in the security model. Traditionally, the idea of explicit security by instituting firewalls and encryption systems has been used but it is not effective in dealing with natural or manmade disasters. The idea of information dispersal algorithms (IDA) is an example of implicit security in which the data is partitioned and stored over many different servers, which is illustrated in figure 1.1. Its direct application is to survivability of the system and protection against data loss due to disasters and intrusions as was the motivation in survivable storage systems of Delta4, Publius, Intermemory, Oceanstore, Farsite, eVault, and PASIS [1]. IDA systems make the selfhealing of data possible, and if the data on a device becomes corrupted or destroyed, an automated process can detect this and recalculate all data contained in the missing or corrupted partition by inspecting 1 Figure 1.1: Illustration of conventional (explicit) data security (left) and illustration of the idea of implicit data security (IDA) (right). available partitions. In RAID, several physical hard disks are so managed that they present themselves as a single logical unit to the operating system. RAID uses mirroring, striping, and error correction to improve reliability and availability of data and drive failure is masked from the end user. But RAID can involve significant computation overhead when reading and writing information. Of the several RAID schemes, RAID 5 uses striping along with distributed parity. In RAID 6, there is fault tolerance from two drive failures. Upon limited drive failure, subsequent reads may be calculated from the distributed parity. With copy and replication, RAID can allow for significantly more drive failures before there is data loss. But these systems are expensive since they are both copy and paritybased systems. RAID 7, a triple parity system, has been proposed to overcome the impending shortcomings of RAID 6 systems that will allow for up to 3 drive failures [2]. However, RAID does not focus on data security. Most distributed storage systems use (p, k, n) data distribution schemes, where the data is divided into pieces so that fewer than p pieces do not reveal any information, p or more than p but less than k pieces reveal some information and at least k pieces 2 are required to reveal the entire data. Here n−k pieces are redundant. For example, a (1, 1, n) distribution scheme creates n replicas of the original data and retrieving one of the replicas provides the original data. A (1, n, n) scheme divides the data into pieces of 1 n th the size of the original data while an (n, n, n) scheme creates n pieces each of the same size as that of the original data but all the n pieces are required for data reconstruction. Distributed storage systems that use secret sharing schemes [3, 4, 5] use (k, k, n) data distribution schemes, which creates n pieces of the same size as the data and thus are space inefficient. Rabin’s [6] information dispersal scheme is a (1, k, n) data distribution scheme that simultaneously encodes k data values into n pieces and each piece reveals partial information about the encoded values. Other schemes with general values of p, k and n are known as ramp schemes [7] and use p − 1 random values and k − (p − 1) data values. In general, the size of each piece is proportional to 1 k−(p−1) of the original size of the data object that is encoded. Apart from the threshold schemes discussed above an additional layer of security may be provided by using an encryption key to encrypt the data before dividing it up into pieces [8]. The encryption key is then stored using a threshold secret sharing scheme of type (k, k, n) and the encrypted data is divided into n pieces using (1, k, n) information dispersal scheme. Such a hybrid system balances security and space efficiency. By coding and dispersing information, the reliability, security and efficiency of data storage can be vastly improved by IDA systems over traditional copy and paritybased systems. Copybased systems offer reliability by mirroring data on n storage devices, and up to n − 1 of the devices can fail without data loss. But this comes at high cost which is n times greater than storing a single copy. Paritybased systems, commonly used in RAID [9], typically allow at most two storage devices to fail without data loss. Paritybased systems are not as wasteful as copybased systems and they 3 are equally reliable. A special benefit of IDA schemes over copybased systems is the inherent security advantages of dispersing information since the data is not stored at any single location. Each stored partition is a small and unique fraction of the data. An IDA system can be configured to disperse data to n number of devices which can sustain up to n − k failures without loss. Even if an attacker gained access to the devices, the system remains secure until at least k devices are compromised. Maintaining multiple copies creates multiple points of attack, further decreasing the reliability of the system. IDA systems are efficient since the storage overhead is low. They offer the high reliability of copybased storage and the high efficiency of paritybased systems. IDA systems are not only reliable, secure, and less expensive than backup and archive systems, they are the obvious choice for geographically distributed storage. If each of the n devices is kept at a different location throughout the world, then the resultant system is effectively a disaster proof backup system. So long as k locations remain up, the data would be safe and accessible even in the presence of power failure, earthquakes, floods, and other disasters. Delta4 is one of the earliest systems proposed for distributed data storage [10, 11] and uses security servers and data storage servers. The system works by dividing the data into fragments and then storing encrypted fragments. All the encrypted fragments are sent to all the storage servers that run a pseudorandom algorithm to decide on which fragments to store. A client could access a piece of data by getting authorization from a set of security servers. Publius [12] aims at providing survivable storage as well as anonymity. However, the system is inefficient in the sense that it replicates data for storage. Before replication the data is encrypted and the encryption key is divided into pieces and each piece is stored with a replica of the encrypted data. Intermemory [13] is an archival storage system for public domain. It provides 4 high availability and minimizes storage and expected cost of reading data. The nodes store a single piece of data, then apply a threshold scheme and distribute pieces of this data to other nodes which in turn again apply a threshold scheme and further distribute the pieces of these pieces. This recursive encoding of pieces is termed as deepencoding. OceanStore [14] is a storage scheme for wideareanetworks and addresses many issues such as client mobility, robust naming, data migration and replica discovery. It stores active data as encrypted replicas and the archival data using deepencoding as in Intermemory. Farsite [15] is aimed at peer based storage over desktop personal computers. It uses the workstation idle cycles and unused storage capacity of desktop workstations to store encrypted replicas of the data. Evault [16] uses information dispersal or short secret sharing [8] depending on the desired security level and distributed finger prints [17] for data integrity. 1.1 The Objective The major objective of this dissertation is to develop new information dispersal techniques, more general than have been researched before, that are suited for applications involving confidentiality and verifiability in the presence of erratic servers and in the presence of an adversary as one might expect in a variety of situations related to cloud computing, sensor networks, information hiding, and Internet voting. Mathematically, we will develop selfverifiable information dispersal techniques so that a client can distribute the data among a set of n servers, of which up to n − k might be faulty, in such a way that the client can always recover the stored data correctly, independently from the behavior of the faulty servers. Network based protocols for information dispersal that could tolerate a certain degree of errors on servers were proposed by Garay et. al. [18] and Cachin and Tessaro [19]. The first of these relies on synchronous networks, which are adequate for tightly coupled nodes such as clusters, but unrealistic for geographically distributed 5 or heterogeneous systems, and the second uses an asynchronous network model with certain bounds on the computational capacity of the adversary. We go beyond the scope of the earlier works and not only consider verifiability based on the relationship between partitions of data stored on various servers but also use structure within the partitions themselves to provide higher performance both for Byzantine servers as well as clients. The information dispersal techniques proposed in this dissertation lead to new scalable and optimized schemes that reduce computational overhead and provide applicationspecific balance between confidentiality and verifiability. It should be possible to apply these data distribution schemes to a variety of applications including secure data storage in cloud computing, sensor networks, Internet voting, peertopeer networks, and information hiding. Some of our results have appeared in [20, 21, 22, 23, 24, 25, 26, 27, 28]. We divide our approach into two phases that integrate the theoretical aspects of our research into an effective strategy towards achieving application specific goals. The new idea that we will use is that of recursively generated partitions, or partitions that are nested, which means creating a system where dependencies are not distributed equally across servers. This research has investigated several basic space efficient nested schemes [20, 23, 27]. In these schemes, the original data block is broken into several pieces and the pieces are divided into slices one by one, by repeatedly using already generated slices. This establishes the ground for a more fundamental analysis, in an abstract setting, of such nonuniform IDA schemes that not only provide the theory for data storage on computer and sensor networks, but also to steganography. The aims of this dissertation are the following: 1. To develop new information dispersal techniques that are applicable to resource constraint environments such as sensor networks as well as cloud data storage. 6 2. Discover new applications for distributed data storage and information dispersal techniques that we develop. 3. Implement fault tolerance for the dispersal techniques in order to detect errors in data partitions and if possible also detect the faulty/malicious server. Chapter 2 proposes a computationally efficient data distribution technique that can be used in resource constraint environments such as sensor networks as well as for online data storage. These results have been published in [25, 27]. We describe the use of a data partitioning scheme for implementing implicit security involving the roots of a polynomial in finite field. The partitions are stored on randomly chosen servers on the network and they need to be retrieved to recreate the original data. Data reconstruction requires access to each server, login password and the knowledge of the servers on which the partitions are stored. This scheme may also be used for data security in sensor networks and Internet voting protocols. Chapter 3 proposes a recursive secret sharing technique that can be applied to data partitioning and steganography. Steganographic applications can not only be used for storing hidden information but also for checking data consistency once the original data is recreated using the retrieved partitions. The recursive techniques presented have been applied to visual cryptography to send watermarks and other secret images. These results have been published in [20, 23]. The chapter presents a koutofn recursive secret sharing scheme based on an nary tree data structure. In recursive hiding of secrets, the user encodes additional secrets in the shares of the secret intended to be originally shared without an expansion in the size of the latter, thereby decreasing the effective share size per secret and increasing the overall space efficiency of secret sharing, with a tradeoff in security. Chapter 4 discusses the application of distributed data storage to solve the problem of secure and verifiable internet voting. We present a protocol to achieve this and our work has appeared in [21, 26]. The chapter presents an improved protocol for 7 Internet voting that allows recasting of ballots to eliminate voting errors, encourages early voting and provides an opportunity for changing votes as the election campaign progresses. The protocol is able to meet the competing requirements of verifiability and privacy by using a distributed security architecture, involving multiple servers, and multiple audit channels. It also detects servers that might be cheating. 8 CHAPTER 2 ONLINE DATA STORAGE USING IMPLICIT SECURITY Securing data stored on distributed servers is of fundamental importance in cloud computing. The traditional (explicit) approach to securing data is to store and back it up on a single server and allow access upon the use of passwords that are needed to be frequently changed. But there is a tendency among users to keep passwords simple and memorable [29, 30, 31] leading to the possibility of brute force attacks. Furthermore since the data on the Web is archived, keys that provide adequate encryption today are likely to be broken in the future. Therefore, explicit security architecture may not be adequate for many applications. To go beyond the present approach, one may incorporate further security within the system by using puzzles [32] or otherwise by another layer that increases the space that the intruder must search in order to break the system. Here, we propose an implicit security architecture in which security is distributed amongst many entities. In contrast to hashbased distributed security models for wireless sensor networks [33, 34, 35, 36, 37], we consider a more general method of data partitioning. In this approach, stored data is partitioned into two or more pieces and stored at randomly chosen places on the network that are known only to the owner of the data. Access to these pieces not only depends on the knowledge of a password but also on the knowledge of where the pieces are stored. The division of data is performed in such a way that the knowledge of all the pieces is required to recreate the data and that none of the individual pieces reveals any useful information. In scenarios where one or more pieces may be at the danger of being lost or inaccessible due to system or 9 network failure, one may employ schemes that can recreate the data from a subset of original pieces. Formally, our scheme consists of two parts  the first part is a (k, k) partitioning scheme where all the k partitions are required to recreate the data and the second part extends the first part of the scheme to a (k, n) partitioning scheme, where k n and k > 0 , i.e. only k out of n partitions are required to recreate the data. A number of schemes have been proposed in the communications context for splitting and sharing of decryption keys [3, 38, 39]. These schemes fall under the category of “secret sharing schemes”, where the decryption key is considered to be a secret. Motivated by the need to have an analog of the case where several officers must simultaneously use their keys before a bank vault or a safe deposit box can be opened, these schemes do not consider the requirement of data protection for a single party. Further, in any secret sharing scheme it is assumed that the encrypted data is stored in a secure place and that none of it can be compromised without the decryption key. Some schemes that directly apply secret sharing for distributed data storage in sensor networks have been proposed [40, 41, 42], but have a complexity of O(n log2n) at best, whereas our scheme can achieve linear complexity. Other schemes aim at sharing multiple secrets [43] but have a complexity of O(n2) or generate group keys for group sharing of information [44]. Certain probability models have been proposed for reconstructing secret sharing over the Internet [45], but only focus on strategies for share distribution over the network and do not provide methods for data partitioning. We protect data by distributing its parts over various servers. The idea of doing these partitions is a generalization of the use of 3 or 9 roots of a number in a cubic transformation [46]. The scheme we present is simple and easily implementable. We would like to stress that the presented scheme is different from Shamir’s secret sharing scheme [3] which takes the advantage of polynomial interpolation and maps 10 the secret on the yaxis; whereas we map the data as roots of a polynomial on the xaxis. One of the potential application of the idea of implicit security is in secure internet voting. Internet voting is an inherent part of the online virtual worlds, such as Second Life, Active Worlds, Habbo Hotel, and others, and is only a few years away from large scale real world implementations [47, 48, 49, 50]. These virtual worlds provide an excellent testbed for online voting schemes, which once successful may be used in realworld democratic process. To make use of the implicit security architecture in online voting, the voter’s cast ballot is considered as his data and the partitions of this ballot may be stored on different servers for distributed security. Such a system will protect the intermediate election results from being revealed while distributing the security of the ballots over different servers [26]. 2.1 Proposed Data Partitioning Scheme 2.1.1 The (k, k) Data Partitioning Scheme We now describe our data partitioning scheme. By the fundamental theorem of algebra, every equation of degree k has k roots. We use this fact to partition data into k partitions such that each of the partition is stored on a different server. No explicit encryption of data is required to secure each partition. The partitions in themselves do not reveal any information and hence are implicitly secure. Only when all the partitions are brought together is the data revealed. Consider an equation of order k xk + ak−1xk−1 + ak−2xk−2 + ... + a1x + a0 = 0 (2.1) Equation 2.1 has k roots denoted by {r1, r2, ..., rk} {set of complex numbers} and 11 can be rewritten as (x − r1)(x − r2)...(x − rk) = 0 (2.2) In cryptography, it is more convenient to use the finite field Zp where p is a large prime. If we replace a0 in (2.1) with the data d 2 Zp that we wish to partition then, xk + Xk−1 i=1 ak−ixk−i + d 0 mod p (2.3) where 0 ai p− 1 and 0 d p− 1. (Note that one may alternatively use −d in (2.3) instead of d.) This may be rewritten as Yk i=1 (x − ri) 0 mod p (2.4) where 1 ri p− 1. The ri are the partitions. It is clear that the term d in (2.3) is independent of variable x and therefore Yk i=1 ri d mod p (2.5) If we allow the coefficients in (2.3) to take values a1 = a2 = ... = ak−1 = 0, then (2.3) will have k roots only if GCD(p − 1, k) 6= 1 and 9b 2 Zp such that d is the kth power of b. One simple way to chose such a p would be to choose a prime of the form (k · s + 1), where s 2 N. However, such a choice would not provide good security because knowledge of the number of roots and one of the partitions would be sufficient to recreate the original data by computing the kth power of that partition. Furthermore, not all values of d will have a kth root and hence one cannot use any arbitrary integer, which would ideally be required. Therefore, one of the restrictions on choosing the coefficients is that not all of them are simultaneously zero. For example, if the data needs to be divided into two parts then an equation of 12 second degree is chosen and the roots computed. If we represent this general equation by x2 + a1x + d 0 mod p (2.6) then the two roots can be calculated by solving the following equation modulo p, x = −a1 ± p a21 − 4d 2 (2.7) which has an solution in Zp only if the square root p a21 − 4d exists modulo p. If the square root does not exist then a different value of a1 needs to be chosen. We present a practical way of choosing the coefficients below. However, this brings out the second restriction on the coefficients, i.e. they should be so chosen such that a solution to the equation exists in Zp. Theorem 2.1 If the coefficients ai, 1 i k−1 in equation (2.3) are randomly and uniformly chosen and are not all simultaneously zero, then the knowledge of any k−1 roots of the equation, such that equation (2.4) holds, does not provide any information about the value of d with a probability greater than that of a random guess of 1/p. Proof. Given a specific d, the coefficients in (2.3) can be chosen to satisfy (2.4) in Zp in the following manner. Choose at random from the field k − 1 random roots r1, r2, ..., rk−1. Then kth root rk can be computed by solving the following equation, rk = d · (r1 · r2 · ... · rk−1)−1 mod p (2.8) Since the roots are uniformly and randomly distributed in Zp, the probability of guessing rk without knowing the value of d is 1/p. Conversely, d cannot be estimated with a probability greater than 1/p without knowing the kth root rk. It follows from Theorem 2.1 that data is represented as a multiple of k numbers 13 Figure 2.1: Data partitioning and storage. in the finite field. Example 1. Let data d = 10, prime p = 31, and let k = 3. We need to partition the data into three parts for which we will need to use a cubic equation, x3 + a2x2 + a1x − d 0 mod p. We can find the equation satisfying the required properties using Theorem 2.1. Assume, (x − r1)(x − r2)(x − r3) 0 mod 31. We randomly choose 2 roots from the field, r1 = 19 and r2 = 22. Therefore, r3 d· (r1 · r2)−1 10· (19·22)−1 mod 31 11. The equation becomes (x − r1)(x − r2)(x − r3) x3 − 21x2 + x − 10 0 mod 31, where the coefficients are a1 = 1 and a2 = −21 and the partitions are 11, 22 and 19. 2.1.2 Choosing the Coefficients In the previous section, we described two conditions that must be satisfied by the coefficients. The first condition was that not all the coefficients are simultaneously zero and second that the choice of coefficients should result in an equation with roots in Zp. Since no generalized method for solving equations of degree higher than 4 exists [51], a numerical method must be used which becomes impractical as the number of partitions grow. An easier method to compute the coefficients is exemplified by Theorem 2.1 and Example 1. One might ask why should we want to compute the coefficients if we already have 14 all the roots? We answer this question in section 3. 2.1.3 Introducing Redundancy In situations when the data pieces stored on one or more (less than a threshold number of) servers over the Internet may not be accessible, the user should be able to recreate the data from the available pieces. The procedure outlined below extend the k partitions to n, n k partitions such that only k of them need to be brought together to recreate the data. If {r1, r2, . . . , rk} is the original set of partitions then they can be mapped into a set of n partitions {p1, p2, . . . , pn} by the use of a mapping function based on linear algebra. If we construct n linearly independent equations such that a11r1 + a12r2 + . . . + a1krk = c1 a21r1 + a22r2 + . . . + a2krk = c2 ... an1r1 + an2r2 + . . . + ankrk = cn where numbers aij , rj and ci are randomly chosen from the finite field Zp as in the previous sections, then the n new partitions are pi = {ai1, ai2, . . . , aik, ci}, 1 i n. The above linear equation can be written as matrix operation, 2 66666664 a11 a12 . . . a1k a21 a22 . . . a2k ... an1 an2 . . . ank 3 77777775 2 66666664 r1 r2 ... rk 3 77777775 = 2 66666664 c1 c2 ... cn 3 77777775 15 To recreate rj , 1 j k from the new partitions, any k of them can be brought together, 2 66666664 r1 r2 ... rk 3 77777775 = 2 66666664 am1 am2 . . . amk an1 an2 . . . ank ... ai1 ai2 . . . aik 3 77777775 −1 k×k 2 66666664 c1 c2 ... ck 3 77777775 k×1 A feature of the presented scheme is that new partitions may be added and deleted without affecting any of the existing partitions. 2.1.4 Complexity of the proposed protocols The complexity of the (k, k) data partitioning scheme The complexity of the (k, k) partitioning scheme is O(k). This is illustrated by the following algorithm. Algorithm 1a (k,k) Data Partitioning Scheme Input data d 1. Choose randomly and uniformly from a finite field Zp, k − 1 numbers r1, r2, . . . , rk−1. 2. Compute rk = d · (r1 · r2 · . . . rk−1)−1 mod p. 3. Construct the kth degree polynomial: p(k) = (x − r1)(x − r2) . . . (x − rk) mod p = xk + ak−1xk−1 + ak−2xk−2 + . . . + a1x + a0 mod p. 4. The roots r1, r2, . . . rk of the polynomial p(k) represent the data partitions. Output ! vector ~R = [r1, r2, . . . , rk] As seen from the above algorithm, partition creation requires k multiplications and one inversion operation modulo prime p. Hence the complexity of O(k). 16 Similarly, data reconstruction requires k multiplications which is illustrated by the following algorithm. Algorithm 1b (k,k) Data Reconstruction Input vector ~R = [r1, r2, . . . , rk] 1. Data d = r1 · r2 · . . . rk mod p. Output ! d Linear complexity of Algorithms 1a and 1b is highly desirable for resource constraint environments such as that of sensor networks. The complexity of the (k, n) data partitioning scheme The proposed (k, n) data partitioning scheme uses the (k, k) partitioning scheme as a subalgorithm. That is, we first obtain k partitions using Algorithm 1a and then introduce redundancy to cope with possible inaccessibility of some of the partitions. The (k, n) partitioning scheme works as follows. Algorithm 2a (k,n) Data Partitioning Scheme Input data d 1. Use Algorithm 1a to create k partitions of data d. Obtain vector ~R = [r1, r2, . . . , rk] 2. Generate n × k matrix A such that all the rows of matrix are linearly independent. Note. A simple way to choose such a matrix would be to choose a Vandermonde matrix of the following form A = 2 66666664 1 x1 x21 x31 . . . xk−1 1 1 x2 x22 x32 . . . xk−1 2 ... 1 xn x2 n x3 n . . . xk−1 n 3 77777775 17 3. Create n partitions by computing [A]n×k · [~R T ]k×1 = [C]n×1, where C = [c1, c2, . . . , cn]T . 4. The partitions are denoted by the pair pi = {xi, ci}, 1 i n. Output ! pi = {xi, ci}, 1 i n Algorithm 2b (k,n) Data Reconstruction Input any k partitions pi = {xi, ci} 1. Construct the k × k matrix B by choosing the rows of matrix A corresponding to the given pairs pi and compute B−1. 2. Reconstruct the k shares by evaluating [~R T ]k×1 = [B−1]k×k · [C]k×1. 3. Use Algorithm 1b to reconstruct data d, where the input is the algorithm is ~R . Output ! d The complexity of Algorithms 2a and 2b is O(n log2n) [52, 53]. A more efficient method, that eliminates multiplications during partition creation, would be to use a n×k matrix with elements from GF(2). Such matrices are used in errorcorrection coding [54]. This will reduce all multiplication operations to additions operations for partition creation (assuming the cost of multiplications with 0 and 1 can be ignored). And inversion of such a matrix during data reconstruction would require at most O(n3) operations and O(n) multiplications. An example of such a matrix in GF(2) is given below, where we have assumed n = 4 and k = 3. A = 2 66666664 1 0 1 0 1 0 0 1 1 1 1 0 3 77777775 18 Any k = 3 columns the above matrix A are linearly independent. Since the above matrix only requires multiplications with 0 and 1, the time complexity of the scheme becomes linear for partition creation. This is one of the advantages of our scheme over traditional schemes such as that of Shamir’s. A note on applications As seen above, Algorithms 1a and 1b require O(k) computations while Algorithms 2a and 2b require O(n log2n) computations. Both these implementations are efficient and suitable for resource constraint environments such as wireless sensor networks. Further, while sensors generate crucial data and transmit it securely to the base station, the reconstruction of data happens for most cases only at the base station. Since base stations have much higher computational capacity than the sensors themselves, a matrix in GF(2) may be used to reduce the computational burden on the sensors. 2.2 An alternate approach to partitioning Once the user has computed all the k roots then he may compute the equation resulting from (2.4) and store one or all of the roots on different servers and the coefficients on different servers. Recreation of original data can now be performed in two ways: either using (2.5) or choosing one of the roots at random and retrieving the coefficients and substituting the appropriate values in the equation (2.3) to compute −a0 xk + ak−1xk−1 + ak−2xk−2 + ... + a1x mod p (2.9) Therefore, the recreated data d (p − a0) mod p. Parenthetically, this may provide a scheme for fault tolerance. Additionally, a user may store just one of the roots and k − 1 coefficients on the network. These together represent k partitions. Note. Distinct sets of coefficients (for a given constant term in the equation) result 19 in distinct sets roots and vice versa. This is because two distinct sets of coefficients represent distinct polynomials because two polynomials are said to be equal if and only if they have the same coefficients. By the fundamental theorem of algebra, every polynomial has unique set of roots. Two sets of roots R1 and R2 are distinct if and only if 9ri8rj(ri 6= rj), where ri 2 R1 and rj 2 R2. To compute the corresponding polynomials and the two sets coefficients C1 and C2, we perform Qk i=1,ri2R1 (x−ri) 0 mod p and Qk j=1,rj2R2 (x−rj) 0 mod p and read the coefficients off from the resulting polynomials, respectively. It is clear the at least one of the factors of the two polynomials is distinct because at least one of the roots is distinct; hence the resulting polynomials for a distinct set of roots are distinct. Theorem 2.2 Determining the coefficients of a polynomial of degree k 2 in a finite field Zp, where p is prime, by brute force, requires (d pk−1 (k−1)!e) computations. Proof. If A represents the set of coefficients A = {a1, a2, ..., ak−1}, where 0 ai p − 1, then by the above note each distinct instance of set A gives rise to a distinct set of roots R = {r1, r2, ..., rk}, and conversely every distinct instance of set R gives rise to a distinct set of coefficients A. Therefore, if we fix d to a constant then (2.5) can be used to compute a set of roots. Every distinct set of roots is therefore a k − 1 combination of a multiset, where each element has infinite multiplicity, and equivalently a k−1 combination of set S = {0, 1, 2, ..., p−1} with repetition allowed. Thus, the number of possibilities for the choices of coefficients 20 is given by the following expression * p k − 1 + = 0 B@ p − 1 + (k − 1) k − 1 1 CA = (p+k−2)! (p−1)!(k−1)! = (p+k−2)(p+k−3)...(p+k−k)(p−1)! (p−1)!(k−1)! = (p+k−2)(p+k−3)...p (k−1)! d pk−1 (k−1)!e (2.10) Here we have used the fact that in practice p k 2, hence the result. We have ignored the one prohibited case of all coefficients being zero, which has no effect on our result. 2.3 A stronger variation to the protocol modulo a composite number An additional layer of security may be added to the implementation by performing computations modulo a composite number n = p · q, p and q are primes, and using an encryption exponent to encrypt the data before computing the roots of the equation. In such a variation, knowledge of all the roots and coefficients of the equation will not reveal any information about the data and the adversary will require to know the secret factors of n. For this we can use an equation such as the follows xk + Xk−1 i=1 ak−ixk−i + dy 0 mod n (2.11) 21 where y is a secretly chosen exponent and GCD(y, n) = 1. If the coefficients are chosen such that (2.11) has k roots then Yk i=1 ri dy mod n (2.12) Appropriate coefficients may be chosen in a manner similar to that described in previous sections. It is clear that compromise of all the roots and coefficients will at the most reveal c = dy mod n. In order to compute the original value of d, the adversary will require the factors of n which are held secret by the user. 2.4 Addressing the Data Partitions The previous sections consider the security of the proposed scheme when prime p and composite n are public knowledge. However, there is nothing that compels the user to disclose the values of p and n. If we assume that p and n are secret values then the partitions may be stored in the form of an “encrypted link list”. We define an encryptedlinklist as a link list in which every pointer is in encrypted form and in order to find out which node the present node points to, the user needs to decrypt the pointer which can be done only if certain secret information is known. If we assume that p and n are public values then the pointer can be so encrypted that each decryption either leads to multiple addresses or depends on the knowledge of the factors or both. Only the authentic user will know which of the multiple addresses is to be picked. Alternatively, one may use a random number generator and generate a random sequence of server IDs using a secret seed. One way to generate this seed may be to find the hash of the original data and use it to seed the generated sequence and keep the hash secret. 22 2.5 Security by Key Distribution In some cases, when there is a large amount of data, a user may find it inefficient to create data partitions and distribute them over the network but may wish to encrypt the data and store it on a single server which he trusts and keep encryption key secret. Almost always, encryption keys are very large random numbers and cannot be memorized. Therefore, the user may create partitions of the key and spread them over the network. This approach may be more efficient, if not more secure, than creating data partitions of enormous amounts of data. Furthermore, the access to the servers on which the key partitions are stored may be password restricted. We stress that this scenario is different from the secret sharing scenario where multiple parties are involved since our case involves data security for a single party. Keys may be partitioned using the scheme presented in the previous sections where the data is replaced by the key. Alternatively one may use the key partitioning scheme presented in the following subsection that uses polynomials in Galois Field GF(2m), for which, given the distributed nature of the scheme, the security will be better than polynomial generalization of power encryption transformations [55, 56]. 2.5.1 Key Partitioning in Galois Field 2m A Galois Field (2m) consists of polynomials of degree m − 1 or less, such that their coefficients lie in GF(2). For example, am−1xm−1 + am−2xm−2 + ... + a1x + a0, where ai = 1 or 0 represents a polynomial belonging to GF(2m) and it may be written in binary representation as (am−1am−2...a1a0). Multiplication in GF(2m) is performed modulo an irreducible polynomial g(x) = xm + bm−1xm−1 + ... + b1x + b0, where bi 2GF(2). A polynomial g(x) is said to be irreducible if it cannot be factored into two or more polynomials each with coefficients in GF(2) and each of degree less than m. Therefore, a user can generate a random key and partition it into k partitions 23 Figure 2.2: Illustration of key partitioning. using the following procedure. He generates k random polynomials of any random degree in GF(2m) and computes their product modulo the irreducible polynomial g(x). The resultant polynomial of degree m − 1 with binary coefficients is taken as the required key in its binary representation and the randomly generated polynomials are the partitions. Example 2. In order to generate a random 8 bit key and create 5 partitions of it, the user may proceed as follows. Let g(x) = x8 +x2 +1 be an irreducible polynomial in GF(2m). He chooses 5 polynomials of degree m − 1 or less (at random), such as p1(x) = x7 +x6 +x, p2(x) = x4 +x2 +x+1, p3(x) = x5 +x3, p4(x) = x7 +x6 +x5 +x and p5(x) = x6+x2+x+1 and computes there product, p1(x)p2(x)p3(x)p4(x)p5(x) k(x) mod g(x), where k(x) is the “key polynomial” and the coefficients are the binary representation of the key. k(x) p1(x)p2(x)p3(x)p4(x)p5(x) x6 + x4 + x3 + x + 1 mod g(x) Therefore, the random key generated is k=01011011 and the partitions are, p1=11000010, p2=00010111, p3=00101000, p4=11100010, p5=01000111. Note that the knowledge of k−1 partitions, in the scheme presented above, leaves the kth partition completely undetermined and the adversary still requires (2m−1) 24 trials to determine the encryption key. This is because all the partitions 2GF(2m) and that each one of them is independently and randomly chosen from the field. Hence, knowledge of k −1 partitions does not reveal any information about the choice made for the kth partition. Further, since the kth partition can be any random polynomial from the field, in order to compute this polynomial by brute force would require on an average half the number of trial of 2m, i.e. (2m−1). Since the right hand side of the equation (the key) is completely dependent on the choice of left hand side of the equation (the factors of the key), one would require (2m−1) trial to compute the key with the knowledge of k − 1 partitions [57]. In practice, keys are of 128 bits to 512 bits in AES, i.e. (2127) to (2511) possible trials on an average. The above scheme can be extended to a (k, n) partitioning scheme by converting the existing partitions into decimal integers and using the extension presented in section 2.1.3. 2.6 Partition redundancy v/s no redundancy In cases where the accessibility to the partitions is guaranteed and the data disks are sufficiently backedup, partition redundancy may lead to unnecessary wastage of storage space. This is certainly true for insystem storage where data partitions may be distributed over numerous sectors randomly and not contiguously stored. Also in case of network storage larger number of partitions would require larger upload time which may be avoidable. 2.7 Application in Internet Voting Protocols Internet voting is a challenge for cryptography because of its opposite requirements of confidentiality and verifiability. There is the further restriction of “fairness” that the intermediate election results must be kept secret. One of the ways to solve this 25 problem is to use multiple layers of encryption such that the decryption key for each layer is available with a different authority. This obviously leaves open the question as to who is to be entrusted the encrypted votes. A more effective way to implement fairness would be to avoid encryption keys altogether and divide each cast ballot into k or more pieces such that each authority is given one of the pieces [26]. This solves the problem of entrusting any one authority with all the votes and if any of the authorities (less than the threshold) try to cheat by deleting some of the cast ballots, then the votes may be recreated using the remaining partitions. Such a system implicitly provides a backup for the votes. 2.8 Conclusions We have described an implicit security architecture suited for the application of online storage. In this scheme data is partitioned in such a way that each partition is implicitly secure and does not need to be encrypted. These partitions are stored on different servers on the network which are known only to the user. Reconstruction of the data requires access to each server and the knowledge as to which servers the data partitions are stored. Several variations of this scheme are described, which include the implicit storage of encryption keys rather than the data, and where a subset of the partitions may be brought together to recreate the data. One can propose variations to the scheme presented in this paper where the data partitions need to be brought together in a definite sequence. One way to accomplish it is by representing partition in the following manner: p1, p1(p2), p2(p3), ..., where pi(pj) represents the encryption of pj by means of pi. Such a scheme will increase the complexity of the bruteforce decryption task for an adversary. The proposed scheme is also of potential use in sensor networks. To protect sensitive information on sensors that are physically compromised, one may partition 26 the data and store these parts on other sensors in the network. The key that is used to find the location of these parts is encrypted and this encryption is changed from time to time. This provides solutions that are more general than other distributed security models used in sensor networks [33]. Our approach leads to interesting research issues such as the optimal way to distribute the parts in a network of n sensors, so that the network can work even if some of the sensors happen to malfunction by themselves or as a consequence of intervention by an adversary. It also leads to the question of reliability and tolerance to faults in such a system. 27 CHAPTER 3 A TREE BASED RECURSIVE SCHEME FOR SPACE EFFICIENT SECURE DISTRIBUTED DATA STORAGE One of the techniques to implement distributed storage is the use of secret sharing scheme, with the difference that the secret is replaced with data. In this chapter we develop recursive secret sharing schemes for distributed storage. We will be presenting our scheme under the context of secret sharing, however, when applied to the problem of secure distributed data storage, the secrets may be replace with data that is to be distributed or it can be used in conjunction with other information dispersal algorithms for secure key distribution. Conventionally secret sharing schemes divide the data into pieces such that fewer than k pieces do not reveal any information about the secret, or data in our case, and fall under the information theoretic framework. Information theoretically secure secret sharing schemes [58, 59, 3, 60, 61, 62], are space inefficient. For example, a koutofn, denoted as (k, n), secret sharing scheme expands a secret of b bits into n shares each of at least b bits in size. Furthermore, since only k of these shares are needed to recreate the secret, each bit of any share, in a threshold secret sharing scheme, effectively conveys at most d 1 k e bits of secret. If k = n, as in the case of a nonthreshold scheme, where all the shares must be brought together to recreate the secret, the effective information conveyed by each bit of any share is d 1 ne bits of the secret. One way to improve space efficiency is to distribute shares smaller in size than the secret itself, however at the cost of reduction in security. Computational secret sharing 28 techniques have been developed to achieve this [63, 64, 65, 66, 8], in which a symmetric key is used to encrypt the original secret and the encrypted secret is divided into pieces to which redundancy is added by the use of block error correction techniques [8, 6, 18]. The encryption key is split into shares using information theoretically secure methods of secret sharing. This leads to an nfold increase in key size, shares of which have to be stored with every piece of the encrypted secret, hence incurring an overhead [8]. A method for sharing multiple secrets with security based on assumption of hardness of discrete logarithm problem is presented in [67]. In [68] a scheme based on systematic block codes and in [69, 70] schemes using Shamir’s secret sharing have been proposed to share multiple secrets, however they require a large amount of side information to be stored as public knowledge and further [68, 69, 70, 71] attempt to maintain ideal security. An extension of secret sharing schemes is visual cryptography [58] that aims at dividing an image into two or more shares such that when a predetermined number of shares are aligned and stacked together, the secret image is revealed [58, 72, 73], without the requirement of any computation. However, information theoretically secure approaches to visual cryptography also suffer from inefficiency in terms of number of bits of secret conveyed per bit of share. A number of multiimage hiding schemes have been developed [74, 75, 76, 77]. However, the implementation in [75] does not hide the information using “real” visual cryptography, in the sense that computation needs to be performed to extract the hidden information. Lou et. al. [74] use a secret key to generate a permutation order and the key needs to be conveyed in addition to the shares, becoming a overhead and also amounts to computation, which was to be avoided by visual cryptography. In [76], two watermarks are hidden, one during halftoning of original image and second while creating shares, however, both these hiding techniques do not conform to visual cryptography because, to extract first hidden image, an exclusive OR reconstruction 29 of the image needs to be performed, which is never done in visual cryptography and the extraction of the second hidden image requires additional XOR operations. Fang and Lin [77] hide only integer from 0 to 6, which has very limited applications and again the integers are not decoded visually but require computation to be extracted. Wu and Chen [78] propose a scheme to hide two images at a rotation angle of 90 degrees while Wu and Chang [79] discuss hiding multiple images using circular shares at a limited number of rotation angles. Hsu et. al. [80] extended the scheme to allow arbitrary rotation angle by rolling shares into rings. However, circular shares distort the aspect ratio of the original image. Further knowing how and by what degree the shares are to be rotated requires additional side information to be supplied along with the share. Also pixel expansion is an issue, for example in [81], which encodes two or more secrets into circular shares, the pixel expansion is proportional to the number of secrets being hidden. Another way to improve information efficiency is to hide additional secrets in the shares of the original secret, without increasing the share size of the latter in comparison to what it would be without the additional hidden secrets. This effectively reduces the share sizes per secret, taking both hidden and original secrets into account. Recursive hiding of secrets was proposed in [24] for this purpose and to serve as a steganographic channel, with applications to binary images and binary text but was only limited to a (2,2) secret sharing. The idea involved is recursive hiding of smaller secrets in shares of larger secrets with secret sizes doubling at every step, thereby increasing the information that every bit of share conveys to (n−1) n bit of secret i.e. nearly 100%. However, the scheme described in [24] is a nonthreshold scheme where all the shares are needed to recreate the secret. We present a recursive 2outofn visual secret sharing scheme which is an extension of the recursive 2outof2 visual secret sharing scheme. This small but important extension helps in understanding the basic idea behind recursive hiding of secrets. 30 In this chapter, we present a general (nonvisual) koutofn secret sharing scheme that hides additional information in the shares of the original secret, without any increase in the share sizes of the latter. This represents an increase in space efficiency of the secret sharing scheme. Further, our algorithm acts as a dual algorithm in the sense that it can be used as a multisecret sharing algorithm as well as a computational secret sharing algorithm. When used as a multisecret sharing algorithm, it recursively encodes different secrets into the shares of other secrets such that the “inner” secrets do not lead to any expansion in share sizes of the “outer” secrets. In other words, the shares of the “outer” secrets now also convey the “inner” secrets, thereby increasing the information efficiency. When used as a computational secret sharing algorithm, we simulate a multisecret sharing algorithm, by dividing the original (larger) secret into multiple pieces of smaller sizes. These pieces are then recursively encoded into shares such that some of the pieces act as the “inner secrets” and some as “outer secrets”. Further, since the algorithm produces shares of size on the order of the size of the pieces, it effectively results in smaller secret share sizes than conventional schemes. It is to be noted, however, that the security in both cases, the multisecret sharing and the computational secret sharing schemes is computational. The proposed recursive secret sharing scheme has applications in distributed online storage of information discussed in [6, 18]. Systems implementing such distributed data storage have been described in [82, 16, 83, 84, 64]. In section 3.1, we present a generalization of the (2,2) recursive scheme presented in [24] to a (2,3) threshold scheme for binary strings. The aim of this section is to make clear the idea of recursion, for text, in secrecy context and how it can be applied to improve the efficiency of secret sharing. Section 3.2, extends the (2,3) recursive scheme for text to a (n, k) recursive scheme, where, the new scheme is based on 31 repeated polynomial interpolation and sampling. Section 3.3 discusses the use of the proposed secret sharing scheme as a computational secret sharing scheme and section 3.4 comments on the security of the scheme, while section 6 is conclusions. 3.1 A Comparison Based (2, 3) Recursive Scheme For text represented as a binary sequence, a 2 out of 3 secret sharing scheme can be developed using a simple comparison based algorithm as follows: we divide a secret bit into 3 shares p1, p2, and p3 such that p1 = p2 = p3 if we wish to encode bit 0, and p1 6= p2 6= p3 if we wish to encode bit 1. To satisfy the above conditions we would need at least 3 symbols, say 0, 1 and 2. Thechap3:refore to encode bit 0 we could create pieces p1p2p3 as 000, 111, or 222. Whereas the candidates to encode bit 1 would be 012 and all possible permutations of it, i.e. 021, 102, 120, 210, and 201. In all, to encode secret bit 0 and secret bit 1, we have 3 and 6 possibilities, respectively. This asymmetry in the number of choices does not affect the security of the scheme because an adversary can guess a bit correctly, by brute force, with a probability of onehalf. This is the maximum security one can achieve for any one bit. Now in our scheme in the encoded form for any bit, given one of its shares, there exist two possible choices for the second share, i.e. either the second share is the same as the first or the second share is different from the first. Therefore, a player can guess the second share only with a probability of onehalf. Example 1. If M is a 27 bit long message that we wish to encode into 3 shares and the threshold is 2, then nonrecursive shares S1, S2, and S3 may be created as follows: M : 011011010110110011100101101 S1 : 102012012010201201201020102 S2 : 110020022120111210101221001 32 S3 : 121001002200021222001122200 Viewed as a ternary alphabet, the efficiency of this system is 33%. As a comparison, if 0, 1 and 2 are encoded using prefix coding as 0, 10, and 11 respectively, then we are effectively mapping each bit of secret into 5 bits of shares and the efficiency is only 27 27×5 = 1 5 , i.e. 20%. The above efficiency can be improved by recursively hiding additional secrets in the shares of M. However, since each bit is mapped into 3 shares, in order to take advantage of the recursive technique, the secrets at each step must increase by a factor of 3. We can then hide the following secrets M1, M2, and M3 in shares of M as follows (figure 3.1): Figure 3.1: Recursive hiding of smaller messages in the shares of larger messages Note that at each step we have used the shares of the previous smaller messages to create the shares of larger messages; these smaller shares are denoted in bold. Also, we have distributed the shares at each step so that no player has access to all the shares of the smaller messages and hence, every message (seen from a permessage view point) remains secure until at least two players come together. This approach is different from that discussed in [24], where the shares of smaller messages were all accumulated into one of the larger shares instead of distributing them among all the 33 possible players. As a result in [24], any player having that share which encodes the smaller images could in principle recreate these smaller images without the help of the other player, which in some cases might not be acceptable. Therefore, our new approach is more secure for certain applications. Figure 3.2 illustrates the development of recursion tree for example 1. It is seen that the tree forms a ternary structure with nodes at each level giving rise to 3 nodes in the following level, and the nodes shown in bold are the nodes carried over from the previous level. The shares are distributed from left to right one at a time, i.e. if we number the tree leaves starting from the left as 1,2,3,1,2,3 and so on. Then these numbers denote the player’s number to whom the shares belong. Figure 3.2: Illustration of recursion tree for example 1 (partial illustration) Also seen in figure 3.1 is that using recursive hiding of secrets, we have been able to encode 13 bits of M1, M2, and M3 and 27 bits of M into shares of M alone. As a result the efficiency, considering a ternary alphabet, is 13+27 3×27 = 40 81 1 2 (i.e. 50% compared to 33% in the nonrecursive case). If one considers the binary representations of each character then each share now conveys 13+27 5×27 = 8 27 1 3 bits. Compared to 1/5 bits in the conventional approach, this is an almost 40% increase in efficiency. 3.2 Proposed koutofn Recursive Secret Sharing Scheme A secret sharing scheme based on polynomial sampling and interpolation in finite field is discussed in [3], where the secret is mapped as a point at x = 0 and k − 1 34 additional points are chosen randomly and uniformly from the field. These k points are then interpolated to generate a polynomial of degree k−1, which is then sampled at n points (except at x = 0). These n samples, the (x, y) points, are distributed as shares among the players. In order to reconstruct the secret, any k of the shares (points) can be interpolated to regenerate the k − 1th degree polynomial, which can then be sampled at x = 0 to retrieve the secret. At each level of recursion, in our proposed scheme, we also use polynomial interpolation and sampling. However, we will be hiding additional pieces of information within the shares of the original secret. We work in a finite field Zp, where p is a prime and it is public knowledge. Further, we assume that a secret S is represented as string of numbers S = s1s2 . . . sr, where each si 2 Zp and S = r = nh, where S denotes the length of secret S for some integer h. For example, if we assume that the secret is a text message composed of ASCII characters, then it can be represented as a string of numbers less than p = 257 [6]. Furthermore, assume that we have another string denoted by M = m1m2 . . .mx, mi 2 Zp, where M = x = nh−1 n−1 , to be hidden within the shares of the original secret S. For instance, in example 1, n = 3, h = 3, and hence in the shares of the original secret S = nh = 33 = 27 bits long we were able to encode nh−1 n−1 = 13 additional bits of information. The upper limit on the number of additional “pieces” of information that can be encoded within the shares of the original message of size nh, in the proposed scheme, is nh−1 n−1 . We also observe that the recursive schemes proposed in this paper (as in section 2), forms a nary tree structure where the original secret forms the leaves of the tree and the hidden information forms the internal nodes. Consequently, a comparison can be made between the efficiency of conventional secret sharing schemes and tree 35 based recursive secret sharing schemes. In information theoretic secret sharing schemes, each share of the secret is of the same size as the secret itself, as a result, for a secret of given size nh, each of the n shares are of size nh. This results in an efficiency of c = nh nh·n = 1 n, where denotes efficiency and subscript c denotes conventional (information theoretically secure) scheme. A tree based recursive scheme hides nh−1 n−1 pieces of additional information within the n shares each of size nh. Consequently, the efficiency of tree based recursive schemes is r = 1 nh·n · (nh + nh−1 n−1 ). A recursive tree based secret sharing improves the efficiency of conventional secret sharing methods by a factor of e = 1 + 1 nh · (nh−1 n−1 ). Which is one plus the ratio of the number of additional pieces hidden within the pieces of the original secret to the number of pieces of the original secret. With the above in mind, the proposed scheme works as follows: Inputs: Original secret  S = s1s2 . . . sr; message to be hidden  M = m1m2 . . .mx; n; k and p where sa, mb 2 Zp, 1 a r, 1 b x, r = nh and x = nh−1 n−1 . Algorithm 3a. Creation of shares 1. Choose k − 1 random numbers rl 2 Zp, l = 1 to k − 1 uniformly and randomly. 2. Interpolate a polynomial p11(x) using k points (0,m1) and (l, rl), 1 l k−1. Let pic(x), denotes the cth polynomial at the ith level in the recursion (tree). 3. Sample p11(x) at n points: D1 11, D2 11, . . ., Dn 11, where in Dk ic, k refers to the index number of the sample as well as the xcoordinate at which the sample is taken; i and c are the same as noted in point 2 above. 4. Initialize a = 1, b = 2. 5. For i = 2 to h 36 (a) c = 1 (b) For k = 1 to ni−2 For j = 1 to n if i < h i. Interpolate pic(x), using points (0,mb), (j,Dj (i−1)k) and k − 2 randomly and uniformly chosen numbers. ii. Sample pic(x) to generate samples Dq ic, 1 q n. iii. b = b + 1, c = c + 1 else i. Interpolate pic(x), using points (0, sa), (j,Dj (i−1)k) and k − 2 randomly and uniformly chosen numbers. ii. Sample pic(x) to generate samples Dq ic, 1 q n. iii. a = a + 1 and c = c + 1 iv. Distribute Dq ic, 1 q n to players from 1 to n, respectively. We do not consider the final shares as a part of the tree. Hence, the tree has only nh leaves, which are then interpolated and sampled at n points to generate the final shares. Algorithm 3b. Reconstruction of secret and hidden information 1. For 1 c r (a) Interpolate k shares Di hc, 1 i k to generate polynomial Phc(x). (b) Sample Phc(x) at x = 0 to retrieve sc. 2. For i = h − 1 down to 1 (a) j = 1, b = 1, q = ni−1−1 n−1 + 1 (b) For c = 1 to ni i. Sample P(i+1)c(x) at point x = b, denote as Dx ij . 37 Figure 3.3: Illustration of application of algorithm 1 for n = 4. 38 ii. b = b + 1 iii. if c mod n = 0 A. Interpolate (x,Dx ij), x = 1 to n to generate Pij(x). B. Sample Pij(x) at x = 0 to retrieve mq. C. x = 1, j = j + 1, b = 1, q = q + 1. The share reconstruction process traverses the tree from leaves to the root (figure 3.3), while the reconstruction process retrieves the secret and the hidden messages in a last in first out manner. In figure 3.3, R denotes a vector of length k − 2 numbers randomly and uniformly chosen from the field. Note that R0 is a k−1 element vector in the first step (level 1). Each instance of R is independent. 3.3 Proposed Computational Secret Sharing Scheme We can use the proposed scheme of section 3.2 as a computational secret sharing scheme by dividing the secret into smaller pieces and then recursively encoding the pieces as suggested in algorithm 3a. However the difference would be that instead of inner pieces being that of a different message to be hidden, they would be pieces of the secret itself. If we create m pieces pi of the secret, then in the ideal case m should be equal to nh+1−1 n−1 , for a given n and some integer h, where pi = S m . Further, the tree has nh number of leaves which then yield n shares each, resulting in n · nh number of shares. Therefore, each of the n player receives a share of effective size nh · S m = (1 − m−1 m·n ) · S. This represents a reduction in share sizes; for example, if the secret is broken into m = 15 pieces and n = 2, then the effective share size for each player is (1 − 14 30 ) · S compared to S in the conventional case. However, the above holds only when mnh+1−1 n−1 . A plot for the values of m for n = 2 to 5 and h = 1 to 5 is shown in figure 3.4a. Figure 3.4b shows how the size of resulting shares change relative to the size of 39 (a) (b) Figure 3.4: (a) Plot of possible values m can take as n and h vary. (b) Plot of new effective share sizes relative to the size of the original secret S versus m and n (only a few values are shown). 40 original secret. The plot in figure 3.4b is drawn for the values that the number of pieces m can take from that shown in figure 3.4a against the number of players n. The zaxis is the ratio of the new effective share size to the share size in a conventional scheme, i.e. 1 − m−1 m·n . The efficiency improvement factor for the ideal case is given by 1 + 1 nh · (nh−1 n−1 ). A plot of how efficiency improvement factor varies as a factor of n and h is given in figure 3.5. Figure 3.5 shows that given a n, more the height h, better the efficiency improvement factor. An efficiency factor of 2 implies a 50% reduction in share size compared to information theoretically secure schemes where each share is of size S. Therefore, new effective share size is given by SN = S efficiency improvement factor , where S is the original secret size. Optimal number of pieces: In practice, since the number of pieces may be arbitrary, the nary tree may or may not be complete, and one may need to stuff additional (dummy) pieces to complete the tree. The number of pieces required to complete the tree is a factor of the height h of tree. Further in order to maximize the information efficiency, we would like to determine what h one would want to choose. In general, we assume that we are working in a decimal base, i.e. each secret may be represented as a sequence of integers 09. This means that the smallest piece one may create is a single digit number. Also, note that the prime p used in algorithm 3, can only be chosen after the piece sizes are decided upon. For example, in the case of smallest possible pieces (single digits) a prime p = 11 would suffice. However, if one was to choose each piece to be of two digits in size, then a prime p = 101 would be needed. Let us redefine m to be the number of smallest pieces in the original secret, for example the number of digits in the secret. In order to understand how stuffing of pieces would work, assume that we are working in a binary field; therefore smallest piece that can be created is of one bit 41 (a) (b) Figure 3.5: (a) Plot of efficiency improvement factor as a function of n and h. (b) Plot of efficiency improvement factor as a function of n and h (larger values). 42 in size. Since the total number of pieces has to form a nary tree, if we denote by m the number of pieces of the original secret then m may not always be an integral multiple of nh+1−1 n−1 for the given value of n and any value of h. As a result, in order to complete the tree, we may either need to adjust the piece sizes (to more than one bit in size, in turn changing the prime p required in algorithm 1), and/or stuff the secret with dummy bits. Below, we will investigate both the cases and determine which case results in a better efficiency. Assume single bit pieces and bit stuffing (if required), and that m and n are fixed. Since the last level of the tree has nh number of leaves, each player will receive nh bits. Therefore, the value of h chosen will decide the number of bits stuffed to complete the tree. Also, since we would want to have each piece as a single bit, we note that if nh > m, then each player will receive more bits than originally in the secret, which is not desired. Consequently, one of the criterions in choosing h is that nh m, where m is the total number of bits in the original secret. This gives the upper bound on h. Also, to maintain the requirement of one bit per piece, the total number of nodes in the tree must be greater than the total number of bits in the original secret. Hence, nh+1−1 n−1 m gives the lower bound for h. On the other hand, if we assume that each pieces may be larger in size than the minimum size possible, then starting from the upper bound on h, h = blogn(m)c, we could gradually reduce the tree height by one at each step and then readjust the shares so as to complete the tree, with or without stuffing of pieces. Then a comparison could be made between the resulting efficiencies. Note that as we change the piece sizes, we may require changing the prime p for algorithm 1. For example, again consider working in a binary field and that the original secret consists of m = 8 bits and let n = 2, so that the algorithm constructs a binary tree. Since, the smallest piece possible is a bit, if one was to construct a recursion tree with one bit pieces, then the tree at the 4th (last) level requires 7 bits stuffed to be 43 (a) (b) Figure 3.6: (a) Shows how the height h changes as a function of the number of pieces m and the number of players n. (b) Shows how the efficiency improvement factor changes as a function of arbitrary m and n. In both (a) and (b) each piece is taken to be of the minimum size possible (ex. a single digit when using a decimal base). 44 complete. When these bits are divided into shares, since the last level now has 8 bits, it would result in each of the two shares being 8 bits. However, if one was to reduce the tree height by one level by adjusting the pieces to be of two bits each then the 3rd (last) level of the tree would require some pieces to be stuffed and would result in 4 × 2 bits for each share. This does not result in any efficiency improvement yet. However, if one was to further reduce another level of the tree, so that the tree now has only two levels, then each piece would be of 3 bits each, where only 1 dummy bit is stuffed. Since, now the 2nd (last) level of the tree has 2 × 3 bits, each share would be of 6 bits each. This is a reduction in share sizes compared to a nonrecursive scheme. Consequently, in general, in order to maximize efficiency, the height h of the recursion tree must be chosen such that it minimizes d S nh+1−1 · (n − 1)e × nh for 1 h blogn(m)c. Here, m denotes the number of pieces of the smallest size present in the original secret corresponding to the base that we working in. The section 3.5 shows plots for the maximum efficiency improvement achievable and the corresponding height against the number of (smallest) pieces present in the original secret. 3.4 On Security of the Proposed Schemes When applying a secret sharing scheme based on polynomial interpolation and sampling (Shamir’s scheme) where k−1 random numbers are interpolated along with the secret to generate a kth degree equation, the samples (taken appropriately, excluding at x = 0) do not provide any information about the secret which is mapped as point at x = 0. The first step in the algorithm, hence, directly executes Shamir’s secret sharing scheme. The shares so generated, can then be treated as random numbers and are reused as one of the points in further encoding of secrets. As a result, during the share construction (assuming previous shares are treated 45 as random number), we have used k−1 random numbers at each step (a length k−2 vector R and a sample Dk ij from the previous iteration, see figure 3.3). This along with the secret (or secret piece) are used to interpolate a kth degree polynomial which is sampled at n points at each step. Now, if the dummy pieces are preagreed pieces (such as special characters or zeros), then each player would know, two points, one the sample given to him and the other the dummy piece itself used during interpolation, and hence would need only k− 2 additional players to collude to recreate the node corresponding to that polynomial. This may lead to partial disclosure of secret. As a result, the dummy pieces chosen to stuff must be uniformly and randomly chosen from the field. Side information regarding the number of dummy pieces stuffed would convey to the players, how many trailing pieces are to be discarded. Assuming that the dummy pieces are randomly and uniformly chosen elements from the field, it is clear that k players need to collude in order to recreate the nodes in the last level of the tree and then proceed from there. However, since we have encoded additional secrets or created smaller effective shares, the overall security of the scheme is inversely proportional to the efficiency improvement factor. This security is in turn relative to the security achieved in information theoretically secure schemes. For example, if the efficiency improvement factor is 1, then the security of the scheme is the same as the security of an information theoretically security scheme. However, if the efficiency factor is 2, then the security is half relative to that provided by an information theoretically secure scheme. 3.5 Efficiency and height plots Figures 3.7 and 3.8 show the maximum efficiency improvement factor that can be achieved and the corresponding height h for it as the number of (smallest) pieces 46 vary in the original secret. Figure 3.7: Plot of maximum efficiency improvement factor with corresponding height h against m. Here n=2. 3.6 Recursive hiding in threshold visual cryptography The idea described in [24] is applied to images to develop a recursive 2 out of 3 visual cryptographic scheme [85]. For this purpose we divide each pixel into 3 subpixels as shown in figure 3.9. As seen in figure 3.9, when the partitions of white pixel are stacked upon each other one third of the pixel is white and hence appears light gray to human eye. However, the subpixels of the black pixel are so arranged that when 2 shares are stacked together, the resulting pixel is completely dark. Yet another way to create subpixels would be to have only one third of the subpixel colored dark. Therefore, when subpixels of a white pixel are stacked upon each other 47 (a) (b) Figure 3.8: Plot of maximum efficiency improvement factor with corresponding height h against m (a) n=3 (b) n=5. 48 Figure 3.9: Possible partitions for black and white pixels they would appear light gray and the stacking of the subpixels of a black pixel would result in dark gray. However, the human eye can perceives the difference between gray and completely dark pixels better than two different shades of gray itself. Hence our construction of subpixels in figure 3.9. As an example to make the working of the proposed scheme clear, we present in figure 3.10 the encoding of a 3×3 pixel image such that each share of the 3×3 image contains shares of a 1 pixel secret image and a 3×1 pixel secret image. The subpixels of an original pixel can be represented as a matrix. For example if the original pixel was black then the 3 shares representing it may be written as [100]T , [010]T , and [001]T . Since, these matrices can be stored as a sequence of bits; it implies that there is an expansion by a factor of 1×9=9 because the original black pixel can be represented as a single bit 1, while each of the 3 shares consists of 3 subpixels requiring 3 bits for their representation. If we were not to perform a recursive hiding, we would be creating 9×9=81 bits for each share corresponding to 9 pixels of the original image (figure 3.10). However, using recursive hiding we have been able 49 Figure 3.10: Illustration of recursive hiding of secret images in shares of larger original image using a 2outof3 threshold scheme to hide additional 1×9+3×9=9+27=36 bits of information in those 81 bits, thereby increasing the information conveyed per share of the original image. Higher efficiency could be achieved if we were to number the subpixels as 0, 1, and 2 and use prefix coding to represent these numbers and store them instead of storing the matrices or pixels. This would only lead to a per bit expansion factor of 5, instead of 9 and the efficiency improvement will be similar to that in the case of text, i.e. an improvement of 40%. Figure 3.11 shows the application of the proposed scheme to three images, smallest image being a Smiley face, next being a watermark and the third and the largest image being that of Lena. Figure 3.12 shows the reconstruction of hidden images after appropriately extracting the smaller shares from the shares of Lena. 50 Figure 3.11: Illustration of the process of recursive hiding of secrets in shares of larger original image 51 Figure 3.12: Illustration of regeneration of smaller images from the shares hidden inside the shares of the original larger image 52 3.7 Conclusions This paper has presented a recursive scheme for multisecret sharing, which in turn can be used to create shares of smaller sizes by dividing the secret into smaller pieces and then simulating multisecret sharing. The scheme forms a recursion tree and does not require any encryption key, unlike previously proposed computational secret sharing schemes. The efficiency and security of the scheme have been analyzed and it is seen that a efficiency improvement is achieved with a tradeoff against the security of the scheme and an inverse relation between the two has been established. The proposed scheme has widespread applications in secure distributed storage and information dispersal protocols by substituting the secret with the data or pieces of the data that is to be distributed. Further, it may be used as a steganographic channel to transmit hidden information in secret sharing, which may be used for authentication and verification of shares and the reconstructed secret itself. 53 CHAPTER 4 INTERNET VOTING PROTOCOL BASED ON IMPLICIT SECURITY One of the important applications of distributed data storage is in internet voting. In this chapter we see how secure distributed data storage can be leveraged to meet the requirements of internet voting systems. The basic idea involves treating a voter’s cast ballot as data and distribute pieces of this vote among different servers belonging to different authorities. The contrasting requirements of confidentiality and verifiability are fundamental to internet voting protocols. The requirements are such that they preclude a general solution but satisfactory solutions are possible for specific applications. Internet voting represents an application area where these requirements may be met using tools of cryptography and a distributed security architecture. In spite of certain limitations of current methods [86], Internet voting is expected to be used widely in the next few years [87] since it offers ease of access to senior citizens, disabled people, people traveling on electionday, citizens living abroad who are eligible to vote, and eliminates the hassle of obtaining an absentee ballot in advance. It also encourages larger participation on the part of the younger generation that has become accustomed to online banking, online shopping, secure email transactions and secure online storage. The acceptance of Internet voting will become wider if limitations of the current methods are lessened. Several voting schemes are to be found in the literature [88, 89, 90, 91, 92, 93, 94, 86]. Conventionally, blind signatures [95] are popular for achieving ballot secrecy, while anonymization is performed using MixNets that use multiple encryptions, decryptions and random permutations [89, 90, 91]. Other schemes treat votes as ecash 54 [94, 93, 96], exposing the voter and his ballot if a recast is attempted. Some schemes propose to protect intermediate election results [88] by asking the voter to encrypt the vote. However, it is not clear how the decryption key is to be provided at the time of vote counting. Often postal services are used as a part of the scheme to provide untappable channels [92], but their use would clearly defeat the purpose of Internet voting that aims at eliminating postal delays and paperwork. Further, such schemes are not suitable for large scale elections due to the overhead. In order to minimize the effects of viruses, Trojan horses and denial of service attacks [47], Internet elections are held over a period of several days, as in the case of Switzerland for two weeks [97], and in Estonia [98] for a few days. Despite the long voting periods, undecided voters and conservative people tend to wait until the last day to cast their ballots which causes the network attacks to be aimed at this last minute voting period. Further, proper design of ballots is a critical element in their effectiveness. Poorly designed interfaces and ballots can lead to voting errors (similar to errors in choosing wrong options in online bookings and purchases), with the unacceptable effect of a wrong candidate receiving the vote. Instances of such mistakes were seen in the California Recall elections [99]. Most existing protocols do not make any provisions to correct such mistakes. One could propose that voting be allowed only after the campaigning for the elections is over. But to ensure that campaigning cease on a certain date might be impossible to enforce, and the public information available on the media can lead people to change their mind. Further, such an idea would not eliminate mistakes in voting. Therefore, we believe that election systems should be designed to allow ballot recasts, i.e. a voter should be able to change his vote either during the period that the voting is open or during a specified time frame when ballot recasts are allowed. This will encourage a larger participation by voters in the early voting process because they 55 will now have an option of changing their votes if they wished to do so. We further believe that ballot recasts will act to deter vote selling, because the vote buyer cannot be sure if the seller has sold his vote to multiple parties. The scheme described in this paper allows voters to recast ballots, while protecting the intermediate election results using an implicit data security architecture. Rabin [6] spoke of distributed storage of data over different servers in order to protect the data. He, however, discounted the use of secret sharing schemes for data distribution because of their space inefficiency and resorted to an insecure method of distribution. However, since then several space efficient secret sharing schemes [23, 100, 85, 24] have been developed based on the idea of recursion. In recursive secret sharing, the original secret is broken into several pieces and the pieces are divided into shares one by one, by repeatedly using already generated shares. Since we only have a small secret, namely the cast ballot, to share between different servers, we use recursion to hide additional information within the shares to provide a mechanism for validation of shares. We call this model of security as implicit data security, where the word implicit signifies that no explicit encryption keys have been used, but the data is stored in a manner that every piece is implicitly secure. The problem of recast of ballots was considered in [101], where the intermediate election results are protected using secret sharing. However, the scheme has three shortcomings: the lack of redundancy in the system, where the loss of a single piece of the ballot results in a loss of that vote; lack of verifiability for the pieces brought together during the tallying process; and there is no mechanism for cheater identification. 4.1 Our Contribution We overcome the limitations of the scheme [101] and use the technique of generating anonymous IDs for ballot secrecy that eliminates the need for MixNets. These 56 anonymous IDs then become the anchor point for the votes and are recorded along with the cast ballot. Consequently, to change a vote, the voter only needs to send the anonymous ID and the new ballot, which replaces the old one in the system. Our contribution here is threefold: First, we propose a new recursive method for dividing the votes into n pieces such that loss of some of the pieces (less than a threshold k) does not result in the loss of the vote. Second, we present a method for validation/verification of the ballot pieces that are brought together at the time of tally. Third, a variant of the protocol is presented that can detect invalid ballot pieces as well as expose the cheating server. The improvements are implemented using recursion such that it does not lead to an increase in share size and stores only a small constant amount O(1) of information for verification. 4.2 The Proposed Protocol The protocol proceeds as follows: The voter contacts the registration authority (RA) using his real identity. He then randomly generates a number, blinds it and gets it signed by the RA. Upon receiving this signed number, he unblinds it and uses it as his anonymous ID. Consequently, to cast his ballot, the voter does not blind the ballot but simply uses the anonymous ID to vote. However, in order to protect the intermediate election results, the voter divides his ballot into several pieces and sends them to distinct servers. This provides a distributed security architecture. Before dividing the ballot, the voter encodes certain secret information within the pieces of the ballot and distributes a oneway hash of the secret information among the servers with the ballot pieces. We present a new procedure for dividing the votes into pieces such that additional 57 information may be hidden within the pieces. This hidden information can be used to validate the pieces at the time of ballot counting. Further, the additional information does not increase the share sizes and does not increase algorithmic complexity of the scheme being used. The procedure for registration and creating anonymous ID is similar to that described in [101]. Here, gx is the public key of the registration authority (RA) and x is its corresponding private key. All computations are performed modulo a prime p where g is a primitive root that is taken to be public knowledge. Further, mx denotes the signature on a message m, where 1 < m < (p − 1). A oneway hashing function h(·), used for verification, satisfies following properties [102]. 1. h(·) when applied to an argument produces a fixedsize output. 2. Given x, it is easy to compute h(x); but given h(x) it is computationally infeasible to determine x. 3. h(·) is collision free, i.e. it is computationally infeasible to find distinct x and y with h(x) = h(y). At the time of registration the voters send a oneway hash of the anonymous ID that they have generated. The registration authority maintains a list L of all the oneways hashes that have been used in order to avoid collisions between the anonymous IDs. It is assumed that the voter has n servers at disposal to cast his ballot; and any k of them are required to reconstruct the ballot, k n. Every vote V is assumed to be a number in Zp that is assigned to a candidate and is public knowledge. 4.2.1 Registration Phase 1. The voter randomly and uniformly chooses a number rid 2 Zp. 58 2. The voter randomly and uniformly picks a blinding factor b and computes u = rid · gb mod p. He sends a 3tuple to the registration authority (RA): u, Vid, h(rid); where Vid is his true identity in cleartext. 3. If the registration authority finds the voter eligible to vote (by checking his Vid, it checks if h(rid) is already present in L. If the hash is found, then the RA requests the voter to repeat steps 1 & 2. If the hash is not found, the RA adds the new h(rid) to L and sends to the voter ux. 4. The voter retrieves the signed rid as follows: (a) Using the RA’s public key compute v = (gx)b mod p and v−1. (b) Compute r = u · v−1 = (rid)x mod p. 5. The voter randomly and uniformly chooses an exponent c and generates: d = ((rid)xgx)c and sends it to the RA. 6. The RA sends back: e = d 1 x . 7. The voter compares if (rid · g)c is equal to e, he accepts (rid)x as valid signature. The voter at the end of registration phase has a valid signed rid that may be used to vote and proceeds to cast his ballot as follows. 4.2.2 Voting Phase 1. The voter contacts the online polling station using a secure shell (SSL) connection over the Internet and sends: (rid)x mod p, rid. 2. The polling station verifies the validity of rid by conducting a zeroknowledge challenge/response protocol with the RA to validate the signature. 59 3. If the signature is found valid the online polling station stores the voter’s rid and provides the voter with a secure session key that he can use to cast his ballot by contacting the n voting servers. 4. The voter generates a random secret message M. 5. The voter chooses the ballot V corresponding to the candidate that he wants to vote for and divides the ballot into n pieces using Algorithm A(V,M,k,n) (detailed below) which returns a set of n points, (j + k − 1,Dj), 8 j from 1 to n. 6. The voter sends to every server Sj : ((j + k − 1,Dj), h(M)), for all j = 1 to n. Algorithm A(V,M,k,n) takes as input: ballot V; message M; and integers k and n. Algorithm A(V,M,k,n)  Dealing Phase 1. Divide M into k − 2 pieces: m1, m2, ..., mk−2 2 Zp, such that concatenations of mis, taken in order, is equal to M. 2. Choose randomly and uniformly a number r1 and compute r2 = m1 · (r1)−1 mod p. 3. For i = 2 to k − 2. Compute ri+1 = mi · ( i j=1rj)−1 mod p. 4. Map ri as points (i, ri), 1 i k − 1. 5. Map the ballot V as point (0, V ). 6. Interpolate points (0, V ) and (i, ri), 1 i k − 1 as the polynomial f(x). 7. Evaluate n samples: Di = f(x), where x = k, k + 1, k + 2, . . . , (k + n − 1). 60 8. Output (i + k − 1,Di), 1 i n. To understand the working of Algorithm A, consider the following example. Example 1. Let the voter have n = 8 servers at his disposal to cast his ballot. Each of these servers may be at separate locations and controlled by an independent authority. Let k = 5 be the threshold number of servers that must remain honest and come together to reconstruct a vote correctly. Further, let the prime field chosen to work in be p = 257. Each candidate on the ballot is assigned a candidate ID which may be a random number from the field Zp. Consequently, to cast a ballot the voter must send this number (the candidate ID) to the servers. Assume, in order to implement authentication the voter chooses to hide a secret message M =“USA”, in his ballot. The message to be hidden is entirely a choice of the voter and is it may as well be “RUSSIA”. Let us assume that the voter wishes to cast a ballot to a candidate with candidate ID 157. Therefore, the cast ballot is V = 157. If we consider the voter’s secret message M=“USA” to be formed of ASCII characters, then M can be divided into three pieces such that m1 = 85, m2 = 83 and m3 = 65, such that each belongs to Zp. Under such conditions, the algorithm proceeds as follows, 1. Divide M into 3 pieces: m1 = 85, m2 = 83 and m3 = 65. 2. Choose randomly and uniformly a number r1 = 101 2 Zp and compute r2 m1 · r−1 1 85 · (101)−1 85 · 28 67 mod 257. 3. Compute r3 m2 · (r1 · r2)−1 83 · (101 · 67)−1 83 · 127 4 mod 257. 4. Compute r4 m3 · (r1 · r2 · r3)−1 65 · (101 · 67 · 4)−1 65 · 96 72 mod 257. 61 5. Map the ris as points (1, r1) = (1, 101), (2, r2) = (2, 67), (3, r3) = (3, 4), and (4, r4) = (4, 72). 6. Map the ballot V as point (0, V ) = (0, 157). 7. Interpolate the five points, (0, V ) and (i, ri), to generate a 4th degree polynomial, f(x) = 148x4 + 3x3 + 251x2 + 56x + 157 mod 257. 8. Evaluate f(x) at eight points (starting from x = k): D1 = f(5) = 128, D2 = f(6) = 240, D3 = f(7) = 173, D4 = f(8) = 160, D5 = f(9) = 131, D6 = f(10) = 227, D7 = f(11) = 29, and D8 = f(12) = 100. 9. Return (j + k − 1,Dj)s; where k = 5 and j = 1 to n. Note that in the above example, the voter has recursively encoded the pieces of his secret message into the shares of the ballot. Steps 24 generate 4 points that are required to simulate the 4 randomly chosen points in the Shamir’s secret sharing scheme. Step 7 executes Shamir’s scheme assuming ris are randomly chosen points and the ballot as the ycoordinate at x = 0. Finally, the algorithm returns 8 shares of the cast ballot, these shares have the secret message hidden within them. Each of these eight pieces is sent to a different voting server, along with the hash of the secret message. 4.2.3 Counting Phase Assume a threshold k, number of servers pool their pieces, (j + k − 1,Dj)s, together to reconstruct the votes. (If more than the threshold number of servers are available, then it may provide a mechanism for crosschecking and verification.) 1. Use Algorithm A  Reconstruction Phase (detailed below) to reconstruct the hidden message and the polynomial f0(x). 2. Concatenate m0 js in order to reconstruct M0. 62 3. Compute h(M0). 4. If h(M0) is equal to h(M), then the shares are valid and f(x) = f0(x). 5. Retrieve, the cast ballot as V = f0(0). Algorithm A  Reconstruction Phase takes as input a set of points (shares) and returns the pieces m0 j of the hidden message and a polynomial f0(x) constructed by interpolation of the shares. Algorithm A  Reconstruction Phase 1. Interpolate the shares (points) to reconstruct f0(x). 2. Sample f0(x) at points x = 1, 2, . . . , k − 1 to retrieve r0i , 1 i k − 1. 3. Reconstruct the hidden message as follows: (a) Do for j = 1 to k − 2: m0 j = i=j+1 i=1 r0i mod p. 4. Return m0 j , 1 j (k − 2), and f0(x). Example 2. Recall from example 1, we had created eight shares (or pieces) of the cast ballot and each of these pieces were sent to a different voting server. After the polling is closed, the servers combine their pieces to recreate the cast ballot. Since only k = 5 pieces are required for recreation, assume that the 5 pieces that are brought together for reconstruction of the vote are (6,D2) = (6, 240), (7,D3) = (7, 173), (9,D5) = (9, 131), (11,D7) = (11, 29), and (12,D8) = (12, 100). The reconstruction proceeds as follows, 1. Interpolate the shares to reconstruct the 4th degree polynomial f0(x) = 148x4+ 3x3 + 251x2 + 56x + 157. 63 2. Sample f0(x) at x = 0 to retrieve the ballot V 0 = f0(0). (We denote the retrieved ballot as V 0 because its validity is yet to checked, which is done in the following steps.) 3. Sample f0(x) at 4 points x = 1, 2, 3, 4: r01 = f0(1) = 101, r02 = f0(2) = 67, r03 = f0(3) = 4, and r04 = f0(4) = 72. 4. Reconstruct the pieces of the hidden message, (a) m0 1 = r01 · r02 = 85. (b) m0 2 = r01 · r02 · r03 = 83. (c) m0 3 = r01 · r02 · r03 · r04 = 65. 5. Concatenate m0 1, m0 2, and m0 3 to reconstruct M0. 6. Evaluate the hash: h(M0). 7. If h(M0)=h(M), then the shares are valid and the vote, V 0 = V , is counted in the tally. If more than 5 servers come together to recreate the ballot, then any 5 shares can be chosen at random and their validity checked. If the shares turn out to be invalid, then another set of 5 shares can be checked or the extra share can be used to cross check the reconstructed ballot. However, for detection of invalid pieces, only 5 shares are required. 4.2.4 Recasting of ballot If a voter wishes to change his vote, he may do so by contacting the online polling booth using his rid to authenticate himself and follow the procedure described above. The servers will overwrite his previously cast ballot partitions if any. 64 The protocol has been able to detect attempts of cheating by validating the pieces of the cast ballot. 4.3 Security of the proposed protocol The security of the proposed protocol depends on the security of the pieces created for the votes. First, in the dealing process we recursively encode the secret message M. We also provide each server with h(M). Hashing functions are known to be computationally secure and do not provide any information about the hidden message M, in practice. Since the message M is a randomly chosen secret message and r1 was randomly and uniformly chosen from the field, we may consider (i, ri)s as random points. We use these random points to interpolate a polynomial with the vote as the coefficient free term (as a point mapped at x = 0). This polynomial is then sampled at n new points. If k − 1 servers collude to cheat, they would have no knowledge of the kth point and by the properties of interpolation, all the polynomials will remain equally probable and therefore, all the ballots will remain equally probable, i.e. if there are m valid ballots, then each of them have a probability of 1 m. Now assume that an extremely powerful server is able to break the hashing function determining mis. The next task, in order to determine the vote, is to determine rjs. But it turns out that knowledge of M does yield all the values of rjs as shown below: m1 is chosen to be a product of two numbers r1 and r2, here r1 is randomly and uniformly chosen from the field and therefore given any number y from the field Pr(y = r1) = Pr(y = r2) = 1 p . Hence, the pieces of m1 are secure. However, ri, 3 i k − 1 may be determined, using the mis. Further, the voting protocol uses ris as points at x = 1 to k − 1 along with the ballot V at x = 0, i.e. V is the coefficient free term in the polynomial. It turns out that since we cannot determine r1 and r2 with a probability greater than a random guess, all polynomials remain equally probable, even if M is known. The 65 last step of division of ballots into pieces, although not the same, is close to Shamir’s implementation of secret sharing [3]. Figure 4.1 shows two graphs. Graph (a) illustrates the process of polynomial interpolation using (0, V ) and (i, ri), i = 1 to 8. Graph (b) depicts the fact that if M is known (by breaking of the hashing function), then r1 and r2 remain undetermined and do not reveal any information about the ballot because all the polynomials remain equally probable; a few polynomials are shown for illustration. Figure 4.1: (a) Polynomial interpolation using vote (0, V ) and (i, ri), i = 1 to 8. (b) Lack of knowledge of (1, r1) and (2, r2) leaves the polynomial undetermined; hence the vote remains secure. 66 The protocol further satisfies the various requirements of electronic voting: 1. Receipt freeness: The proposed protocol is receipt free. 2. Distributed security: It is assumed that at least one of the servers will be honest (an assumption similar to that of MixNet schemes) providing assurance to the voter that his vote will be counted as cast and any discrepancy will be detected with a very high probability. 3. Fairness: The ballot cast is divided into partitions during the voting phase and unless the partitions are known, all the ballots remain equally likely. 4. Unforgeability: An ineligible voter cannot cast a vote because the vote consists of two parts: the signed ballot and the certified rid. The ineligible voter needs to generate a random ID and forge the signature of the registration authority, i.e. solve for x, given g, p and gx mod p. This is equivalent to solving the discrete log problem, which is computationally infeasible. Leakage of information about the signature exponent is avoided by using a zeroknowledge challenge and response protocol for signature verification. 5. Unreusability: Since every eligible voter is issued only one rid, it cannot be used to cast multiple votes because if that is done the servers will simply overwrite the previous cast ballot for that rid. 4.4 A Variation for Identification of Cheating Server/s To identify the cheating server/s, we make use of the following properties of arithmetic coding (see [102]), 1. Let T = n i=1aipi−1, where 0 ai < p, then b T pj−1 cmod p aj 67 2. Let T = n i=1aip2(i−1) + n−1 i=1 cp2i−1, where −p < ai < p and 1 c < p, then b T p2(j−1) cmod p aj Following minimal changes are required in the proposed protocol to implement cheater identification. 4.4.1 Voting Phase We apply the voting phase as described before until step 5. Step 6 is replaced by the following 2 steps, 1. The voter computes T = n i=1h(Di)p2(i−1) + n−1 i=1 cp2i−1, where c 2 Zp is a random constant. 2. The voter sends to every server Sj : ((j + k − 1,Dj), h(M), T), for all j = 1 to n. Example 3. As an example to illustrate the operations described above, consider that k = n = 3 is the number of shares the voter has created for his ballot, given by (1,D1) = (1, 4), (2,D2) = (1, 2) and (3,D3) = (1, 5). Let p = 11 and let the voter choose a random number c = 7 from field Zp. Then before sending the shares of his ballot to the server the voter needs to compute T = n i=1h(Di)p2(i−1) + n−1 i=1 cp2i−1. For simplicity, consider a dummy hashing function with mapping h(x) = x + 1. Therefore, h(D1) = 4+1 = 5, h(D2) = 2+1 = 3 and h(D3) = 5+1 = 6. Consequently, T = 5 · 112·0 + 3 · 112·1 + 6 · 112·2 + 7 · (112·1−1 + 112·2−1) = 97608. The voter now deposits this value of T on every server along with his ballot. 4.4.2 Cheating Server Identification To identify a cheating server during reconstruction phase, following steps are performed, 68 1. Each server makes the above calculations and based on a majority of results, if c = 0, then Sj is honest, else Sj is cheating and its share is invalid. We see that c = 0 if b T p2(j−1) − T0 p2(j−1) c = h(Dj)−h(D0j ) = 0 mod p, which happens only when Dj=D0j , i.e. the pieces presented by the server are valid pieces. An analysis of the identification protocol can be found in [102], where the solution has been applied to secret sharing. Example 4. Continuing with example 3; assume that 3 servers have come together to reconstruct the ballot. Also assume that the reconstructed vote turns out to be invalid after checking the hash of the retrieved hidden message against the hash deposited by the voter (see example 2), then in order to determine which server is cheating, the servers perform the following. Each server presents its share. Let us assume that sever 2 has cheated and provided an incorrect share such that D02 = 3. Servers 1 and 3 have provided correct shares D01 = 4 and D03 = 5. The servers then compute, T0 = Sjh(D0j )p2(j−1) = 5 · 112·0 + 4 · 112·1 + 6 · 112·2 = 88335. Now for each server Sj , compute c = b T−T0 p2(j−1) c. Therefore, for server 1, j = 1, and c = b97608−88335 p2(1−1) c = 9273 0 mod 11. For server 2, j = 2, and c = b97608−88335 p2(2−1) c = 76 10 mod 11. For server 3, j = 3, and c = b97608−88335 p2(3−1) c 0 mod 11. From the above we observe c 6= 0 for server 2, which had provided an incorrect share, and we have successfully identified the cheating server. 4.5 Conclusions In the proposed protocol for Internet voting, the security of the cast ballot depends on numerous servers and the fairness property is satisfied by the use of a ballot partitioning scheme. No encryption/decryption key is used and there is no explicit 69 encryption of the votes. The partitions of the ballot provide implicit security. It overcomes the limitation of lack of redundancy and verifiability in our earlier scheme. Our proposed method divides the votes into pieces such that loss of some of the pieces (less than threshold) does not result in the loss of the vote. A method for validation/verification of the ballot pieces that are brought together at the time of tally, and, a variant of the protocol is presented that can detect invalid ballot pieces as well as expose the cheating server. Internet voting is just one of the areas of applications of the distributed data storage. 70 CHAPTER 5 CONCLUSIONS We have proposed new efficient data dispersal techniques that have applications in cloud computing, sensor networks and online distributed storage. Chapter 2 described the idea of implicit security. The framework of implicit security, intends to do away with traditional key based encryption systems. It combines the goal of fault tolerance with security by creating unintelligible partitions of data. We proposed efficient algorithms to create these partitions that may be sent to n independent servers for storage. The partitions were created by mapping the data as the coefficient free term of a kth degree polynomial and then computing the roots of the polynomial. These roots were then treated as the partitions of the data. However, all the roots of a polynomial are required to reconstruct the polynomial. Hence to introduce redundancy we created linear combinations of the roots which were then stored as data partitions. A compromise of less than k servers does not reveal any information about the data and n − k servers store redundant partitions. We looked at issues of data integrity once the original data is reconstructed from k partitions. We also looked at the problem of key partitioning by the means of polynomial multiplication in GF(2m). The algorithm presented has low computational overhead and is useful in sensor networks. Chapter 3 extended the scheme presented in the chapter 2 to include the paradigm of information dispersal. We proposed a recursive scheme to achieve this. Recursive schemes work by reusing the data partitions created for a data item to encode other data items. The proposed scheme increases the efficiency of data storage in the sense 71 that same number of partitions now carry more information than before. However, this comes at the cost of reduced security which, however, may be practical for many application. These recursive techniques may be used to send hidden authentication and verification information, acting as a steganographic channel. We also extended the idea of recursive hiding of data to visual cryptography wherein partitions of smaller images were stored within the partitions of a larger image. This can be used for the purpose of Digital Rights Management. Chapter 4 presented the application of algorithms developed in previous chapters to Internet voting. Secure Internet voting presents contrasting requirements of ballot secrecy and verifiability. We meet these requirements to certain extent under the stated assumptions. Tampering of recorded ballots is prevented by recursively hiding verification information within the ballot partitions. The presented techniques are applicable to cloud computing and has the potential to provide long term security of data. Since, a large number of data breaches happen due to weak passwords, unscrupulous employees and improper system configurations, distributing unintelligible partitions of the data over independently maintained servers makes the task of the attacker difficult. Further, a secure addressing scheme may be developed to store data partitions on a randomly chosen subset of servers from a server farm. We have seen that recursive techniques provide a method for verification of reconstructed data. If one recursively stores the hash of the data that is being partitioned then such a technique may be termed as embedded finger printing, in that the fingerprints (hash values) of the data is stored within the data partitions. 72 BIBLIOGRAPHY [1] J. J. Wylie, M. Bakkaloglu, V. Pandurangan, M. W. Bigrigg, S. Oguz, K. Tew, C. Williams, G. Ganger, and P. Khosla, “Selecting the right data distribution scheme for a survivable storage system,” in Technical report CMUCS01120, Carnegie Mellon University, 2001. [2] A. Leventhal, “Tripleparity raid and beyond,” Communications of the ACM, vol. 53, no. 1, pp. 58–63, 2010. [3] A. Shamir, “How to share a secret,” Commununications of the ACM, vol. 22, no. 11, pp. 612–613, 1979. [4] G. R. Blakley, “Safeguarding cryptographic keys,” in Proceedings of AFIPS National Computer Conference, pp. 313–317, 1979. [5] E. Karnin, J. Greene, and M. Hellman, “On secret sharing systems,” IEEE Transactions on Information Theory, vol. 29, pp. 35–41, January 1983. [6] M. O. Rabin, “Efficient dispersal of information for security, load balancing, and fault tolerance,” Journal of the ACM, vol. 36, no. 2, pp. 335–348, 1989. [7] G. R. Blakley and C. Meadows, “Security of ramp schemes,” in Proceedings of CRYPTO 84 on Advances in cryptology, (New York, NY, USA), pp. 242–268, SpringerVerlag New York, Inc., 1985. [8] H. Krawczyk, “Secret sharing made short,” in CRYPTO ’93: Proceedings of the 13th Annual International Cryptology Conference on Advances in Cryptology, (London, UK), pp. 136–146, SpringerVerlag, 1994. 73 [9] S. K. Mishra, S. K. Vemulapalli, and P. Mohapatra, “Dualcrosshatch disk array: A highly reliable hybridraid architecture,” in Proceedings of International Conference on Parallel Processing, pp. 146–149, 1995. [10] J.M. Fray, Y. Deswarte, and D. Powell, “Intrusiontolerance using finegrain fragmentationscattering,” IEEE Symposium on Security and Privacy, vol. 0, pp. 194–203, 1986. [11] Y. Deswarte, L. Blain, and J.C. Fabre, “Intrusion tolerance in distributed computing systems,” IEEE Symposium on Security and Privacy, vol. 0, pp. 110– 121, 1991. [12] M. Waldman, A. D. Rubin, and L. F. Cranor, “Publius: a robust, tamperevident, censorshipresistant web publishing system,” in Proceedings of the 9th conference on USENIX Security Symposium  Volume 9, (Berkeley, CA, USA), pp. 5–5, USENIX Association, 2000. [13] Y. Chen, J. Edler, A. V. Goldberg, A. Gottlieb, S. Sobti, and P. N. Yianilos, “A prototype implementation of archival intermemory,” in Proceedings of the fourth ACM conference on Digital libraries, DL ’99, (New York, NY, USA), pp. 28–37, ACM, 1999. [14] J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, C. Wells, and B. Zhao, “Oceanstore: an architecture for globalscale persistent storage,” in Proceedings of the ninth international conference on Architectural support for programming languages and operating systems, ASPLOSIX, (New York, NY, USA), pp. 190–201, ACM, 2000. [15] W. J. Bolosky, J. R. Douceur, D. Ely, and M. Theimer, “Feasibility of a serverless distributed file system deployed on an existing set of desktop pcs,” in Pro 74 ceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, SIGMETRICS ’00, (New York, NY, USA), pp. 34–43, ACM, 2000. [16] A. Iyengar, R. Cahn, J. A. Garay, and C. Jutla, “Design and implementation of a secure distributed data repository,” Proceedings of the 14th IFIP Internet Information Security Conference, p. 123135, 1998. [17] H. Krawczyk, “Distributed fingerprints and secure information dispersal,” in Proceedings of the twelfth annual ACM symposium on Principles of distributed computing, PODC ’93, (New York, NY, USA), pp. 207–218, ACM, 1993. [18] J. A. Garay, R. Gennaro, C. Jutla, and T. Rabin, “Secure distributed storage and retrieval,” Theoretical Computer Science, vol. 243, pp. 363–389, July 2000. [19] C. Cachin and S. Tessaro, “Asynchronous verifiable information dispersal,” in 24th IEEE Symposium on Reliable Distributed Systems, 2005. SRDS 2005., pp. 191–201, October 2005. [20] A. Parakh and S. Kak, “Recursive secret sharing for distributed storage and information hiding,” in Proceedings of the 3rd international conference on Advanced networks and telecommunication systems, ANTS’09, (Piscataway, NJ, USA), pp. 88–90, IEEE Press, 2009. [21] A. Parakh and S. Kak, “Internet voting protocol based on improved implicit security,” Cryptologia, vol. 34, pp. 258–268, 2010. [22] S. Kak, “Cryptography for multilocated parties,” CoRR, vol. abs/0905.2977, 2009. 75 [23] A. Parakh and S. Kak, “A tree based recursive information hiding scheme,” in 2010 IEEE International Conference on Communications (ICC), pp. 1–5, may 2010. [24] M. Gnanaguruparan and S. Kak, “Recursive hiding of secrets in visual cryptography,” Cryptologia, vol. 26, pp. 68–76, January 2002. [25] A. Parakh and S. Kak, “Online data storage using implicit security,” Information Sciences, vol. 179, no. 19, pp. 3323–3331, 2009. [26] A. Parakh and S. Kak, “Internet voting protocol based on implicit data security,” in Proceedings of 17th International Conference on Computer Communications and Networks, 2008. ICCCN ’08., pp. 1–4, August 2008. [27] A. Parakh and S. Kak, “A distributed data storage scheme for sensor networks,” in Security and Privacy in Mobile Information and Communication Systems, vol. 17 of Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, pp. 14–22, Springer Berlin Heidelberg, 2009. [28] A. Parakh and S. Kak, “A key distribution scheme for sensor networks using structured graphs,” in International Conference on Emerging Trends in Electronic and Photonic Devices Systems, 2009. ELECTRO ’09, pp. 10 –13, December 2009. [29] E. Gehringer, “Choosing passwords: security and human factors,” in 2002 International Symposium on Technology and Society. (ISTAS’02), pp. 369–373, 2002. [30] R. Proctor, M. Lien, K. Vu, and G. Salvendy, “Improving computer security for authentication of users: Influence of proactive password restrictions,” Behavior Research Methods, Instruments and Computers, vol. 34, pp. 163–169, 2002. 76 [31] S. Riley, “Password security: What users know and what they actually do,” Usability News, vol. 8.1, 2006. [32] S. Kak, “On the method of puzzles for key distribution,” International Journal of Parallel Programming, vol. 13, pp. 103–109, 1984. [33] N. Subramanian, C. Yang, and W. Zhang, “Securing distributed data storage and retrieval in sensor networks,” Pervasive and Mobile Computing, vol. 3, pp. 659–676, December 2007. [34] F. Ye, H. Luo, J. Cheng, S. Lu, and L. Zhang, “A twotier data dissemination model for largescale wireless sensor networks,” in Proceedings of the 8th annual international conference on Mobile computing and networking, MobiCom ’02, (New York, NY, USA), pp. 148–159, ACM, 2002. [35] G. Wang, W. Zhang, G. Cao, and T. La Porta, “On supporting distributed collaboration in sensor networks,” in IEEE Military Communications Conference, 2003. MILCOM 2003, vol. 2, pp. 752–757, October 2003. [36] W. Zhang, G. Cao, and T. La Porta, “Data dissemination with ringbased index for wireless sensor networks,” IEEE Transactions on Mobile Computing, vol. 6, pp. 832–847, July 2007. [37] B. Greenstein, S. Ratnasamy, S. Shenker, R. Govindan, and D. Estrin, “Difs: a distributed index for features in sensor networks,” Ad Hoc Networks, vol. 1, no. 23, pp. 333–349, 2003. Sensor Network Protocols and Applications. [38] D. Denning, Cryptography and Data Security. AddisonWesley Publishing Company, 1982. 77 [39] A. Renvall and C. Ding, “A nonlinear secret sharing scheme,” in Proceedings of the First Australasian Conference on Information Security and Privacy, ACISP ’96, (London, UK), pp. 56–66, SpringerVerlag, 1996. [40] T. Claveirole, M. de Amorim, M. Abdalla, and Y. Viniotis, “Securing wireless sensor networks against aggregator compromises,” IEEE Communications Magazine, vol. 46, pp. 134–141, April 2008. [41] A. Eldin, A. Ghalwash, and H. ElShandidy, “Enhancing packet forwarding in mobile ad hoc networks by exploiting the information dispersal algorithm,” Communications, Computers and Applications, 2008. MICCCA 2008. Mosharaka International Conference on, pp. 20–25, August 2008. [42] T. Yuan and S. Zhang, “Secure fault tolerance in wireless sensor networks,” CITWORKSHOPS ’08: Proceedings of the 2008 IEEE 8th International Conference on Computer and Information Technology Workshops, pp. 477–482, 2008. [43] M. H. Dehkordi and S. Mashhadi, “New efficient and practical verifiable multisecret sharing schemes,” Information Sciences, vol. 178, no. 9, pp. 2262 – 2274, 2008. [44] C. KuangHui and C.WenTsuen, “A new group key generating model for group sharing,” Information Sciences, vol. 64, no. 12, pp. 83 – 94, 1992. [45] C.Y. Lee, Y.S. Yeh, D.J. Chen, and K.L. Ku, “A probability model for reconstructing secret sharing under the internet environment,” Information Sciences, vol. 116, no. 24, pp. 109 – 127, 1999. [46] S. Kak, “A cubic publickey transformation,” Circuits, Systems and Signal Processing, vol. 26, pp. 353–359, 2007. 78 [47] “Expanding the use of electronic voting technology for uocava citizens.” United States Department of Defense, May 2007. [48] R. Sinnott, T. Selker, B. Lewis, B. Whelan, J. Williams, and J. McBridem, “Evaluation of voting machine, peripherals and software.” In First Report of the Commission on Electronic Voting on the Secrecy, Accuracy and Testing of the Chosen Electronic Voting System Appendix 2C, Dublin 2004. [49] “Reuters  second life. http://secondlife.reuters.com/stories/2008/05/07/eveonline experimentswithvirtualdemocracy/.” [50] “Second life herald. http://www. secondlifeherald.com.” [51] A. BharuchaReid and M. Sambandham, Random Polynomials. New York: Academic Press, 1986. [52] A. Aho, J. Hopcroft, and J. Ullman, The Design and Analysis of Computer Algorithms. AddisonWesley, 1974. [53] D. Knuth, The Art of Computer Programming Vol 2. AddisonWesley, 1969. [54] T. Moon, Error Correction Coding: Mathematical Methods and Algorithms. WileyInterscience, 2005. [55] S. Kak, “Exponentiation modulo a polynomial for data security,” International Journal of Parallel Programming, vol. 12, pp. 337–346, 1983. [56] L. Dickson, Linear Groups with an Exposition of the Galois Field Theory. Dover Publications, 1958. [57] K. Rosen, Discrete Mathematics and its Applications. McGrawHill, 2007. 79 [58] M. Naor and A. Shamir, “Visual cryptography,” in Advances in Cryptology EUROCRYPT’94, vol. 950 of Lecture Notes in Computer Science, pp. 1–12, Springer Berlin / Heidelberg, 1995. [59] P. Feldman, “A practical scheme for noninteractive verifiable secret sharing,” in 28th Annual Symposium on Foundations of Computer Science, 1987, pp. 427– 438, October 1987. [60] T. P. Pedersen, “Noninteractive and informationtheoretic secure verifiable secret sharing,” in CRYPTO ’91: Proceedings of the 11th Annual International Cryptology Conference on Advances in Cryptology, (London, UK), pp. 129–140, SpringerVerlag, 1992. [61] A. Herzberg, S. Jarecki, H. Krawczyk, and M. Yung, “Proactive secret sharing or: How to cope with perpetual leakage,” in CRYPTO ’95: Proceedings of the 15th Annual International Cryptology Conference on Advances in Cryptology, (London, UK), pp. 339–352, SpringerVerlag, 1995. [62] J. He and E. Dawson, “Multistage secret sharing based on oneway function,” Electronics Letters, vol. 30, pp. 1591–1592, Sep 1994. [63] B. Schneier, Schneier’s Cryptography Classics Library: Applied Cryptography, Secrets and Lies, and Practical Cryptography. Wiley Publishing, 2007. [64] P. Rogaway and M. Bellare, “Robust computational secret sharing and a unified account of classical secretsharing goals,” in CCS ’07: Proceedings of the 14th ACM conference on Computer and communications security, (New York, NY, USA), pp. 172–184, ACM, 2007. [65] V. Vinod, A. Narayanan, K. Srinathan, C. P. Rangan, and K. Kim, “On the power of computational secret sharing,” in INDOCRYPT, pp. 162–176, 2003. 80 [66] P. B´eguin and A. Cresti, “General short computational secret sharing schemes,” in Proceedings of the 14th annual international conference on Theory and application of cryptographic techniques, EUROCRYPT’95, (Berlin, Heidelberg), pp. 194–208, SpringerVerlag, 1995. [67] L. Harn, “Efficient sharing (broadcasting) of multiple secrets,” IEE Proceedings  Computers and Digital Techniques, vol. 142, pp. 237–240, May 1995. [68] J.K. J. H.Y. Chien and Y.M. Tseng, “A practical (t,n) multisecret sharing scheme,” IEICE transactions on fundamentals of electronics, communications and computer sciences, vol. 83, no. 12, pp. 2762–2765, 2000. [69] L.J. Pang and Y.M. Wang, “A new (t,n) multisecret sharing scheme based on shamir’s secret sharing,” Applied Mathematics and Computation, vol. 167, no. 2, pp. 840–848, 2005. [70] C.C. Yang, T.Y. Chang, and M.S. Hwang, “A (t,n) multisecret sharing scheme,” Applied Mathematics and Computation, vol. 151, no. 2, pp. 483–490, 2004. [71] C.W. Chan and C.C. Chang, “A scheme for threshold multisecret sharing,” Applied Mathematics and Computation, vol. 166, no. 1, pp. 1 – 14, 2005. [72] G. Horng, T. Chen, and D.S. Tsai, “Cheating in visual cryptography,” Designs, Codes and Cryptography, vol. 38, no. 2, pp. 219–236, 2006. [73] R. D. Prisco and M. Yung, eds., Security and Cryptography for Networks, 5th International Conference, SCN 2006, Maiori, Italy, September 68, 2006, Proceedings, vol. 4116 of Lecture Notes in Computer Science, Springer, 2006. [74] H. Luo and J.S. Pan, “Hiding multiple watermarks in transparencies of visual cryptography,” in IIHMSP ’07: Proceedings of the Third International Con 81 ference on International Information Hiding and Multimedia Signal Processing (IIHMSP 2007), (Washington, DC, USA), pp. 303–306, IEEE Computer Society, 200
Click tabs to swap between content that is broken into logical sections.
Rating  
Title  New Information Dispersal Techniques for Trustworthy Computing 
Date  20110501 
Author  Parakh, Abhishek 
Keywords  Cloud computing, Data partitioning, Data storage security, Distributed secure data storage, Distributed security, Information dispersal 
Department  Computer Science 
Document Type  
Full Text Type  Open Access 
Abstract  Information dispersal algorithms (IDA) are used for distributed data storage because they simultaneously provide security, reliability and space efficiency, constituting a trustworthy computing framework for many critical applications, such as cloud computing, in the information society. In the most general sense, this is achieved by dividing data into smaller pieces and then storing these pieces on different servers such that access to a stored piece reveals only a very limited amount of information about the original data. The effectiveness of IDA is predicated on spreading of risk across servers that provides protection against loss of data due to disasters and intrusion. The main objective of this dissertation is to develop new information dispersal techniques together with efficient reconstruction algorithms that will perform well in the presence of adversaries and erratic servers. The new techniques consider a general model of risk distribution that is appropriate for situations related to cloud computing, sensor networks, information hiding, and internet voting. We develop verifiable IDA algorithms so that a client can distribute the data among a set of servers, of which a certain subset might be faulty or compromised, in such a way that the client can always recover the stored data correctly, independent of the behavior of the faulty servers, and not have the data compromised to an adversary. 
Note  Dissertation 
Rights  © Oklahoma Agricultural and Mechanical Board of Regents 
Transcript  NEW INFORMATION DISPERSAL TECHNIQUES FOR TRUSTWORTHY COMPUTING By Abhishek Parakh Bachelor of Technology in Electronics and Communication Engineering Dr. B. R. Ambedkar National Institute of Technology Jalandhar, Punjab, India 2005 Master of Science in Electrical Engineering Louisiana State University Baton Rouge, Louisiana, USA 2007 Submitted to the Faculty of the Graduate College of Oklahoma State University in partial fulfillment of the requirements for the Degree of DOCTOR OF PHILOSOPHY May, 2011 COPYRIGHT c By Abhishek Parakh May, 2011 NEW INFORMATION DISPERSAL TECHNIQUES FOR TRUSTWORTHY COMPUTING Dissertation Approved: Dr. Subhash C. Kak Dissertation Advisor Dr. Nohpill Park Dr. K. M. George Dr. Sohum Sohoni Dr. Mark Payton Dean of the Graduate College iii ACKNOWLEDGMENTS My deepest gratitude to my Ph.D. advisor, Dr. Subhash Kak, who has also been my mentor in several realms of life over the past six years. He gave me the freedom to explore my areas of interest and guided me through toughest of times during the course of my work. He has been very patient with me and I continue to look up to him. Dr. Nohpill Park and Dr. K.M. George have been my committee members and teachers. They taught me several core areas of computer science that have been very helpful in my research work. I have also worked as Teaching Assistant for them. While Dr George bore my foray into being a teaching assistant and helped me learn the skill, Dr. Park gave me the additional opportunity to guest lecture to his class which helped me hone my teaching skills. Dr. Sohum Sohoni, a member of my committee, comes from the Electrical and Computer Engineering Department and has been very helpful to me. He helped me organize the student conferences (TACS’09 and TACS’10) and has always been very cheerful and encouraging. I would also like to thank Dr. V. Sarangan and Dr. A. Raghuram (Mathematics Department), who were my committee members during the initial phases of my Ph.D. work, for being very supportive and enthusiastic towards my research projects. Lastly, I would like to thank my parents for their love and support and my friends for making my stay cheerful. My special thanks to the Hatfields (Gary and Melita) for being like family away from home. iv TABLE OF CONTENTS Chapter Page 1 INTRODUCTION 1 1.1 The Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2 ONLINE DATA STORAGE USING IMPLICIT SECURITY 9 2.1 Proposed Data Partitioning Scheme . . . . . . . . . . . . . . . . . . . 11 2.1.1 The (k, k) Data Partitioning Scheme . . . . . . . . . . . . . . 11 2.1.2 Choosing the Coefficients . . . . . . . . . . . . . . . . . . . . . 14 2.1.3 Introducing Redundancy . . . . . . . . . . . . . . . . . . . . . 15 2.1.4 Complexity of the proposed protocols . . . . . . . . . . . . . . 16 2.2 An alternate approach to partitioning . . . . . . . . . . . . . . . . . . 19 2.3 A stronger variation to the protocol modulo a composite number . . . 21 2.4 Addressing the Data Partitions . . . . . . . . . . . . . . . . . . . . . 22 2.5 Security by Key Distribution . . . . . . . . . . . . . . . . . . . . . . . 23 2.5.1 Key Partitioning in Galois Field 2m . . . . . . . . . . . . . . . 23 2.6 Partition redundancy v/s no redundancy . . . . . . . . . . . . . . . . 25 2.7 Application in Internet Voting Protocols . . . . . . . . . . . . . . . . 25 2.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3 A TREE BASED RECURSIVE SCHEME FOR SPACE EFFICIENT SECURE DISTRIBUTED DATA STORAGE 28 3.1 A Comparison Based (2, 3) Recursive Scheme . . . . . . . . . . . . . 32 3.2 Proposed koutofn Recursive Secret Sharing Scheme . . . . . . . . . 34 v 3.3 Proposed Computational Secret Sharing Scheme . . . . . . . . . . . . 39 3.4 On Security of the Proposed Schemes . . . . . . . . . . . . . . . . . . 45 3.5 Efficiency and height plots . . . . . . . . . . . . . . . . . . . . . . . . 46 3.6 Recursive hiding in threshold visual cryptography . . . . . . . . . . . 47 3.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 4 INTERNET VOTING PROTOCOL BASED ON IMPLICIT SECURITY 54 4.1 Our Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 4.2 The Proposed Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . 57 4.2.1 Registration Phase . . . . . . . . . . . . . . . . . . . . . . . . 58 4.2.2 Voting Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 4.2.3 Counting Phase . . . . . . . . . . . . . . . . . . . . . . . . . . 62 4.2.4 Recasting of ballot . . . . . . . . . . . . . . . . . . . . . . . . 64 4.3 Security of the proposed protocol . . . . . . . . . . . . . . . . . . . . 65 4.4 A Variation for Identification of Cheating Server/s . . . . . . . . . . . 67 4.4.1 Voting Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 4.4.2 Cheating Server Identification . . . . . . . . . . . . . . . . . . 68 4.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 5 CONCLUSIONS 71 BIBLIOGRAPHY 73 vi LIST OF FIGURES Figure Page 1.1 Illustration of conventional (explicit) data security (left) and illustration of the idea of implicit data security (IDA) (right). . . . . . . . . 2 2.1 Data partitioning and storage. . . . . . . . . . . . . . . . . . . . . . . 14 2.2 Illustration of key partitioning. . . . . . . . . . . . . . . . . . . . . . 24 3.1 Recursive hiding of smaller messages in the shares of larger messages 33 3.2 Illustration of recursion tree for example 1 (partial illustration) . . . . 34 3.3 Illustration of application of algorithm 1 for n = 4. . . . . . . . . . . 38 3.4 (a) Plot of possible values m can take as n and h vary. (b) Plot of new effective share sizes relative to the size of the original secret S versus m and n (only a few values are shown). . . . . . . . . . . . . . . . . . 40 3.5 (a) Plot of efficiency improvement factor as a function of n and h. (b) Plot of efficiency improvement factor as a function of n and h (larger values). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3.6 (a) Shows how the height h changes as a function of the number of pieces m and the number of players n. (b) Shows how the efficiency improvement factor changes as a function of arbitrary m and n. In both (a) and (b) each piece is taken to be of the minimum size possible (ex. a single digit when using a decimal base). . . . . . . . . . . . . . 44 3.7 Plot of maximum efficiency improvement factor with corresponding height h against m. Here n=2. . . . . . . . . . . . . . . . . . . . . . . 47 vii 3.8 Plot of maximum efficiency improvement factor with corresponding height h against m (a) n=3 (b) n=5. . . . . . . . . . . . . . . . . . . 48 3.9 Possible partitions for black and white pixels . . . . . . . . . . . . . . 49 3.10 Illustration of recursive hiding of secret images in shares of larger original image using a 2outof3 threshold scheme . . . . . . . . . . . . . 50 3.11 Illustration of the process of recursive hiding of secrets in shares of larger original image . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 3.12 Illustration of regeneration of smaller images from the shares hidden inside the shares of the original larger image . . . . . . . . . . . . . . 52 4.1 (a) Polynomial interpolation using vote (0, V ) and (i, ri), i = 1 to 8. (b) Lack of knowledge of (1, r1) and (2, r2) leaves the polynomial undetermined; hence the vote remains secure. . . . . . . . . . . . . . 66 viii CHAPTER 1 INTRODUCTION Embedded, mobile and cloud computing applications are a pervasive feature of the emerging information society. In such a world, with users connecting with intelligent devices and interacting with services available through the Internet and joining the network at short notice while uploading or downloading large volumes of data, it is essential to guarantee that the network is selfaware, selfadjusting, and selfhealing with respect to malicious and natural threats. This is especially so in critical applications such as electronic medical records, pharmaceutical supply chains, electronic ballots, and records related to electronic commerce. The assessment of risk and prediction of failures are some of the challenges of working in the field of trustworthy computing. One way to deal with risk and potential failure of the system is to include the adversary in the security model. Traditionally, the idea of explicit security by instituting firewalls and encryption systems has been used but it is not effective in dealing with natural or manmade disasters. The idea of information dispersal algorithms (IDA) is an example of implicit security in which the data is partitioned and stored over many different servers, which is illustrated in figure 1.1. Its direct application is to survivability of the system and protection against data loss due to disasters and intrusions as was the motivation in survivable storage systems of Delta4, Publius, Intermemory, Oceanstore, Farsite, eVault, and PASIS [1]. IDA systems make the selfhealing of data possible, and if the data on a device becomes corrupted or destroyed, an automated process can detect this and recalculate all data contained in the missing or corrupted partition by inspecting 1 Figure 1.1: Illustration of conventional (explicit) data security (left) and illustration of the idea of implicit data security (IDA) (right). available partitions. In RAID, several physical hard disks are so managed that they present themselves as a single logical unit to the operating system. RAID uses mirroring, striping, and error correction to improve reliability and availability of data and drive failure is masked from the end user. But RAID can involve significant computation overhead when reading and writing information. Of the several RAID schemes, RAID 5 uses striping along with distributed parity. In RAID 6, there is fault tolerance from two drive failures. Upon limited drive failure, subsequent reads may be calculated from the distributed parity. With copy and replication, RAID can allow for significantly more drive failures before there is data loss. But these systems are expensive since they are both copy and paritybased systems. RAID 7, a triple parity system, has been proposed to overcome the impending shortcomings of RAID 6 systems that will allow for up to 3 drive failures [2]. However, RAID does not focus on data security. Most distributed storage systems use (p, k, n) data distribution schemes, where the data is divided into pieces so that fewer than p pieces do not reveal any information, p or more than p but less than k pieces reveal some information and at least k pieces 2 are required to reveal the entire data. Here n−k pieces are redundant. For example, a (1, 1, n) distribution scheme creates n replicas of the original data and retrieving one of the replicas provides the original data. A (1, n, n) scheme divides the data into pieces of 1 n th the size of the original data while an (n, n, n) scheme creates n pieces each of the same size as that of the original data but all the n pieces are required for data reconstruction. Distributed storage systems that use secret sharing schemes [3, 4, 5] use (k, k, n) data distribution schemes, which creates n pieces of the same size as the data and thus are space inefficient. Rabin’s [6] information dispersal scheme is a (1, k, n) data distribution scheme that simultaneously encodes k data values into n pieces and each piece reveals partial information about the encoded values. Other schemes with general values of p, k and n are known as ramp schemes [7] and use p − 1 random values and k − (p − 1) data values. In general, the size of each piece is proportional to 1 k−(p−1) of the original size of the data object that is encoded. Apart from the threshold schemes discussed above an additional layer of security may be provided by using an encryption key to encrypt the data before dividing it up into pieces [8]. The encryption key is then stored using a threshold secret sharing scheme of type (k, k, n) and the encrypted data is divided into n pieces using (1, k, n) information dispersal scheme. Such a hybrid system balances security and space efficiency. By coding and dispersing information, the reliability, security and efficiency of data storage can be vastly improved by IDA systems over traditional copy and paritybased systems. Copybased systems offer reliability by mirroring data on n storage devices, and up to n − 1 of the devices can fail without data loss. But this comes at high cost which is n times greater than storing a single copy. Paritybased systems, commonly used in RAID [9], typically allow at most two storage devices to fail without data loss. Paritybased systems are not as wasteful as copybased systems and they 3 are equally reliable. A special benefit of IDA schemes over copybased systems is the inherent security advantages of dispersing information since the data is not stored at any single location. Each stored partition is a small and unique fraction of the data. An IDA system can be configured to disperse data to n number of devices which can sustain up to n − k failures without loss. Even if an attacker gained access to the devices, the system remains secure until at least k devices are compromised. Maintaining multiple copies creates multiple points of attack, further decreasing the reliability of the system. IDA systems are efficient since the storage overhead is low. They offer the high reliability of copybased storage and the high efficiency of paritybased systems. IDA systems are not only reliable, secure, and less expensive than backup and archive systems, they are the obvious choice for geographically distributed storage. If each of the n devices is kept at a different location throughout the world, then the resultant system is effectively a disaster proof backup system. So long as k locations remain up, the data would be safe and accessible even in the presence of power failure, earthquakes, floods, and other disasters. Delta4 is one of the earliest systems proposed for distributed data storage [10, 11] and uses security servers and data storage servers. The system works by dividing the data into fragments and then storing encrypted fragments. All the encrypted fragments are sent to all the storage servers that run a pseudorandom algorithm to decide on which fragments to store. A client could access a piece of data by getting authorization from a set of security servers. Publius [12] aims at providing survivable storage as well as anonymity. However, the system is inefficient in the sense that it replicates data for storage. Before replication the data is encrypted and the encryption key is divided into pieces and each piece is stored with a replica of the encrypted data. Intermemory [13] is an archival storage system for public domain. It provides 4 high availability and minimizes storage and expected cost of reading data. The nodes store a single piece of data, then apply a threshold scheme and distribute pieces of this data to other nodes which in turn again apply a threshold scheme and further distribute the pieces of these pieces. This recursive encoding of pieces is termed as deepencoding. OceanStore [14] is a storage scheme for wideareanetworks and addresses many issues such as client mobility, robust naming, data migration and replica discovery. It stores active data as encrypted replicas and the archival data using deepencoding as in Intermemory. Farsite [15] is aimed at peer based storage over desktop personal computers. It uses the workstation idle cycles and unused storage capacity of desktop workstations to store encrypted replicas of the data. Evault [16] uses information dispersal or short secret sharing [8] depending on the desired security level and distributed finger prints [17] for data integrity. 1.1 The Objective The major objective of this dissertation is to develop new information dispersal techniques, more general than have been researched before, that are suited for applications involving confidentiality and verifiability in the presence of erratic servers and in the presence of an adversary as one might expect in a variety of situations related to cloud computing, sensor networks, information hiding, and Internet voting. Mathematically, we will develop selfverifiable information dispersal techniques so that a client can distribute the data among a set of n servers, of which up to n − k might be faulty, in such a way that the client can always recover the stored data correctly, independently from the behavior of the faulty servers. Network based protocols for information dispersal that could tolerate a certain degree of errors on servers were proposed by Garay et. al. [18] and Cachin and Tessaro [19]. The first of these relies on synchronous networks, which are adequate for tightly coupled nodes such as clusters, but unrealistic for geographically distributed 5 or heterogeneous systems, and the second uses an asynchronous network model with certain bounds on the computational capacity of the adversary. We go beyond the scope of the earlier works and not only consider verifiability based on the relationship between partitions of data stored on various servers but also use structure within the partitions themselves to provide higher performance both for Byzantine servers as well as clients. The information dispersal techniques proposed in this dissertation lead to new scalable and optimized schemes that reduce computational overhead and provide applicationspecific balance between confidentiality and verifiability. It should be possible to apply these data distribution schemes to a variety of applications including secure data storage in cloud computing, sensor networks, Internet voting, peertopeer networks, and information hiding. Some of our results have appeared in [20, 21, 22, 23, 24, 25, 26, 27, 28]. We divide our approach into two phases that integrate the theoretical aspects of our research into an effective strategy towards achieving application specific goals. The new idea that we will use is that of recursively generated partitions, or partitions that are nested, which means creating a system where dependencies are not distributed equally across servers. This research has investigated several basic space efficient nested schemes [20, 23, 27]. In these schemes, the original data block is broken into several pieces and the pieces are divided into slices one by one, by repeatedly using already generated slices. This establishes the ground for a more fundamental analysis, in an abstract setting, of such nonuniform IDA schemes that not only provide the theory for data storage on computer and sensor networks, but also to steganography. The aims of this dissertation are the following: 1. To develop new information dispersal techniques that are applicable to resource constraint environments such as sensor networks as well as cloud data storage. 6 2. Discover new applications for distributed data storage and information dispersal techniques that we develop. 3. Implement fault tolerance for the dispersal techniques in order to detect errors in data partitions and if possible also detect the faulty/malicious server. Chapter 2 proposes a computationally efficient data distribution technique that can be used in resource constraint environments such as sensor networks as well as for online data storage. These results have been published in [25, 27]. We describe the use of a data partitioning scheme for implementing implicit security involving the roots of a polynomial in finite field. The partitions are stored on randomly chosen servers on the network and they need to be retrieved to recreate the original data. Data reconstruction requires access to each server, login password and the knowledge of the servers on which the partitions are stored. This scheme may also be used for data security in sensor networks and Internet voting protocols. Chapter 3 proposes a recursive secret sharing technique that can be applied to data partitioning and steganography. Steganographic applications can not only be used for storing hidden information but also for checking data consistency once the original data is recreated using the retrieved partitions. The recursive techniques presented have been applied to visual cryptography to send watermarks and other secret images. These results have been published in [20, 23]. The chapter presents a koutofn recursive secret sharing scheme based on an nary tree data structure. In recursive hiding of secrets, the user encodes additional secrets in the shares of the secret intended to be originally shared without an expansion in the size of the latter, thereby decreasing the effective share size per secret and increasing the overall space efficiency of secret sharing, with a tradeoff in security. Chapter 4 discusses the application of distributed data storage to solve the problem of secure and verifiable internet voting. We present a protocol to achieve this and our work has appeared in [21, 26]. The chapter presents an improved protocol for 7 Internet voting that allows recasting of ballots to eliminate voting errors, encourages early voting and provides an opportunity for changing votes as the election campaign progresses. The protocol is able to meet the competing requirements of verifiability and privacy by using a distributed security architecture, involving multiple servers, and multiple audit channels. It also detects servers that might be cheating. 8 CHAPTER 2 ONLINE DATA STORAGE USING IMPLICIT SECURITY Securing data stored on distributed servers is of fundamental importance in cloud computing. The traditional (explicit) approach to securing data is to store and back it up on a single server and allow access upon the use of passwords that are needed to be frequently changed. But there is a tendency among users to keep passwords simple and memorable [29, 30, 31] leading to the possibility of brute force attacks. Furthermore since the data on the Web is archived, keys that provide adequate encryption today are likely to be broken in the future. Therefore, explicit security architecture may not be adequate for many applications. To go beyond the present approach, one may incorporate further security within the system by using puzzles [32] or otherwise by another layer that increases the space that the intruder must search in order to break the system. Here, we propose an implicit security architecture in which security is distributed amongst many entities. In contrast to hashbased distributed security models for wireless sensor networks [33, 34, 35, 36, 37], we consider a more general method of data partitioning. In this approach, stored data is partitioned into two or more pieces and stored at randomly chosen places on the network that are known only to the owner of the data. Access to these pieces not only depends on the knowledge of a password but also on the knowledge of where the pieces are stored. The division of data is performed in such a way that the knowledge of all the pieces is required to recreate the data and that none of the individual pieces reveals any useful information. In scenarios where one or more pieces may be at the danger of being lost or inaccessible due to system or 9 network failure, one may employ schemes that can recreate the data from a subset of original pieces. Formally, our scheme consists of two parts  the first part is a (k, k) partitioning scheme where all the k partitions are required to recreate the data and the second part extends the first part of the scheme to a (k, n) partitioning scheme, where k n and k > 0 , i.e. only k out of n partitions are required to recreate the data. A number of schemes have been proposed in the communications context for splitting and sharing of decryption keys [3, 38, 39]. These schemes fall under the category of “secret sharing schemes”, where the decryption key is considered to be a secret. Motivated by the need to have an analog of the case where several officers must simultaneously use their keys before a bank vault or a safe deposit box can be opened, these schemes do not consider the requirement of data protection for a single party. Further, in any secret sharing scheme it is assumed that the encrypted data is stored in a secure place and that none of it can be compromised without the decryption key. Some schemes that directly apply secret sharing for distributed data storage in sensor networks have been proposed [40, 41, 42], but have a complexity of O(n log2n) at best, whereas our scheme can achieve linear complexity. Other schemes aim at sharing multiple secrets [43] but have a complexity of O(n2) or generate group keys for group sharing of information [44]. Certain probability models have been proposed for reconstructing secret sharing over the Internet [45], but only focus on strategies for share distribution over the network and do not provide methods for data partitioning. We protect data by distributing its parts over various servers. The idea of doing these partitions is a generalization of the use of 3 or 9 roots of a number in a cubic transformation [46]. The scheme we present is simple and easily implementable. We would like to stress that the presented scheme is different from Shamir’s secret sharing scheme [3] which takes the advantage of polynomial interpolation and maps 10 the secret on the yaxis; whereas we map the data as roots of a polynomial on the xaxis. One of the potential application of the idea of implicit security is in secure internet voting. Internet voting is an inherent part of the online virtual worlds, such as Second Life, Active Worlds, Habbo Hotel, and others, and is only a few years away from large scale real world implementations [47, 48, 49, 50]. These virtual worlds provide an excellent testbed for online voting schemes, which once successful may be used in realworld democratic process. To make use of the implicit security architecture in online voting, the voter’s cast ballot is considered as his data and the partitions of this ballot may be stored on different servers for distributed security. Such a system will protect the intermediate election results from being revealed while distributing the security of the ballots over different servers [26]. 2.1 Proposed Data Partitioning Scheme 2.1.1 The (k, k) Data Partitioning Scheme We now describe our data partitioning scheme. By the fundamental theorem of algebra, every equation of degree k has k roots. We use this fact to partition data into k partitions such that each of the partition is stored on a different server. No explicit encryption of data is required to secure each partition. The partitions in themselves do not reveal any information and hence are implicitly secure. Only when all the partitions are brought together is the data revealed. Consider an equation of order k xk + ak−1xk−1 + ak−2xk−2 + ... + a1x + a0 = 0 (2.1) Equation 2.1 has k roots denoted by {r1, r2, ..., rk} {set of complex numbers} and 11 can be rewritten as (x − r1)(x − r2)...(x − rk) = 0 (2.2) In cryptography, it is more convenient to use the finite field Zp where p is a large prime. If we replace a0 in (2.1) with the data d 2 Zp that we wish to partition then, xk + Xk−1 i=1 ak−ixk−i + d 0 mod p (2.3) where 0 ai p− 1 and 0 d p− 1. (Note that one may alternatively use −d in (2.3) instead of d.) This may be rewritten as Yk i=1 (x − ri) 0 mod p (2.4) where 1 ri p− 1. The ri are the partitions. It is clear that the term d in (2.3) is independent of variable x and therefore Yk i=1 ri d mod p (2.5) If we allow the coefficients in (2.3) to take values a1 = a2 = ... = ak−1 = 0, then (2.3) will have k roots only if GCD(p − 1, k) 6= 1 and 9b 2 Zp such that d is the kth power of b. One simple way to chose such a p would be to choose a prime of the form (k · s + 1), where s 2 N. However, such a choice would not provide good security because knowledge of the number of roots and one of the partitions would be sufficient to recreate the original data by computing the kth power of that partition. Furthermore, not all values of d will have a kth root and hence one cannot use any arbitrary integer, which would ideally be required. Therefore, one of the restrictions on choosing the coefficients is that not all of them are simultaneously zero. For example, if the data needs to be divided into two parts then an equation of 12 second degree is chosen and the roots computed. If we represent this general equation by x2 + a1x + d 0 mod p (2.6) then the two roots can be calculated by solving the following equation modulo p, x = −a1 ± p a21 − 4d 2 (2.7) which has an solution in Zp only if the square root p a21 − 4d exists modulo p. If the square root does not exist then a different value of a1 needs to be chosen. We present a practical way of choosing the coefficients below. However, this brings out the second restriction on the coefficients, i.e. they should be so chosen such that a solution to the equation exists in Zp. Theorem 2.1 If the coefficients ai, 1 i k−1 in equation (2.3) are randomly and uniformly chosen and are not all simultaneously zero, then the knowledge of any k−1 roots of the equation, such that equation (2.4) holds, does not provide any information about the value of d with a probability greater than that of a random guess of 1/p. Proof. Given a specific d, the coefficients in (2.3) can be chosen to satisfy (2.4) in Zp in the following manner. Choose at random from the field k − 1 random roots r1, r2, ..., rk−1. Then kth root rk can be computed by solving the following equation, rk = d · (r1 · r2 · ... · rk−1)−1 mod p (2.8) Since the roots are uniformly and randomly distributed in Zp, the probability of guessing rk without knowing the value of d is 1/p. Conversely, d cannot be estimated with a probability greater than 1/p without knowing the kth root rk. It follows from Theorem 2.1 that data is represented as a multiple of k numbers 13 Figure 2.1: Data partitioning and storage. in the finite field. Example 1. Let data d = 10, prime p = 31, and let k = 3. We need to partition the data into three parts for which we will need to use a cubic equation, x3 + a2x2 + a1x − d 0 mod p. We can find the equation satisfying the required properties using Theorem 2.1. Assume, (x − r1)(x − r2)(x − r3) 0 mod 31. We randomly choose 2 roots from the field, r1 = 19 and r2 = 22. Therefore, r3 d· (r1 · r2)−1 10· (19·22)−1 mod 31 11. The equation becomes (x − r1)(x − r2)(x − r3) x3 − 21x2 + x − 10 0 mod 31, where the coefficients are a1 = 1 and a2 = −21 and the partitions are 11, 22 and 19. 2.1.2 Choosing the Coefficients In the previous section, we described two conditions that must be satisfied by the coefficients. The first condition was that not all the coefficients are simultaneously zero and second that the choice of coefficients should result in an equation with roots in Zp. Since no generalized method for solving equations of degree higher than 4 exists [51], a numerical method must be used which becomes impractical as the number of partitions grow. An easier method to compute the coefficients is exemplified by Theorem 2.1 and Example 1. One might ask why should we want to compute the coefficients if we already have 14 all the roots? We answer this question in section 3. 2.1.3 Introducing Redundancy In situations when the data pieces stored on one or more (less than a threshold number of) servers over the Internet may not be accessible, the user should be able to recreate the data from the available pieces. The procedure outlined below extend the k partitions to n, n k partitions such that only k of them need to be brought together to recreate the data. If {r1, r2, . . . , rk} is the original set of partitions then they can be mapped into a set of n partitions {p1, p2, . . . , pn} by the use of a mapping function based on linear algebra. If we construct n linearly independent equations such that a11r1 + a12r2 + . . . + a1krk = c1 a21r1 + a22r2 + . . . + a2krk = c2 ... an1r1 + an2r2 + . . . + ankrk = cn where numbers aij , rj and ci are randomly chosen from the finite field Zp as in the previous sections, then the n new partitions are pi = {ai1, ai2, . . . , aik, ci}, 1 i n. The above linear equation can be written as matrix operation, 2 66666664 a11 a12 . . . a1k a21 a22 . . . a2k ... an1 an2 . . . ank 3 77777775 2 66666664 r1 r2 ... rk 3 77777775 = 2 66666664 c1 c2 ... cn 3 77777775 15 To recreate rj , 1 j k from the new partitions, any k of them can be brought together, 2 66666664 r1 r2 ... rk 3 77777775 = 2 66666664 am1 am2 . . . amk an1 an2 . . . ank ... ai1 ai2 . . . aik 3 77777775 −1 k×k 2 66666664 c1 c2 ... ck 3 77777775 k×1 A feature of the presented scheme is that new partitions may be added and deleted without affecting any of the existing partitions. 2.1.4 Complexity of the proposed protocols The complexity of the (k, k) data partitioning scheme The complexity of the (k, k) partitioning scheme is O(k). This is illustrated by the following algorithm. Algorithm 1a (k,k) Data Partitioning Scheme Input data d 1. Choose randomly and uniformly from a finite field Zp, k − 1 numbers r1, r2, . . . , rk−1. 2. Compute rk = d · (r1 · r2 · . . . rk−1)−1 mod p. 3. Construct the kth degree polynomial: p(k) = (x − r1)(x − r2) . . . (x − rk) mod p = xk + ak−1xk−1 + ak−2xk−2 + . . . + a1x + a0 mod p. 4. The roots r1, r2, . . . rk of the polynomial p(k) represent the data partitions. Output ! vector ~R = [r1, r2, . . . , rk] As seen from the above algorithm, partition creation requires k multiplications and one inversion operation modulo prime p. Hence the complexity of O(k). 16 Similarly, data reconstruction requires k multiplications which is illustrated by the following algorithm. Algorithm 1b (k,k) Data Reconstruction Input vector ~R = [r1, r2, . . . , rk] 1. Data d = r1 · r2 · . . . rk mod p. Output ! d Linear complexity of Algorithms 1a and 1b is highly desirable for resource constraint environments such as that of sensor networks. The complexity of the (k, n) data partitioning scheme The proposed (k, n) data partitioning scheme uses the (k, k) partitioning scheme as a subalgorithm. That is, we first obtain k partitions using Algorithm 1a and then introduce redundancy to cope with possible inaccessibility of some of the partitions. The (k, n) partitioning scheme works as follows. Algorithm 2a (k,n) Data Partitioning Scheme Input data d 1. Use Algorithm 1a to create k partitions of data d. Obtain vector ~R = [r1, r2, . . . , rk] 2. Generate n × k matrix A such that all the rows of matrix are linearly independent. Note. A simple way to choose such a matrix would be to choose a Vandermonde matrix of the following form A = 2 66666664 1 x1 x21 x31 . . . xk−1 1 1 x2 x22 x32 . . . xk−1 2 ... 1 xn x2 n x3 n . . . xk−1 n 3 77777775 17 3. Create n partitions by computing [A]n×k · [~R T ]k×1 = [C]n×1, where C = [c1, c2, . . . , cn]T . 4. The partitions are denoted by the pair pi = {xi, ci}, 1 i n. Output ! pi = {xi, ci}, 1 i n Algorithm 2b (k,n) Data Reconstruction Input any k partitions pi = {xi, ci} 1. Construct the k × k matrix B by choosing the rows of matrix A corresponding to the given pairs pi and compute B−1. 2. Reconstruct the k shares by evaluating [~R T ]k×1 = [B−1]k×k · [C]k×1. 3. Use Algorithm 1b to reconstruct data d, where the input is the algorithm is ~R . Output ! d The complexity of Algorithms 2a and 2b is O(n log2n) [52, 53]. A more efficient method, that eliminates multiplications during partition creation, would be to use a n×k matrix with elements from GF(2). Such matrices are used in errorcorrection coding [54]. This will reduce all multiplication operations to additions operations for partition creation (assuming the cost of multiplications with 0 and 1 can be ignored). And inversion of such a matrix during data reconstruction would require at most O(n3) operations and O(n) multiplications. An example of such a matrix in GF(2) is given below, where we have assumed n = 4 and k = 3. A = 2 66666664 1 0 1 0 1 0 0 1 1 1 1 0 3 77777775 18 Any k = 3 columns the above matrix A are linearly independent. Since the above matrix only requires multiplications with 0 and 1, the time complexity of the scheme becomes linear for partition creation. This is one of the advantages of our scheme over traditional schemes such as that of Shamir’s. A note on applications As seen above, Algorithms 1a and 1b require O(k) computations while Algorithms 2a and 2b require O(n log2n) computations. Both these implementations are efficient and suitable for resource constraint environments such as wireless sensor networks. Further, while sensors generate crucial data and transmit it securely to the base station, the reconstruction of data happens for most cases only at the base station. Since base stations have much higher computational capacity than the sensors themselves, a matrix in GF(2) may be used to reduce the computational burden on the sensors. 2.2 An alternate approach to partitioning Once the user has computed all the k roots then he may compute the equation resulting from (2.4) and store one or all of the roots on different servers and the coefficients on different servers. Recreation of original data can now be performed in two ways: either using (2.5) or choosing one of the roots at random and retrieving the coefficients and substituting the appropriate values in the equation (2.3) to compute −a0 xk + ak−1xk−1 + ak−2xk−2 + ... + a1x mod p (2.9) Therefore, the recreated data d (p − a0) mod p. Parenthetically, this may provide a scheme for fault tolerance. Additionally, a user may store just one of the roots and k − 1 coefficients on the network. These together represent k partitions. Note. Distinct sets of coefficients (for a given constant term in the equation) result 19 in distinct sets roots and vice versa. This is because two distinct sets of coefficients represent distinct polynomials because two polynomials are said to be equal if and only if they have the same coefficients. By the fundamental theorem of algebra, every polynomial has unique set of roots. Two sets of roots R1 and R2 are distinct if and only if 9ri8rj(ri 6= rj), where ri 2 R1 and rj 2 R2. To compute the corresponding polynomials and the two sets coefficients C1 and C2, we perform Qk i=1,ri2R1 (x−ri) 0 mod p and Qk j=1,rj2R2 (x−rj) 0 mod p and read the coefficients off from the resulting polynomials, respectively. It is clear the at least one of the factors of the two polynomials is distinct because at least one of the roots is distinct; hence the resulting polynomials for a distinct set of roots are distinct. Theorem 2.2 Determining the coefficients of a polynomial of degree k 2 in a finite field Zp, where p is prime, by brute force, requires (d pk−1 (k−1)!e) computations. Proof. If A represents the set of coefficients A = {a1, a2, ..., ak−1}, where 0 ai p − 1, then by the above note each distinct instance of set A gives rise to a distinct set of roots R = {r1, r2, ..., rk}, and conversely every distinct instance of set R gives rise to a distinct set of coefficients A. Therefore, if we fix d to a constant then (2.5) can be used to compute a set of roots. Every distinct set of roots is therefore a k − 1 combination of a multiset, where each element has infinite multiplicity, and equivalently a k−1 combination of set S = {0, 1, 2, ..., p−1} with repetition allowed. Thus, the number of possibilities for the choices of coefficients 20 is given by the following expression * p k − 1 + = 0 B@ p − 1 + (k − 1) k − 1 1 CA = (p+k−2)! (p−1)!(k−1)! = (p+k−2)(p+k−3)...(p+k−k)(p−1)! (p−1)!(k−1)! = (p+k−2)(p+k−3)...p (k−1)! d pk−1 (k−1)!e (2.10) Here we have used the fact that in practice p k 2, hence the result. We have ignored the one prohibited case of all coefficients being zero, which has no effect on our result. 2.3 A stronger variation to the protocol modulo a composite number An additional layer of security may be added to the implementation by performing computations modulo a composite number n = p · q, p and q are primes, and using an encryption exponent to encrypt the data before computing the roots of the equation. In such a variation, knowledge of all the roots and coefficients of the equation will not reveal any information about the data and the adversary will require to know the secret factors of n. For this we can use an equation such as the follows xk + Xk−1 i=1 ak−ixk−i + dy 0 mod n (2.11) 21 where y is a secretly chosen exponent and GCD(y, n) = 1. If the coefficients are chosen such that (2.11) has k roots then Yk i=1 ri dy mod n (2.12) Appropriate coefficients may be chosen in a manner similar to that described in previous sections. It is clear that compromise of all the roots and coefficients will at the most reveal c = dy mod n. In order to compute the original value of d, the adversary will require the factors of n which are held secret by the user. 2.4 Addressing the Data Partitions The previous sections consider the security of the proposed scheme when prime p and composite n are public knowledge. However, there is nothing that compels the user to disclose the values of p and n. If we assume that p and n are secret values then the partitions may be stored in the form of an “encrypted link list”. We define an encryptedlinklist as a link list in which every pointer is in encrypted form and in order to find out which node the present node points to, the user needs to decrypt the pointer which can be done only if certain secret information is known. If we assume that p and n are public values then the pointer can be so encrypted that each decryption either leads to multiple addresses or depends on the knowledge of the factors or both. Only the authentic user will know which of the multiple addresses is to be picked. Alternatively, one may use a random number generator and generate a random sequence of server IDs using a secret seed. One way to generate this seed may be to find the hash of the original data and use it to seed the generated sequence and keep the hash secret. 22 2.5 Security by Key Distribution In some cases, when there is a large amount of data, a user may find it inefficient to create data partitions and distribute them over the network but may wish to encrypt the data and store it on a single server which he trusts and keep encryption key secret. Almost always, encryption keys are very large random numbers and cannot be memorized. Therefore, the user may create partitions of the key and spread them over the network. This approach may be more efficient, if not more secure, than creating data partitions of enormous amounts of data. Furthermore, the access to the servers on which the key partitions are stored may be password restricted. We stress that this scenario is different from the secret sharing scenario where multiple parties are involved since our case involves data security for a single party. Keys may be partitioned using the scheme presented in the previous sections where the data is replaced by the key. Alternatively one may use the key partitioning scheme presented in the following subsection that uses polynomials in Galois Field GF(2m), for which, given the distributed nature of the scheme, the security will be better than polynomial generalization of power encryption transformations [55, 56]. 2.5.1 Key Partitioning in Galois Field 2m A Galois Field (2m) consists of polynomials of degree m − 1 or less, such that their coefficients lie in GF(2). For example, am−1xm−1 + am−2xm−2 + ... + a1x + a0, where ai = 1 or 0 represents a polynomial belonging to GF(2m) and it may be written in binary representation as (am−1am−2...a1a0). Multiplication in GF(2m) is performed modulo an irreducible polynomial g(x) = xm + bm−1xm−1 + ... + b1x + b0, where bi 2GF(2). A polynomial g(x) is said to be irreducible if it cannot be factored into two or more polynomials each with coefficients in GF(2) and each of degree less than m. Therefore, a user can generate a random key and partition it into k partitions 23 Figure 2.2: Illustration of key partitioning. using the following procedure. He generates k random polynomials of any random degree in GF(2m) and computes their product modulo the irreducible polynomial g(x). The resultant polynomial of degree m − 1 with binary coefficients is taken as the required key in its binary representation and the randomly generated polynomials are the partitions. Example 2. In order to generate a random 8 bit key and create 5 partitions of it, the user may proceed as follows. Let g(x) = x8 +x2 +1 be an irreducible polynomial in GF(2m). He chooses 5 polynomials of degree m − 1 or less (at random), such as p1(x) = x7 +x6 +x, p2(x) = x4 +x2 +x+1, p3(x) = x5 +x3, p4(x) = x7 +x6 +x5 +x and p5(x) = x6+x2+x+1 and computes there product, p1(x)p2(x)p3(x)p4(x)p5(x) k(x) mod g(x), where k(x) is the “key polynomial” and the coefficients are the binary representation of the key. k(x) p1(x)p2(x)p3(x)p4(x)p5(x) x6 + x4 + x3 + x + 1 mod g(x) Therefore, the random key generated is k=01011011 and the partitions are, p1=11000010, p2=00010111, p3=00101000, p4=11100010, p5=01000111. Note that the knowledge of k−1 partitions, in the scheme presented above, leaves the kth partition completely undetermined and the adversary still requires (2m−1) 24 trials to determine the encryption key. This is because all the partitions 2GF(2m) and that each one of them is independently and randomly chosen from the field. Hence, knowledge of k −1 partitions does not reveal any information about the choice made for the kth partition. Further, since the kth partition can be any random polynomial from the field, in order to compute this polynomial by brute force would require on an average half the number of trial of 2m, i.e. (2m−1). Since the right hand side of the equation (the key) is completely dependent on the choice of left hand side of the equation (the factors of the key), one would require (2m−1) trial to compute the key with the knowledge of k − 1 partitions [57]. In practice, keys are of 128 bits to 512 bits in AES, i.e. (2127) to (2511) possible trials on an average. The above scheme can be extended to a (k, n) partitioning scheme by converting the existing partitions into decimal integers and using the extension presented in section 2.1.3. 2.6 Partition redundancy v/s no redundancy In cases where the accessibility to the partitions is guaranteed and the data disks are sufficiently backedup, partition redundancy may lead to unnecessary wastage of storage space. This is certainly true for insystem storage where data partitions may be distributed over numerous sectors randomly and not contiguously stored. Also in case of network storage larger number of partitions would require larger upload time which may be avoidable. 2.7 Application in Internet Voting Protocols Internet voting is a challenge for cryptography because of its opposite requirements of confidentiality and verifiability. There is the further restriction of “fairness” that the intermediate election results must be kept secret. One of the ways to solve this 25 problem is to use multiple layers of encryption such that the decryption key for each layer is available with a different authority. This obviously leaves open the question as to who is to be entrusted the encrypted votes. A more effective way to implement fairness would be to avoid encryption keys altogether and divide each cast ballot into k or more pieces such that each authority is given one of the pieces [26]. This solves the problem of entrusting any one authority with all the votes and if any of the authorities (less than the threshold) try to cheat by deleting some of the cast ballots, then the votes may be recreated using the remaining partitions. Such a system implicitly provides a backup for the votes. 2.8 Conclusions We have described an implicit security architecture suited for the application of online storage. In this scheme data is partitioned in such a way that each partition is implicitly secure and does not need to be encrypted. These partitions are stored on different servers on the network which are known only to the user. Reconstruction of the data requires access to each server and the knowledge as to which servers the data partitions are stored. Several variations of this scheme are described, which include the implicit storage of encryption keys rather than the data, and where a subset of the partitions may be brought together to recreate the data. One can propose variations to the scheme presented in this paper where the data partitions need to be brought together in a definite sequence. One way to accomplish it is by representing partition in the following manner: p1, p1(p2), p2(p3), ..., where pi(pj) represents the encryption of pj by means of pi. Such a scheme will increase the complexity of the bruteforce decryption task for an adversary. The proposed scheme is also of potential use in sensor networks. To protect sensitive information on sensors that are physically compromised, one may partition 26 the data and store these parts on other sensors in the network. The key that is used to find the location of these parts is encrypted and this encryption is changed from time to time. This provides solutions that are more general than other distributed security models used in sensor networks [33]. Our approach leads to interesting research issues such as the optimal way to distribute the parts in a network of n sensors, so that the network can work even if some of the sensors happen to malfunction by themselves or as a consequence of intervention by an adversary. It also leads to the question of reliability and tolerance to faults in such a system. 27 CHAPTER 3 A TREE BASED RECURSIVE SCHEME FOR SPACE EFFICIENT SECURE DISTRIBUTED DATA STORAGE One of the techniques to implement distributed storage is the use of secret sharing scheme, with the difference that the secret is replaced with data. In this chapter we develop recursive secret sharing schemes for distributed storage. We will be presenting our scheme under the context of secret sharing, however, when applied to the problem of secure distributed data storage, the secrets may be replace with data that is to be distributed or it can be used in conjunction with other information dispersal algorithms for secure key distribution. Conventionally secret sharing schemes divide the data into pieces such that fewer than k pieces do not reveal any information about the secret, or data in our case, and fall under the information theoretic framework. Information theoretically secure secret sharing schemes [58, 59, 3, 60, 61, 62], are space inefficient. For example, a koutofn, denoted as (k, n), secret sharing scheme expands a secret of b bits into n shares each of at least b bits in size. Furthermore, since only k of these shares are needed to recreate the secret, each bit of any share, in a threshold secret sharing scheme, effectively conveys at most d 1 k e bits of secret. If k = n, as in the case of a nonthreshold scheme, where all the shares must be brought together to recreate the secret, the effective information conveyed by each bit of any share is d 1 ne bits of the secret. One way to improve space efficiency is to distribute shares smaller in size than the secret itself, however at the cost of reduction in security. Computational secret sharing 28 techniques have been developed to achieve this [63, 64, 65, 66, 8], in which a symmetric key is used to encrypt the original secret and the encrypted secret is divided into pieces to which redundancy is added by the use of block error correction techniques [8, 6, 18]. The encryption key is split into shares using information theoretically secure methods of secret sharing. This leads to an nfold increase in key size, shares of which have to be stored with every piece of the encrypted secret, hence incurring an overhead [8]. A method for sharing multiple secrets with security based on assumption of hardness of discrete logarithm problem is presented in [67]. In [68] a scheme based on systematic block codes and in [69, 70] schemes using Shamir’s secret sharing have been proposed to share multiple secrets, however they require a large amount of side information to be stored as public knowledge and further [68, 69, 70, 71] attempt to maintain ideal security. An extension of secret sharing schemes is visual cryptography [58] that aims at dividing an image into two or more shares such that when a predetermined number of shares are aligned and stacked together, the secret image is revealed [58, 72, 73], without the requirement of any computation. However, information theoretically secure approaches to visual cryptography also suffer from inefficiency in terms of number of bits of secret conveyed per bit of share. A number of multiimage hiding schemes have been developed [74, 75, 76, 77]. However, the implementation in [75] does not hide the information using “real” visual cryptography, in the sense that computation needs to be performed to extract the hidden information. Lou et. al. [74] use a secret key to generate a permutation order and the key needs to be conveyed in addition to the shares, becoming a overhead and also amounts to computation, which was to be avoided by visual cryptography. In [76], two watermarks are hidden, one during halftoning of original image and second while creating shares, however, both these hiding techniques do not conform to visual cryptography because, to extract first hidden image, an exclusive OR reconstruction 29 of the image needs to be performed, which is never done in visual cryptography and the extraction of the second hidden image requires additional XOR operations. Fang and Lin [77] hide only integer from 0 to 6, which has very limited applications and again the integers are not decoded visually but require computation to be extracted. Wu and Chen [78] propose a scheme to hide two images at a rotation angle of 90 degrees while Wu and Chang [79] discuss hiding multiple images using circular shares at a limited number of rotation angles. Hsu et. al. [80] extended the scheme to allow arbitrary rotation angle by rolling shares into rings. However, circular shares distort the aspect ratio of the original image. Further knowing how and by what degree the shares are to be rotated requires additional side information to be supplied along with the share. Also pixel expansion is an issue, for example in [81], which encodes two or more secrets into circular shares, the pixel expansion is proportional to the number of secrets being hidden. Another way to improve information efficiency is to hide additional secrets in the shares of the original secret, without increasing the share size of the latter in comparison to what it would be without the additional hidden secrets. This effectively reduces the share sizes per secret, taking both hidden and original secrets into account. Recursive hiding of secrets was proposed in [24] for this purpose and to serve as a steganographic channel, with applications to binary images and binary text but was only limited to a (2,2) secret sharing. The idea involved is recursive hiding of smaller secrets in shares of larger secrets with secret sizes doubling at every step, thereby increasing the information that every bit of share conveys to (n−1) n bit of secret i.e. nearly 100%. However, the scheme described in [24] is a nonthreshold scheme where all the shares are needed to recreate the secret. We present a recursive 2outofn visual secret sharing scheme which is an extension of the recursive 2outof2 visual secret sharing scheme. This small but important extension helps in understanding the basic idea behind recursive hiding of secrets. 30 In this chapter, we present a general (nonvisual) koutofn secret sharing scheme that hides additional information in the shares of the original secret, without any increase in the share sizes of the latter. This represents an increase in space efficiency of the secret sharing scheme. Further, our algorithm acts as a dual algorithm in the sense that it can be used as a multisecret sharing algorithm as well as a computational secret sharing algorithm. When used as a multisecret sharing algorithm, it recursively encodes different secrets into the shares of other secrets such that the “inner” secrets do not lead to any expansion in share sizes of the “outer” secrets. In other words, the shares of the “outer” secrets now also convey the “inner” secrets, thereby increasing the information efficiency. When used as a computational secret sharing algorithm, we simulate a multisecret sharing algorithm, by dividing the original (larger) secret into multiple pieces of smaller sizes. These pieces are then recursively encoded into shares such that some of the pieces act as the “inner secrets” and some as “outer secrets”. Further, since the algorithm produces shares of size on the order of the size of the pieces, it effectively results in smaller secret share sizes than conventional schemes. It is to be noted, however, that the security in both cases, the multisecret sharing and the computational secret sharing schemes is computational. The proposed recursive secret sharing scheme has applications in distributed online storage of information discussed in [6, 18]. Systems implementing such distributed data storage have been described in [82, 16, 83, 84, 64]. In section 3.1, we present a generalization of the (2,2) recursive scheme presented in [24] to a (2,3) threshold scheme for binary strings. The aim of this section is to make clear the idea of recursion, for text, in secrecy context and how it can be applied to improve the efficiency of secret sharing. Section 3.2, extends the (2,3) recursive scheme for text to a (n, k) recursive scheme, where, the new scheme is based on 31 repeated polynomial interpolation and sampling. Section 3.3 discusses the use of the proposed secret sharing scheme as a computational secret sharing scheme and section 3.4 comments on the security of the scheme, while section 6 is conclusions. 3.1 A Comparison Based (2, 3) Recursive Scheme For text represented as a binary sequence, a 2 out of 3 secret sharing scheme can be developed using a simple comparison based algorithm as follows: we divide a secret bit into 3 shares p1, p2, and p3 such that p1 = p2 = p3 if we wish to encode bit 0, and p1 6= p2 6= p3 if we wish to encode bit 1. To satisfy the above conditions we would need at least 3 symbols, say 0, 1 and 2. Thechap3:refore to encode bit 0 we could create pieces p1p2p3 as 000, 111, or 222. Whereas the candidates to encode bit 1 would be 012 and all possible permutations of it, i.e. 021, 102, 120, 210, and 201. In all, to encode secret bit 0 and secret bit 1, we have 3 and 6 possibilities, respectively. This asymmetry in the number of choices does not affect the security of the scheme because an adversary can guess a bit correctly, by brute force, with a probability of onehalf. This is the maximum security one can achieve for any one bit. Now in our scheme in the encoded form for any bit, given one of its shares, there exist two possible choices for the second share, i.e. either the second share is the same as the first or the second share is different from the first. Therefore, a player can guess the second share only with a probability of onehalf. Example 1. If M is a 27 bit long message that we wish to encode into 3 shares and the threshold is 2, then nonrecursive shares S1, S2, and S3 may be created as follows: M : 011011010110110011100101101 S1 : 102012012010201201201020102 S2 : 110020022120111210101221001 32 S3 : 121001002200021222001122200 Viewed as a ternary alphabet, the efficiency of this system is 33%. As a comparison, if 0, 1 and 2 are encoded using prefix coding as 0, 10, and 11 respectively, then we are effectively mapping each bit of secret into 5 bits of shares and the efficiency is only 27 27×5 = 1 5 , i.e. 20%. The above efficiency can be improved by recursively hiding additional secrets in the shares of M. However, since each bit is mapped into 3 shares, in order to take advantage of the recursive technique, the secrets at each step must increase by a factor of 3. We can then hide the following secrets M1, M2, and M3 in shares of M as follows (figure 3.1): Figure 3.1: Recursive hiding of smaller messages in the shares of larger messages Note that at each step we have used the shares of the previous smaller messages to create the shares of larger messages; these smaller shares are denoted in bold. Also, we have distributed the shares at each step so that no player has access to all the shares of the smaller messages and hence, every message (seen from a permessage view point) remains secure until at least two players come together. This approach is different from that discussed in [24], where the shares of smaller messages were all accumulated into one of the larger shares instead of distributing them among all the 33 possible players. As a result in [24], any player having that share which encodes the smaller images could in principle recreate these smaller images without the help of the other player, which in some cases might not be acceptable. Therefore, our new approach is more secure for certain applications. Figure 3.2 illustrates the development of recursion tree for example 1. It is seen that the tree forms a ternary structure with nodes at each level giving rise to 3 nodes in the following level, and the nodes shown in bold are the nodes carried over from the previous level. The shares are distributed from left to right one at a time, i.e. if we number the tree leaves starting from the left as 1,2,3,1,2,3 and so on. Then these numbers denote the player’s number to whom the shares belong. Figure 3.2: Illustration of recursion tree for example 1 (partial illustration) Also seen in figure 3.1 is that using recursive hiding of secrets, we have been able to encode 13 bits of M1, M2, and M3 and 27 bits of M into shares of M alone. As a result the efficiency, considering a ternary alphabet, is 13+27 3×27 = 40 81 1 2 (i.e. 50% compared to 33% in the nonrecursive case). If one considers the binary representations of each character then each share now conveys 13+27 5×27 = 8 27 1 3 bits. Compared to 1/5 bits in the conventional approach, this is an almost 40% increase in efficiency. 3.2 Proposed koutofn Recursive Secret Sharing Scheme A secret sharing scheme based on polynomial sampling and interpolation in finite field is discussed in [3], where the secret is mapped as a point at x = 0 and k − 1 34 additional points are chosen randomly and uniformly from the field. These k points are then interpolated to generate a polynomial of degree k−1, which is then sampled at n points (except at x = 0). These n samples, the (x, y) points, are distributed as shares among the players. In order to reconstruct the secret, any k of the shares (points) can be interpolated to regenerate the k − 1th degree polynomial, which can then be sampled at x = 0 to retrieve the secret. At each level of recursion, in our proposed scheme, we also use polynomial interpolation and sampling. However, we will be hiding additional pieces of information within the shares of the original secret. We work in a finite field Zp, where p is a prime and it is public knowledge. Further, we assume that a secret S is represented as string of numbers S = s1s2 . . . sr, where each si 2 Zp and S = r = nh, where S denotes the length of secret S for some integer h. For example, if we assume that the secret is a text message composed of ASCII characters, then it can be represented as a string of numbers less than p = 257 [6]. Furthermore, assume that we have another string denoted by M = m1m2 . . .mx, mi 2 Zp, where M = x = nh−1 n−1 , to be hidden within the shares of the original secret S. For instance, in example 1, n = 3, h = 3, and hence in the shares of the original secret S = nh = 33 = 27 bits long we were able to encode nh−1 n−1 = 13 additional bits of information. The upper limit on the number of additional “pieces” of information that can be encoded within the shares of the original message of size nh, in the proposed scheme, is nh−1 n−1 . We also observe that the recursive schemes proposed in this paper (as in section 2), forms a nary tree structure where the original secret forms the leaves of the tree and the hidden information forms the internal nodes. Consequently, a comparison can be made between the efficiency of conventional secret sharing schemes and tree 35 based recursive secret sharing schemes. In information theoretic secret sharing schemes, each share of the secret is of the same size as the secret itself, as a result, for a secret of given size nh, each of the n shares are of size nh. This results in an efficiency of c = nh nh·n = 1 n, where denotes efficiency and subscript c denotes conventional (information theoretically secure) scheme. A tree based recursive scheme hides nh−1 n−1 pieces of additional information within the n shares each of size nh. Consequently, the efficiency of tree based recursive schemes is r = 1 nh·n · (nh + nh−1 n−1 ). A recursive tree based secret sharing improves the efficiency of conventional secret sharing methods by a factor of e = 1 + 1 nh · (nh−1 n−1 ). Which is one plus the ratio of the number of additional pieces hidden within the pieces of the original secret to the number of pieces of the original secret. With the above in mind, the proposed scheme works as follows: Inputs: Original secret  S = s1s2 . . . sr; message to be hidden  M = m1m2 . . .mx; n; k and p where sa, mb 2 Zp, 1 a r, 1 b x, r = nh and x = nh−1 n−1 . Algorithm 3a. Creation of shares 1. Choose k − 1 random numbers rl 2 Zp, l = 1 to k − 1 uniformly and randomly. 2. Interpolate a polynomial p11(x) using k points (0,m1) and (l, rl), 1 l k−1. Let pic(x), denotes the cth polynomial at the ith level in the recursion (tree). 3. Sample p11(x) at n points: D1 11, D2 11, . . ., Dn 11, where in Dk ic, k refers to the index number of the sample as well as the xcoordinate at which the sample is taken; i and c are the same as noted in point 2 above. 4. Initialize a = 1, b = 2. 5. For i = 2 to h 36 (a) c = 1 (b) For k = 1 to ni−2 For j = 1 to n if i < h i. Interpolate pic(x), using points (0,mb), (j,Dj (i−1)k) and k − 2 randomly and uniformly chosen numbers. ii. Sample pic(x) to generate samples Dq ic, 1 q n. iii. b = b + 1, c = c + 1 else i. Interpolate pic(x), using points (0, sa), (j,Dj (i−1)k) and k − 2 randomly and uniformly chosen numbers. ii. Sample pic(x) to generate samples Dq ic, 1 q n. iii. a = a + 1 and c = c + 1 iv. Distribute Dq ic, 1 q n to players from 1 to n, respectively. We do not consider the final shares as a part of the tree. Hence, the tree has only nh leaves, which are then interpolated and sampled at n points to generate the final shares. Algorithm 3b. Reconstruction of secret and hidden information 1. For 1 c r (a) Interpolate k shares Di hc, 1 i k to generate polynomial Phc(x). (b) Sample Phc(x) at x = 0 to retrieve sc. 2. For i = h − 1 down to 1 (a) j = 1, b = 1, q = ni−1−1 n−1 + 1 (b) For c = 1 to ni i. Sample P(i+1)c(x) at point x = b, denote as Dx ij . 37 Figure 3.3: Illustration of application of algorithm 1 for n = 4. 38 ii. b = b + 1 iii. if c mod n = 0 A. Interpolate (x,Dx ij), x = 1 to n to generate Pij(x). B. Sample Pij(x) at x = 0 to retrieve mq. C. x = 1, j = j + 1, b = 1, q = q + 1. The share reconstruction process traverses the tree from leaves to the root (figure 3.3), while the reconstruction process retrieves the secret and the hidden messages in a last in first out manner. In figure 3.3, R denotes a vector of length k − 2 numbers randomly and uniformly chosen from the field. Note that R0 is a k−1 element vector in the first step (level 1). Each instance of R is independent. 3.3 Proposed Computational Secret Sharing Scheme We can use the proposed scheme of section 3.2 as a computational secret sharing scheme by dividing the secret into smaller pieces and then recursively encoding the pieces as suggested in algorithm 3a. However the difference would be that instead of inner pieces being that of a different message to be hidden, they would be pieces of the secret itself. If we create m pieces pi of the secret, then in the ideal case m should be equal to nh+1−1 n−1 , for a given n and some integer h, where pi = S m . Further, the tree has nh number of leaves which then yield n shares each, resulting in n · nh number of shares. Therefore, each of the n player receives a share of effective size nh · S m = (1 − m−1 m·n ) · S. This represents a reduction in share sizes; for example, if the secret is broken into m = 15 pieces and n = 2, then the effective share size for each player is (1 − 14 30 ) · S compared to S in the conventional case. However, the above holds only when mnh+1−1 n−1 . A plot for the values of m for n = 2 to 5 and h = 1 to 5 is shown in figure 3.4a. Figure 3.4b shows how the size of resulting shares change relative to the size of 39 (a) (b) Figure 3.4: (a) Plot of possible values m can take as n and h vary. (b) Plot of new effective share sizes relative to the size of the original secret S versus m and n (only a few values are shown). 40 original secret. The plot in figure 3.4b is drawn for the values that the number of pieces m can take from that shown in figure 3.4a against the number of players n. The zaxis is the ratio of the new effective share size to the share size in a conventional scheme, i.e. 1 − m−1 m·n . The efficiency improvement factor for the ideal case is given by 1 + 1 nh · (nh−1 n−1 ). A plot of how efficiency improvement factor varies as a factor of n and h is given in figure 3.5. Figure 3.5 shows that given a n, more the height h, better the efficiency improvement factor. An efficiency factor of 2 implies a 50% reduction in share size compared to information theoretically secure schemes where each share is of size S. Therefore, new effective share size is given by SN = S efficiency improvement factor , where S is the original secret size. Optimal number of pieces: In practice, since the number of pieces may be arbitrary, the nary tree may or may not be complete, and one may need to stuff additional (dummy) pieces to complete the tree. The number of pieces required to complete the tree is a factor of the height h of tree. Further in order to maximize the information efficiency, we would like to determine what h one would want to choose. In general, we assume that we are working in a decimal base, i.e. each secret may be represented as a sequence of integers 09. This means that the smallest piece one may create is a single digit number. Also, note that the prime p used in algorithm 3, can only be chosen after the piece sizes are decided upon. For example, in the case of smallest possible pieces (single digits) a prime p = 11 would suffice. However, if one was to choose each piece to be of two digits in size, then a prime p = 101 would be needed. Let us redefine m to be the number of smallest pieces in the original secret, for example the number of digits in the secret. In order to understand how stuffing of pieces would work, assume that we are working in a binary field; therefore smallest piece that can be created is of one bit 41 (a) (b) Figure 3.5: (a) Plot of efficiency improvement factor as a function of n and h. (b) Plot of efficiency improvement factor as a function of n and h (larger values). 42 in size. Since the total number of pieces has to form a nary tree, if we denote by m the number of pieces of the original secret then m may not always be an integral multiple of nh+1−1 n−1 for the given value of n and any value of h. As a result, in order to complete the tree, we may either need to adjust the piece sizes (to more than one bit in size, in turn changing the prime p required in algorithm 1), and/or stuff the secret with dummy bits. Below, we will investigate both the cases and determine which case results in a better efficiency. Assume single bit pieces and bit stuffing (if required), and that m and n are fixed. Since the last level of the tree has nh number of leaves, each player will receive nh bits. Therefore, the value of h chosen will decide the number of bits stuffed to complete the tree. Also, since we would want to have each piece as a single bit, we note that if nh > m, then each player will receive more bits than originally in the secret, which is not desired. Consequently, one of the criterions in choosing h is that nh m, where m is the total number of bits in the original secret. This gives the upper bound on h. Also, to maintain the requirement of one bit per piece, the total number of nodes in the tree must be greater than the total number of bits in the original secret. Hence, nh+1−1 n−1 m gives the lower bound for h. On the other hand, if we assume that each pieces may be larger in size than the minimum size possible, then starting from the upper bound on h, h = blogn(m)c, we could gradually reduce the tree height by one at each step and then readjust the shares so as to complete the tree, with or without stuffing of pieces. Then a comparison could be made between the resulting efficiencies. Note that as we change the piece sizes, we may require changing the prime p for algorithm 1. For example, again consider working in a binary field and that the original secret consists of m = 8 bits and let n = 2, so that the algorithm constructs a binary tree. Since, the smallest piece possible is a bit, if one was to construct a recursion tree with one bit pieces, then the tree at the 4th (last) level requires 7 bits stuffed to be 43 (a) (b) Figure 3.6: (a) Shows how the height h changes as a function of the number of pieces m and the number of players n. (b) Shows how the efficiency improvement factor changes as a function of arbitrary m and n. In both (a) and (b) each piece is taken to be of the minimum size possible (ex. a single digit when using a decimal base). 44 complete. When these bits are divided into shares, since the last level now has 8 bits, it would result in each of the two shares being 8 bits. However, if one was to reduce the tree height by one level by adjusting the pieces to be of two bits each then the 3rd (last) level of the tree would require some pieces to be stuffed and would result in 4 × 2 bits for each share. This does not result in any efficiency improvement yet. However, if one was to further reduce another level of the tree, so that the tree now has only two levels, then each piece would be of 3 bits each, where only 1 dummy bit is stuffed. Since, now the 2nd (last) level of the tree has 2 × 3 bits, each share would be of 6 bits each. This is a reduction in share sizes compared to a nonrecursive scheme. Consequently, in general, in order to maximize efficiency, the height h of the recursion tree must be chosen such that it minimizes d S nh+1−1 · (n − 1)e × nh for 1 h blogn(m)c. Here, m denotes the number of pieces of the smallest size present in the original secret corresponding to the base that we working in. The section 3.5 shows plots for the maximum efficiency improvement achievable and the corresponding height against the number of (smallest) pieces present in the original secret. 3.4 On Security of the Proposed Schemes When applying a secret sharing scheme based on polynomial interpolation and sampling (Shamir’s scheme) where k−1 random numbers are interpolated along with the secret to generate a kth degree equation, the samples (taken appropriately, excluding at x = 0) do not provide any information about the secret which is mapped as point at x = 0. The first step in the algorithm, hence, directly executes Shamir’s secret sharing scheme. The shares so generated, can then be treated as random numbers and are reused as one of the points in further encoding of secrets. As a result, during the share construction (assuming previous shares are treated 45 as random number), we have used k−1 random numbers at each step (a length k−2 vector R and a sample Dk ij from the previous iteration, see figure 3.3). This along with the secret (or secret piece) are used to interpolate a kth degree polynomial which is sampled at n points at each step. Now, if the dummy pieces are preagreed pieces (such as special characters or zeros), then each player would know, two points, one the sample given to him and the other the dummy piece itself used during interpolation, and hence would need only k− 2 additional players to collude to recreate the node corresponding to that polynomial. This may lead to partial disclosure of secret. As a result, the dummy pieces chosen to stuff must be uniformly and randomly chosen from the field. Side information regarding the number of dummy pieces stuffed would convey to the players, how many trailing pieces are to be discarded. Assuming that the dummy pieces are randomly and uniformly chosen elements from the field, it is clear that k players need to collude in order to recreate the nodes in the last level of the tree and then proceed from there. However, since we have encoded additional secrets or created smaller effective shares, the overall security of the scheme is inversely proportional to the efficiency improvement factor. This security is in turn relative to the security achieved in information theoretically secure schemes. For example, if the efficiency improvement factor is 1, then the security of the scheme is the same as the security of an information theoretically security scheme. However, if the efficiency factor is 2, then the security is half relative to that provided by an information theoretically secure scheme. 3.5 Efficiency and height plots Figures 3.7 and 3.8 show the maximum efficiency improvement factor that can be achieved and the corresponding height h for it as the number of (smallest) pieces 46 vary in the original secret. Figure 3.7: Plot of maximum efficiency improvement factor with corresponding height h against m. Here n=2. 3.6 Recursive hiding in threshold visual cryptography The idea described in [24] is applied to images to develop a recursive 2 out of 3 visual cryptographic scheme [85]. For this purpose we divide each pixel into 3 subpixels as shown in figure 3.9. As seen in figure 3.9, when the partitions of white pixel are stacked upon each other one third of the pixel is white and hence appears light gray to human eye. However, the subpixels of the black pixel are so arranged that when 2 shares are stacked together, the resulting pixel is completely dark. Yet another way to create subpixels would be to have only one third of the subpixel colored dark. Therefore, when subpixels of a white pixel are stacked upon each other 47 (a) (b) Figure 3.8: Plot of maximum efficiency improvement factor with corresponding height h against m (a) n=3 (b) n=5. 48 Figure 3.9: Possible partitions for black and white pixels they would appear light gray and the stacking of the subpixels of a black pixel would result in dark gray. However, the human eye can perceives the difference between gray and completely dark pixels better than two different shades of gray itself. Hence our construction of subpixels in figure 3.9. As an example to make the working of the proposed scheme clear, we present in figure 3.10 the encoding of a 3×3 pixel image such that each share of the 3×3 image contains shares of a 1 pixel secret image and a 3×1 pixel secret image. The subpixels of an original pixel can be represented as a matrix. For example if the original pixel was black then the 3 shares representing it may be written as [100]T , [010]T , and [001]T . Since, these matrices can be stored as a sequence of bits; it implies that there is an expansion by a factor of 1×9=9 because the original black pixel can be represented as a single bit 1, while each of the 3 shares consists of 3 subpixels requiring 3 bits for their representation. If we were not to perform a recursive hiding, we would be creating 9×9=81 bits for each share corresponding to 9 pixels of the original image (figure 3.10). However, using recursive hiding we have been able 49 Figure 3.10: Illustration of recursive hiding of secret images in shares of larger original image using a 2outof3 threshold scheme to hide additional 1×9+3×9=9+27=36 bits of information in those 81 bits, thereby increasing the information conveyed per share of the original image. Higher efficiency could be achieved if we were to number the subpixels as 0, 1, and 2 and use prefix coding to represent these numbers and store them instead of storing the matrices or pixels. This would only lead to a per bit expansion factor of 5, instead of 9 and the efficiency improvement will be similar to that in the case of text, i.e. an improvement of 40%. Figure 3.11 shows the application of the proposed scheme to three images, smallest image being a Smiley face, next being a watermark and the third and the largest image being that of Lena. Figure 3.12 shows the reconstruction of hidden images after appropriately extracting the smaller shares from the shares of Lena. 50 Figure 3.11: Illustration of the process of recursive hiding of secrets in shares of larger original image 51 Figure 3.12: Illustration of regeneration of smaller images from the shares hidden inside the shares of the original larger image 52 3.7 Conclusions This paper has presented a recursive scheme for multisecret sharing, which in turn can be used to create shares of smaller sizes by dividing the secret into smaller pieces and then simulating multisecret sharing. The scheme forms a recursion tree and does not require any encryption key, unlike previously proposed computational secret sharing schemes. The efficiency and security of the scheme have been analyzed and it is seen that a efficiency improvement is achieved with a tradeoff against the security of the scheme and an inverse relation between the two has been established. The proposed scheme has widespread applications in secure distributed storage and information dispersal protocols by substituting the secret with the data or pieces of the data that is to be distributed. Further, it may be used as a steganographic channel to transmit hidden information in secret sharing, which may be used for authentication and verification of shares and the reconstructed secret itself. 53 CHAPTER 4 INTERNET VOTING PROTOCOL BASED ON IMPLICIT SECURITY One of the important applications of distributed data storage is in internet voting. In this chapter we see how secure distributed data storage can be leveraged to meet the requirements of internet voting systems. The basic idea involves treating a voter’s cast ballot as data and distribute pieces of this vote among different servers belonging to different authorities. The contrasting requirements of confidentiality and verifiability are fundamental to internet voting protocols. The requirements are such that they preclude a general solution but satisfactory solutions are possible for specific applications. Internet voting represents an application area where these requirements may be met using tools of cryptography and a distributed security architecture. In spite of certain limitations of current methods [86], Internet voting is expected to be used widely in the next few years [87] since it offers ease of access to senior citizens, disabled people, people traveling on electionday, citizens living abroad who are eligible to vote, and eliminates the hassle of obtaining an absentee ballot in advance. It also encourages larger participation on the part of the younger generation that has become accustomed to online banking, online shopping, secure email transactions and secure online storage. The acceptance of Internet voting will become wider if limitations of the current methods are lessened. Several voting schemes are to be found in the literature [88, 89, 90, 91, 92, 93, 94, 86]. Conventionally, blind signatures [95] are popular for achieving ballot secrecy, while anonymization is performed using MixNets that use multiple encryptions, decryptions and random permutations [89, 90, 91]. Other schemes treat votes as ecash 54 [94, 93, 96], exposing the voter and his ballot if a recast is attempted. Some schemes propose to protect intermediate election results [88] by asking the voter to encrypt the vote. However, it is not clear how the decryption key is to be provided at the time of vote counting. Often postal services are used as a part of the scheme to provide untappable channels [92], but their use would clearly defeat the purpose of Internet voting that aims at eliminating postal delays and paperwork. Further, such schemes are not suitable for large scale elections due to the overhead. In order to minimize the effects of viruses, Trojan horses and denial of service attacks [47], Internet elections are held over a period of several days, as in the case of Switzerland for two weeks [97], and in Estonia [98] for a few days. Despite the long voting periods, undecided voters and conservative people tend to wait until the last day to cast their ballots which causes the network attacks to be aimed at this last minute voting period. Further, proper design of ballots is a critical element in their effectiveness. Poorly designed interfaces and ballots can lead to voting errors (similar to errors in choosing wrong options in online bookings and purchases), with the unacceptable effect of a wrong candidate receiving the vote. Instances of such mistakes were seen in the California Recall elections [99]. Most existing protocols do not make any provisions to correct such mistakes. One could propose that voting be allowed only after the campaigning for the elections is over. But to ensure that campaigning cease on a certain date might be impossible to enforce, and the public information available on the media can lead people to change their mind. Further, such an idea would not eliminate mistakes in voting. Therefore, we believe that election systems should be designed to allow ballot recasts, i.e. a voter should be able to change his vote either during the period that the voting is open or during a specified time frame when ballot recasts are allowed. This will encourage a larger participation by voters in the early voting process because they 55 will now have an option of changing their votes if they wished to do so. We further believe that ballot recasts will act to deter vote selling, because the vote buyer cannot be sure if the seller has sold his vote to multiple parties. The scheme described in this paper allows voters to recast ballots, while protecting the intermediate election results using an implicit data security architecture. Rabin [6] spoke of distributed storage of data over different servers in order to protect the data. He, however, discounted the use of secret sharing schemes for data distribution because of their space inefficiency and resorted to an insecure method of distribution. However, since then several space efficient secret sharing schemes [23, 100, 85, 24] have been developed based on the idea of recursion. In recursive secret sharing, the original secret is broken into several pieces and the pieces are divided into shares one by one, by repeatedly using already generated shares. Since we only have a small secret, namely the cast ballot, to share between different servers, we use recursion to hide additional information within the shares to provide a mechanism for validation of shares. We call this model of security as implicit data security, where the word implicit signifies that no explicit encryption keys have been used, but the data is stored in a manner that every piece is implicitly secure. The problem of recast of ballots was considered in [101], where the intermediate election results are protected using secret sharing. However, the scheme has three shortcomings: the lack of redundancy in the system, where the loss of a single piece of the ballot results in a loss of that vote; lack of verifiability for the pieces brought together during the tallying process; and there is no mechanism for cheater identification. 4.1 Our Contribution We overcome the limitations of the scheme [101] and use the technique of generating anonymous IDs for ballot secrecy that eliminates the need for MixNets. These 56 anonymous IDs then become the anchor point for the votes and are recorded along with the cast ballot. Consequently, to change a vote, the voter only needs to send the anonymous ID and the new ballot, which replaces the old one in the system. Our contribution here is threefold: First, we propose a new recursive method for dividing the votes into n pieces such that loss of some of the pieces (less than a threshold k) does not result in the loss of the vote. Second, we present a method for validation/verification of the ballot pieces that are brought together at the time of tally. Third, a variant of the protocol is presented that can detect invalid ballot pieces as well as expose the cheating server. The improvements are implemented using recursion such that it does not lead to an increase in share size and stores only a small constant amount O(1) of information for verification. 4.2 The Proposed Protocol The protocol proceeds as follows: The voter contacts the registration authority (RA) using his real identity. He then randomly generates a number, blinds it and gets it signed by the RA. Upon receiving this signed number, he unblinds it and uses it as his anonymous ID. Consequently, to cast his ballot, the voter does not blind the ballot but simply uses the anonymous ID to vote. However, in order to protect the intermediate election results, the voter divides his ballot into several pieces and sends them to distinct servers. This provides a distributed security architecture. Before dividing the ballot, the voter encodes certain secret information within the pieces of the ballot and distributes a oneway hash of the secret information among the servers with the ballot pieces. We present a new procedure for dividing the votes into pieces such that additional 57 information may be hidden within the pieces. This hidden information can be used to validate the pieces at the time of ballot counting. Further, the additional information does not increase the share sizes and does not increase algorithmic complexity of the scheme being used. The procedure for registration and creating anonymous ID is similar to that described in [101]. Here, gx is the public key of the registration authority (RA) and x is its corresponding private key. All computations are performed modulo a prime p where g is a primitive root that is taken to be public knowledge. Further, mx denotes the signature on a message m, where 1 < m < (p − 1). A oneway hashing function h(·), used for verification, satisfies following properties [102]. 1. h(·) when applied to an argument produces a fixedsize output. 2. Given x, it is easy to compute h(x); but given h(x) it is computationally infeasible to determine x. 3. h(·) is collision free, i.e. it is computationally infeasible to find distinct x and y with h(x) = h(y). At the time of registration the voters send a oneway hash of the anonymous ID that they have generated. The registration authority maintains a list L of all the oneways hashes that have been used in order to avoid collisions between the anonymous IDs. It is assumed that the voter has n servers at disposal to cast his ballot; and any k of them are required to reconstruct the ballot, k n. Every vote V is assumed to be a number in Zp that is assigned to a candidate and is public knowledge. 4.2.1 Registration Phase 1. The voter randomly and uniformly chooses a number rid 2 Zp. 58 2. The voter randomly and uniformly picks a blinding factor b and computes u = rid · gb mod p. He sends a 3tuple to the registration authority (RA): u, Vid, h(rid); where Vid is his true identity in cleartext. 3. If the registration authority finds the voter eligible to vote (by checking his Vid, it checks if h(rid) is already present in L. If the hash is found, then the RA requests the voter to repeat steps 1 & 2. If the hash is not found, the RA adds the new h(rid) to L and sends to the voter ux. 4. The voter retrieves the signed rid as follows: (a) Using the RA’s public key compute v = (gx)b mod p and v−1. (b) Compute r = u · v−1 = (rid)x mod p. 5. The voter randomly and uniformly chooses an exponent c and generates: d = ((rid)xgx)c and sends it to the RA. 6. The RA sends back: e = d 1 x . 7. The voter compares if (rid · g)c is equal to e, he accepts (rid)x as valid signature. The voter at the end of registration phase has a valid signed rid that may be used to vote and proceeds to cast his ballot as follows. 4.2.2 Voting Phase 1. The voter contacts the online polling station using a secure shell (SSL) connection over the Internet and sends: (rid)x mod p, rid. 2. The polling station verifies the validity of rid by conducting a zeroknowledge challenge/response protocol with the RA to validate the signature. 59 3. If the signature is found valid the online polling station stores the voter’s rid and provides the voter with a secure session key that he can use to cast his ballot by contacting the n voting servers. 4. The voter generates a random secret message M. 5. The voter chooses the ballot V corresponding to the candidate that he wants to vote for and divides the ballot into n pieces using Algorithm A(V,M,k,n) (detailed below) which returns a set of n points, (j + k − 1,Dj), 8 j from 1 to n. 6. The voter sends to every server Sj : ((j + k − 1,Dj), h(M)), for all j = 1 to n. Algorithm A(V,M,k,n) takes as input: ballot V; message M; and integers k and n. Algorithm A(V,M,k,n)  Dealing Phase 1. Divide M into k − 2 pieces: m1, m2, ..., mk−2 2 Zp, such that concatenations of mis, taken in order, is equal to M. 2. Choose randomly and uniformly a number r1 and compute r2 = m1 · (r1)−1 mod p. 3. For i = 2 to k − 2. Compute ri+1 = mi · ( i j=1rj)−1 mod p. 4. Map ri as points (i, ri), 1 i k − 1. 5. Map the ballot V as point (0, V ). 6. Interpolate points (0, V ) and (i, ri), 1 i k − 1 as the polynomial f(x). 7. Evaluate n samples: Di = f(x), where x = k, k + 1, k + 2, . . . , (k + n − 1). 60 8. Output (i + k − 1,Di), 1 i n. To understand the working of Algorithm A, consider the following example. Example 1. Let the voter have n = 8 servers at his disposal to cast his ballot. Each of these servers may be at separate locations and controlled by an independent authority. Let k = 5 be the threshold number of servers that must remain honest and come together to reconstruct a vote correctly. Further, let the prime field chosen to work in be p = 257. Each candidate on the ballot is assigned a candidate ID which may be a random number from the field Zp. Consequently, to cast a ballot the voter must send this number (the candidate ID) to the servers. Assume, in order to implement authentication the voter chooses to hide a secret message M =“USA”, in his ballot. The message to be hidden is entirely a choice of the voter and is it may as well be “RUSSIA”. Let us assume that the voter wishes to cast a ballot to a candidate with candidate ID 157. Therefore, the cast ballot is V = 157. If we consider the voter’s secret message M=“USA” to be formed of ASCII characters, then M can be divided into three pieces such that m1 = 85, m2 = 83 and m3 = 65, such that each belongs to Zp. Under such conditions, the algorithm proceeds as follows, 1. Divide M into 3 pieces: m1 = 85, m2 = 83 and m3 = 65. 2. Choose randomly and uniformly a number r1 = 101 2 Zp and compute r2 m1 · r−1 1 85 · (101)−1 85 · 28 67 mod 257. 3. Compute r3 m2 · (r1 · r2)−1 83 · (101 · 67)−1 83 · 127 4 mod 257. 4. Compute r4 m3 · (r1 · r2 · r3)−1 65 · (101 · 67 · 4)−1 65 · 96 72 mod 257. 61 5. Map the ris as points (1, r1) = (1, 101), (2, r2) = (2, 67), (3, r3) = (3, 4), and (4, r4) = (4, 72). 6. Map the ballot V as point (0, V ) = (0, 157). 7. Interpolate the five points, (0, V ) and (i, ri), to generate a 4th degree polynomial, f(x) = 148x4 + 3x3 + 251x2 + 56x + 157 mod 257. 8. Evaluate f(x) at eight points (starting from x = k): D1 = f(5) = 128, D2 = f(6) = 240, D3 = f(7) = 173, D4 = f(8) = 160, D5 = f(9) = 131, D6 = f(10) = 227, D7 = f(11) = 29, and D8 = f(12) = 100. 9. Return (j + k − 1,Dj)s; where k = 5 and j = 1 to n. Note that in the above example, the voter has recursively encoded the pieces of his secret message into the shares of the ballot. Steps 24 generate 4 points that are required to simulate the 4 randomly chosen points in the Shamir’s secret sharing scheme. Step 7 executes Shamir’s scheme assuming ris are randomly chosen points and the ballot as the ycoordinate at x = 0. Finally, the algorithm returns 8 shares of the cast ballot, these shares have the secret message hidden within them. Each of these eight pieces is sent to a different voting server, along with the hash of the secret message. 4.2.3 Counting Phase Assume a threshold k, number of servers pool their pieces, (j + k − 1,Dj)s, together to reconstruct the votes. (If more than the threshold number of servers are available, then it may provide a mechanism for crosschecking and verification.) 1. Use Algorithm A  Reconstruction Phase (detailed below) to reconstruct the hidden message and the polynomial f0(x). 2. Concatenate m0 js in order to reconstruct M0. 62 3. Compute h(M0). 4. If h(M0) is equal to h(M), then the shares are valid and f(x) = f0(x). 5. Retrieve, the cast ballot as V = f0(0). Algorithm A  Reconstruction Phase takes as input a set of points (shares) and returns the pieces m0 j of the hidden message and a polynomial f0(x) constructed by interpolation of the shares. Algorithm A  Reconstruction Phase 1. Interpolate the shares (points) to reconstruct f0(x). 2. Sample f0(x) at points x = 1, 2, . . . , k − 1 to retrieve r0i , 1 i k − 1. 3. Reconstruct the hidden message as follows: (a) Do for j = 1 to k − 2: m0 j = i=j+1 i=1 r0i mod p. 4. Return m0 j , 1 j (k − 2), and f0(x). Example 2. Recall from example 1, we had created eight shares (or pieces) of the cast ballot and each of these pieces were sent to a different voting server. After the polling is closed, the servers combine their pieces to recreate the cast ballot. Since only k = 5 pieces are required for recreation, assume that the 5 pieces that are brought together for reconstruction of the vote are (6,D2) = (6, 240), (7,D3) = (7, 173), (9,D5) = (9, 131), (11,D7) = (11, 29), and (12,D8) = (12, 100). The reconstruction proceeds as follows, 1. Interpolate the shares to reconstruct the 4th degree polynomial f0(x) = 148x4+ 3x3 + 251x2 + 56x + 157. 63 2. Sample f0(x) at x = 0 to retrieve the ballot V 0 = f0(0). (We denote the retrieved ballot as V 0 because its validity is yet to checked, which is done in the following steps.) 3. Sample f0(x) at 4 points x = 1, 2, 3, 4: r01 = f0(1) = 101, r02 = f0(2) = 67, r03 = f0(3) = 4, and r04 = f0(4) = 72. 4. Reconstruct the pieces of the hidden message, (a) m0 1 = r01 · r02 = 85. (b) m0 2 = r01 · r02 · r03 = 83. (c) m0 3 = r01 · r02 · r03 · r04 = 65. 5. Concatenate m0 1, m0 2, and m0 3 to reconstruct M0. 6. Evaluate the hash: h(M0). 7. If h(M0)=h(M), then the shares are valid and the vote, V 0 = V , is counted in the tally. If more than 5 servers come together to recreate the ballot, then any 5 shares can be chosen at random and their validity checked. If the shares turn out to be invalid, then another set of 5 shares can be checked or the extra share can be used to cross check the reconstructed ballot. However, for detection of invalid pieces, only 5 shares are required. 4.2.4 Recasting of ballot If a voter wishes to change his vote, he may do so by contacting the online polling booth using his rid to authenticate himself and follow the procedure described above. The servers will overwrite his previously cast ballot partitions if any. 64 The protocol has been able to detect attempts of cheating by validating the pieces of the cast ballot. 4.3 Security of the proposed protocol The security of the proposed protocol depends on the security of the pieces created for the votes. First, in the dealing process we recursively encode the secret message M. We also provide each server with h(M). Hashing functions are known to be computationally secure and do not provide any information about the hidden message M, in practice. Since the message M is a randomly chosen secret message and r1 was randomly and uniformly chosen from the field, we may consider (i, ri)s as random points. We use these random points to interpolate a polynomial with the vote as the coefficient free term (as a point mapped at x = 0). This polynomial is then sampled at n new points. If k − 1 servers collude to cheat, they would have no knowledge of the kth point and by the properties of interpolation, all the polynomials will remain equally probable and therefore, all the ballots will remain equally probable, i.e. if there are m valid ballots, then each of them have a probability of 1 m. Now assume that an extremely powerful server is able to break the hashing function determining mis. The next task, in order to determine the vote, is to determine rjs. But it turns out that knowledge of M does yield all the values of rjs as shown below: m1 is chosen to be a product of two numbers r1 and r2, here r1 is randomly and uniformly chosen from the field and therefore given any number y from the field Pr(y = r1) = Pr(y = r2) = 1 p . Hence, the pieces of m1 are secure. However, ri, 3 i k − 1 may be determined, using the mis. Further, the voting protocol uses ris as points at x = 1 to k − 1 along with the ballot V at x = 0, i.e. V is the coefficient free term in the polynomial. It turns out that since we cannot determine r1 and r2 with a probability greater than a random guess, all polynomials remain equally probable, even if M is known. The 65 last step of division of ballots into pieces, although not the same, is close to Shamir’s implementation of secret sharing [3]. Figure 4.1 shows two graphs. Graph (a) illustrates the process of polynomial interpolation using (0, V ) and (i, ri), i = 1 to 8. Graph (b) depicts the fact that if M is known (by breaking of the hashing function), then r1 and r2 remain undetermined and do not reveal any information about the ballot because all the polynomials remain equally probable; a few polynomials are shown for illustration. Figure 4.1: (a) Polynomial interpolation using vote (0, V ) and (i, ri), i = 1 to 8. (b) Lack of knowledge of (1, r1) and (2, r2) leaves the polynomial undetermined; hence the vote remains secure. 66 The protocol further satisfies the various requirements of electronic voting: 1. Receipt freeness: The proposed protocol is receipt free. 2. Distributed security: It is assumed that at least one of the servers will be honest (an assumption similar to that of MixNet schemes) providing assurance to the voter that his vote will be counted as cast and any discrepancy will be detected with a very high probability. 3. Fairness: The ballot cast is divided into partitions during the voting phase and unless the partitions are known, all the ballots remain equally likely. 4. Unforgeability: An ineligible voter cannot cast a vote because the vote consists of two parts: the signed ballot and the certified rid. The ineligible voter needs to generate a random ID and forge the signature of the registration authority, i.e. solve for x, given g, p and gx mod p. This is equivalent to solving the discrete log problem, which is computationally infeasible. Leakage of information about the signature exponent is avoided by using a zeroknowledge challenge and response protocol for signature verification. 5. Unreusability: Since every eligible voter is issued only one rid, it cannot be used to cast multiple votes because if that is done the servers will simply overwrite the previous cast ballot for that rid. 4.4 A Variation for Identification of Cheating Server/s To identify the cheating server/s, we make use of the following properties of arithmetic coding (see [102]), 1. Let T = n i=1aipi−1, where 0 ai < p, then b T pj−1 cmod p aj 67 2. Let T = n i=1aip2(i−1) + n−1 i=1 cp2i−1, where −p < ai < p and 1 c < p, then b T p2(j−1) cmod p aj Following minimal changes are required in the proposed protocol to implement cheater identification. 4.4.1 Voting Phase We apply the voting phase as described before until step 5. Step 6 is replaced by the following 2 steps, 1. The voter computes T = n i=1h(Di)p2(i−1) + n−1 i=1 cp2i−1, where c 2 Zp is a random constant. 2. The voter sends to every server Sj : ((j + k − 1,Dj), h(M), T), for all j = 1 to n. Example 3. As an example to illustrate the operations described above, consider that k = n = 3 is the number of shares the voter has created for his ballot, given by (1,D1) = (1, 4), (2,D2) = (1, 2) and (3,D3) = (1, 5). Let p = 11 and let the voter choose a random number c = 7 from field Zp. Then before sending the shares of his ballot to the server the voter needs to compute T = n i=1h(Di)p2(i−1) + n−1 i=1 cp2i−1. For simplicity, consider a dummy hashing function with mapping h(x) = x + 1. Therefore, h(D1) = 4+1 = 5, h(D2) = 2+1 = 3 and h(D3) = 5+1 = 6. Consequently, T = 5 · 112·0 + 3 · 112·1 + 6 · 112·2 + 7 · (112·1−1 + 112·2−1) = 97608. The voter now deposits this value of T on every server along with his ballot. 4.4.2 Cheating Server Identification To identify a cheating server during reconstruction phase, following steps are performed, 68 1. Each server makes the above calculations and based on a majority of results, if c = 0, then Sj is honest, else Sj is cheating and its share is invalid. We see that c = 0 if b T p2(j−1) − T0 p2(j−1) c = h(Dj)−h(D0j ) = 0 mod p, which happens only when Dj=D0j , i.e. the pieces presented by the server are valid pieces. An analysis of the identification protocol can be found in [102], where the solution has been applied to secret sharing. Example 4. Continuing with example 3; assume that 3 servers have come together to reconstruct the ballot. Also assume that the reconstructed vote turns out to be invalid after checking the hash of the retrieved hidden message against the hash deposited by the voter (see example 2), then in order to determine which server is cheating, the servers perform the following. Each server presents its share. Let us assume that sever 2 has cheated and provided an incorrect share such that D02 = 3. Servers 1 and 3 have provided correct shares D01 = 4 and D03 = 5. The servers then compute, T0 = Sjh(D0j )p2(j−1) = 5 · 112·0 + 4 · 112·1 + 6 · 112·2 = 88335. Now for each server Sj , compute c = b T−T0 p2(j−1) c. Therefore, for server 1, j = 1, and c = b97608−88335 p2(1−1) c = 9273 0 mod 11. For server 2, j = 2, and c = b97608−88335 p2(2−1) c = 76 10 mod 11. For server 3, j = 3, and c = b97608−88335 p2(3−1) c 0 mod 11. From the above we observe c 6= 0 for server 2, which had provided an incorrect share, and we have successfully identified the cheating server. 4.5 Conclusions In the proposed protocol for Internet voting, the security of the cast ballot depends on numerous servers and the fairness property is satisfied by the use of a ballot partitioning scheme. No encryption/decryption key is used and there is no explicit 69 encryption of the votes. The partitions of the ballot provide implicit security. It overcomes the limitation of lack of redundancy and verifiability in our earlier scheme. Our proposed method divides the votes into pieces such that loss of some of the pieces (less than threshold) does not result in the loss of the vote. A method for validation/verification of the ballot pieces that are brought together at the time of tally, and, a variant of the protocol is presented that can detect invalid ballot pieces as well as expose the cheating server. Internet voting is just one of the areas of applications of the distributed data storage. 70 CHAPTER 5 CONCLUSIONS We have proposed new efficient data dispersal techniques that have applications in cloud computing, sensor networks and online distributed storage. Chapter 2 described the idea of implicit security. The framework of implicit security, intends to do away with traditional key based encryption systems. It combines the goal of fault tolerance with security by creating unintelligible partitions of data. We proposed efficient algorithms to create these partitions that may be sent to n independent servers for storage. The partitions were created by mapping the data as the coefficient free term of a kth degree polynomial and then computing the roots of the polynomial. These roots were then treated as the partitions of the data. However, all the roots of a polynomial are required to reconstruct the polynomial. Hence to introduce redundancy we created linear combinations of the roots which were then stored as data partitions. A compromise of less than k servers does not reveal any information about the data and n − k servers store redundant partitions. We looked at issues of data integrity once the original data is reconstructed from k partitions. We also looked at the problem of key partitioning by the means of polynomial multiplication in GF(2m). The algorithm presented has low computational overhead and is useful in sensor networks. Chapter 3 extended the scheme presented in the chapter 2 to include the paradigm of information dispersal. We proposed a recursive scheme to achieve this. Recursive schemes work by reusing the data partitions created for a data item to encode other data items. The proposed scheme increases the efficiency of data storage in the sense 71 that same number of partitions now carry more information than before. However, this comes at the cost of reduced security which, however, may be practical for many application. These recursive techniques may be used to send hidden authentication and verification information, acting as a steganographic channel. We also extended the idea of recursive hiding of data to visual cryptography wherein partitions of smaller images were stored within the partitions of a larger image. This can be used for the purpose of Digital Rights Management. Chapter 4 presented the application of algorithms developed in previous chapters to Internet voting. Secure Internet voting presents contrasting requirements of ballot secrecy and verifiability. We meet these requirements to certain extent under the stated assumptions. Tampering of recorded ballots is prevented by recursively hiding verification information within the ballot partitions. The presented techniques are applicable to cloud computing and has the potential to provide long term security of data. Since, a large number of data breaches happen due to weak passwords, unscrupulous employees and improper system configurations, distributing unintelligible partitions of the data over independently maintained servers makes the task of the attacker difficult. Further, a secure addressing scheme may be developed to store data partitions on a randomly chosen subset of servers from a server farm. We have seen that recursive techniques provide a method for verification of reconstructed data. If one recursively stores the hash of the data that is being partitioned then such a technique may be termed as embedded finger printing, in that the fingerprints (hash values) of the data is stored within the data partitions. 72 BIBLIOGRAPHY [1] J. J. Wylie, M. Bakkaloglu, V. Pandurangan, M. W. Bigrigg, S. Oguz, K. Tew, C. Williams, G. Ganger, and P. Khosla, “Selecting the right data distribution scheme for a survivable storage system,” in Technical report CMUCS01120, Carnegie Mellon University, 2001. [2] A. Leventhal, “Tripleparity raid and beyond,” Communications of the ACM, vol. 53, no. 1, pp. 58–63, 2010. [3] A. Shamir, “How to share a secret,” Commununications of the ACM, vol. 22, no. 11, pp. 612–613, 1979. [4] G. R. Blakley, “Safeguarding cryptographic keys,” in Proceedings of AFIPS National Computer Conference, pp. 313–317, 1979. [5] E. Karnin, J. Greene, and M. Hellman, “On secret sharing systems,” IEEE Transactions on Information Theory, vol. 29, pp. 35–41, January 1983. [6] M. O. Rabin, “Efficient dispersal of information for security, load balancing, and fault tolerance,” Journal of the ACM, vol. 36, no. 2, pp. 335–348, 1989. [7] G. R. Blakley and C. Meadows, “Security of ramp schemes,” in Proceedings of CRYPTO 84 on Advances in cryptology, (New York, NY, USA), pp. 242–268, SpringerVerlag New York, Inc., 1985. [8] H. Krawczyk, “Secret sharing made short,” in CRYPTO ’93: Proceedings of the 13th Annual International Cryptology Conference on Advances in Cryptology, (London, UK), pp. 136–146, SpringerVerlag, 1994. 73 [9] S. K. Mishra, S. K. Vemulapalli, and P. Mohapatra, “Dualcrosshatch disk array: A highly reliable hybridraid architecture,” in Proceedings of International Conference on Parallel Processing, pp. 146–149, 1995. [10] J.M. Fray, Y. Deswarte, and D. Powell, “Intrusiontolerance using finegrain fragmentationscattering,” IEEE Symposium on Security and Privacy, vol. 0, pp. 194–203, 1986. [11] Y. Deswarte, L. Blain, and J.C. Fabre, “Intrusion tolerance in distributed computing systems,” IEEE Symposium on Security and Privacy, vol. 0, pp. 110– 121, 1991. [12] M. Waldman, A. D. Rubin, and L. F. Cranor, “Publius: a robust, tamperevident, censorshipresistant web publishing system,” in Proceedings of the 9th conference on USENIX Security Symposium  Volume 9, (Berkeley, CA, USA), pp. 5–5, USENIX Association, 2000. [13] Y. Chen, J. Edler, A. V. Goldberg, A. Gottlieb, S. Sobti, and P. N. Yianilos, “A prototype implementation of archival intermemory,” in Proceedings of the fourth ACM conference on Digital libraries, DL ’99, (New York, NY, USA), pp. 28–37, ACM, 1999. [14] J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, C. Wells, and B. Zhao, “Oceanstore: an architecture for globalscale persistent storage,” in Proceedings of the ninth international conference on Architectural support for programming languages and operating systems, ASPLOSIX, (New York, NY, USA), pp. 190–201, ACM, 2000. [15] W. J. Bolosky, J. R. Douceur, D. Ely, and M. Theimer, “Feasibility of a serverless distributed file system deployed on an existing set of desktop pcs,” in Pro 74 ceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, SIGMETRICS ’00, (New York, NY, USA), pp. 34–43, ACM, 2000. [16] A. Iyengar, R. Cahn, J. A. Garay, and C. Jutla, “Design and implementation of a secure distributed data repository,” Proceedings of the 14th IFIP Internet Information Security Conference, p. 123135, 1998. [17] H. Krawczyk, “Distributed fingerprints and secure information dispersal,” in Proceedings of the twelfth annual ACM symposium on Principles of distributed computing, PODC ’93, (New York, NY, USA), pp. 207–218, ACM, 1993. [18] J. A. Garay, R. Gennaro, C. Jutla, and T. Rabin, “Secure distributed storage and retrieval,” Theoretical Computer Science, vol. 243, pp. 363–389, July 2000. [19] C. Cachin and S. Tessaro, “Asynchronous verifiable information dispersal,” in 24th IEEE Symposium on Reliable Distributed Systems, 2005. SRDS 2005., pp. 191–201, October 2005. [20] A. Parakh and S. Kak, “Recursive secret sharing for distributed storage and information hiding,” in Proceedings of the 3rd international conference on Advanced networks and telecommunication systems, ANTS’09, (Piscataway, NJ, USA), pp. 88–90, IEEE Press, 2009. [21] A. Parakh and S. Kak, “Internet voting protocol based on improved implicit security,” Cryptologia, vol. 34, pp. 258–268, 2010. [22] S. Kak, “Cryptography for multilocated parties,” CoRR, vol. abs/0905.2977, 2009. 75 [23] A. Parakh and S. Kak, “A tree based recursive information hiding scheme,” in 2010 IEEE International Conference on Communications (ICC), pp. 1–5, may 2010. [24] M. Gnanaguruparan and S. Kak, “Recursive hiding of secrets in visual cryptography,” Cryptologia, vol. 26, pp. 68–76, January 2002. [25] A. Parakh and S. Kak, “Online data storage using implicit security,” Information Sciences, vol. 179, no. 19, pp. 3323–3331, 2009. [26] A. Parakh and S. Kak, “Internet voting protocol based on implicit data security,” in Proceedings of 17th International Conference on Computer Communications and Networks, 2008. ICCCN ’08., pp. 1–4, August 2008. [27] A. Parakh and S. Kak, “A distributed data storage scheme for sensor networks,” in Security and Privacy in Mobile Information and Communication Systems, vol. 17 of Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, pp. 14–22, Springer Berlin Heidelberg, 2009. [28] A. Parakh and S. Kak, “A key distribution scheme for sensor networks using structured graphs,” in International Conference on Emerging Trends in Electronic and Photonic Devices Systems, 2009. ELECTRO ’09, pp. 10 –13, December 2009. [29] E. Gehringer, “Choosing passwords: security and human factors,” in 2002 International Symposium on Technology and Society. (ISTAS’02), pp. 369–373, 2002. [30] R. Proctor, M. Lien, K. Vu, and G. Salvendy, “Improving computer security for authentication of users: Influence of proactive password restrictions,” Behavior Research Methods, Instruments and Computers, vol. 34, pp. 163–169, 2002. 76 [31] S. Riley, “Password security: What users know and what they actually do,” Usability News, vol. 8.1, 2006. [32] S. Kak, “On the method of puzzles for key distribution,” International Journal of Parallel Programming, vol. 13, pp. 103–109, 1984. [33] N. Subramanian, C. Yang, and W. Zhang, “Securing distributed data storage and retrieval in sensor networks,” Pervasive and Mobile Computing, vol. 3, pp. 659–676, December 2007. [34] F. Ye, H. Luo, J. Cheng, S. Lu, and L. Zhang, “A twotier data dissemination model for largescale wireless sensor networks,” in Proceedings of the 8th annual international conference on Mobile computing and networking, MobiCom ’02, (New York, NY, USA), pp. 148–159, ACM, 2002. [35] G. Wang, W. Zhang, G. Cao, and T. La Porta, “On supporting distributed collaboration in sensor networks,” in IEEE Military Communications Conference, 2003. MILCOM 2003, vol. 2, pp. 752–757, October 2003. [36] W. Zhang, G. Cao, and T. La Porta, “Data dissemination with ringbased index for wireless sensor networks,” IEEE Transactions on Mobile Computing, vol. 6, pp. 832–847, July 2007. [37] B. Greenstein, S. Ratnasamy, S. Shenker, R. Govindan, and D. Estrin, “Difs: a distributed index for features in sensor networks,” Ad Hoc Networks, vol. 1, no. 23, pp. 333–349, 2003. Sensor Network Protocols and Applications. [38] D. Denning, Cryptography and Data Security. AddisonWesley Publishing Company, 1982. 77 [39] A. Renvall and C. Ding, “A nonlinear secret sharing scheme,” in Proceedings of the First Australasian Conference on Information Security and Privacy, ACISP ’96, (London, UK), pp. 56–66, SpringerVerlag, 1996. [40] T. Claveirole, M. de Amorim, M. Abdalla, and Y. Viniotis, “Securing wireless sensor networks against aggregator compromises,” IEEE Communications Magazine, vol. 46, pp. 134–141, April 2008. [41] A. Eldin, A. Ghalwash, and H. ElShandidy, “Enhancing packet forwarding in mobile ad hoc networks by exploiting the information dispersal algorithm,” Communications, Computers and Applications, 2008. MICCCA 2008. Mosharaka International Conference on, pp. 20–25, August 2008. [42] T. Yuan and S. Zhang, “Secure fault tolerance in wireless sensor networks,” CITWORKSHOPS ’08: Proceedings of the 2008 IEEE 8th International Conference on Computer and Information Technology Workshops, pp. 477–482, 2008. [43] M. H. Dehkordi and S. Mashhadi, “New efficient and practical verifiable multisecret sharing schemes,” Information Sciences, vol. 178, no. 9, pp. 2262 – 2274, 2008. [44] C. KuangHui and C.WenTsuen, “A new group key generating model for group sharing,” Information Sciences, vol. 64, no. 12, pp. 83 – 94, 1992. [45] C.Y. Lee, Y.S. Yeh, D.J. Chen, and K.L. Ku, “A probability model for reconstructing secret sharing under the internet environment,” Information Sciences, vol. 116, no. 24, pp. 109 – 127, 1999. [46] S. Kak, “A cubic publickey transformation,” Circuits, Systems and Signal Processing, vol. 26, pp. 353–359, 2007. 78 [47] “Expanding the use of electronic voting technology for uocava citizens.” United States Department of Defense, May 2007. [48] R. Sinnott, T. Selker, B. Lewis, B. Whelan, J. Williams, and J. McBridem, “Evaluation of voting machine, peripherals and software.” In First Report of the Commission on Electronic Voting on the Secrecy, Accuracy and Testing of the Chosen Electronic Voting System Appendix 2C, Dublin 2004. [49] “Reuters  second life. http://secondlife.reuters.com/stories/2008/05/07/eveonline experimentswithvirtualdemocracy/.” [50] “Second life herald. http://www. secondlifeherald.com.” [51] A. BharuchaReid and M. Sambandham, Random Polynomials. New York: Academic Press, 1986. [52] A. Aho, J. Hopcroft, and J. Ullman, The Design and Analysis of Computer Algorithms. AddisonWesley, 1974. [53] D. Knuth, The Art of Computer Programming Vol 2. AddisonWesley, 1969. [54] T. Moon, Error Correction Coding: Mathematical Methods and Algorithms. WileyInterscience, 2005. [55] S. Kak, “Exponentiation modulo a polynomial for data security,” International Journal of Parallel Programming, vol. 12, pp. 337–346, 1983. [56] L. Dickson, Linear Groups with an Exposition of the Galois Field Theory. Dover Publications, 1958. [57] K. Rosen, Discrete Mathematics and its Applications. McGrawHill, 2007. 79 [58] M. Naor and A. Shamir, “Visual cryptography,” in Advances in Cryptology EUROCRYPT’94, vol. 950 of Lecture Notes in Computer Science, pp. 1–12, Springer Berlin / Heidelberg, 1995. [59] P. Feldman, “A practical scheme for noninteractive verifiable secret sharing,” in 28th Annual Symposium on Foundations of Computer Science, 1987, pp. 427– 438, October 1987. [60] T. P. Pedersen, “Noninteractive and informationtheoretic secure verifiable secret sharing,” in CRYPTO ’91: Proceedings of the 11th Annual International Cryptology Conference on Advances in Cryptology, (London, UK), pp. 129–140, SpringerVerlag, 1992. [61] A. Herzberg, S. Jarecki, H. Krawczyk, and M. Yung, “Proactive secret sharing or: How to cope with perpetual leakage,” in CRYPTO ’95: Proceedings of the 15th Annual International Cryptology Conference on Advances in Cryptology, (London, UK), pp. 339–352, SpringerVerlag, 1995. [62] J. He and E. Dawson, “Multistage secret sharing based on oneway function,” Electronics Letters, vol. 30, pp. 1591–1592, Sep 1994. [63] B. Schneier, Schneier’s Cryptography Classics Library: Applied Cryptography, Secrets and Lies, and Practical Cryptography. Wiley Publishing, 2007. [64] P. Rogaway and M. Bellare, “Robust computational secret sharing and a unified account of classical secretsharing goals,” in CCS ’07: Proceedings of the 14th ACM conference on Computer and communications security, (New York, NY, USA), pp. 172–184, ACM, 2007. [65] V. Vinod, A. Narayanan, K. Srinathan, C. P. Rangan, and K. Kim, “On the power of computational secret sharing,” in INDOCRYPT, pp. 162–176, 2003. 80 [66] P. B´eguin and A. Cresti, “General short computational secret sharing schemes,” in Proceedings of the 14th annual international conference on Theory and application of cryptographic techniques, EUROCRYPT’95, (Berlin, Heidelberg), pp. 194–208, SpringerVerlag, 1995. [67] L. Harn, “Efficient sharing (broadcasting) of multiple secrets,” IEE Proceedings  Computers and Digital Techniques, vol. 142, pp. 237–240, May 1995. [68] J.K. J. H.Y. Chien and Y.M. Tseng, “A practical (t,n) multisecret sharing scheme,” IEICE transactions on fundamentals of electronics, communications and computer sciences, vol. 83, no. 12, pp. 2762–2765, 2000. [69] L.J. Pang and Y.M. Wang, “A new (t,n) multisecret sharing scheme based on shamir’s secret sharing,” Applied Mathematics and Computation, vol. 167, no. 2, pp. 840–848, 2005. [70] C.C. Yang, T.Y. Chang, and M.S. Hwang, “A (t,n) multisecret sharing scheme,” Applied Mathematics and Computation, vol. 151, no. 2, pp. 483–490, 2004. [71] C.W. Chan and C.C. Chang, “A scheme for threshold multisecret sharing,” Applied Mathematics and Computation, vol. 166, no. 1, pp. 1 – 14, 2005. [72] G. Horng, T. Chen, and D.S. Tsai, “Cheating in visual cryptography,” Designs, Codes and Cryptography, vol. 38, no. 2, pp. 219–236, 2006. [73] R. D. Prisco and M. Yung, eds., Security and Cryptography for Networks, 5th International Conference, SCN 2006, Maiori, Italy, September 68, 2006, Proceedings, vol. 4116 of Lecture Notes in Computer Science, Springer, 2006. [74] H. Luo and J.S. Pan, “Hiding multiple watermarks in transparencies of visual cryptography,” in IIHMSP ’07: Proceedings of the Third International Con 81 ference on International Information Hiding and Multimedia Signal Processing (IIHMSP 2007), (Washington, DC, USA), pp. 303–306, IEEE Computer Society, 200 



A 

B 

C 

D 

E 

F 

I 

J 

K 

L 

O 

P 

R 

S 

T 

U 

V 

W 


