Selective Disclosure of Digitally-signed XML Documents

Richard Todd Carlson and Dr. Kent Seamons, Computer Science

The Internet and related technologies allow electronic data to be transmitted and consumed with ease. Such transactions often involve the exchange of sensitive information, so security and privacy concerns frequently limit the extent to which these technologies are used. Systems that aim to protect security and privacy during electronic transactions must still allow individuals and organizations to leverage the efficiency and global opportunities that the Internet provides. This paper describes a system that allows fine-grained control over the privacy of sensitive information stored in general-purpose electronic documents.

Given the varied hardware and software configurations of machines that serve and access electronic information, it is desirable to ensure that a system dealing with on-line transactions is independent of these differences. Extensible Markup Language (XML) has become a ubiquitous standard for document exchange because it can be generated and consumed regardless of differences between computer systems. It is natural, therefore, to choose XML as the format of documents to be secured.

Suppose that Alice wishes to engage in an on-line transaction to rent a car. As part of the transaction, the car rental agency needs proof that Alice is a licensed driver and that she meets the minimum age requirements for insurance coverage. Alice has a digital driver’s license, electronically signed by the state DMV, which she can use to establish both her right to drive and her age. However, she doesn’t wish to disclose her height, weight, eye color, donor status, or other irrelevant information to the car rental agency. Alice needs a way to choose which parts of the license to disclose.

Any system designed to allow Alice to selectively disclose the information in her driver’s license must enforce both confidentiality and integrity. In this particular case, confidentiality requires that only those whom Alice authorizes are able to view the sensitive information contained in her license. At the same time, the car rental agency, or other authorized party, needs to be sure that the values that she discloses are trustworthy.

Several steps must be taken to provide both confidentiality and integrity. Alice begins with an XML document that represents her physical driver’s license and contains all of the pertinent information. She then decides which values in the document are sensitive and need to be masked. Alice generates a series of XPath expressions that identify all values she wishes to protect. For each value v matched by the XPath expressions, she generates a cryptographically strong random number n that is exactly 128 bits long. This number is appended to v to prevent brute-force attacks. The concatenation of v and n is then run through a collision-resistant one-way cryptographic hash function, which produces a masked value m. Alice then generates a modified document by replacing each v in the document with the corresponding m.
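
The masking procedure can be sketched in Python as follows. This is a minimal illustration under assumptions not made in the paper: SHA-256 stands in for the unspecified hash function, the license schema and element names are hypothetical, and the standard library's ElementTree module is used, which supports only a subset of XPath.

```python
import hashlib
import secrets
import xml.etree.ElementTree as ET

# Hypothetical license document; element names are illustrative only.
LICENSE_XML = """<license>
  <name>Alice Example</name>
  <age>34</age>
  <height>170</height>
  <eyeColor>brown</eyeColor>
</license>"""


def mask_value(v: str, n: bytes) -> str:
    """Return m = H(v || n) as hex. SHA-256 is an assumed choice of hash."""
    return hashlib.sha256(v.encode("utf-8") + n).hexdigest()


def mask_document(root, xpaths):
    """Replace the text of every matched element with its masked value.

    Returns (path, v, n, m) records, which Alice keeps private.
    """
    records = []
    for path in xpaths:
        for elem in root.findall(path):
            v = elem.text or ""
            n = secrets.token_bytes(16)  # 128 cryptographically strong random bits
            m = mask_value(v, n)
            elem.text = m                # the modified document carries m, not v
            records.append((path, v, n, m))
    return records


root = ET.fromstring(LICENSE_XML)
records = mask_document(root, ["./age", "./height", "./eyeColor"])
masked_xml = ET.tostring(root, encoding="unicode")
```

After this runs, the name element is unchanged, while the age, height, and eyeColor elements each hold a 64-character hexadecimal digest in place of the original value.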

The resulting modified document is confidential because it is computationally infeasible to determine the original value of any masked attribute unless Alice discloses it. In order to guarantee integrity, Alice needs the document to be reviewed and signed by a trusted third party. In this case, the DMV of the state in which she lives will be that third party. Alice submits the modified document, along with the original document, the XPath expressions used, and all triples (v, n, m) that were used to mask values in the document. The DMV will need to inspect each replaced value and, using the same one-way hash function, verify that the appropriate masked values have been stored in the modified document. Having thus verified the information in the document, the DMV will digitally sign the modified document. Assuming the car rental agency with which Alice wishes to do business trusts the DMV that signed her license, that agency can trust the contents of the license.
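
The DMV's check before signing might look like the sketch below. It again assumes SHA-256 and a hypothetical schema, and it assumes each path matches exactly one element. The function name dmv_verify is illustrative. The signing step itself is omitted, since the paper does not specify a signature scheme.

```python
import hashlib
import xml.etree.ElementTree as ET


def mask_value(v: str, n: bytes) -> str:
    """Return m = H(v || n) as hex. SHA-256 is an assumed choice of hash."""
    return hashlib.sha256(v.encode("utf-8") + n).hexdigest()


def dmv_verify(original, masked, triples_by_path) -> bool:
    """Return True only if every submitted (v, n, m) agrees with both documents."""
    for path, (v, n, m) in triples_by_path.items():
        orig_elem = original.find(path)
        mask_elem = masked.find(path)
        if orig_elem is None or mask_elem is None:
            return False
        if orig_elem.text != v:      # v must be the value in the original document
            return False
        if mask_value(v, n) != m:    # m must really be H(v || n)
            return False
        if mask_elem.text != m:      # the modified document must carry m
            return False
    return True


original = ET.fromstring("<license><name>Alice Example</name><age>34</age></license>")
n = bytes(16)  # fixed all-zero value for a repeatable demo only; use random bytes in practice
m = mask_value("34", n)
masked = ET.fromstring(f"<license><name>Alice Example</name><age>{m}</age></license>")

ok = dmv_verify(original, masked, {"./age": ("34", n, m)})        # consistent submission
tampered = dmv_verify(original, masked, {"./age": ("21", n, m)})  # claimed v does not match
```

A fuller check would presumably also confirm that the unmasked parts of the two documents are identical; that step is left out here for brevity.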

When Alice later wishes to disclose a particular value in the document, she will need to establish a secure channel with the recipient and send the original value, v, and the appropriate random number, n. The recipient will then invoke the same one-way hash function on these values and compare the result with the masked value in the signed document. If the two match, the recipient will accept the value v. Alice now has a single document that she can distribute at will, knowing that sensitive information in that document cannot be uncovered without her authorization.
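
On the recipient's side, the check reduces to recomputing the hash and comparing. The sketch below assumes SHA-256 and a hypothetical function name, verify_disclosure. The secure channel and the verification of the DMV signature are outside its scope.

```python
import hashlib
import hmac
import secrets


def mask_value(v: str, n: bytes) -> str:
    """Return m = H(v || n) as hex. SHA-256 is an assumed choice of hash."""
    return hashlib.sha256(v.encode("utf-8") + n).hexdigest()


def verify_disclosure(v: str, n: bytes, masked_value: str) -> bool:
    """Accept v only if H(v || n) equals the masked value in the signed document."""
    return hmac.compare_digest(mask_value(v, n), masked_value)


# Alice's side: the random number and masked value created when the document was built.
n = secrets.token_bytes(16)
m_in_document = mask_value("34", n)

# Recipient's side: Alice sends (v, n) over the secure channel.
accepted = verify_disclosure("34", n, m_in_document)  # genuine value
rejected = verify_disclosure("35", n, m_in_document)  # altered value
```

Because the hash function is collision resistant, Alice cannot find a different value that hashes to the same m, so a matching result ties the disclosed value to the document the DMV signed.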

As mentioned, a random number is appended to each value to be masked in order to prevent brute-force attacks. Without this number, it would be easy to determine some of the values in the document. For example, knowing that Alice is probably between 16 and 99 years old, one could easily generate the hash values for 16 through 99 and then find the one that matches the masked value in Alice’s license. Appending the random number makes it computationally infeasible to recover masked values by guessing.
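
The age example can be reproduced directly. In the sketch below, SHA-256 is an assumed hash function and guess_age is a hypothetical attacker routine that hashes each candidate age with no random number attached.

```python
import hashlib
import secrets


def h(data: bytes) -> str:
    """Hex digest of the assumed hash function, SHA-256."""
    return hashlib.sha256(data).hexdigest()


def guess_age(masked_value: str):
    """Try every age from 16 to 99, hashing the bare value."""
    for age in range(16, 100):
        if h(str(age).encode("utf-8")) == masked_value:
            return age
    return None


without_random = h(b"34")                         # masked with no random number
with_random = h(b"34" + secrets.token_bytes(16))  # masked as the scheme prescribes

recovered = guess_age(without_random)   # the attacker finds the age in at most 84 tries
not_recovered = guess_age(with_random)  # the same attack finds nothing
```

To succeed against the second value, the attacker would also have to guess the 128-bit random number, which multiplies the search space by 2^128.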

This technique for selective disclosure generalizes to any XML document, regardless of purpose. It is independent of specific hardware or software considerations. Further, it allows each document to be used in any appropriate scenario, regardless of the amount of sensitive information to be disclosed. Such a general-purpose solution will allow both individuals and organizations to maintain privacy and security during on-line transactions, thus enabling them to more fully realize the benefits offered by the Internet.

Brigham Young University

Journal of Undergraduate Research
