It's no surprise that SQL Server offers a variety of ways to encrypt the values in a column, but there are some things to keep in mind when deciding whether column encryption is the right path to choose. I did some testing a while ago, but the findings still hold true and as such, I find myself referring back to the results more frequently than I would have thought so decided to put it out there for friends and colleagues as well as making it easier for myself to find next time I need to.
For some fields (a password field is a good example) you might not need to ever decrypt the information since getting the plain-text value isn't necessary as you can simply encrypt the input value and compare it to the stored value to determine a match. For times when the user forgets the password, the system should encrypt a new, randomly generated value and send that to them rather than decrypting the stored value and sending that to them in plain text (I hate web sites that do that).
My testing involved encrypting a field by passphrase, 64-bit DES, and 256-bit AES on a table with close to 1 Billion records and I have outlined my findings below:
Initial encryption of plain-text field:
- An index was required on the column in order to optimize exclusion of null values and empty strings, this took approximately 45 minutes to complete (based on 3 tests).
- Once the index was created, setting an encrypted value only took around 15-20 minutes depending on the type and encryption algorithm used (looping through 500,000 records at a time).
- CPU spiked periodically, but usually stayed around 40-60% throughout the process (I don't recall the hardware, but as I said, this was several years ago so YMMV).
Some things to keep in mind when deciding between passphrase and key (if you go with SQL encryption at all vs. hash/ compare at the code-level):
- With any encryption function (passphrase or key) if the passphrase is incorrect or the key is not available or open, the return result will just be null, no error will be thrown.
- If using replication or any HA solution with data available for access on a separate server, symmetric keys need to be synchronized in all locations for encryption/ decryption to work; if these are not in sync, no error will be thrown, but encryption and decryption will fail so it may not be readily apparent.
- If the symmetric key is not open, EncryptByKey() and DecryptByKey() functions will complete successfully, but inserted/ encrypted value will be null for EncryptByKey() and selected values will be null for DecryptByKey() which again, may not be readily apparent. This can be safe-guarded against with a query to sys.OpenKeys ensuring that the key is open, but this adds more overhead and room for errors.
- Null values in the field will be ignored, but an empty string will still be encrypted adding some overhead.
- VarBinary columns for encrypted passwords must be created with “Ansi_Padding = On” or any trailing 0’s will be removed causing invalid data when converted back to VarChar.
- Published testing results from other companies’ tests have shown an average increase of 3-5 (but up to 7) times the resource hit (all CPU and I/O) using symmetric key encryption compared to no encryption (completely anecdotal and I found that when implemented correctly, there was hardly any noticeable increase in resource utilization).
The biggest drawback was the silent errors that it throws if the wrong keys are opened or if there is no key opened at all and there is no exceptionally good checking available to make that determination due to increased overhead of every call.