Verifying File Uploads with MD5 Hashes in Terraform
When uploading files to cloud storage services like AWS S3 using Terraform, it's crucial to ensure the integrity of the uploaded data. One common method is to use MD5 hashes to verify that the file was uploaded correctly. This guide will show you how to implement this verification in Terraform.
Calculating MD5 Hash
First, you need to calculate the MD5 hash of your local file. Terraform provides the `filemd5()` function for this purpose:
```hcl
locals {
  file_path = "${path.module}/files/example.zip"
  file_md5  = filemd5(local.file_path)
}
```
Uploading File to S3
Next, you can use this MD5 hash when uploading the file to S3:
resource "aws_s3_object" "example_upload" {
bucket = "my-bucket"
key = "uploads/example.zip"
source = local.file_path
etag = local.file_md5
}
The `etag` argument is set to the MD5 hash of the file. Rather than S3 verifying the upload server-side, Terraform compares this value against the object's ETag on each plan, so if the local file and the stored object drift apart, the next apply re-uploads the file. This comparison only works when the stored ETag is a plain MD5, which rules out multipart uploads and objects encrypted with SSE-KMS.
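For the SSE-KMS case, recent versions of the AWS provider offer a `source_hash` argument that triggers updates like `etag` but is tracked by the provider itself rather than compared against the S3 ETag. A minimal sketch, reusing the locals from above (the resource name and key here are illustrative):

```hcl
# Sketch: change tracking that still works when the stored ETag is not a
# plain MD5 (e.g. SSE-KMS encrypted objects). source_hash is recorded in
# state by the provider rather than matched against the S3 ETag.
resource "aws_s3_object" "example_upload_kms" {
  bucket      = "my-bucket"
  key         = "uploads/example-kms.zip"
  source      = local.file_path
  source_hash = local.file_md5
}
```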
Verifying the Upload
After the upload, you can use a `null_resource` with a `local-exec` provisioner to verify the MD5 hash of the uploaded file:
resource "null_resource" "verify_upload" {
triggers = {
file_md5 = local.file_md5
}
provisioner "local-exec" {
command = <<EOT
UPLOADED_MD5=$(aws s3api head-object --bucket my-bucket --key uploads/example.zip --query ETag --output text | tr -d '"')
if [ "$UPLOADED_MD5" != "${local.file_md5}" ]; then
echo "MD5 mismatch: expected ${local.file_md5}, got $UPLOADED_MD5"
exit 1
else
echo "MD5 verified successfully"
fi
EOT
}
depends_on = [aws_s3_object.example_upload]
}
This `null_resource` will run after the S3 object is created. It uses the AWS CLI to retrieve the ETag (which is the MD5 hash only for objects uploaded in a single part) and compares it to the local MD5 hash.
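Before trusting a direct comparison, it can help to check the ETag's shape: multipart ETags carry a `-<part count>` suffix, so a plain MD5 comparison would fail even for a correct upload. A small shell sketch, using the same bucket and key as the example above:

```sh
# Sketch: a multipart ETag looks like "<hash>-<part count>", so the
# presence of a dash tells us whether a direct MD5 comparison is valid.
ETAG=$(aws s3api head-object --bucket my-bucket --key uploads/example.zip \
  --query ETag --output text | tr -d '"')
case "$ETAG" in
  *-*) echo "Multipart ETag ($ETAG): direct MD5 comparison is not valid" ;;
  *)   echo "Single-part ETag: safe to compare against filemd5() output" ;;
esac
```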
Handling Large Files
For objects larger than 5GB, a single PUT is impossible and S3 requires multipart uploads; many tools also switch to multipart for much smaller files (the AWS CLI defaults to it above 8MB). For multipart objects, the ETag is not a simple MD5 hash. In these cases, you'll need to implement a different verification process:
resource "aws_s3_object" "large_file_upload" {
bucket = "my-bucket"
key = "uploads/large_file.zip"
source = local.large_file_path
# For multipart uploads, we can't use etag for verification
}
resource "null_resource" "verify_large_upload" {
provisioner "local-exec" {
command = <<EOT
LOCAL_MD5=$(md5sum ${local.large_file_path} | cut -d ' ' -f 1)
REMOTE_MD5=$(aws s3api get-object --bucket my-bucket --key uploads/large_file.zip /tmp/downloaded_file && md5sum /tmp/downloaded_file | cut -d ' ' -f 1)
if [ "$LOCAL_MD5" != "$REMOTE_MD5" ]; then
echo "MD5 mismatch for large file"
exit 1
else
echo "Large file MD5 verified successfully"
fi
rm /tmp/downloaded_file
EOT
}
depends_on = [aws_s3_object.large_file_upload]
}
This method downloads the file from S3 and calculates its MD5 hash locally for comparison. It works regardless of how the object was uploaded, but at the cost of transferring the entire file back.
Best Practices
- Use `filemd5()` for consistent hash calculation in Terraform.
- Always verify uploads, especially for critical files.
- Be aware that simple ETag verification only works for single-part uploads; multipart is mandatory above 5GB and common well below it.
- For large files, consider implementing chunked verification to avoid downloading the entire file (see the sketch after this list).
- Use Terraform's `depends_on` to ensure verification happens after upload.
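One way to implement that chunked verification is to reconstruct the multipart ETag locally: S3 computes it as the MD5 of the concatenated binary MD5s of each part, suffixed with the part count. The sketch below assumes the upload used a fixed 8MB part size (the AWS CLI default; adjust `PART_SIZE_MB` to whatever your tooling actually used) and GNU coreutils:

```sh
# Sketch: compute the expected multipart ETag without downloading the object.
# Assumes a fixed part size matching the one used for the upload.
FILE="files/large_file.zip"
PART_SIZE_MB=8                             # AWS CLI default part size
SIZE=$(stat -c %s "$FILE")                 # GNU stat; use `stat -f %z` on macOS
PART_BYTES=$((PART_SIZE_MB * 1024 * 1024))
PARTS=$(( (SIZE + PART_BYTES - 1) / PART_BYTES ))

> /tmp/part_md5s.bin
for i in $(seq 0 $((PARTS - 1))); do
  # Hash each part's raw bytes and append the binary digest
  dd if="$FILE" bs=1M skip=$((i * PART_SIZE_MB)) count=$PART_SIZE_MB 2>/dev/null \
    | openssl dgst -md5 -binary >> /tmp/part_md5s.bin
done

# The multipart ETag is the MD5 of the concatenated digests plus "-<parts>"
EXPECTED_ETAG="$(openssl dgst -md5 /tmp/part_md5s.bin | awk '{print $2}')-$PARTS"
rm /tmp/part_md5s.bin
echo "Expected multipart ETag: $EXPECTED_ETAG"
```

Comparing this value against the ETag returned by `head-object` verifies the upload without transferring the object back.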
By implementing these practices, you can ensure the integrity of your file uploads when using Terraform with cloud storage services.