Verifying File Uploads with MD5 Hashes in Terraform

When uploading files to cloud storage services like AWS S3 using Terraform, it's crucial to ensure the integrity of the uploaded data. One common method is to use MD5 hashes to verify that the file was uploaded correctly. This guide will show you how to implement this verification in Terraform.

Calculating MD5 Hash

First, you need to calculate the MD5 hash of your local file. Terraform provides the filemd5() function for this purpose:

locals {
  file_path = "${path.module}/files/example.zip"
  file_md5  = filemd5(local.file_path)
}
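
If you want to inspect the computed hash (for example, to compare it manually against aws s3api head-object output), you can expose it as an output. The output name here is just for illustration:

output "example_file_md5" {
  value = local.file_md5
}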

Uploading File to S3

Next, you can use this MD5 hash when uploading the file to S3:

resource "aws_s3_object" "example_upload" {
  bucket = "my-bucket"
  key    = "uploads/example.zip"
  source = local.file_path

  etag = local.file_md5
}

The etag argument is set to the MD5 hash of the file. Terraform compares this value against the object's ETag stored in S3, so if the local file changes, the hashes diverge and Terraform re-uploads the object on the next apply. This comparison is only meaningful for single-part uploads without SSE-KMS encryption, since otherwise the ETag is not a plain MD5 hash.
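
If the bucket enforces SSE-KMS, the stored ETag is not an MD5 hash and the etag argument will produce perpetual diffs. A minimal sketch of the alternative, assuming a hypothetical aws_kms_key.example, uses the provider's source_hash argument, which is tracked by the provider itself rather than compared against the S3 ETag:

resource "aws_s3_object" "example_upload_kms" {
  bucket     = "my-bucket"
  key        = "uploads/example.zip"
  source     = local.file_path
  kms_key_id = aws_kms_key.example.arn # hypothetical key, for illustration

  # source_hash triggers re-uploads on file changes even when the
  # object's ETag is not an MD5 (e.g. SSE-KMS encrypted objects).
  source_hash = filemd5(local.file_path)
}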

Verifying the Upload

After the upload, you can use a null_resource with a local-exec provisioner to verify the MD5 hash of the uploaded file. The triggers map ensures the check re-runs whenever the local hash changes:

resource "null_resource" "verify_upload" {
  triggers = {
    file_md5 = local.file_md5
  }

  provisioner "local-exec" {
    command = <<-EOT
      # Fetch the object's ETag and strip the surrounding quotes.
      UPLOADED_MD5=$(aws s3api head-object --bucket my-bucket --key uploads/example.zip --query ETag --output text | tr -d '"')
      if [ "$UPLOADED_MD5" != "${local.file_md5}" ]; then
        echo "MD5 mismatch: expected ${local.file_md5}, got $UPLOADED_MD5"
        exit 1
      else
        echo "MD5 verified successfully"
      fi
    EOT
  }

  depends_on = [aws_s3_object.example_upload]
}

This null_resource runs after the S3 object is created. It uses the AWS CLI to retrieve the ETag, which equals the object's MD5 hash only for unencrypted objects uploaded in a single part, and compares it to the locally computed hash.
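
If you'd rather keep the comparison inside Terraform than shell out to the AWS CLI, a check block (Terraform 1.5+) combined with the aws_s3_object data source can assert the same condition. This is a sketch assuming the same bucket and key as above:

data "aws_s3_object" "uploaded" {
  bucket = "my-bucket"
  key    = "uploads/example.zip"

  depends_on = [aws_s3_object.example_upload]
}

check "upload_md5" {
  assert {
    condition     = data.aws_s3_object.uploaded.etag == local.file_md5
    error_message = "MD5 mismatch: expected ${local.file_md5}, got ${data.aws_s3_object.uploaded.etag}"
  }
}

Unlike the provisioner-based approach, a failed check reports a warning rather than failing the run, which may or may not suit your pipeline.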

Handling Large Files

Objects larger than 5GB must be uploaded to S3 in multiple parts, and for any multipart upload the ETag is no longer a simple MD5 hash: it has the form <md5-of-concatenated-part-digests>-<part-count>. In these cases, you'll need a different verification process:

resource "aws_s3_object" "large_file_upload" {
  bucket = "my-bucket"
  key    = "uploads/large_file.zip"
  source = local.large_file_path

  # For multipart uploads, we can't use etag for verification
}

resource "null_resource" "verify_large_upload" {
  provisioner "local-exec" {
    command = <<EOT
      LOCAL_MD5=$(md5sum ${local.large_file_path} | cut -d ' ' -f 1)
      REMOTE_MD5=$(aws s3api get-object --bucket my-bucket --key uploads/large_file.zip /tmp/downloaded_file && md5sum /tmp/downloaded_file | cut -d ' ' -f 1)
      if [ "$LOCAL_MD5" != "$REMOTE_MD5" ]; then
        echo "MD5 mismatch for large file"
        exit 1
      else
        echo "Large file MD5 verified successfully"
      fi
      rm /tmp/downloaded_file
    EOT
  }

  depends_on = [aws_s3_object.large_file_upload]
}

This method downloads the file from S3 and calculates its MD5 hash locally for comparison.
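
Downloading the whole object is slow and incurs transfer costs for very large files. An alternative is to recompute the multipart ETag locally and compare it against the value from head-object. The sketch below assumes the uploader used a fixed 8 MiB part size (the AWS CLI default) and that GNU coreutils (split --filter, md5sum) and xxd are available:

#!/usr/bin/env bash
# Sketch: recompute an S3 multipart ETag locally, assuming a known part size.
set -euo pipefail

FILE="large_file.zip"
PART_SIZE=$((8 * 1024 * 1024))  # must match the part size used at upload

digests=$(mktemp)
parts=0
# Hash each part as a stream; no chunk files are written to disk.
while read -r part_md5; do
  printf '%s' "$part_md5" | xxd -r -p >> "$digests"  # append binary digest
  parts=$((parts + 1))
done < <(split -b "$PART_SIZE" --filter='md5sum | cut -d " " -f 1' "$FILE")

# Multipart ETag = md5(concatenation of binary part digests) + "-" + part count.
echo "expected ETag: $(md5sum "$digests" | cut -d ' ' -f 1)-$parts"
rm -f "$digests"

Compare the printed value against the ETag returned by aws s3api head-object; if they match, the upload is intact without re-downloading the object.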

Best Practices

  1. Use filemd5() for consistent hash calculation in Terraform.
  2. Always verify uploads, especially for critical files.
  3. Remember that the ETag equals the MD5 hash only for unencrypted, single-part uploads; aws_s3_object itself uploads in a single request, capped at 5GB.
  4. For large files, consider recomputing the expected multipart ETag locally (as sketched above) rather than downloading the entire file.
  5. Use Terraform's depends_on to ensure verification happens after upload.

By implementing these practices, you can ensure the integrity of your file uploads when using Terraform with cloud storage services.