AWS S3 Glacier Vault Deletion

April 6th 2020

AWS S3 Glacier provides a cost-effective way to store large amounts of data in a cloud infrastructure that delivers 99.999999999% durability and provides comprehensive security and compliance capabilities. Because S3 Glacier keeps storage costs low (as little as $1 per terabyte per month), retrieval of archived data is limited and throttled.

Standard retrievals typically complete between 3-5 hours, and work well for less time-sensitive needs like backup data, media editing, or long-term analytics. Bulk retrievals are the lowest-cost retrieval option, returning large amounts of data within 5-12 hours. 

https://aws.amazon.com/glacier/

The AWS Management Console gives you some high-level access to your Glacier vaults, but it doesn’t let you access a vault’s “inventory”, which is the list of archives stored in it. This becomes an issue when you want to delete a vault: the vault has to be empty, so every archive must be deleted first. Unfortunately, you can’t use the console to manage or delete your archives, so we need to turn to another method:

You cannot delete an archive using the Amazon S3 Glacier (S3 Glacier) management console. To delete an archive you must use the AWS Command Line Interface (CLI) or write code to make a delete request using either the REST API directly or the AWS SDK for Java and .NET wrapper libraries. 

docs.aws.amazon.com

In this tutorial we will use the AWS CLI to retrieve our Glacier vault’s inventory by initiating a job of type inventory-retrieval, download the output of that job in JSON format, extract the archive IDs from the JSON output file, and create a bash script to delete all the archives in the vault. Once the inventory (the archives) has been deleted, we can finally delete the vault itself.

Setting Up AWS CLI

The AWS CLI has good documentation and installers for the major operating systems. Navigate to https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html and select the operating system you are installing the CLI on. Once the installation is done, we can configure the AWS CLI.

Before we can configure it, we need to log in to the AWS console and create a root access key ID and secret access key. Log in at console.aws.amazon.com, navigate to https://console.aws.amazon.com/iam/home?#/security_credentials, and expand the “Access Keys” dropdown to see your root access keys. Once you have your access key ID and secret access key, also take note of the region your Glacier vault is located in. In your terminal, type: aws configure and you will be prompted for the configuration options:

AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]: json

Fill in the prompts with your access key ID, secret access key, and default region. You can double-check the installation by typing: aws glacier help to see the information and command options for the Glacier service. Now that we have the AWS CLI configured, we can move on to initiating a job to retrieve the inventory for our vault.
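
Before moving on, it can help to confirm that the credentials work and that the CLI can see your vault in the configured region. The following is a minimal sketch, not one of the original steps: the function name check_glacier_access is made up for illustration, while aws sts get-caller-identity and aws glacier list-vaults are standard AWS CLI commands.

```shell
#!/bin/bash
# Sketch: confirm the CLI can authenticate and list Glacier vaults.
# check_glacier_access is a hypothetical helper name.

check_glacier_access() {
  # Print the account ID that the configured credentials belong to.
  aws sts get-caller-identity --query Account --output text || return 1
  # Print the names of the vaults in the default region (tab-separated).
  aws glacier list-vaults --account-id - \
    --query 'VaultList[].VaultName' --output text
}

# Example:
#   check_glacier_access
```

If the vault you want to delete is missing from the output, the default region is probably not the one the vault lives in.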

AWS Initiate Job Vault Inventory Retrieval

To keep storage costs low, S3 Glacier does not return data immediately. Instead, you queue a job that completes asynchronously (usually within 3 – 5 hours). To get a list of the archives in the vault we will initiate a job of type “inventory-retrieval”, which will be queued up for us. We can initiate the job with the command:

aws glacier initiate-job --account-id - --vault-name my-vault --job-parameters '{"Type": "inventory-retrieval"}'

For the account ID you can pass - and the AWS CLI will use the account associated with your credentials; otherwise you can find your account ID by logging in to the AWS Management Console and navigating to: https://console.aws.amazon.com/billing/home?#/account. You will also need the name of the vault that you want to delete. Once the job is initiated, copy the jobId that is returned in the response from the CLI call. Now that we have a job initiated, we can check on its status with the command:

aws glacier describe-job --account-id - --vault-name my-vault --job-id zbxcm3Z_3z5UkoroF7SuZKrxgGoDc3RloGduS7Eg-RO47Yc6FxsdGBgf_Q2DK5Ejh18CnTS5XW4_XqlNHS61dsO4CnMW

Make sure you replace the --job-id value with the jobId field that was returned by the initiate-job call. You should see output like this:

{
    "InventoryRetrievalParameters": {
        "Format": "JSON"
    },
    "VaultARN": "arn:aws:glacier:us-west-2:0123456789012:vaults/my-vault",
    "Completed": false,
    "JobId": "zbxcm3Z_3z5UkoroF7SuZKrxgGoDc3RloGduS7Eg-RO47Yc6FxsdGBgf_Q2DK5Ejh18CnTS5XW4_XqlNHS61dsO4CnMW",
    "Action": "InventoryRetrieval",
    "CreationDate": "2019-07-17T20:23:41.616Z",
    "StatusCode": "InProgress"
}

After waiting 3 – 5 hours, we can run describe-job again to check whether the Completed field has changed to true and the StatusCode is Succeeded. Once those fields show that our job is complete, we’re ready to use aws glacier get-job-output to download the output file in JSON format for our inventory retrieval job:

aws glacier get-job-output --account-id - --vault-name my-vault --job-id zbxcm3Z_3z5UkoroF7SuZKrxgGoDc3RloGduS7Eg-RO47Yc6FxsdGBgf_Q2DK5Ejh18CnTS5XW4_XqlNHS61dsO4CnMW archives.json
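
Since the job takes hours, the initiate and describe steps above can also be scripted so that describe-job does not have to be re-run by hand. The sketch below is one way to do it under a few assumptions: the function names and the 15-minute default polling interval are arbitrary choices for illustration, and the --query and --output text options are used to pull a single field out of each CLI response.

```shell
#!/bin/bash
# Sketch: start an inventory job and poll until it finishes.
# Function names and the default interval are illustrative assumptions.

# Start the inventory-retrieval job and print only its job ID.
start_inventory_job() {
  local vault="$1"
  aws glacier initiate-job --account-id - --vault-name "$vault" \
    --job-parameters '{"Type": "inventory-retrieval"}' \
    --query jobId --output text
}

# Poll describe-job until StatusCode is no longer "InProgress".
# Returns 0 when the job succeeded, 1 on failure or CLI error.
wait_for_job() {
  local vault="$1" job_id="$2" interval="${3:-900}" status
  while true; do
    status="$(aws glacier describe-job --account-id - --vault-name "$vault" \
      --job-id="$job_id" --query StatusCode --output text)" || return 1
    echo "Job status: $status" >&2
    case "$status" in
      Succeeded)  return 0 ;;
      InProgress) sleep "$interval" ;;
      *)          return 1 ;;
    esac
  done
}

# Example (my-vault is a placeholder vault name):
#   job_id="$(start_inventory_job my-vault)"
#   wait_for_job my-vault "$job_id" && echo "Inventory is ready"
```

Once wait_for_job returns successfully, the get-job-output command shown above will download the inventory.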

We are calling the output file archives.json since we are getting the list of archives. You should now have a file that looks similar to:

{
   "VaultARN":"arn:aws:glacier:us-west-2:0123456789012:vaults/my-vault",
   "InventoryDate":"2015-04-07T00:26:18Z",
   "ArchiveList":[
      {
         "ArchiveId":"kKB7ymWJVpPSwhGP6ycSOAekp9ZYe_--zM_mw6k76ZFGEIWQX-ybtRDvc2VkPSDtfKmQrj0IRQLSGsNuDp-AJVlu2ccmDSyDUmZwKbwbpAdGATGDiB3hHO0bjbGehXTcApVud_wyDw",
         "ArchiveDescription":"multipart upload test",
         "CreationDate":"2015-04-06T22:24:34Z",
         "Size":3145728,
         "SHA256TreeHash":"9628195fcdbcbbe76cdde932d4646fa7de5f219fb39823836d81f0cc0e18aa67"
      }
   ]
}

You can see we have a key called "ArchiveList", which is an array of archives. If you had automated backups writing to your Glacier vault, the list will probably contain thousands of archives. We are going to extract the ArchiveId of each entry in "ArchiveList" into a separate file, with each ArchiveId on its own line. To do this we’re going to use a utility called jq.

The jq utility is included in the repositories of all the major Linux distributions, so installing it is easy: we just need to use our favorite package manager. On Debian, or a Debian-based distribution such as Ubuntu or Linux Mint, we can use apt.

You can find installation instructions by going to: https://stedolan.github.io/jq/download/ and selecting your operating system. Once jq is installed, navigate to the folder that contains your archives.json file. We are going to use jq to extract all the ArchiveId fields into a new file called archiveIds.txt. In the command line type:

jq '.ArchiveList[] | .ArchiveId' archives.json > archiveIds.txt
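
Because jq prints strings in JSON form by default, each ID written by the command above is wrapped in double quotes, and those quotes are not part of the archive ID itself. jq's -r (raw output) option prints the bare strings instead. If you already have a quoted file and want the bare IDs, the quotes can also be stripped in plain bash; the sketch below assumes one ID per line, and the function name strip_json_quotes is hypothetical.

```shell
#!/bin/bash
# Sketch: print each line of a jq-generated ID file without the
# surrounding JSON double quotes. strip_json_quotes is a made-up name.

strip_json_quotes() {
  local file="$1" line
  while IFS= read -r line; do
    line="${line#\"}"   # drop one leading double quote, if present
    line="${line%\"}"   # drop one trailing double quote, if present
    if [[ -n "$line" ]]; then
      printf '%s\n' "$line"
    fi
  done < "$file"
}

# Example:
#   strip_json_quotes archiveIds.txt > archiveIds-raw.txt
```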

We should now have a file archiveIds.txt that should look similar to:

"-VyqkL3K_tsjzq6rQtW6Jn7hH9iWLP0JK0bZ1vCbN9BMQMOUbC47v3kNXXA7G1-TEX0GZX7PaswKaapcC5vd1p-Qd90I7eUS4OaV9PcRcKOmxsW0-oSuhTIcw7_iAQtz5nnwrbng6g"
"JVQ_3z1gnaO8426cVldSNgiq9bxU_DVTXAg__fHTkW_ltI1m8sTpsveCjptOB1b1jacffTD4U9zVee3CsIprirxskeX9He6zEwyN6QjOwipE8Un4wxgyMD0t20bdkKyxD-TG8XkzFQ"
"bOVmyfhfsmFEoB28mPVcoFB4FKypobGWCpjuTlDj9Vn6a-eOhUDGbz7Yl3-SDPnifDMjrEZC5Ry6FCA4oCPjpVlAySs8KFDeSvuDiq_UL8zvLCOSgN9hQA_fAedDZv_H8KGgsysmZA"
"pHfxHLLYB3Oa8ofDJVgwXuo81P3l6juPXLmK4oBVs4oY2XeQmTXc2uPVO4baI4JTqb-yURiNi_4AHywHOT5NKBUWCXN3SjcMY05jWVOwdKschqaii8g1t-BED29njf5C5I02UdWzww"
"4dO9q8NITDhTPhWK9-SSlZzRCFikGfzMR0ebfRWR5p67OOZXVBlnYFt0h-Y-NS88ZUDHtbnYIEdoGG-4B4GsdAbGED338N1thgyMQFN_4jGhy39GPyShbpgD1LjRcLxoDnJrMphy_Q"
"6v8fsOair7bqXVRbCkZRw69cGnNCCxa6D_Ar1QlNVj0pfxuo5UpWrkVfvelWzsiqyHFiX1DSjOcJO6F0_zJXHowqowS37kZaRyTt5Lf7cK_NFWXBeEJGH4x5Rx7IIBqIvRzEIUfRPQ"
"69IefXL6q6w7VnrJLTBRyV2C3d-lLv6x5jIgb6VCNuT6oFQsaCI3UIJpCs_m-XimG2q8lrLtQIkV3RITKHs3rVbbCP43xNJoa7SrABSebcrPq09YUm1ByRtm0zj3qkKbn3nsbgMdJQ"
"KGrvf3t8_vh7AipaL3Cn_c1NzzZ7ODNbOrvYpnzOpnUJgFvcTNOwC3d1cKI3_MbZ085Ps0uO2q2hGENVBpmoSu11KUjkLujd_AeSLAEwCrbTdj0y1js-FrUlDB5tO1OgHqS7whUJwQ"
"j-l-EqNZgmT0voRWNFZVlKXYZZWMTt3Pcr2JeKqa95lEJdrIURDsN7ZiOjLyPAPJEK-Y0IxzmNGx9COjZ2fgNaQkgDFkmOtUitKSp-b0Fv5aFOOaXTGC1-XhV9_UzF5yiY8EWi2Ucw"
"3DOKmIAqcEFw4burpvoY5d8t_feNy7raN2ZOunTDTO4bw1zgBtLlERdbk_BkFUO_s8ho2IXcC1vSFOcNCFVokNsZdYH5cItMIikySa0vj-mXmOv9ciVS-f3O9N0rbTWTtpkohuZ2oQ"

We can now loop through the lines of this file, using each line in a call to aws glacier delete-archive to delete the corresponding archive. Let’s use a bash script to do the looping and make the calls for us. In the same folder that archiveIds.txt is in, create a file called delete.sh:

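#!/bin/bash

vault="$1"   # name of the vault to empty
file="$2"    # file with one archive ID per line
count=1

# Only proceed if the ID file exists
if [[ -f "$file" ]]
then
  # IFS= and -r keep each line exactly as written
  while IFS= read -r line
  do
    echo "Deleting Archive $line"
    echo "Count: $count"
    # --archive-id=VALUE keeps IDs that start with "-" from being parsed as options
    aws glacier delete-archive --account-id - --vault-name "$vault" --archive-id="$line"
    count=$((count+1))
  done < "$file"
else
  echo "File not found: $file" >&2
  exit 1
fi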

We store the first argument $1, our vault name, in a variable called vault, and the second argument $2, the file we loop through (in our case archiveIds.txt), in a variable called file. We then make sure that the file exists and, if so, read it line by line. Setting the Internal Field Separator (IFS) to an empty value and passing -r to read keeps each line exactly as written, and the current line is available as $line. For each line we call aws glacier delete-archive with the vault name and the line as the archive ID, which deletes that archive. We can now use this bash script by navigating to the folder archiveIds.txt is in and running:

bash delete.sh my-vault "$(pwd)/archiveIds.txt"

Make sure you replace my-vault with the name of your vault. Depending on how many archives you need to delete, this command can take a while to loop through all of the ArchiveId values; you can follow the archive currently being deleted and the running count in the terminal. Once the script is complete you will have to wait at least 3 – 5 hours, and possibly up to a day, for the vault’s inventory in the AWS Management Console to update.
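
With thousands of archives, an occasional delete call may fail (for example because of a network error), and the simple loop above just moves on to the next line. The sketch below is a variant that records the IDs whose delete call returned a non-zero exit status so they can be retried later. The function name delete_archives and the failed-IDs file are hypothetical additions, not part of the original script.

```shell
#!/bin/bash
# Sketch: delete every archive listed in a file and log the IDs that failed.
# delete_archives and its third argument are illustrative assumptions.

delete_archives() {
  local vault="$1" file="$2" failed_file="$3" id deleted=0 failed=0
  : > "$failed_file"                 # start with an empty failure log
  while IFS= read -r id; do
    [[ -z "$id" ]] && continue       # skip blank lines
    # </dev/null keeps the command from consuming the loop's input file
    if aws glacier delete-archive --account-id - \
         --vault-name "$vault" --archive-id="$id" < /dev/null; then
      deleted=$((deleted + 1))
    else
      printf '%s\n' "$id" >> "$failed_file"
      failed=$((failed + 1))
    fi
  done < "$file"
  echo "deleted: $deleted, failed: $failed"
  [[ "$failed" -eq 0 ]]              # non-zero exit if anything failed
}

# Example:
#   delete_archives my-vault archiveIds.txt failed.txt
```

To retry, run the function again with failed.txt as the input file and a different file name for the new failure log, since the log is emptied at the start of each run.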

Deleting the AWS Glacier Vault

Once all of the archives in the vault are deleted and the vault holds no more data, you are able to delete it. You can do this through the AWS Management Console by navigating to Amazon S3 Glacier Vaults, selecting the vault you want to delete, and clicking “Delete Vault”. You can also use the AWS CLI to delete the vault by running: aws glacier delete-vault --vault-name my-vault --account-id -
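
S3 Glacier only deletes a vault that was empty as of its last inventory, so delete-vault fails if the inventory has not caught up with your deletions yet. As a rough sketch, the check can be done up front with aws glacier describe-vault, whose NumberOfArchives field reflects the last inventory; the function name delete_vault_if_empty is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: delete a vault only when its last inventory reports zero archives.
# delete_vault_if_empty is a made-up helper name.

delete_vault_if_empty() {
  local vault="$1" count
  # NumberOfArchives is the archive count as of the last inventory;
  # it is printed as "None" if no inventory has run yet.
  count="$(aws glacier describe-vault --account-id - --vault-name "$vault" \
    --query NumberOfArchives --output text)" || return 1
  if [[ "$count" != "0" ]]; then
    echo "Vault $vault still reports $count archive(s); not deleting." >&2
    return 1
  fi
  aws glacier delete-vault --account-id - --vault-name "$vault"
}

# Example:
#   delete_vault_if_empty my-vault
```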

It takes a lot of steps to delete a vault on AWS S3 Glacier, but hopefully this tutorial helps walk you through removing the archives and deleting the vault from your AWS account. Until next time, stay curious, stay creative!