Tfstate.json in /tmp is empty sometimes

(apologies in advance for the length of this post)

I have two stacks: application depends on peered-vnets.

When I execute a terraspace all plan, I notice that some of my variables in the application stack are not being resolved. After some sleuthing, I discovered that the downloaded peered-vnet tfstate files is zero bytes.

Here’s what I see when I enable debug logging:

Downloading tfstate for stack: peered-vnets
=> terraform init -get -input=false >> /tmp/terraspace/log/init/peered-vnets.log
=> cd /home/me/Projects/cloud-infrastructure/.terraspace-cache/canadacentral/dev/stacks/peered-vnets && terraform state pull > /tmp/terraspace/remote_state/stacks/peered-vnets/state.json

Notice the resulting file is zero bytes:

$ ls -l /tmp/terraspace/remote_state/stacks/peered-vnets/state.json
-rw-rw-r--. 1 me me 0 Jan 26 15:13 /tmp/terraspace/remote_state/stacks/peered-vnets/state.json

However… I can do the following without any issues:

$ export ARM_TENANT_ID=00000000-0000-0000-0000-000000000000
$ export ARM_SUBSCRIPTION_ID=00000000-0000-0000-0000-000000000000
$ export ARM_CLIENT_ID=00000000-0000-0000-0000-000000000000
$ export ARM_CLIENT_SECRET=00000000-0000-0000-0000-000000000000
$ cd /home/me/Projects/cloud-infrastructure/.terraspace-cache/canadacentral/dev/stacks/peered-vnets
$ terraform init
...
Terraform has been successfully initialized!
...
$ terraform state pull
{
  "version": 4,
  "terraform_version": "1.1.4",
  ...
}

I’m wondering if perhaps the terraform init in the debug output is being executed before the cd /home/.../peered-vnets command, resulting in an uninitialized backend? The debug output seems to suggest that, but I really don’t know for sure.

Any help would be greatly appreciated.

Just for completion sake, here’s what I was seeing in the terraspace output:

DEBUG: (Output vnet_subnets was not found for the application tfvars file. Either peered-vnets stack has not been deployed yet or it does not have this output: vnet_subnets)

I’m still debugging this, and I haven’t really made much progress… However, I’m wondering if the reason my pulled tfstate file is zero bytes is because the backend of each stack is different…

My application’s backend is:

terraform {
  backend "azurerm" {
    resource_group_name  = "rg-terraform-canadacentral"
    storage_account_name = "{removed}"
    container_name       = "terraform-state"
    key                  = "dev/stacks/application/terraform.tfstate"
  }
}

And my peered-vnet backend is:

terraform {
  backend "azurerm" {
    resource_group_name  = "rg-terraform-canadacentral"
    storage_account_name = "{removed}"
    container_name       = "terraform-state"
    key                  = "common/stacks/peered-vnets/terraform.tfstate"
  }
}

Is it possible that when trying to pull the peered-vnets tfstate file, it’s erroneously using the application’s backend configuration? Just a thought.

Dug into this. Here’s a repo put together to help debug: https://github.com/tongueroo/terraspace-graph-demo-azurerm

Here’s the terraspace all up

Was able to see a zero-byte file, but only briefly.

$ find /tmp/terraspace/remote_state/stacks -name state.json | xargs ls -lh
-rw-rw-r-- 1 ec2-user ec2-user 156 Jan 26 23:52 /tmp/terraspace/remote_state/stacks/b1/state.json
-rw-rw-r-- 1 ec2-user ec2-user 156 Jan 26 23:52 /tmp/terraspace/remote_state/stacks/b2/state.json
-rw-rw-r-- 1 ec2-user ec2-user 156 Jan 26 23:52 /tmp/terraspace/remote_state/stacks/b3/state.json
-rw-rw-r-- 1 ec2-user ec2-user 796 Jan 26 23:52 /tmp/terraspace/remote_state/stacks/c1/state.json
-rw-rw-r-- 1 ec2-user ec2-user 156 Jan 26 23:52 /tmp/terraspace/remote_state/stacks/c2/state.json
-rw-rw-r-- 1 ec2-user ec2-user 804 Jan 26 23:52 /tmp/terraspace/remote_state/stacks/c3/state.json
-rw-rw-r-- 1 ec2-user ec2-user 156 Jan 26 23:52 /tmp/terraspace/remote_state/stacks/d1/state.json
-rw-rw-r-- 1 ec2-user ec2-user 156 Jan 26 23:52 /tmp/terraspace/remote_state/stacks/e1/state.json
$ find /tmp/terraspace/remote_state/stacks -name state.json | xargs ls -lh
-rw-rw-r-- 1 ec2-user ec2-user 156 Jan 26 23:52 /tmp/terraspace/remote_state/stacks/b1/state.json
-rw-rw-r-- 1 ec2-user ec2-user 156 Jan 26 23:52 /tmp/terraspace/remote_state/stacks/b2/state.json
-rw-rw-r-- 1 ec2-user ec2-user 156 Jan 26 23:52 /tmp/terraspace/remote_state/stacks/b3/state.json
-rw-rw-r-- 1 ec2-user ec2-user 796 Jan 26 23:52 /tmp/terraspace/remote_state/stacks/c1/state.json
-rw-rw-r-- 1 ec2-user ec2-user 156 Jan 26 23:52 /tmp/terraspace/remote_state/stacks/c2/state.json
-rw-rw-r-- 1 ec2-user ec2-user 804 Jan 26 23:52 /tmp/terraspace/remote_state/stacks/c3/state.json
-rw-rw-r-- 1 ec2-user ec2-user   0 Jan 26 23:52 /tmp/terraspace/remote_state/stacks/d1/state.json
-rw-rw-r-- 1 ec2-user ec2-user 156 Jan 26 23:52 /tmp/terraspace/remote_state/stacks/e1/state.json
$ find /tmp/terraspace/remote_state/stacks -name state.json | xargs ls -lh
-rw-rw-r-- 1 ec2-user ec2-user  804 Jan 26 23:54 /tmp/terraspace/remote_state/stacks/b1/state.json
-rw-rw-r-- 1 ec2-user ec2-user 1.4K Jan 26 23:54 /tmp/terraspace/remote_state/stacks/b2/state.json
-rw-rw-r-- 1 ec2-user ec2-user  796 Jan 26 23:54 /tmp/terraspace/remote_state/stacks/b3/state.json
-rw-rw-r-- 1 ec2-user ec2-user  796 Jan 26 23:54 /tmp/terraspace/remote_state/stacks/c1/state.json
-rw-rw-r-- 1 ec2-user ec2-user  798 Jan 26 23:54 /tmp/terraspace/remote_state/stacks/c2/state.json
-rw-rw-r-- 1 ec2-user ec2-user  804 Jan 26 23:54 /tmp/terraspace/remote_state/stacks/c3/state.json
-rw-rw-r-- 1 ec2-user ec2-user  792 Jan 26 23:54 /tmp/terraspace/remote_state/stacks/d1/state.json
-rw-rw-r-- 1 ec2-user ec2-user  802 Jan 26 23:54 /tmp/terraspace/remote_state/stacks/e1/state.json
$ 

Basically, had to keep running the find command until spot a zero-byte file. At least in the debugging session, believe what is happening is at the beginning of the terraspace state pull command it will zero-byte the file until terraform actually pulls the state info and then updates the state.json.

terraform state pull > /tmp/terraspace/remote_state/stacks/d1/state.json

When the terraform state pull command finishes, always got a non-zero-byte file though. So believe that’s what’s happening.

RE: DEBUG: (Output vnet_subnets was not found for the application tfvars file. Either peered-vnets stack has not been deployed yet or it does not have this output: vnet_subnets)

Will sometimes get that warning because terraspace actually does 2 passes of all the tfvars files to figure out the dependency graph. It’s covered here: https://terraspace.cloud/docs/dependencies/tfvars/considerations/#dependency-resolution-internals

Dug through the source and think remember this. Originally, attempted to only build and deploy each terraspace stack files as needed for each batch pass. But think it was quite complex with the forking and parallelization of the terraspace up calls. Particularly, the multiple terraspace up processes would clean up and overwrite the same files. There were race conditions since the filesystem is shared. To simplify the parallelization logic, implemented it so that terraspace would write all the files at once for each batch pass. This results in this annoying call to fetch the terraform state for all stacks right now though. Think will revisit this and improve the implementation. Unsure exactly when will get to this though :face_with_monocle: Will consider PRs and no sweat either way of course :+1:

Also, somewhat related: https://github.com/boltops-tools/terraspace_plugin_azurerm/issues/7 Would like to get to this also. Also, will consider PRs.

Hope at least that helps answer the question.

I experimented a little bit this morning and noticed some divergent behavior when running terraspace vs running terraform directly.

When enabling terraform debug logging via TF_LOG=DEBUG, I can see the following output from terraform state pull when I run it from .terraspace-cache/canadacentral/dev/stacks/peered-vnets/:

2022-01-27T07:27:50.316-0330 [INFO]  Terraform version: 1.1.4
2022-01-27T07:27:50.316-0330 [INFO]  Go runtime version: go1.17.2
2022-01-27T07:27:50.316-0330 [INFO]  CLI args: []string{"terraform", "state", "pull"}
2022-01-27T07:27:50.316-0330 [DEBUG] Attempting to open CLI config file: /home/me/.terraformrc
2022-01-27T07:27:50.316-0330 [DEBUG] File doesn't exist, but doesn't need to. Ignoring.
2022-01-27T07:27:50.316-0330 [DEBUG] ignoring non-existing provider search directory terraform.d/plugins
2022-01-27T07:27:50.316-0330 [DEBUG] ignoring non-existing provider search directory /home/me/.terraform.d/plugins
2022-01-27T07:27:50.316-0330 [DEBUG] ignoring non-existing provider search directory /home/me/.local/share/terraform/plugins
2022-01-27T07:27:50.316-0330 [DEBUG] ignoring non-existing provider search directory /home/me/.local/share/flatpak/exports/share/terraform/plugins
2022-01-27T07:27:50.316-0330 [DEBUG] ignoring non-existing provider search directory /var/lib/flatpak/exports/share/terraform/plugins
2022-01-27T07:27:50.316-0330 [DEBUG] ignoring non-existing provider search directory /usr/local/share/terraform/plugins
2022-01-27T07:27:50.316-0330 [DEBUG] ignoring non-existing provider search directory /usr/share/terraform/plugins
2022-01-27T07:27:50.316-0330 [INFO]  CLI command args: []string{"state", "pull"}
2022-01-27T07:27:50.318-0330 [INFO]  Testing if Service Principal / Client Certificate is applicable for Authentication..
2022-01-27T07:27:50.318-0330 [INFO]  Testing if Multi Tenant Service Principal / Client Secret is applicable for Authentication..
2022-01-27T07:27:50.318-0330 [INFO]  Testing if Service Principal / Client Secret is applicable for Authentication..
2022-01-27T07:27:50.318-0330 [INFO]  Using Service Principal / Client Secret for Authentication
2022-01-27T07:27:50.318-0330 [INFO]  Getting OAuth config for endpoint https://login.microsoftonline.com/ with  tenant {removed}
2022-01-27T07:27:50.318-0330 [DEBUG] Obtaining a ADAL / Azure Active Directory Graph token for Resource Manager..
2022-01-27T07:27:50.709-0330 [DEBUG] checking for provisioner in "."
2022-01-27T07:27:50.713-0330 [DEBUG] checking for provisioner in "/usr/bin"
2022-01-27T07:27:50.713-0330 [DEBUG] Building the Blob Client from an Access Token (using user credentials)
2022-01-27T07:27:50.714-0330 [DEBUG] backend/remote-state/azure Request:

I can then see terraform send multiple requests to the backend to fetch the state.

However, when I use the same TF_LOG=DEBUG flag in terraspace, this is the output:

Downloading tfstate for stack: peered-vnets
=> terraform init -get -input=false >> /tmp/terraspace/log/init/peered-vnets.log
=> cd /home/me/Projects/cloud-infrastructure/.terraspace-cache/canadacentral/dev/stacks/peered-vnets && terraform state pull > /tmp/terraspace/remote_state/stacks/peered-vnets/state.json
2022-01-27T07:21:55.674-0330 [INFO]  Terraform version: 1.1.4
2022-01-27T07:21:55.674-0330 [INFO]  Go runtime version: go1.17.2
2022-01-27T07:21:55.674-0330 [INFO]  CLI args: []string{"terraform", "state", "pull"}
2022-01-27T07:21:55.674-0330 [DEBUG] Attempting to open CLI config file: /home/me/.terraformrc
2022-01-27T07:21:55.674-0330 [DEBUG] File doesn't exist, but doesn't need to. Ignoring.
2022-01-27T07:21:55.674-0330 [DEBUG] ignoring non-existing provider search directory terraform.d/plugins
2022-01-27T07:21:55.674-0330 [DEBUG] ignoring non-existing provider search directory /home/me/.terraform.d/plugins
2022-01-27T07:21:55.674-0330 [DEBUG] ignoring non-existing provider search directory /home/me/.local/share/terraform/plugins
2022-01-27T07:21:55.674-0330 [DEBUG] ignoring non-existing provider search directory /home/me/.local/share/flatpak/exports/share/terraform/plugins
2022-01-27T07:21:55.674-0330 [DEBUG] ignoring non-existing provider search directory /var/lib/flatpak/exports/share/terraform/plugins
2022-01-27T07:21:55.674-0330 [DEBUG] ignoring non-existing provider search directory /usr/local/share/terraform/plugins
2022-01-27T07:21:55.674-0330 [DEBUG] ignoring non-existing provider search directory /usr/share/terraform/plugins
2022-01-27T07:21:55.674-0330 [INFO]  CLI command args: []string{"state", "pull"}
2022-01-27T07:21:55.674-0330 [DEBUG] New state was assigned lineage "b3221820-a4c2-c93e-bd72-04da02443571"
2022-01-27T07:21:56.067-0330 [DEBUG] checking for provisioner in "."
2022-01-27T07:21:56.071-0330 [DEBUG] checking for provisioner in "/usr/bin"
DEBUG: (Output vnet_subnets was not found for the dol-adl tfvars file. Either peered-vnets stack has not been deployed yet or it does not have this output: vnet_subnets)

There is clearly something different happening when terraspace tries to pull the state. I will keep trying to figure out why, but if you have any suggestions on a direction to take, please let me know.

One more thing: this problem doesn’t occur when I do a terraspace plan application; only when I do terraspace all plan (if that helps).

When doing terraspace plan application, I see some extra terraform logging that isn’t present when I do a terraspace all plan:

2022-01-27T07:42:33.685-0330 [TRACE] Meta.Backend: BackendOpts.Config not set, so using settings loaded from backend.tf:4,3-20
2022-01-27T07:42:33.685-0330 [TRACE] Meta.Backend: built configuration for "azurerm" backend with hash value 1593754220
2022-01-27T07:42:33.686-0330 [TRACE] Preserving existing state lineage "50e150f7-fb64-1062-561c-d00c662f5b44"
2022-01-27T07:42:33.686-0330 [TRACE] Preserving existing state lineage "50e150f7-fb64-1062-561c-d00c662f5b44"
2022-01-27T07:42:33.686-0330 [TRACE] Meta.Backend: working directory was previously initialized for "azurerm" backend
2022-01-27T07:42:33.686-0330 [TRACE] Meta.Backend: using already-initialized, unchanged "azurerm" backend configuration

Then then precedes to download the state file from my Azure storage container.

It sounds like you see the issue consistently vs intermittently which what was guessing was happening above.

terraform state pull > /tmp/terraspace/remote_state/stacks/d1/state.json

IE: The file is only zero-byte at the very beginning until the terraform state pull command remotely fetches the info.

If it’s consistently a zero-byte file, unsure why then :thinking: Since Ruby is an interpreted language, you can add debugging puts in the library code itself: https://github.com/boltops-tools/terraspace/blob/master/lib/terraspace/terraform/remote_state/fetcher.rb#L66 Maybe double-check the backend.tf and add a debugging puts like this:

system "cd #{@child.cache_dir} && cat backend.tf" # <= ADD THIS LINE
command = "cd #{@child.cache_dir} && terraform state pull > #{state_path}"

To figure out where terraspace is installed:

bundle info terraspace --path

Also, enable log debug level in

config/app.rb

Terraspace.configure do |config|
  config.logger.level = :debug
end

Also, hoping to get a repo with a reproduction of the issue like in this post: Plan stack trace error after making change to existing infra That’ll give us a better fighting chance to figure out what’s going on.

Also, made some pretty big changes to how terraspace all building and processing works in terrspace 1.1. Wondering when you get a chance you can also give terraspace 1.1 a try. So:

Hi Tung… I tried version 1.1.0 and my problem no longer exists. Thanks so much for whatever you did to fix it.

Awesome :confetti_ball::champagne: The main change is this PR https://github.com/boltops-tools/terraspace/pull/196 Details in there

I did run into a new issue with v1.1.0… I will create another thread to explain.