What is the expected behavior when Terraspace needs to destroy a resource another stack depends on?

I have 2 stacks in my project:

  • VPC - creates VPC-related resources: subnets, NAT gateways, etc.
  • EC2 - creates EC2 instances that depend on the subnets from the VPC stack.

When drastically changing the VPC stack and running terraspace all up, terraform wants to destroy the VPC and rebuild it. However, terraspace doesn’t seem to destroy the EC2 stack first before making these changes, resulting in a DependencyViolation error:

╷
│ Error: error deleting EC2 Subnet (subnet-0b15a9d49f45e0863): DependencyViolation: The subnet 'subnet-0b15a9d49f45e0863' has dependencies and cannot be deleted.
│ 	status code: 400, request id: 869e8c21-54ca-4f78-8450-8850bbeb53bc
│ 
│ 
╵

Is this the expected behavior?

Interesting.

So Terraspace only resolves the dependency graph based on the wiring of inputs and outputs. It cannot see whether or not a resource drastically changes and requires full replacement. This is because the dependency graph resolution happens at terraspace compile-time before terraform runtime.
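For illustration, that compile-time wiring is just the output helper in the dependent stack’s tfvars. A minimal sketch, assuming stack and output names from the example above:

```
# app/stacks/ec2/tfvars/dev.tfvars
subnet_ids = <%= output('vpc.subnet_ids') %>
```

Terraspace builds its graph from these references when it compiles the tfvars, before terraform ever runs.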

Within the terraform runtime, terraform performs its own graph resolution and then applies the changes. Something like this:

terraspace compile time -> terraspace runtime
       |                            |
  terraspace graph         terraform runtime
                                    |
                           terraform graph -> terraform apply

Some details are also covered here: Dependencies Tfvars Considerations - Terraspace

Unsure if there’s a way to determine the terraform graph resolution from the “outside”. So I don’t think this can be done easily. Open to thoughts and if there is a way, though.

Thanks for the above information :slightly_smiling_face:

My lack of knowledge of the code base and the language might negatively impact my idea.

Doesn’t terraspace keep track of the outputs from each stack and how they are fed as inputs into another?

Couldn’t Terraspace detect destroyed resources and track their outputs to know which dependent stacks need to be destroyed?

Maybe not even destroying them automatically, but suggesting which stacks need to be destroyed before up can be applied?

  • VPC outputs subnets
  • EC2 inputs subnets

I guess this also means I have a follow up question.

What if I had 3 stacks?

  • VPC
  • EC2
  • ALB (load balancer)

And the number of EC2 instances changed. The ALB takes the EC2 instance IDs as an input, so will the plan only show the change in the EC2 stack? Because the outputs have the potential to change, surely any stacks that depend on those outputs must need to know that they may not have the correct information yet?

I understand that terraspace only tracks dependencies at compile time at the moment, but I think having a few hooks between each batch run during terraspace all would help resolve this problem.

For example: after a batch is finished, but before starting the next, check whether the expected outputs that are wired up as inputs have changed. The dependencies would need to be tracked after compile time for this. If a dependency changed (due to a redeploy, for example), either warn the user and stop the next batch, or adjust to the new input automatically by recompiling the stacks of the next batch (this should be a choice based on an argument).

I think the easiest way to do this is to create a map of each stack’s expected outputs and their current values. Then, after the deploy of that stack, check again whether the values changed, and if so, recompile the following stacks. Rinse and repeat for each stack/batch run.
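The check between batches could be sketched roughly like this (the helper name and data shapes are my own assumptions for illustration, not actual Terraspace internals):

```python
# Sketch of the proposed between-batch check: snapshot the outputs that
# downstream stacks consume before a batch runs, then compare after the
# batch to decide whether dependents need to be recompiled.

def changed_dependencies(before, after):
    """Return the output keys whose values changed across a batch run."""
    return [key for key, value in before.items() if after.get(key) != value]

# Example: the vpc stack was replaced, so its subnet IDs changed.
before = {"vpc.subnet_ids": ["subnet-0b15a9d49f45e0863"]}
after = {"vpc.subnet_ids": ["subnet-0a11aa11aa11aa111"]}

stale = changed_dependencies(before, after)
if stale:
    # Either warn and stop here, or recompile the stacks in the next batch.
    print(f"outputs changed, recompile dependents: {stale}")
```

The warn-vs-recompile decision would be the argument-based choice mentioned above.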

This definitely is not trivial, but I think it’s something I would consider a requirement for the long-term success of terraspace. The stack dependency system is basically terraspace’s biggest strength and the number one feature we use it for, but right now it has quite a few limitations, unfortunately.

@tung After some more thinking, I think this becomes an even bigger problem.

All dependencies are resolved before the deployment starts, so a dependent stack only sees the same value as before until you run it again.

What’s to stop Terraspace from reporting a change in one stack and no changes in dependent stacks? This could be very dangerous if you are not aware that the change will impact a stack hosting non-ephemeral data, like a DB/RDS instance, for example.

Obviously this is only a problem because we are not told about upstream changes.

Contextually this may be unhelpful, but I figured I would include it to help visualise the issue. Please see the screenshots below from the same CI/CD pipeline run:

Terraspace all plan:
Imgur

Terraspace all deploy:
Imgur

Ultimately this means we can’t trust the plan, which I think is dangerous because we can’t see any upstream changes. In this case, with a load balancer, it doesn’t matter, but it could matter with something like an RDS/DB instance.

@tung I am interested in whether you have an opinion on this problem, or what your thoughts are?

Generally, I come from an operational background, and feedback about your stack changes and how they affect dependent stacks would be very helpful. It would also allow for possible features to automatically destroy dependent stacks in the correct order.

Short answer: Unsure.

Longer answer:

Dug into it a bit. Considered a simple vpc and a security group. Here’s an example repo to help: https://github.com/tongueroo/infra-replace

Here’s some debugging to look at the plan. I deployed everything, then changed the VPC CIDR and ran these commands to see what info was available in the plan.

terraspace build vpc
cd .terraspace-cache/us-west-2/dev/stacks/vpc
terraform plan --out plan.binary
terraform show -json plan.binary | jq -r '.' > plan.json

cd - # back to terraspace project

terraspace build sg
cd .terraspace-cache/us-west-2/dev/stacks/sg
vim 1-dev.auto.tfvars # change to "(known after apply)"
terraform plan --out plan.binary
terraform show -json plan.binary | jq -r '.' > plan.json

It looks like the plan.json has info about resources that will need to be “replaced”. So Terraspace would be able to see, to a degree, if resources require “replacement”, IE: ["delete", "create"].
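As a sketch of reading that info out of plan.json: replacement shows up in the `resource_changes` array of `terraform show -json` output as the action pair ["delete", "create"] (or ["create", "delete"] with create_before_destroy). The plan payload below is fabricated for illustration:

```python
import json

# Detect resources the plan will replace, based on the JSON structure
# produced by `terraform show -json plan.binary`.

def resources_requiring_replacement(plan):
    replaced = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        # Both delete-then-create and create-before-destroy count as replacement.
        if sorted(actions) == ["create", "delete"]:
            replaced.append(rc["address"])
    return replaced

# Minimal fabricated plan.json payload:
plan = json.loads("""{
  "resource_changes": [
    {"address": "aws_vpc.this", "change": {"actions": ["delete", "create"]}},
    {"address": "aws_subnet.this", "change": {"actions": ["update"]}}
  ]
}""")

print(resources_requiring_replacement(plan))  # ['aws_vpc.this']
```

The same filter could be done with jq directly on plan.json; the point is just that the replacement signal is there to read.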

Note: The terraspace dependency calculation currently happens as part of the compile phase, and outputs are recalculated as part of each batch run and re-fed into each batch, since outputs are not known until after apply. So there is already some runtime processing as part of terraspace all.

Thinking the terraspace dependency graph calculation would have to do some additional passes:

  • Pass 1: Resolve the graph based on the inputs and outputs wiring. The way it’s currently done.
  • Pass 2: Do an additional pass that runs a plan for each stack, in the order resolved by pass 1.
  • Pass 3: Using the additional info, recalculate the final graph with additional terraspace down operations at the beginning.

Terraspace could run terraspace down on the child sg stack first, then run terraspace up on the vpc, and then on the sg.

However, Terraspace might have to make some additional assumptions. The example repo has some random_pet resources to help illustrate this. Let’s say those resources need to be replaced. Terraspace might have to assume that any resources within the stack that require replacement will result in the whole stack being flagged as a possible candidate for a terraspace down. Unsure if it’s possible to know which resources are being replaced and consider that without making it even more complicated.

Would like to attempt this. Unsure when, though. It’s a matter of time :clock1: Attempted terraspace all several times before eventually figuring it out. :tada: Hoping it’s possible without it being too complex :crossed_fingers: Unsure.

Wondering if there are examples from other tools, or someone who has done something similar, that could help. I don’t believe there are, but if so, it might be worth studying. Maybe the terraform source itself? Will review and consider PRs. Of course. No sweat either way. :+1:
