Hello BoltOps community,
I’m migrating from plain Terraform to Terraspace. My end goal is to organize everything into multiple stacks with reusable modules, where the stacks are separated by team and by the individual services within each team.
For example, Team A has 5 services (each with 10 to 50 AWS resources) that used to be managed by Terraform, organized by environment: development, staging, and production, where each environment contains all 5 services. This is more or less the 2nd approach described in Terraform Statefile Approaches and Thoughts - Terraspace.
The goal is to split each team’s 5 services into separate stacks with reusable modules, with each logical separation managed by its own Terraform state, so that each stack can be planned and applied by Terraspace independently. For example, Team A’s service 1 can make changes to Stack 1A (1A meaning team 1, service 1) while Stack 1B is applied in parallel. Following the “general approaches” in that document, this corresponds to option 3 or 4.
Right now I have around 44 stacks (each with its own state, hosted in GitLab’s Terraform state backend), and with more to come I expect to end up with 100+ stacks.
There are also dependencies among some of these stacks, for example a common stack that is used by several other stacks (wired up roughly as shown in the snippet below).
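For context, the cross-stack wiring looks roughly like this (the stack and output names here are just illustrative): the dependent stack’s tfvars pulls a value from the common stack’s outputs with Terraspace’s output helper, which is also what registers the dependency:

```
# app/stacks/service1/tfvars/base.tfvars
# Illustrative names: reads vpc_id from the "common" stack's outputs,
# which also makes Terraspace treat common as a dependency of service1.
vpc_id = <%= output('common.vpc_id') %>
```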
I tested how Terraspace scales and behaves across all of these stacks. I cleaned the cache and ran “terraspace all init” and “terraspace all plan”, and I observed the error message “Error: Could not load plugin”. CPU utilization reached 100% and memory consumption increased significantly as well. My machine is a MacBook Air M2 with 16GB of RAM, and at some points swap usage grew to nearly 600MB during the Terraspace init and plan (before running Terraspace, it showed 0 swap used).
In addition, before “Error: Could not load plugin” appeared, I saw 10+ “terraform-provider-aws_v4.53.0_x5” processes along with terraform processes. Once the error was shown, all of the provider-aws processes and some of the terraform processes disappeared, then gradually started loading up again.
After this error occurred, everything slowed down while it attempted to initialize and load the failed plugin again. Eventually, it did manage to finish “init” and “plan”.
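One thing I was considering on my own is Terraform’s shared plugin cache, so that each stack reuses a single extracted copy of the AWS provider instead of every init downloading and unpacking its own. Something like this in ~/.terraformrc (the cache directory has to be created manually):

```
# ~/.terraformrc
# Shared provider cache so "terraspace all init" doesn't fetch/extract
# the AWS provider separately for all 44 stacks.
plugin_cache_dir = "$HOME/.terraform.d/plugin-cache"
```

But I’m not sure whether that helps with the “Could not load plugin” error itself or only with init time and disk usage.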
For more information, this migration is based on an old Terraform version, and I plan to lift it up later. The versions are:
Terraform version: 0.14.9
Ruby: 3.1.5
Terraspace version: 2.2.16
Provider - AWS: 4.53.0
macOS: 13.4
I read Config Reference - Terraspace and saw that “all.concurrency” defaults to 5. However, I’m not sure whether adjusting all.concurrency to a more suitable value is the right knob when running this many stacks.
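If it is relevant, I assume lowering it would look something like this in config/app.rb (the value 2 is just a guess to cap the number of parallel terraform processes):

```ruby
# config/app.rb
Terraspace.configure do |config|
  # Guessed value: process fewer stacks in parallel (default is 5)
  # to reduce peak CPU/memory from concurrent terraform + provider processes.
  config.all.concurrency = 2
end
```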
My question is: what else could I do to resolve this kind of error, or more generally, how can I optimize Terraspace to run 100+ stacks with dependencies?
Any suggestions or recommendations would be much appreciated.
Thank you very much.
Best regards,
Sunsern