Thanks for the continued work on this @tung! I started looking at updating my Jets apps to Ruby runtime 3.2 after seeing the end-of-support notice from Amazon. I’m getting hung up on the blue-green deploy process though and not sure where I got off track.
First upgraded Ruby locally and updated my Gemfile to include jets 4.0, easy enough. I followed the blue-green deployment doc and added config.extra = 1 to my application.rb. Deployed, and it looks like it didn’t create a new environment, but updated my existing lambda functions to 3.2 runtime. Hmm, is the blue-green deploy even needed? Was config.extra the wrong place to set the extra env?
Then I checked my exception reporting tool and saw some errors:
Aws::Lambda::Errors::AccessDeniedException User: arn:aws:sts::652695076726:assumed-role/bot-dev-JetsPreheatJob-1O7VL-JetsPreheatJobIamRole-1HDTBRWMP1UOK/bot-dev-jets-preheat_job-torch is not authorized to perform: lambda:InvokeFunction on resource: arn:aws:lambda:us-east-1:652695076726:function:bot-dev-jets-preheat_job-warm because no identity-based policy allows the lambda:InvokeFunction action
I’m guessing this is related to not doing the blue-green deploy, but I’m not sure how exactly? Is the issue just in the PreheatJob?
Next, I tried deploying with JETS_EXTRA=1 bundle exec jets deploy but that failed quickly:
Building CloudFormation templates.
bundler: failed to load command: jets (/Users/nate/.rbenv/versions/3.2.2/bin/jets)
/Users/nate/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/aws-sdk-core-3.181.0/lib/aws-sdk-core/param_validator.rb:35:in `validate!': missing required parameter params[:stack_name] (ArgumentError)
raise ArgumentError, error_messages(errors) unless errors.empty?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
from /Users/nate/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/aws-sdk-core-3.181.0/lib/aws-sdk-core/param_validator.rb:15:in `validate!'
from /Users/nate/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/aws-sdk-core-3.181.0/lib/aws-sdk-core/plugins/param_validator.rb:25:in `call'
from /Users/nate/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/aws-sdk-core-3.181.0/lib/seahorse/client/plugins/raise_response_errors.rb:16:in `call'
from /Users/nate/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/aws-sdk-core-3.181.0/lib/aws-sdk-core/plugins/checksum_algorithm.rb:111:in `call'
from /Users/nate/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/aws-sdk-core-3.181.0/lib/aws-sdk-core/plugins/jsonvalue_converter.rb:16:in `call'
from /Users/nate/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/aws-sdk-core-3.181.0/lib/aws-sdk-core/plugins/idempotency_token.rb:19:in `call'
from /Users/nate/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/aws-sdk-core-3.181.0/lib/aws-sdk-core/plugins/param_converter.rb:26:in `call'
from /Users/nate/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/aws-sdk-core-3.181.0/lib/seahorse/client/plugins/request_callback.rb:89:in `call'
from /Users/nate/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/aws-sdk-core-3.181.0/lib/aws-sdk-core/plugins/response_paging.rb:12:in `call'
from /Users/nate/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/aws-sdk-core-3.181.0/lib/seahorse/client/plugins/response_target.rb:24:in `call'
from /Users/nate/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/aws-sdk-core-3.181.0/lib/seahorse/client/request.rb:72:in `send_request'
from /Users/nate/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/aws-sdk-cloudformation-1.88.0/lib/aws-sdk-cloudformation/client.rb:2859:in `describe_stack_resource'
from /Users/nate/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/jets-4.0.4/lib/jets/cfn/builders/api_gateway_builder.rb:81:in `existing_domain_name_on_stack?'
from /Users/nate/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/jets-4.0.4/lib/jets/cfn/builders/api_gateway_builder.rb:63:in `create_domain_name'
from /Users/nate/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/jets-4.0.4/lib/jets/cfn/builders/api_gateway_builder.rb:47:in `add_domain_name'
from /Users/nate/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/jets-4.0.4/lib/jets/cfn/builders/api_gateway_builder.rb:42:in `add_custom_domain'
from /Users/nate/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/jets-4.0.4/lib/jets/cfn/builders/api_gateway_builder.rb:16:in `compose'
from /Users/nate/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/jets-4.0.4/lib/jets/cfn/builders/interface.rb:14:in `build'
How should blue-green deploys work with custom domain names? Why would stack_name be missing here?
Looking at my cloudwatch logs I seem to be missing logs from my scheduled job functions, but not seeing errors anywhere. I suspect things are partially working but something seems off. Can you help me figure out how to get this app back on track?
Hi Nate,
Bummer to hear. Thanks for the stack trace.
It seems like it’s a Jets bug as a result of AWS changing the exception from Aws::CloudFormation::Errors::ValidationError to ArgumentError?
Looking at the stack trace, I see that it leads to this line.
But it’s only leading to that line because the existing_domain_name_on_stack? call should have rescued Aws::CloudFormation::Errors::ValidationError, but the AWS SDK error seems to have changed to an ArgumentError. Unsure. It’s a guess.
Can you try hacking the gem source directly and adding this:
def existing_domain_name_on_stack?
  return false if api_gateway_physical_resource_id.nil? # ADD THIS LINE
  cfn.describe_stack_resource(
    stack_name: api_gateway_physical_resource_id,
    logical_resource_id: "DomainName"
  )
  true
# IE: Aws::CloudFormation::Errors::ValidationError (Resource DomainName does not exist for stack demo-dev)
rescue Aws::CloudFormation::Errors::ValidationError
  false
end
At least it’s a first step.
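For anyone following along, here is a rough, self-contained sketch of why the guard helps. The stack trace above shows the ArgumentError being raised from the SDK's param_validator.rb, which suggests it happens client-side when stack_name is nil, before any request is sent, so the rescue for ValidationError never sees it. FakeCfnClient, FakeValidationError, and the stack names are stand-ins invented for this sketch so it runs without the AWS SDK; they are not the real Jets or SDK classes.

```ruby
# Stand-in for Aws::CloudFormation::Errors::ValidationError (hypothetical,
# defined here only so the sketch runs without the AWS SDK).
class FakeValidationError < StandardError; end

# Stand-in client. Assumption: like the SDK's client-side validator, it raises
# ArgumentError for a nil required parameter instead of a service error.
class FakeCfnClient
  def initialize(resources)
    @resources = resources # { stack_name => [logical resource ids] }
  end

  def describe_stack_resource(stack_name:, logical_resource_id:)
    raise ArgumentError, "missing required parameter params[:stack_name]" if stack_name.nil?

    unless @resources.fetch(stack_name, []).include?(logical_resource_id)
      raise FakeValidationError, "Resource #{logical_resource_id} does not exist for stack #{stack_name}"
    end

    { stack_name: stack_name, logical_resource_id: logical_resource_id }
  end
end

# Same shape as the patched method; `guard: false` reproduces the unpatched behavior.
def existing_domain_name_on_stack?(cfn, stack_name, guard: true)
  return false if guard && stack_name.nil? # the added early return

  cfn.describe_stack_resource(stack_name: stack_name, logical_resource_id: "DomainName")
  true
rescue FakeValidationError
  # Only the "resource does not exist" case is rescued; ArgumentError is not.
  false
end

cfn = FakeCfnClient.new("demo-dev-ApiGateway" => ["DomainName"])
existing_domain_name_on_stack?(cfn, "demo-dev-ApiGateway") # true
existing_domain_name_on_stack?(cfn, nil)                   # false, thanks to the guard
```

Calling it with a nil stack name and guard: false raises ArgumentError, which matches the failure in the deploy output.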
Also, I looked at the Jets.config.extra setting.
Looks like that is a bug, and the docs need to be updated and/or it’ll be fixed in Jets v5.
Ok, some sorta success here. Adding the early return line:
return false if api_gateway_physical_resource_id.nil?
to both the existing_domain_name_on_stack? and existing_dns_record_on_stack? methods allowed me to deploy a new env with JETS_EXTRA=1 bundle exec jets deploy. But the new env doesn’t seem to be working any differently.
I immediately got the access denied error in the new env PreheatJob Aws::Lambda::Errors::AccessDeniedException User: arn:aws:sts::652695076726:assumed-role/bot-dev-1-JetsPreheatJob-2PE-JetsPreheatJobIamRole-1DZCAB8MAXB6R/bot-dev-1-jets-preheat_job-warm is not authorized to perform: lambda:InvokeFunction on resource: arn:aws:lambda:us-east-1:652695076726:function:bot-dev-1-admin_controller-db because no identity-based policy allows the lambda:InvokeFunction action
Also not seeing any cloudwatch logs generated for the new stack functions, however they are running (because I see external evidence of them doing stuff).
So I think where we are is:
- seems like the blue-green deploy wasn’t really needed, or the benefit is not obvious to me, because both new and updated-in-place stacks are behaving the same. Jets had no problem updating the existing stack in-place to runtime 3.2.
- Neither stack on runtime 3.2 is writing to CloudWatch logs. I’m not sure yet how to debug this.
- Functions do seem to be running (but maybe in duplicate bc I have two stacks now)
- Odd AWS permissions errors on the Preheat job, but that’s not really critical to me at this point
What’s the way to delete the bot-dev-1 duplicate stack?
Any thoughts on how to debug the lack of cloudwatch logs on the new runtime?
Thanks!
Nate
RE: 1. seems like the blue-green deploy wasn’t really needed, or the benefit is not obvious to me, because both new and updated-in-place stacks are behaving the same. Jets had no problem updating the existing stack in-place to runtime 3.2.
It may not be needed for this app. I’m refreshing my memory on what was changed from the Upgrading Docs. If you’re not using iam_policies, I believe an in-place deploy is all that is needed and there will not be a rollback.
RE: 2. Neither stack on runtime 3.2 is writing to CloudWatch logs. I’m not sure yet how to debug this.
This is weird. Maybe check on the IAM permissions.
RE: 3. Functions do seem to be running (but maybe in duplicate bc I have two stacks now)
That might be because, if it’s an event-based system, the triggers may be firing to both stacks’ functions.
RE: 4. Odd AWS permissions errors on the Preheat job, but that’s not really critical to me at this point
There was a preheat IAM permission issue. Details here: https://github.com/boltops-tools/jets/pull/660 It was an edge case that should have been fixed. It sounds like this might be another edge case.
Can you check the preheat IAM permissions on the new stack when you get a chance?
RE: What’s the way to delete the bot-dev-1 duplicate stack?
JETS_EXTRA=1 jets delete
RE: Any thoughts on how to debug the lack of cloudwatch logs on the new runtime?
Maybe try triggering the Lambda function directly in the console and see if that shows any more info, perhaps something about the CloudWatch log permissions.
Update!
Ok so first of all I was wrong about the Cloudwatch logs not being written. I was looking for logs from some of my periodic (scheduled) Jets jobs that I knew should be executing, but apparently when I deployed the new environment with JETS_EXTRA=1, the existing environment stopped running those jobs! They were in fact running in the new environment, but it took a few minutes for the new cloudwatch log groups to show up, and I assumed that they weren’t there. So … all good on the logging side and my functions are executing normally. I was surprised that the scheduled jobs only ran in the latest deployed environment.
I re-deployed the original environment, then did JETS_EXTRA=1 jets delete to nuke the new duplicate environment, and all jobs are executing and logging normally again now on runtime 3.2.
As for the permissions errors on the JetsPreheatJob, I just worked around this by disabling prewarming with config.prewarm.enable = false. I suspected for a while that prewarming wasn’t actually helping at all (still got timeout errors on cold start). I will look into enabling provisioned concurrency for select functions that need to be always hot.
As it turns out, I don’t think I really needed the blue-green deployment from the start anyway. The upgrading guide mentions that it’s needed if you use iam_policies, which I am using in a couple places: iam_policy and class_iam_policy, but it deployed in-place just fine regardless. For my other apps I’ll try an in-place deployment first, and only go with the blue-green approach if it fails.
So, all is well for now. Thanks again for jumping in quickly @tung. If I run into snags updating my other Jets apps I will let you know.
Just a quick chime-in (for search reasons) since I had the same error: I fixed it by adding the lambda permission app-wide.
Jets.application.configure do |config|
  config.iam_policy = ["lambda"]
end
https://rubyonjets.com/docs/iam-policies/
I’m sure you could scope this down to the prewarm lambda and give it lambda:InvokeFunction instead of lambda:* if you are worried about it being overly permissive.
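A minimal sketch of that scoped-down variant, assuming Jets accepts the expanded hash form (action / effect / resource) for config.iam_policy as described in the IAM policies docs linked above. The PREHEAT_INVOKE_STATEMENT constant name is made up for illustration, and resource "*" is left wide here; narrowing it to your own function ARNs would be a further step. This has not been tested against a real deploy.

```ruby
# config/application.rb (sketch)
# Hypothetical constant name. The statement grants only lambda:InvokeFunction
# rather than lambda:*.
PREHEAT_INVOKE_STATEMENT = {
  action: ["lambda:InvokeFunction"],
  effect: "Allow",
  resource: "*" # assumption: could be narrowed to specific function ARNs
}

# Guarded so the snippet can also be loaded on its own, outside a Jets app.
if defined?(Jets)
  Jets.application.configure do |config|
    config.iam_policy = [PREHEAT_INVOKE_STATEMENT]
  end
end
```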
What I am doing, since I do not want a global prewarm, is setting
config.prewarm.enable = false
and then adding a PrewarmJob that lists all the functions I want kept warm:
class PrewarmJob < ApplicationJob
  class_timeout 30
  class_memory 512

  rate '30 minutes'
  def hot_page
    functions = [
      "guitar-dev-revise_auth-registrations_controller-new",
      "guitar-dev-revise_auth-registrations_controller-create",
      "guitar-dev-revise_auth-sessions_controller-new",
      "guitar-dev-revise_auth-sessions_controller-create",
      "guitar-dev-revise_auth-sessions_controller-destroy",
      "guitar-dev-api-v1-...",
      "guitar-dev-api-v1-...",
      "...",
      "guitar-dev-api-v1-..."
    ]
    functions.each do |function_name|
      # Fire 10 concurrent warm-up calls per function, then wait for them all.
      threads = []
      10.times do
        threads << Thread.new do
          Jets::Preheat.warm(function_name)
        end
      end
      threads.each { |t| t.join }
      puts "Finished prewarming #{function_name}."
    end
  end
end
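The thread fan-out in that job can be pulled into a small helper, which makes it easier to try outside Lambda. This is only a sketch: prewarm_all, its concurrency keyword, and the demo function names are hypothetical, and the block stands in for the Jets::Preheat.warm call used above. The idea, as I understand it, is that concurrent invocations need separate execution environments, so several parallel calls should warm several containers rather than reusing one.

```ruby
# Hypothetical helper: for each function name in turn, runs the given block
# `concurrency` times in parallel threads and collects the results.
def prewarm_all(function_names, concurrency: 10, &warmer)
  function_names.to_h do |function_name|
    threads = Array.new(concurrency) do
      Thread.new { warmer.call(function_name) }
    end
    # Thread#value joins the thread and returns the block's result
    # (or re-raises the exception the thread died with).
    [function_name, threads.map(&:value)]
  end
end

# Stand-in block for local experimentation. Inside a Jets job the block
# would call Jets::Preheat.warm(name) instead.
results = prewarm_all(["demo-dev-posts_controller-index", "demo-dev-posts_controller-show"], concurrency: 3) do |name|
  "warmed #{name}"
end
```

Because Thread#value re-raises, a failing warm-up call surfaces as an exception in the job instead of being silently dropped.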
Awesome. I was inspired today to take another look at this and I fixed the underlying IAM permissions issue here: https://github.com/boltops-tools/jets/pull/670
@tung can you take a look when you get a sec?
Awesome! Was going to do this after my app was released on android, thank you!