AWS RDS Connection limit

diegoizidoro · January 29, 2020, 2:13pm

Hi!

I know it’s not exactly a question about Jets, but maybe you can help me…

What happens when a Lambda invocation can’t get a DB connection because the connection limit has been reached? Some of my ApplicationJobs take some time because they have to contact an external service, and there are peaks of demand…

Is that considered a function error?

I’m from Brazil, and in our region, sa-east-1 (São Paulo), RDS Proxy and Aurora Serverless aren’t available yet to help with this problem.

Thanks!!

panbanda · February 3, 2020, 9:34am

Could you set up read-only instances or use serverless RDS?

diegoizidoro · February 5, 2020, 11:14am

Serverless RDS isn’t available in the region I deployed to. Read-only instances don’t make much sense for my application.

diegoizidoro · February 5, 2020, 2:23pm

I “solved” it by creating an application job that runs every 10 minutes and retries processing for all my models that are in a certain state. Not ideal, but I believe it’s what I can do for now.
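For anyone with a similar setup, the periodic “retry everything in a certain state” job can be sketched roughly like this. It’s a minimal Python sketch, not the actual job: Jets itself is Ruby, sqlite3 stands in for the RDS database, and the `requests` table, its `status`/`updated_at` columns, the state names, and the `reprocess` callback are all hypothetical.

```python
import sqlite3
import time
from typing import Callable, List, Optional

# Hypothetical state names and threshold; adjust to the real models.
RETRYABLE_STATES = ("accepted", "retry")
STALE_AFTER_SECONDS = 10 * 60  # skip rows touched in the last 10 minutes


def sweep_stuck_requests(conn: sqlite3.Connection,
                         reprocess: Callable[[int], None],
                         now: Optional[float] = None) -> List[int]:
    """Find requests stuck in a retryable state and hand each ID to `reprocess`.

    Meant to run on a schedule (e.g. every 10 minutes). Recently updated rows
    are skipped, which makes it less likely that a request still in flight
    gets picked up a second time.
    """
    now = time.time() if now is None else now
    cutoff = now - STALE_AFTER_SECONDS
    placeholders = ",".join("?" for _ in RETRYABLE_STATES)
    rows = conn.execute(
        f"SELECT id FROM requests WHERE status IN ({placeholders}) "
        "AND updated_at <= ? ORDER BY id",
        (*RETRYABLE_STATES, cutoff),
    ).fetchall()
    ids = [row[0] for row in rows]
    for request_id in ids:
        reprocess(request_id)  # e.g. re-enqueue the processing job (hypothetical)
    return ids
```

The staleness cutoff is the important design choice: without it, the sweeper could grab requests that another invocation is actively working on.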

panbanda · February 8, 2020, 3:33am

I don’t know your particular use case, but could you offload it to Kinesis, Redshift, or SNS, depending on how quickly you need to show this data to the user? 10 minutes is a long time, so I’m guessing real time isn’t a priority?

diegoizidoro · February 10, 2020, 1:33pm

To explain a little bit of the use case…

I’m using Lambda for post-processing: receiving data from different servers/services, packaging it, sending it to a third party, and dealing with possible errors. The result is sent asynchronously back to the origin server as a callback. I don’t need to show the user anything; I just need this done before a certain time of day (“closing time”), which varies. Still, it’s better if it’s done quickly, so my client’s back office can handle certain kinds of errors manually before closing time.

Until now I was only considering problems with sending the data to the third party. To deal with third-party timeouts and other temporary errors, I have Lambda set to 2 retries, plus the 10-minute retry job as a guarantee.

I wasn’t considering the possibility of reaching the RDS connection limit.

The processing happens in multiple steps:

First, when the origin server sends the data, some minor validation is done, and the request is marked as accepted and handed off to another Lambda (an application job). A connection-limit error while receiving the origin server’s request isn’t a problem, because the origin server is responsible for retrying it.

In the application job that packages and sends the data, if any temporary error happens, I mark the request for retry. Here is the problem: if I don’t have a database connection, I can’t even mark it for retry, which means a request marked as accepted stays in limbo.

I know it’s good to have this retry job anyway as a guarantee, but I would like to solve the connection limit problem if possible.
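One way to soften the “no connection, can’t even mark it for retry” case is to retry the connection itself with exponential backoff and jitter, and, if that still fails, let the exception propagate out of the handler. An unhandled exception makes the invocation fail, and for asynchronous invocations Lambda retries it (twice by default), so the request isn’t silently dropped. A minimal Python sketch, assuming a hypothetical `ConnectionLimitError` in place of whatever “too many connections” error the real driver raises:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ConnectionLimitError(Exception):
    """Hypothetical stand-in for the driver's 'too many connections' error."""


def connect_with_backoff(connect: Callable[[], T],
                         attempts: int = 5,
                         base_delay: float = 0.2,
                         max_delay: float = 5.0,
                         sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `connect`, retrying on ConnectionLimitError with exponential
    backoff and full jitter. After the last attempt the error is re-raised,
    so the invocation fails visibly instead of leaving the request in limbo.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionLimitError:
            if attempt == attempts - 1:
                raise  # propagate: the invocation is reported as a function error
            # Full jitter: wait a random time up to the capped exponential delay,
            # so many concurrent invocations don't all retry at the same moment.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

The jitter matters here because the connection limit is usually hit during demand peaks, when many invocations fail at once; randomized waits spread their retries out.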

Is it clear now?

panbanda · February 10, 2020, 1:48pm

Yeah, totally. That’s annoying; sorry to hear about the frustrations with RDS connections. Hopefully their recent announcement of millions being poured into your region means it will come quickly.

Honestly, it kinda sounds like a classic queuing situation… SQS may do the trick because a message stays in the queue if processing fails. You would also be able to do more batching, which would reduce the number of connections. If your load is really high, I mean really high, you could use Kinesis and write a consumer (or poll it for changes) and handle records in batches. But I’d say SQS would fit your needs quite well. Have you already tried that?
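The batching idea could look roughly like this: a Lambda consuming SQS opens one DB connection per batch instead of one per message, and reports only the failed messages back. This is a Python sketch under assumptions: the `batchItemFailures` response shape is only honored when ReportBatchItemFailures is enabled on the SQS event source mapping; sqlite3 stands in for RDS; and `open_connection` and `process` are hypothetical callbacks.

```python
import json
import sqlite3
from typing import Any, Callable, Dict, List


def handle_sqs_batch(event: Dict[str, Any],
                     open_connection: Callable[[], sqlite3.Connection],
                     process: Callable[[sqlite3.Connection, Dict[str, Any]], None],
                     ) -> Dict[str, List[Dict[str, str]]]:
    """Process every SQS record in the batch over a single DB connection.

    Returns failed message IDs in the partial-batch-response shape. If
    `open_connection` itself raises (e.g. the connection limit is hit),
    the exception escapes and the whole batch goes back to the queue.
    """
    failures: List[Dict[str, str]] = []
    conn = open_connection()  # one connection for N messages, not N connections
    try:
        for record in event["Records"]:
            try:
                process(conn, json.loads(record["body"]))
                conn.commit()
            except Exception:  # broad on purpose: any per-message failure is retried
                conn.rollback()
                failures.append({"itemIdentifier": record["messageId"]})
    finally:
        conn.close()
    return {"batchItemFailures": failures}
```

If the handler raises instead (for example, because no connection could be opened), the batch becomes visible in the queue again after the visibility timeout, so nothing ends up in limbo.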

diegoizidoro · February 10, 2020, 2:20pm

I think that’s a next step. It would certainly help, but it doesn’t exactly solve my problem, and it would make my architecture a bit more complex because of other business requirements I won’t delve into. I do need to architect it for reeeeaaaally high demand in the future; that’s why we went with Lambda in the first place.

What would really solve my problem is Aurora Serverless. I guess it won’t take long to arrive, but it’s not an option right now.

Thanks!