Two webservers active after upgrade using the Helm chart, and the startup probe keeps failing. #37831
Replies: 5 comments
-
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; there is no need to wait for approval.
-
You must have made a typo while installing Airflow; for example, they are likely running in a different namespace. Without logs and more details, it's difficult to figure out what kind of mistake was made.

I believe the scenario you described is actually tested in our CI, so it should work. But if you change crucial configuration (for example the Postgres configuration), the recommended way (as with other charts) is to delete and redeploy Airflow. In some scenarios, an install followed by an upgrade will not work when crucial parameters are changed, because Helm does not have full "remove" capabilities: when you upgrade a chart and significantly change what is deployed, it is not fully guaranteed to work.

However, this is also not a problem, and that's why all the "state" data should be kept in the database outside of the chart / Kubernetes, so you should be able to nuke and redeploy Airflow (or any other stateless application) from scratch when things like that happen. That would be my recommendation for your case.

Unless there are some logs and details, we can't help with this much. Also remember that the chart and Airflow are provided "as is", and this forum is mostly here to help people who have problems, but only in people's free time, so there are no "experts" here who can be called on. However, if you provide more details, it's possible that someone will help you investigate further.

Converted it into a discussion, as it doesn't really indicate a specific issue.
-
Makes sense, I will delete and recreate instead of upgrading when changing the DB instance. However, the upgrade times out after 5m when running the DB migration. I am just upgrading to a standalone DB. Is there a way we can increase the timeout?
-
No idea what timeout you are talking about. But maybe you have resource problems? Have you looked at the docs? https://airflow.apache.org/docs/helm-chart/stable/index.html
-
Hello there, I am facing the same issue while deploying with FluxCD (I have disabled hooks and custom envs). When an upgrade is performed, the second webserver fails forever while the old pod remains. I managed to complete the release by deleting pods and restarting the deployment, but not in a consistent way.
-
Apache Airflow version
2.8.2
If "Other Airflow 2 version" selected, which one?
No response
What happened?
I am using the latest Helm chart and I am running into several problems with helm upgrade.
I started with the built-in PostgreSQL and Redis. This looks good, but the startup probe fails continuously in the webserver.
After I upgraded to a standalone PostgreSQL, I see two webservers, old and new together, and login doesn't work. Additionally, helm upgrade times out in the post-upgrade hook.
I am unable to make this production ready. I need expertise to sort out these issues.
What you think should happen instead?
No response
How to reproduce
Step 1: helm install airflow apache-airflow/airflow --namespace airflow --create-namespace --debug
Step 2: helm upgrade --install airflow apache-airflow/airflow --namespace airflow --create-namespace --debug -f values.yaml
The only changes in values.yaml are below:
metadataConnection:
  user: xxxxx
  pass: xxxxx
  protocol: postgresql
  host: xxxxxx
  port: 5432
  db: airflow
  sslmode: disable
enabled: true
enabled: false
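For reference, the metadataConnection fields above end up combined into a single database connection URI. The following is a minimal Python sketch of that mapping, useful for checking the external database from outside the cluster before upgrading. The function name `metadata_connection_uri` and the sample values are hypothetical, and the exact URI format the chart renders is an assumption here, not taken from the chart source.

```python
from urllib.parse import quote


def metadata_connection_uri(conn: dict) -> str:
    """Build a connection URI from metadataConnection-style fields.

    Assumption: the URI has the shape
    protocol://user:pass@host:port/db?sslmode=...
    with the user and password percent-encoded.
    """
    user = quote(str(conn["user"]), safe="")
    password = quote(str(conn["pass"]), safe="")
    uri = (
        f"{conn['protocol']}://{user}:{password}"
        f"@{conn['host']}:{conn['port']}/{conn['db']}"
    )
    # sslmode is optional; append it only when set.
    if conn.get("sslmode"):
        uri += f"?sslmode={conn['sslmode']}"
    return uri


# Hypothetical placeholder values; substitute the real ones from values.yaml.
example = {
    "user": "airflow_user",
    "pass": "p@ss/word",
    "protocol": "postgresql",
    "host": "db.example.internal",
    "port": 5432,
    "db": "airflow",
    "sslmode": "disable",
}

print(metadata_connection_uri(example))
```

Special characters in the password are percent-encoded so that characters such as `@` or `/` do not break the URI; an unescaped password is one possible cause of a webserver that never passes its startup probe after switching databases.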
Operating System
azure kubernetes
Versions of Apache Airflow Providers
None
Deployment
Official Apache Airflow Helm Chart
Deployment details
helm chart running in azure kubernetes
Anything else?
No response
Are you willing to submit PR?
Code of Conduct