Hubot doesn't reconnect to the stream on API restart #77

Open
emedvedev opened this issue Dec 9, 2015 · 14 comments

@emedvedev
Contributor

hm, so after I restart with st2ctl restart, my bot stops receiving events from st2 :simple_smile:
until I restart the bot too

A community report. I've run into that issue a couple of times, too. Not sure whether we should solve it in st2client or here.

@emptywee

emptywee commented Dec 9, 2015

Yes, I reported that the other day. Let me know if you need any info from my end if you can't repro it.

@emedvedev
Contributor Author

@emptywee thanks! I can, although not always. This is probably an eventstream module issue: by default a client should attempt to reconnect indefinitely, and it looks like the module we use doesn't always do that. Even if that's the case, I think we'll be able to fix it or implement an additional layer of checks on our side.
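
For illustration, the "reconnect indefinitely" behavior described above could be layered on top of any EventSource-like object roughly as follows. This is a minimal sketch under stated assumptions, not the code hubot-stackstorm or st2client actually uses: `keepConnected`, `backoffDelay`, `createSource`, and `schedule` are hypothetical names, and the delay values are arbitrary.

```javascript
// Sketch only: none of these names exist in hubot-stackstorm or st2client.

// Delay before reconnect attempt N (0-based): exponential, capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// createSource: hypothetical factory returning an EventSource-like object
//               (something with onopen, onerror and close()).
// schedule:     timer function, injectable so the logic can be tested.
function keepConnected(createSource, schedule = setTimeout) {
  let attempt = 0;

  function connect() {
    const source = createSource();

    // A successful open resets the backoff.
    source.onopen = () => {
      attempt = 0;
    };

    // On any error, drop this source and schedule a fresh connection.
    source.onerror = () => {
      source.close();
      schedule(connect, backoffDelay(attempt));
      attempt += 1;
    };
  }

  connect();
}
```

In practice `createSource` would return something like a new EventSource pointed at the st2 stream endpoint. It is injected here so the retry logic can be exercised without a network connection.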

@ticean

ticean commented Mar 2, 2016

Hi there. I'm running a new installation to trial StackStorm and I think this issue is biting us fairly hard. Any time our Hubot loses its connection to the StackStorm API, it doesn't attempt to reconnect and is left running lame. No more StackStorm goodness. I have to manually restart Hubot :(

Is this a priority to fix? The functionality I've been able to implement quickly is great! But the reliability here is a big deal.

We're running Hubot independently. I installed this script into our previously existing bot.

Thanks for the help!

@dzimine

dzimine commented Mar 2, 2016

Thanks for the report! We will be looking into it. Your +1 increments the priority, but there's no committed fix yet.

@emedvedev
Contributor Author

@enykeev was looking into it some time ago, but it would require a fairly difficult module rewrite. @enykeev: sup?

@emedvedev
Contributor Author

@ticean: did you install with packages or AIO installer? We still have this issue on packages, but AIO should be good.

In short, this error is caused by the stream consumer module not treating 5xx errors as a reason to reconnect. In AIO we apply a fix that gives stream errors special treatment: https://github.com/StackStorm/st2workroom/pull/303/files

If you chose packages as your install method, right now you can apply it manually, and in the future we'll hopefully have a better fix.
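
The 5xx handling described above comes down to one decision: does a given stream error warrant a reconnect? Here is a hedged sketch of that check. `shouldReconnect` is a hypothetical helper, and the choice of which statuses count as retryable is an assumption for illustration, not a description of what the linked fix does.

```javascript
// Sketch only: `shouldReconnect` is a hypothetical helper, not an API of
// hubot-stackstorm, st2client, or any eventsource module.
function shouldReconnect(status) {
  // No HTTP status at all is assumed to mean the connection itself dropped.
  if (status === undefined || status === null) {
    return true;
  }
  // Treat every server-side (5xx) error as temporary, e.g. during an API restart.
  return status >= 500 && status <= 599;
}
```

With a check like this, a 502 returned while the API is restarting leads to a retry, while a 401 is left alone so that a bad credential does not cause an endless reconnect loop.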

@ticean

ticean commented Mar 3, 2016

I installed with AIO installer, but I configured this hubot-stackstorm plugin into a pre-existing, non-AIO Hubot installation.

As an underlying problem, I'm finding that the stackstorm nginx instance is stopping (or going zombie) every night. I haven't been able to find out what's scheduled to cause that, but it definitely seems like a periodic task. The host is dedicated to stackstorm with AIO. If I could find and fix this, then it would definitely lower the urgency of the issue.

For now, though, I find our bot disconnected each morning and have to restart nginx and then the bot.

@ticean

ticean commented Mar 3, 2016

I should also note that I've customized the HTTPS certs using letsencrypt. I modified the paths to the certs in /etc/st2/st2.conf. At first, I installed a cron to renew the letsencrypt cert. I thought this might have caused nginx to stop. But nginx still goes zombie, even after the letsencrypt task is removed.

I mention this because I haven't used puppet. Maybe there's a convergence scheduled nightly? Any recommendations?

@jjm
Member

jjm commented Mar 23, 2016

I ran into this failure to reconnect too. I just changed over to the new packages and am running st2chatops on the same server as the rest of stackstorm. In the short term, could we maybe make st2ctl restart st2chatops too?

@jjm
Member

jjm commented May 31, 2016

As discussed on Slack yesterday, I ran into this (again), but this time it seems to have been caused by the st2stream process having a traceback during log rotation.

👍

@blag
Contributor

blag commented Jun 6, 2019

@armab I just stumbled across reconnecting-eventsource. It looks like it has most of the logic we would need to implement to have hubot consistently reconnect to st2stream. What do you think of using that instead of the built-in EventSource in stackstorm.js?

@arm4b
Member

arm4b commented Jun 6, 2019

@blag Seems like eventsource is used on the st2client side.
I would try to reproduce and debug the issue itself first (restart st2 services + debug st2chatops), trying to understand what's going on under the hood.

For example, https://github.com/fanout/reconnecting-eventsource#when-does-the-normal-eventsource-not-reconnect advertises reconnecting on 5XX errors, while in our nginx.conf for st2stream we specifically added a hack to not return such errors, working around the described eventsource limitation.

But if you catch the root cause and understand what happens at a deeper level (is it missed closed connections in the original eventsource, a specific HTTP code, or something else), that would be great. I think it's all doable, just a matter of dedication and time spent on troubleshooting. If it turns out that reconnecting-eventsource or any other fix solves the root cause, that's 💯
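
As one possible starting point for that kind of debugging, the sketch below formats what the stream looked like at the moment an error fired, so a closed connection can be told apart from a specific HTTP code. `describeStreamError` is a hypothetical helper. The `readyState` values 0, 1, and 2 are the standard EventSource states, while reading a `status` property off the error event is an assumption about the eventsource module in use.

```javascript
// Standard EventSource readyState values: 0, 1, 2.
const READY_STATES = ['CONNECTING', 'OPEN', 'CLOSED'];

// Sketch only: `describeStreamError` is a hypothetical debugging helper.
// `err.status` is assumed to carry the HTTP status when the module provides one.
function describeStreamError(source, err) {
  const state = READY_STATES[source.readyState] || 'UNKNOWN';
  const status = err && err.status !== undefined ? err.status : 'none';
  return `stream error: readyState=${state} status=${status}`;
}
```

Logging this line from the stream's error handler while restarting st2 services would show whether the client ends up CLOSED (and so never retries) or stays in CONNECTING.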

@jinpingh
Contributor

jinpingh commented Jun 10, 2019

@armab @blag Installed system: st2 3.0.0 on Python 2.7.12.
With these commands:

root@ewc:/opt/stackstorm/chatops# service st2api stop
root@ewc:/opt/stackstorm/chatops# service st2api start

root@ewc:/opt/stackstorm/chatops# service st2stream stop
root@ewc:/opt/stackstorm/chatops# service st2stream start

root@ewc:/opt/stackstorm/chatops# service nginx stop
root@ewc:/opt/stackstorm/chatops# service nginx start

or with all of the above services stopped at once and then started, st2chatops reconnects without issue.
I will investigate this issue further.

@arm4b
Member

arm4b commented Jun 25, 2019

@jinpingh Take a look at one edge case example of this: #157 (comment)

8 participants