-
Notifications
You must be signed in to change notification settings - Fork 48
Fix failed during restart in Ambari 2.5 #29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,14 +13,17 @@ | |
# limitations under the License. | ||
|
||
import uuid | ||
import time | ||
import os.path as path | ||
|
||
from resource_management.libraries.script.script import Script | ||
from resource_management.core.resources.system import Execute | ||
from common import PRESTO_RPM_URL, PRESTO_RPM_NAME, create_connectors,\ | ||
from resource_management.core.exceptions import ExecutionFailed, ComponentIsNotRunning | ||
from common import PRESTO_RPM_URL, PRESTO_RPM_NAME, create_connectors, \ | ||
delete_connectors | ||
from presto_client import smoketest_presto, PrestoClient | ||
|
||
|
||
class Coordinator(Script): | ||
def install(self, env): | ||
from params import java_home | ||
|
@@ -39,17 +42,22 @@ def start(self, env): | |
Execute('{0} start'.format(daemon_control_script)) | ||
if 'presto_worker_hosts' in host_info.keys(): | ||
all_hosts = host_info['presto_worker_hosts'] + \ | ||
host_info['presto_coordinator_hosts'] | ||
host_info['presto_coordinator_hosts'] | ||
else: | ||
all_hosts = host_info['presto_coordinator_hosts'] | ||
smoketest_presto( | ||
PrestoClient('localhost','root', | ||
config_properties['http-server.http.port']), | ||
all_hosts) | ||
# sleep some seconds for waiting coordinator start finished | ||
time.sleep(30) | ||
smoketest_presto(PrestoClient('localhost', 'root', config_properties['http-server.http.port']), all_hosts) | ||
|
||
def status(self, env): | ||
from params import daemon_control_script | ||
Execute('{0} status'.format(daemon_control_script)) | ||
try: | ||
Execute('{0} status'.format(daemon_control_script)) | ||
except ExecutionFailed as ef: | ||
if ef.code == 3: | ||
raise ComponentIsNotRunning("ComponentIsNotRunning") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is the advantage of doing this? Will Ambari recognize the exception and do something special as a result? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes. Ambari only accept zero value as normal return, any other return value will cause an ExecutionFailed exception thrown. As Also In Ambari 2.5, Ambari will handle ComponentIsNotRunning exception as a normal logic procession during restart like below: So. TO AVOID THE EXCEPTION
raise a ComponentIsNotRunning exception should be the right way. I only test it in Ambari 2.5. I'm not sure is any older version play the same behavior. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the explanation, that makes sense. |
||
else: | ||
raise ef | ||
|
||
def configure(self, env): | ||
from params import node_properties, jvm_config, config_properties, \ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,10 @@ | ||
class ClientComponentHasNoStatus(Exception): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for fixing the unit tests! The Ambari SDK doesn't make it easy to test your code so I had to resort to this hacky solution. |
||
pass | ||
pass | ||
|
||
|
||
class ExecutionFailed(Exception): | ||
pass | ||
|
||
|
||
class ComponentIsNotRunning(Exception): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the sleep and the comment as they're unnecessary. The
PrestoClient
code has a retry with timeout mechanism to ensure the server has started.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I have seen it. To add this code just because some errors will be raised (just like ExecutionFailed mentioned blew) . And the retries will be interrupted, Never retry again.
Add some seconds for sleeping will skip the ExecutionFailed exception in most cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@petroav As I said above, I think I have fixed the root cause of the interruption of the error, so this code can now be removed. It may also be possible to keep this code, and the sleep time setting is shorter, just if there are other exceptions that are thrown and then interrupted? What is your opinion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that you fixed the underlying problem yes, I think this sleep should be removed.