Update fgt-asg-lambda.py #16

Open
cathyrox wants to merge 5 commits into fortinetdev:main from cathyrox:main

Conversation

cathyrox commented May 4, 2026

Related to issue #15 regarding fgt-asg-lambda.py. We have made the following modifications to the fgt-asg-lambda.py script to address several operational issues identified during deployment. These changes improve reliability, error handling, and lifecycle hook management for FortiGate instances within an Auto Scaling Group (ASG).

Proposed Changes

  1. Restrict Lifecycle Hook Handling
    The script was updated to handle only the fgt_asg_lambda lifecycle hook, ignoring all other lifecycle events triggered by the ASG. This prevents unintended processing of hooks that are outside the script’s scope.

  2. Add Success Validation for NetworkInterface and FGTConfig Operations
    Explicit success checks were introduced for both the NetworkInterface configuration and the FGTConfig operation to ensure each step completes successfully before proceeding.

  3. Conditional Lifecycle Hook Action Based on Operation Results
    The lifecycle hook action (CONTINUE or ABANDON) is now determined by the outcome of the NetworkInterface and FGTConfig success checks. If either operation fails, the lifecycle hook is abandoned to prevent a misconfigured instance from entering service.

  4. Default Lifecycle Action Set to ABANDON
    The default behavior of the lifecycle function was updated to issue an ABANDON action. This acts as a safe fallback to ensure that no instance is placed InService unless all configuration steps have been explicitly verified as successful.

  5. Top-Level Exception Handling in Main Function
    A try/except block was added to the main function to capture any unexpected runtime errors. Upon catching an exception, the script immediately issues an ABANDON lifecycle action to prevent a potentially misconfigured FortiGate instance from becoming InService.
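
Changes 1, 3, 4 and 5 above could fit together roughly as follows. This is a minimal sketch, not the code in this PR: `handle_lifecycle_event`, `configure_interfaces`, `configure_fgt` and the `managed_hooks` default are hypothetical names, and `asg_client` is assumed to be a boto3 Auto Scaling client (or any object exposing `complete_lifecycle_action`).

```python
import logging

logger = logging.getLogger("fgt-asg-lambda-sketch")

# Hook names as mentioned in the review discussion; treat them as an assumption.
DEFAULT_MANAGED_HOOKS = ("fgt_asg_launch_hook", "fgt_asg_terminate_hook")


def handle_lifecycle_event(event, asg_client, configure_interfaces, configure_fgt,
                           managed_hooks=DEFAULT_MANAGED_HOOKS):
    """Return the lifecycle result sent to the ASG, or None if the hook is not ours."""
    detail = event.get("detail", {})
    hook_name = detail.get("LifecycleHookName", "")
    if hook_name not in managed_hooks:
        logger.info("Skipping unmanaged lifecycle hook: %s", hook_name)
        return None

    result = "ABANDON"  # safe default: only changed after every step succeeds
    try:
        if configure_interfaces(detail) and configure_fgt(detail):
            result = "CONTINUE"
    except Exception:
        # Any unexpected error leaves the default ABANDON in place.
        logger.exception("Unexpected error; abandoning lifecycle action")
    finally:
        asg_client.complete_lifecycle_action(
            LifecycleHookName=hook_name,
            AutoScalingGroupName=detail["AutoScalingGroupName"],
            LifecycleActionToken=detail["LifecycleActionToken"],
            LifecycleActionResult=result,
            InstanceId=detail["EC2InstanceId"],
        )
    return result
```

Passing the client and the two step functions in as arguments keeps the sketch testable without AWS access.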

Issues Resolved:

  1. Unintended Lifecycle Hook Processing
    The script previously responded to all lifecycle hooks within the ASG and unintentionally completed lifecycle actions on behalf of other hooks. This fix ensures only the intended hook is handled.

  2. Premature Configuration Attempt
    The script previously waited for the EC2 Instance Launch Successful event before initiating FortiGate configuration, causing timing conflicts with other lifecycle hooks. The updated flow now initiates configuration at the EC2 Instance-launch Lifecycle Action stage.

  3. Instance Marked InService Before Configuration Completes
    Previously, the instance would transition to InService before the FortiGate was fully configured, potentially resulting in a faulty instance being placed into service. The updated logic now executes the FGTConfig function first and only sends a CONTINUE action upon confirmed success.

  4. Unhandled Runtime Errors
    Any unexpected errors in the main function were not being caught, potentially leaving lifecycle hooks in an incomplete state. The added exception handling ensures that any error results in an ABANDON action, maintaining ASG integrity.
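
The timing fix in item 2 comes down to which EventBridge detail-type triggers configuration. A small sketch, assuming the detail-type strings that Auto Scaling events carry; `select_step` and its return labels are hypothetical and not taken from the script.

```python
LAUNCH_ACTION = "EC2 Instance-launch Lifecycle Action"
TERMINATE_ACTION = "EC2 Instance-terminate Lifecycle Action"
LAUNCH_SUCCESSFUL = "EC2 Instance Launch Successful"


def select_step(detail_type):
    """Map an event detail-type to the (hypothetical) step this sketch would run."""
    if detail_type == LAUNCH_ACTION:
        # The lifecycle action is still pending, so the result can gate InService.
        return "configure"
    if detail_type == TERMINATE_ACTION:
        return "terminate"
    # Per the description above, acting on LAUNCH_SUCCESSFUL is too late to gate InService.
    return "ignore"
```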

I would like to bring the above proposed changes to your attention for review. These modifications have been successfully validated in our environment, and initial testing results are promising. Please let us know if the changes are acceptable. Thank you!

Comment thread modules/fortigate/fgt_asg/fgt-asg-lambda.py Outdated
self.logger.info("Attach the interface to FortiGate VM instance")
if attach_id == None:
    result = self.delete_interface(cur_intf_id)  # Need to delete the interface if not able to attach
    if not result:  # if error occurs on the delete_interface function

The condition for returning False should not be that the interface failed to delete. It should be something like this:

if attach_id == None:
    result = self.delete_interface(cur_intf_id)
    if not result:
        # some function to check whether the interface was deleted
        pass
    return False

Author

Hello Lix, the delete_interface function is:

def delete_interface(self, cur_intf_id):
    self.logger.info(f"Delete interface: {cur_intf_id}.")
    response = ""
    try:
        response = self.ec2_client.delete_network_interface(NetworkInterfaceId=cur_intf_id)
        return True
    except ClientError as e:
        self.logger.error(f"Error deleting network interface {cur_intf_id}: {e.response['Error']['Code']}, response: {response}")
        return False

I just added the return values:
True - if the interface was deleted.
False - if an error occurred while deleting the interface.

As of now (original code), when there is an error it does nothing and just continues. I am not sure what function we need for "# some function to check whether the interface be deleted". Can you please recommend one? Thank you!

The main point is that it should return False when attach_id is None, not when deleting the interface fails. The logic should be

if attach_id == None:
    result = self.delete_interface(cur_intf_id)
    return False

As for the scenario where deleting the interface fails, we can leave it as is for now. In the future, we can add some functionality to handle it.
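
The rule described here (fail because the attach failed, and treat the delete only as cleanup) can be sketched in isolation. `attach_or_cleanup` and its callable parameters are hypothetical stand-ins for the class methods in the script, and logging a failed delete is an added assumption, not something the thread asks for.

```python
def attach_or_cleanup(attach_interface, delete_interface, intf_id, warn=print):
    """Return True if the interface was attached; otherwise clean up and return False."""
    attach_id = attach_interface(intf_id)
    if attach_id is None:
        # Best-effort cleanup: a failed delete is reported but does not change the result.
        if not delete_interface(intf_id):
            warn(f"Interface {intf_id} was not attached and could not be deleted")
        return False
    return True
```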

Author

Thank you Xing. Since the variable result isn't used, I removed it and wrote the following:

            if attach_id == None: 
                self.delete_interface(cur_intf_id) #Need to delete the interface if not able to attach
                return False
            continue

self.associate_pub_ip(cur_intf_id, intf_conf)
associate_pub_ip_id = self.associate_pub_ip(cur_intf_id, intf_conf)
if associate_pub_ip_id == None:  # add a check if we have successfully associate the public Interface
    return False

We can continue if associating the public IP fails, since it is not required for the internal function.
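
One way to express "required versus optional" steps is shown below. It is a sketch under assumptions: `setup_interface` is a hypothetical name, the callables stand in for the script's methods, and a `None` return is assumed to signal failure.

```python
import logging

logger = logging.getLogger("fgt-asg-lambda-sketch")


def setup_interface(intf_conf, attach, associate_pub_ip):
    """Attach is required; public IP association is treated as best effort."""
    if attach() is None:
        return False  # required step failed
    if intf_conf.get("enable_public_ip"):
        if associate_pub_ip() is None:  # assumed failure signal
            logger.warning("Public IP association failed; continuing")
    return True
```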

Author

cathyrox, May 18, 2026

Thank you, corrected! I reverted to the original code:

        if "enable_public_ip" in intf_conf and intf_conf["enable_public_ip"] :
            self.associate_pub_ip(cur_intf_id, intf_conf)

Author

Hello @lix-fortinet,
could you explain why we don't need to check whether associating the public IP fails, and why it is not required for the internal function? :) Thank you!

Comment thread modules/fortigate/fgt_asg/fgt-asg-lambda.py Outdated
}]
)
# Get private IP
b_succ = False # Initialize b_succ value

The initial value should be True.

Author

Hi Lix, I don't want to argue much here, but isn't it safer to initialize the value as False? It would only become True when a later step sets b_succ to True :) But you know this better, for sure.

Actually, in the current code, we do not need this initialization, since the change-password part initializes b_succ. But we modified the logic so that the password change runs only when self.fgt_lic_mgmt != "fmg" (not released yet). So when self.fgt_lic_mgmt == "fmg", some of the following conditions will not be triggered and b_succ will not be overwritten. Everything then passes but b_succ is still False, when the function should return True.
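
This effect can be reproduced with a small model, assuming each optional step overwrites `b_succ` only when its condition holds. `run_steps` and the `(enabled, step)` pairs are hypothetical and only illustrate the control flow.

```python
def run_steps(steps, initial):
    """Run the enabled steps in order; each enabled step overwrites the flag."""
    b_succ = initial
    for enabled, step in steps:
        if not enabled:
            continue  # skipped steps leave b_succ untouched
        b_succ = step()
        if not b_succ:
            break  # stop at the first failing step
    return b_succ


skipped = [(False, lambda: True)]  # e.g. a step that is not triggered for "fmg"
print(run_steps(skipped, initial=False))  # False, although nothing failed
print(run_steps(skipped, initial=True))   # True
```

With a True initial value, a failure is still reported, because any enabled step that fails overwrites the flag.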

Author

Hello @lix-fortinet,

OK, we can remove the b_succ initialization here. Thanks!

# event["detail"]["LifecycleActionToken"] - Token for completing the hook (only in lifecycle events)
logger.info("=" * 60)
logger.info("fgt-asg-lambda invoked")
logger.info(f"Event detail-type: {event.get('detail-type', 'N/A')}")

This log duplicates the logs that follow.

Author

Hello Lix, the logs were added by my colleague to see what is happening.

Yes, but the detail-type was logged again in the following log.

Author

Oh, I see what you mean :) I have removed the duplicate one, thank you!

logger.info(f" detail_type: {detail_type}")
logger.info(f" EC2InstanceId: {fgt_vm_id}")
logger.info(f" AutoScalingGroupName: {event_detail.get('AutoScalingGroupName', 'N/A')}")
logger.info(f" LifecycleHookName: {event_detail.get('LifecycleHookName', 'N/A')}")

LifecycleHookName is used multiple times. We could create a variable for it.

Author

Hello @lix-fortinet, I have modified this part. Changes:

Moved the hook_name variable above the managed_hooks variable:

        # create variable for hook_name
        hook_name = event_detail.get('LifecycleHookName', '')
        # create variable for managed_hooks
        managed_hooks = ['fgt_asg_launch_hook', 'fgt_asg_terminate_hook']

Removed hook_name from the complete lifecycle action part, as it is already defined:

        if detail_type in ["EC2 Instance-launch Lifecycle Action", "EC2 Instance-terminate Lifecycle Action"]:
            logger.info(f"Completing lifecycle hook: {hook_name}")
            complete_lifecycle(logger, event_detail, result="CONTINUE")

lifecycle_token = event_detail.get('LifecycleActionToken', '')
logger.info(f" LifecycleActionToken: {lifecycle_token[:20] + '...' if lifecycle_token else 'N/A'}")

# --- EARLY EXIT: Skip unmanaged lifecycle hooks ---

Why do we need this early exit? What other hooks does it have besides managed hooks? Also, the hook name may contain a prefix.

Author

Hello Lix, in our environment we added some hooks that check threat feeds. For some reason, this script also receives the events from the threat feeds hook. We added this so that it will not process lifecycle actions for the threat feeds hook.

Author

cathyrox, May 18, 2026

As for the suggestion, I will check with my colleague, as he is the one who modified it :)

Got it. You can check main.tf line 141.
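
Since the hook name may contain a prefix, an exact-match list could miss the managed hooks. One possible approach is a suffix match, sketched below. It assumes the prefix is prepended to the base hook name; the actual naming scheme lives in main.tf and is not shown in this thread, and `is_managed_hook` is a hypothetical helper.

```python
# Base names as mentioned earlier in this thread; treat them as an assumption.
MANAGED_HOOK_SUFFIXES = ("fgt_asg_launch_hook", "fgt_asg_terminate_hook")


def is_managed_hook(hook_name, suffixes=MANAGED_HOOK_SUFFIXES):
    """Return True if hook_name ends with one of the managed base names."""
    return bool(hook_name) and hook_name.endswith(tuple(suffixes))
```

A suffix match is permissive. If the prefix is available to the function (for example through configuration), comparing against the full prefixed names would be stricter.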

@cathyrox
Author

Hello @lix-fortinet,

I have updated fgt-asg-lambda.py based on our discussion. Please check whether it looks good. Thank you so much!
