Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTBR docker image using simulation RCP crashed due to socat #2719

Open
Irving-cl opened this issue Feb 17, 2025 · 0 comments
Open

OTBR docker image using simulation RCP crashed due to socat #2719

Irving-cl opened this issue Feb 17, 2025 · 0 comments

Comments

@Irving-cl
Copy link
Contributor

Describe the bug A clear and concise description of what the bug is.
When starting OTBR docker with simulation RCP using socat 1.8.0.2, after some time, the host program will crash.

To Reproduce Information to reproduce the behavior, including:

  • Have socat 1.8.0.2 installed. (On our PCs, apt-get install socat will install this version)
  • Modify the ncp_mode script to run otbr docker easier. (attached below)
  • Build the simulation ot-cli-ftd: mkdir -p build/temp && sudo top_builddir=./build/temp bash ./tests/scripts/ncp_mode build_ot_sim
  • Build otbr docker: sudo top_builddir=./build/temp bash ./tests/scripts/ncp_mode build_otbr_docker
  • Use an expect script to start the otbr docker image: sudo top_builddir=./build/temp bash ./tests/scripts/ncp_mode expect docker_debug.exp (the script attached below)
    • Join the node into network using dataset: 0e080000000000010000000300001435060004001fffe002087d61eb42cdc48d6a0708fd0d07fca1b9f0500510ba088fc2bd6c3b3897f7a10f58263ff3030f4f70656e5468726561642d353234660102524f04109dc023ccd447b12b50997ef68020f19e0c0402a0f7f8
  • Start a simulation cli ftd node and attach the network using the same dataset.
  • Use the cli ftd node to ping the br node. After about 20 pings, the otbr in docker will crash.
  1. Git commit id: e2f3c1f
  2. IEEE 802.15.4 hardware platform: simulation
  3. Build steps: described above
  4. Network topology: Leader (docker otbr) -- router (simulation cli)

Expected behavior A clear and concise description of what you expected to happen.
The ping can work as expected

Console/log output If applicable, add console/log output to help explain your problem.
OTBR log before crash.

Feb 17 02:33:09 11d0d0ac6a01 otbr-agent[109]: 00:03:43.525 [D] RadioSelector-: RadioSelector: UpdateOnTxSucc 15.4 - neighbor:[6e43b8173a1e8adc rloc16:0xe400 radio-pref:{15.4:255} state:Valid]
Feb 17 02:33:09 11d0d0ac6a01 otbr-agent[109]: 00:03:43.525 [I] MeshForwarder-: Sent IPv6 ICMP6 msg, len:196, chksum:8f65, ecn:no, to:0xe400, sec:yes, prio:low, radio:15.4
Feb 17 02:33:09 11d0d0ac6a01 otbr-agent[109]: 00:03:43.525 [I] MeshForwarder-:     src:[fd7d:61eb:42cd:8d6a:42:58ff:fe0c:64c4]
Feb 17 02:33:09 11d0d0ac6a01 otbr-agent[109]: 00:03:43.525 [I] MeshForwarder-:     dst:[fd1b:8173:6f2a:1:2c50:7562:b3a6:a5f6]
Feb 17 02:33:10 11d0d0ac6a01 otbr-agent[109]: 00:03:44.520 [D] P-SpinelDrive-: Received spinel frame, flg:0x2, iid:0, tid:0, cmd:PROP_VALUE_IS, key:STREAM_RAW, len:125, rssi:-20 ...
Feb 17 02:33:10 11d0d0ac6a01 otbr-agent[109]: 00:03:44.520 [D] P-SpinelDrive-: ... noise:-128, flags:0x0010, channel:20, lqi:0, timestamp:875784317018, rxerr:0
Feb 17 02:33:10 11d0d0ac6a01 otbr-agent[109]: 00:03:44.520 [D] RadioSelector-: RadioSelector: UpdateOnRx 15.4 - neighbor:[6e43b8173a1e8adc rloc16:0xe400 radio-pref:{15.4:255} state:Valid]
Feb 17 02:33:13 11d0d0ac6a01 otbr-agent[109]: 00:03:47.503 [N] MeshForwarder-: Dropping (reassembly queue) IPv6 ICMP6 msg, len:148, chksum:9eb8, ecn:no, sec:yes, error:ReassemblyTimeout, prio:normal, rss:-20.0, radio:15.4
Feb 17 02:33:13 11d0d0ac6a01 otbr-agent[109]: 00:03:47.503 [N] MeshForwarder-:     src:[fd1b:8173:6f2a:1:2c50:7562:b3a6:a5f6]
Feb 17 02:33:13 11d0d0ac6a01 otbr-agent[109]: 00:03:47.503 [N] MeshForwarder-:     dst:[fdde:ad00:beef:0:c9cf:f625:504c:e0c6]
Feb 17 02:33:19 11d0d0ac6a01 otbr-agent[109]: 00:03:53.371 [I] Mle-----------: Send Advertisement (ff02:0:0:0:0:0:0:1)
Feb 17 02:33:19 11d0d0ac6a01 otbr-agent[109]: 00:03:53.371 [D] P-SpinelDrive-: Sent spinel frame, flg:0x2, iid:0, tid:13, cmd:PROP_VALUE_SET, key:STREAM_RAW, len:70, channel:20, maxbackoffs:4, maxretries:15 ...
Feb 17 02:33:19 11d0d0ac6a01 otbr-agent[109]: 00:03:53.371 [D] P-SpinelDrive-: ... csmaCaEnabled:1, isHeaderUpdated:0, isARetx:0, skipAes:0, txDelay:0, txDelayBase:0
Feb 17 02:33:24 11d0d0ac6a01 otbr-agent[109]: 00:03:58.371 [W] P-RadioSpinel-: radio tx timeout
Feb 17 02:33:24 11d0d0ac6a01 otbr-agent[109]: 00:03:58.375 [C] P-RadioSpinel-: Failed to communicate with RCP - no response from RCP during initialization
Feb 17 02:33:24 11d0d0ac6a01 otbr-agent[109]: 00:03:58.375 [C] P-RadioSpinel-: This is not a bug and typically due a config error (wrong URL parameters) or bad RCP image:
Feb 17 02:33:24 11d0d0ac6a01 otbr-agent[109]: 00:03:58.375 [C] P-RadioSpinel-: - Make sure RCP is running the correct firmware
Feb 17 02:33:24 11d0d0ac6a01 otbr-agent[109]: 00:03:58.375 [C] P-RadioSpinel-: - Double check the config parameters passed as `RadioURL` input
Feb 17 02:33:24 11d0d0ac6a01 otbr-agent[109]: 00:03:58.375 [C] Platform------: HandleRcpTimeout() at radio_spinel.cpp:2035: RadioSpinelNoResponse

Additional context Add any other context about the problem here.
Updated ncp_mode script:

#!/bin/bash
#
#  Copyright (c) 2024, The OpenThread Authors.
#  All rights reserved.
#
#  Redistribution and use in source and binary forms, with or without
#  modification, are permitted provided that the following conditions are met:
#  1. Redistributions of source code must retain the above copyright
#     notice, this list of conditions and the following disclaimer.
#  2. Redistributions in binary form must reproduce the above copyright
#     notice, this list of conditions and the following disclaimer in the
#     documentation and/or other materials provided with the distribution.
#  3. Neither the name of the copyright holder nor the
#     names of its contributors may be used to endorse or promote products
#     derived from this software without specific prior written permission.
#
#  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
#  AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
#  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
#  ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
#  LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
#  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
#  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
#  INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
#  CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
#  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
#  POSSIBILITY OF SUCH DAMAGE.
#
# Test basic functionality of otbr-agent under NCP mode.
#
# Usage:
#   ./ncp_mode
set -euxo pipefail

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
readonly SCRIPT_DIR
EXPECT_SCRIPT_DIR="${SCRIPT_DIR}/expect"
readonly EXPECT_SCRIPT_DIR

#---------------------------------------
# Configurations
#---------------------------------------
OT_CLI="${OT_CLI:-ot-cli-ftd}"
readonly OT_CLI

OT_NCP="${OT_NCP:-ot-ncp-ftd}"
readonly OT_NCP

OT_RCP="${OT_RCP:-ot-rcp}"
readonly OT_RCP

OTBR_DOCKER_IMAGE="${OTBR_DOCKER_IMAGE:-otbr-ncp}"
readonly OTBR_DOCKER_IMAGE

ABS_TOP_BUILDDIR="$(cd "${top_builddir:-"${SCRIPT_DIR}"/../../}" && pwd)"
readonly ABS_TOP_BUILDDIR

ABS_TOP_SRCDIR="$(cd "${top_srcdir:-"${SCRIPT_DIR}"/../../}" && pwd)"
readonly ABS_TOP_SRCDIR

ABS_TOP_OT_SRCDIR="${ABS_TOP_SRCDIR}/third_party/openthread/repo"
readonly ABS_TOP_OT_SRCDIR

ABS_TOP_OT_BUILDDIR="${ABS_TOP_BUILDDIR}/../simulation"
readonly ABS_TOP_BUILDDIR

OTBR_COLOR_PASS='\033[0;32m'
readonly OTBR_COLOR_PASS

OTBR_COLOR_FAIL='\033[0;31m'
readonly OTBR_COLOR_FAIL

OTBR_COLOR_NONE='\033[0m'
readonly OTBR_COLOR_NONE

readonly OTBR_VERBOSE="${OTBR_VERBOSE:-0}"

#----------------------------------------
# Helper functions
#----------------------------------------
die()
{
    exit_message="$*"
    echo " *** ERROR: $*"
    exit 1
}

exists_or_die()
{
    [[ -f $1 ]] || die "Missing file: $1"
}

executable_or_die()
{
    [[ -x $1 ]] || die "Missing executable: $1"
}

write_syslog()
{
    logger -s -p syslog.alert "OTBR_TEST: $*"
}

#----------------------------------------
# Test constants
#----------------------------------------
TEST_BASE=/tmp/test-otbr
readonly TEST_BASE

OTBR_AGENT=otbr-agent
readonly OTBR_AGENT

STAGE_DIR="${TEST_BASE}/stage"
readonly STAGE_DIR

BUILD_DIR="${TEST_BASE}/build"
readonly BUILD_DIR

OTBR_DBUS_CONF="${ABS_TOP_BUILDDIR}/src/agent/otbr-agent.conf"
readonly OTBR_DBUS_CONF

OTBR_AGENT_PATH="${ABS_TOP_BUILDDIR}/src/agent/${OTBR_AGENT}"
readonly OTBR_AGENT_PATH

# The node ids
LEADER_NODE_ID=1
readonly LEADER_NODE_ID

# The TUN device for OpenThread border router.
TUN_NAME=wpan0
readonly TUN_NAME

#----------------------------------------
# Test steps
#----------------------------------------
do_build_ot_simulation()
{
    sudo rm -rf "${ABS_TOP_OT_BUILDDIR}/ncp"
    sudo rm -rf "${ABS_TOP_OT_BUILDDIR}/cli"
    sudo rm -rf "${ABS_TOP_OT_BUILDDIR}/rcp"
    OT_CMAKE_BUILD_DIR=${ABS_TOP_OT_BUILDDIR}/ncp "${ABS_TOP_OT_SRCDIR}"/script/cmake-build simulation \
        -DOT_MTD=OFF -DOT_RCP=OFF -DOT_APP_CLI=OFF -DOT_APP_RCP=OFF \
        -DOT_BORDER_ROUTING=ON -DOT_NCP_INFRA_IF=ON -DOT_SIMULATION_INFRA_IF=OFF \
        -DOT_SRP_SERVER=ON -DOT_SRP_ADV_PROXY=ON -DOT_PLATFORM_DNSSD=ON -DOT_SIMULATION_DNSSD=OFF -DOT_NCP_DNSSD=ON \
        -DBUILD_TESTING=OFF
    OT_CMAKE_BUILD_DIR=${ABS_TOP_OT_BUILDDIR}/cli "${ABS_TOP_OT_SRCDIR}"/script/cmake-build simulation \
        -DOT_MTD=OFF -DOT_RCP=OFF -DOT_APP_NCP=OFF -DOT_APP_RCP=OFF \
        -DOT_BORDER_ROUTING=OFF \
        -DBUILD_TESTING=OFF
    OT_CMAKE_BUILD_DIR=${ABS_TOP_OT_BUILDDIR}/rcp "${ABS_TOP_OT_SRCDIR}"/script/cmake-build simulation \
        -DOT_FTD=OFF -DOT_MTD=OFF -DOT_APP_CLI=OFF -DOT_APP_NCP=OFF \
        -DBUILD_TESTING=OFF
}

do_build_otbr_docker()
{
    otbr_docker_options=(
        "-DOT_THREAD_VERSION=1.4"
        "-DOTBR_DBUS=ON"
        "-DOTBR_FEATURE_FLAGS=ON"
        "-DOTBR_TELEMETRY_DATA_API=ON"
        "-DOTBR_TREL=ON"
        "-DOTBR_LINK_METRICS_TELEMETRY=ON"
        "-DOTBR_SRP_ADVERTISING_PROXY=ON"
        "-DOTBR_BACKBONE_ROUTER=ON"
    )
    sudo docker build -t "${OTBR_DOCKER_IMAGE}" \
        -f ./etc/docker/Dockerfile . \
        --build-arg NAT64=0 \
        --build-arg NAT64_SERVICE=0 \
        --build-arg DNS64=0 \
        --build-arg WEB_GUI=0 \
        --build-arg REST_API=0 \
        --build-arg FIREWALL=0 \
        --build-arg OTBR_OPTIONS="${otbr_docker_options[*]}"
}

setup_infraif()
{
    if ! ip link show backbone1 >/dev/null 2>&1; then
        echo "Creating backbone1 with Docker..."
        docker network create --driver bridge --ipv6 --subnet 9101::/64 -o "com.docker.network.bridge.name"="backbone1" backbone1
    else
        echo "backbone1 already exists."
    fi
    sudo sysctl -w net.ipv6.conf.backbone1.accept_ra=2
    sudo sysctl -w net.ipv6.conf.backbone1.accept_ra_rt_info_max_plen=64
}

test_setup()
{
    executable_or_die "${OTBR_AGENT_PATH}"

    # Remove flashes
    sudo rm -vrf "${TEST_BASE}/tmp"
    # OPENTHREAD_POSIX_DAEMON_SOCKET_LOCK
    sudo rm -vf "/tmp/openthread.lock"

    ot_cli=$(find "${ABS_TOP_OT_BUILDDIR}" -name "${OT_CLI}")
    ot_ncp=$(find "${ABS_TOP_OT_BUILDDIR}" -name "${OT_NCP}")
    ot_rcp=$(find "${ABS_TOP_OT_BUILDDIR}" -name "${OT_RCP}")
    executable_or_die "${ot_cli}"
    executable_or_die "${ot_ncp}"
    executable_or_die "${ot_rcp}"

    export EXP_OTBR_AGENT_PATH="${OTBR_AGENT_PATH}"
    export EXP_OT_CLI_PATH="${ot_cli}"
    export EXP_OT_NCP_PATH="${ot_ncp}"
    export EXP_OT_RCP_PATH="${ot_rcp}"

    # We will be creating a lot of log information
    # Rotate logs so we have a clean and empty set of logs uncluttered with other stuff
    if [[ -f /etc/logrotate.conf ]]; then
        sudo logrotate -f /etc/logrotate.conf || true
    fi

    # Preparation for otbr-agent
    exists_or_die "${OTBR_DBUS_CONF}"
    sudo cp "${OTBR_DBUS_CONF}" /etc/dbus-1/system.d

    write_syslog "AGENT: kill old"
    sudo killall "${OTBR_AGENT}" || true

    setup_infraif

    # From now on - all exits are TRAPPED
    # When they occur, we call the function: output_logs'.
    trap test_teardown EXIT
}

test_teardown()
{
    # Capture the exit code so we can return it below
    EXIT_CODE=$?
    readonly EXIT_CODE
    write_syslog "EXIT ${EXIT_CODE} - output logs"

    sudo pkill -f "${OTBR_AGENT}" || true
    sudo pkill -f "${OT_CLI}" || true
    sudo pkill -f "${OT_NCP}" || true
    wait

    echo 'clearing all'
    sudo rm /etc/dbus-1/system.d/otbr-agent.conf || true
    sudo rm -rf "${STAGE_DIR}" || true
    sudo rm -rf "${BUILD_DIR}" || true

    exit_message="Test teardown"
    echo "EXIT ${EXIT_CODE}: MESSAGE: ${exit_message}"
    exit ${EXIT_CODE}
}

otbr_exec_expect_script()
{
    local log_file="tmp/log_expect"

    for script in "$@"; do
        echo -e "\n${OTBR_COLOR_PASS}EXEC${OTBR_COLOR_NONE} ${script}"
        sudo killall ot-rcp || true
        sudo killall ot-cli || true
        sudo killall ot-cli-ftd || true
        sudo killall ot-cli-mtd || true
        sudo killall ot-ncp-ftd || true
        sudo killall ot-ncp-mtd || true
        sudo rm -rf tmp
        mkdir tmp
        {
            sudo -E expect -df "${script}" 2>"${log_file}"
        } || {
            local EXIT_CODE=$?

            echo -e "\n${OTBR_COLOR_FAIL}FAIL${OTBR_COLOR_NONE} ${script}"
            cat "${log_file}" >&2
            return "${EXIT_CODE}"
        }
        echo -e "\n${OTBR_COLOR_PASS}PASS${OTBR_COLOR_NONE} ${script}"
        if [[ ${OTBR_VERBOSE} == 1 ]]; then
            cat "${log_file}" >&2
        fi
    done
}

do_expect()
{
    if [[ $# != 0 ]]; then
        otbr_exec_expect_script "$@"
    else
        mapfile -t test_files < <(find "${EXPECT_SCRIPT_DIR}" -type f -name "ncp_*.exp")
        otbr_exec_expect_script "${test_files[@]}" || die "ncp expect script failed!"
    fi

    exit 0
}

print_usage()
{
    cat <<EOF
USAGE: $0 COMMAND

COMMAND:
    build_ot_sim         Build simulated ot-cli-ftd and ot-ncp-ftd for testing.
    build_otbr_docker    Build otbr docker image for testing.
    expect               Run expect tests for otbr NCP mode.
    help                 Print this help.

EXAMPLES:
    $0 build_ot_sim build_otbr_docker expect
EOF
    exit 0
}

main()
{
    if [[ $# == 0 ]]; then
        print_usage
    fi

    export EXP_TUN_NAME="${TUN_NAME}"
    export EXP_LEADER_NODE_ID="${LEADER_NODE_ID}"
    export EXP_OTBR_DOCKER_IMAGE="${OTBR_DOCKER_IMAGE}"

    while [[ $# != 0 ]]; do
        case "$1" in
            build_ot_sim)
                do_build_ot_simulation
                ;;
            build_otbr_docker)
                do_build_otbr_docker
                ;;
            expect)
                shift
                test_setup
                do_expect "$@"
                ;;
            help)
                print_usage
                ;;
            *)
                echo
                echo -e "${OTBR_COLOR_FAIL}Warning:${OTBR_COLOR_NONE} Ignoring: '$1'"
                ;;
        esac
        shift
    done
}

main "$@"

docker_debug.exp:

source "tests/scripts/expect/_common.exp"

set ptys [create_socat 1]
set pty1 [lindex $ptys 0]
set pty2 [lindex $ptys 1]

set container "otbr-ncp"

start_otbr_docker $container $::env(EXP_OT_RCP_PATH) 2 $pty1 $pty2
spawn_node 3 otbr-docker $container
interact

exec sudo docker stop $container
exec sudo docker rm $container
dispose_all

We have proved that this is related to socat version. If we use an older version of socat (1.7.4.4), this issue doesn't happen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant