
Support bundles should include log files #7973


Merged


papertigers
Contributor

@papertigers papertigers commented Apr 15, 2025

Take two of adding log files into support bundles.

This PR leverages oxlog to find all of the zones/services on a sled and then, through a series of ZFS snapshots, collects those logs and exposes them as a zip file that Nexus will slurp down and stick into the support bundle being collected.

Created using spr 1.3.6-beta.1
@papertigers papertigers marked this pull request as ready for review April 17, 2025 02:50
@papertigers
Contributor Author

papertigers commented Apr 17, 2025

Using a4x2, I was able to successfully capture a support bundle that includes logs. We may want to reconsider the use of zstd for the log zips, as it appears the macOS and illumos `unzip` commands don't support decompressing those zip files directly. The other option is providing a CLI tool, similar to omdb, that we can use on a support bundle zip file.

unzip -l bundle.zip
❯ unzip -l /var/tmp/bundle.zip
Archive:  /var/tmp/bundle.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
       36  01-01-1980 00:00   bundle_id.txt
        0  01-01-1980 00:00   rack/
        0  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/
        0  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/
        0  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/9cabd278-f14f-4512-af17-e4ae84e42974/
    20643  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/9cabd278-f14f-4512-af17-e4ae84e42974/dladm.json
     4871  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/9cabd278-f14f-4512-af17-e4ae84e42974/ipadm.json
        0  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/9cabd278-f14f-4512-af17-e4ae84e42974/logs/
   303006  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/9cabd278-f14f-4512-af17-e4ae84e42974/logs/global.zip
   441411  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/9cabd278-f14f-4512-af17-e4ae84e42974/logs/oxz_cockroachdb_7a93b3e3-f836-48b6-bd83-3c932a99f30a.zip
   475537  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/9cabd278-f14f-4512-af17-e4ae84e42974/logs/oxz_cockroachdb_d7311c48-731b-4374-be81-7f4961e66997.zip
     2500  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/9cabd278-f14f-4512-af17-e4ae84e42974/logs/oxz_crucible_03379444-72de-4be3-9650-dc0a3ff5cbcd.zip
     2493  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/9cabd278-f14f-4512-af17-e4ae84e42974/logs/oxz_crucible_72225329-7194-4e55-8273-ff0a58e0f9a7.zip
     2492  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/9cabd278-f14f-4512-af17-e4ae84e42974/logs/oxz_crucible_d4a857fc-67bc-48bb-a359-bbda5411d27c.zip
     2494  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/9cabd278-f14f-4512-af17-e4ae84e42974/logs/oxz_crucible_e4c0cff6-47ca-4add-83ec-d9f757321523.zip
     2495  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/9cabd278-f14f-4512-af17-e4ae84e42974/logs/oxz_crucible_f041e3b6-f9b5-43c9-a4fe-fd36e03f4334.zip
    53881  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/9cabd278-f14f-4512-af17-e4ae84e42974/logs/oxz_crucible_pantry_3b4f637b-cbd6-448c-a27c-bd560f9a9295.zip
  1483832  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/9cabd278-f14f-4512-af17-e4ae84e42974/logs/oxz_nexus_42ce28b6-7af2-45d6-8770-54deb3e3b0e7.zip
     7853  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/9cabd278-f14f-4512-af17-e4ae84e42974/logs/oxz_ntp_68ea984d-a459-43cf-aa4d-9bad4ac99aaa.zip
  4637421  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/9cabd278-f14f-4512-af17-e4ae84e42974/logs/oxz_switch.zip
      120  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/9cabd278-f14f-4512-af17-e4ae84e42974/nvmeadm.json
    30149  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/9cabd278-f14f-4512-af17-e4ae84e42974/pargs.json
   103248  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/9cabd278-f14f-4512-af17-e4ae84e42974/pfiles.json
   285574  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/9cabd278-f14f-4512-af17-e4ae84e42974/pstack.json
      676  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/9cabd278-f14f-4512-af17-e4ae84e42974/sled.txt
    18124  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/9cabd278-f14f-4512-af17-e4ae84e42974/zfs.json
     2460  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/9cabd278-f14f-4512-af17-e4ae84e42974/zoneadm.json
     3126  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/9cabd278-f14f-4512-af17-e4ae84e42974/zpool.json
        0  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/a57b75f4-4bfd-49a0-8dc3-704da6993510/
    20945  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/a57b75f4-4bfd-49a0-8dc3-704da6993510/dladm.json
     4935  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/a57b75f4-4bfd-49a0-8dc3-704da6993510/ipadm.json
        0  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/a57b75f4-4bfd-49a0-8dc3-704da6993510/logs/
   334310  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/a57b75f4-4bfd-49a0-8dc3-704da6993510/logs/global.zip
   318708  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/a57b75f4-4bfd-49a0-8dc3-704da6993510/logs/oxz_cockroachdb_e0641e8a-1c3e-4607-ad16-ff8d8b735f18.zip
     2490  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/a57b75f4-4bfd-49a0-8dc3-704da6993510/logs/oxz_crucible_1c9cfb8e-a1b2-4982-9307-113b641e719c.zip
     2496  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/a57b75f4-4bfd-49a0-8dc3-704da6993510/logs/oxz_crucible_5ece1589-0f1e-4540-a97e-32e2be8c664c.zip
     2492  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/a57b75f4-4bfd-49a0-8dc3-704da6993510/logs/oxz_crucible_69e65e0b-1440-4924-833c-d4c7a03fd924.zip
     2489  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/a57b75f4-4bfd-49a0-8dc3-704da6993510/logs/oxz_crucible_76e2e803-8c72-4c21-9d0d-887e6e6e9424.zip
     2491  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/a57b75f4-4bfd-49a0-8dc3-704da6993510/logs/oxz_crucible_ff1b8eb5-b577-4484-9576-1c1f0eeaddb5.zip
    53940  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/a57b75f4-4bfd-49a0-8dc3-704da6993510/logs/oxz_crucible_pantry_1e558a00-7c62-4afd-85d1-3717df81386b.zip
    16311  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/a57b75f4-4bfd-49a0-8dc3-704da6993510/logs/oxz_external_dns_2cbede84-a054-4d8d-b444-3477b4457b4f.zip
    59443  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/a57b75f4-4bfd-49a0-8dc3-704da6993510/logs/oxz_internal_dns_feebeaab-450d-4b4a-a926-32d2d41dc36a.zip
  1688487  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/a57b75f4-4bfd-49a0-8dc3-704da6993510/logs/oxz_nexus_4945b33b-1d80-499f-8edd-2533ab5b7f47.zip
     8934  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/a57b75f4-4bfd-49a0-8dc3-704da6993510/logs/oxz_ntp_4c9635f6-83cb-43ed-978a-825ec869de71.zip
  4363136  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/a57b75f4-4bfd-49a0-8dc3-704da6993510/logs/oxz_switch.zip
      120  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/a57b75f4-4bfd-49a0-8dc3-704da6993510/nvmeadm.json
    33273  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/a57b75f4-4bfd-49a0-8dc3-704da6993510/pargs.json
    92712  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/a57b75f4-4bfd-49a0-8dc3-704da6993510/pfiles.json
   340933  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/a57b75f4-4bfd-49a0-8dc3-704da6993510/pstack.json
      682  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/a57b75f4-4bfd-49a0-8dc3-704da6993510/sled.txt
    18916  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/a57b75f4-4bfd-49a0-8dc3-704da6993510/zfs.json
     2691  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/a57b75f4-4bfd-49a0-8dc3-704da6993510/zoneadm.json
     3126  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/a57b75f4-4bfd-49a0-8dc3-704da6993510/zpool.json
        0  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/bc804c55-1f72-4eaf-80b2-84fdb9aff26b/
    54697  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/bc804c55-1f72-4eaf-80b2-84fdb9aff26b/dladm.json
     4935  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/bc804c55-1f72-4eaf-80b2-84fdb9aff26b/ipadm.json
        0  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/bc804c55-1f72-4eaf-80b2-84fdb9aff26b/logs/
   252406  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/bc804c55-1f72-4eaf-80b2-84fdb9aff26b/logs/global.zip
   478755  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/bc804c55-1f72-4eaf-80b2-84fdb9aff26b/logs/oxz_cockroachdb_12b3e3e5-4d76-491b-8d95-afdf54ee5417.zip
     2488  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/bc804c55-1f72-4eaf-80b2-84fdb9aff26b/logs/oxz_crucible_1d24d5b1-ef45-4ea5-b08e-e2aa1c9371ee.zip
     2488  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/bc804c55-1f72-4eaf-80b2-84fdb9aff26b/logs/oxz_crucible_3c27e3cf-f809-4849-b91e-b8ace5c613ce.zip
     2488  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/bc804c55-1f72-4eaf-80b2-84fdb9aff26b/logs/oxz_crucible_525a7cfe-6947-4ab9-8f3d-de5d1c118501.zip
     2494  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/bc804c55-1f72-4eaf-80b2-84fdb9aff26b/logs/oxz_crucible_902b8bad-71ad-4051-a81b-16899309ecc7.zip
     2493  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/bc804c55-1f72-4eaf-80b2-84fdb9aff26b/logs/oxz_crucible_e15258e1-b788-46dc-99e0-8fdd1747f346.zip
    53568  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/bc804c55-1f72-4eaf-80b2-84fdb9aff26b/logs/oxz_crucible_pantry_d821ad83-3645-4b8d-8cd7-6965bbbf052b.zip
    16259  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/bc804c55-1f72-4eaf-80b2-84fdb9aff26b/logs/oxz_external_dns_f8926330-440a-44f6-9253-ac60dfd2566e.zip
    59949  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/bc804c55-1f72-4eaf-80b2-84fdb9aff26b/logs/oxz_internal_dns_b0fb2b62-74a2-4815-81fc-3fdaa7aabcbe.zip
     8952  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/bc804c55-1f72-4eaf-80b2-84fdb9aff26b/logs/oxz_ntp_a9e7d80b-50f5-4b48-ad66-83e7eda3698a.zip
  1096726  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/bc804c55-1f72-4eaf-80b2-84fdb9aff26b/logs/oxz_oximeter_4910ec17-83ab-48de-bc48-5efa16dbaf96.zip
       22  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/bc804c55-1f72-4eaf-80b2-84fdb9aff26b/logs/oxz_switch.zip
      120  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/bc804c55-1f72-4eaf-80b2-84fdb9aff26b/nvmeadm.json
    22311  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/bc804c55-1f72-4eaf-80b2-84fdb9aff26b/pargs.json
    78162  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/bc804c55-1f72-4eaf-80b2-84fdb9aff26b/pfiles.json
   199984  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/bc804c55-1f72-4eaf-80b2-84fdb9aff26b/pstack.json
      683  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/bc804c55-1f72-4eaf-80b2-84fdb9aff26b/sled.txt
    18388  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/bc804c55-1f72-4eaf-80b2-84fdb9aff26b/zfs.json
     2606  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/bc804c55-1f72-4eaf-80b2-84fdb9aff26b/zoneadm.json
     3126  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/bc804c55-1f72-4eaf-80b2-84fdb9aff26b/zpool.json
        0  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/c6d9aca2-5796-4d70-ba40-11780edc6ee3/
    19961  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/c6d9aca2-5796-4d70-ba40-11780edc6ee3/dladm.json
     4935  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/c6d9aca2-5796-4d70-ba40-11780edc6ee3/ipadm.json
        0  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/c6d9aca2-5796-4d70-ba40-11780edc6ee3/logs/
   246200  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/c6d9aca2-5796-4d70-ba40-11780edc6ee3/logs/global.zip
  2807939  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/c6d9aca2-5796-4d70-ba40-11780edc6ee3/logs/oxz_clickhouse_c972d77c-7af1-48ed-a536-f7e763a5f8b0.zip
   307336  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/c6d9aca2-5796-4d70-ba40-11780edc6ee3/logs/oxz_cockroachdb_f356bc9a-74ff-4b61-8f71-281b1e8d0081.zip
     2493  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/c6d9aca2-5796-4d70-ba40-11780edc6ee3/logs/oxz_crucible_66ddbb13-3243-4434-adf4-8a2ba69cbc84.zip
     2490  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/c6d9aca2-5796-4d70-ba40-11780edc6ee3/logs/oxz_crucible_8373dc7c-1608-427e-97de-7212eea26a4a.zip
     2492  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/c6d9aca2-5796-4d70-ba40-11780edc6ee3/logs/oxz_crucible_b1ef8452-318b-49f7-ab19-ba70405467d6.zip
     2497  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/c6d9aca2-5796-4d70-ba40-11780edc6ee3/logs/oxz_crucible_dc6792b5-7119-49f6-b8a2-a3ded7f5ea99.zip
     2495  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/c6d9aca2-5796-4d70-ba40-11780edc6ee3/logs/oxz_crucible_e3f0db30-2e77-4577-86ad-63bf98945d10.zip
    59894  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/c6d9aca2-5796-4d70-ba40-11780edc6ee3/logs/oxz_internal_dns_ac03e4e1-06fa-4ca4-9e69-37d1d093b39a.zip
  1546627  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/c6d9aca2-5796-4d70-ba40-11780edc6ee3/logs/oxz_nexus_243b7b6d-b1e4-477e-a83c-e3d99123ad42.zip
     7764  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/c6d9aca2-5796-4d70-ba40-11780edc6ee3/logs/oxz_ntp_1c0590a8-ca6f-4e7d-afaa-c1850043aaa6.zip
       22  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/c6d9aca2-5796-4d70-ba40-11780edc6ee3/logs/oxz_switch.zip
      120  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/c6d9aca2-5796-4d70-ba40-11780edc6ee3/nvmeadm.json
    22022  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/c6d9aca2-5796-4d70-ba40-11780edc6ee3/pargs.json
   115075  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/c6d9aca2-5796-4d70-ba40-11780edc6ee3/pfiles.json
   171953  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/c6d9aca2-5796-4d70-ba40-11780edc6ee3/pstack.json
      683  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/c6d9aca2-5796-4d70-ba40-11780edc6ee3/sled.txt
    17716  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/c6d9aca2-5796-4d70-ba40-11780edc6ee3/zfs.json
     2361  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/c6d9aca2-5796-4d70-ba40-11780edc6ee3/zoneadm.json
     3126  01-01-1980 00:00   rack/fe7a6eee-8e71-47e2-aaef-1a0e7ef0f3ab/sled/c6d9aca2-5796-4d70-ba40-11780edc6ee3/zpool.json
---------                     -------
 22968803                     102 files
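A quick way to confirm the zstd concern is to look at the compression-method field in an entry's local file header: the PKWARE APPNOTE assigns method ID 93 to Zstandard, which many stock `unzip` builds cannot decompress. The std-only sketch below (helper names are hypothetical) reads that field from the first entry. It also illustrates that a 22-byte archive, like the `oxz_switch.zip` files on two of the sleds above, is most likely an empty zip consisting only of the end-of-central-directory record.

```rust
/// Maps a zip compression-method ID to a name. IDs come from the PKWARE
/// .ZIP File Format Specification (APPNOTE.TXT); only a few are listed.
fn method_name(id: u16) -> &'static str {
    match id {
        0 => "stored",
        8 => "deflate",
        12 => "bzip2",
        14 => "lzma",
        93 => "zstd",
        _ => "other",
    }
}

/// Returns the compression method of the first entry in a zip archive by
/// reading its local file header, or `None` if `bytes` does not begin with
/// one (an empty archive begins with the end-of-central-directory record).
fn first_entry_method(bytes: &[u8]) -> Option<u16> {
    // Local file header: signature "PK\x03\x04" (4 bytes), version needed
    // (2), general-purpose flags (2), then compression method (2), all
    // little-endian.
    if bytes.len() < 10 || !bytes.starts_with(b"PK\x03\x04") {
        return None;
    }
    Some(u16::from_le_bytes([bytes[8], bytes[9]]))
}
```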


Collaborator

@bnaecker bnaecker left a comment


Thanks @papertigers, this is great! I've got a few small comments and suggestions, though overall it looks good to me. Thanks for adding the tests, those are excellent.

@@ -1222,8 +1222,8 @@ impl Zfs {
snap_name: &'a str,
properties: &'a [(&'a str, &'a str)],
) -> Result<(), CreateSnapshotError> {
let mut command = std::process::Command::new(ZFS);
let mut cmd = command.arg("snapshot");
let mut command = std::process::Command::new(PFEXEC);
Collaborator

Out of curiosity, why did we not need pfexec before?

Contributor Author

I think it's because, as far as I can tell, zone bundles and the sled-diagnostics crate are the only consumers, and sled-agent runs as root. I ran into the issue while running `cargo nextest run` and saw "permission denied".

Collaborator

I suspect this may be a "usable in tests" thing -- adding the pfexec is harmless, but without it, we cannot run integration tests.
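A minimal sketch of the command shape after this change, assuming `PFEXEC` and `ZFS` resolve to `/usr/bin/pfexec` and `/usr/sbin/zfs` (the real constants live elsewhere in the codebase and may differ) and using a hypothetical property name. `Command::get_program` and `Command::get_args` let a test inspect the command without running it, which helps on machines without ZFS.

```rust
use std::process::Command;

// Assumed paths, for illustration only.
const PFEXEC: &str = "/usr/bin/pfexec";
const ZFS: &str = "/usr/sbin/zfs";

/// Builds, without running, `pfexec zfs snapshot -o k=v ... fs@snap`.
/// pfexec runs the command with the privileges granted by the caller's RBAC
/// profiles, so an unprivileged test run can snapshot; for a root caller
/// such as sled-agent it is harmless.
fn snapshot_command(filesystem: &str, snap_name: &str, properties: &[(&str, &str)]) -> Command {
    let mut cmd = Command::new(PFEXEC);
    cmd.arg(ZFS).arg("snapshot");
    for (name, value) in properties {
        // `zfs snapshot` accepts properties as repeated `-o name=value`.
        cmd.arg("-o").arg(format!("{name}={value}"));
    }
    cmd.arg(format!("{filesystem}@{snap_name}"));
    cmd
}
```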

Comment on lines 1318 to 1319
/// Note that if this is called on the root dataset such as
/// `rpool/ROOT/<BE>` it will return "legacy/.zfs/snapshot/<SNAP_NAME>".
Collaborator

Maybe provide a bit more context here, like why this is important or what one does with that information. Is the point that this isn't a full path anymore?

Collaborator

Also: What is a "root dataset"? This is on an impl Snapshot, so which snapshots is this true / not true for?

pub fn full_path(&self) -> Result<Utf8PathBuf, GetValueError> {
// TODO:
Collaborator

Do we need to track this in an issue?

Contributor Author

Filed as #8023

method = GET,
path = "/support/logs/zones",
}]
async fn support_logs(
Collaborator

I'm a bit confused about what this does based on the name, given that it differs from the API path. Is this returning the names of the zones that have logs? Maybe add a docstring to help?

Contributor Author

Fixed in 23430d7

method = GET,
path = "/support/logs/download/{zone}",
}]
async fn support_logs_download(
Collaborator

Similar here, a docstring would be very helpful.

Contributor Author

Fixed in 23430d7

Comment on lines 498 to 501
// 5 is an arbitrary amount of logs to grab, if we find that we are
// missing important information while debugging we should bump this
// value.
for file in archived.iter().rev().take(5) {
Collaborator

Could we provide this as an argument to this function? It's probably worthwhile to plumb that all the way to the API as well. It could default to 5, but I'm sure we're going to find situations where we're missing data. It'd be unfortunate to require a redeployment of sled-agent to get at that data.
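A minimal sketch of plumbing the count through as a parameter with a default of 5. `DEFAULT_ARCHIVED_LOGS` and `newest_archived` are hypothetical names, and the sketch assumes `archived` is ordered oldest to newest, which is what the existing `.iter().rev().take(5)` relies on. An API handler could then pass an optional query parameter straight through as `limit`.

```rust
/// Hypothetical default, matching the value currently hard-coded in the PR.
const DEFAULT_ARCHIVED_LOGS: usize = 5;

/// Returns up to `limit` of the newest archived logs, falling back to
/// `DEFAULT_ARCHIVED_LOGS` when no limit is given. Assumes `archived` is
/// ordered oldest to newest.
fn newest_archived<'a, T>(archived: &'a [T], limit: Option<usize>) -> impl Iterator<Item = &'a T> {
    archived.iter().rev().take(limit.unwrap_or(DEFAULT_ARCHIVED_LOGS))
}
```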

snapshot_logfile: &Utf8Path,
) -> Result<(), LogError> {
let Some(log_name) = snapshot_logfile.file_name() else {
debug!(
Collaborator

This seems like a warning or error condition.

Contributor Author

Changed to a warning. This is not a hard error as I want to collect as many logs as we can in a support bundle rather than bailing out early.
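The warn-and-continue approach can be sketched with `let ... else` plus `continue`. This sketch uses `std::path::PathBuf` in place of camino's `Utf8Path` and `eprintln!` in place of slog's `warn!`, and `collect_log_names` is a hypothetical helper. Note that `file_name()` returns `None` for paths such as `/` or ones ending in `..`.

```rust
use std::path::PathBuf;

/// Hypothetical helper: returns the file names of `paths`, plus a count of
/// entries skipped because they have no final component.
fn collect_log_names(paths: &[PathBuf]) -> (Vec<String>, usize) {
    let mut names = Vec::new();
    let mut skipped = 0;
    for path in paths {
        let Some(name) = path.file_name() else {
            // Warn and keep going so one odd path doesn't cost us every
            // other log in the bundle.
            eprintln!("warning: no file name in {}", path.display());
            skipped += 1;
            continue;
        };
        names.push(name.to_string_lossy().into_owned());
    }
    (names, skipped)
}
```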

zip_path,
FullFileOptions::default()
.compression_method(zip::CompressionMethod::Zstd)
.compression_level(Some(3)),
Collaborator

Is there any reason not to maximize the compression level?

Contributor Author

I did some very minor local comparisons of speed vs. size. We are optimizing for speed here because the final compression doesn't matter: Nexus unpacks this zip before it assembles the final support bundle.

};

// We grab the first part of a log file which is prefixed with the log
// type so that we gague our interest.
Collaborator

Spelling nit: "gauge"

if cockroach_log_prefix.contains(prefix) {
let entry = interested
.entry(prefix)
.or_insert(CockroachExtraLog::default());
Collaborator

Nit: or_default() should work here.
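For reference, `Entry::or_default()` requires the value type to implement `Default` and only builds the default when the key is missing, whereas `or_insert(CockroachExtraLog::default())` always evaluates its argument. The struct below is a hypothetical stand-in, since the review does not show `CockroachExtraLog`'s fields, and the log file names are illustrative.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `CockroachExtraLog`; its real fields are not
/// shown in the review.
#[derive(Debug, Default)]
struct ExtraLogs {
    files: Vec<String>,
}

/// Groups files by the part of the name before the first '.', keeping only
/// prefixes listed in `wanted`.
fn group_by_prefix<'a>(files: &[&'a str], wanted: &[&str]) -> HashMap<&'a str, ExtraLogs> {
    let mut interested: HashMap<&'a str, ExtraLogs> = HashMap::new();
    for &file in files {
        // `split` always yields at least one piece, so this never falls back.
        let prefix = file.split('.').next().unwrap_or(file);
        if wanted.contains(&prefix) {
            // Same as `.or_insert(ExtraLogs::default())`, but the default is
            // only constructed when the key is absent.
            interested.entry(prefix).or_default().files.push(file.to_string());
        }
    }
    interested
}
```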

Collaborator

@smklein smklein left a comment


Looks good - I echo all of @bnaecker's comments!


})??;

// Cleanup the zip file since we no longer need it
if let Err(e) = tokio::fs::remove_file(&output_file).await {
Collaborator

Just to confirm: We clean this up so it doesn't get included in the final zipfile, right?

(I'm just confirming the "cancel safety" of collect_bundle_as_file - even if we left this file around and cancelled the future, we'd still end up deleting the whole directory, so that would be fine?)

Contributor Author

Yea, that's the idea.
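If the explicit cleanup ever needs to be cancel-safe on its own, rather than relying on the whole directory being removed as described above, a common pattern is a drop guard. `RemoveOnDrop` is a hypothetical helper; note that it calls the blocking `std::fs::remove_file` from `Drop`, which is acceptable for a single small file but is a tradeoff inside async code.

```rust
use std::fs;
use std::path::PathBuf;

/// Hypothetical guard that deletes the wrapped file when dropped, including
/// when the future that owns it is cancelled before reaching explicit cleanup.
struct RemoveOnDrop(PathBuf);

impl Drop for RemoveOnDrop {
    fn drop(&mut self) {
        // Best effort: the file may already be gone (e.g. removed explicitly).
        let _ = fs::remove_file(&self.0);
    }
}
```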

@@ -790,6 +816,80 @@ async fn sha2_hash(file: &mut tokio::fs::File) -> anyhow::Result<ArtifactHash> {
Ok(ArtifactHash(digest.as_slice().try_into()?))
}

/// For a given zone, save a zip of its log files into a support bundle path.
Collaborator

We should document the "path" argument here as the path to a per-sled directory that will end up in the zipfile.

"{SLED_DIAGNOSTICS_SNAPSHOT_PREFIX}{}",
thread_rng()
.sample_iter(Alphanumeric)
.take(6)
Collaborator

So:

  1. We aren't retrying here
  2. Multiple zones are taking snapshots concurrently
  3. https://en.wikipedia.org/wiki/Birthday_problem exists

What do you think about bumping up this number slightly higher from 6 to like, 12?

I think a collision is still pretty unlikely (with the current config there are 62^6 = 56,800,235,584 possible file names), but increasing the length here from 6 -> 12 (62^12 ≈ 3.2 × 10^21 names) would push the odds of a collision firmly into the range we consider "unlikely enough to not care about", as we do for UUIDv4 collisions.
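The numbers in this comment can be checked with the standard birthday-bound approximation p ≈ 1 - exp(-k(k-1) / 2N), where N is the number of possible names and k is the number of snapshots taken concurrently. The sketch assumes 62 symbols (ASCII letters and digits, which is what `rand`'s `Alphanumeric` samples); the values of k are illustrative.

```rust
/// Number of distinct names of `len` characters drawn from `alphabet`
/// symbols (62 for ASCII letters and digits).
fn name_space(alphabet: u128, len: u32) -> u128 {
    alphabet.pow(len)
}

/// Birthday-bound approximation of the probability that at least two of `k`
/// uniformly random names collide: p ≈ 1 - exp(-k(k-1) / 2N).
fn collision_probability(k: u64, space: u128) -> f64 {
    let k = k as f64;
    let n = space as f64;
    let x = k * (k - 1.0) / (2.0 * n);
    // 1 - exp(-x) == -(exp(-x) - 1); exp_m1 keeps precision for tiny x.
    -(-x).exp_m1()
}
```

For comparison, 12 alphanumeric characters carry about 71.5 bits of randomness, fewer than the 122 random bits of a UUIDv4, but still far more than needed for a handful of concurrent snapshots.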

/// e.g. `/pool/ext/<UUID>/crypt/zone/<ZONE_NAME>/root/var/log/svc/<LOGFILE>`
Current,
/// Logs that have been archived by sled-agent into a debug dataset.
/// e.g. `/pool/ext/<UUID>/crypt/debug/<ZONE_NAME/<LOGFILE>`
Collaborator

Suggested change
/// e.g. `/pool/ext/<UUID>/crypt/debug/<ZONE_NAME/<LOGFILE>`
/// e.g. `/pool/ext/<UUID>/crypt/debug/<ZONE_NAME>/<LOGFILE>`

&name,
&[SLED_DIAGNOSTICS_ZFS_PROPERTY_NAME],
Some(illumos_utils::zfs::PropertySource::Local),
) else {
Collaborator

I know this was copied from the zone bundle code, but we probably want to match on this result and log the error in this "else" case.
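Because the `else` branch of `let ... else` cannot see the value that failed to match, logging the error requires a `match` (or `if let Err(e)`). A minimal sketch, with a hypothetical `get_property` standing in for the ZFS property lookup and a `Vec<String>` standing in for the logger:

```rust
/// Hypothetical lookup standing in for the ZFS property query; the real
/// function and its error type are not shown in the review.
fn get_property(dataset: &str) -> Result<String, String> {
    if dataset.is_empty() {
        Err("empty dataset name".to_string())
    } else {
        Ok("true".to_string())
    }
}

/// `let Ok(v) = get_property(..) else { .. }` would drop the error on the
/// floor; `match` lets us record why the lookup failed.
fn property_or_log(dataset: &str, log: &mut Vec<String>) -> Option<String> {
    match get_property(dataset) {
        Ok(value) => Some(value),
        Err(e) => {
            // In sled-agent this would be a slog warn!/error! call.
            log.push(format!("failed to get property for {dataset:?}: {e}"));
            None
        }
    }
}
```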

}

#[derive(Debug)]
struct SnapshotPermit {
Collaborator

Related: Why are these called permits?
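One plausible reading of the name, by analogy with `tokio::sync::SemaphorePermit`, is an RAII guard: holding a permit means a dataset has an outstanding diagnostics snapshot, and dropping it releases the dataset. The sketch is a hypothetical illustration of that idea, not the PR's implementation, showing how such a guard would prevent duplicate snapshots of the same filesystem, which is what the "duplicate snapshots not taken" test checks.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

/// Hypothetical registry of datasets with an outstanding snapshot.
#[derive(Clone, Default)]
struct SnapshotRegistry {
    in_use: Arc<Mutex<HashSet<String>>>,
}

/// Held while a dataset's snapshot exists; dropping it frees the dataset.
#[derive(Debug)]
struct SnapshotPermit {
    dataset: String,
    in_use: Arc<Mutex<HashSet<String>>>,
}

impl SnapshotRegistry {
    /// Returns `None` if a permit for `dataset` is already held, so a second
    /// caller cannot take a duplicate snapshot of the same filesystem.
    fn acquire(&self, dataset: &str) -> Option<SnapshotPermit> {
        let mut set = self.in_use.lock().unwrap();
        if !set.insert(dataset.to_string()) {
            return None;
        }
        Some(SnapshotPermit { dataset: dataset.to_string(), in_use: Arc::clone(&self.in_use) })
    }
}

impl Drop for SnapshotPermit {
    fn drop(&mut self) {
        // A real permit would presumably also destroy the ZFS snapshot here.
        if let Ok(mut set) = self.in_use.lock() {
            set.remove(&self.dataset);
        }
    }
}
```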

let snapshots = get_sled_diagnostics_snapshots(zfs_filesystem);
assert_eq!(snapshots.len(), 1, "duplicate snapshots not taken");

// // Free all of the permits
Collaborator

Suggested change
// // Free all of the permits
// Free all of the permits

Contributor

@wfchandler wfchandler left a comment


Thanks! Just a couple nits

// We log any errors saving the zip file to disk and
// continue on.
if let Err(e) = log_collection_result {
error!(&self.log, "failed to write logs outut: {e}");
Contributor

Suggested change
error!(&self.log, "failed to write logs outut: {e}");
error!(&self.log, "failed to write logs output: {e}");

}
}
Err(err) => {
tokio::fs::write(path.join("{zone}.logs.err"), err.to_string())
Contributor

This will generate a file literally named `{zone}.logs.err`, since `join` does not interpolate the zone name.

Suggested change
tokio::fs::write(path.join("{zone}.logs.err"), err.to_string())
tokio::fs::write(path.join(format!("{zone}.logs.err")), err.to_string())
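A small sketch of the corrected pattern, using `std::path` in place of camino; `error_file_path` is a hypothetical helper. `join` appends its argument verbatim, so only `format!` substitutes the zone name.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: builds the per-zone error file path. `format!`
/// performs the `{zone}` interpolation; `Path::join` just appends the
/// string it is given.
fn error_file_path(dir: &Path, zone: &str) -> PathBuf {
    dir.join(format!("{zone}.logs.err"))
}
```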

/// Cleanup snapshots that may have been left around due to unknown
/// circumstances such as a crash.
pub fn cleanup_snapshots(&self) {
let diagnostic_snapshots = Zfs::list_snapshots().unwrap().into_iter().filter (|snap| {
Contributor

Is it safe to assume Zfs::list_snapshots will never return an error?

Contributor Author

Jinx! This was copied from zone bundles, which also calls unwrap. I am not opposed to logging an error and allowing sled-agent to still come up in the face of errors; this seems like the right approach.
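A sketch of the log-and-continue approach discussed here. `cleanup_snapshots` takes hypothetical closures in place of `Zfs::list_snapshots` and snapshot destruction, a `Vec<String>` in place of the logger, and an assumed prefix value, since the actual value of `SLED_DIAGNOSTICS_SNAPSHOT_PREFIX` is not shown in the review.

```rust
/// Assumed prefix, for illustration only.
const PREFIX: &str = "sled-diagnostics-";

/// Best-effort cleanup: if listing snapshots fails, log and return instead
/// of panicking, so sled-agent can still start.
fn cleanup_snapshots<L, D>(list: L, mut destroy: D, log: &mut Vec<String>)
where
    L: FnOnce() -> Result<Vec<String>, String>,
    D: FnMut(&str) -> Result<(), String>,
{
    let snapshots = match list() {
        Ok(s) => s,
        Err(e) => {
            log.push(format!("failed to list snapshots: {e}"));
            return;
        }
    };
    // Snapshot names look like "<dataset>@<snap_name>".
    let ours = snapshots
        .iter()
        .filter(|s| s.split_once('@').is_some_and(|(_, name)| name.starts_with(PREFIX)));
    for snap in ours {
        if let Err(e) = destroy(snap.as_str()) {
            log.push(format!("failed to destroy {snap}: {e}"));
        }
    }
}
```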


/// Cleanup snapshots that may have been left around due to unknown
/// circumstances such as a crash.
pub fn cleanup_snapshots(&self) {
let diagnostic_snapshots = Zfs::list_snapshots().unwrap().into_iter().filter (|snap| {
Contributor Author

I left this as it was when I pulled it in from the zone bundle code, but do we want to block sled-agent from starting up with the unwrap(), or should we change this to an error!() statement and carry on?


let mut archived: Vec<_> = service_logs
.archived
.into_iter()
.filter(|log| log.path.as_str().contains("crypt/debug"))
Contributor Author

Fixed in 23430d7

I added a comment about why we are searching for only crypt/debug.

Comment on lines +1319 to +1324
/// NB: Be careful when calling this method as it may return a `Utf8PathBuf`
/// that does not actually map to a real filesystem path. On helios systems
/// `rpool/ROOT/<BE>`, for example, will return
/// "legacy/.zfs/snapshot/<SNAP_NAME>" because the mountpoint of the dataset
/// is a "legacy" mount. Additionally, a filesystem with no mountpoint will
/// have a zfs mountpoint property of "-".
Collaborator

I don't love this API; it seems really easy to misuse, even with this comment. I suppose we have filed an issue for this, but I think we should prioritize fixing it quickly rather than leaving it as tech debt.

Contributor Author

I can pick this up right after we land this. I will probably steal the implementation from here and just update the call sites.
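One way to make the API harder to misuse, sketched with a hypothetical `snapshot_dir` that takes the dataset's already-fetched `mountpoint` property: return `None` unless the mountpoint is an absolute path, so `legacy`, `none`, and `-` can no longer produce a bogus relative path such as `legacy/.zfs/snapshot/<SNAP_NAME>`.

```rust
use std::path::PathBuf;

/// Hypothetical, harder-to-misuse variant of `Snapshot::full_path`.
/// ZFS exposes a snapshot's contents under `<mountpoint>/.zfs/snapshot/<name>`,
/// which only exists when the dataset has a real, ZFS-managed mountpoint.
fn snapshot_dir(mountpoint: &str, snap_name: &str) -> Option<PathBuf> {
    // `legacy` (mounted outside ZFS's control), `none`, and `-` (no
    // mountpoint) are not absolute paths, so they all yield None.
    if !mountpoint.starts_with('/') {
        return None;
    }
    Some(PathBuf::from(mountpoint).join(".zfs").join("snapshot").join(snap_name))
}
```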

@@ -66,20 +65,25 @@ pub enum LogError {
Zip(#[from] ZipError),
}

///A ZFS snapshot that is taken by the `sled-diagnostics` crate and handles
Collaborator

Suggested change
///A ZFS snapshot that is taken by the `sled-diagnostics` crate and handles
/// A ZFS snapshot that is taken by the `sled-diagnostics` crate and handles

Contributor

@wfchandler wfchandler left a comment


Nothing further from me, thanks!

@papertigers papertigers merged commit 81bfe57 into main Apr 24, 2025
19 checks passed
@papertigers papertigers deleted the spr/papertigers/support-bundles-should-include-log-files branch April 24, 2025 18:37

4 participants