Telemetry doc updates (#52)
Updated documentation around enabling and disabling experiment and database telemetry.

[ reviewed by @amandarichardsonn ]
[ committed by @AlyssaCote ]
AlyssaCote authored Apr 26, 2024
1 parent 8c4d4b7 commit 2e43fde
Showing 3 changed files with 61 additions and 17 deletions.
38 changes: 30 additions & 8 deletions README.md
@@ -1,15 +1,36 @@
# SmartDashboard

SmartDashboard is an add-on to SmartSim that provides a dashboard to help users understand and monitor their SmartSim experiments in a visual way. Configuration, status, and logs are available for all launched entities within an experiment for easy inspection.
SmartDashboard is an add-on to SmartSim that provides a dashboard to help users understand and monitor their SmartSim experiments in a visual way. Configuration, status, and logs are available for all launched entities within an experiment for easy inspection, along with memory and client data per shard for launched orchestrators.

A ``Telemetry Monitor`` is a background process that is launched along with the experiment
that produces the data displayed by SmartDashboard. The ``Telemetry Monitor`` can be disabled by
adding ``export SMARTSIM_TELEMETRY_ENABLE=0`` as an environment variable. When disabled, SmartDashboard
will not display any data. To re-enable, set the ``SMARTSIM_TELEMETRY_ENABLE`` environment variable to ``1``
with ``export SMARTSIM_TELEMETRY_ENABLE=1``.
The ``Telemetry Monitor`` is a background process launched alongside the experiment.
It generates the data displayed by SmartDashboard. The ``Telemetry Monitor`` can be disabled globally by
setting the ``SMARTSIM_FLAG_TELEMETRY`` environment variable to ``0`` with ``export SMARTSIM_FLAG_TELEMETRY=0``. When it is disabled, SmartDashboard
will not display entity status data. To re-enable it, set the ``SMARTSIM_FLAG_TELEMETRY`` environment variable to ``1``
with ``export SMARTSIM_FLAG_TELEMETRY=1``. For workflows involving multiple experiments, SmartSim provides the methods
``Experiment.telemetry.enable()`` and ``Experiment.telemetry.disable()`` to turn telemetry on or off on a per-experiment basis.
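
The global flag can also be set from Python instead of the shell. The following is a minimal sketch using only the standard library; the helper name `set_global_telemetry` is hypothetical, and it assumes that setting the variable in the driver process before the experiment is created has the same effect as the `export` commands above.

```python
import os

# Flag name taken from the paragraph above: "0" disables, "1" enables.
TELEMETRY_FLAG = "SMARTSIM_FLAG_TELEMETRY"


def set_global_telemetry(enabled: bool) -> str:
    """Hypothetical helper: write the flag into this process's environment.

    Returns the string value that was written.
    """
    value = "1" if enabled else "0"
    os.environ[TELEMETRY_FLAG] = value
    return value


# Disable telemetry for everything launched from this process (assumption:
# the flag is read after this point, not at import time).
set_global_telemetry(False)
```

Child processes inherit the environment of the process that starts them, so the value set here is visible to anything the driver script launches afterwards.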

Experiment metadata is also stored in the ``.smartsim`` directory, a hidden folder for internal api use and used by the dashboard.
Deletion of the experiment folder will remove all experiment metadata.
`Orchestrator` memory and client data can be collected by enabling database telemetry. To do so, call ``Orchestrator.telemetry.enable()``
after creating an `Orchestrator` within the driver script. Database telemetry is enabled per `Orchestrator`, so if multiple
`Orchestrators` are launched, each one must be enabled separately in the driver script.

```python
# enabling telemetry example

from smartsim import Experiment

exp = Experiment("experiment", launcher="auto")
exp.telemetry.enable()

db = exp.create_database(db_nodes=3)
db.telemetry.enable()

exp.start(db, block=True)
exp.stop(db)
```

Experiment metadata is stored in the ``.smartsim`` directory, a hidden folder used by the internal API and accessed by the dashboard.
This folder can be found within the created experiment directory.
Deletion of the experiment folder will remove all associated metadata.
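
To check whether an experiment directory still holds its metadata, one can look for the hidden folder directly. This is a small sketch using only `pathlib`; the function name `find_metadata_dir` is hypothetical, and the experiment directory path is whatever location the experiment was created in.

```python
from pathlib import Path
from typing import Optional


def find_metadata_dir(exp_dir: str) -> Optional[Path]:
    """Hypothetical helper: return the ``.smartsim`` folder inside
    ``exp_dir`` if it exists, otherwise ``None``."""
    candidate = Path(exp_dir) / ".smartsim"
    return candidate if candidate.is_dir() else None
```

A return value of `None` means the folder is absent, for example because the experiment directory was deleted or the experiment has not been started yet.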

## Installation

@@ -49,6 +70,7 @@ Example workflow:
from smartsim import Experiment

exp = Experiment("hello_world_exp", launcher="auto")
exp.telemetry.enable()
run = exp.create_run_settings(exe="echo", exe_args="Hello World!")
run.set_tasks(60)
run.set_tasks_per_node(20)
3 changes: 2 additions & 1 deletion doc/changelog.rst
@@ -6,7 +6,7 @@ Development branch

Description


- Add database telemetry documentation. (SmartDashboard-PR52_)
- Auto-post release PR to develop from master (SmartDashboard-PR53_)
- Decrease the pinned version of Pydantic (SmartDashboard-PR51_)
- Bump version to 0.0.4, exclude streamlit version 1.31.X (SmartDashboard-PR50_)
@@ -16,6 +16,7 @@ Description
on pull requests into develop. (SmartDashboard-PR47_)
- Add manifest file tracking. (SmartDashboard-PR46_)

.. _SmartDashboard-PR52: https://github.com/CrayLabs/SmartDashboard/pull/52
.. _SmartDashboard-PR53: https://github.com/CrayLabs/SmartDashboard/pull/53
.. _SmartDashboard-PR51: https://github.com/CrayLabs/SmartDashboard/pull/51
.. _SmartDashboard-PR50: https://github.com/CrayLabs/SmartDashboard/pull/50
37 changes: 29 additions & 8 deletions doc/overview.rst
@@ -5,17 +5,38 @@ SmartDashboard

``SmartDashboard`` is an add-on to SmartSim that provides a dashboard to help users understand
and monitor their SmartSim experiments in a visual way. Configuration, status, and logs
are available for all launched entities within an experiment for easy inspection.
are available for all launched entities within an experiment for easy inspection,
along with memory and client data per shard for launched orchestrators.

A ``Telemetry Monitor`` is a background process that is launched along with the experiment
that produces the data displayed by SmartDashboard. The ``Telemetry Monitor`` can be disabled by
adding ``export SMARTSIM_TELEMETRY_ENABLE=0`` as an environment variable. When disabled, SmartDashboard
will not display any data. To re-enable, set the ``SMARTSIM_TELEMETRY_ENABLE`` environment variable to ``1``
with ``export SMARTSIM_TELEMETRY_ENABLE=1``.
The ``Telemetry Monitor`` is a background process launched alongside the experiment.
It generates the data displayed by SmartDashboard. The ``Telemetry Monitor`` can be disabled globally by
setting the ``SMARTSIM_FLAG_TELEMETRY`` environment variable to ``0`` with ``export SMARTSIM_FLAG_TELEMETRY=0``. When it is disabled, SmartDashboard
will not display entity status data. To re-enable it, set the ``SMARTSIM_FLAG_TELEMETRY`` environment variable to ``1``
with ``export SMARTSIM_FLAG_TELEMETRY=1``. For workflows involving multiple experiments, SmartSim provides the methods
``Experiment.telemetry.enable()`` and ``Experiment.telemetry.disable()`` to turn telemetry on or off on a per-experiment basis.

Experiment metadata is also stored in the ``.smartsim`` directory, a hidden folder for internal api use and used by the dashboard.
Deletion of the experiment folder will remove all experiment metadata.
``Orchestrator`` memory and client data can be collected by enabling database telemetry. To do so, call ``Orchestrator.telemetry.enable()``
after creating an ``Orchestrator`` within the driver script. Database telemetry is enabled per ``Orchestrator``, so if multiple
``Orchestrators`` are launched, each one must be enabled separately in the driver script.

.. code-block:: python

    # enabling telemetry example

    from smartsim import Experiment

    exp = Experiment("experiment", launcher="auto")
    exp.telemetry.enable()

    db = exp.create_database(db_nodes=3)
    db.telemetry.enable()

    exp.start(db, block=True)
    exp.stop(db)

Experiment metadata is stored in the ``.smartsim`` directory, a hidden folder used by the internal API and accessed by the dashboard.
This folder can be found within the created experiment directory.
Deletion of the experiment folder will remove all associated metadata.


Installation