Extend Cosmos performance benchmark to analyse memory usage #1472

Open
tatiana opened this issue Jan 16, 2025 · 3 comments
Labels
area:performance Related to performance, like memory usage, CPU usage, speed, etc

Comments

@tatiana
Collaborator

tatiana commented Jan 16, 2025

In Cosmos, we have benchmark scripts that measure task throughput using synthetic data and the Airflow local executor:
https://github.com/astronomer/astronomer-cosmos/blob/main/.github/workflows/test.yml#L350
https://github.com/astronomer/astronomer-cosmos/blob/main/tests/perf/test_performance.py

While those are very helpful, we currently don't analyse memory consumption.

It would be great if, from time to time, we could run a performance benchmark in Astro and identify the memory footprint per task.

@tatiana
Collaborator Author

tatiana commented Jan 16, 2025

Related task: #1471

@dosubot dosubot bot added the area:performance Related to performance, like memory usage, CPU usage, speed, etc label Jan 16, 2025
@josix

josix commented Jan 16, 2025

Would memray/pytest-memray suit our needs? Perhaps we could generate a memory footprint report alongside the performance test; it might help identify where the overhead occurs. If so, I'd be happy to help create a POC for it.

@tatiana
Collaborator Author

tatiana commented Jan 21, 2025

Yes, @josix, I think that would be a great start!

An approach that @sbaldassin used in the past was to dynamically create N tasks and subclass Cosmos Operators so that they use psutil and log the mem_info in some way. The example below is a function used by PythonOperator, but we could try a similar approach in Cosmos:

(Credit for this code goes to @sbaldassin.)

import os
import tempfile
import time

import psutil


def write_temp_files():
    process = psutil.Process()
    temp_dir = tempfile.mkdtemp()
    print(f"Temporary directory created: {temp_dir}")

    try:
        for i in range(10):  # Create 10 temporary files
            file_path = os.path.join(temp_dir, f"temp_file_{i}.txt")
            with open(file_path, "w") as temp_file:
                temp_file.write("X" * 10**6)  # Write 1 MB of data

            mem_info = process.memory_info()
            print(f"Step {i + 1}: Memory Usage: {mem_info.rss / (1024 * 1024):.2f} MB")

            time.sleep(1)  # Simulate delay between file writes

    finally:
        for file_name in os.listdir(temp_dir):
            os.remove(os.path.join(temp_dir, file_name))
        os.rmdir(temp_dir)
        print(f"Temporary directory deleted: {temp_dir}")
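To carry the same idea over to operators, one option is a small mixin that samples RSS around `execute`. This is only a minimal sketch under stated assumptions: `MemoryLoggingMixin`, `current_rss_mb`, `rss_reader`, `last_rss_delta_mb` and `FakeOperator` are hypothetical names made up for illustration, not existing Cosmos or Airflow APIs. The only third-party call used is `psutil.Process().memory_info().rss`, the same one as in the function above.

```python
import logging

logger = logging.getLogger(__name__)


def current_rss_mb() -> float:
    """Return the resident set size (RSS) of the current process in MB."""
    # Imported lazily so the mixin can be exercised with a stub reader
    # in environments where psutil is not installed.
    import psutil

    return psutil.Process().memory_info().rss / (1024 * 1024)


class MemoryLoggingMixin:
    """Hypothetical mixin: log RSS before and after ``execute``."""

    # Can be overridden with a stub; staticmethod avoids binding ``self``.
    rss_reader = staticmethod(current_rss_mb)

    def execute(self, context):
        before = self.rss_reader()
        try:
            return super().execute(context)
        finally:
            after = self.rss_reader()
            # Kept on the instance so a benchmark could collect it later.
            self.last_rss_delta_mb = after - before
            logger.info(
                "%s: RSS %.2f MB -> %.2f MB (delta %+.2f MB)",
                type(self).__name__, before, after, after - before,
            )


class FakeOperator:
    """Stand-in for a real operator; NOT part of Airflow or Cosmos."""

    def execute(self, context):
        return "done"


class MemoryLoggedFakeOperator(MemoryLoggingMixin, FakeOperator):
    """The mixin goes first in the bases so its ``execute`` wraps the operator's."""
```

In a real benchmark, `FakeOperator` would be replaced by the operator under test. One caveat: this only measures the process that runs `execute`, so if the work happens in a subprocess, the child processes would need to be sampled as well (for example via `psutil.Process().children(recursive=True)`).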
