This project demonstrates a complete, reproducible pipeline for generating synthetic e-commerce datasets, ingesting them into a SQLite database, and querying them using multi-table joins. The entire solution — code, structure, and SQL — was built exclusively through agentic prompt engineering inside Cursor Agentic IDE. This README documents the project, the workflow, and (most importantly) the precise prompts used to guide Cursor’s agent to build the system end-to-end.
This project simulates a small e-commerce ecosystem by generating realistic-looking customers, products, orders, order_items, and payments. Each run produces fresh CSV files, which are then loaded into a clean SQLite database. A final SQL query demonstrates multi-table joins and aggregation (e.g., revenue per customer). The goal is to showcase:
- Efficient prompt engineering
- Cursor’s agentic development workflow
- Clean, reproducible data + DB pipelines
- SQL join and reporting logic
- Python 3 (standard library only)
- SQLite for storage
- CSV files for intermediate data
- Cursor IDE for all agentic development
    data/                  # Auto-generated datasets
        customers.csv
        products.csv
        orders.csv
        order_items.csv
        payments.csv
    scripts/
        generate_data.py   # Create synthetic CSV datasets
        ingest_to_db.py    # Build/refresh SQLite DB using the CSVs
        run_queries.py     # Execute SQL join query and print results
    sql/
        queries.sql        # SQL join + aggregation logic
    README.md
    .gitignore
    ecom.db                # Auto-generated SQLite database
1. Generate fresh CSVs:
   `python scripts/generate_data.py`
2. Rebuild the SQLite database from the CSVs:
   `python scripts/ingest_to_db.py`
3. Execute the sample join query and view the results:
   `python scripts/run_queries.py`
- Hey Cursor — You are an expert backend engineer and Python / SQLite developer. I’m doing a timed assignment, so I want to keep things clean and simple. I need a small Python project that generates synthetic e-commerce data, loads it into SQLite, and runs a SQL query that joins multiple tables. Please set up the initial project structure with:
- data/
- scripts/ (generate_data.py, ingest_to_db.py, run_queries.py)
- sql/queries.sql
- README.md (with Overview, How to Run, Prompts Used)
- .gitignore
Just create the skeleton and short comments in each file. No logic yet.
- Cursor created the full folder structure, added placeholder Python files with comments, generated README sections, and initialized .gitignore. Files created: data/, scripts/, sql/, README.md, .gitignore.
- Now implement generate_data.py. I need 5 CSVs: customers.csv, products.csv, orders.csv, order_items.csv, payments.csv.
Use Python standard libraries. Generate around:
- 50 customers
- 20 products
- 150 orders
- 300–400 order_items
- payments for non-cancelled orders
Respect foreign keys, overwrite files each run, and print a summary after writing each CSV.
- Cursor created a complete script with helper functions, random data generation, referential integrity enforcement, CSV writing utilities, and printed row summaries. Running the script generated all 5 CSV files successfully.
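The generated script isn't reproduced in full here, but a minimal sketch of the approach (just two of the five tables, with hypothetical column names) might look like:

```python
import csv
import random
from pathlib import Path

random.seed(42)  # deterministic output for reproducible runs
DATA_DIR = Path("data")
DATA_DIR.mkdir(exist_ok=True)

# Parent table: every order below will reference one of these IDs.
customers = [
    {"customer_id": i, "name": f"Customer {i}", "email": f"user{i}@example.com"}
    for i in range(1, 51)
]

# Child table: customer_id is always drawn from the customers list,
# which is how the script keeps foreign keys valid.
statuses = ["placed", "shipped", "delivered", "cancelled"]
orders = [
    {"order_id": i,
     "customer_id": random.choice(customers)["customer_id"],
     "status": random.choice(statuses)}
    for i in range(1, 151)
]

def write_csv(path, rows):
    """Overwrite `path` with `rows` and print a one-line summary."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
    print(f"Wrote {len(rows):>3} rows -> {path}")

write_csv(DATA_DIR / "customers.csv", customers)
write_csv(DATA_DIR / "orders.csv", orders)
```

The real script extends the same pattern to products, order_items, and payments (payments generated only for non-cancelled orders).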
- Implement ingest_to_db.py so it:
  - Creates ecom.db
  - Drops & recreates tables
  - Loads each CSV with the sqlite3 + csv modules
  - Creates indexes on FK columns
  - Prints the row counts inserted

  The script must be rerunnable cleanly.
- Cursor generated the full table DDL, drop/create logic, CSV parsing, bulk inserts, and index creation. Running the script produced a clean SQLite DB with the correct row counts.
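A standalone sketch of the ingestion pattern (one table only, with an inline CSV fixture standing in for the real generated files, and hypothetical column names) could look like:

```python
import csv
import sqlite3
from pathlib import Path

# Inline fixture so this sketch runs standalone; the real script reads
# the five CSVs produced by generate_data.py.
Path("data").mkdir(exist_ok=True)
with open("data/customers.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["customer_id", "name", "email"])
    writer.writerows([[1, "Ada", "ada@example.com"],
                      [2, "Grace", "grace@example.com"]])

db = sqlite3.connect("ecom.db")

# Rerunnable: drop and recreate so every run starts from a clean slate.
db.executescript("""
DROP TABLE IF EXISTS customers;
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT NOT NULL
);
""")

# Bulk-insert the CSV rows, then report the count. The real script repeats
# this per table and adds CREATE INDEX statements on the FK columns.
with open("data/customers.csv", newline="") as f:
    rows = [(r["customer_id"], r["name"], r["email"]) for r in csv.DictReader(f)]
db.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
db.commit()

count = db.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(f"customers: {count} rows inserted")
```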
- Write a multi-table join query in sql/queries.sql that computes revenue per customer. Then implement run_queries.py to:
  - read the SQL file
  - execute it
  - print formatted table output
  - warn the user if the DB doesn't exist
- Cursor wrote a LEFT JOIN–based revenue aggregation query and a Python script to read and run the SQL. Running the script displayed a sorted table of customers and their total revenue.
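The exact SQL Cursor produced isn't shown here; a hypothetical query in the same spirit (LEFT JOIN so customers with no orders still appear, with assumed table and column names), demonstrated against a small in-memory fixture, could be:

```python
import sqlite3

# Hypothetical stand-in for sql/queries.sql: revenue per customer.
# COALESCE turns the missing revenue of order-less customers into 0.
REVENUE_SQL = """
SELECT c.name,
       COALESCE(SUM(oi.quantity * oi.unit_price), 0) AS revenue
FROM customers c
LEFT JOIN orders o       ON o.customer_id = c.customer_id
LEFT JOIN order_items oi ON oi.order_id = o.order_id
GROUP BY c.customer_id
ORDER BY revenue DESC;
"""

# In-memory fixture so the sketch runs standalone.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE order_items (order_id INTEGER, quantity INTEGER, unit_price REAL);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (10, 1);
INSERT INTO order_items VALUES (10, 2, 9.50);
""")

# Print a simple formatted table, as run_queries.py does.
for name, revenue in db.execute(REVENUE_SQL):
    print(f"{name:<20} {revenue:>10.2f}")
```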
- Update README to look clean and professional. Include sections: Title, Overview, Tech Stack, Project Structure, How to Run, Prompts Used. Simplify tone and improve readability.
This project intentionally highlights:
- How agentic prompting can build a full data pipeline
- How to guide Cursor using clear, high-level instructions
- How to structure iterative prompts for maximum output quality
- How meaningful prompts significantly shorten development time
Each code script was entirely AI-generated using the Cursor Agentic IDE, but human-directed through concise instructions.
    ┌──────────────────────────┐
    │ scripts/generate_data.py │
    │──────────────────────────│
    │ • Creates synthetic      │
    │   e-commerce datasets    │
    │ • Writes 5 CSV files     │
    │   (customers, products,  │
    │   orders, items, pay.)   │
    └─────────────┬────────────┘
                  │
                  ▼
    ┌────────────────────────────────┐
    │ data/                          │
    │────────────────────────────────│
    │ • CSV intermediates used for   │
    │   ingestion into SQLite        │
    └─────────────┬──────────────────┘
                  │
                  ▼
    ┌──────────────────────────────────┐
    │ scripts/ingest_to_db.py          │
    │──────────────────────────────────│
    │ • Drops, recreates SQLite        │
    │   tables                         │
    │ • Loads CSVs with referential    │
    │   integrity                      │
    │ • Builds indexes                 │
    └─────────────┬────────────────────┘
                  │
                  ▼
    ┌──────────────────────────────────────┐
    │ ecom.db                              │
    │──────────────────────────────────────│
    │ SQLite database with:                │
    │ • customers                          │
    │ • products                           │
    │ • orders                             │
    │ • order_items                        │
    │ • payments                           │
    └─────────────┬────────────────────────┘
                  │
                  ▼
    ┌──────────────────────────────────────────┐
    │ sql/queries.sql                          │
    │──────────────────────────────────────────│
    │ • Revenue aggregation query              │
    │ • Multi-table joins                      │
    └─────────────┬────────────────────────────┘
                  │
                  ▼
    ┌────────────────────────────────────────────┐
    │ scripts/run_queries.py                     │
    │────────────────────────────────────────────│
    │ • Loads SQL from file                      │
    │ • Executes query against SQLite DB         │
    │ • Prints formatted output table            │
    └────────────────────────────────────────────┘