Skip to content

Commit

Permalink
new submission
Browse files Browse the repository at this point in the history
  • Loading branch information
Guo Zhengrui committed Mar 9, 2024
1 parent edc5aba commit 37a09af
Show file tree
Hide file tree
Showing 3 changed files with 100 additions and 2 deletions.
76 changes: 76 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
# Dataset, model weight, source code for paper "HistGen: Histopathology Report Generation via Local-Global Feature Encoding and Cross-modal Context Interaction"

## Prerequisite
Follow this instruction to create conda environment and install necessary packages:
```
git clone https://github.com/dddavid4real/HistGen.git
cd HistGen
conda env create -f requirements.yml
```
## HistGen WSI-report dataset
Our curated dataset could be downloaded from <https://hkustconnect-my.sharepoint.com/:f:/g/personal/zguobc_connect_ust_hk/EhmtBBT0n2lKtiCQt97eqcEBWnB6K9-Dwr3wruaLyd_xTQ?e=IdWZmi>.

The structure of this fold is shown as follows:
```
HistGen WSI-report dataset/
|-- WSIs
| |-- slide_1.svs
| |-- slide_2.svs
| ╵-- ...
|-- dinov2_vitl
| |-- slide_1.pt
| |-- slide_2.pt
| ╵-- ...
╵-- annotation.json
```
in which **WSIs** denotes the original WSI data from TCGA, **dinov2_vitl** is the features of original WSIs extracted by our pre-trained DINOv2 ViT-L backbone, and **annotation.json** contains the diagnostic reports and case ids of their corresponding WSIs. Concretely, the structure of this file is like this:
```
{
"train": [
{
"id": "TCGA-A7-A6VW-01Z-00-DX1.1BC4790C-DB45-4A3D-9C97-92C92C03FF60",
"report": "Final Surgical Pathology Report Procedure: Diagnosis A. Sentinel lymph node, left axilla ...",
"image_path": [
"/storage/Pathology/wsi-report/wsi/TCGA-A7-A6VW-01Z-00-DX1.1BC4790C-DB45-4A3D-9C97-92C92C03FF60.pt"
],
"split": "train"
},
...
],
"val": [
{
"id": "...",
"report": "...",
"image_path": ["..."],
"split": "val"
},
...
],
"test": [
{
"id": "...",
"report": "...",
"image_path": ["..."],
"split": "test"
},
...
]
}
```
in which we have already split into train/val/test subsets with ratio 8:1:1. Besides, "id" denotes the case id of this report's corresponding WSI, "report" is the full refined text obtained after our proposed report cleaning pipeline, and "image_path" could be just ignored.

<!-- Note that before you use this json file for training, please run the `replace_pt_path.py` we provided to change the "image_path". Usage of `replace_pt_path.py` is written inside the python file. -->

## Pre-trained DINOv2 ViT-L Feature Extractor
We are organizing the training details, dataset used, and other information to release the pre-trained model. Please stay tuned for the update.

## HistGen WSI Report Generation Model
To try our model for training, validation, and testing, simply run the following commands:
```
cd HistGen
conda activate histgen
sh train_wsi_report.sh
```
Before you run the script, please set the path and other hyperparameters in `train_wsi_report.sh`.
22 changes: 22 additions & 0 deletions replace_pt_path.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
import json

def update_image_path(json_file, old_path, new_path):
# Read the JSON file
with open(json_file, 'r') as file:
data = json.load(file)

# Update 'image_path' in 'train', 'val', and 'test'
for key in ['train', 'val', 'test']:
if key in data:
for item in data[key]:
item['image_path'] = [path.replace(old_path, new_path) for path in item['image_path']]

# Write the updated data back to the JSON file
with open(json_file, 'w') as file:
json.dump(data, file, indent=4)

# Usage
json_file = 'your_json_file.json' # Replace with your JSON file path
old_path = '/storage/Pathology/wsi-report/wsi'
new_path = '/new/path/here' # Replace with your new path
update_image_path(json_file, old_path, new_path)
4 changes: 2 additions & 2 deletions requirements.yml
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
name: r2gen
name: histgen
channels:
- pytorch
- nvidia
Expand Down Expand Up @@ -163,4 +163,4 @@ dependencies:
- threadpoolctl==3.3.0
- tqdm==4.66.2
- tzdata==2024.1
prefix: /home/zguobc/miniconda3/envs/r2gen
prefix: /home/zguobc/miniconda3/envs/histgen

0 comments on commit 37a09af

Please sign in to comment.