This repository was archived by the owner on Jul 28, 2025. It is now read-only.

Commit 3f9bc68

Author: adam-sutton-1992
Message: resolves merge conflict of imports
Parents: 6a820f0 + 7fddac0

22 files changed: +725 -54 lines

.github/workflows/main.yml

Lines changed: 4 additions & 4 deletions

@@ -16,9 +16,9 @@ jobs:
       max-parallel: 4

     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v4
       - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v2
+        uses: actions/setup-python@v4
         with:
           python-version: ${{ matrix.python-version }}
       - name: Install dependencies
@@ -48,13 +48,13 @@ jobs:

     steps:
       - name: Checkout master
-        uses: actions/checkout@v2
+        uses: actions/checkout@v4
         with:
           ref: 'master'
           fetch-depth: 0

       - name: Set up Python 3.9
-        uses: actions/setup-python@v2
+        uses: actions/setup-python@v4
         with:
           python-version: 3.9

.github/workflows/production.yml

Lines changed: 2 additions & 2 deletions

@@ -14,13 +14,13 @@ jobs:

     steps:
       - name: Checkout production
-        uses: actions/checkout@v2
+        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.release.target_commitish }}
          fetch-depth: 0

       - name: Set up Python 3.9
-        uses: actions/setup-python@v2
+        uses: actions/setup-python@v4
        with:
          python-version: 3.9

.readthedocs.yaml

Lines changed: 3 additions & 3 deletions

@@ -7,13 +7,13 @@ version: 2
 build:
   os: ubuntu-20.04
   tools:
-    python: "3.9"
+    python: "3.10"

 sphinx:
   configuration: docs/conf.py

 python:
   install:
+    - requirements: docs/requirements.txt
     - method: setuptools
-      path: .
-    - requirements: docs/requirements.txt
+      path: .

README.md

Lines changed: 12 additions & 0 deletions

@@ -38,8 +38,20 @@ To download any of these models, please [follow this link](https://uts.nlm.nih.g
 - **Paper**: [What’s in a Summary? Laying the Groundwork for Advances in Hospital-Course Summarization](https://www.aclweb.org/anthology/2021.naacl-main.382.pdf)
 - ([more...](https://github.com/CogStack/MedCAT/blob/master/media/news.md))

+## Installation
+To install the latest version of MedCAT run the following command:
+```
+pip install medcat
+```
+Normal installations of MedCAT will install torch-gpu and all relevant dependencies (such as CUDA). This can require as much as 10 GB more disk space, which isn't required for CPU-only usage.
+
+To install the latest version of MedCAT without torch GPU support run the following command:
+```
+pip install medcat --extra-index-url https://download.pytorch.org/whl/cpu/
+```
 ## Demo
 A demo application is available at [MedCAT](https://medcat.rosalind.kcl.ac.uk). This was trained on MIMIC-III and all of SNOMED-CT.
+PS: This link can take a long time to load the first time around. The machine spins up as needed and spins down when inactive.

 ## Tutorials
 A guide on how to use MedCAT is available at [MedCAT Tutorials](https://github.com/CogStack/MedCATtutorials). Read more about MedCAT on [Towards Data Science](https://towardsdatascience.com/medcat-introduction-analyzing-electronic-health-records-e1c420afa13a).

docs/requirements.txt

Lines changed: 102 additions & 4 deletions

@@ -1,6 +1,104 @@
-Sphinx~=4.0
+sphinx==6.2.1
 sphinx-rtd-theme~=1.0
 myst-parser~=0.17
-sphinx-autoapi~=1.8
-setuptools>=60.0
-aiohttp==3.8.5
+sphinx-autoapi~=3.0.0
+MarkupSafe==2.1.3
+accelerate==0.23.0
+aiofiles==23.2.1
+aiohttp==3.8.5
+aiosignal==1.3.1
+asttokens==2.4.0
+async-timeout==4.0.3
+attrs==23.1.0
+backcall==0.2.0
+blis==0.7.11
+catalogue==2.0.10
+certifi==2023.7.22
+charset-normalizer==3.3.0
+click==8.1.7
+comm==0.1.4
+confection==0.1.3
+cymem==2.0.8
+datasets==2.14.5
+decorator==5.1.1
+dill==0.3.7
+exceptiongroup==1.1.3
+executing==2.0.0
+filelock==3.12.4
+flake8==4.0.1
+frozenlist==1.4.0
+fsspec==2023.6.0
+gensim==4.3.2
+huggingface-hub==0.17.3
+idna==3.4
+ipython==8.16.1
+ipywidgets==8.1.1
+jedi==0.19.1
+jinja2==3.1.2
+joblib==1.3.2
+jsonpickle==3.0.2
+jupyterlab-widgets==3.0.9
+langcodes==3.3.0
+matplotlib-inline==0.1.6
+mccabe==0.6.1
+mpmath==1.3.0
+multidict==6.0.4
+multiprocess==0.70.15
+murmurhash==1.0.10
+mypy==1.0.0
+mypy-extensions==0.4.3
+networkx==3.1
+numpy==1.25.2
+packaging==23.2
+pandas==2.1.1
+parso==0.8.3
+pathy==0.10.2
+pexpect==4.8.0
+pickleshare==0.7.5
+preshed==3.0.9
+prompt-toolkit==3.0.39
+psutil==5.9.5
+ptyprocess==0.7.0
+pure-eval==0.2.2
+pyarrow==13.0.0
+pycodestyle==2.8.0
+pydantic==1.10.13
+pyflakes==2.4.0
+pygments==2.16.1
+python-dateutil==2.8.2
+pytz==2023.3.post1
+pyyaml==6.0.1
+regex==2023.10.3
+requests==2.31.0
+safetensors==0.4.0
+scikit-learn==1.3.1
+scipy==1.9.3
+six==1.16.0
+smart-open==6.4.0
+spacy==3.4.4
+spacy-legacy==3.0.12
+spacy-loggers==1.0.5
+srsly==2.4.8
+stack-data==0.6.3
+sympy==1.12
+thinc==8.1.12
+threadpoolctl==3.2.0
+tokenizers==0.14.1
+tomli==2.0.1
+torch==2.1.0
+tqdm==4.66.1
+traitlets==5.11.2
+transformers==4.34.0
+triton==2.1.0
+typer==0.7.0
+types-PyYAML==6.0.3
+types-aiofiles==0.8.3
+types-setuptools==57.4.10
+typing-extensions==4.8.0
+tzdata==2023.3
+urllib3==2.0.6
+wasabi==0.10.1
+wcwidth==0.2.8
+widgetsnbextension==4.0.9
+xxhash==3.4.1
+yarl==1.9.2

medcat/cat.py

Lines changed: 60 additions & 5 deletions

@@ -271,6 +271,10 @@ def create_model_pack(self, save_dir_path: str, model_pack_name: str = DEFAULT_M
         cdb_path = os.path.join(save_dir_path, "cdb.dat")
         self.cdb.save(cdb_path, json_path)

+        # Save the config
+        config_path = os.path.join(save_dir_path, "config.json")
+        self.cdb.config.save(config_path)
+
         # Save the Vocab
         vocab_path = os.path.join(save_dir_path, "vocab.dat")
         if self.vocab is not None:
@@ -362,6 +366,10 @@ def load_model_pack(cls,
         logger.info('Loading model pack with %s', 'JSON format' if json_path else 'dill format')
         cdb = CDB.load(cdb_path, json_path)

+        # load config
+        config_path = os.path.join(model_pack_path, "config.json")
+        cdb.load_config(config_path)
+
         # TODO load addl_ner

         # Modify the config to contain full path to spacy model
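The two hunks above start persisting the CDB config as a standalone `config.json` when a model pack is created, and reading it back when the pack is loaded. A minimal sketch of that save/load round trip, using a toy `Config` stand-in (the class and its fields here are illustrative, not MedCAT's actual config API):

```python
import json
import os
import tempfile


class Config:
    """Toy stand-in for a model-pack config object (illustrative only)."""

    def __init__(self, data=None):
        self.data = data or {"version": "1.0", "spacy_model": "en_core_web_md"}

    def save(self, path):
        # Serialize the config as JSON next to the other pack files.
        with open(path, "w") as f:
            json.dump(self.data, f)

    @classmethod
    def load(cls, path):
        # Restore the config from the JSON file saved with the pack.
        with open(path) as f:
            return cls(json.load(f))


save_dir_path = tempfile.mkdtemp()
config_path = os.path.join(save_dir_path, "config.json")
Config().save(config_path)
restored = Config.load(config_path)
print(restored.data["spacy_model"])
```

Keeping the config in its own JSON file (rather than only inside the pickled CDB) makes it human-inspectable and editable without deserializing the whole database.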
@@ -832,9 +840,13 @@ def add_and_train_concept(self,
             Refer to medcat.cat.cdb.CDB.add_concept
         """
         names = prepare_name(name, self.pipe.spacy_nlp, {}, self.config)
+        if not names and cui not in self.cdb.cui2preferred_name and name_status == 'P':
+            logger.warning("No names were able to be prepared in CAT.add_and_train_concept "
+                           "method. As such no preferred name will be able to be specified. "
+                           "The CUI: '%s' and raw name: '%s'", cui, name)
         # Only if not negative, otherwise do not add the new name if in fact it should not be detected
         if do_add_concept and not negative:
-            self.cdb.add_concept(cui=cui, names=names, ontologies=ontologies, name_status=name_status, type_ids=type_ids, description=description,
+            self.cdb._add_concept(cui=cui, names=names, ontologies=ontologies, name_status=name_status, type_ids=type_ids, description=description,
                                  full_build=full_build)

         if spacy_entity is not None and spacy_doc is not None:
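The new warning above uses the logging module's deferred `%s`-style arguments rather than an f-string, so the message is only interpolated if a handler actually processes the record. A small self-contained sketch of the same pattern (the CUI and name values here are made up for illustration):

```python
import logging

logger = logging.getLogger("demo")
logger.setLevel(logging.WARNING)

# Capture emitted records so we can inspect the rendered message.
records = []
handler = logging.Handler()
handler.emit = records.append  # type: ignore[assignment]
logger.addHandler(handler)

cui, name = "C0004238", "atrial fibrillation"
# Deferred %-interpolation: args are passed separately and only
# formatted when the record is handled.
logger.warning("No names were prepared for CUI: '%s' and raw name: '%s'",
               cui, name)

print(records[0].getMessage())
```

This matters for hot paths: if the logger's level filters the record out, the string formatting never happens at all.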
@@ -1327,19 +1339,42 @@ def _save_docs_to_file(self, docs: Iterable, annotated_ids: List[str], save_dir_
         pickle.dump((annotated_ids, part_counter), open(annotated_ids_path, 'wb'))
         return part_counter

+    @deprecated(message="Use `multiprocessing_batch_char_size` instead")
     def multiprocessing(self,
                         data: Union[List[Tuple], Iterable[Tuple]],
                         nproc: int = 2,
                         batch_size_chars: int = 5000 * 1000,
                         only_cui: bool = False,
-                        addl_info: List[str] = [],
+                        addl_info: List[str] = ['cui2icd10', 'cui2ontologies', 'cui2snomed'],
                         separate_nn_components: bool = True,
                         out_split_size_chars: Optional[int] = None,
                         save_dir_path: str = os.path.abspath(os.getcwd()),
                         min_free_memory=0.1) -> Dict:
+        return self.multiprocessing_batch_char_size(data=data, nproc=nproc,
+                                                    batch_size_chars=batch_size_chars,
+                                                    only_cui=only_cui, addl_info=addl_info,
+                                                    separate_nn_components=separate_nn_components,
+                                                    out_split_size_chars=out_split_size_chars,
+                                                    save_dir_path=save_dir_path,
+                                                    min_free_memory=min_free_memory)
+
+    def multiprocessing_batch_char_size(self,
+                                        data: Union[List[Tuple], Iterable[Tuple]],
+                                        nproc: int = 2,
+                                        batch_size_chars: int = 5000 * 1000,
+                                        only_cui: bool = False,
+                                        addl_info: List[str] = [],
+                                        separate_nn_components: bool = True,
+                                        out_split_size_chars: Optional[int] = None,
+                                        save_dir_path: str = os.path.abspath(os.getcwd()),
+                                        min_free_memory=0.1) -> Dict:
         r"""Run multiprocessing for inference, if out_save_path and out_split_size_chars is used this will also continue annotating
         documents if something is saved in that directory.
+
+        This method batches the data based on the number of characters as specified by user.
+
+        PS: This method is unlikely to work on a Windows machine.
+
         Args:
             data:
                 Iterator or array with format: [(id, text), (id, text), ...]
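The hunk above renames `multiprocessing` to `multiprocessing_batch_char_size` while keeping the old name as a deprecated wrapper that forwards all arguments. A minimal sketch of that deprecation-forwarding pattern, with a toy `deprecated` decorator (the decorator shown here is an assumed implementation, not necessarily MedCAT's):

```python
import functools
import warnings


def deprecated(message: str):
    """Decorator that warns on call, then delegates to the wrapped function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(f"{func.__name__} is deprecated. {message}",
                          DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator


class Annotator:
    def multiprocessing_batch_char_size(self, data, nproc=2):
        # New, descriptively named implementation.
        return f"processed {len(data)} items on {nproc} workers"

    @deprecated(message="Use `multiprocessing_batch_char_size` instead")
    def multiprocessing(self, data, nproc=2):
        # Old name simply forwards to the new implementation.
        return self.multiprocessing_batch_char_size(data, nproc=nproc)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = Annotator().multiprocessing([("id1", "text")])
print(result)
```

Callers of the old name keep working, get a `DeprecationWarning` pointing at the replacement, and the behaviour stays in a single place.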
@@ -1523,15 +1558,35 @@ def _multiprocessing_batch(self,

         return docs

-    def multiprocessing_pipe(self,
-                             in_data: Union[List[Tuple], Iterable[Tuple]],
+    @deprecated(message="Use `multiprocessing_batch_docs_size` instead")
+    def multiprocessing_pipe(self, in_data: Union[List[Tuple], Iterable[Tuple]],
                              nproc: Optional[int] = None,
                              batch_size: Optional[int] = None,
                              only_cui: bool = False,
                              addl_info: List[str] = [],
                              return_dict: bool = True,
                              batch_factor: int = 2) -> Union[List[Tuple], Dict]:
-        """Run multiprocessing NOT FOR TRAINING
+        return self.multiprocessing_batch_docs_size(in_data=in_data, nproc=nproc,
+                                                    batch_size=batch_size,
+                                                    only_cui=only_cui,
+                                                    addl_info=addl_info,
+                                                    return_dict=return_dict,
+                                                    batch_factor=batch_factor)
+
+    def multiprocessing_batch_docs_size(self,
+                                        in_data: Union[List[Tuple], Iterable[Tuple]],
+                                        nproc: Optional[int] = None,
+                                        batch_size: Optional[int] = None,
+                                        only_cui: bool = False,
+                                        addl_info: List[str] = ['cui2icd10', 'cui2ontologies', 'cui2snomed'],
+                                        return_dict: bool = True,
+                                        batch_factor: int = 2) -> Union[List[Tuple], Dict]:
+        """Run multiprocessing NOT FOR TRAINING.
+
+        This method batches the data based on the number of documents as specified by the user.
+
+        PS:
+        This method supports Windows.

         Args:
             in_data (Union[List[Tuple], Iterable[Tuple]]): List with format: [(id, text), (id, text), ...]
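The signatures in these hunks use list literals as default values for `addl_info` (first `[]`, then `['cui2icd10', 'cui2ontologies', 'cui2snomed']`). Worth noting the classic Python pitfall this carries: a mutable default is created once at function definition and shared across every call that relies on it. A short sketch of the failure mode and the usual `None`-sentinel fix (the function names here are illustrative, not from MedCAT):

```python
def bad_append(item, acc=[]):
    # The default list is created ONCE and shared by every call
    # that omits `acc`.
    acc.append(item)
    return acc


def good_append(item, acc=None):
    # A fresh list per call avoids the shared-state surprise.
    if acc is None:
        acc = []
    acc.append(item)
    return acc


print(bad_append(1), bad_append(2))    # both names alias one list
print(good_append(1), good_append(2))  # independent lists
```

This is harmless when, as in the methods above, the list is only read and never mutated, but it is a latent trap if a later change appends to the default.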
