This repository hosts the code for the paper. It is based on https://github.com/j-luo93/xib.
- phonological embedding composition
- modules to compose embeddings
- demo script for composing embeddings
- pretrained phonological embedding
- core IPA helper modules
- core aligned corpus helper modules
- core training and model modules
- notebooks and scripts
- Clone this repository recursively: `git clone --recursive <link>`.
- Run `pip install -r requirements.txt` to install Python dependencies.
- Install `pytorch`.
- Install `dev_misc` by running `cd dev_misc && pip install -e .`.
- Install this repository by running `pip install -e .` in the root directory.
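
After installation, a quick import check can confirm that the dependencies are visible to the interpreter. This is a minimal sketch and not part of the repository: the helper name `find_missing` is hypothetical, and the import names `torch` and `dev_misc` are assumed to match the packages installed above.

```python
import importlib.util


def find_missing(modules):
    """Return the names in `modules` that cannot be found on the import path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed top-level import names for the packages installed above.
    missing = find_missing(["torch", "dev_misc"])
    print("missing:", missing if missing else "none")
```

An empty result only shows that the packages can be located; it does not verify their versions.
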
- Pretrained phonological embeddings are available in `data/got.pretrained.pth`.
- See `scripts/demo_compose.py` for a demo script that composes phonological embeddings from IPA transcriptions.
- See `scripts/demo_pretrained.py` for a demo script that loads a pretrained embedding layer. Running `python scripts/demo_pretrained.py` produces two files: `data/segments.emb.tsv` and `data/segments.tsv`. Upload both files to the embedding projector to visualize the embeddings.
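
If the projector meant here is the TensorFlow Embedding Projector, it expects a tab-separated vectors file with one embedding per row and a metadata file with one label per row (a single-column metadata file has no header). The sketch below writes such a pair from toy in-memory data to show the layout. It is not the code in `scripts/demo_pretrained.py`: the function name `write_projector_files`, the toy vectors, and the assumption that `segments.emb.tsv` holds the vectors while `segments.tsv` holds the labels are all illustrative.

```python
import csv
import os
import tempfile


def write_projector_files(labels, vectors, emb_path, meta_path):
    """Write vectors and labels as two TSV files with matching row order."""
    if len(labels) != len(vectors):
        raise ValueError("labels and vectors must have the same length")
    # Vectors file: one row per embedding, tab-separated floats, no header.
    with open(emb_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for vec in vectors:
            writer.writerow([f"{x:.6f}" for x in vec])
    # Metadata file: one label per row, no header for a single column.
    with open(meta_path, "w", encoding="utf-8") as f:
        for label in labels:
            f.write(f"{label}\n")


if __name__ == "__main__":
    # Toy data for illustration only; real vectors would come from the
    # pretrained embedding layer.
    labels = ["p", "b", "m"]
    vectors = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    out_dir = tempfile.mkdtemp()
    write_projector_files(
        labels,
        vectors,
        os.path.join(out_dir, "segments.emb.tsv"),
        os.path.join(out_dir, "segments.tsv"),
    )
    print(sorted(os.listdir(out_dir)))
```

Row `i` of the vectors file must correspond to row `i` of the metadata file, which is why the function rejects inputs of different lengths.
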
- Iberian data is included in the repo as three files: the original `data/hesperia_epigraphy.csv`, which contains the published data from Hesperia; a Jupyter notebook, `notebooks/clean_iberian.ipynb`, that I used to clean up the data; and the cleaned CSV, `data/iberian.csv`.
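
For a first look at the cleaned file without assuming its column names, something like the following could be used. This is a hedged sketch: `preview_csv` is a hypothetical helper, and the file is assumed to be comma-separated and UTF-8 encoded with a header row.

```python
import csv
import itertools
import os


def preview_csv(path, n=5):
    """Return the header and the first `n` data rows of a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # empty list if the file has no rows
        rows = list(itertools.islice(reader, n))
    return header, rows


if __name__ == "__main__":
    path = "data/iberian.csv"
    # Only preview the file when run from the repository root.
    if os.path.exists(path):
        header, rows = preview_csv(path)
        print(header)
        for row in rows:
            print(row)
```
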