DecipherUnsegmented

This repository hosts the code for the paper. It is based on https://github.com/j-luo93/xib.

Clean-up plan

Pretrained phonological embeddings are available in data/got.pretrained.pth.
See scripts/demo_compose.py for a demo script that composes phonological embeddings from IPA transcriptions.
See scripts/demo_pretrained.py for a demo script that loads a pretrained embedding layer. Running python scripts/demo_pretrained.py produces two files: data/segments.emb.tsv and data/segments.tsv. Upload both files to the embedding projector to visualize the embeddings.
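
For readers who want to export their own vectors in the same two-file layout, here is a minimal sketch. It is not the code in scripts/demo_pretrained.py. The function name `write_projector_files` is hypothetical, and it assumes that the `.emb.tsv` file holds one tab-separated vector per line while the other file holds one segment label per line, in the same order.

```python
import csv


def write_projector_files(vectors, labels, vec_path, meta_path):
    """Write vectors and labels as two tab-separated files (hypothetical helper).

    Assumed layout: vec_path gets one vector per line, meta_path gets one
    label per line, with no header row in either file.
    """
    if len(vectors) != len(labels):
        raise ValueError("need exactly one label per vector")

    # Vectors file: each row is one embedding, values separated by tabs.
    with open(vec_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for vec in vectors:
            writer.writerow([f"{x:.6f}" for x in vec])

    # Metadata file: a single column of labels, aligned with the vectors.
    with open(meta_path, "w", newline="", encoding="utf-8") as f:
        for label in labels:
            f.write(f"{label}\n")
```

With a PyTorch embedding layer, the rows of `layer.weight.tolist()` could be passed as `vectors`, provided the labels are listed in the same index order.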

The Iberian data is included in the repo as three files: the original data/hesperia_epigraphy.csv, which contains the published data from Hesperia; the Jupyter notebook notebooks/clean_iberian.ipynb, which I used to clean up the data; and the cleaned data/iberian.csv.
