Output Data - level 1

This repository contains digitised manuscript sale catalogues encoded in XML-TEI at level 1.

The data have not been cleaned (level 2) or post-processed (level 3).

Description of the data

Basic bibliographic information for each catalogue is available here.

Schema

The ODD that validates the encoding is in the _schemas folder of the Data_extraction repository.

Workflow

Creation of the data

The creation process is described in detail in the following repo.

Cleaning the data

Entries of catalogues look like the following:

<item n="80" xml:id="CAT_000146_e80">
   <num>80</num>
   <name type="author">Cherubini (L.),</name>
   <trait>
      <p>l'illustre compositeur</p>
   </trait>
   <desc>L. a s.; 1836, 1 p 1 /2 in8.</desc>
   <measure commodity="currency" unit="FRF" quantity="12">12</measure>
</item>
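
As an illustration only, an entry like the one above can be read with Python's standard library. This is a minimal sketch, not part of the repository: the function name `read_entry` and the returned dictionary keys are hypothetical, and the snippet is parsed without a namespace, as printed here. In a complete TEI file the elements normally sit in the TEI namespace, so the tag names would need to be qualified accordingly.

```python
import xml.etree.ElementTree as ET

# The sample entry from this README, copied verbatim.
ENTRY = """<item n="80" xml:id="CAT_000146_e80">
   <num>80</num>
   <name type="author">Cherubini (L.),</name>
   <trait>
      <p>l'illustre compositeur</p>
   </trait>
   <desc>L. a s.; 1836, 1 p 1 /2 in8.</desc>
   <measure commodity="currency" unit="FRF" quantity="12">12</measure>
</item>"""

# The xml: prefix is predefined, so xml:id is exposed under this qualified name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def read_entry(xml_text):
    """Return the main fields of one <item> as a dictionary (hypothetical helper)."""
    item = ET.fromstring(xml_text)
    measure = item.find("measure")
    return {
        "id": item.get(XML_ID),
        "author": item.findtext("name"),
        "desc": item.findtext("desc"),
        "price": measure.get("quantity"),
        "currency": measure.get("unit"),
    }


if __name__ == "__main__":
    print(read_entry(ENTRY))
```

The `desc` value is returned unchanged, typos included, since level 1 data have not been cleaned.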

Most of the reconciliation process uses data from the <desc> element of our XML files. We therefore need to correct typos to ease further post-processing, for example:

L. a s. -> L. a. s.
in8 -> in-8
1 /2 -> 1/2
1 p -> 1 p.

The clean_xml.py script available here tackles this problem.
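
The four corrections listed above can be sketched with regular expressions. This is a hedged illustration, not the actual content of clean_xml.py: the rule list `RULES` and the function `clean_desc` are hypothetical names, and the patterns cover only the four documented examples.

```python
import re

# One (pattern, replacement) pair per correction listed in this README.
# These rules are assumptions for illustration; clean_xml.py may differ.
RULES = [
    (re.compile(r"\bL\. a s\."), "L. a. s."),       # L. a s. -> L. a. s.
    (re.compile(r"\bin8\b"), "in-8"),               # in8 -> in-8
    (re.compile(r"(\d) /(\d)"), r"\1/\2"),          # 1 /2 -> 1/2
    (re.compile(r"(\d) p\b(?!\.)"), r"\1 p."),      # 1 p -> 1 p.
]


def clean_desc(text):
    """Apply every rule in order to the text of a <desc> element."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(clean_desc("L. a s.; 1836, 1 p 1 /2 in8."))
    # prints: L. a. s.; 1836, 1 p. 1/2 in-8.
```

Each pattern is written so that already corrected text no longer matches, which means running the function twice gives the same result as running it once.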

Installation and use

* git clone https://github.com/katabase/1_OutputData.git
* cd 1_OutputData
* python3 -m venv my_env
* source my_env/bin/activate
* pip install -r requirements.txt
* python script/clean_xml.py -f FILENAME (processes a single file)
	OR
* python script/clean_xml.py -d DIRECTORY (processes all the files contained in a directory)
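
For readers who want to write a similar tool, the two mutually exclusive options could be declared with `argparse` as follows. This is a sketch under assumptions, not the code of clean_xml.py: only the short flags `-f` and `-d` come from this README, while the long option names, the helper functions and the `*.xml` file pattern are hypothetical.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser that accepts either -f FILENAME or -d DIRECTORY, never both."""
    parser = argparse.ArgumentParser(description="Clean level 1 catalogue files.")
    group = parser.add_mutually_exclusive_group(required=True)
    # Long option names are assumptions made for readability.
    group.add_argument("-f", "--file", type=Path, help="process one single file")
    group.add_argument("-d", "--directory", type=Path,
                       help="process all the files contained in a directory")
    return parser


def collect_files(args):
    """Return the list of files to process for the parsed arguments."""
    if args.file is not None:
        return [args.file]
    # Assumption: catalogue files carry the .xml extension.
    return sorted(args.directory.glob("*.xml"))


if __name__ == "__main__":
    for path in collect_files(build_parser().parse_args()):
        print(path)
```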

Credits

The ODD was created by Lucie Rondeau du Noyer.
clean_xml.py was created by Simon Gabay.
The catalogues were encoded by Lucie Rondeau du Noyer, Simon Gabay, Matthias Gille Levenson, Ljudmila Petkovic and Alexandre Bartz.

Cite this repository

Alexandre Bartz, Simon Gabay, Matthias Gille Levenson, Ljudmila Petkovic and Lucie Rondeau du Noyer, Manuscript sale catalogues, Neuchâtel: Université de Neuchâtel, 2019, https://github.com/katabase/1_OutputData.

Licence

The catalogues are licensed under the Creative Commons Attribution 4.0 International Licence, and the code is licensed under GNU GPL-3.0.
