Example Python code for transcribing tables from regulatory filings into digital form. To run these examples you will need an Anaconda environment and a Mistral API key. In this example we transcribe the balance sheet table from the Solvency and Financial Condition Reports (SFCR) that companies must file every year.
As a sample, we took the 18 main life insurance companies operating in the Italian market:
- Credemvita S.p.A.
- AXA MPS Assicurazioni Vita
- CRÉDIT AGRICOLE VITA
- Società Reale Mutua di Assicurazioni
- Cardif Vita S.p.A.
- MEDIOLANUM VITA S.p.A.
- Generali Italia S.p.A.
- Banco BPM Vita S.p.A.
- HDI ASSICURAZIONI S.p.A.
- Gruppo Assicurativo Poste Vita
- FIDEURAM VITA S.P.A.
- CNP Vita Assicura S.p.A.
- ITAS VITA
- Helvetia Vita S.p.A.
- Vittoria Assicurazioni S.p.A.
- GROUPAMA ASSICURAZIONI S.P.A.
- UniCredit Allianz Vita S.p.A.
- Zurich Investments Life S.p.A.
The process of extraction is performed in 5 phases.
- Identify the new SFCR report and save it into the folder Input.
- Identify the pages where the tables of interest are.
- Record the mapping for this company run in master_list.csv.
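As an illustration of the manual steps above, a master list could be read as follows. This is a minimal sketch: the column names `company`, `pdf_file` and `page`, the file names and the page numbers are all placeholders, since the actual layout of master_list.csv is not described here.

```python
import csv
import io

# Hypothetical contents of master_list.csv; the real column names may differ
# and the file names and page numbers are placeholders.
MASTER_LIST = """company,pdf_file,page
Credemvita S.p.A.,credemvita_sfcr.pdf,87
ITAS VITA,itas_vita_sfcr.pdf,102
"""


def load_master_list(handle):
    """Return one dict per company run, with the page number as an int."""
    rows = []
    for row in csv.DictReader(handle):
        row["page"] = int(row["page"])
        rows.append(row)
    return rows


# io.StringIO stands in for open("master_list.csv", newline="").
runs = load_master_list(io.StringIO(MASTER_LIST))
for run in runs:
    print(run["company"], run["pdf_file"], run["page"])
```

Keeping the page number next to the file name means the later steps can be driven from this one file instead of hard-coding pages in the notebook.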
The notebook performs the following steps (with slight modifications depending on the table format):
- Save the page with the table into a separate folder Single_pdf.
- Use either a Python package or a specialized LLM to create a digital equivalent of the table.
- Fix the systematic errors that prevent the table from being saved as a DataFrame.
- Save the DataFrame into the Output folder.
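One kind of systematic error that can block the conversion to a DataFrame is numeric cells that arrive as formatted strings. The helper below is a sketch under assumptions, not the notebook's actual code: the name `parse_amount` is hypothetical, and it assumes that `.` groups thousands, `,` marks decimals, and parentheses or a leading minus mark negative amounts.

```python
import re


def parse_amount(text):
    """Convert a transcribed cell such as '1.234.567,89' or '(1.200)' to a float.

    Returns None for empty cells or cells that still do not look numeric.
    """
    # Drop surrounding whitespace, non-breaking spaces and inner spaces.
    cleaned = text.strip().replace("\u00a0", "").replace(" ", "")
    if cleaned in {"", "-"}:
        return None

    # Parentheses or a leading minus sign both mean a negative amount.
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    if negative:
        cleaned = cleaned[1:-1]
    elif cleaned.startswith("-"):
        negative = True
        cleaned = cleaned[1:]

    # Assumed convention: '.' groups thousands and ',' marks decimals.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    if not re.fullmatch(r"\d+(\.\d+)?", cleaned):
        return None

    value = float(cleaned)
    return -value if negative else value


print(parse_amount("1.234.567,89"))
print(parse_amount("(1.200)"))
```

Returning `None` instead of raising keeps the table loadable and leaves unparseable cells visible for later review.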
The notebook applies fixes to each DataFrame to bring the numbers closer to those reported. It then joins all the tables into a single dataset and saves it into the Dirty_Combined folder.
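The joining step could look roughly like the sketch below, which stacks per-company tables into one long dataset with pandas. The company labels, the column names `item` and `value`, and the amounts are invented for illustration only.

```python
import pandas as pd

# Hypothetical per-company tables, as they might look after the previous phase.
tables = {
    "Company A": pd.DataFrame(
        {"item": ["Total assets", "Total liabilities"], "value": [1000.0, 900.0]}
    ),
    "Company B": pd.DataFrame(
        {"item": ["Total assets", "Total liabilities"], "value": [500.0, 450.0]}
    ),
}

# Tag each table with its company, then stack them into one long dataset.
combined = pd.concat(
    [df.assign(company=name) for name, df in tables.items()],
    ignore_index=True,
)
print(combined)
```

The combined frame can then be written out with `DataFrame.to_csv`. A long layout with one row per company and item makes it easy to filter or pivot by company later.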
The notebook applies a series of tests that check the internal consistency of the numbers and flag potential errors. After the individual fixes are applied, it saves the table into the Cleaner_Combined folder.
We use a version of this process to extract data for our actuarial models. One of the benefits of releasing our code is the feedback and improvement ideas we receive. If you have any, you can contact us at [email protected].
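One consistency test of this kind relies on a balance sheet identity: total assets minus total liabilities should equal the reported excess of assets over liabilities. The sketch below is an assumed example rather than the notebook's own test; the function name, the dictionary keys and the tolerance are hypothetical.

```python
def check_balance(row, tolerance=1.0):
    """Return a list of human-readable flags for one company's balance sheet.

    `row` maps item names to amounts; the item names are assumed here.
    """
    flags = []
    gap = (
        row["total_assets"]
        - row["total_liabilities"]
        - row["excess_assets_over_liabilities"]
    )
    # A small tolerance absorbs rounding in the reported figures.
    if abs(gap) > tolerance:
        flags.append(f"assets - liabilities differs from excess by {gap:.2f}")
    return flags


consistent = {
    "total_assets": 1000.0,
    "total_liabilities": 900.0,
    "excess_assets_over_liabilities": 100.0,
}
suspicious = dict(consistent, excess_assets_over_liabilities=90.0)

print(check_balance(consistent))
print(check_balance(suspicious))
```

A flagged row does not say which cell is wrong, only that at least one of the three figures was probably mistranscribed and needs a manual look.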
MIT license