Transcribe SFCR tables with Mistral AI

Example Python code to transcribe tables from regulatory filings into digital form. To run these examples you will need an Anaconda environment and a Mistral API key. In this example we transcribe the balance sheet table from the Solvency and Financial Condition Reports (SFCR) that insurance companies must file every year.

As a sample, we took the 18 main life insurance companies operating in the Italian market.

Companies in scope

Credemvita S.p.A.
AXA MPS Assicurazioni Vita
CRÉDIT AGRICOLE VITA
Società Reale Mutua di Assicurazioni
Cardif Vita S.p.A.
MEDIOLANUM VITA S.p.A.
Generali Italia S.p.A.
Banco BPM Vita S.p.A.
HDI ASSICURAZIONI S.p.A.
Gruppo Assicurativo Poste Vita
FIDEURAM VITA S.P.A.
CNP Vita Assicura S.p.A.
ITAS VITA
Helvetia Vita S.p.A.
Vittoria Assicurazioni S.p.A.
GROUPAMA ASSICURAZIONI S.P.A.
UniCredit Allianz Vita S.p.A.
Zurich Investments Life S.p.A.

Description of the process

The extraction is performed in 5 phases, numbered 0 to 4.

Phase 0: Find the reports and identify the relevant tables (manually).

Identify the new SFCR report and save it into the folder Input.
Identify the pages where the tables of interest are.
Record the mapping for each company run (the report and the pages of interest) in master_list.csv.
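
As an illustration of how later phases could read this map back, here is a minimal sketch. The column names (`company`, `pdf_file`, `page`), the sample rows, and the helper `load_master_list` are hypothetical; the actual layout of master_list.csv may differ.

```python
import csv
import io

# Hypothetical contents of master_list.csv; the real column names may differ.
SAMPLE_MASTER_LIST = """company,pdf_file,page
Company A,company_a_sfcr.pdf,112
Company B,company_b_sfcr.pdf,97
"""


def load_master_list(handle):
    """Return one dict per company run, with the page number as an int."""
    runs = []
    for row in csv.DictReader(handle):
        runs.append(
            {
                "company": row["company"].strip(),
                "pdf_file": row["pdf_file"].strip(),
                "page": int(row["page"]),
            }
        )
    return runs


runs = load_master_list(io.StringIO(SAMPLE_MASTER_LIST))
for run in runs:
    print(f'{run["company"]}: page {run["page"]} of Input/{run["pdf_file"]}')
```

To read the real file instead of the sample string, the same function can be given a file handle opened with `open("master_list.csv", newline="", encoding="utf-8")`.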

Phase 1: Run the Extraction notebook (released on 23-September-2025).

The notebook performs the following steps (with slight modifications depending on the table format):

Save the page with the table into a separate folder Single_pdf.
Use either a Python package or a specialized LLM to create a digital equivalent of the table.
Fix the systemic errors that prevent the table from being saved as a DataFrame.
Save the DataFrame into the Output folder.
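
The "digital equivalent" and "systemic errors" steps above can be sketched as follows. This is not the notebook's code: it assumes the LLM returns the table as pipe-delimited markdown, and the helper `parse_markdown_table` and the sample values are made up for illustration. Plain Python lists are used to keep the sketch dependency-free; the resulting header and rows could then be passed to a DataFrame constructor.

```python
def parse_markdown_table(markdown):
    """Convert a pipe-delimited markdown table into (header, rows)."""
    lines = [ln.strip() for ln in markdown.splitlines() if ln.strip().startswith("|")]
    rows = []
    for line in lines:
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the separator row, e.g. | --- | ---: |
        if all(cell and set(cell) <= set("-: ") for cell in cells):
            continue
        rows.append(cells)
    if not rows:
        raise ValueError("no table found in the transcription")

    header, body = rows[0], rows[1:]
    width = len(header)
    fixed = []
    for cells in body:
        if len(cells) < width:
            # Pad rows where the transcription dropped empty cells.
            cells = cells + [""] * (width - len(cells))
        elif len(cells) > width:
            # Merge surplus cells into the last column.
            cells = cells[: width - 1] + [" ".join(cells[width - 1:])]
        fixed.append(cells)
    return header, fixed


# Made-up transcription with a ragged row (missing last cell).
SAMPLE = """
| Item | Code | Solvency II value |
| --- | --- | ---: |
| Intangible assets | R0030 ||
| Total assets | R0500 | 1.234.567 |
"""

header, rows = parse_markdown_table(SAMPLE)
print(header)
for row in rows:
    print(row)
```

Forcing every row to the header's width is one example of a fix that lets a ragged transcription load into a rectangular table; the notebook's actual fixes depend on the table format.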

Phase 2: Run the Processing notebook (released on 4-October-2025).

The notebook applies fixes to the DataFrame so that the transcribed numbers match the reported numbers more closely. It then joins all the tables into a single dataset and saves it into the Dirty_Combined folder.
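
A minimal sketch of this kind of clean-up, under stated assumptions: the reports are assumed to use the Italian number convention ('.' for thousands, ',' for decimals) and parentheses for negative values, and the helpers `parse_reported_number` and `combine_tables` as well as the figures are hypothetical, not taken from the notebook.

```python
import re


def parse_reported_number(text):
    """Convert a cell such as '1.234.567' or '(12.345,6)' to a float, or None if empty."""
    cleaned = text.strip().replace("\u00a0", "").replace(" ", "")
    if cleaned in {"", "-"}:
        return None
    # Assumed convention: parentheses or a leading minus mark a negative value.
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    cleaned = cleaned.strip("()")
    if cleaned.startswith("-"):
        negative = True
        cleaned = cleaned[1:]
    # Assumed Italian convention: '.' groups thousands, ',' marks decimals.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    if not re.fullmatch(r"\d+(\.\d+)?", cleaned):
        raise ValueError(f"unrecognised number: {text!r}")
    value = float(cleaned)
    return -value if negative else value


def combine_tables(tables):
    """Stack per-company (item, raw value) rows into one long list of records."""
    combined = []
    for company, rows in tables.items():
        for item, raw_value in rows:
            combined.append(
                {"company": company, "item": item, "value": parse_reported_number(raw_value)}
            )
    return combined


# Made-up figures for two placeholder companies.
tables = {
    "Company A": [("Total assets", "1.234.567"), ("Total liabilities", "1.000.000")],
    "Company B": [("Total assets", "98.765,4"), ("Total liabilities", "")],
}
combined = combine_tables(tables)
for record in combined:
    print(record)
```

Raising on unrecognised strings rather than guessing keeps transcription errors visible, so they can be handled in the later validation phases.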

Phase 3: Run the Cross-Validation notebook (released on 7-October-2025).

The notebook applies a series of tests that check the internal consistency of the numbers and flags potential errors. After the individual fixes are applied, it saves the table into the Cleaner_Combined folder.
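
One example of such a consistency test, as a sketch: on the Solvency II balance sheet, the excess of assets over liabilities equals total assets minus total liabilities. The helper `check_balance_sheet`, the tolerance, and the figures below are assumptions for illustration; the tests in the notebook may be different or more extensive.

```python
def check_balance_sheet(values, tolerance=1.0):
    """Return a list of flags; an empty list means the check passed.

    `tolerance` absorbs rounding in the reported figures (assumed value).
    """
    flags = []
    expected = values["Total assets"] - values["Total liabilities"]
    reported = values["Excess of assets over liabilities"]
    if abs(expected - reported) > tolerance:
        flags.append(
            f"Excess of assets over liabilities: reported {reported}, "
            f"but assets minus liabilities gives {expected}"
        )
    return flags


# Made-up figures: the second table has a misread digit in total assets.
consistent = {
    "Total assets": 1234567.0,
    "Total liabilities": 1000000.0,
    "Excess of assets over liabilities": 234567.0,
}
misread = dict(consistent, **{"Total assets": 1284567.0})

print(check_balance_sheet(consistent))
print(check_balance_sheet(misread))
```

A flag only says that at least one of the three numbers is wrong; deciding which one still requires looking at the source page.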

Phase 4: Final modifications to the table and a manual inspection (no script for this step).

Contact

We use a version of this process to extract data for our actuarial models. One benefit of releasing our code is the feedback and improvement ideas we receive. If you have any, you can contact us at [email protected].

License

MIT license
