Palimpzest test case: processing materials science papers for crystal recipes #123

mikecafarella · 2025-02-12T15:50:21Z

We have a colleague who wants to process papers from the materials science domain and extract "synthesis recipes". These recipes are passages that describe fairly complicated procedures for creating a novel chemical. This is useful because the recipes are extremely intricate and hard to discover. If we can build a model that suggests high-quality recipes for novel targets, it would be a big step.

The near-term goal is simply to extract these recipes from existing papers. So we want to populate a schema that looks like this:
(PaperIdentifier, TargetChemical, RecipeText)

After that works, we can populate a structured form of the recipe description. But just getting the raw text first would be helpful.
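To make the target schema concrete, here is a minimal Python sketch of one record. The class name `RecipeRecord`, the snake_case field names, and the helper `to_row` are illustrative assumptions, not part of Palimpzest or of the annotated data; only the three fields themselves come from the tuple above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RecipeRecord:
    """One extracted synthesis recipe; fields mirror the tuple above."""
    paper_identifier: str  # assumed to be any stable ID (filename, DOI, ...)
    target_chemical: str   # the chemical the recipe produces
    recipe_text: str       # raw passage copied from the paper, unstructured


def to_row(record: RecipeRecord) -> dict:
    """Convert a record to a plain dict, e.g. for writing CSV or JSON."""
    return asdict(record)
```

Keeping `recipe_text` as a raw string matches the near-term goal; a structured form could later replace or sit alongside that field.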

I have some annotated data we can use, though it's not shareable via Git so please don't commit it here.

Doing a basic but good job here involves:

Writing code that extracts content from PDFs (certainly required)
Evaluating the accuracy of the initial task
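One possible way to evaluate accuracy against the annotated data is sketched below. It assumes both the predictions and the annotations are available as (paper_id, target, recipe_text) tuples, scores (paper, target) pairs with precision/recall/F1, and compares recipe passages by token overlap. The function names and the choice of metrics are assumptions for illustration; the domain experts may prefer a different notion of a correct extraction.

```python
def normalize(s: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(s.lower().split())


def pair_scores(predicted, gold):
    """Precision, recall, F1 over (paper_id, normalized target) pairs.

    predicted, gold: iterables of (paper_id, target, recipe_text) tuples.
    """
    pred_keys = {(p, normalize(t)) for p, t, _ in predicted}
    gold_keys = {(p, normalize(t)) for p, t, _ in gold}
    tp = len(pred_keys & gold_keys)
    precision = tp / len(pred_keys) if pred_keys else 0.0
    recall = tp / len(gold_keys) if gold_keys else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of the token sets of two recipe passages."""
    ta, tb = set(normalize(a).split()), set(normalize(b).split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)
```

Pair-level scores show whether the right recipes were found at all, while the token overlap gives a rough signal of whether the extracted passage has the right boundaries.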

After this basic version works, we want to evaluate runtime performance and maybe consider:

Using the PZ RAG operators like retrieve, though this depends on the runtime of the basic version
Building a RAG index over the input papers
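To illustrate the general idea of narrowing each paper down to likely recipe passages before running the expensive extraction step, here is a deliberately simple keyword-based prefilter. This is not the PZ retrieve operator and not a vector index; it is a stand-in under stated assumptions. The cue-word list `CUE_TERMS` and all function names are hypothetical, and a real index would likely use embeddings.

```python
import re
from collections import Counter

# Hypothetical cue words that tend to appear in synthesis descriptions.
CUE_TERMS = ("synthesized", "calcined", "annealed", "dissolved", "stirred", "heated")


def split_passages(text: str) -> list:
    """Split extracted paper text into passages on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def score(passage: str, cues=CUE_TERMS) -> int:
    """Count how many cue-word occurrences a passage contains."""
    counts = Counter(re.findall(r"[a-z]+", passage.lower()))
    return sum(counts[c] for c in cues)


def top_passages(text: str, k: int = 3) -> list:
    """Return up to k passages with the highest nonzero cue-word score."""
    ranked = sorted(split_passages(text), key=score, reverse=True)
    return [p for p in ranked[:k] if score(p) > 0]
```

Whether any such filtering is worth adding depends on how slow the basic version turns out to be on full papers.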

Finally, we would pursue the structured form of the recipe. I can share the domain experts' proposal for this structure.

mdr223 · 2025-02-26T21:38:17Z

Next step: try to propose a structured schema for the info we want to extract from the highlighted text

mdr223 assigned yash94404 Feb 12, 2025
