A Tool for Reading Scientific Text and Interpreting Metadata from the PDF Documents

Academic / Journal Article

Data Science for Social Impact

Miao Zhu, Jacqueline M. Cole

This article introduces PDFDataExtractor, an open-source, template-based plug-in for ChemDataExtractor that uses spatial layout and rule-based methods to extract logical text blocks and metadata from scientific PDFs. The tool extracts key elements such as titles, authors, abstracts, keywords, DOIs, and references, which supports downstream text mining and data-driven materials discovery. Evaluated on chemistry journal articles from multiple sources, the tool demonstrates high precision and recall, making it useful for researchers and organizations that need to extract, organize, and analyze large volumes of scientific literature efficiently.
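
To illustrate what rule-based extraction over spatial layout can look like, here is a minimal Python sketch. It is not PDFDataExtractor's API or the authors' implementation: the Block class, the extract_metadata function, and the three rules (largest font as the title, a DOI pattern, a "Keywords:" prefix) are hypothetical and chosen only for illustration. It assumes text blocks with font size and vertical position have already been obtained from a PDF layout parser.

```python
import re
from dataclasses import dataclass


@dataclass
class Block:
    """A text block from a first page (hypothetical structure)."""
    text: str
    font_size: float  # in points
    top: float        # distance from the top of the page, in points


# Illustrative patterns; real documents need more robust rules.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
KEYWORDS_RE = re.compile(r"^\s*keywords?\s*[:\-]\s*(.+)$",
                         re.IGNORECASE | re.DOTALL)


def extract_metadata(blocks):
    """Apply simple layout and text rules to a list of Block objects."""
    meta = {"title": None, "doi": None, "keywords": []}
    if not blocks:
        return meta

    # Layout rule (assumption): the title is the block with the largest
    # font; ties go to the block closest to the top of the page.
    title_block = max(blocks, key=lambda b: (b.font_size, -b.top))
    meta["title"] = " ".join(title_block.text.split())

    for block in blocks:
        # Text rule: first string that looks like a DOI.
        if meta["doi"] is None:
            match = DOI_RE.search(block.text)
            if match:
                meta["doi"] = match.group(0).rstrip(".,;)")

        # Text rule: first block that starts with "Keywords:".
        kw = KEYWORDS_RE.match(block.text)
        if kw and not meta["keywords"]:
            parts = re.split(r"[;,]", kw.group(1))
            meta["keywords"] = [p.strip() for p in parts if p.strip()]

    return meta


# Made-up example blocks, not taken from any real article.
blocks = [
    Block("J. Hypothetical Chem. 2020", 8.0, 20.0),
    Block("A Sample\nTitle", 18.0, 60.0),
    Block("Keywords: text mining; PDF, metadata", 9.0, 200.0),
    Block("https://doi.org/10.1234/abc.5678.", 8.0, 700.0),
]
result = extract_metadata(blocks)
```

A template-based tool would presumably tune rules like these per publisher layout; the sketch applies one fixed rule set to every document, which is the main simplification here.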
