IBM Digital Self-Serve Co-Create Experience

Scanned document Q&A


Step 1. Review the use case summary

The demo shows how the Retrieval Augmented Generation (RAG) pattern can be used to answer questions about documents in a Q&A format. The quality of an answer depends on the quality of the text extracted from the documents, especially when the documents are scanned, contain handwritten text, or have other quality issues. The demo converts scanned PDF documents to text using the watsonx.ai text extraction API, and then uses a watsonx.ai LLM to answer questions about that text with high accuracy. It also provides a side-by-side comparison with a popular open-source extraction technology, using the same watsonx.ai LLM for answer generation.
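The dependence of answer quality on extraction quality can be illustrated with a minimal sketch. It is not the demo's code: the function names (chunk_text, score, retrieve, build_prompt), the word-overlap scoring, and the sample sentences are all assumptions made for illustration, and the call to the LLM is deliberately left out. The sketch only shows the retrieval and prompt-building half of the RAG pattern, and how garbled extracted text matches a question less well than clean text.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_text(text, max_words=40):
    """Split extracted document text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def score(question, chunk):
    """Toy relevance score: number of distinct question tokens found in the chunk."""
    chunk_counts = Counter(tokenize(chunk))
    return sum(1 for tok in set(tokenize(question)) if chunk_counts[tok] > 0)


def retrieve(question, chunks, top_k=2):
    """Return the top_k chunks ranked by the toy overlap score."""
    ranked = sorted(chunks, key=lambda c: score(question, c), reverse=True)
    return ranked[:top_k]


def build_prompt(question, context_chunks):
    """Assemble the prompt that would be sent to an LLM (the call itself is omitted)."""
    context = "\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Hypothetical sample: the same scanned line extracted cleanly and with OCR-style errors.
clean_text = "The invoice total is 4500 dollars and payment is due on 30 June."
noisy_text = "The inv0ice t0tal is 45OO d0llars and paym3nt is du3 on 3O Jun3."
question = "What is the invoice total?"

clean_score = score(question, clean_text)
noisy_score = score(question, noisy_text)
prompt = build_prompt(question, retrieve(question, chunk_text(clean_text)))
```

In this toy setup the clean extraction shares more words with the question than the garbled one, so it is more likely to be retrieved and to give the LLM usable context. A real system would typically use embeddings rather than word overlap, but the same dependence on extraction quality applies.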


Step 2. Try the interactive demo application

Experience our interactive demo and explore its features using the sample data.

Architecture diagram

Step 3. Get an application code sample

Get the sample application code for the interactive demo. The code is a complete app that makes API calls to a watsonx instance in IBM Cloud using an IBM Cloud user API key. You can modify the code to experiment. Python 3 is assumed to be installed; if it is not, download it from https://www.python.org/downloads/.

If you need an API key, please activate a watsonx.ai trial account.

Refer to the readme.md inside the application folder to set up and run the application.
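As background on how an IBM Cloud user API key is typically used, the sketch below exchanges the key for a short-lived bearer token at the IBM Cloud IAM token endpoint. This is an assumption about the general authentication flow, not a description of the sample application, which may use an SDK that handles this step internally; the readme.md remains the authoritative guide. The function names and the IBM_CLOUD_API_KEY environment variable mentioned below are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# IBM Cloud IAM endpoint that exchanges an API key for a bearer token.
IAM_TOKEN_URL = "https://iam.cloud.ibm.com/identity/token"


def build_token_request(api_key):
    """Build (but do not send) the form-encoded POST request for an IAM token."""
    body = urllib.parse.urlencode({
        "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
        "apikey": api_key,
    }).encode("utf-8")
    return urllib.request.Request(
        IAM_TOKEN_URL,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )


def fetch_access_token(api_key, timeout=30):
    """Send the request and return the access token from the JSON response."""
    with urllib.request.urlopen(build_token_request(api_key), timeout=timeout) as resp:
        return json.load(resp)["access_token"]
```

A caller would read the key from the environment rather than hard-coding it, for example fetch_access_token(os.environ["IBM_CLOUD_API_KEY"]), and then pass the returned token in an Authorization: Bearer header on later API calls.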


Step 4. Explore the technology powering this use case

Discover product benefits or begin your free trial


Do not input personal data, or data that is sensitive or confidential, into demonstration assets.