PDF to HTML Converter Service

Overview 📖

ai-core-pdf-to-html is a service that uses the cognaize_pdf_to_html library to convert PDF documents into HTML. It uses aiohttp to handle HTTP and WebSocket connections. For more details about cognaize_pdf_to_html, refer to its README.md.

Installation 🚀

Install the requirements with pip, supplying the FURY_AUTH authentication token:

FURY_AUTH=${FURY_AUTH} pip install -r requirements.txt

Running the Service Locally 🛠️

The following step-by-step guide shows how to use the service with Postman.

Launch the Server:

python app/server.py

[Optional] To monitor the service's progress in real time via WebSocket, in Postman:

Set the request type to WebSocket
Enter the WebSocket URL: ws://localhost:8000/ws
Click Connect
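
The same monitoring can be done from a script instead of Postman. The following is a minimal sketch using the aiohttp client, assuming the server sends its progress updates as text frames. The payload format is not documented here, so the hypothetical format_event helper simply pretty-prints JSON when it can and falls back to the raw text otherwise.

```python
import asyncio
import json

WS_URL = "ws://localhost:8000/ws"  # WebSocket URL from the steps above


def format_event(raw: str) -> str:
    """Render one progress message; the payload format is an assumption."""
    try:
        return json.dumps(json.loads(raw), sort_keys=True)
    except json.JSONDecodeError:
        return raw.strip()


async def monitor(url: str = WS_URL) -> None:
    """Print every text frame the service sends until the socket closes."""
    # Imported here so format_event stays usable without aiohttp installed.
    import aiohttp

    async with aiohttp.ClientSession() as session:
        async with session.ws_connect(url) as ws:
            async for msg in ws:
                if msg.type == aiohttp.WSMsgType.TEXT:
                    print(format_event(msg.data))
                elif msg.type == aiohttp.WSMsgType.ERROR:
                    break


def main() -> None:
    # Start this before uploading a PDF so no progress messages are missed.
    asyncio.run(monitor())
```

Call main() while the server is running, then upload a file as described in the next step.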

Uploading PDF File to Process:

In Postman:

Set the request type to HTTP
Set the method to POST
Use the URL http://localhost:8000/pdf-to-html
In the request body (form-data), add a key of type File and name it file
Select the PDF file from your computer that you wish to upload
Click Send to upload the file and start the HTML conversion process
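
For scripted use, the Postman steps above amount to a multipart/form-data POST with a single file part named file. The sketch below does this with the Python standard library only, so it needs no extra dependencies. The helper names encode_pdf_form and upload_pdf are illustrative, and decoding the response as UTF-8 text is an assumption, since the response format is not described here.

```python
import urllib.request
import uuid
from pathlib import Path

ENDPOINT = "http://localhost:8000/pdf-to-html"  # URL from the steps above


def encode_pdf_form(filename: str, data: bytes, field: str = "file") -> tuple[bytes, str]:
    """Build a multipart/form-data body with one file part named `field`.

    Returns the encoded body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload_pdf(path: str, url: str = ENDPOINT) -> str:
    """POST the PDF to the service and return the response body as text."""
    pdf = Path(path)
    body, content_type = encode_pdf_form(pdf.name, pdf.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    # Assumption: the response body is UTF-8 text (for example HTML or JSON).
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the server running, upload_pdf("report.pdf") sends the file; report.pdf is a placeholder path.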

Parameters:

charLimit (int, optional): Maximum number of characters per chunk. Default is 12000. Keep it small enough that the output generated for a chunk stays within the model's 4096-token maximum output.
modelName (ModelType, optional): Claude model to use for processing. Default is ModelType.SONNET3; other options include "SONNET3_5".
ocrType (OcrType, optional): OCR engine to use for text extraction. Default is OcrType.DOCTR; other options include "AZURE".
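
To make the effect of charLimit concrete, here is a deliberately naive illustration of a per-chunk character limit. It is not the library's chunking logic, which is not shown in this README and may well split on page or element boundaries. The function name split_into_chunks is hypothetical.

```python
def split_into_chunks(text: str, char_limit: int = 12000) -> list[str]:
    """Split text into consecutive pieces of at most char_limit characters.

    Illustration only: a fixed-width split, not the service's real algorithm.
    """
    if char_limit <= 0:
        raise ValueError("char_limit must be positive")
    return [text[i:i + char_limit] for i in range(0, len(text), char_limit)]


chunks = split_into_chunks("x" * 30000)
print([len(chunk) for chunk in chunks])  # [12000, 12000, 6000]
```

A smaller limit produces more chunks, each with less text for the model to convert in a single response.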

These parameters are passed in the query string of the request URL, for example: http://localhost:8000/pdf-to-html?ocrType=AZURE&modelName=SONNET3_5&charLimit=15000
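
When the URL is assembled in code, a small helper can catch typos before the request is sent. This is a sketch under assumptions: build_url is a hypothetical name, the allowed values are only those named in this README, and it assumes the defaults are accepted as the query values "SONNET3" and "DOCTR".

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000/pdf-to-html"
# Only the values named in this README; the real enums may define more.
OCR_TYPES = {"DOCTR", "AZURE"}
MODEL_NAMES = {"SONNET3", "SONNET3_5"}


def build_url(
    ocr_type: Optional[str] = None,
    model_name: Optional[str] = None,
    char_limit: Optional[int] = None,
) -> str:
    """Return the endpoint URL with only the parameters that were given."""
    params = {}
    if ocr_type is not None:
        if ocr_type not in OCR_TYPES:
            raise ValueError(f"unknown ocrType: {ocr_type}")
        params["ocrType"] = ocr_type
    if model_name is not None:
        if model_name not in MODEL_NAMES:
            raise ValueError(f"unknown modelName: {model_name}")
        params["modelName"] = model_name
    if char_limit is not None:
        if char_limit <= 0:
            raise ValueError("charLimit must be a positive integer")
        params["charLimit"] = char_limit
    return f"{BASE_URL}?{urlencode(params)}" if params else BASE_URL


print(build_url("AZURE", "SONNET3_5", 15000))
# http://localhost:8000/pdf-to-html?ocrType=AZURE&modelName=SONNET3_5&charLimit=15000
```

Omitted parameters are left out of the URL so that the service applies its own defaults.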
