PDF to HTML Converter Service
Overview 📖
ai-core-pdf-to-html is a service that uses the cognaize_pdf_to_html library to convert PDF documents into HTML. The service uses aiohttp to handle HTTP and WebSocket connections. For more details about cognaize_pdf_to_html, please refer to README.md.
Installation 🚀
Install the requirements with pip, supplying the FURY_AUTH authentication token:
FURY_AUTH=${FURY_AUTH} pip install -r requirements.txt
Running the Service Locally 🛠️
The following is a step-by-step guide to using the service with Postman.
Launch the Server:
python app/server.py
[Optional] To monitor the service's progress in real-time via WebSocket:
- Set the request type to WebSocket
- Enter the WebSocket URL: ws://localhost:8000/ws
- Click Connect
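The same monitoring can be done from a script instead of Postman. The following is a minimal sketch using the aiohttp client, assuming the server sends its progress updates as text frames. The message format is not documented here, so the hypothetical `format_update` helper pretty-prints JSON when it can and otherwise passes the text through unchanged.

```python
import asyncio
import json

WS_URL = "ws://localhost:8000/ws"


def format_update(raw):
    """Pretty-print a message if it is valid JSON, otherwise return it unchanged.

    Assumption: the payload format of progress messages is unknown.
    """
    try:
        return json.dumps(json.loads(raw), indent=2, sort_keys=True)
    except (TypeError, ValueError):
        return raw


async def monitor(url=WS_URL):
    """Connect to the WebSocket endpoint and print every text message received."""
    # Imported here so format_update stays usable without aiohttp installed.
    import aiohttp

    async with aiohttp.ClientSession() as session:
        async with session.ws_connect(url) as ws:
            # The loop ends when the server closes the connection.
            async for msg in ws:
                if msg.type == aiohttp.WSMsgType.TEXT:
                    print(format_update(msg.data))
                elif msg.type == aiohttp.WSMsgType.ERROR:
                    break
```

With the server running, `asyncio.run(monitor())` keeps printing updates until the connection is closed.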
Uploading a PDF File to Process:
In Postman:
- Set the request type to HTTP
- Set the method to POST
- Use the URL http://localhost:8000/pdf-to-html
- Add a key with the type File and name it file
- Select the PDF file from your computer that you wish to upload
- Click Send to upload the file and start the HTML conversion process
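The upload steps above can also be scripted. The following is a rough sketch using only the Python standard library, assuming the endpoint accepts a standard multipart/form-data body with the PDF under the `file` key, as in the Postman setup. The helper names `build_multipart_body` and `upload_pdf` are illustrative, and the response format is not documented here, so the body is returned as raw text.

```python
import os
import urllib.request
import uuid

UPLOAD_URL = "http://localhost:8000/pdf-to-html"


def build_multipart_body(field_name, filename, file_bytes,
                         content_type="application/pdf"):
    """Encode a single file as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + file_bytes + tail, f"multipart/form-data; boundary={boundary}"


def upload_pdf(path, url=UPLOAD_URL):
    """POST the PDF at `path` under the form key "file".

    Returns the HTTP status code and the response body as text.
    """
    with open(path, "rb") as fh:
        body, content_type = build_multipart_body(
            "file", os.path.basename(path), fh.read()
        )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8", errors="replace")
```

With the server running, `upload_pdf("document.pdf")` sends the same request as the Postman steps; `document.pdf` is a placeholder path.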
Parameters:
- charLimit (int, optional): Maximum number of characters per chunk. Default is 12000. The chunk size must stay small enough that the model's output does not exceed its maximum of 4096 generated tokens.
- modelName (ModelType, optional): Claude model to use for processing. Default is ModelType.SONNET3; other options include SONNET3_5.
- ocrType (OcrType, optional): OCR engine to use for text extraction. Default is OcrType.DOCTR; other options include AZURE.
You can set these parameters by adding them to the request URL as query parameters, for example: http://localhost:8000/pdf-to-html?ocrType=AZURE&modelName=SONNET3_5&charLimit=15000
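When calling the service from code, the query string can be assembled rather than typed by hand. The following small sketch uses the standard library; the helper name `build_conversion_url` is illustrative, and any parameter left out is assumed to fall back to the server-side default listed above.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000/pdf-to-html"


def build_conversion_url(ocr_type=None, model_name=None, char_limit=None):
    """Return the endpoint URL, including only the parameters that were given."""
    params = {}
    if ocr_type is not None:
        params["ocrType"] = ocr_type
    if model_name is not None:
        params["modelName"] = model_name
    if char_limit is not None:
        params["charLimit"] = char_limit
    if not params:
        return BASE_URL
    # urlencode keeps insertion order and converts the integer to text.
    return f"{BASE_URL}?{urlencode(params)}"


print(build_conversion_url(ocr_type="AZURE", model_name="SONNET3_5", char_limit=15000))
# prints http://localhost:8000/pdf-to-html?ocrType=AZURE&modelName=SONNET3_5&charLimit=15000
```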