What is the best PHP pdf table parser...
by Francesco Facco de Lagarda - 2 years ago (2019-12-03)
I need to extract data from the rows and columns of a table in a PDF file.
The PDF document contains a 5-column table.
None of the libraries I have tried could make sense of the table, so they can't accurately extract the data contained in the individual cells.
2. by Marco van Oostende - 2 years ago (2020-02-18)
It depends very much on the quality of the PDF. It is not uncommon for cell content to be scattered around the table, or for the text to be gibberish. I would suggest simply copying the text you want from that table to the clipboard and pasting it into something like Notepad or any other text-based tool. This should give you an indication of what is actually possible: if you can find a structure in that text, the package suggested in the other reply may work. Chances are it won't, however.
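The copy-and-paste check can also be done roughly in code. This is only a minimal sketch: the function name `countStructuredRows` and the sample text are made up for illustration, and it assumes that cells in the pasted text are separated by a tab or by two or more spaces, which may not hold for your PDF.

```php
<?php
// Hypothetical helper (not part of any library): counts how many non-blank
// lines of pasted text split into the expected number of cells.
function countStructuredRows(string $text, int $expectedColumns = 5): array
{
    $total = 0;
    $matching = 0;
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines
        }
        $total++;
        // Assumption: cells are separated by a tab or by 2+ whitespace characters.
        $cells = preg_split('/\t|\s{2,}/', $line);
        if (count($cells) === $expectedColumns) {
            $matching++;
        }
    }
    return ['total' => $total, 'matching' => $matching];
}

// Made-up sample: two table rows plus one stray line from outside the table.
$sample = "Id  Name  Qty  Price  Total\n"
        . "1  Widget  2  3.50  7.00\n"
        . "stray text from outside the table\n";

$result = countStructuredRows($sample);
printf(
    "%d of %d lines look like 5-column rows\n",
    $result['matching'],
    $result['total']
);
```

If most lines of the real pasted text match the expected column count, there is probably enough structure for a parser to work with. If only a few do, the text layer of the PDF is likely too cluttered.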
1. by Manuel Lemos - 2 years ago (2020-02-18)
Parsing and extracting data from PDF documents is not an easy task due to the complexity of that kind of document.
There is this PDF document parser, but I am not sure whether it handles tables in PDF documents well. Can you please try it and let us know if it works well for you?