chimpanzee
Member
- Joined
- Dec 31, 2020
- Messages
- 50
Good luck. I have tried to do just that, and the PDF doesn't store any formatting information in a way the Python library can understand. Unfortunately, this means you can get all the text, but you can't automatically tell the title apart from the body. Worse, you can't tell where a sentence or a paragraph ends, so they get jumbled together. I'll probably just end up learning Python and then writing a script to help automate all of that. I'll get to it in the near future.
It's possible to hack together a script that extracts the text, but you'd still need some manual editing to set the title, paragraphs and references, so in the end it's not much different from copy and paste.
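For what it's worth, here is a rough sketch of what such a script might look like once you already have the raw text out of the PDF. It's only a heuristic, not a real solution: it assumes the extractor leaves a blank line between paragraphs (which often isn't the case), that a trailing hyphen means a word was split across lines, and that the title is a short first block with no full stop at the end. The function names `rebuild_paragraphs` and `guess_title` are made up for this example.

```python
def rebuild_paragraphs(raw_text):
    """Heuristically regroup extracted text into paragraphs.

    Assumption: blank lines in the extracted text mark paragraph breaks.
    """
    paragraphs = []
    current = ""
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            # Blank line: close the paragraph collected so far.
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if current.endswith("-"):
            # Assumption: a trailing hyphen is a word split across lines.
            current = current[:-1] + line
        elif current:
            current += " " + line
        else:
            current = line
    if current:
        paragraphs.append(current)
    return paragraphs


def guess_title(paragraphs, max_length=120):
    """Guess the title: a short first block that doesn't end with a period."""
    if not paragraphs:
        return None
    first = paragraphs[0]
    if len(first) < max_length and not first.endswith("."):
        return first
    return None


if __name__ == "__main__":
    sample = (
        "A Study of Things\n\n"
        "This is the first sen-\n"
        "tence of the body.\n"
        "It continues here.\n\n"
        "Second paragraph."
    )
    blocks = rebuild_paragraphs(sample)
    print("Title guess:", guess_title(blocks))
    for block in blocks:
        print("-", block)
```

You'd still have to check the output by hand, since genuinely hyphenated words, headings that end in a period, and PDFs with no blank lines at all will trip it up.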