
Thread: searching within a pdf from the command line


I want to search through a large number of PDF files for a specific phrase, from the command line and nothing else. Specifically, I want to use:
find ./ -maxdepth 3 -name '*.pdf' | xargs grep phrasetosearchfor

If I open the PDFs in a text editor, the contents are scrambled (binary), so I can't search them in any meaningful way.

If I open a PDF in Document Viewer or Acrobat, I can search the text and find the phrase with no problems.

Does anyone know how to search the internal text without opening each PDF? Otherwise I can't batch it.

I didn't make the PDFs, so I can't change how the originals are formatted. Alternatively, I would settle for a program that converts them to a searchable format.

Under both Linux and Windows, you can use Acrobat Reader, which has a command to search multiple files at once.

Under Linux, there is Recoll ("recoll - personal full text search package with a Qt GUI"), which builds an index of your PDF files on its first run. To install it, run "sudo apt-get install recoll", or find it in the Software Center.

You can also convert the files to text directly with the "pdftotext" utility. It's available via apt-get or the Software Center. Then use grep on the resulting text files.

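Code:
# convert every PDF to a .txt file with the same name, next to the original
find . -name '*.pdf' -exec pdftotext {} \;
# list the .txt files containing the exact phrase (-F: fixed string, not a pattern)
grep -r --include '*.txt' -l -F "exact phrase search" .
# list the .txt files matching a regular expression
grep -r --include '*.txt' -l -e "regular expression search" .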
More details: command line tool to search phrases in a large number of PDF files
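If you'd rather not leave a .txt file beside every PDF, the conversion and the search can be combined in one pass. Below is a minimal sketch, assuming bash and pdftotext (from the poppler-utils package) are installed; search_pdfs is a hypothetical helper name, not an existing command. It keeps the -maxdepth 3 limit from the original question and prints the matching PDF itself rather than a .txt copy.

```shell
#!/usr/bin/env bash
# Minimal sketch: list every PDF under DIR (default: current directory,
# up to 3 levels deep) whose extracted text contains PHRASE as a fixed string.
# Assumes pdftotext (poppler-utils) is installed; search_pdfs is a
# hypothetical helper name, not an existing command.
search_pdfs() {
    phrase=$1
    dir=${2:-.}
    # File names are read one per line, so names containing a newline
    # are not handled by this sketch.
    find "$dir" -maxdepth 3 -type f -name '*.pdf' |
    while IFS= read -r pdf; do
        # "-" as the output file sends the extracted text to stdout,
        # so no .txt files are written to disk.
        # tr folds every run of whitespace (including line breaks) into
        # one space, so a phrase wrapped across two lines still matches.
        if pdftotext -q "$pdf" - 2>/dev/null |
            tr -s '[:space:]' ' ' |
            grep -q -F -- "$phrase"; then
            printf '%s\n' "$pdf"
        fi
    done
}
```

For example, search_pdfs "phrase to search for" ~/Documents prints one path per matching PDF. Joining lines before grepping avoids missing phrases that pdftotext wraps, at the cost of occasionally matching text that straddles a column or page boundary.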



