Thread: searching within a pdf from the command line
I want to search through a large number of PDF files for a specific phrase, using the command line and nothing else. Specifically, I want to use:
find ./ -maxdepth 3 -name '*.pdf' -print0 | xargs -0 grep phrasetosearchfor
If I open the PDFs in a text editor, they are scrambled (binary), so I can't search them in any meaningful way.
If I open a PDF in Document Viewer or Acrobat, I can search the text and find the phrase with no problems.
Does anyone know how to search the internal text without opening each PDF? Otherwise I can't batch it.
I didn't make the PDFs, so I can't change how the originals are formatted. Alternatively, I would settle for a program that converts them to a searchable format.
Under both Linux and Windows, you can use Acrobat Reader, which has a command to search multiple files.
Under Linux, there is also Recoll ("Recoll - personal full text search package with a Qt GUI"), which builds an index of your PDF files on its first run. To install it, run "sudo apt-get install recoll", or find it in the Software Center.
You can also convert the files to text directly with the "pdftotext" utility. It's available via apt-get or the Software Center (it is part of the poppler-utils package). Then use grep on the text files.
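If you would rather not leave .txt files lying around, pdftotext writes to standard output when the output file name is "-", so its text can be piped straight into grep. The following is a minimal sketch assuming pdftotext is on the PATH and a shell such as bash or dash; the function name search_pdfs is hypothetical, and the depth limit of 3 simply mirrors the find command in the question.

```shell
#!/bin/sh
# search_pdfs DIR PHRASE (hypothetical helper): print each PDF under DIR,
# up to 3 levels deep, whose extracted text contains PHRASE.
# Assumes pdftotext (poppler-utils) is installed.
search_pdfs() {
  local dir="$1" phrase="$2"
  find "$dir" -maxdepth 3 -type f -name '*.pdf' |
    while IFS= read -r pdf; do
      # "-" as the output file sends the text to stdout; -q silences warnings.
      # grep: -q stop at the first match, -i ignore case, -F literal phrase.
      if pdftotext -q "$pdf" - | grep -qiF -- "$phrase"; then
        printf '%s\n' "$pdf"
      fi
    done
}

# Example: search_pdfs . "phrase to search for"
```

File names containing spaces are handled, but names containing newlines are not. Because grep matches line by line, a phrase that pdftotext happens to split across two output lines will be missed; the same limitation applies to grepping converted .txt files.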
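More details in the thread "Command line tool to search phrases in a large number of PDF files". Code:
    # convert every PDF to a .txt file next to it
    find . -name '*.pdf' -exec pdftotext {} \;
    # list the .txt files containing the exact phrase (-F: fixed string, not a regex)
    grep -r --include='*.txt' -l -F "exact phrase search" .
    # list the .txt files matching a regular expression
    grep -r --include='*.txt' -l -e "regular expression search" .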
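With a large collection, conversion is the slow step, so it can pay to convert once and re-run pdftotext only for PDFs that are new or have changed since their .txt was written. A minimal sketch under the same assumptions (pdftotext installed, a shell such as bash or dash, GNU or BSD grep for --include); convert_pdfs and search_txt are hypothetical helper names, and the -nt test is a widely supported extension rather than strict POSIX.

```shell
#!/bin/sh
# convert_pdfs DIR (hypothetical helper): write NAME.txt next to each
# NAME.pdf under DIR, skipping PDFs whose .txt exists and is not older.
convert_pdfs() {
  find "$1" -type f -name '*.pdf' |
    while IFS= read -r pdf; do
      txt=${pdf%.pdf}.txt
      if [ ! -e "$txt" ] || [ "$pdf" -nt "$txt" ]; then
        pdftotext -q "$pdf" "$txt"
      fi
    done
}

# search_txt DIR PHRASE (hypothetical helper): list the .txt files under
# DIR that contain PHRASE as a literal, case-insensitive string.
search_txt() {
  grep -rilF --include='*.txt' -- "$2" "$1"
}
```

For example, `convert_pdfs . && search_txt . "exact phrase search"`; each file listed is the .txt sibling of a matching PDF, and a second run of convert_pdfs only touches PDFs added or modified since the first.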
Source: Ubuntu Forums, General Help [all variants]