Hi new,
Have a look at "Loop, FilePattern" in the AutoHotkey help documentation. You could use the Loop command to find all *.pdf files and the Run or RunWait command to process the files one by one.
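For anyone doing this outside AutoHotkey, here is a minimal Python sketch of the same idea (it is not AutoHotkey code): find every *.pdf in a folder and run one external command per file, waiting for each to finish as RunWait does. The function name `process_pdfs` and the `make_command` callback are hypothetical; you supply whatever pdftk (or other) command line you need per file.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence


def process_pdfs(
    folder: str,
    make_command: Callable[[Path], Sequence[str]],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> List[int]:
    """Run one external command per *.pdf file in `folder`, one at a time.

    Returns the exit codes in processing order.
    """
    exit_codes = []
    # sorted() gives a predictable order; Path.glob alone does not promise one.
    for pdf in sorted(Path(folder).glob("*.pdf")):
        # subprocess.run blocks until the child exits, like RunWait.
        result = run(list(make_command(pdf)))
        exit_codes.append(result.returncode)
    return exit_codes


# Hypothetical usage: dump the metadata of every PDF in the current folder.
# process_pdfs(".", lambda p: ["pdftk", str(p), "dump_data"])
```

The `run` parameter is only there so the loop can be tried without the real tool installed; in normal use you leave it at the default.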

Pdftk - the pdf toolkit [CMD]
Started by BoBo, Nov 29 2004 06:31 PM
32 replies to this topic
#16
Posted 13 August 2007 - 09:07 PM
Hi there,
Further to new's problem and corrupt's answer, I was wondering whether the answer had worked for new.
I have the same problem and would like to know whether it is worth trying out.
#17
Posted 20 May 2008 - 01:32 PM
| - PinkBears
I was looking for a tool to split bigger PDF documents into smaller chunks, e.g. 50 pages each, and this might help. Great, thanks for the link.
(For those interested: some applications don't accept larger .pdf documents as input, so the solution is to make smaller chunks and convert each one separately, working around the tool's size limitation.)
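A minimal Python sketch of the 50-page split, assuming pdftk is on the PATH. It uses pdftk's `cat` operation (`pdftk in.pdf cat 1-50 output out.pdf`) once per chunk; the page count can be read from the `NumberOfPages:` line that `pdftk in.pdf dump_data` prints. The output naming scheme (`_part001.pdf` etc.) and the function names are hypothetical.

```python
import re
from typing import List, Tuple


def chunk_ranges(total_pages: int, chunk_size: int = 50) -> List[Tuple[int, int]]:
    """Split pages 1..total_pages into inclusive (first, last) ranges."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (first, min(first + chunk_size - 1, total_pages))
        for first in range(1, total_pages + 1, chunk_size)
    ]


def page_count_from_dump(dump_data_output: str) -> int:
    """Read the page count from the text printed by `pdftk file.pdf dump_data`."""
    match = re.search(r"^NumberOfPages:\s*(\d+)", dump_data_output, re.MULTILINE)
    if match is None:
        raise ValueError("no NumberOfPages line found")
    return int(match.group(1))


def pdftk_split_commands(pdf: str, total_pages: int, chunk_size: int = 50) -> List[List[str]]:
    """Build one `pdftk <in> cat <range> output <chunk>` command per chunk."""
    stem = pdf[:-4] if pdf.lower().endswith(".pdf") else pdf
    commands = []
    for i, (first, last) in enumerate(chunk_ranges(total_pages, chunk_size), start=1):
        page_range = str(first) if first == last else f"{first}-{last}"
        out_name = f"{stem}_part{i:03d}.pdf"  # hypothetical naming scheme
        commands.append(["pdftk", pdf, "cat", page_range, "output", out_name])
    return commands
```

Each command list can then be passed to subprocess.run one after another, so a 120-page file becomes three runs covering pages 1-50, 51-100 and 101-120.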
#19
Posted 09 July 2008 - 12:11 PM
pdf to html
In pdftk, is there an option to convert PDF to HTML?
The pdftotext I found in the xpdf lib works OK.
If not, is there another command-line tool that can do it?
#20
Posted 13 April 2009 - 08:11 AM
#21
Posted 13 April 2009 - 09:00 AM
Hi HugoV,
I tried to run pdftohtml but got an error:
Page-1
'gswin32c' is not recognized as an internal or external command,
operable program or batch file.
Error: Failed to launch Ghostscript!
It seems pdftohtml.exe is not enough on its own, or some other software is missing.
Do you know of another command-line pdftohtml?
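The error above says Windows cannot find `gswin32c`, the Ghostscript console program that this pdftohtml tries to launch. A small Python sketch (a hypothetical helper, not part of any of these tools) that checks whether the needed executables are on the PATH before you start converting:

```python
import shutil
from typing import Iterable, List


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the executables from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Hypothetical usage before a batch run:
# missing = missing_tools(["pdftohtml", "gswin32c"])
# if missing: print("Install or add to PATH:", missing)
```

If `gswin32c` shows up as missing, installing Ghostscript and adding its bin directory to the PATH is the usual fix.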
#22
Posted 13 April 2009 - 04:04 PM
Ghostscript
http://pages.cs.wisc.edu/~ghost/
(I've used it and it works, but you may have to work at it.)
If I recall correctly, a "pdftohtml" is also included in the Google Desktop Search application (at least it was at some point; I don't know if this is still the case, as I don't use it). If you have it, look for pdf*.exe in the Google Desktop directories; it should be there somewhere.
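A minimal Python sketch for hunting down those pdf*.exe files. The default root `C:\Program Files\Google` is an assumption based on a typical Google Desktop install; pass a different folder if yours lives elsewhere. The function name is hypothetical.

```python
from pathlib import Path
from typing import List

# Assumed install location; adjust if Google Desktop lives elsewhere.
DEFAULT_GOOGLE_DIR = r"C:\Program Files\Google"


def find_pdf_tools(root: str = DEFAULT_GOOGLE_DIR) -> List[Path]:
    """Recursively list files named pdf*.exe below `root`."""
    base = Path(root)
    if not base.is_dir():
        return []  # nothing installed there
    return sorted(p for p in base.rglob("pdf*.exe") if p.is_file())
```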
Note: if you want to work with PDFs:
- get pdftk
- get xpdf
- get pdftohtml
- get Ghostscript
- get PDFCreator
#23
Posted 13 April 2009 - 07:48 PM
PDF creator
Does PDFCreator convert HTML to PDF as tagged or untagged PDF,
or does it convert it to PDF as an image?
That is, can I extract text from the created PDF?
#24
Posted 17 April 2009 - 04:04 AM
PDFCreator converts anything you print to PDF. Yes, you can extract text later IF the source wasn't an image to begin with. I'm not sure what you mean
by tagged, but it won't make URLs in Word documents clickable in the PDF,
nor does it create PDF bookmarks or anything like that.
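One quick way to check whether a PDF has extractable text (rather than just a page image) is to run xpdf's pdftotext on it and see whether anything comes out. A minimal Python sketch, assuming pdftotext is on the PATH; passing `-` as the output file makes pdftotext write the text to stdout. The helper name is hypothetical.

```python
import subprocess
from typing import Callable


def has_extractable_text(
    pdf: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True if pdftotext finds any non-whitespace text in `pdf`.

    A PDF printed from a scanned image usually yields no text at all.
    """
    # "-" as the output file sends the extracted text to stdout.
    result = run(["pdftotext", pdf, "-"], capture_output=True, text=True, check=True)
    # strip() also removes the form-feed characters pdftotext puts between pages.
    return bool(result.stdout.strip())
```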
#25
Posted 17 April 2009 - 06:31 AM
Sorry for asking again,
but I still haven't found the pdf2html converter
from Google. Is there some Google Docs API
I can download so I can convert my own PDFs to HTML?
#26
Posted 17 April 2009 - 12:46 PM
If you have Google desktop installed:
c:\Program Files\Google\Google Desktop Search\pdftotext.exe
(or wherever you have installed GDS)
usage:
pdftotext -htmlmeta sample.pdf
--> will generate sample.html
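If you want to script that call, here is a minimal Python sketch that builds the `pdftotext -htmlmeta` command and the file name it should produce. Per the usage above, no output file is given, so sample.pdf becomes sample.html next to the input. The `exe` parameter lets you point at the full Google Desktop path; the function name is hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def pdftotext_html_command(pdf: str, exe: str = "pdftotext") -> Tuple[List[str], Path]:
    """Build a `pdftotext -htmlmeta` call and the HTML file it is expected to write."""
    source = Path(pdf)
    return [exe, "-htmlmeta", str(source)], source.with_suffix(".html")
```

Usage would be something like `cmd, out = pdftotext_html_command("sample.pdf")`, then `subprocess.run(cmd, check=True)` and open `out`.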
Follow the link at the URL I gave you before; it leads to:
http://sourceforge.n...ects/pdftohtml/
Download the Windows binary and unpack the tar.gz file.
usage:
pdftohtml.exe sample.pdf
--> will generate 3 HTML files (frameset, TOC and content)
Read the documentation for more options.
#27
Posted 17 April 2009 - 01:02 PM
I tried (from the xpdf lib):
pdftotext -htmlmeta sample.pdf -> sample.html
and got the same result as:
pdftotext -layout sample.pdf -> sample.txt
That is:
the HTML doesn't have (more or less) the same 'look' as the original sample.pdf;
no frames or colors.
#28
Posted 17 April 2009 - 01:45 PM
I also tried pdftohtml.exe
and got worse results:
the extracted text is shown line after line, without keeping any of the original layout of the .pdf.
#29
Posted 17 April 2009 - 01:53 PM
What do you want, the HTML to look the same as the PDF?
Again, read the documentation, look at the options, try them, and
you will SEE that you can make the HTML look like the PDF, or not,
if you wish. Use the -c option:
pdftohtml -c sample.pdf
-> sample.html will look like sample.pdf (not 100%, but pretty close)
unless you have a very complicated PDF. Again, READ the documentation.
As you can see, even Google uses it, so why isn't it good enough for you? :wink:
If you need even better results or more options, you will have to buy something.
Sourceforge version:
pdftohtml version 0.39 http://pdftohtml.sourceforge.net/,
based on Xpdf version 3.00
Copyright 1999-2003 Gueorgui Ovtcharov and Rainer Dorsch
Copyright 1996-2004 Glyph & Cog, LLC
Usage: pdftohtml [options] [ ]
-f : first page to convert
-l : last page to convert
-q : don't print any messages or errors
-h : print usage information
-help : print usage information
-p : exchange .pdf links by .html
-c : generate complex document
-i : ignore images
-noframes : generate no frames
-stdout : use standard output
-zoom : zoom the pdf document (default 1.5)
-xml : output for XML post-processing
-hidden : output hidden text
-nomerge : do not merge paragraphs
-enc : output text encoding name
-dev : output device name for Ghostscript (png16m, jpeg etc)
-v : print copyright and version info
-opw : owner password (for encrypted files)
-upw : user password (for encrypted files)
GOOGLE version:
pdftohtml version 0.39 http://pdftohtml.sourceforge.net/,
based on Xpdf version 3.00
Copyright 1999-2003 Gueorgui Ovtcharov and Rainer Dorsch
Copyright 1996-2004 Glyph & Cog, LLC
Usage: pdftohtml [options] [ ]
-f : first page to convert
-l : last page to convert
-q : don't print any messages or errors
-h : print usage information
-help : print usage information
-p : exchange .pdf links by .html
-c : generate complex document
-i : ignore images
-noframes : generate no frames
-stdout : use standard output
-zoom : zoom the pdf document (default 1.5)
-xml : output for XML post-processing
-hidden : output hidden text
-nomerge : do not merge paragraphs
-enc : output text encoding name
-dev : output device name for Ghostscript (png16m, jpeg etc)
-v : print copyright and version info
-opw : owner password (for encrypted files)
-upw : user password (for encrypted files)
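To script this, a minimal Python sketch that assembles a pdftohtml command line from a few of the options listed above (-c, -f, -l, -noframes, -zoom, -i, -q). Only flags from this help text are used; the function and parameter names are hypothetical, and -c is on by default because that is the option recommended above for keeping the PDF's look.

```python
from typing import List, Optional


def pdftohtml_command(
    pdf: str,
    complex_output: bool = True,
    first_page: Optional[int] = None,
    last_page: Optional[int] = None,
    no_frames: bool = False,
    zoom: Optional[float] = None,
    ignore_images: bool = False,
    quiet: bool = False,
) -> List[str]:
    """Assemble a pdftohtml 0.39 command line from the listed options."""
    cmd = ["pdftohtml"]
    if complex_output:
        cmd.append("-c")                   # generate complex document
    if first_page is not None:
        cmd += ["-f", str(first_page)]     # first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]      # last page to convert
    if no_frames:
        cmd.append("-noframes")            # generate no frames
    if zoom is not None:
        cmd += ["-zoom", str(zoom)]        # help text says default 1.5
    if ignore_images:
        cmd.append("-i")                   # ignore images
    if quiet:
        cmd.append("-q")                   # don't print messages or errors
    cmd.append(pdf)
    return cmd
```

With the defaults, `pdftohtml_command("sample.pdf")` gives the same call as the `pdftohtml -c sample.pdf` example above; pass the list to subprocess.run to execute it.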
#30
Posted 17 April 2009 - 03:03 PM