Working with PDF files in Python
00 Install the library
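pip install "PyPDF2<3"
Note: the examples below use the legacy PyPDF2 names (PdfFileReader, PdfFileWriter, getPage, numPages and so on). These were removed in PyPDF2 3.0, so an older release is needed to run the examples unchanged.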
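01 Extract text
import PyPDF2
# Raw strings keep the backslashes in Windows paths from being read as escapes
pdfobj1=open(r'D:\PDF_Samples\數學之美2.pdf','rb')
pdfobj2=open(r'D:\PDF_Samples\p19.pdf','rb')
pdffile1=PyPDF2.PdfFileReader(pdfobj1)
pdffile2=PyPDF2.PdfFileReader(pdfobj2)
print(pdffile1.numPages,pdffile2.numPages)
345 19 #total number of pages in each of the two PDF files
pdffile1.getPage(5).extractText()
Out[12]: '' #text of the requested page; Chinese text is currently not extracted reliably
pdffile2.getPage(5).extractText() #the text of this PDF page is extracted successfully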
Out[13]: "National Hydro Network, Data Model \n\n \nEdition 1.0\n \n2004\n-\n06\n \nGeoBase?\n \n6\n \nMandatory\n \nOptional\n \nQuantity of phenomenon\n \nNumber of characteristics\n \n \n \nOptional\n \n1\n \nOverview\n \nThe data model can (and must) extend beyond the smallest common denominator obtained with the \npartners. The model must therefore
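Because extractText() can silently return an empty string, as it does for the Chinese-language file above, it can help to check which pages yield no text before relying on the result. A minimal sketch, assuming the same legacy PyPDF2 API; the helper name pages_without_text is made up for illustration:

```python
def pages_without_text(path):
    """Return the 0-based numbers of pages whose extracted text is empty."""
    # Imported inside the function so that defining the helper does not by
    # itself require PyPDF2; the legacy (pre-3.0) API names are assumed.
    import PyPDF2

    with open(path, 'rb') as f:
        reader = PyPDF2.PdfFileReader(f)
        return [i for i in range(reader.numPages)
                if not reader.getPage(i).extractText().strip()]
```

For a scanned or otherwise unextractable document, the returned list would contain most or all of the page numbers.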

02 Rotate a page
Use Python to rotate the first page of p1.pdf by 90 degrees clockwise and save it as pp11.pdf:
import PyPDF2
pdfobj=open(r'D:\PDF_Samples\p1.pdf','rb')
pdffile=PyPDF2.PdfFileReader(pdfobj)
pdfpage=pdffile.getPage(0)
pdfpage.rotateClockwise(90) #rotate by 90 degrees
pdffile2=PyPDF2.PdfFileWriter()
pdffile2.addPage(pdfpage)
pdfobj2=open(r'D:\PDF_Samples\pp11.pdf','wb')
pdffile2.write(pdfobj2)
pdfobj.close() #close the input file (the original closed an undefined pdfobj1)
pdfobj2.close()

03 Merge PDF files
Combine two PDF files into one; here jpeg.pdf and p19.pdf are merged into glue.pdf:
import PyPDF2
pdfobj1=open(r'D:\PDF_Samples\jpeg.pdf','rb')
pdfobj2=open(r'D:\PDF_Samples\p19.pdf','rb')
pdffile1=PyPDF2.PdfFileReader(pdfobj1)
pdffile2=PyPDF2.PdfFileReader(pdfobj2)
pdffile3=PyPDF2.PdfFileWriter()
#copy every page of the first file, then every page of the second
for i in range(pdffile1.numPages):
    pdfpage=pdffile1.getPage(i)
    pdffile3.addPage(pdfpage)
for j in range(pdffile2.numPages):
    pdfpage=pdffile2.getPage(j)
    pdffile3.addPage(pdfpage)
pdfobj3=open(r'D:\PDF_Samples\glue.pdf','wb')
pdffile3.write(pdfobj3)
pdfobj1.close()
pdfobj2.close()
pdfobj3.close()
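The page-by-page loops above can also be replaced by the PdfFileMerger class from the same legacy PyPDF2 API, which appends whole files. This is a different technique from the one shown above, sketched here as an alternative; the helper name merge_pdfs is made up for illustration:

```python
def merge_pdfs(src_paths, dst_path):
    """Concatenate the PDFs in src_paths, in order, into dst_path."""
    # Imported inside the function so that defining the helper does not by
    # itself require PyPDF2; the legacy (pre-3.0) API names are assumed.
    import PyPDF2

    merger = PyPDF2.PdfFileMerger()
    try:
        for path in src_paths:
            merger.append(path)      # appends every page of the file
        with open(dst_path, 'wb') as dst:
            merger.write(dst)
    finally:
        merger.close()               # releases the input files
```

With the files from this section, the call would be merge_pdfs([r'D:\PDF_Samples\jpeg.pdf', r'D:\PDF_Samples\p19.pdf'], r'D:\PDF_Samples\glue.pdf').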
04 Overlay pages
Overlay the content of one page, as a watermark, on the first page of another file:
import PyPDF2
pdfobj1=open(r'D:\PDF_Samples\p6.pdf','rb')
pdffile1=PyPDF2.PdfFileReader(pdfobj1)
pdfpage1=pdffile1.getPage(0)
pdfobj2=open(r'D:\PDF_Samples\watermark.pdf','rb')
pdffile2=PyPDF2.PdfFileReader(pdfobj2)
pdfpage2=pdffile2.getPage(0)
pdfpage1.mergePage(pdfpage2) #overlay the watermark page on the first page
pdffile3=PyPDF2.PdfFileWriter()
pdffile3.addPage(pdfpage1)
#copy the remaining pages unchanged
for i in range(1,pdffile1.numPages):
    pdffile3.addPage(pdffile1.getPage(i))
pdfobj3=open(r'D:\PDF_Samples\merge.pdf','wb')
pdffile3.write(pdfobj3)
pdfobj1.close()
pdfobj2.close()
pdfobj3.close()

05 Encrypt a PDF file
Call encrypt() on the writer before writing the file; here the password is leslie.
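A minimal sketch of this step, assuming the same legacy PyPDF2 API as the other sections (PdfFileWriter.encrypt is the call used in section 07). The helper name encrypt_pdf is made up for illustration:

```python
def encrypt_pdf(src_path, dst_path, password):
    """Copy every page of src_path into dst_path, protected by password."""
    # Imported inside the function so that defining the helper does not by
    # itself require PyPDF2; the legacy (pre-3.0) API names are assumed.
    import PyPDF2

    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        reader = PyPDF2.PdfFileReader(src)
        writer = PyPDF2.PdfFileWriter()
        for i in range(reader.numPages):
            writer.addPage(reader.getPage(i))
        writer.encrypt(password)     # set the password before writing
        writer.write(dst)
```

A call such as encrypt_pdf(r'D:\PDF_Samples\p6.pdf', r'D:\PDF_Samples\leslie.pdf', 'leslie') would produce the leslie.pdf file opened in section 06; the choice of p6.pdf as input is an assumption, since the source file is not named in the text.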


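06 Decrypt a PDF file
Opening an encrypted PDF file the usual way:
import PyPDF2
pdfobj=open(r'D:\PDF_Samples\leslie.pdf','rb')
pdffile1=PyPDF2.PdfFileReader(pdfobj)
pdffile1.getPage(0)
This raises an error. To decrypt the file, call decrypt() with the password first:
import PyPDF2
pdfobj=open(r'D:\PDF_Samples\leslie.pdf','rb')
pdffile1=PyPDF2.PdfFileReader(pdfobj)
pdffile1.decrypt('leslie') #supply the password
Out[35]: 1 #a return value of 1 means the password is correct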
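The return value of decrypt() is easy to overlook in a script, since a wrong password yields 0 rather than an exception. A minimal sketch that checks it, assuming the same legacy PyPDF2 API (its isEncrypted property and decrypt() method); the helper name open_decrypted is made up for illustration:

```python
def open_decrypted(fileobj, password):
    """Return a reader for fileobj, decrypting it first if necessary."""
    # Imported inside the function so that defining the helper does not by
    # itself require PyPDF2; the legacy (pre-3.0) API names are assumed.
    import PyPDF2

    reader = PyPDF2.PdfFileReader(fileobj)
    if reader.isEncrypted:
        # decrypt() returns 0 when the password does not match
        if reader.decrypt(password) == 0:
            raise ValueError('wrong password')
    return reader
```

The file object passed in must stay open for as long as the returned reader is used.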
07 Batch encryption
import PyPDF2,os
#collect the names of all PDF files in the folder
pdffiles=[]
for filename in os.listdir(r'D:\PDF_Samples'):
    if filename.endswith('.pdf'):
        pdffiles.append(filename)
os.chdir(r'D:\PDF_Samples')
#write an encrypted copy of each file
for pdfname in pdffiles:
    pdfobj=open(pdfname,'rb')
    pdffile1=PyPDF2.PdfFileReader(pdfobj)
    pdffile2=PyPDF2.PdfFileWriter()
    for i in range(pdffile1.numPages):
        page=pdffile1.getPage(i)
        pdffile2.addPage(page)
    pdffile2.encrypt('leslie')
    pdfobj2=open(pdfname+'_encrypted.pdf','wb')
    pdffile2.write(pdfobj2)
    pdfobj2.close()
    pdfobj.close()
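The loop above builds each output name by appending to the full file name, so p19.pdf becomes p19.pdf_encrypted.pdf, and a second run would pick up the already encrypted outputs as inputs. A small sketch of one way to avoid both, using only the standard library; the helper names and the _encrypted suffix convention are assumptions for illustration:

```python
from pathlib import Path

def encrypted_name(pdf_path, suffix='_encrypted'):
    """Insert suffix before the extension: p19.pdf -> p19_encrypted.pdf."""
    p = Path(pdf_path)
    return p.with_name(p.stem + suffix + p.suffix)

def pending_pdfs(names, suffix='_encrypted'):
    """Keep PDF names that are not themselves outputs of an earlier run."""
    return [n for n in names
            if n.lower().endswith('.pdf') and not Path(n).stem.endswith(suffix)]
```

In the batch loop, pending_pdfs(os.listdir(...)) could replace the endswith filter and encrypted_name(pdfname) could replace pdfname+'_encrypted.pdf'.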
