
Building a Permanently Free PDF Editing Tool in Python

Preface:


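PDF (Portable Document Format) is a file format we run into all the time; papers, documents, and much else are distributed as PDF. Its strength is a stable layout: the original colors and formatting are preserved as faithfully as possible when a file is printed, shared, or transferred.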


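When it comes to editability, however, it creates a different headache for users.

Once, to replace a single page in a PDF, I tried nearly every mainstream PDF tool on the market, and in the end still had to fall back on a paid one.

Thinking about it afterwards: since this commercial software is so unreliable, why not build a tool myself? When a few dozen lines of code can solve the problem, why go to the trouble of downloading and installing that unscrupulous software?

This article shows how to build a PDF editing tool in Python that handles PDF-to-TXT conversion, splitting, merging, cropping, and format conversion.

Introducing the stars of the show: PyPDF2 and pdfminer3k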

PyPDF2

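Overview: a PDF toolkit written in pure Python. It can:

Extract document information (title, author, and so on)
Split documents page by page
Merge documents page by page
Crop pages
Merge multiple pages into a single page
Encrypt and decrypt PDF files

Installation

Install it with pip. The examples below use the legacy names (PdfFileReader, PdfFileWriter, getPage, addPage), which were removed in PyPDF2 3.0, so pin an earlier release:

  pip install "PyPDF2<3.0"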

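Code walkthrough

Simple PDF read and write operations

  from PyPDF2 import PdfFileReader, PdfFileWriter

  infn = 'infn.pdf'
  outfn = 'outfn.pdf'

  # Open the input in binary mode; it must stay open while pages are read
  with open(infn, 'rb') as fin:
      # Create a PdfFileReader object
      pdf_input = PdfFileReader(fin)

      # Basic information about the PDF (title, author, and so on)
      information = pdf_input.getDocumentInfo()
      print(information)

      # Number of pages in the PDF
      page_count = pdf_input.getNumPages()
      print(page_count)

      # Get a PageObject by zero-based index (here, the first page)
      page = pdf_input.getPage(0)

      # Create a PdfFileWriter object
      pdf_output = PdfFileWriter()

      # Add the PageObject to the PdfFileWriter
      pdf_output.addPage(page)

      # Write the result to a file
      with open(outfn, 'wb') as fout:
          pdf_output.write(fout)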
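The same read and write calls are enough to split a document, one of the features listed in the preface. What follows is a minimal sketch rather than a definitive implementation. The helper names `chunk_ranges` and `split_pdf`, the `prefix` parameter, and the output file naming are hypothetical and not part of PyPDF2. The library calls assume the legacy PyPDF2 API (a release below 3.0).

```python
def chunk_ranges(page_count, chunk_size):
    """Return zero-based, half-open (start, stop) ranges of at most chunk_size pages."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [(start, min(start + chunk_size, page_count))
            for start in range(0, page_count, chunk_size)]


def split_pdf(infn, chunk_size=1, prefix="part"):
    """Write each chunk of infn to its own file and return the file names written."""
    # Assumption: legacy PyPDF2 (< 3.0). Imported here so that chunk_ranges
    # can be used and tested without the library installed.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    written = []
    with open(infn, "rb") as fin:
        reader = PdfFileReader(fin)
        ranges = chunk_ranges(reader.getNumPages(), chunk_size)
        for number, (start, stop) in enumerate(ranges, 1):
            writer = PdfFileWriter()
            for i in range(start, stop):
                writer.addPage(reader.getPage(i))
            outfn = "%s-%03d.pdf" % (prefix, number)  # hypothetical naming scheme
            with open(outfn, "wb") as fout:
                writer.write(fout)
            written.append(outfn)
    return written
```

Keeping the range arithmetic separate from the file handling makes the page boundaries easy to check before any file is written.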

Deleting PDF pages

  from PyPDF2 import PdfFileWriter, PdfFileReader

  # Create the output PDF writer
  output = PdfFileWriter()

  # Read the source PDF
  input1 = PdfFileReader(open("example.pdf", "rb"))

  # Copy every page except those listed in index (1-based page numbers)
  def delete_pdf(index):
      pages = input1.getNumPages()
      for i in range(pages):
          # i is zero-based, so the page number is i + 1
          if i + 1 in index:
              continue
          output.addPage(input1.getPage(i))

      with open("PyPDF2-output.pdf", "wb") as outputStream:
          output.write(outputStream)

  delete_pdf([2, 3, 4])
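The deletion above silently ignores page numbers that do not exist in the document. As a hedged variation, the sketch below validates the list first and keeps the index arithmetic in a small function of its own. `pages_to_keep` and `delete_pages_from` are hypothetical helper names, and the library calls again assume the legacy PyPDF2 API (below 3.0).

```python
def pages_to_keep(page_count, delete_pages):
    """Return zero-based indices of the pages left after deleting 1-based delete_pages."""
    doomed = set(delete_pages)
    out_of_range = sorted(p for p in doomed if not 1 <= p <= page_count)
    if out_of_range:
        raise ValueError("page numbers out of range: %s" % out_of_range)
    return [i for i in range(page_count) if i + 1 not in doomed]


def delete_pages_from(infn, outfn, delete_pages):
    """Copy infn to outfn without the listed 1-based pages."""
    # Assumption: legacy PyPDF2 (< 3.0), imported lazily so that
    # pages_to_keep works without the library installed.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    with open(infn, "rb") as fin:
        reader = PdfFileReader(fin)
        writer = PdfFileWriter()
        for i in pages_to_keep(reader.getNumPages(), delete_pages):
            writer.addPage(reader.getPage(i))
        with open(outfn, "wb") as fout:
            writer.write(fout)
```

For a five-page document, `pages_to_keep(5, [2, 3, 4])` leaves only the first and last pages.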

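Merging PDFs

  from PyPDF2 import PdfFileWriter, PdfFileReader

  output = PdfFileWriter()
  input1 = PdfFileReader(open("example.pdf", "rb"))
  # 1: the document that supplies the pages to insert
  input2 = PdfFileReader(open("simple2.pdf", "rb"))

  # add_index: 1-based page numbers in input1; a page is inserted before each
  # origin_index: zero-based page indices in input2, consumed in order
  def merge_pdf(add_index, origin_index):
      pages = input1.getNumPages()
      k = 0
      for i in range(pages):
          if i + 1 in add_index:
              # 2: insert the next page from input2 ahead of the current page
              output.addPage(input2.getPage(origin_index[k]))
              k += 1
          # Always keep the original page
          output.addPage(input1.getPage(i))

      with open("PyPDF2-output.pdf", "wb") as outputStream:
          output.write(outputStream)

  merge_pdf([2, 3, 4], [0, 0, 0])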
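The interleaving rule used by the merge can be checked without touching any file by first computing the output page order. The sketch below is one possible way to do that. `merge_plan` and `apply_plan` are hypothetical helpers, and `apply_plan` assumes reader and writer objects that expose the legacy PyPDF2 methods `getPage` and `addPage`. Unlike the loop above, it pairs each insertion position with its source page explicitly, so the result does not depend on the order in which the positions are listed.

```python
def merge_plan(page_count, add_index, origin_index):
    """Return the output order as ('insert', j) and ('original', i) tuples.

    add_index: 1-based page numbers of the main document; a page is inserted before each.
    origin_index: zero-based page indices of the second document, paired with add_index.
    """
    if len(add_index) != len(origin_index):
        raise ValueError("add_index and origin_index must have the same length")
    sources = dict(zip(add_index, origin_index))
    plan = []
    for i in range(page_count):
        if i + 1 in sources:
            plan.append(("insert", sources[i + 1]))
        plan.append(("original", i))
    return plan


def apply_plan(plan, main_reader, extra_reader, writer):
    """Copy pages into writer in plan order (objects follow the legacy PyPDF2 API)."""
    for kind, index in plan:
        source = extra_reader if kind == "insert" else main_reader
        writer.addPage(source.getPage(index))
```

With the readers and writer from the example above, `apply_plan(merge_plan(input1.getNumPages(), [2, 3, 4], [0, 0, 0]), input1, input2, output)` would add the same pages in the same order as `merge_pdf([2, 3, 4], [0, 0, 0])`.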

Rotation

  # Rotate the second page (index 1) 90 degrees clockwise
  input1.getPage(1).rotateClockwise(90)

Adding a watermark

  # Overlay the first page of watermark.pdf onto the fourth page (index 3)
  page = input1.getPage(3)
  watermark = PdfFileReader(open("watermark.pdf", "rb"))
  page.mergePage(watermark.getPage(0))

Encryption

  # Set the password on the writer before calling output.write()
  password = "secret"
  output.encrypt(password)

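Decryption

  # Decryption happens on the reader; "encrypted.pdf" is a placeholder file name
  reader = PdfFileReader(open("encrypted.pdf", "rb"))
  # Prints 1 if 'secret' is the correct user password, 0 if it is wrong
  print(reader.decrypt('secret'))
  # Pages can be read correctly only after decrypting
  page_obj = reader.getPage(0)
  print(page_obj.extractText())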

pdfminer3k

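Overview

pdfminer3k is a Python 3 port of pdfminer. PDFMiner is a tool for extracting information from PDF documents. Unlike other PDF-related tools, it focuses entirely on getting and analyzing text data. PDFMiner can report the exact position of text on a page, along with other information such as fonts or lines. It includes a PDF converter that turns PDF files into other text formats (such as HTML), and an extensible PDF parser that can be used for purposes other than text analysis.

- Obtains the exact position and layout information of text;
- Converts PDF to formats such as HTML and XML;
- Extracts the table of contents;
- Extracts tagged content;
- Supports various font types (Type1, TrueType, Type3, and CID);
- Supports Chinese, Japanese, and Korean text and vertical writing.

Installation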

  
 
   
  pip install pdfminer3k

Working with documents

  import logging
  from urllib.request import urlopen

  from pdfminer.converter import PDFPageAggregator
  from pdfminer.layout import LAParams
  from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
  from pdfminer.pdfparser import PDFParser, PDFDocument

  # Silence log output below the ERROR level
  logging.Logger.propagate = False
  logging.getLogger().setLevel(logging.ERROR)

  fp = open('template/pdftest.pdf', 'rb')
  # For an online file:
  # fp = urlopen('http://---/---.pdf')

  # Create a parser associated with the file
  parser = PDFParser(fp)

  # Create the PDF document object, which stores the document structure
  doc = PDFDocument()

  # Link the parser and the document object
  parser.set_document(doc)
  doc.set_parser(parser)

  # Initialize the document; pass the password here, or "" if there is none
  doc.initialize("")

  # Create the PDF resource manager
  resource = PDFResourceManager()

  # Layout analysis parameters
  laparam = LAParams()

  # Aggregator device that collects the layout of each page
  device = PDFPageAggregator(resource, laparams=laparam)

  # Create the page interpreter
  interpreter = PDFPageInterpreter(resource, device)

  # Read the content page by page through the document object
  for page in doc.get_pages():
      # Process the page with the interpreter
      interpreter.process_page(page)

      # Fetch the layout from the aggregator
      layout = device.get_result()

      for text_obj in layout:
          # Only objects with a get_text method carry text
          if hasattr(text_obj, 'get_text'):
              print(text_obj.get_text())

  
 
   
  # Continues from the setup above (doc, interpreter, device)
  from pdfminer.layout import LTTextBox, LTImage, LTFigure

  # Process every page in the document
  for page in doc.get_pages():
      interpreter.process_page(page)
      layout = device.get_result()
      for x in layout:
          # Text object
          if isinstance(x, LTTextBox):
              print(x.get_text().strip())
          # Image object
          if isinstance(x, LTImage):
              print('Found an image')
          # Figure object
          if isinstance(x, LTFigure):
              print('Found a figure object')
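Both loops above look only at the top-level objects of each page, so text nested inside a container such as a figure is reported as a figure and never printed. A minimal sketch of a recursive walk follows. `collect_text` is a hypothetical helper, not part of pdfminer3k. It relies only on duck typing: text objects expose `get_text()`, and containers are iterable over their children.

```python
def collect_text(obj):
    """Return the non-empty text fragments of a layout object and its children, in order."""
    # Text boxes, lines, and characters expose get_text(); take the text and stop descending.
    if hasattr(obj, "get_text"):
        text = obj.get_text().strip()
        return [text] if text else []

    # Containers such as pages and figures are iterable over their child objects.
    try:
        children = iter(obj)
    except TypeError:
        return []  # a leaf without text, for example an image or a line

    fragments = []
    for child in children:
        fragments.extend(collect_text(child))
    return fragments


# Usage with the objects set up earlier (doc, interpreter, device):
#
#   for page in doc.get_pages():
#       interpreter.process_page(page)
#       print("\n".join(collect_text(device.get_result())))
```

Depending on the layout parameters, text inside a figure may come back as single characters rather than whole lines, so the fragments may need joining before use.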

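For more detail, see the project page: https://github.com/canserhat77/pdfminer3k

Summary

With these two Python libraries you can edit a PDF at every level, from pages to text and metadata. This article only introduces the basic usage of each feature. For detailed usage and the full function list, read the official documentation or the project source code on GitHub.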
