{"id":3699,"date":"2023-10-08T12:22:52","date_gmt":"2023-10-08T12:22:52","guid":{"rendered":"https:\/\/www.copahost.com\/blog\/?p=3699"},"modified":"2023-12-02T15:13:21","modified_gmt":"2023-12-02T15:13:21","slug":"python-extpdf","status":"publish","type":"post","link":"https:\/\/www.copahost.com\/blog\/python-extpdf\/","title":{"rendered":"Python ext:pdf \u2013 PDF extensions in Python"},"content":{"rendered":"\n<p>The PDF extension libraries for Python (ext:pdf) let you work with PDF files. They allow you not only to read and write PDF files, but also to manipulate their contents by adding, removing and changing pages, form fields and metadata. Some of them can also convert PDF files into other formats, such as images and text.<\/p>\n\n\n\n<p>PDF is one of the most popular file formats for documents, both for personal and professional use. As developers, we often need to work with these files in our applications. Fortunately, Python has several powerful and easy-to-use libraries for dealing with PDF files.<\/p>\n\n\n\n<p>In this article, we will explore how to use these libraries to work with PDF files in Python. Let&#8217;s see how to install and import them, how to create and manipulate PDF files, and how to use some of their more advanced features.<\/p>\n\n\n\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_69_1 ez-toc-wrap-center counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.copahost.com\/blog\/python-extpdf\/#Popular_PDF_extension_libraries_in_Python\" title=\"Popular PDF extension libraries in Python\">Popular PDF 
extension libraries in Python<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.copahost.com\/blog\/python-extpdf\/#Features_and_Benefits_of_PDF_Extensions_in_Python\" title=\"Features and Benefits of PDF Extensions in Python\">Features and Benefits of PDF Extensions in Python<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.copahost.com\/blog\/python-extpdf\/#1_Create_PDF_files\" title=\"1. Create PDF files\">1. Create PDF files<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.copahost.com\/blog\/python-extpdf\/#2_Reading_PDF_files\" title=\"2. Reading PDF files\">2. Reading PDF files<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.copahost.com\/blog\/python-extpdf\/#3_Editing_PDF_files\" title=\"3. Editing PDF files\">3. Editing PDF files<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.copahost.com\/blog\/python-extpdf\/#4_PDF_file_conversion\" title=\"4. PDF file conversion\">4. PDF file conversion<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.copahost.com\/blog\/python-extpdf\/#5_Integration_with_other_technologies\" title=\"5. Integration with other technologies\">5. 
Integration with other technologies<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/www.copahost.com\/blog\/python-extpdf\/#How_to_create_a_PDF_file_from_a_data_model_using_PDF_extension_libraries_in_Python\" title=\"How to create a PDF file from a data model using PDF extension libraries in Python\">How to create a PDF file from a data model using PDF extension libraries in Python<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/www.copahost.com\/blog\/python-extpdf\/#Adding_metadata_to_a_PDF_file_in_Python\" title=\"Adding metadata to a PDF file in Python\">Adding metadata to a PDF file in Python<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/www.copahost.com\/blog\/python-extpdf\/#Adding_metadata_to_a_PDF_file_in_Python-2\" title=\"Adding metadata to a PDF file in Python\">Adding metadata to a PDF file in Python<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/www.copahost.com\/blog\/python-extpdf\/#Reading_information_about_PDF_file_in_python\" title=\"Reading information about PDF file in python\">Reading information about PDF file in python<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-12\" href=\"https:\/\/www.copahost.com\/blog\/python-extpdf\/#Validating_PDF_data_in_Python\" title=\"Validating PDF data in Python\">Validating PDF data in Python<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-13\" href=\"https:\/\/www.copahost.com\/blog\/python-extpdf\/#Examples_of_how_to_convert_PDF_files_in_Python\" title=\"Examples of how to convert PDF files in Python\">Examples of how to convert PDF files in Python<\/a><ul class='ez-toc-list-level-3' ><li 
class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-14\" href=\"https:\/\/www.copahost.com\/blog\/python-extpdf\/#1_Converting_PDF_to_image\" title=\"1. Converting PDF to image\">1. Converting PDF to image<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-15\" href=\"https:\/\/www.copahost.com\/blog\/python-extpdf\/#2_Converting_PDF_to_HTML\" title=\"2. Converting PDF to HTML\">2. Converting PDF to HTML<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-16\" href=\"https:\/\/www.copahost.com\/blog\/python-extpdf\/#3_Converting_PDF_to_Excel\" title=\"3. Converting PDF to Excel\">3. Converting PDF to Excel<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-17\" href=\"https:\/\/www.copahost.com\/blog\/python-extpdf\/#Working_with_attachments_in_a_PDF_file_in_Python\" title=\"Working with attachments in a PDF file in Python\">Working with attachments in a PDF file in Python<\/a><\/li><\/ul><\/nav><\/div>\n\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Popular_PDF_extension_libraries_in_Python\"><\/span>Popular PDF extension libraries in Python<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>There are several popular PDF libraries and tools for Python, each with its own functionality and use cases. Here are some of the most popular ones; other libraries appear later in this article:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>PyPDF2<\/strong>: A lightweight and easy-to-use library for manipulating PDFs in Python. It provides functions for reading and writing PDFs, as well as adding, removing and manipulating pages.<\/li>\n\n\n\n<li><strong>pdfminer<\/strong>: A PDF processing library (distributed today as pdfminer.six) that extracts information from PDFs, such as text, its layout on the page, and images. It analyzes content and does not modify files.<\/li>\n\n\n\n<li><strong>pdfquery<\/strong>: A PDF query library built on pdfminer, lxml and pyquery. It loads a PDF into an XML-like element tree so that content can be selected with jQuery-style or XPath selectors.<\/li>\n\n\n\n<li><strong>pdfkit<\/strong>: A wrapper around the wkhtmltopdf command-line tool that converts HTML (a URL, a file or a string) into PDF. It creates new PDFs and does not read or edit existing ones.<\/li>\n\n\n\n<li><strong>reportlab<\/strong>: A report generation library that allows you to create complex PDFs from dynamic data. It includes functions for creating tables, graphs, images and text, and supports layout and style customization.<\/li>\n\n\n\n<li><strong>pstoedit<\/strong>: A command-line tool, rather than a Python library, that converts PostScript and PDF graphics into other vector formats. From Python it can be called through <code>subprocess<\/code>.<\/li>\n\n\n\n<li><strong>pdf-reactor<\/strong>: PDFreactor is a commercial HTML-to-PDF converter from RealObjects that offers a Python client API.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Features_and_Benefits_of_PDF_Extensions_in_Python\"><\/span>Features and Benefits of PDF Extensions in Python<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Working with PDFs is one of the most common uses of Python, a <a href=\"https:\/\/www.copahost.com\/blog\/what-is-python\/\">high-level, interpreted programming language<\/a>. These PDF libraries allow developers to easily create, trim, and edit PDF files as well as convert PDF 
files to other file formats.<\/p>\n\n\n\n<p>Here are some of the main features and benefits of PDF extensions in Python:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"1_Create_PDF_files\"><\/span>1. Create PDF files<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>PDF libraries in Python allow developers to create PDF files from scratch. For this we use the <code>reportlab<\/code> library, which is one of the main PDF generation libraries in Python. With this library, developers can create pages, add text and images, and define layouts and styles, among other features.<\/p>\n\n\n\n<p>Example of how to create a PDF file using the <code>reportlab<\/code> library in Python:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from reportlab.lib.pagesizes import letter\nfrom reportlab.pdfgen import canvas\n\n# Page size: letter is a (width, height) tuple measured in points\npage_size = letter\n\n# Create a Canvas object that writes to example.pdf with that page size\nc = canvas.Canvas('example.pdf', pagesize=page_size)\n\n# add text to page\ntext = 'Hello, world!'\nc.drawString(100, 750, text)\n\n# add an image to the page (example.jpg must exist)\nimage = 'example.jpg'\nc.drawImage(image, 100, 500, width=200, height=150)\n\n# add a line to the page (x1, y1, x2, y2)\nc.line(100, 250, 300, 250)\n\n# add a rectangle to the page (x, y, width, height)\nc.rect(100, 150, 300, 50)\n\n# finish the page and write the PDF file\nc.showPage()\nc.save()<\/code><\/pre>\n\n\n\n<p>This example creates a PDF file called \u201cexample.pdf\u201d with a letter-sized page (21.59 cm x 27.94 cm), with text, an image, a line, and a rectangle drawn on the page.<\/p>\n\n\n\n<p>We use the <code>letter<\/code> constant to set the page size, the <code>Canvas<\/code> object to create the page and add elements to it, and the <code>save<\/code> method to write the PDF file.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" 
id=\"2_Reading_PDF_files\"><\/span>2. Reading PDF files<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>PDF libraries also allow developers to read and analyze existing PDF files. For this we can use the <code>PyPDF2<\/code> library, which is one of the main PDF reading libraries in Python. With this library, developers can access the content of a PDF file, such as text, images, and metadata.<\/p>\n\n\n\n<p>Now let&#8217;s see an example of how to read a PDF file using the <code>PyPDF2<\/code> library in Python. It opens a PDF file called \u201cexample.pdf\u201d and reads the number of pages it has. It then reads the contents of the first page of the PDF file and prints it to standard output. See below:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import PyPDF2\n\n# Open a PDF file; the reader needs the file to stay open while we use it\nwith open('example.pdf', 'rb') as f:\n    # Create a PdfReader object\n    pdf_file = PyPDF2.PdfReader(f)\n\n    # Read the number of pages in the PDF file\n    page_count = len(pdf_file.pages)\n    print(f'Number of pages: {page_count}')\n\n    # Read the content of the first page\n    page_content = pdf_file.pages&#091;0].extract_text()\n    print(page_content)<\/code><\/pre>\n\n\n\n<p>The <code>PdfReader<\/code> object is used to open the PDF file and access its pages and content. Its <code>pages<\/code> attribute behaves like a list, so <code>len()<\/code> gives the number of pages in the PDF file and index <code>0<\/code> gives the first page. The method <code>extract_text<\/code> is used to extract text from the page.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"3_Editing_PDF_files\"><\/span>3. 
Editing PDF files<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>In addition to creating and reading PDF files, Python PDF libraries also allow developers to edit existing PDF files. <code>PyPDF2<\/code> can copy, reorder, add and remove pages, but it cannot draw new text or images, so new content is generated with <code>reportlab<\/code>. When only the text is needed, the <code>pdftotext<\/code> library extracts it from PDF files as plain text.<\/p>\n\n\n\n<p>In the example below, we will see how to edit a PDF file using the <code>PyPDF2<\/code> library together with <code>reportlab<\/code>. First, we build a new page containing text and an image. Then we open a PDF file called \u201cexample.pdf\u201d, append the new page to the end of the file, and save the edited PDF file as \u201cedited_example.pdf\u201d.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import io\n\nfrom PyPDF2 import PdfReader, PdfWriter\nfrom reportlab.lib.pagesizes import letter\nfrom reportlab.pdfgen import canvas\n\n# Build the new page in memory with reportlab\nbuffer = io.BytesIO()\nc = canvas.Canvas(buffer, pagesize=letter)\nc.drawString(50, 50, 'This is a new page!')\nc.drawImage('example.jpg', 100, 100)\nc.save()\nbuffer.seek(0)\n\n# Copy the existing pages, then append the new page at the end\nreader = PdfReader('example.pdf')\nwriter = PdfWriter()\nfor page in reader.pages:\n    writer.add_page(page)\nwriter.add_page(PdfReader(buffer).pages&#091;0])\n\n# Save the edited PDF file\nwith open('edited_example.pdf', 'wb') as f:\n    writer.write(f)<\/code><\/pre>\n\n\n\n<p>The <code>PdfReader<\/code> object is used to open the PDF file, and the <code>PdfWriter<\/code> object collects the pages of the new file. The method <code><strong>add_page<\/strong><\/code> is used to add each page, including the new one. On the reportlab canvas, the method <code><strong>drawString<\/strong><\/code> is used to add <a href=\"https:\/\/www.copahost.com\/blog\/string-python\/\">text<\/a> to the page, and the method <code><strong>drawImage<\/strong><\/code> is used to add an image to the page. Finally, the method <code><strong>write<\/strong><\/code> is used to save the edited PDF file.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" 
id=\"4_PDF_file_conversion\"><\/span>4. PDF file conversion<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>PDF libraries also allow developers to convert PDF files to other file formats. For example, the <code>pdf2image<\/code> library converts pages of a PDF file into raster images, such as JPEG or PNG.<\/p>\n\n\n\n<p>In this example of how to convert a PDF file to a text file using the <code>PyPDF2<\/code> library in <a href=\"https:\/\/www.copahost.com\/blog\/python-ide\/\">Python<\/a>, we open a PDF file called \u201cexample.pdf\u201d and extract the text from all pages of the PDF file. We then save the text to a text file called \u201cexample.txt\u201d. Look:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import PyPDF2\n\n# Open a PDF file\nwith open('example.pdf', 'rb') as f:\n    # Create a PdfReader object\n    pdf_file = PyPDF2.PdfReader(f)\n\n    # Extract text from every page while the file is still open\n    text = ''\n    for page in pdf_file.pages:\n        text += page.extract_text()\n\n# Save the text to a text file\nwith open('example.txt', 'w', encoding='utf-8') as f:\n    f.write(text)<\/code><\/pre>\n\n\n\n<p>The <code>PdfReader<\/code> object is used to open the PDF file and access its pages. The method <code><strong>extract_text<\/strong><\/code> is used to extract the text from each page of the PDF file. The text is collected in a variable and then saved to a text file using the <code><strong>write<\/strong><\/code> method of the file object returned by <code>open<\/code>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"5_Integration_with_other_technologies\"><\/span>5. 
Integration with other technologies<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>PDF libraries can also be combined with other technologies, such as Django, a web development framework for Python, or Selenium, a browser automation library often used for testing. This integration allows developers to create customized solutions for their specific needs.<\/p>\n\n\n\n<p>Suppose we want to create a document management system that allows users to upload PDF files, extract information from them, and store it in a database. To do this, we can use the <code>PyPDF2<\/code> library to handle the PDF files and a database, such as MySQL or MongoDB, to store the extracted information.<\/p>\n\n\n\n<p>Here is an example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import PyPDF2\nimport mysql.connector\n\n# Create a connection to the database\ncnx = mysql.connector.connect(\n    user='user',\n    password='password',\n    host='localhost',\n    database='data_bank'\n)\n\n# Create a table in the database to store the extracted information\ncursor = cnx.cursor()\ncursor.execute('CREATE TABLE IF NOT EXISTS pdf_info (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255), data TEXT)')\n\n# Open a PDF file and extract its title and the text of the first page\nwith open('example.pdf', 'rb') as f:\n    pdf_file = PyPDF2.PdfReader(f)\n    # metadata is None when the file has no document information\n    metadata = pdf_file.metadata\n    name = str(metadata.title) if metadata and metadata.title else None\n    data = pdf_file.pages&#091;0].extract_text()\n\n# Save the extracted information in the database\ncursor.execute('INSERT INTO pdf_info (name, data) VALUES (%s, %s)', (name, data))\ncnx.commit()\n\n# Close the cursor and the connection to the database\ncursor.close()\ncnx.close()<\/code><\/pre>\n\n\n\n<p>This example uses the <code>mysql.connector<\/code> library to connect to a MySQL database and create a table to store information extracted from PDF files. The <code>id<\/code> column is declared <code>AUTO_INCREMENT<\/code> so that each inserted row receives its own key.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" 
id=\"How_to_create_a_PDF_file_from_a_data_model_using_PDF_extension_libraries_in_Python\"><\/span>How to create a PDF file from a data model using PDF extension libraries in Python<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>To create a PDF file from a data model in Python, we can use the pdfkit library.<\/p>\n\n\n\n<p>pdfkit is a Python wrapper around the wkhtmltopdf command-line tool. It renders HTML from a string, a file or a URL into a PDF, so anything that can be expressed in HTML and CSS, such as tables, images, links and styled text, can appear in the document. wkhtmltopdf must be installed separately and be available on the system path.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Here is an example of how to create a PDF file from a data model in Python using pdfkit:<\/h4>\n\n\n\n<ol class=\"wp-block-list\">\n<li>First, we install the pdfkit library with the pip command:<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>pip install pdfkit\n<\/code><\/pre>\n\n\n\n<ol class=\"wp-block-list\" start=\"2\">\n<li>Next, we import the pdfkit library into the Python code:<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>import pdfkit\n<\/code><\/pre>\n\n\n\n<ol class=\"wp-block-list\" start=\"3\">\n<li>Now, we build an HTML document from the data and pass it to <code>pdfkit.from_string()<\/code>. For example, when we have a data dictionary with information about a set of products, we can render it as an HTML table and convert that table to a PDF:<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>from html import escape\n\nimport pdfkit\n\n# Create a dictionary with information about products\nproducts = {\n    \"Product 1\": {\n        \"Name\": \"Product 1\",\n        \"Price\": 19.99,\n        \"Description\": \"This is product 1\"\n    },\n    \"Product 2\": {\n        \"Name\": \"Product 2\",\n        \"Price\": 29.99,\n        \"Description\": \"This is product 2\"\n    },\n    \"Product 3\": {\n        \"Name\": \"Product 3\",\n        \"Price\": 39.99,\n        \"Description\": \"This is product 3\"\n    }\n}\n\n# Build one HTML table row per product\nrows = \"\".join(\n    \"&lt;tr&gt;&lt;td&gt;{}&lt;\/td&gt;&lt;td&gt;{:.2f}&lt;\/td&gt;&lt;td&gt;{}&lt;\/td&gt;&lt;\/tr&gt;\".format(\n        escape(info&#091;\"Name\"]), info&#091;\"Price\"], escape(info&#091;\"Description\"])\n    )\n    for info in products.values()\n)\n\n# Wrap the rows in a complete HTML document\ndocument = (\n    \"&lt;html&gt;&lt;body&gt;&lt;h1&gt;Products&lt;\/h1&gt;&lt;table border='1'&gt;\"\n    \"&lt;tr&gt;&lt;th&gt;Name&lt;\/th&gt;&lt;th&gt;Price&lt;\/th&gt;&lt;th&gt;Description&lt;\/th&gt;&lt;\/tr&gt;\"\n    + rows\n    + \"&lt;\/table&gt;&lt;\/body&gt;&lt;\/html&gt;\"\n)\n\n# Convert the HTML to a PDF file\npdfkit.from_string(document, \"products.pdf\")<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Now, we can add more information to the PDF such as images, links, forms, annotations, etc.<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>To add images, we include an <code>&lt;img&gt;<\/code> element in the HTML. Recent wkhtmltopdf versions need the <code>enable-local-file-access<\/code> option to read local files. For example:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>image_html = \"&lt;img src='path\/to\/image.jpg'&gt;\"\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>For links, we use an <code>&lt;a&gt;<\/code> element. For example:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>link_html = \"&lt;a href='http:\/\/www.example.com'&gt;Link to website&lt;\/a&gt;\"\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>For forms, we write ordinary HTML form fields and pass the wkhtmltopdf <code>enable-forms<\/code> option, which turns them into PDF form fields. For example:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>form_html = (\n    \"&lt;form&gt;\"\n    \"Name: &lt;input type='text' name='Name'&gt;\"\n    \"Email: &lt;input type='text' name='Email'&gt;\"\n    \"Phone: &lt;input type='text' name='Phone'&gt;\"\n    \"&lt;\/form&gt;\"\n)\npdfkit.from_string(form_html, \"form.pdf\", options={\"enable-forms\": None})<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>pdfkit has no method for PDF annotations. They have to be added after the file is generated, with a PDF manipulation library.<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li>In the end, we write the PDF with <code>pdfkit.from_string()<\/code>, whose second argument is the output file name. For example:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>pdfkit.from_string(document, \"file_name.pdf\")<\/code><\/pre>\n\n\n\n<p>This is the basic way to create a PDF using pdfkit in Python. We recommend consulting the official pdfkit and wkhtmltopdf documentation to learn more about the available options.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Adding_metadata_to_a_PDF_file_in_Python\"><\/span>Adding metadata to a PDF file in Python<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Now let&#8217;s learn how to add metadata to PDF files in Python with different libraries.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Adding_metadata_to_a_PDF_file_in_Python-2\"><\/span>Adding metadata to a PDF file in Python<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>To add metadata to a PDF file in Python, we use <code>PyPDF2<\/code>. This library allows you to read and write PDF files and also allows you to add, change and <a href=\"https:\/\/www.copahost.com\/blog\/trim-python\/\">remove<\/a> metadata.<\/p>\n\n\n\n<p>Here is an example of how to add metadata to a PDF file using <code>PyPDF2<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from PyPDF2 import PdfReader, PdfWriter\n\n# Open the PDF file\nreader = PdfReader('arquivo.pdf')\n\n# Copy every page into a writer; metadata is set on the writer, not the reader\nwriter = PdfWriter()\nfor page in reader.pages:\n    writer.add_page(page)\n\n# Add metadata to PDF file (keys are PDF Info entries and start with a slash)\nwriter.add_metadata({\n    '\/Title': 'My PDF file',\n    '\/Author': 'Jo\u00e3o da Silva',\n    '\/Creator': 'Python and PyPDF2',\n    '\/Producer': 'My PDF Creator'\n})\n\n# Save the PDF file with the added metadata\nwith open('arquivo-metadados.pdf', 'wb') as f:\n    writer.write(f)<\/code><\/pre>\n\n\n\n<p>In this example, we are using the <code><strong>add_metadata<\/strong><\/code> method of the <code><strong>PdfWriter<\/strong><\/code> object to add four metadata entries to the PDF file: title, author, creator, and producer. We can add more entries as needed.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Reading_information_about_PDF_file_in_python\"><\/span>Reading information about PDF file in python<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>Now let&#8217;s use the <code><strong>info<\/strong><\/code> attribute of pdfminer&#8217;s <code><strong>PDFDocument<\/strong><\/code> object to read information about the PDF file, such as the title, author, creator and producer:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from pdfminer.pdfdocument import PDFDocument\nfrom pdfminer.pdfparser import PDFParser\n\n# Open the PDF file\nwith open('arquivo.pdf', 'rb') as f:\n    # Create a PDFDocument object to read the PDF file\n    parser = PDFParser(f)\n    doc = PDFDocument(parser)\n\n    # Read information from PDF file (a list of Info dictionaries)\n    info = doc.info\n\n    # Print the information\n    print(info)<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Validating_PDF_data_in_Python\"><\/span>Validating PDF data in Python<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>Now we have another example, using the <code>pydantic<\/code> library: we create a data model and let pydantic validate the metadata read from a PDF file. pydantic does not read PDF files itself, so the metadata is loaded with <code>PyPDF2<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import pydantic\nfrom PyPDF2 import PdfReader\n\n# Create a data model for the PDF file\nclass PdfFile(pydantic.BaseModel):\n    title: str\n    author: str\n    creator: str\n    producer: str\n\n# Read the metadata of the PDF file (None when the file has none)\ninfo = PdfReader('arquivo.pdf').metadata\nfields = {\n    'title': getattr(info, 'title', None),\n    'author': getattr(info, 'author', None),\n    'creator': getattr(info, 'creator', None),\n    'producer': getattr(info, 'producer', None),\n}\n\n# Validate PDF file data: building the model raises ValidationError on bad data\ntry:\n    pdf_file = PdfFile(**fields)\n    print(\"The data in the PDF file is valid.\")\nexcept pydantic.ValidationError:\n    print(\"The data in the PDF file is not valid.\")<\/code><\/pre>\n\n\n\n<p>In this example, we are creating a data model <code><strong>PdfFile<\/strong><\/code> with four fields: <code>title<\/code>, <code>author<\/code>, <code>creator<\/code> and <code>producer<\/code>. Next, we create a <code><strong>PdfFile<\/strong><\/code> object from the metadata of the PDF file; pydantic validates the data while the object is being built.<\/p>\n\n\n\n<p>If the data in the PDF file is valid, the object is created and the message \u201cThe data in the PDF file is valid.\u201d is printed. Otherwise, for example when a field is missing, pydantic raises <code><strong>ValidationError<\/strong><\/code> and the message \u201cThe data in the PDF file is not valid.\u201d is printed.<\/p>\n\n\n\n<p>We can adapt this example for our own purposes by creating a custom data model for the PDF file and catching <code><strong>ValidationError<\/strong><\/code> to detect invalid data.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Examples_of_how_to_convert_PDF_files_in_Python\"><\/span>Examples of how to convert PDF files in Python<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>To convert PDF files to other formats using Python, we can use several libraries. They allow you to read PDF files and convert them to other formats, such as images, text and HTML, among others.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img fetchpriority=\"high\" decoding=\"async\" 
width=\"626\" height=\"626\" src=\"https:\/\/www.copahost.com\/blog\/wp-content\/uploads\/2023\/09\/image.png\" alt=\"convert PDF files in Python\" class=\"wp-image-3706\" style=\"width:382px;height:382px\" srcset=\"https:\/\/www.copahost.com\/blog\/wp-content\/uploads\/2023\/09\/image.png 626w, https:\/\/www.copahost.com\/blog\/wp-content\/uploads\/2023\/09\/image-300x300.png 300w, https:\/\/www.copahost.com\/blog\/wp-content\/uploads\/2023\/09\/image-150x150.png 150w, https:\/\/www.copahost.com\/blog\/wp-content\/uploads\/2023\/09\/image-50x50.png 50w\" sizes=\"(max-width: 626px) 100vw, 626px\" \/><\/figure>\n<\/div>\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"1_Converting_PDF_to_image\"><\/span>1. Converting PDF to image<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>Here is an example of how to convert a PDF file to a PNG image file. <code>PyPDF2<\/code> cannot render pages, so we use <code>pdf2image<\/code>, which relies on the Poppler utilities being installed. In this example, we open a PDF file called <code>arquivo.pdf<\/code> and select the first page (<code>page_number = 1<\/code>; pdf2image counts pages from 1) to be converted to a PNG image. Next, we use the function <code><strong>convert_from_path<\/strong>()<\/code> to render the page and save the image to disk with the name <code>image.png<\/code>. Look:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from pdf2image import convert_from_path\n\n# Enter the number of the page we want to convert (1 is the first page)\npage_number = 1\n\n# Render only the selected page; the result is a list of PIL images\nimages = convert_from_path(\n    'arquivo.pdf', first_page=page_number, last_page=page_number\n)\n\n# Save the image to disk\nimages&#091;0].save('image.png', 'PNG')<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"2_Converting_PDF_to_HTML\"><\/span>2. 
Converting PDF to HTML<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>In addition, we can convert PDF files into other formats, such as text and HTML. With <code>PyPDF2<\/code>, the method <code><strong>extract_text()<\/strong><\/code> of a page returns its text. <code>PyPDF2<\/code> does not produce HTML, so for that we use pdfminer and its function <strong><code>extract_text_to_fp()<\/code><\/strong>.<\/p>\n\n\n\n<p>To convert to another format, it is necessary to install the appropriate library, for example <code><strong>pdfminer.six<\/strong><\/code> to convert to HTML. Look:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from io import StringIO\n\nfrom pdfminer.high_level import extract_text_to_fp\nfrom pdfminer.layout import LAParams\n\n# Buffer that receives the generated HTML\nhtml = StringIO()\n\n# Open the PDF file\nwith open('arquivo.pdf', 'rb') as f:\n    # Convert the first page to HTML (page_numbers is zero-based)\n    extract_text_to_fp(\n        f,\n        html,\n        laparams=LAParams(),\n        output_type='html',\n        codec=None,\n        page_numbers=&#091;0],\n    )\n\n# Save the HTML file to disk\nwith open('arquivo.html', 'w', encoding='utf-8') as f:\n    f.write(html.getvalue())<\/code><\/pre>\n\n\n\n<p>In this example, we are converting the first page of the PDF file to an HTML file. We call <code><strong>extract_text_to_fp()<\/strong><\/code> with <code><strong>output_type='html'<\/strong><\/code> so that pdfminer writes the page&#8217;s HTML code into the <code><strong>StringIO<\/strong><\/code> buffer. Finally, we are saving the HTML file to disk with the name <code>arquivo.html<\/code>.<\/p>\n\n\n\n<h3 
class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"3_Converting_PDF_to_Excel\"><\/span>3. Converting PDF to Excel<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>To convert a PDF file to an Excel file using Python, we use the <code><a href=\"https:\/\/www.copahost.com\/blog\/pandas-python\/\">pandas<\/a><\/code> and <code>openpyxl<\/code> libraries. Because <code>pandas<\/code> cannot read PDF files by itself, we also use <code>tabula-py<\/code>, which extracts the tables of a PDF as DataFrames and requires Java to be installed.<\/p>\n\n\n\n<p>Here is an example of how to convert a PDF file to an Excel file using these libraries:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import pandas as pd\nimport tabula\n\n# Read every table in the PDF; tabula returns a list of DataFrames\ntables = tabula.read_pdf('arquivo.pdf', pages='all')\n\n# Combine the tables into a single DataFrame\ndf = pd.concat(tables, ignore_index=True)\n\n# Write the DataFrame to an Excel file (pandas uses openpyxl for .xlsx)\ndf.to_excel('arquivo.xlsx', index=False)<\/code><\/pre>\n\n\n\n<p>In this example, we are reading a PDF file named <code>arquivo.pdf<\/code> with the <code><strong>read_pdf()<\/strong><\/code> function of <code>tabula-py<\/code>, which returns a list of <code>pandas.DataFrame<\/code> objects, one for each table it finds. Next, we combine them into a single DataFrame and write it to an Excel file with <code><strong>to_excel()<\/strong><\/code>, which uses <code>openpyxl<\/code> to produce the <code>.xlsx<\/code> format. The file is saved to disk with the name <code>arquivo.xlsx<\/code>.<\/p>\n\n\n\n<p>The function <code><strong>read_pdf()<\/strong><\/code> accepts several options, such as <code><strong>pages<\/strong><\/code> and <code><strong>lattice<\/strong><\/code>, which can be used to customize the reading of the PDF file. Similarly, the method <code><strong>to_excel()<\/strong><\/code> accepts several options, such 
as <code><strong>sheet_name<\/strong><\/code> and <code>index<\/code>, which are used to customize the writing of the Excel file.<\/p>\n\n\n\n<p>Keep in mind that the quality of the conversion may vary depending on the content of the PDF file and the configuration of the reading and writing options.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Working_with_attachments_in_a_PDF_file_in_Python\"><\/span>Working with attachments in a PDF file in Python<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>To work with attachments in a PDF file <a href=\"https:\/\/www.copahost.com\/blog\/free-python-courses\/\">using Python<\/a>, we can use <code>PyPDF2<\/code>. To add an attachment to a PDF file, let&#8217;s follow these steps:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Install the <code>PyPDF2<\/code> library using the command <code>pip install PyPDF2<\/code>.<\/li>\n\n\n\n<li>Import the <code>PyPDF2<\/code> library into the Python code.<\/li>\n\n\n\n<li>Open the PDF file using the <code><strong>PdfReader<\/strong><\/code> class of <code>PyPDF2<\/code> (older releases name it <code>PdfFileReader<\/code>).<\/li>\n\n\n\n<li>Copy the pages into a <code>PdfWriter<\/code> object, because a reader cannot modify the document, and add the attachment using the <code><strong>add_attachment<\/strong><\/code> method of the <code>PdfWriter<\/code> object.<\/li>\n\n\n\n<li>Save the updated PDF file using the <code><strong>write<\/strong><\/code> method of the <code>PdfWriter<\/code> object.<\/li>\n<\/ol>\n\n\n\n<p>Here is a code example that adds an attachment to a PDF file using <code>PyPDF2<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import PyPDF2\n\n# Read the source PDF\nreader = PyPDF2.PdfReader('document.pdf')\n\n# Copy the pages into a writer, since only a writer can add attachments\npdf = PyPDF2.PdfWriter()\npdf.append_pages_from_reader(reader)\n\n# Add the attachment: its file name and its raw bytes\nwith open('path\/to\/attachment.txt', 'rb') as a:\n    pdf.add_attachment('attachment.txt', a.read())\n\n# Save the updated PDF file\nwith open('document_with_attachment.pdf', 
'wb') as f:\n    pdf.write(f)<\/code><\/pre>\n\n\n\n<p>This code opens the PDF file <code>document.pdf<\/code>, adds the file <code>attachment.txt<\/code> as an attachment, and saves the updated PDF file as <code>document_with_attachment.pdf<\/code>.<\/p>\n\n\n\n<p>The same method can attach files in other formats, such as images, audio and video, because the attachment content is passed as raw bytes.<\/p>\n\n\n\n<p><code>PyPDF2<\/code> has no dedicated method for reading an attachment from a PDF file. Attachments are stored in the document catalog under <code>\/Names<\/code> and <code>\/EmbeddedFiles<\/code>, as a list that alternates attachment names and file specifications. We therefore look up the attachment in that list and decode its content with the method <code><strong>get_data()<\/strong><\/code>.<\/p>\n\n\n\n<p>Here is an example of code that reads an attachment from a PDF file using <code>PyPDF2<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import PyPDF2\n\n# Open the PDF file\npdf = PyPDF2.PdfReader('document_with_attachment.pdf')\n\n# Embedded files: a flat list of alternating names and file specifications\n# (this raises KeyError if the PDF has no attachments)\nnames = pdf.trailer&#091;'\/Root']&#091;'\/Names']&#091;'\/EmbeddedFiles']&#091;'\/Names']\n\n# Find the attachment by name and print its content (as bytes)\nfor name, spec in zip(names&#091;0::2], names&#091;1::2]):\n    if name == 'attachment.txt':\n        print(spec.get_object()&#091;'\/EF']&#091;'\/F'].get_data())<\/code><\/pre>\n\n\n\n<p>This code opens the PDF file <code>document_with_attachment.pdf<\/code>, searches for the attachment called <code>attachment.txt<\/code>, and prints the contents of the attachment.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The PDF extensions libraries in Python (ext:pdf) allow you to work with PDF files. In this way, it allows you not only to read and write PDF files, but also to manipulate their contents, such as adding, removing and changing pages, form fields and metadata. 
Furthermore, the library also allows us to convert PDF files [&hellip;]<\/p>\n","protected":false},"author":17,"featured_media":3709,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[174],"tags":[],"class_list":["post-3699","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-python"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v23.8 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Python ext:pdf \u2013 PDF extensions in Python - Copahost<\/title>\n<meta name=\"description\" content=\"PDF Extensions in Python: Manipulate, manage and convert PDF files with ease. Create, add notes, forms and more!\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.copahost.com\/blog\/python-extpdf\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Python ext:pdf \u2013 PDF extensions in Python - Copahost\" \/>\n<meta property=\"og:description\" content=\"PDF Extensions in Python: Manipulate, manage and convert PDF files with ease. 
Create, add notes, forms and more!\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.copahost.com\/blog\/python-extpdf\/\" \/>\n<meta property=\"og:site_name\" content=\"Copahost\" \/>\n<meta property=\"article:published_time\" content=\"2023-10-08T12:22:52+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-12-02T15:13:21+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.copahost.com\/blog\/wp-content\/uploads\/2023\/09\/pdf-python.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1075\" \/>\n\t<meta property=\"og:image:height\" content=\"717\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Schenia T\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Schenia T\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.copahost.com\/blog\/python-extpdf\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.copahost.com\/blog\/python-extpdf\/\"},\"author\":{\"name\":\"Schenia T\",\"@id\":\"https:\/\/www.copahost.com\/blog\/#\/schema\/person\/2efb96f9dfaf6162f347abcd06b1429f\"},\"headline\":\"Python ext:pdf \u2013 PDF extensions in 
Python\",\"datePublished\":\"2023-10-08T12:22:52+00:00\",\"dateModified\":\"2023-12-02T15:13:21+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.copahost.com\/blog\/python-extpdf\/\"},\"wordCount\":2645,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.copahost.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.copahost.com\/blog\/python-extpdf\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.copahost.com\/blog\/wp-content\/uploads\/2023\/09\/pdf-python.png\",\"articleSection\":[\"Python\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.copahost.com\/blog\/python-extpdf\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.copahost.com\/blog\/python-extpdf\/\",\"url\":\"https:\/\/www.copahost.com\/blog\/python-extpdf\/\",\"name\":\"Python ext:pdf \u2013 PDF extensions in Python - Copahost\",\"isPartOf\":{\"@id\":\"https:\/\/www.copahost.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.copahost.com\/blog\/python-extpdf\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.copahost.com\/blog\/python-extpdf\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.copahost.com\/blog\/wp-content\/uploads\/2023\/09\/pdf-python.png\",\"datePublished\":\"2023-10-08T12:22:52+00:00\",\"dateModified\":\"2023-12-02T15:13:21+00:00\",\"description\":\"PDF Extensions in Python: Manipulate, manage and convert PDF files with ease. 
Create, add notes, forms and more!\",\"breadcrumb\":{\"@id\":\"https:\/\/www.copahost.com\/blog\/python-extpdf\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.copahost.com\/blog\/python-extpdf\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.copahost.com\/blog\/python-extpdf\/#primaryimage\",\"url\":\"https:\/\/www.copahost.com\/blog\/wp-content\/uploads\/2023\/09\/pdf-python.png\",\"contentUrl\":\"https:\/\/www.copahost.com\/blog\/wp-content\/uploads\/2023\/09\/pdf-python.png\",\"width\":1075,\"height\":717,\"caption\":\"pdf python\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.copahost.com\/blog\/python-extpdf\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.copahost.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Python ext:pdf \u2013 PDF extensions in Python\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.copahost.com\/blog\/#website\",\"url\":\"https:\/\/www.copahost.com\/blog\/\",\"name\":\"Copahost\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/www.copahost.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.copahost.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.copahost.com\/blog\/#organization\",\"name\":\"Copahost\",\"url\":\"https:\/\/www.copahost.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.copahost.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.copahost.com\/blog\/wp-content\/uploads\/2016\/03\/copahostlogo.png\",\"contentUrl\":\"https:\/\/www.copahost.com\/blog\/wp-content\/uploads\/2016\/03\/co
pahostlogo.png\",\"width\":223,\"height\":40,\"caption\":\"Copahost\"},\"image\":{\"@id\":\"https:\/\/www.copahost.com\/blog\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.copahost.com\/blog\/#\/schema\/person\/2efb96f9dfaf6162f347abcd06b1429f\",\"name\":\"Schenia T\",\"description\":\"Data scientist, passionate about technology tools and games. Undergraduate student in Statistics at UFPB. Her hobby is binge-watching series, enjoying good music working or cooking, going to the movies and learning new things!\",\"url\":\"https:\/\/www.copahost.com\/blog\/author\/schenia\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Python ext:pdf \u2013 PDF extensions in Python - Copahost","description":"PDF Extensions in Python: Manipulate, manage and convert PDF files with ease. Create, add notes, forms and more!","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.copahost.com\/blog\/python-extpdf\/","og_locale":"en_US","og_type":"article","og_title":"Python ext:pdf \u2013 PDF extensions in Python - Copahost","og_description":"PDF Extensions in Python: Manipulate, manage and convert PDF files with ease. Create, add notes, forms and more!","og_url":"https:\/\/www.copahost.com\/blog\/python-extpdf\/","og_site_name":"Copahost","article_published_time":"2023-10-08T12:22:52+00:00","article_modified_time":"2023-12-02T15:13:21+00:00","og_image":[{"width":1075,"height":717,"url":"https:\/\/www.copahost.com\/blog\/wp-content\/uploads\/2023\/09\/pdf-python.png","type":"image\/png"}],"author":"Schenia T","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Schenia T","Est. 
reading time":"11 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.copahost.com\/blog\/python-extpdf\/#article","isPartOf":{"@id":"https:\/\/www.copahost.com\/blog\/python-extpdf\/"},"author":{"name":"Schenia T","@id":"https:\/\/www.copahost.com\/blog\/#\/schema\/person\/2efb96f9dfaf6162f347abcd06b1429f"},"headline":"Python ext:pdf \u2013 PDF extensions in Python","datePublished":"2023-10-08T12:22:52+00:00","dateModified":"2023-12-02T15:13:21+00:00","mainEntityOfPage":{"@id":"https:\/\/www.copahost.com\/blog\/python-extpdf\/"},"wordCount":2645,"commentCount":0,"publisher":{"@id":"https:\/\/www.copahost.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.copahost.com\/blog\/python-extpdf\/#primaryimage"},"thumbnailUrl":"https:\/\/www.copahost.com\/blog\/wp-content\/uploads\/2023\/09\/pdf-python.png","articleSection":["Python"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.copahost.com\/blog\/python-extpdf\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.copahost.com\/blog\/python-extpdf\/","url":"https:\/\/www.copahost.com\/blog\/python-extpdf\/","name":"Python ext:pdf \u2013 PDF extensions in Python - Copahost","isPartOf":{"@id":"https:\/\/www.copahost.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.copahost.com\/blog\/python-extpdf\/#primaryimage"},"image":{"@id":"https:\/\/www.copahost.com\/blog\/python-extpdf\/#primaryimage"},"thumbnailUrl":"https:\/\/www.copahost.com\/blog\/wp-content\/uploads\/2023\/09\/pdf-python.png","datePublished":"2023-10-08T12:22:52+00:00","dateModified":"2023-12-02T15:13:21+00:00","description":"PDF Extensions in Python: Manipulate, manage and convert PDF files with ease. 
Create, add notes, forms and more!","breadcrumb":{"@id":"https:\/\/www.copahost.com\/blog\/python-extpdf\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.copahost.com\/blog\/python-extpdf\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.copahost.com\/blog\/python-extpdf\/#primaryimage","url":"https:\/\/www.copahost.com\/blog\/wp-content\/uploads\/2023\/09\/pdf-python.png","contentUrl":"https:\/\/www.copahost.com\/blog\/wp-content\/uploads\/2023\/09\/pdf-python.png","width":1075,"height":717,"caption":"pdf python"},{"@type":"BreadcrumbList","@id":"https:\/\/www.copahost.com\/blog\/python-extpdf\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.copahost.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Python ext:pdf \u2013 PDF extensions in Python"}]},{"@type":"WebSite","@id":"https:\/\/www.copahost.com\/blog\/#website","url":"https:\/\/www.copahost.com\/blog\/","name":"Copahost","description":"","publisher":{"@id":"https:\/\/www.copahost.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.copahost.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.copahost.com\/blog\/#organization","name":"Copahost","url":"https:\/\/www.copahost.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.copahost.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.copahost.com\/blog\/wp-content\/uploads\/2016\/03\/copahostlogo.png","contentUrl":"https:\/\/www.copahost.com\/blog\/wp-content\/uploads\/2016\/03\/copahostlogo.png","width":223,"height":40,"caption":"Copahost"},"image":{"@id":"https:\/\/www.copahost.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www
.copahost.com\/blog\/#\/schema\/person\/2efb96f9dfaf6162f347abcd06b1429f","name":"Schenia T","description":"Data scientist, passionate about technology tools and games. Undergraduate student in Statistics at UFPB. Her hobby is binge-watching series, enjoying good music working or cooking, going to the movies and learning new things!","url":"https:\/\/www.copahost.com\/blog\/author\/schenia\/"}]}},"_links":{"self":[{"href":"https:\/\/www.copahost.com\/blog\/wp-json\/wp\/v2\/posts\/3699","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.copahost.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.copahost.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.copahost.com\/blog\/wp-json\/wp\/v2\/users\/17"}],"replies":[{"embeddable":true,"href":"https:\/\/www.copahost.com\/blog\/wp-json\/wp\/v2\/comments?post=3699"}],"version-history":[{"count":12,"href":"https:\/\/www.copahost.com\/blog\/wp-json\/wp\/v2\/posts\/3699\/revisions"}],"predecessor-version":[{"id":3939,"href":"https:\/\/www.copahost.com\/blog\/wp-json\/wp\/v2\/posts\/3699\/revisions\/3939"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.copahost.com\/blog\/wp-json\/wp\/v2\/media\/3709"}],"wp:attachment":[{"href":"https:\/\/www.copahost.com\/blog\/wp-json\/wp\/v2\/media?parent=3699"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.copahost.com\/blog\/wp-json\/wp\/v2\/categories?post=3699"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.copahost.com\/blog\/wp-json\/wp\/v2\/tags?post=3699"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}