Learning How to Do Things with Python

by Jarrett Retz November 25th, 2020
programming python learning

Process

Before doing a difficult task, it helps to have a process. Processes allow you to reduce the cognitive load of a task by outsourcing part of the "figuring out" to the process.

Processes can also reduce the chance that you will miss something obvious, or waste time on an approach dictated by your current mood, feelings, or energy level.

In this article, I am going to walk through a simple process for figuring out how to do something in Python. I will share the steps that I like to follow and the resources I like to use.

The task? Merge and split PDFs.

This is pretty easy and straightforward in Python, but I have never done it. I could have chosen a more difficult task, but the first step when deciding to do something in Python is to consider the effort involved and do some research.

Research

Before I decide to take on a task with Python, I need to ask myself at least one (if not all) of the following:

  • How much effort is this going to take?
  • Is this something that is within the bounds of my skill level?
  • How much time is this going to take?
  • Will the time/effort equal the reward?

An example of a task that is beyond my skill level, would take more time than I have, and has no immediate benefit for me is using the OpenCV library to try to identify a car.

An example of a task that is within my skill level and satisfies the effort/reward ratio is merging and splitting PDFs for a blog post.

Next, I can use a search engine to see whether other programmers have done this successfully. Using DuckDuckGo, I searched for "python pdf merge". The results are promising.

When I search for a programming topic, I am hoping to find something on Stack Overflow (SO). I am in luck: the first hit is a post on SO.

After clicking on the post (https://stackoverflow.com/questions/3444645/merge-pdf-files), I found:

  • It's a task people have been asking about for many years
  • It can be done without a lot of code
  • The best library for the task is PyPDF2

If this were not the case, I would have a few more options:

  • Look for other blog articles
  • Try a different search
  • Reconsider what I am trying to do

If I don't find something tangible quickly, I may need to reconsider what I am doing or how I am doing it. Typically, I think that if other people aren't doing [this thing I am trying to do], then maybe I should not be doing it either.

Fortunately, that is not the case, and we can move on with setting up a playground.

Set-Up

I know that I can use the PyPDF2 library to merge PDFs. Next, I need to set up a small playground environment where I can get to know the library. This step requires getting the proper data, files, URLs, libraries, etc.

I need to do two things:

  • Acquire two PDFs
  • Install PyPDF2

In a terminal, I installed the library.

pip3 install pypdf2
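Before opening a shell, I can confirm that the install worked by asking Python whether it can locate the package. This is a minimal sketch using only the standard library; the helper name is_installed is my own and is not part of PyPDF2.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


# "PyPDF2" is the import name used later in this article.
print("PyPDF2 available:", is_installed("PyPDF2"))
```

If this prints False, the package was probably installed for a different Python interpreter than the one running the check.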

In my desktop folder, I have two PDFs: example_file and example_file2.
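If no small PDFs are at hand, one option is to generate throwaway ones. The sketch below writes a single blank page by hand with the standard library only. The function name make_blank_pdf and the 200 by 200 point page size are assumptions for illustration, and the output is only meant as a disposable test file.

```python
from pathlib import Path


def make_blank_pdf(path, width=200, height=200):
    """Write a minimal one-page blank PDF to `path` and return the path."""
    # Three objects: the document catalog, the page tree, and one empty page.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (
            f"<< /Type /Page /Parent 2 0 R /Resources << >> "
            f"/MediaBox [0 0 {width} {height}] >>"
        ).encode("ascii"),
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        # Record the byte offset of each object for the cross-reference table.
        offsets.append(len(out))
        out += f"{number} 0 obj\n".encode("ascii") + body + b"\nendobj\n"

    # Cross-reference table: one 20-byte entry per object, plus the free entry 0.
    xref_start = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode("ascii")

    out += (
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
        f"startxref\n{xref_start}\n%%EOF\n"
    ).encode("ascii")

    target = Path(path)
    target.write_bytes(bytes(out))
    return target
```

Calling make_blank_pdf("example_file.pdf") and make_blank_pdf("example_file2.pdf") would give two tiny files with the same names used in this article.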

Now, I can open up Python's IDLE application and try to merge the PDFs.

Merging PDF Files with Python

It's important that I have an uncomplicated environment, and that I am using small files. When I am doing something with Python that is new to me, I want to make the initial task trivial.

My goal may be to build a cloud function that parses PDFs and conditionally merges or splits them for an application. Or, it can be a contrived example in a blog post. Either way, I want to be successful with the library as soon as possible.

This means I need to do something small.

What's nice about this task is that there is example code online that I can follow. There are many different ways that people have done this on SO, but they all seem to import the same class. In the Python shell, I am going to import that class.

>>> from PyPDF2 import PdfFileMerger

Documentation

I used Stack Overflow and got a pretty good start.

I saw that people were using the PyPDF2 library, and that there was a class in the library named PdfFileMerger. Before getting too much into copying and running code, I want to investigate this class further.

Next, I searched the documentation online and found information on how to use the PdfFileMerger class.

In the documentation, I see a method named merge.

This is an ideal progression:

  1. Find some starting code on Stack Overflow
  2. Quickly locate useful documentation

So, I believe I can create a merger object and then use the merge() method.
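Alongside the online documentation, the shell itself can list what a class offers. The sketch below uses the standard library's inspect module to print each public method and its signature. The helper name public_methods is my own, and zipfile.ZipFile is only a stand-in so the snippet runs without any third-party install.

```python
import inspect
import zipfile


def public_methods(cls):
    """Map each public method name of `cls` to its call signature."""
    methods = {}
    for name, member in inspect.getmembers(cls, inspect.isfunction):
        # Skip private and dunder names to keep the listing short.
        if not name.startswith("_"):
            methods[name] = str(inspect.signature(member))
    return methods


# ZipFile stands in here for the class under study.
for name, signature in public_methods(zipfile.ZipFile).items():
    print(name, signature)
```

Passing PdfFileMerger instead of zipfile.ZipFile should list its methods the same way, and the built-in help() function shows the docstrings as well.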

Merging

First, I can create the merger object.

>>> file_merger = PdfFileMerger()

Then, I can call the merge() method once for each file, passing in the position at which to insert the file and the path to the file.

>>> file_merger.merge(0, "./example_file.pdf")
>>> file_merger.merge(1, "./example_file2.pdf")

The merged data is now held in the file_merger object.

Finally, we can write that data to a new file.

>>> file_merger.write("./merged_example_files.pdf")

With this code, I am able to create a new file that merges the two PDFs!
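Once the shell experiment works, the steps can be collected into a small reusable function. This is a sketch under stated assumptions rather than the library's recommended approach. The name merge_pdfs is hypothetical, and the merger argument is assumed to be a PdfFileMerger instance from a PyPDF2 release that still ships that class. As I read the documentation, merge() inserts a file at a given page position while append() adds it to the end, so the sketch uses append() to stay correct when the first file has more than one page.

```python
from pathlib import Path


def merge_pdfs(merger, input_paths, output_path):
    """Append each input PDF to `merger` in order, then write the result.

    `merger` is any object with append(path), write(path), and close()
    methods, such as a PyPDF2 PdfFileMerger instance (an assumption here).
    """
    paths = [Path(p) for p in input_paths]

    # Fail early, before touching the merger, if any input is missing.
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError("Missing input files: " + ", ".join(missing))

    try:
        for path in paths:
            merger.append(str(path))
        merger.write(str(output_path))
    finally:
        # Release any file handles the merger opened, even on failure.
        merger.close()
    return Path(output_path)
```

With the files from this article, the call would look like merge_pdfs(PdfFileMerger(), ["./example_file.pdf", "./example_file2.pdf"], "./merged_example_files.pdf"). Passing the merger in as an argument also makes the function easy to try with a stand-in object before any real PDFs are involved.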

Conclusion

I was able to accomplish this new task with only a few lines of careful code. I combined informal information on Stack Overflow with formal documentation on the library website to conserve time and limit frustration.

I now feel comfortable using two new methods and a new library to merge PDFs with Python!

