
Blitline supports importing PDFs and turning them into images:


  • You can import the whole PDF as one tall image (with all the pages laid out vertically).
  • You can import multiple individual pages of a PDF and perform image operations on those.
  • You can import all the individual pages and have a batch of functions performed on each of those items.


Large PDFs? (Over 20 pages)

Are you turning large PDFs into individual images for printing? Try our new "burst_pdf" functionality.

The other import methods below (everything except "burst_pdf") only work on PDFs smaller than 20 pages.


Import entire PDF as one large image

You don't have to do anything special. Just make your "src" point to a ".pdf" file, and Blitline will automatically recognize it and convert it into one large image. From there you can perform operations on it as you would on any other image.
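As a minimal sketch of what such a job could look like when built in code, the Python below assembles a job whose "src" is a ".pdf" URL and leaves out "src_type" entirely. The helper names build_pdf_job and encode_job are hypothetical, and sending the job as a form field named "json" is an assumption based on the examples on this page. Nothing is sent over the network here.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_pdf_job(application_id, pdf_url, width, height, image_identifier):
    """Build a job that imports a whole PDF as one tall image, then resizes it.

    Hypothetical helper: the field names mirror the JSON examples on this page.
    """
    return {
        "application_id": application_id,
        "src": pdf_url,  # a ".pdf" src; no "src_type" for the single-image import
        "functions": [
            {
                "name": "resize_to_fit",
                "params": {"width": width, "height": height},
                "save": {"image_identifier": image_identifier},
            }
        ],
    }


def encode_job(job):
    """Serialize the job as a form body with one "json" field (assumed transport)."""
    return urlencode({"json": json.dumps(job)})


job = build_pdf_job(
    "YOUR_APP_ID",
    "https://s3.amazonaws.com/bltemp/non_stock_bulk_sell_sheet.pdf",
    200,
    200,
    "external_sample_1",
)
body = encode_job(job)

# Round-trip the encoded body to confirm that it decodes back to the same job.
decoded = json.loads(parse_qs(body)["json"][0])
print(decoded["src"])
```

Keeping the job as a plain dictionary until the last moment makes it easy to add "src_type" later without touching the serialization step.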

Or...

Import all the pages of the PDF and process each

You must add a "src_type" field to the JSON indicating that you are processing a multi-page document. The value for "src_type" must be "multi_page". When it is set to "multi_page", Blitline will load the PDF and perform all the subsequent Blitline operations on each page. Each output file (written via the "save" field) will have an underscore and the page number appended to its filename.

"json" : '{ "application_id": "YOUR_APP_ID",
          "src" : "https://s3.amazonaws.com/bltemp/non_stock_bulk_sell_sheet.pdf",
          "src_type" : "multi_page",
          "functions" :
          [{
            "name": "resize_to_fit",
            "params": { "width" : 200, "height" : 200},
            "save" : {
                  "image_identifier" : "external_sample_1"
              }
           }
          ]}'
Remember: Each output filename will have an underscore and a page number (_0, _1, and so on) appended to it, representing the page that the image was taken from. So for the example above, if the output filename would otherwise be something like "EDePXXSiljSVBvHi42o3sg.jpg", the files actually written will be "http://s3.amazonaws.com/blitline/2012081421/20/EDePXXSiljSVBvHi42o3sg_0.jpg" and "http://s3.amazonaws.com/blitline/2012081421/20/EDePXXSiljSVBvHi42o3sg_1.jpg"
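If you need to predict the per-page URLs on your side, the naming rule above can be sketched in a few lines of Python. The function name page_output_urls is hypothetical, and zero-based page numbering is inferred from the _0 and _1 suffixes in the example rather than stated by Blitline.

```python
import posixpath


def page_output_urls(base_url, page_count, separator="_"):
    """Return the expected per-page URLs for a multi-page job.

    Hypothetical helper: inserts the separator and a zero-based page number
    between the file name and its extension.
    """
    root, ext = posixpath.splitext(base_url)
    return [f"{root}{separator}{page}{ext}" for page in range(page_count)]


urls = page_output_urls(
    "http://s3.amazonaws.com/blitline/2012081421/20/EDePXXSiljSVBvHi42o3sg.jpg", 2
)
for url in urls:
    print(url)
```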

Or...

Import specific pages from a PDF

You must add a "src_type" field to the JSON indicating that you are processing a multi-page document. The value for "src_type" must be a JSON hash of the form { "name" : "multi_page", "pages" : [0,1]}
This behaves the same way as the default "multi_page" above, but only performs the operations on the pages identified by the "pages" array.
 
"json" : '{ "application_id": "YOUR_APP_ID",
          "src" : "https://s3.amazonaws.com/bltemp/non_stock_bulk_sell_sheet.pdf",
          "src_type" : {"name" : "multi_page", "pages" : [0,1]},
          "v" : 1.20,
          "functions" :
          [{
            "name": "resize_to_fit",
            "params": { "width" : 200, "height" : 200},
            "save" : {
                  "image_identifier" : "external_sample_1"
              }
           }
          ]}'
Remember: Each output filename will have an underscore and the page number appended to it (here _0 and _1), representing the page that the image was taken from. So for the example above, if the output filename would otherwise be something like "EDePXXSiljSVBvHi42o3sg.jpg", the files actually written will be "http://s3.amazonaws.com/blitline/2012081421/20/EDePXXSiljSVBvHi42o3sg_0.jpg" and "http://s3.amazonaws.com/blitline/2012081421/20/EDePXXSiljSVBvHi42o3sg_1.jpg"
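The two forms of "src_type" (the plain string for all pages, the hash for selected pages) can be produced by one small helper. This is only a sketch: multi_page_src_type is a hypothetical name, and treating page numbers as non-negative, zero-based integers is an assumption drawn from the [0,1] example above.

```python
import json


def multi_page_src_type(pages=None):
    """Return the "src_type" value for a multi-page job.

    Hypothetical helper: with no pages it returns the plain string form,
    otherwise the hash form with an explicit "pages" array.
    """
    if pages is None:
        return "multi_page"
    pages = list(pages)
    if any(not isinstance(p, int) or p < 0 for p in pages):
        raise ValueError("pages must be non-negative integers")
    return {"name": "multi_page", "pages": pages}


job = {
    "application_id": "YOUR_APP_ID",
    "src": "https://s3.amazonaws.com/bltemp/non_stock_bulk_sell_sheet.pdf",
    "src_type": multi_page_src_type([0, 1]),
    "functions": [
        {
            "name": "resize_to_fit",
            "params": {"width": 200, "height": 200},
            "save": {"image_identifier": "external_sample_1"},
        }
    ],
}
print(json.dumps(job["src_type"]))
```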


Or...

Burst PDF

Burst PDF is a different kind of PDF processing. It is a convenience function that we've added to Blitline which takes a large PDF source, breaks it into individual jobs that each process a single page, and submits them automatically back to Blitline. This allows you to process large PDFs in parallel.

Here is what happens behind the scenes:
  • Blitline downloads the src PDF.
  • Blitline breaks the PDF into individual pages and uploads these pages to a temporary storage location.
  • For each page of the PDF, Blitline automatically creates a new "job", copying over the functions and data you specified in the "burst_pdf" job. The output files are automatically renamed to have a "__X" suffix (THAT IS 2 UNDERSCORES, NOT 1), where X refers to the page number.
  • Blitline tracks the jobs and, when they are all completed, issues a "postback" to your postback_url or puts the item in the long polling cache.
 
"json" : '{ "application_id": "YOUR_APP_ID",
          "src" : "https://s3.amazonaws.com/bltemp/non_stock_bulk_sell_sheet.pdf",
          "src_type" : "burst_pdf", 
          "src_data" : {"dpi" : 200},
          "v" : 1.20,
          "functions" :
          [{
            "name": "resize_to_fit",
            "params": { "width" : 500},
            "save" : {
                  "image_identifier" : "external_sample_1"
              }
           }
          ]}'

When submitted, this will return JSON that looks something like this:
{
    "results":
    {
        "images":[{
            "image_identifier": "MY_CLIENT_ID",
            "s3_url": "https://s3.amazonaws.com/dev.blitline/2011111513/1/fDIFJQVNlO6IeDZwXlruYg.jpg"
        }],
        "job_id": "4ec2e057c29aba53a5000001",
        "group_completion_job_id" : "B734Hasd23423llasda"
    }
}
This result is similar to that of a regular Blitline job, but it also has a "group_completion_job_id", which is a virtual job_id indicating the completion of the whole group of jobs. You can poll this "group_completion_job_id" just as you would a regular job_id. It will also be the job_id of the postback that is sent when ALL the jobs are completed.
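As an illustration of picking the two identifiers out of that response, here is a short Python sketch. The function name extract_job_ids is hypothetical, and the sample text simply repeats the response shown above; how you then poll or receive the postback is outside this sketch.

```python
import json

SAMPLE_RESPONSE = """
{
    "results": {
        "images": [{
            "image_identifier": "MY_CLIENT_ID",
            "s3_url": "https://s3.amazonaws.com/dev.blitline/2011111513/1/fDIFJQVNlO6IeDZwXlruYg.jpg"
        }],
        "job_id": "4ec2e057c29aba53a5000001",
        "group_completion_job_id": "B734Hasd23423llasda"
    }
}
"""


def extract_job_ids(response_text):
    """Return (job_id, group_completion_job_id) from a job submission response.

    Hypothetical helper: group_completion_job_id is None when the response
    does not contain one, as would be expected for a non-burst job.
    """
    results = json.loads(response_text)["results"]
    return results["job_id"], results.get("group_completion_job_id")


job_id, group_id = extract_job_ids(SAMPLE_RESPONSE)
print(job_id, group_id)
```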


Remember: The output filename will have a __X appended to it, representing the page number that the image was taken from.

So for the example above, if the output filename would otherwise be something like "EDePXXSiljSVBvHi42o3sg.jpg", the files actually written will be "http://s3.amazonaws.com/blitline/2012081421/20/EDePXXSiljSVBvHi42o3sg__0.jpg" and "http://s3.amazonaws.com/blitline/2012081421/20/EDePXXSiljSVBvHi42o3sg__1.jpg"
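When the per-page results come back, it can be handy to map each output URL back to its page. The sketch below does this by matching the double-underscore suffix. The name burst_page_index is hypothetical, and the pattern assumes the suffix sits directly before the file extension, as in the URLs above.

```python
import re

# Two underscores, the page number, then the file extension at the end of the URL.
_BURST_SUFFIX = re.compile(r"__(\d+)\.[^./]+$")


def burst_page_index(url):
    """Return the page number encoded in a burst output URL, or None if absent."""
    match = _BURST_SUFFIX.search(url)
    return int(match.group(1)) if match else None


page = burst_page_index(
    "http://s3.amazonaws.com/blitline/2012081421/20/EDePXXSiljSVBvHi42o3sg__1.jpg"
)
print(page)
```

A URL with only a single underscore before the number (the "multi_page" naming) does not match this pattern, so the helper returns None for it.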