PDF Operations


Blitline can convert PDFs into raster images, either one page at a time or all at once. It can handle all your PDF pre- and post-processing needs on Blitline's massively scalable cloud. Once your PDF pages have been turned into raster images, you can perform any Blitline function on them.



Small PDFs

(less than 20 pages)


As one large image


You don’t have to do anything special: just point your "src" at a ".pdf" file and Blitline will automatically recognize it and convert it into one large image. From there you can perform operations on it as you would on any other image.

Example:
"json" : '{ "application_id": "YOUR_APP_ID",
          "src" : "https://s3.amazonaws.com/blitdoc/pdfs/multi_page_sample.pdf",
          "functions" :
          [{
            "name": "resize_to_fit",
            "params": { "width" : 200, "height" : 200},
            "save" : {
                  "image_identifier" : "external_sample_1"
              }
           }
          ]}'
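
For readers who build the job in code rather than by hand, the payload above could be assembled roughly as follows. This is only a sketch: the helper name `build_pdf_job` is hypothetical, and because this page does not state the submission endpoint, the example stops at producing the "json" form field.

```python
import json


def build_pdf_job(application_id, src, functions, src_type=None):
    """Assemble a job dict shaped like the examples on this page.

    build_pdf_job is a hypothetical helper, not part of any Blitline library.
    src_type is left out entirely for the "one large image" case.
    """
    job = {"application_id": application_id, "src": src, "functions": functions}
    if src_type is not None:
        job["src_type"] = src_type
    return job


resize = {
    "name": "resize_to_fit",
    "params": {"width": 200, "height": 200},
    "save": {"image_identifier": "external_sample_1"},
}

job = build_pdf_job(
    "YOUR_APP_ID",
    "https://s3.amazonaws.com/blitdoc/pdfs/multi_page_sample.pdf",
    [resize],
)

# The job is sent as a string under the "json" form field.
form_fields = {"json": json.dumps(job)}
```

Passing `src_type="multi_page"` to the same helper would produce the per-page variant of the job.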
OR...

Each page of the PDF


You can process each page on its own, pushing each page as a single image to your S3 bucket or Azure storage. Just add the extra JSON field "src_type" : "multi_page".

Example:
"json" : '{ "application_id": "YOUR_APP_ID",
          "src" : "https://s3.amazonaws.com/blitdoc/pdfs/multi_page_sample.pdf",
          "src_type" : "multi_page",
          "functions" :
          [{
            "name": "resize_to_fit",
            "params": { "width" : 200, "height" : 200},
            "save" : {
                  "image_identifier" : "external_sample_1"
              }
           }
          ]}'
Remember: Each output filename will have a suffix such as _0 or _1 appended to it, representing the page number that the image was taken from. So for the example above, the base filename might be something like "EDePXXSiljSVBvHi42o3sg.jpg", but the output files will be "http://s3.amazonaws.com/blitline/2012081421/20/EDePXXSiljSVBvHi42o3sg_0.jpg" and "http://s3.amazonaws.com/blitline/2012081421/20/EDePXXSiljSVBvHi42o3sg_1.jpg".
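
If you need to predict the per-page URLs in your own code, the naming rule above can be sketched like this. The helper `page_urls` is hypothetical, and it assumes the suffix is always inserted immediately before the file extension, as in the sample URLs.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def page_urls(base_url, pages, separator="_"):
    """Insert '<separator><page>' before the extension of base_url for each page.

    Hypothetical helper: it only mirrors the naming shown in the note above.
    """
    parts = urlsplit(base_url)
    root, ext = posixpath.splitext(parts.path)
    return [
        urlunsplit(parts._replace(path=f"{root}{separator}{page}{ext}"))
        for page in pages
    ]


urls = page_urls(
    "http://s3.amazonaws.com/blitline/2012081421/20/EDePXXSiljSVBvHi42o3sg.jpg",
    [0, 1],
)
```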
OR...

Individual pages


You can pick individual pages, using the same functionality as above, by making "src_type" a JSON object with "name" : "multi_page" and a "pages" : [0,x,y] child.

Example:
"json" : '{ "application_id": "YOUR_APP_ID",
          "src" : "https://s3.amazonaws.com/blitdoc/pdfs/multi_page_sample.pdf",
          "src_type" : {"name" : "multi_page", "pages" : [0,1]},
          "v" : 1.22,
          "functions" :
          [{
            "name": "resize_to_fit",
            "params": { "width" : 200, "height" : 200},
            "save" : {
                  "image_identifier" : "external_sample_1"
              }
           }
          ]}'
Remember: Each output filename will have a suffix such as _0 or _1 appended to it, representing the page number that the image was taken from. So for the example above, the base filename might be something like "EDePXXSiljSVBvHi42o3sg.jpg", but the output files will be "http://s3.amazonaws.com/blitline/2012081421/20/EDePXXSiljSVBvHi42o3sg_0.jpg" and "http://s3.amazonaws.com/blitline/2012081421/20/EDePXXSiljSVBvHi42o3sg_1.jpg".
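
The two forms of "src_type" (a plain string for every page, an object when specific pages are picked) could be produced by one small helper. This is a sketch only: `multi_page_src_type` is a hypothetical name, and the check for non-negative integer page numbers is an assumption based on the [0,1] example rather than a documented rule.

```python
def multi_page_src_type(pages=None):
    """Return the "src_type" value for all pages or for selected pages.

    Hypothetical helper. Page numbers are assumed to be non-negative
    integers, as in the [0,1] example on this page.
    """
    if pages is None:
        return "multi_page"
    pages = list(pages)
    for page in pages:
        if isinstance(page, bool) or not isinstance(page, int) or page < 0:
            raise ValueError(f"invalid page number: {page!r}")
    return {"name": "multi_page", "pages": pages}


all_pages = multi_page_src_type()
first_two = multi_page_src_type([0, 1])
```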


Large PDFs

(Any number of pages)


Burst!


Bursting lets you explode the PDF into its individual pages and run them all in parallel on Blitline's massive image processing cloud. This allows huge PDFs to be processed in a fraction of the time it would take on your own machine or in a linear fashion.

Here is what happens behind the scenes:
  • Blitline downloads the source PDF.
  • Blitline breaks the PDF into individual pages and uploads these pages to a temporary storage location.
  • For each page of the PDF, Blitline automatically creates a new "job", copying over the functions and data you specified in the "burst_job", and renames the output files to have a "__X" suffix (THAT IS 2 UNDERSCORES, NOT 1), where X is the page number.
  • Blitline tracks the jobs and, when they have all completed, issues a "postback" to your postback_url or puts the item in the long polling cache.
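
The fan-out described in the steps above can be pictured with a toy model. This is not Blitline's code and makes no claim about its internals: `fan_out`, `page_index`, and `output_suffix` are hypothetical names used only to illustrate "one copied job per page, each with a double-underscore suffix".

```python
import copy


def fan_out(burst_job, page_count):
    """Toy model of bursting: one independent child job per page.

    Hypothetical illustration only. Each child gets its own deep copy of
    the functions and data, plus a "__X" suffix (two underscores).
    """
    children = []
    for page in range(page_count):
        child = copy.deepcopy(burst_job)
        child.pop("src_type", None)            # the child handles a single page
        child["page_index"] = page             # hypothetical field
        child["output_suffix"] = f"__{page}"   # hypothetical field
        children.append(child)
    return children


burst_job = {
    "src_type": "burst_pdf",
    "src_data": {"dpi": 200},
    "functions": [{"name": "resize_to_fit", "params": {"width": 500}}],
}
children = fan_out(burst_job, 3)
```

Because every child is a deep copy, the per-page jobs share no state and could run in parallel, which is the point of bursting.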

Example:
"json" : '{ "application_id": "YOUR_APP_ID",
          "src" : "https://s3.amazonaws.com/blitdoc/pdfs/multi_page_sample.pdf",
          "src_type" : "burst_pdf",
          "v" : 1.22,
          "src_data" : {"dpi" : 200},
          "functions" :
          [{
            "name": "resize_to_fit",
            "params": { "width" : 500},
            "save" : {
                  "image_identifier" : "external_sample_1"
              }
           }
          ]}'

When submitted, this will return JSON that looks something like this:
{
      "results":
      {
          "images":[{
              "image_identifier": "MY_CLIENT_ID",
                  "s3_url": "http://dev.blitline.s3.amazonaws.com/2011111513/1/fDIFJQVNlO6IeDZwXlruYg.jpg"
          }],
          "job_id": "4ec2e057c29aba53a5000001",
          "group_completion_job_id" : "B734Hasd23423llasda"
      }
  }
This result is similar to that of a regular Blitline job, but it also has a "group_completion_job_id", which is the virtual job_id indicating the completion of the whole group of jobs. You can poll this "group_completion_job_id" just as you would a regular job. It will also be the job_id of the postback when ALL the jobs are completed.
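
As a sketch of how a client might decide which ID to poll, the response shown above can be parsed like this. The helper `completion_job_id` is hypothetical, and falling back to the regular "job_id" when no group ID is present is an assumption about how you might treat non-burst jobs.

```python
import json


def completion_job_id(response_text):
    """Return the ID to poll: the group ID for burst jobs, else the job_id.

    Hypothetical helper based only on the sample response on this page.
    """
    results = json.loads(response_text)["results"]
    return results.get("group_completion_job_id", results["job_id"])


sample_response = """
{
  "results": {
    "images": [{
      "image_identifier": "MY_CLIENT_ID",
      "s3_url": "http://dev.blitline.s3.amazonaws.com/2011111513/1/fDIFJQVNlO6IeDZwXlruYg.jpg"
    }],
    "job_id": "4ec2e057c29aba53a5000001",
    "group_completion_job_id": "B734Hasd23423llasda"
  }
}
"""
poll_id = completion_job_id(sample_response)
```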


Remember: The output filename will have a __X appended to it, representing the page number that the image was taken from.

So for the example above, the base filename might be something like "EDePXXSiljSVBvHi42o3sg.jpg", but the output files will be "http://s3.amazonaws.com/blitline/2012081421/20/EDePXXSiljSVBvHi42o3sg__0.jpg" and "http://s3.amazonaws.com/blitline/2012081421/20/EDePXXSiljSVBvHi42o3sg__1.jpg".
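
Once a burst has finished, you may want to put the page images back in order. A minimal sketch, assuming the "__X" suffix always sits directly before the file extension as in the URLs above (`burst_page_number` is a hypothetical helper):

```python
import re

# Two underscores, the page digits, then the file extension at the end.
_BURST_PAGE = re.compile(r"__(\d+)\.[^./]+$")


def burst_page_number(url):
    """Extract X from a burst output URL ending in '__X.<ext>'."""
    match = _BURST_PAGE.search(url)
    if match is None:
        raise ValueError(f"no burst page suffix in {url!r}")
    return int(match.group(1))


urls = [
    "http://s3.amazonaws.com/blitline/2012081421/20/EDePXXSiljSVBvHi42o3sg__10.jpg",
    "http://s3.amazonaws.com/blitline/2012081421/20/EDePXXSiljSVBvHi42o3sg__1.jpg",
    "http://s3.amazonaws.com/blitline/2012081421/20/EDePXXSiljSVBvHi42o3sg__0.jpg",
]
ordered = sorted(urls, key=burst_page_number)
```

Sorting on the parsed integer rather than on the raw string keeps page 10 after page 2.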



Save Image as PDF

Saves any single image as a PDF.


To save an image as a PDF, simply give the saved file a ".pdf" extension. The image will automatically be converted into a PDF upon saving.
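
A small sketch of the extension swap follows. The helper `with_pdf_extension` is hypothetical, and the "s3_destination", "bucket", and "key" fields are assumptions that do not appear on this page. They stand in for wherever your job names the saved file.

```python
import posixpath


def with_pdf_extension(key):
    """Replace (or add) the file extension on a destination key with '.pdf'."""
    root, _ext = posixpath.splitext(key)
    return root + ".pdf"


save = {
    "image_identifier": "external_sample_1",
    # ASSUMPTION: "s3_destination", "bucket" and "key" are not shown on this
    # page; substitute the fields your job actually uses to name the file.
    "s3_destination": {
        "bucket": "YOUR_BUCKET",
        "key": with_pdf_extension("output/scan.jpg"),
    },
}
```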