The problem: Needing to download hundreds of images instantly
At Balihoo we are changing providers for our blogging platform, and one of the challenges that came up was how to extract all the images (Over 600) we had uploaded to our existing software without having a built-in tool for pulling them off the server. While I was able to get a complete list of the URLs by using some scraping tools, clearly copying each URL and “save image as…” for 600 wasn’t very efficient.
The Solution
We needed to download the entire list of images and get the file name of each image and associate it with the original post it was extracted from. Since I had promised to finish this whole project on Thursday afternoon and it was Friday at 9AM, I also needed this whole process to be fast. Since extracting images from an img src isn’t the kind of thing you need a browser to render code for, so instead of coding IE I set up VBA code that would make requests for images directly through Microsoft’s XML API.
The big picture was something like this:
For URL Make API Request for Image from URL Determine ImageName from URL Save Image to filestructure/ImageName Next URL
I chose to program this in VBA (since the list I was given for URLs was in Excel) and using MSXML12 for VBA. The code (thanks to VBA-and-Excel.com for most of it) for a single extraction is:
Sub Save_image() Dim oHTTP As Object Dim sDestFolder As String Dim sSrcUrl As String Dim sImageFile As String sDestFolder = "C:\Users\adale\Desktop\Compendium Images\" sSrcUrl = ActiveCell.Value If Left(sSrcUrl, 2) = "//" Then sSrcUrl = "http:" & sSrcUrl End If sImageFile = Right(ActiveCell.Value, Len(ActiveCell.Value) - InStrRev(ActiveCell.Value, "/")) Debug.Print sImageFile ActiveCell.Offset(0, 2).Value = sImageFile Set oHTTP = CreateObject("msxml2.XMLHTTP") oHTTP.Open "GET", sSrcUrl, False oHTTP.send Set oStream = CreateObject("ADODB.Stream") Const adTypeBinary = 1 Const adSaveCreateOverWrite = 2 oStream.Type = adTypeBinary oStream.Open oStream.write oHTTP.responseBody oStream.savetofile sDestFolder & sImageFile, adSaveCreateOverWrite Set oStream = Nothing Set oHTTP = Nothing End Sub
Breakdown:
This code works by selecting a cell with an img URL in it (and an image name two columns to the right) and looking up that image and saving it to my hard drive. In addition to the saving script, there will be a sweeper script needed, but coding a sweeper script for excel is something I’ve covered extensively before.
Sub Save_image() Dim oHTTP As Object Dim sDestFolder As String Dim sSrcUrl As String Dim sImageFile As String
Standard object and variable definitons. oHTTP will become out XML API request object.
sDestFolder = "C:\Users\adale\Desktop\Compendium Images\" sSrcUrl = ActiveCell.Value If Left(sSrcUrl, 2) = "//" Then sSrcUrl = "http:" & sSrcUrl End If sImageFile = Right(ActiveCell.Value, Len(ActiveCell.Value) - InStrRev(ActiveCell.Value, "/")) Debug.Print sImageFile ActiveCell.Offset(0, 2).Value = sImageFile
This bit of code defines the save location for the images (“Compendium Images” folder on my Desktop) as well as extracting the file name from the URL and setting that name as the value two columns to the right.
Set oHTTP = CreateObject("msxml2.XMLHTTP") oHTTP.Open "GET", sSrcUrl, False oHTTP.send Set oStream = CreateObject("ADODB.Stream") Const adTypeBinary = 1 Const adSaveCreateOverWrite = 2 oStream.Type = adTypeBinary oStream.Open oStream.write oHTTP.responseBody oStream.savetofile sDestFolder & sImageFile, adSaveCreateOverWrite
This is really the bulk of the code. In the first 3 lines VBA makes the request to the URL specified for the file it is looking for. The second chunk of code is initiating a place to store the API response, and the last two lines take the API response and write it to the specified file. If you are curious about the specific commands given in this section, Google can provide you with more details about the request, the parameters, and the outputs.
Besides the cleanup code at the bottom of the script, there really is nothing else to this code. While there are thousands of potential ways to utilize the MSXML packages, I found that to be a particularly helpful use of them. When all was said and done, I spend 20 minutes coding this and 5 minutes for it to make the request to 600 URLs and save the associated files instead of hours and hours copying and pasting and right clicking.
Great tip! Thanks for posting this. I’ve searched everywhere for a solution to this problem and this is the first one that works for me, sort of. I have a couple of issues wondering if you know the solution. One, I programed this as a module in VBA but when I run it it only does it one cell at a time. So i had to make a macro and do it cell by cell, how can I get it to run and do all of the cells in a column? Two, it is saving the image name in a new column. How can I get it to save the image name in the existing cell? Three, I have some cells that don’t have an image name like: http://www.xxxxxxx.com/core/media/media.nl?id=130185&c=849850&h=db34577c43aa1bd2a71. Any way you know of I can get it to pull the image and save the actual image name in the cell?
Excellent post and share!
IS SHOW ERROR ON
oHTTP.Open “GET”, sSrcUrl, False and give erro “invalid procedure call or argument”