This task is a little bit tricky. The images for the item cards in the search results are set dynamically with JavaScript by some kind of templating script or engine, using a template that looks like this one:
Thus, when you try to get the whole HTML of the page with the requests library or something similar, it is delivered without links to the images. Moreover, it is delivered without a single <img> tag.
You can check this yourself. Run the following Python code in your terminal:
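A minimal sketch of such a check, assuming the requests library is installed. The function names, the page.html file name and the URL in the usage comment are placeholders for illustration, not taken from the site itself:

```python
from html.parser import HTMLParser


class ImgCounter(HTMLParser):
    """Counts <img> tags that appear as real markup in an HTML document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <IMG> is counted as well.
        # Markup inside <script> blocks is treated as raw text and not counted.
        if tag == "img":
            self.count += 1


def count_img_tags(html):
    """Return the number of <img> tags found in the given HTML string."""
    parser = ImgCounter()
    parser.feed(html)
    parser.close()
    return parser.count


def save_page(url, path="page.html"):
    """Download a page, save it to disk for manual inspection, return its HTML."""
    import requests  # third-party: pip install requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    with open(path, "w", encoding="utf-8") as f:
        f.write(response.text)
    return response.text


# Usage (placeholder URL, replace it with the real search results page):
# html = save_page("https://example.com/search?q=chair")
# print(count_img_tags(html))
```

The saved file can be opened in any editor, so the raw HTML can be searched by hand as well.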
Then use the [CTRL+SHIFT+F] shortcut to try to find any img tag related to the product cards.
I wouldn’t want to use a whole Selenium browser emulator for such a simple task. There must be some additional data here to generate the image links for the above templates, right? Of course. If you inspect the code of any item card, you will see that a certain part of the image link is the same as the string contained in the data-id attribute of the parent <a> tag.
The image IDs are separated by commas, as you can see in the following image, so each ID in the string corresponds to one image in the gallery, in the same order. The item images on the search results page have the same IDs as the images on the actual item page. Only the resolution changes: it’s 300x300 for search results and 600x450 for full size. I’ll use 300x300 in the code examples.
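To make the mapping concrete, here is a small sketch of turning one data-id string into a list of image links. The URL pattern in IMAGE_URL_TEMPLATE is purely hypothetical (the real one has to be copied from the page's template), and build_image_urls is an illustrative name:

```python
# Hypothetical URL pattern; substitute the one used by the site's template.
IMAGE_URL_TEMPLATE = "https://img.example.com/{image_id}/{size}.jpg"


def build_image_urls(data_id, size="300x300"):
    """Split a comma-separated data-id string and build one link per image ID."""
    image_ids = [part.strip() for part in data_id.split(",") if part.strip()]
    return [
        IMAGE_URL_TEMPLATE.format(image_id=image_id, size=size)
        for image_id in image_ids
    ]
```

Passing size="600x450" instead would produce links to the full-size versions of the same images.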
So now let’s get the IDs for each item iteratively and generate a *.json file containing a list of dictionaries, each of which includes the title and the list of all image links:
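One way this step could look, as a sketch rather than a definitive implementation. It assumes the title is available in a title attribute on the same <a> tag and reuses a hypothetical URL pattern; both must be adjusted to the real markup. The class and function names are illustrative:

```python
import json
from html.parser import HTMLParser

# Hypothetical URL pattern; substitute the one used by the site's template.
IMAGE_URL_TEMPLATE = "https://img.example.com/{image_id}/{size}.jpg"


class ItemCardParser(HTMLParser):
    """Collects a title and image links from every <a data-id="..."> tag."""

    def __init__(self, size="300x300"):
        super().__init__()
        self.size = size
        self.items = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag != "a" or not attrs.get("data-id"):
            return
        image_ids = [i.strip() for i in attrs["data-id"].split(",") if i.strip()]
        self.items.append({
            # Assumption: the title sits in the anchor's title attribute.
            "title": attrs.get("title") or "",
            "images": [
                IMAGE_URL_TEMPLATE.format(image_id=i, size=self.size)
                for i in image_ids
            ],
        })


def extract_items(html):
    """Return a list of {"title": ..., "images": [...]} dictionaries."""
    parser = ItemCardParser()
    parser.feed(html)
    parser.close()
    return parser.items


def save_items(items, path="res.json"):
    """Write the collected items to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(items, f, ensure_ascii=False, indent=2)


# Usage, given the page HTML as a string:
# save_items(extract_items(html))
```

Only the standard library is needed here because the data-id attributes are present in the raw HTML; no JavaScript has to be executed.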
After running the code, you will get a res.json file with contents of the following form: