Buying a car using Python, RPI and Telegram
Some time ago I started looking for a new car. I quickly became bored with browsing the biggest Polish listings site, otomoto.pl, so I decided to automate the whole thing to do it for me. Obvious, right?
The idea behind the system is rather simple. I have a Python script which downloads new offers from otomoto.pl (for a query provided by me). The script is run every 10 minutes by a cron job on a Raspberry Pi. It then sends all new offers to me on a Telegram channel. This way I do not have to build a new notification system, and I get immediate info on both my phone and PC.
The first step, naturally, is getting all the offers. I have divided it into three functions: getting the list of all offers from a page, getting the details of one offer, and getting info from all pages. All the offers are saved in a JSON file, which is loaded at the beginning of the script. This makes it easy for me to check which offers are new and which are old.
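In practice this boils down to a dictionary keyed by offer URL. Here is a minimal sketch of the load-and-check step (the is_new helper is just for illustration; the full script at the end inlines this check):

import json

# The JSON file acts as a persistent cache of everything seen so far
with open('/home/user/Documents/Programowanie/TiguanWatchOut/tiguan.json') as json_file:
    previous_offers = json.load(json_file)  # dict: offer URL -> offer details

def is_new(offer_url):
    # An offer is new if its URL has never been stored before
    return offer_url not in previous_offers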
My work begins with analysing otomoto.pl's HTML code. The Inspector (Ctrl + Shift + I) and the Network view (Ctrl + Shift + E) come in handy. I quickly realize that all the offers live in "article" elements:
Therefore, inside my first function I navigate by "article" when using XPath. The url variable holds all the filters from otomoto.pl that I want, e.g. production year from 2013 to 2016. Next I download all the offers in "article" and check each URL. If it is not present in the file, I get all the details and send a notification.
def get_offers(n):
    # This URL is what I was looking for, but you may change it as you need
    url = 'https://www.otomoto.pl/osobowe/volkswagen/tiguan/seg-suv/od-2013/?search%5Bfilter_float_year%3Ato%5D=2016&search%5Bfilter_float_mileage%3Ato%5D=100000&search%5Bfilter_enum_fuel_type%5D%5B0%5D=petrol&search%5Border%5D=created_at_first%3Adesc&search%5Bbrand_program_id%5D%5B0%5D=&search%5Bcountry%5D=&page=' + str(n)
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}
    request = requests.get(url, headers=headers)
    tree = html.fromstring(request.text)
    xpath_offer_details = '//div[@class="offers list"]/article'
    xpath_url = '//div[@class="offers list"]/article/@data-href'
    offer_details = tree.xpath(xpath_offer_details)
    list_of_urls = tree.xpath(xpath_url)
    for i, detail in enumerate(offer_details):
        try:
            if not list_of_urls[i] in previous_offers:
                # Check if the URL was present before; if not, download all the details
                previous_offers[list_of_urls[i]] = get_single_offer(detail)
                sendTelegram(list_of_urls[i])
                # VIN and phone number require separate logic
                offer_id = list_of_urls[i].split("-ID")[1].split(".html")[0]
                vin, phone = get_vin_and_phone(offer_id)
                sendTelegram("VIN: " + str(vin))
                sendTelegram("Phone: " + str(phone))
        except Exception as e:
            print(e)
To download the VIN and phone number I had to check the network traffic generated by the website. I quickly discovered under which URLs those values are stored. This way I bypass any reCAPTCHA and do not have to execute JavaScript.
Based on this, I wrote a function:
def get_vin_and_phone(id):
    # Digging in the website's code let me discover that the VIN and phone number
    # are available under these URLs without any additional authentication
    vin_url = "https://www.otomoto.pl/ajax/misc/vin/"
    phone_url = "https://www.otomoto.pl/ajax/misc/contact/multi_phone/{}/0"
    request = requests.get(vin_url + id)
    vin = request.text.replace("\"", "")
    request = requests.get(phone_url.format(id))
    phone = json.loads(request.text)["value"].replace(" ", "")
    return vin, phone
To download a VIN you just have to append the offer ID, which sits at the end of every offer URL, to https://www.otomoto.pl/ajax/misc/vin/ (and similarly for the phone endpoint).
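For illustration, here is how the ID can be pulled out of a URL using the same string splitting as in get_offers (the offer URL below is made up):

# Hypothetical offer URL, just to show the ID extraction used in get_offers
offer_url = "https://www.otomoto.pl/oferta/volkswagen-tiguan-ID6ABCDE.html"
offer_id = offer_url.split("-ID")[1].split(".html")[0]  # -> "6ABCDE"
vin, phone = get_vin_and_phone(offer_id)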
My next function retrieves all the details of an offer from the "article" element. Here the Inspector tool is crucial. I marked in red all the elements that interest me:
I used XPath navigation here too, as I find it the easiest. Additionally, I download the car's image from its URL to the Raspberry Pi as image.jpeg. This photo is then sent to me via Telegram.
def get_single_offer(html_element):
    # This function enters html_element and retrieves all offer details based on XPath
    single_offer_details = {}
    single_offer_details['price'] = html_element.xpath('div[@class="offer-item__content ds-details-container"]/div[@class="offer-item__price"]/div/div/span/span')[0].text_content().strip()
    single_offer_details['foto'] = html_element.xpath('div[@class="offer-item__photo ds-photo-container"]/a/img/@data-srcset')[0].split(';s=')[0]
    single_offer_details['offer_details'] = html_element.xpath('div[@class="offer-item__content ds-details-container"]/*[@class="ds-params-block"]/*[@class="ds-param"]/span/text()')
    # Save the photo from its URL locally, so it can be attached to the Telegram message
    urllib.request.urlretrieve(single_offer_details['foto'], "/home/user/Documents/Programowanie/TiguanWatchOut/image.jpeg")
    sendTelegram('Nowy Tiguan, Price: ' + single_offer_details['price'] + ', Details: ' + ', '.join(single_offer_details['offer_details']))  # "Nowy Tiguan" means "New Tiguan"
    sendPhoto()
    return single_offer_details
Downloading multiple pages requires knowing how many there are. To find out, I use the Inspector to locate the right place in the HTML and include it in my function:
def get_number_of_pages():
    # This function retrieves the maximum number of pages on the website.
    # It is used when iterating through n pages.
    url = 'https://www.otomoto.pl/osobowe/volkswagen/tiguan/seg-suv/od-2016/?search%5Bfilter_float_year%3Ato%5D=2018&search%5Bfilter_float_mileage%3Ato%5D=60000&search%5Bfilter_enum_fuel_type%5D%5B0%5D=petrol&search%5Border%5D=created_at_first%3Adesc&search%5Bbrand_program_id%5D%5B0%5D=&search%5Bcountry%5D=&page=1'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}
    request = requests.get(url, headers=headers)
    tree = html.fromstring(request.text)
    max_page = tree.xpath('//ul[@class="om-pager rel"]/li[last()-1]/a/span/text()')[0].strip()
    return int(max_page)
With all those pieces ready, I could start sending everything via Telegram. To do that I had to create a bot. It is a very easy process: start a new conversation with BotFather, write /newbot, give your bot a name and save the token you receive.
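You will also need your chat ID to address the messages. One easy way to get it: send any message to your new bot, then read the ID from the Bot API's getUpdates endpoint. A minimal sketch:

import requests

token = '<Your Bot Token>'  # the token received from BotFather
# Message your bot first, then fetch the pending updates;
# this assumes the most recent update is your own message
response = requests.get('https://api.telegram.org/bot{}/getUpdates'.format(token)).json()
print(response['result'][-1]['message']['chat']['id'])  # your chat ID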
Using the token, I could write two additional functions: one for sending messages and the other for sending photos.
def sendTelegram(message):
    token = '<Your Bot Token>'
    method = 'sendMessage'
    url = 'https://api.telegram.org/bot{0}/{1}'.format(token, method)
    try:
        response = requests.post(url=url, data={'chat_id': <Your Chat ID>, 'text': message}).json()
        print(response)
    except Exception as e:
        print(e)

def sendPhoto():
    token = '<Your Bot Token>'
    method = 'sendPhoto'
    url = 'https://api.telegram.org/bot{0}/{1}'.format(token, method)
    data = {'chat_id': <Your Chat ID>}
    # Attach the locally saved photo as a multipart upload
    files = {'photo': ("/home/user/Documents/Programowanie/TiguanWatchOut/image.jpeg", open("/home/user/Documents/Programowanie/TiguanWatchOut/image.jpeg", 'rb'))}
    try:
        response = requests.post(url=url, data=data, files=files).json()
        print(response)
    except Exception as e:
        print(e)
With this ready, I could save the script on the Raspberry Pi, which is always on. This way I am sure the script will be working 24/7. To make it run on a schedule I used a cron job on Linux. To configure it, I typed in the terminal:
crontab -e
And added a line with the schedule and the path to my script:
0,10,20,30,40,50 * * * * /usr/bin/python3 /home/user/Documents/Programowanie/TiguanWatchOut/tiguan.py
This means that every day, every hour, at minutes 0, 10, 20, 30, 40 and 50 the script will be triggered, i.e. every 10 minutes.
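As a side note, cron also supports step values, so the same schedule can be written more compactly:

*/10 * * * * /usr/bin/python3 /home/user/Documents/Programowanie/TiguanWatchOut/tiguan.py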
And that is it. This is how I created an automated process which notifies me of any interesting offers. An example of the script in action:
I can now be the first person to message the seller and use that to my advantage!
Below you can find the whole code. You are all welcome to comment too!
import requests
from lxml import html
import os
import json
import time
import datetime
import urllib.request
from telegram import sendTelegram  # my own telegram.py module with the two functions above
from telegram import sendPhoto

# All offers are saved in a JSON file, so at the beginning I load everything into a variable
# to check for URLs. I treat URLs as unique identifiers.
with open('/home/user/Documents/Programowanie/TiguanWatchOut/tiguan.json') as json_file:
    previous_offers = json.load(json_file)

# This function will open the n-th page and save all details of offers that are not present in tiguan.json
def get_offers(n):
    # This URL is what I was looking for, but you may change it as you need
    url = 'https://www.otomoto.pl/osobowe/volkswagen/tiguan/seg-suv/od-2013/?search%5Bfilter_float_year%3Ato%5D=2016&search%5Bfilter_float_mileage%3Ato%5D=100000&search%5Bfilter_enum_fuel_type%5D%5B0%5D=petrol&search%5Border%5D=created_at_first%3Adesc&search%5Bbrand_program_id%5D%5B0%5D=&search%5Bcountry%5D=&page=' + str(n)
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}
    request = requests.get(url, headers=headers)
    tree = html.fromstring(request.text)
    xpath_offer_details = '//div[@class="offers list"]/article'
    xpath_url = '//div[@class="offers list"]/article/@data-href'
    offer_details = tree.xpath(xpath_offer_details)
    list_of_urls = tree.xpath(xpath_url)
    for i, detail in enumerate(offer_details):
        try:
            if not list_of_urls[i] in previous_offers:
                # Check if the URL was present before; if not, download all the details
                previous_offers[list_of_urls[i]] = get_single_offer(detail)
                sendTelegram(list_of_urls[i])
                # VIN and phone number require separate logic
                offer_id = list_of_urls[i].split("-ID")[1].split(".html")[0]
                vin, phone = get_vin_and_phone(offer_id)
                sendTelegram("VIN: " + str(vin))
                sendTelegram("Phone: " + str(phone))
        except Exception as e:
            print(e)

def get_single_offer(html_element):
    # This function enters html_element and retrieves all offer details based on XPath
    single_offer_details = {}
    single_offer_details['price'] = html_element.xpath('div[@class="offer-item__content ds-details-container"]/div[@class="offer-item__price"]/div/div/span/span')[0].text_content().strip()
    single_offer_details['foto'] = html_element.xpath('div[@class="offer-item__photo ds-photo-container"]/a/img/@data-srcset')[0].split(';s=')[0]
    single_offer_details['offer_details'] = html_element.xpath('div[@class="offer-item__content ds-details-container"]/*[@class="ds-params-block"]/*[@class="ds-param"]/span/text()')
    # Save the photo from its URL locally, so it can be attached to the Telegram message
    urllib.request.urlretrieve(single_offer_details['foto'], "/home/user/Documents/Programowanie/TiguanWatchOut/image.jpeg")
    sendTelegram('Nowy Tiguan, Price: ' + single_offer_details['price'] + ', Details: ' + ', '.join(single_offer_details['offer_details']))  # "Nowy Tiguan" means "New Tiguan"
    sendPhoto()
    return single_offer_details

def get_number_of_pages():
    # This function retrieves the maximum number of pages on the website.
    # It is used when iterating through n pages.
    url = 'https://www.otomoto.pl/osobowe/volkswagen/tiguan/seg-suv/od-2016/?search%5Bfilter_float_year%3Ato%5D=2018&search%5Bfilter_float_mileage%3Ato%5D=60000&search%5Bfilter_enum_fuel_type%5D%5B0%5D=petrol&search%5Border%5D=created_at_first%3Adesc&search%5Bbrand_program_id%5D%5B0%5D=&search%5Bcountry%5D=&page=1'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}
    request = requests.get(url, headers=headers)
    tree = html.fromstring(request.text)
    max_page = tree.xpath('//ul[@class="om-pager rel"]/li[last()-1]/a/span/text()')[0].strip()
    return int(max_page)

def get_everything():
    # This function iterates through all pages, saving everything into the global variable
    # previous_offers, which is then written back to the JSON file.
    for i in range(1, get_number_of_pages() + 1):
        get_offers(i)
    try:
        with open('/home/user/Documents/Programowanie/TiguanWatchOut/tiguan.json', 'w') as json_file:
            json.dump(previous_offers, json_file)
    except Exception as e:
        sendTelegram(str(e))

def get_vin_and_phone(id):
    # Digging in the website's code let me discover that the VIN and phone number
    # are available under these URLs without any additional authentication
    vin_url = "https://www.otomoto.pl/ajax/misc/vin/"
    phone_url = "https://www.otomoto.pl/ajax/misc/contact/multi_phone/{}/0"
    request = requests.get(vin_url + id)
    vin = request.text.replace("\"", "")
    request = requests.get(phone_url.format(id))
    phone = json.loads(request.text)["value"].replace(" ", "")
    return vin, phone

get_everything()