Buying a car using Python, RPI and Telegram
Some time ago I started looking for a new car. I quickly became bored with browsing the biggest Polish listings site, otomoto.pl, so I decided to automate the whole thing to do it for me. Obvious, right?
The idea behind the system is rather simple. I have a Python script which downloads new offers from otomoto.pl (for a query provided by me). The script is run every 10 minutes by a cron job on a Raspberry Pi. It then sends all new offers to me on a Telegram channel. This way I do not have to build a new notification system, and I get immediate info on both my phone and PC.
The first step, naturally, is getting all the offers. I have divided it into three functions: getting the list of all offers from a page, getting the details of one offer, and getting info from all pages. All the offers are saved in a JSON file, which is loaded at the beginning of the script. This makes it easy for me to check which offers are new and which are old.
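In practice this boils down to a dictionary keyed by offer URL. Here is a minimal sketch of the load-and-check step (the is_new helper is just for illustration; the full script at the end inlines this check):

import json

# The JSON file acts as a persistent cache of everything seen so far
with open('/home/user/Documents/Programowanie/TiguanWatchOut/tiguan.json') as json_file:
    previous_offers = json.load(json_file)  # dict: offer URL -> offer details

def is_new(offer_url):
    # An offer is new if its URL has never been stored before
    return offer_url not in previous_offers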
My work begins with analysing otomoto.pl's HTML code. The Inspector (Ctrl + Shift + I) and the Network view (Ctrl + Shift + E) come in handy. I quickly realize that all the offers live in "article" elements:
Therefore, inside my first function I navigate by "article" when using XPath. The url variable holds all the filters from otomoto.pl that I want, e.g. production year from 2013 to 2016. Next I download all the offers in "article" and check each URL. If it is not present in the file, I get all the details and send a notification.
def get_offers(n):
    # This URL is what I was looking for, but you may change it as you need
    url = 'https://www.otomoto.pl/osobowe/volkswagen/tiguan/seg-suv/od-2013/?search%5Bfilter_float_year%3Ato%5D=2016&search%5Bfilter_float_mileage%3Ato%5D=100000&search%5Bfilter_enum_fuel_type%5D%5B0%5D=petrol&search%5Border%5D=created_at_first%3Adesc&search%5Bbrand_program_id%5D%5B0%5D=&search%5Bcountry%5D=&page=' + str(n)
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}
    request = requests.get(url, headers=headers)
    tree = html.fromstring(request.text)
    xpath_offer_details = '//div[@class="offers list"]/article'
    xpath_url = '//div[@class="offers list"]/article/@data-href'
    offer_details = tree.xpath(xpath_offer_details)
    list_of_urls = tree.xpath(xpath_url)
    for i, detail in enumerate(offer_details):
        try:
            if not list_of_urls[i] in previous_offers:
                # Check if the URL was present before; if not, download all the details
                previous_offers[list_of_urls[i]] = get_single_offer(detail)
                sendTelegram(list_of_urls[i])
                # VIN and phone number require separate logic
                offer_id = list_of_urls[i].split("-ID")[1].split(".html")[0]
                vin, phone = get_vin_and_phone(offer_id)
                sendTelegram("VIN: " + str(vin))
                sendTelegram("Phone: " + str(phone))
        except Exception as e:
            print(e)
To download the VIN and phone number I had to check the network traffic generated by the website. I quickly discovered under which URLs those values are stored. This way I bypass any reCAPTCHA and do not have to execute JavaScript.
Based on this, I wrote a function:
def get_vin_and_phone(id):
    # Digging in the website's code let me discover that the VIN and phone number
    # are available under these URLs without any additional authentication
    vin_url = "https://www.otomoto.pl/ajax/misc/vin/"
    phone_url = "https://www.otomoto.pl/ajax/misc/contact/multi_phone/{}/0"
    request = requests.get(vin_url + id)
    vin = request.text.replace("\"", "")
    request = requests.get(phone_url.format(id))
    phone = json.loads(request.text)["value"].replace(" ", "")
    return vin, phone
To download a VIN you just have to append the offer ID, which sits at the end of every offer URL, to https://www.otomoto.pl/ajax/misc/vin/ (and similarly for the phone endpoint).
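For illustration, here is how the ID can be pulled out of a URL using the same string splitting as in get_offers (the offer URL below is made up):

# Hypothetical offer URL, just to show the ID extraction used in get_offers
offer_url = "https://www.otomoto.pl/oferta/volkswagen-tiguan-ID6ABCDE.html"
offer_id = offer_url.split("-ID")[1].split(".html")[0]  # -> "6ABCDE"
vin, phone = get_vin_and_phone(offer_id)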
My next function retrieves all the details of an offer from the "article" element. Here the Inspector tool is crucial. I marked in red all the elements that interest me:
I used XPath navigation here too, as I find it the easiest. Additionally, I download the car's image from its URL to the Raspberry Pi as image.jpeg. This photo is then sent to me via Telegram.
def get_single_offer(html_element):
    # This function enters html_element and retrieves all offer details based on XPath
    single_offer_details = {}
    single_offer_details['price'] = html_element.xpath('div[@class="offer-item__content ds-details-container"]/div[@class="offer-item__price"]/div/div/span/span')[0].text_content().strip()
    single_offer_details['foto'] = html_element.xpath('div[@class="offer-item__photo ds-photo-container"]/a/img/@data-srcset')[0].split(';s=')[0]
    single_offer_details['offer_details'] = html_element.xpath('div[@class="offer-item__content ds-details-container"]/*[@class="ds-params-block"]/*[@class="ds-param"]/span/text()')
    # Save the photo from its URL locally, so it can be attached to the Telegram message
    urllib.request.urlretrieve(single_offer_details['foto'], "/home/user/Documents/Programowanie/TiguanWatchOut/image.jpeg")
    sendTelegram('Nowy Tiguan, Price: ' + single_offer_details['price'] + ', Details: ' + ', '.join(single_offer_details['offer_details']))  # "Nowy Tiguan" means "New Tiguan"
    sendPhoto()
    return single_offer_details
Downloading multiple pages requires knowing how many there are. To find out, I use the Inspector to locate the right place in the HTML and include it in my function:
def get_number_of_pages():
    # This function retrieves the maximum number of pages on the website.
    # It is used when iterating through n pages.
    url = 'https://www.otomoto.pl/osobowe/volkswagen/tiguan/seg-suv/od-2016/?search%5Bfilter_float_year%3Ato%5D=2018&search%5Bfilter_float_mileage%3Ato%5D=60000&search%5Bfilter_enum_fuel_type%5D%5B0%5D=petrol&search%5Border%5D=created_at_first%3Adesc&search%5Bbrand_program_id%5D%5B0%5D=&search%5Bcountry%5D=&page=1'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}
    request = requests.get(url, headers=headers)
    tree = html.fromstring(request.text)
    max_page = tree.xpath('//ul[@class="om-pager rel"]/li[last()-1]/a/span/text()')[0].strip()
    return int(max_page)
With all those pieces ready, I could start sending everything via Telegram. To do that I had to create a bot. It is a very easy process: start a new conversation with BotFather, write /newbot, give your bot a name and save the token you receive.
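You will also need your chat ID to address the messages. One easy way to get it: send any message to your new bot, then read the ID from the Bot API's getUpdates endpoint. A minimal sketch:

import requests

token = '<Your Bot Token>'  # the token received from BotFather
# Message your bot first, then fetch the pending updates;
# this assumes the most recent update is your own message
response = requests.get('https://api.telegram.org/bot{}/getUpdates'.format(token)).json()
print(response['result'][-1]['message']['chat']['id'])  # your chat ID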
Using the token, I could write two additional functions: one for sending messages and the other for sending photos.
def sendTelegram(message):
    token = '<Your Bot Token>'
    method = 'sendMessage'
    url = 'https://api.telegram.org/bot{0}/{1}'.format(token, method)
    try:
        response = requests.post(url=url, data={'chat_id': <Your Chat ID>, 'text': message}).json()
        print(response)
    except Exception as e:
        print(e)

def sendPhoto():
    token = '<Your Bot Token>'
    method = 'sendPhoto'
    url = 'https://api.telegram.org/bot{0}/{1}'.format(token, method)
    data = {'chat_id': <Your Chat ID>}
    # Attach the locally saved photo as a multipart upload
    files = {'photo': ("/home/user/Documents/Programowanie/TiguanWatchOut/image.jpeg", open("/home/user/Documents/Programowanie/TiguanWatchOut/image.jpeg", 'rb'))}
    try:
        response = requests.post(url=url, data=data, files=files).json()
        print(response)
    except Exception as e:
        print(e)
With this ready, I could save the script on the Raspberry Pi, which is always on. This way I am sure the script will be working 24/7. To make it run on a schedule I used a cron job on Linux. To configure it, I typed in the terminal:
crontab -e
And added a line with the schedule and the path to my script:
0,10,20,30,40,50 * * * * /usr/bin/python3 /home/user/Documents/Programowanie/TiguanWatchOut/tiguan.py
This means that every day, every hour, at minutes 0, 10, 20, 30, 40 and 50 the script will be triggered, i.e. every 10 minutes.
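As a side note, cron also supports step values, so the same schedule can be written more compactly:

*/10 * * * * /usr/bin/python3 /home/user/Documents/Programowanie/TiguanWatchOut/tiguan.py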
And that is it. This is how I created an automated process which notifies me of any interesting offers. An example of the script in action:
I can now be the first person to message the seller and use that to my advantage!
Below you can find the whole code. You are all welcome to comment too!
import requests
from lxml import html
import os
import json
import time
import datetime
import urllib.request
from telegram import sendTelegram  # my own telegram.py module with the two functions above
from telegram import sendPhoto

# All offers are saved in a JSON file, so at the beginning I load everything into a variable
# to check for URLs. I treat URLs as unique identifiers.
with open('/home/user/Documents/Programowanie/TiguanWatchOut/tiguan.json') as json_file:
    previous_offers = json.load(json_file)

# This function will open the n-th page and save all details of offers that are not present in tiguan.json
def get_offers(n):
    # This URL is what I was looking for, but you may change it as you need
    url = 'https://www.otomoto.pl/osobowe/volkswagen/tiguan/seg-suv/od-2013/?search%5Bfilter_float_year%3Ato%5D=2016&search%5Bfilter_float_mileage%3Ato%5D=100000&search%5Bfilter_enum_fuel_type%5D%5B0%5D=petrol&search%5Border%5D=created_at_first%3Adesc&search%5Bbrand_program_id%5D%5B0%5D=&search%5Bcountry%5D=&page=' + str(n)
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}
    request = requests.get(url, headers=headers)
    tree = html.fromstring(request.text)
    xpath_offer_details = '//div[@class="offers list"]/article'
    xpath_url = '//div[@class="offers list"]/article/@data-href'
    offer_details = tree.xpath(xpath_offer_details)
    list_of_urls = tree.xpath(xpath_url)
    for i, detail in enumerate(offer_details):
        try:
            if not list_of_urls[i] in previous_offers:
                # Check if the URL was present before; if not, download all the details
                previous_offers[list_of_urls[i]] = get_single_offer(detail)
                sendTelegram(list_of_urls[i])
                # VIN and phone number require separate logic
                offer_id = list_of_urls[i].split("-ID")[1].split(".html")[0]
                vin, phone = get_vin_and_phone(offer_id)
                sendTelegram("VIN: " + str(vin))
                sendTelegram("Phone: " + str(phone))
        except Exception as e:
            print(e)

def get_single_offer(html_element):
    # This function enters html_element and retrieves all offer details based on XPath
    single_offer_details = {}
    single_offer_details['price'] = html_element.xpath('div[@class="offer-item__content ds-details-container"]/div[@class="offer-item__price"]/div/div/span/span')[0].text_content().strip()
    single_offer_details['foto'] = html_element.xpath('div[@class="offer-item__photo ds-photo-container"]/a/img/@data-srcset')[0].split(';s=')[0]
    single_offer_details['offer_details'] = html_element.xpath('div[@class="offer-item__content ds-details-container"]/*[@class="ds-params-block"]/*[@class="ds-param"]/span/text()')
    # Save the photo from its URL locally, so it can be attached to the Telegram message
    urllib.request.urlretrieve(single_offer_details['foto'], "/home/user/Documents/Programowanie/TiguanWatchOut/image.jpeg")
    sendTelegram('Nowy Tiguan, Price: ' + single_offer_details['price'] + ', Details: ' + ', '.join(single_offer_details['offer_details']))  # "Nowy Tiguan" means "New Tiguan"
    sendPhoto()
    return single_offer_details

def get_number_of_pages():
    # This function retrieves the maximum number of pages on the website.
    # It is used when iterating through n pages.
    url = 'https://www.otomoto.pl/osobowe/volkswagen/tiguan/seg-suv/od-2016/?search%5Bfilter_float_year%3Ato%5D=2018&search%5Bfilter_float_mileage%3Ato%5D=60000&search%5Bfilter_enum_fuel_type%5D%5B0%5D=petrol&search%5Border%5D=created_at_first%3Adesc&search%5Bbrand_program_id%5D%5B0%5D=&search%5Bcountry%5D=&page=1'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}
    request = requests.get(url, headers=headers)
    tree = html.fromstring(request.text)
    max_page = tree.xpath('//ul[@class="om-pager rel"]/li[last()-1]/a/span/text()')[0].strip()
    return int(max_page)

def get_everything():
    # This function iterates through all pages, saving everything into the global variable
    # previous_offers, which is then written back to the JSON file.
    for i in range(1, get_number_of_pages() + 1):
        get_offers(i)
    try:
        with open('/home/user/Documents/Programowanie/TiguanWatchOut/tiguan.json', 'w') as json_file:
            json.dump(previous_offers, json_file)
    except Exception as e:
        sendTelegram(str(e))

def get_vin_and_phone(id):
    # Digging in the website's code let me discover that the VIN and phone number
    # are available under these URLs without any additional authentication
    vin_url = "https://www.otomoto.pl/ajax/misc/vin/"
    phone_url = "https://www.otomoto.pl/ajax/misc/contact/multi_phone/{}/0"
    request = requests.get(vin_url + id)
    vin = request.text.replace("\"", "")
    request = requests.get(phone_url.format(id))
    phone = json.loads(request.text)["value"].replace(" ", "")
    return vin, phone

get_everything()