Buying a car with Python, RPi and Telegram

Some time ago I started looking for a car. I quickly got tired of browsing listings on sites such as otomoto.pl, so I decided to automate the whole process and make my life easier.

The idea behind the system is very simple. A Python script downloads new listings from otomoto.pl for the car I chose. A cron job runs the script on a Raspberry Pi every 10 minutes, and the script sends me every new listing through a Telegram channel. This way I don't have to build my own notification system, and the information reaches my phone and my computer immediately.

The first step, naturally, is to download all the listings. I split this into three functions: one that gets the list of offers on a single page, one that gets the details of a single offer, and one that goes through all the pages. All offers are saved to a JSON file, which is loaded at the very start of the script. This lets me check which offers are new and which I have already seen.
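The "new or already seen" check comes down to membership tests on the stored offer URLs. Below is a minimal sketch of that state handling. It assumes the JSON file maps each offer URL to its details, which is the kind of dict the script keeps. The helper names `load_seen`, `save_seen` and `new_offer_urls` are hypothetical. The sketch differs from a plain `open(..., 'w')` in two ways. First, it writes to a temporary file and swaps it in with `os.replace`, so a run interrupted mid-write does not leave a half-written file. Second, a missing file on the very first run is treated as "nothing seen yet".

```python
import json
import os
import tempfile


def load_seen(path):
    """Return previously seen offers (URL -> details), or {} if the file does not exist yet."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_seen(path, offers):
    """Write to a temporary file in the same directory, then atomically replace the old file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(offers, f, ensure_ascii=False)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # do not leave the temporary file behind on failure
        raise


def new_offer_urls(seen, page_urls):
    """Return the URLs from the current page that are not in `seen`, keeping page order."""
    fresh = []
    for url in page_urls:
        if url not in seen and url not in fresh:
            fresh.append(url)
    return fresh
```

Keeping the URL as the dictionary key matches how the script treats URLs as unique identifiers, so a lookup stays a simple `in` test.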

I start by analysing the otomoto.pl page source. Two developer tools are useful here: the inspector (Ctrl + Shift + I) and the network monitor (Ctrl + Shift + E). I quickly notice that every offer sits inside an “article” element.

That is why the first function targets exactly those “article” elements, navigating with XPath. The url variable contains all the otomoto.pl filters I care about. I then take every piece of markup under “article” and read the URL of each offer (its data-href attribute). If the URL is not in the file with previously seen offers, I download the details and send a notification.

```python
import requests
from lxml import html

# previous_offers (the dict loaded from tiguan.json), get_single_offer, get_vin_and_phone
# and sendTelegram are defined elsewhere in the script; see the full listing at the end.


def get_offers(n):
    # This URL is what I was looking for, but you may change it as you need
    url = 'https://www.otomoto.pl/osobowe/volkswagen/tiguan/seg-suv/od-2013/?search%5Bfilter_float_year%3Ato%5D=2016&search%5Bfilter_float_mileage%3Ato%5D=100000&search%5Bfilter_enum_fuel_type%5D%5B0%5D=petrol&search%5Border%5D=created_at_first%3Adesc&search%5Bbrand_program_id%5D%5B0%5D=&search%5Bcountry%5D=&page=' + str(n)
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}
    request = requests.get(url, headers=headers)
    tree = html.fromstring(request.text)

    # Every offer on the results page is an <article>; its URL is stored in data-href
    offer_details = tree.xpath('//div[@class="offers list"]/article')
    list_of_urls = tree.xpath('//div[@class="offers list"]/article/@data-href')

    for offer_url, detail in zip(list_of_urls, offer_details):
        try:
            # Check whether the URL was seen before; if not, download all the details
            if offer_url not in previous_offers:
                previous_offers[offer_url] = get_single_offer(detail)
                sendTelegram(offer_url)
                # VIN and phone require separate logic
                offer_id = offer_url.split("-ID")[1].split(".html")[0]
                vin, phone = get_vin_and_phone(offer_id)  # request both only once
                sendTelegram("VIN: " + vin)
                sendTelegram("Phone: " + phone)
        except Exception as e:
            print(e)
```
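The function above runs two separate XPath queries and pairs the results by position. That only works if every article has a data-href. If one article lacks it, every later URL gets paired with the wrong article.

Below is a sketch of the alternative, which reads the URL from each article element itself. So that it runs without lxml or network access, it swaps in the standard library's `xml.etree.ElementTree` on a hand-written, well-formed sample. The markup is an assumption modelled on the selectors above, and `offers_with_urls` is a hypothetical name. Real otomoto.pl pages are HTML and would still go through `lxml.html`. There, the same idea is `article.get('data-href')` on each element returned by the first query.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup mimicking the structure the XPath queries above expect
SAMPLE_PAGE = """<html><body>
  <div class="offers list">
    <article data-href="https://www.otomoto.pl/oferta/volkswagen-tiguan-ID6AbC12.html"><h2>Tiguan A</h2></article>
    <article><h2>Banner without a link</h2></article>
    <article data-href="https://www.otomoto.pl/oferta/volkswagen-tiguan-ID6XyZ34.html"><h2>Tiguan B</h2></article>
  </div>
</body></html>"""


def offers_with_urls(page_source):
    """Yield (url, article) pairs, reading the URL from each <article> itself."""
    root = ET.fromstring(page_source)
    # ElementTree supports only a subset of XPath, but exact attribute matches are included
    for article in root.iterfind(".//div[@class='offers list']/article"):
        url = article.get("data-href")
        if url:  # an article without data-href is skipped instead of shifting later pairs
            yield url, article
```

With the sample above, the banner article is skipped and the two Tiguan articles keep their own URLs.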


To get the VIN and the phone number, I had to look at the network traffic generated by the page. That is how I found the addresses that serve this information. This way I don't have to get past reCAPTCHA or run any JavaScript.

Based on that, I wrote this function:

```python
import json

import requests


def get_vin_and_phone(offer_id):
    # Digging in the website's code let me discover that the VIN and phone number
    # are available under these URLs without any additional authentication
    vin_url = "https://www.otomoto.pl/ajax/misc/vin/"
    phone_url = "https://www.otomoto.pl/ajax/misc/contact/multi_phone/{}/0"

    request = requests.get(vin_url + offer_id)
    vin = request.text.replace("\"", "")

    request = requests.get(phone_url.format(offer_id))
    phone = json.loads(request.text)["value"].replace(" ", "")

    return vin, phone
```
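The function above strips the quotes around the VIN by hand. The quote stripping suggests the VIN endpoint returns a JSON-encoded string, but that is an assumption. The code also expects the phone endpoint to return a JSON object with a "value" key. If both hold, the parsing can be done with `json.loads` and kept separate from the HTTP calls. Below is a sketch with hypothetical helper names that take the response text, so they can be checked without touching the network.

```python
import json


def parse_vin(response_text):
    """Decode the VIN endpoint body, assumed to be a JSON-encoded string."""
    value = json.loads(response_text)
    # anything other than a string (e.g. null when no VIN is given) becomes ""
    return value.strip() if isinstance(value, str) else ""


def parse_phone(response_text):
    """Decode the phone endpoint body, assumed to be a JSON object with a "value" key."""
    data = json.loads(response_text)
    raw = str(data.get("value", ""))
    # drop spaces and any other separators, keeping only digits and "+"
    return "".join(ch for ch in raw if ch.isdigit() or ch == "+")
```

Inside get_vin_and_phone these would be applied to the `.text` of the two responses, e.g. `vin = parse_vin(requests.get(vin_url + offer_id).text)`.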


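To get the VIN, it is enough to paste https://www.otomoto.pl/ajax/misc/vin/ followed by the offer ID into the browser. The ID is at the end of every offer's URL. The phone number is available the same way, with the ID inserted into the multi_phone address used in the function.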
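Splitting on "-ID" and ".html" works as long as the rest of the URL never contains "-ID". A regular expression that also requires the ".html" suffix is a bit stricter. Below is a sketch that assumes offer URLs end in `-ID<letters and digits>.html`, possibly followed by a query string, as the split logic in get_offers implies. The helper names are hypothetical; the two endpoint templates are the ones from get_vin_and_phone.

```python
import re

VIN_URL = "https://www.otomoto.pl/ajax/misc/vin/{}"
PHONE_URL = "https://www.otomoto.pl/ajax/misc/contact/multi_phone/{}/0"

# Assumed URL shape: ...-ID<alphanumeric id>.html, as the split("-ID") logic implies
_OFFER_ID_RE = re.compile(r"-ID([A-Za-z0-9]+)\.html")


def extract_offer_id(offer_url):
    """Return the offer ID from an offer URL, or None if the URL has a different shape."""
    match = _OFFER_ID_RE.search(offer_url)
    return match.group(1) if match else None


def endpoint_urls(offer_id):
    """Build the VIN and phone endpoint URLs for one offer."""
    return VIN_URL.format(offer_id), PHONE_URL.format(offer_id)
```

Returning None for an unexpected URL lets the caller skip that offer instead of raising an IndexError inside the loop.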

The next function extracts the details of each offer stored under “article”. Here, too, the code inspector is very helpful. I marked the elements I am interested in in red:

Again navigating with XPath, I extract the information that matters to me. Since I also have the URL of the car's photo, I download the photo to disk and save it as image.jpeg. Telegram then sends me this photo together with all the offer details.

```python
import urllib.request

# sendTelegram and sendPhoto come from the telegram.py module described further on


def get_single_offer(html_element):
    # This function enters html_element (one <article>) and retrieves all offer details using XPath
    single_offer_details = {}
    single_offer_details['price'] = html_element.xpath('div[@class="offer-item__content ds-details-container"]/div[@class="offer-item__price"]/div/div/span/span')[0].text_content().strip()
    # Cut off the ";s=..." part of data-srcset to get the photo URL
    single_offer_details['foto'] = html_element.xpath('div[@class="offer-item__photo  ds-photo-container"]/a/img/@data-srcset')[0].split(';s=')[0]
    single_offer_details['offer_details'] = html_element.xpath('div[@class="offer-item__content ds-details-container"]/*[@class="ds-params-block"]/*[@class="ds-param"]/span/text()')

    # Save the photo from the URL locally, so that it can be attached to the Telegram message
    urllib.request.urlretrieve(single_offer_details['foto'], "/home/user/Documents/Programowanie/TiguanWatchOut/image.jpeg")

    sendTelegram('Nowy Tiguan, Price: ' + single_offer_details['price'] + ', Details: ' + ', '.join(single_offer_details['offer_details']))
    sendPhoto()

    return single_offer_details
```
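Saving the photo to disk is not strictly required. According to the Telegram Bot API documentation, sendPhoto also accepts an HTTP URL as the `photo` field and an optional `caption`, whose documented limit is 1024 characters. Below is a sketch of building such a request payload from the srcset value. Two things in it are assumptions: that data-srcset holds one or more "URL width" candidates separated by commas, and that each URL carries the ";s=" size suffix the code above cuts off. The function names are hypothetical.

```python
CAPTION_LIMIT = 1024  # documented maximum caption length for sendPhoto


def photo_url_from_srcset(srcset):
    """Take the first srcset candidate and drop the ';s=...' suffix, as get_single_offer does."""
    first_candidate = srcset.split(",")[0].strip().split(" ")[0]
    return first_candidate.split(";s=")[0]


def build_send_photo_payload(chat_id, photo_url, price, details):
    """Form fields for sendPhoto; Telegram fetches the image from photo_url itself."""
    caption = "Nowy Tiguan, Price: " + price + ", Details: " + ", ".join(details)
    return {
        "chat_id": chat_id,
        "photo": photo_url,
        "caption": caption[:CAPTION_LIMIT],  # trim captions that would exceed the limit
    }
```

Such a payload could be posted as `data=` to the sendPhoto method, which would make image.jpeg unnecessary. I have not verified that Telegram can fetch otomoto.pl image URLs directly.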


To download multiple pages, I need to know how many there are. So I use the inspector to find where this number sits in the HTML, and I read it with this function:

```python
import requests
from lxml import html


def get_number_of_pages():
    # This function just retrieves the number of result pages; it is used when iterating through n pages.
    # It uses the same search URL (same filters) as get_offers, otherwise the page count would not match.
    url = 'https://www.otomoto.pl/osobowe/volkswagen/tiguan/seg-suv/od-2013/?search%5Bfilter_float_year%3Ato%5D=2016&search%5Bfilter_float_mileage%3Ato%5D=100000&search%5Bfilter_enum_fuel_type%5D%5B0%5D=petrol&search%5Border%5D=created_at_first%3Adesc&search%5Bbrand_program_id%5D%5B0%5D=&search%5Bcountry%5D=&page=1'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}
    request = requests.get(url, headers=headers)
    tree = html.fromstring(request.text)

    # The second-to-last pager item holds the highest page number;
    # if no pager is found (e.g. a single page of results), assume one page
    max_page = tree.xpath('//ul[@class="om-pager rel"]/li[last()-1]/a/span/text()')
    return int(max_page[0].strip()) if max_page else 1
```
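Since `range()` excludes its upper bound, looping with `range(1, number_of_pages)` stops one page short; the loop needs `number_of_pages + 1`. The search URL also sorts results by `created_at_first:desc`, which allows stopping early once a page contains only known offers. Below is a sketch of both ideas. Page fetching is injected as a hypothetical callback so the logic can be tested offline. The early stop assumes the newest-first order really holds, which promoted listings might break.

```python
def crawl_new_offers(page_count, fetch_page_urls, seen):
    """Collect unseen offer URLs from pages 1..page_count (inclusive).

    fetch_page_urls(page) is a hypothetical callback returning the offer URLs on one page.
    Assumes newest-first ordering, so a page with only known offers ends the crawl.
    """
    fresh = []
    for page in range(1, page_count + 1):  # + 1 because range() excludes its upper bound
        urls = fetch_page_urls(page)
        unseen = [url for url in urls if url not in seen]
        fresh.extend(unseen)
        if urls and not unseen:
            break  # everything on this page is known; older pages should be too
    return fresh
```

With a cron job every 10 minutes, most runs would then stop after the first page instead of downloading every page of results.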


With these pieces of code ready, I could start sending messages via Telegram. To do that, I had to create a bot. You do this by starting a conversation with BotFather: type /newbot, give the bot a name, and you get a token in reply.

Using that token, I created two functions: one that sends a text message to the chat and one that sends the image.jpeg photo to the chat.
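The functions also need the chat ID, not just the token. One way to find it is to message the bot, or, for a channel, add the bot as an administrator and post something there. Then open https://api.telegram.org/bot<token>/getUpdates in the browser. Below is a sketch that reads the chat IDs from that JSON response. It only looks at the `message` and `channel_post` update types, and the function name is hypothetical.

```python
import json


def chat_ids_from_updates(response_text):
    """Collect unique chat IDs from a getUpdates response body, in order of appearance."""
    data = json.loads(response_text)
    if not data.get("ok"):
        return []
    ids = []
    for update in data.get("result", []):
        # private and group chats arrive as "message", channels as "channel_post"
        for key in ("message", "channel_post"):
            chat = update.get(key, {}).get("chat")
            if chat and chat["id"] not in ids:
                ids.append(chat["id"])
    return ids
```

Channel IDs are negative numbers, so the chat_id value in the functions below may well start with a minus sign.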

```python
# telegram.py: imported by the main script (from telegram import sendTelegram, sendPhoto)
import requests

TOKEN = '<Your Bot Token>'  # token received from BotFather
CHAT_ID = '<Your Chat ID>'  # ID of the chat or channel that receives the notifications
IMAGE_PATH = '/home/user/Documents/Programowanie/TiguanWatchOut/image.jpeg'


def sendTelegram(message):
    method = 'sendMessage'
    url = 'https://api.telegram.org/bot{0}/{1}'.format(TOKEN, method)
    try:
        response = requests.post(url=url, data={'chat_id': CHAT_ID, 'text': message}).json()
        print(response)
    except Exception as e:
        print(e)


def sendPhoto():
    method = 'sendPhoto'
    url = 'https://api.telegram.org/bot{0}/{1}'.format(TOKEN, method)
    data = {'chat_id': CHAT_ID}
    try:
        # Upload the photo saved by get_single_offer; "with" makes sure the file gets closed
        with open(IMAGE_PATH, 'rb') as photo:
            files = {'photo': ('image.jpeg', photo)}
            response = requests.post(url=url, data=data, files=files).json()
        print(response)
    except Exception as e:
        print(e)
```
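Hard-coding the token in the source makes it easy to leak, for example when sharing the script. Below is a minimal sketch that reads the credentials from environment variables instead and prepares the sendMessage request with the standard library only. The variable names `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` and the helper names are hypothetical. The request is only built, not sent, so it can be inspected offline.

```python
import os
import urllib.parse
import urllib.request

API_URL = "https://api.telegram.org/bot{token}/{method}"


def telegram_config():
    """Read the bot token and chat ID from (hypothetically named) environment variables."""
    return os.environ["TELEGRAM_BOT_TOKEN"], os.environ["TELEGRAM_CHAT_ID"]


def build_send_message(token, chat_id, text):
    """Prepare a form-encoded POST request to sendMessage without sending it."""
    body = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode("utf-8")
    url = API_URL.format(token=token, method="sendMessage")
    return urllib.request.Request(url, data=body, method="POST")
```

Sending would then be `urllib.request.urlopen(req, timeout=10)`. With requests, the equivalent is passing `timeout=` to `requests.post`; the functions above do not set one, so a hanging connection could block a cron run.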


With the code ready, I could put it on my Raspberry Pi, which runs all the time. This way I can be sure the script is always running. To make the script start automatically, I used cron, the Linux task scheduler. To add a new job, I type:

```shell
crontab -e
```

Then I add the path to my script in the file:

```
0,10,20,30,40,50 * * * * /usr/bin/python3 /home/user/Documents/Programowanie/TiguanWatchOut/tiguan.py
```

This entry means the script runs every day, every hour, at minutes 0, 10, 20, 30, 40 and 50, i.e. every 10 minutes.

And that is how I built a bot that notifies me about interesting offers. An example of it in action:

This way I can be the first person to call the seller and get an edge over other buyers.

The complete code is below. Comments are welcome.

```python
import json
import urllib.request

import requests
from lxml import html

# sendTelegram and sendPhoto live in a separate module, telegram.py
from telegram import sendTelegram, sendPhoto

JSON_PATH = '/home/user/Documents/Programowanie/TiguanWatchOut/tiguan.json'
IMAGE_PATH = '/home/user/Documents/Programowanie/TiguanWatchOut/image.jpeg'

# This URL is what I was looking for, but you may change it as you need (the page number is appended)
SEARCH_URL = 'https://www.otomoto.pl/osobowe/volkswagen/tiguan/seg-suv/od-2013/?search%5Bfilter_float_year%3Ato%5D=2016&search%5Bfilter_float_mileage%3Ato%5D=100000&search%5Bfilter_enum_fuel_type%5D%5B0%5D=petrol&search%5Border%5D=created_at_first%3Adesc&search%5Bbrand_program_id%5D%5B0%5D=&search%5Bcountry%5D=&page='
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}

# All offers are saved in a JSON file, so at the beginning I load everything into a variable
# to check for URLs. I treat URLs as unique identifiers.
# Before the first run, tiguan.json must exist and contain at least {}.
with open(JSON_PATH) as json_file:
    previous_offers = json.load(json_file)


def get_offers(n):
    # This function opens the n-th page and saves the details of all offers not present in tiguan.json
    request = requests.get(SEARCH_URL + str(n), headers=HEADERS)
    tree = html.fromstring(request.text)

    offer_details = tree.xpath('//div[@class="offers list"]/article')
    list_of_urls = tree.xpath('//div[@class="offers list"]/article/@data-href')

    for offer_url, detail in zip(list_of_urls, offer_details):
        try:
            # Check whether the URL was seen before; if not, download all the details
            if offer_url not in previous_offers:
                previous_offers[offer_url] = get_single_offer(detail)
                sendTelegram(offer_url)
                # VIN and phone require separate logic
                offer_id = offer_url.split("-ID")[1].split(".html")[0]
                vin, phone = get_vin_and_phone(offer_id)  # request both only once
                sendTelegram("VIN: " + str(vin))
                sendTelegram("Phone: " + str(phone))
        except Exception as e:
            print(e)


def get_single_offer(html_element):
    # This function enters html_element and retrieves all offer details using XPath
    single_offer_details = {}
    single_offer_details['price'] = html_element.xpath('div[@class="offer-item__content ds-details-container"]/div[@class="offer-item__price"]/div/div/span/span')[0].text_content().strip()
    single_offer_details['foto'] = html_element.xpath('div[@class="offer-item__photo  ds-photo-container"]/a/img/@data-srcset')[0].split(';s=')[0]
    single_offer_details['offer_details'] = html_element.xpath('div[@class="offer-item__content ds-details-container"]/*[@class="ds-params-block"]/*[@class="ds-param"]/span/text()')

    # Save the photo from the URL locally, so that it can be attached to the Telegram message
    urllib.request.urlretrieve(single_offer_details['foto'], IMAGE_PATH)

    sendTelegram('Nowy Tiguan, Price: ' + single_offer_details['price'] + ', Details: ' + ', '.join(single_offer_details['offer_details']))
    sendPhoto()

    return single_offer_details


def get_number_of_pages():
    # This function retrieves the number of result pages for the same search as get_offers
    request = requests.get(SEARCH_URL + '1', headers=HEADERS)
    tree = html.fromstring(request.text)
    max_page = tree.xpath('//ul[@class="om-pager rel"]/li[last()-1]/a/span/text()')
    # If no pager is found (e.g. a single page of results), assume one page
    return int(max_page[0].strip()) if max_page else 1


def get_everything():
    # This function iterates through all pages (including the last one), saving everything
    # into the global variable previous_offers, which is then saved to JSON
    for i in range(1, get_number_of_pages() + 1):
        get_offers(i)

    try:
        with open(JSON_PATH, 'w') as json_file:
            json.dump(previous_offers, json_file)
    except Exception as e:
        sendTelegram(str(e))


def get_vin_and_phone(offer_id):
    # Digging in the website's code let me discover that the VIN and phone number
    # are available under these URLs without any additional authentication
    vin_url = "https://www.otomoto.pl/ajax/misc/vin/"
    phone_url = "https://www.otomoto.pl/ajax/misc/contact/multi_phone/{}/0"

    request = requests.get(vin_url + offer_id)
    vin = request.text.replace("\"", "")

    request = requests.get(phone_url.format(offer_id))
    phone = json.loads(request.text)["value"].replace(" ", "")

    return vin, phone


if __name__ == '__main__':
    get_everything()
```


Michał Ćwiok

Analytics, cloud, privacy and other things like that.
