I'm attempting to gather various information on Amazon products that I specify.
Take these two as examples:
ASIN: B0001UZQWG
ASIN: B0043WCH66
Both are ordinary Amazon products, and both show a Sales Rank on their respective product pages.
However, when I request their Sales Ranks through Amazon's API, the lookup returns no Sales Rank for the first ASIN but does return one for the second.
import amazonproduct
from bs4 import BeautifulSoup
import requests

# Credentials replaced with placeholders.
config = {
    'access_key': 'YOUR_ACCESS_KEY',
    'secret_key': 'YOUR_SECRET_KEY',
    'associate_tag': 'imnotsur-20',
    'locale': 'us'
}
api = amazonproduct.API(cfg=config)

result = api.item_lookup('B0001UZQWG', ResponseGroup='SalesRank')
for i in result.Items.Item:
    try:
        print('Item One: ' + str(i.SalesRank))
    except AttributeError:
        # SalesRank is missing from the response; skip silently.
        pass

result2 = api.item_lookup('B0043WCH66', ResponseGroup='SalesRank')
for i in result2.Items.Item:
    try:
        print('Item Two: ' + str(i.SalesRank))
    except AttributeError:
        pass
This simply returns:
Item Two: 20873
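Because a silent `pass` hides which lookup came back without a rank, it can help to report the missing field explicitly. The following is only a minimal sketch: `describe_rank` is a hypothetical helper, `SimpleNamespace` objects stand in for real API response items, and it assumes that a missing child element surfaces as an `AttributeError` (which is how lxml.objectify-style objects generally behave).

```python
from types import SimpleNamespace


def describe_rank(label, item):
    """Return a line reporting the item's SalesRank, or that it is missing."""
    # getattr with a default only absorbs AttributeError, i.e. a missing field.
    rank = getattr(item, 'SalesRank', None)
    if rank is None:
        return '{0}: no SalesRank in response'.format(label)
    return '{0}: {1}'.format(label, rank)


# Stand-ins for API response items (assumption: not real API objects).
item_one = SimpleNamespace(ASIN='B0001UZQWG')
item_two = SimpleNamespace(ASIN='B0043WCH66', SalesRank=20873)

print(describe_rank('Item One', item_one))
print(describe_rank('Item Two', item_two))
```

With this, the first ASIN shows up in the output as a reported gap instead of disappearing.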
This is rather bizarre. I checked the HTML of both product pages, and there appear to be major inconsistencies in Amazon's markup that the API is unable to deal with.
This is the element for the first ASIN:
<ul class="zg_hrsr">
    <li class="zg_hrsr_item">
        <span class="zg_hrsr_rank">#187</span>
And this is for the second:
<li id="SalesRank">
    <b>Amazon Best Sellers Rank:</b>
    #20,873 in Cell Phones & Accessories (<a href="http://ift.tt/1ONRb4t">See Top 100 in Cell Phones & Accessories</a>)
    <ul class="zg_hrsr">
        <li class="zg_hrsr_item">
            <span class="zg_hrsr_rank">#249</span>
It's apparent that the <li id="SalesRank"> format works and the other format doesn't.
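To make the difference between the two fragments concrete, here is a small standard-library sketch that extracts whatever ranks each one contains. The function and pattern names (`extract_ranks`, `OVERALL_RANK`, `CATEGORY_RANK`) are made up for illustration, and the regular expressions assume the markup looks exactly like the excerpts in this post; regex matching on HTML is fragile, so a real parser would be the safer choice for live pages.

```python
import re

# Category rank: the number inside <span class="zg_hrsr_rank">.
CATEGORY_RANK = re.compile(r'<span class="zg_hrsr_rank">#([\d,]+)</span>')
# Overall rank: the first "#N in " text after <li id="SalesRank">.
OVERALL_RANK = re.compile(r'<li id="SalesRank">.*?#([\d,]+)\s+in\s', re.DOTALL)

FIRST_FRAGMENT = '''<ul class="zg_hrsr">
<li class="zg_hrsr_item">
<span class="zg_hrsr_rank">#187</span>'''

SECOND_FRAGMENT = '''<li id="SalesRank">
<b>Amazon Best Sellers Rank:</b>
#20,873 in Cell Phones & Accessories (<a href="http://ift.tt/1ONRb4t">See Top 100</a>)
<ul class="zg_hrsr">
<li class="zg_hrsr_item">
<span class="zg_hrsr_rank">#249</span>'''


def to_int(text):
    """Convert a rank such as '20,873' to the integer 20873."""
    return int(text.replace(',', ''))


def extract_ranks(html):
    """Return the overall rank (or None) and the list of category ranks."""
    overall = OVERALL_RANK.search(html)
    return {
        'overall': to_int(overall.group(1)) if overall else None,
        'category': [to_int(m) for m in CATEGORY_RANK.findall(html)],
    }


print(extract_ranks(FIRST_FRAGMENT))
print(extract_ranks(SECOND_FRAGMENT))
```

Run on the two fragments, only the second yields an overall rank; the first has category ranks only.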
My question is: what is the best way to compensate for these differences?
I tried to get SalesRank through the API first and fall back to parsing the HTML if that failed, but without success:
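def get_sales_rank(asin):  # function name assumed; the fragment began inside a function
    info = api.item_lookup(asin, ResponseGroup='Medium')
    try:
        ranking = info.Items.Item.SalesRank
        return ranking
    except AttributeError:
        # No SalesRank in the API response: fall back to the product page.
        url = 'http://ift.tt/1D7UJ0d'.format(asin)
        r = requests.get(url)
        soup = BeautifulSoup(r.content)
        y = soup.find_all("li")
        for i in y:
            try:
                ranking = i.find("span", {"class": "zg_hrsr_rank"}).text
                print(ranking)
            except AttributeError:
                # This <li> has no rank span; find() returned None.
                pass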
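For reference, a minimal sketch of the API-first, HTML-second flow in which both paths hand a value back to the caller, along with a note of where it came from. Everything here is hypothetical: `sales_rank_with_fallback` and the two lookup callables are illustrative names, and the fake lookups merely stand in for a real API call and a real page scrape.

```python
def sales_rank_with_fallback(asin, api_lookup, html_lookup):
    """Try the API first, then the HTML scraper; return (rank, source)."""
    rank = api_lookup(asin)
    if rank is not None:
        return rank, 'api'
    rank = html_lookup(asin)
    if rank is not None:
        return rank, 'html'
    # Neither source produced a rank.
    return None, None


# Fake lookups for illustration (assumption: real ones would hit the network).
def fake_api_lookup(asin):
    return {'B0043WCH66': 20873}.get(asin)


def fake_html_lookup(asin):
    return {'B0001UZQWG': 187}.get(asin)


print(sales_rank_with_fallback('B0043WCH66', fake_api_lookup, fake_html_lookup))
print(sales_rank_with_fallback('B0001UZQWG', fake_api_lookup, fake_html_lookup))
```

Passing the lookups in as arguments keeps the fallback logic testable without credentials or network access.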