Tuesday, July 28, 2015

Finding Sales Rank with the Amazon API

I'm attempting to gather various pieces of information on Amazon products that I specify.

Take these two, for example:

ASIN: B0001UZQWG
ASIN: B0043WCH66

Both are ordinary Amazon products, and both show a Sales Rank on their respective product pages.

However, when I request their Sales Rank through Amazon's API, the lookup returns no Sales Rank for the first ASIN but does return one for the second.

import amazonproduct
from bs4 import BeautifulSoup
import requests
config = {
    'access_key': 'YOUR_ACCESS_KEY',    # placeholder: never post real credentials
    'secret_key': 'YOUR_SECRET_KEY',    # placeholder
    'associate_tag': 'imnotsur-20',
    'locale': 'us'
}

api = amazonproduct.API(cfg=config)


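# First ASIN: no SalesRank comes back, so nothing is printed.
result = api.item_lookup('B0001UZQWG', ResponseGroup='SalesRank')
for item in result.Items.Item:
    try:
        print 'Item One: ' + str(item.SalesRank)
    except AttributeError:
        # The item has no SalesRank child element.
        pass

# Second ASIN: SalesRank is present and gets printed.
result2 = api.item_lookup('B0043WCH66', ResponseGroup='SalesRank')
for item in result2.Items.Item:
    try:
        print 'Item Two: ' + str(item.SalesRank)
    except AttributeError:
        pass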

This simply returns:

Item Two: 20873
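Because the `except` branch in the loops above only does `pass`, the first item fails silently and there is no way to tell a missing field from any other problem. Here is a minimal sketch of making the missing field explicit instead. It is written for Python 3 and uses only the standard library. The XML is a made-up, simplified stand-in for an ItemLookup response (no namespace, only the two fields needed), it assumes the first item simply lacks a `SalesRank` element, and `sales_rank_or_none` and `collect_ranks` are hypothetical helpers, not part of `amazonproduct`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for an ItemLookup response.
# Assumption: the first item has no <SalesRank> element; the second does.
SAMPLE_RESPONSE = """
<ItemLookupResponse>
  <Items>
    <Item><ASIN>B0001UZQWG</ASIN></Item>
    <Item><ASIN>B0043WCH66</ASIN><SalesRank>20873</SalesRank></Item>
  </Items>
</ItemLookupResponse>
"""


def sales_rank_or_none(item):
    """Return the item's SalesRank as an int, or None if the element is absent."""
    text = item.findtext("SalesRank")
    return int(text) if text is not None else None


def collect_ranks(xml_text):
    """Map each ASIN in the response to its sales rank (or None)."""
    root = ET.fromstring(xml_text)
    return {
        item.findtext("ASIN"): sales_rank_or_none(item)
        for item in root.iter("Item")
    }


ranks = collect_ranks(SAMPLE_RESPONSE)
for asin, rank in ranks.items():
    print(asin, "->", rank if rank is not None else "no SalesRank in response")
```

Returning `None` rather than swallowing an exception keeps the "no rank" case visible to the caller, which can then decide whether to try another source.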

This issue is rather bizarre. So I checked the HTML of both product pages, and there seem to be major inconsistencies in Amazon's markup that the API is unable to deal with.

This is the element for the first ASIN:

<ul class="zg_hrsr">
    <li class="zg_hrsr_item">
    <span class="zg_hrsr_rank">#187</span> 

And this is for the second:

<li id="SalesRank">
<b>Amazon Best Sellers Rank:</b>
#20,873 in Cell Phones &amp; Accessories (<a href="http://ift.tt/1ONRb4t">See Top 100 in Cell Phones &amp; Accessories</a>)
   <ul class="zg_hrsr">
       <li class="zg_hrsr_item">
           <span class="zg_hrsr_rank">#249</span>

It's apparent that the <li id="SalesRank"> format works and the other format doesn't.
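To compare the two snippets side by side, here is a rough sketch that pulls the numbers out of both with Python 3's standard library. `RankSpanParser`, `category_ranks` and `overall_rank` are hypothetical names, and the `#N in <category>` pattern used by `overall_rank` is only an assumption based on the second snippet, not a documented page layout.

```python
import re
from html.parser import HTMLParser


class RankSpanParser(HTMLParser):
    """Collect the numbers found in <span class="zg_hrsr_rank"> elements."""

    def __init__(self):
        super().__init__()
        self._in_rank_span = False
        self.ranks = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "zg_hrsr_rank" in classes:
            self._in_rank_span = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_rank_span = False

    def handle_data(self, data):
        if self._in_rank_span:
            digits = data.strip().lstrip("#").replace(",", "")
            if digits.isdigit():  # "#249" -> 249
                self.ranks.append(int(digits))


def category_ranks(html):
    """Return every rank held in a zg_hrsr_rank span, in document order."""
    parser = RankSpanParser()
    parser.feed(html)
    parser.close()
    return parser.ranks


def overall_rank(html):
    """Return N from the first '#N in ...' text, or None (assumed layout)."""
    match = re.search(r"#([\d,]+)\s+in\s", html)
    return int(match.group(1).replace(",", "")) if match else None


# The two snippets quoted above, copied as-is (both are truncated excerpts).
FIRST = '''<ul class="zg_hrsr">
    <li class="zg_hrsr_item">
    <span class="zg_hrsr_rank">#187</span>'''

SECOND = '''<li id="SalesRank">
<b>Amazon Best Sellers Rank:</b>
#20,873 in Cell Phones &amp; Accessories (<a href="http://ift.tt/1ONRb4t">See Top 100 in Cell Phones &amp; Accessories</a>)
   <ul class="zg_hrsr">
       <li class="zg_hrsr_item">
           <span class="zg_hrsr_rank">#249</span>'''

print(category_ranks(FIRST), overall_rank(FIRST))
print(category_ranks(SECOND), overall_rank(SECOND))
```

One thing this makes visible: in the second snippet the `zg_hrsr_rank` span holds #249, while the 20873 printed by the API matches the text directly under `<li id="SalesRank">`. So the span value and the API's SalesRank are not the same number, at least for this product.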

My question is: what is the best way to compensate for these differences?

I tried to get SalesRank through the API first and fall back to parsing the HTML when that failed, but without success:

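    # Ask the API for the rank first.
    info = api.item_lookup(asin, ResponseGroup='Medium')

    try:
        ranking = info.Items.Item.SalesRank
        return ranking
    except AttributeError:
        # No SalesRank in the API response: fall back to the product page.
        url = 'http://ift.tt/1D7UJ0d'.format(asin)
        r = requests.get(url)
        soup = BeautifulSoup(r.content, 'html.parser')

        for li in soup.find_all("li"):
            try:
                ranking = li.find("span", {"class": "zg_hrsr_rank"}).text
                print ranking
            except AttributeError:
                # find() returned None: this <li> has no rank span.
                pass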
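Note that the fallback branch in the snippet above prints each rank it finds but never returns one, so the caller gets `None` whenever the API has no rank. A minimal sketch of the API-first, HTML-second pattern that always hands a value back is shown here, in Python 3. `sales_rank_with_fallback` and the two lookup callables are hypothetical stand-ins (in real use they would wrap `api.item_lookup` and the page scraping), and the canned dictionaries just reuse the numbers quoted in this post.

```python
def sales_rank_with_fallback(asin, api_lookup, html_lookup):
    """Try api_lookup(asin) first, then html_lookup(asin).

    Both callables are hypothetical stand-ins: each takes an ASIN and
    returns an int rank, or None when no rank could be found.
    Returns a (rank, source) tuple so the caller knows where it came from.
    """
    rank = api_lookup(asin)
    if rank is not None:
        return rank, "api"
    rank = html_lookup(asin)
    if rank is not None:
        return rank, "html"
    return None, None


# Stub lookups with canned data taken from the examples in this post.
API_RANKS = {"B0043WCH66": 20873}  # the API returned a rank only for this ASIN
HTML_RANKS = {"B0001UZQWG": 187, "B0043WCH66": 249}  # zg_hrsr_rank span values

results = {
    asin: sales_rank_with_fallback(asin, API_RANKS.get, HTML_RANKS.get)
    for asin in ("B0001UZQWG", "B0043WCH66")
}
print(results)
```

Returning the source alongside the rank matters here because the scraped span value may be a sub-category rank rather than the overall Sales Rank, so the two should not be mixed without labeling them.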



