Sunday, July 26, 2015

Is there a way to compensate for varying information location when parsing HTML?

I'm trying to gather the sales rank of Amazon products, and I've noticed that the API functions only return the SalesRank if the Sales Rank value in the HTML sits in a specific container with a specific parent. After looking at several random Amazon product pages, I found that the HTML is not very consistent.

Is there any way to compensate for this? I'm parsing with Beautiful Soup 4 in Python.
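
One possible way to cope with this, sketched under assumptions rather than tested against real Amazon pages: try a short list of known containers first, then fall back to searching the page text for the rank pattern itself, so the result no longer depends on which parent holds the value. The selectors in `CANDIDATE_SELECTORS`, the function name `extract_sales_rank`, and the assumed "#1,234 in Category" text format are all hypothetical and would need adjusting to the markup actually seen.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical container selectors; replace with the ones observed in practice.
CANDIDATE_SELECTORS = ["#SalesRank", "#detail-rank"]

# Assumed text format: "#1,234 in Books". Captures the number with commas.
RANK_RE = re.compile(r"#\s*(\d[\d,]*)\s+in\b")


def _rank_from_text(text):
    """Return the first rank found in text as an int, or None."""
    match = RANK_RE.search(text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def extract_sales_rank(html):
    """Try known containers first, then fall back to the whole page text."""
    soup = BeautifulSoup(html, "html.parser")

    # Pass 1: containers we already know about.
    for selector in CANDIDATE_SELECTORS:
        node = soup.select_one(selector)
        if node is not None:
            rank = _rank_from_text(node.get_text(" ", strip=True))
            if rank is not None:
                return rank

    # Pass 2: location-independent search over all visible text.
    return _rank_from_text(soup.get_text(" ", strip=True))
```

The fallback pass trades precision for robustness: it can pick up an unrelated "#N in ..." string elsewhere on the page, so it may be worth narrowing it, for example by only accepting matches near a "Sales Rank" or "Best Sellers Rank" label.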



