lundi 31 août 2015

How to efficiently implement a newsletter system to prevent duplicate e-mails?

We run an e-commerce site with approximately 500,000 active customers. We regularly send newsletters via an external mail service (http://ift.tt/1lFeFI1) however as the company grows and we get more customers, our newsletter costs increase. I recently came across Amazon SES which is perfect for us as they are cheap and since we are already using AWS for our servers, it makes things a little easier to maintain. I know there's cheaper alternatives, but I would prefer to stick to Amazon SES.

Let's say I create a newsletter in my system that sends an e-mail to everyone that's having a birthday today. It just so happens that 400,000 people have a birthday today, so 400,000 e-mails get sent. The next day, the remaining 100,000 people have birthdays as well as 100,000 more people that already received the birthday e-mail yesterday but changed their birth date to this day because they entered it incorrectly when they signed up. The system now sees 200,000 e-mails to send, but should know to only send to 100,000 of them because 100,000 already received the e-mail for the time period (365 days).

The problem I'm having is regarding database and script efficiency.

On the first day, I store the e-mail addresses of 400,000 people in a table called "sent". On the second day, I retrieve the list of all e-mail addresses to send to. I then have the following options:

  • Query the "sent" table 200,000 times to see which e-mail addresses to send to - Extremely inefficient, plus unnecessary load on the server that is running the site itself.
  • Retrieve the entire "sent" table, store into an array and filter accordingly - Fast (just 1 DB query), but requires a lot of RAM. This is currently how I do things, but obviously I receive PHP memory exhausted errors, so it's no longer viable, especially as the site grows.
  • Sort the 200,000 e-mail addresses alphabetically and query the "sent" table in batches, such as all of the A e-mail addresses, all of the B e-mail addresses, and so on. - Fast and should use less RAM depending on the batch conditions, but what if all 200,000 e-mail addresses begin with the letter S?
  • Something else?!

I realize that these are very extreme examples that will never happen in a real environment, but I would prefer to (re)implement this correctly now and not have to revisit it again in the future.

If Amazon SES has this functionality built in, I would love to know where I can find it! I've searched their docs but I can't see it.

Does anyone have any experience in this field, or have any suggestions? I've been racking my brain for days trying to come up with a good solution, but I just can't come up with anything.

HELP! :)


EDIT: I'm using MySQL. I've looked into No-SQL but don't feel it's necessary.

EDIT: The birthday thing is just an example of a newsletter. It should be a completely generic system that does not rely on any predefined notions of what the data contains.

EDIT: The server already implements good e-mail practices such as SPF, DKIM, and so forth.




Aucun commentaire:

Enregistrer un commentaire