Handling millions of ad requests per hour
The diagram below summarizes the implementation of the ad targeting system. Ad requests and responses are handled by an Apache server. Each ad request is saved to a log file on the filesystem, and a new log file is generated every hour. The system is designed to handle at least a million requests per hour, and the maximum server response time must not exceed 50 milliseconds.
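To make the hourly log rotation and the load target concrete, here is a minimal sketch. The log directory, the file naming scheme, and both function names are assumptions made for illustration and do not come from the actual system; only the one-file-per-hour rule and the million-requests-per-hour target are taken from the description above.

```python
from datetime import datetime

# Design target stated in the text: at least a million requests per hour.
REQUESTS_PER_HOUR = 1_000_000


def hourly_log_path(ts: datetime, log_dir: str = "/var/log/adserver") -> str:
    """Return the log file for the hour that contains `ts`.

    The directory and the naming pattern are hypothetical; the only
    property taken from the text is that each hour gets its own file.
    """
    return f"{log_dir}/ad-requests-{ts:%Y%m%d-%H}.log"


def average_requests_per_second(requests_per_hour: int) -> float:
    """Convert an hourly request volume into an average per-second rate."""
    return requests_per_hour / 3600


if __name__ == "__main__":
    print(hourly_log_path(datetime(2015, 3, 7, 14, 59, 59)))
    print(round(average_requests_per_second(REQUESTS_PER_HOUR), 1))
```

A million requests per hour averages out to roughly 278 requests per second. Real traffic is rarely uniform, so peak rates would be higher than this average.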
Kafka is used to handle high loads. Cassandra is used as a scalable NoSQL solution. MySQL is used to store summary data in the system.
Synchronization of Hadoop Tasks
Summary data is also generated daily using the Amazon EMR implementation of Hadoop, with Amazon SWF for task synchronization. For reporting needs, recent data that has not yet been summarized is fetched from Cassandra. The workflow below explains the implemented flow.
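The reporting idea described above, daily summaries plus recent data that has not been summarized yet, can be sketched as follows. This is only an illustration under stated assumptions: plain in-memory structures stand in for the summary store and for Cassandra, and the function name, the per-campaign grouping, and the data shapes are hypothetical rather than the system's actual schema.

```python
from collections import Counter
from datetime import date, datetime


def combined_counts(daily_summary, recent_events, summarized_through):
    """Merge daily summary counts with raw events that are not yet summarized.

    daily_summary: {(day, campaign_id): count}, a stand-in for summary rows.
    recent_events: iterable of (event_time, campaign_id), a stand-in for raw
        rows fetched from Cassandra.
    summarized_through: last day already covered by the daily summary job.
    All three shapes are assumptions made for this sketch.
    """
    totals = Counter()
    for (day, campaign_id), count in daily_summary.items():
        if day <= summarized_through:
            totals[campaign_id] += count
    for event_time, campaign_id in recent_events:
        # Count only events newer than the summary cutoff, so that nothing
        # is counted twice.
        if event_time.date() > summarized_through:
            totals[campaign_id] += 1
    return dict(totals)


if __name__ == "__main__":
    summary = {(date(2015, 3, 6), "c1"): 100, (date(2015, 3, 6), "c2"): 40}
    events = [
        (datetime(2015, 3, 7, 9, 0), "c1"),
        (datetime(2015, 3, 7, 9, 5), "c1"),
        (datetime(2015, 3, 6, 23, 59), "c2"),  # already inside the summary
    ]
    print(combined_counts(summary, events, date(2015, 3, 6)))
```

The cutoff date keeps the two sources disjoint: anything on or before the last summarized day comes from the summary, and anything after it comes from the raw events.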