Cytoscape Web Graphs -> Force Directed Network Graph

Posted in API, Big Data, Web

The graph is used for analysis of ad targeting data. Relevant data is presented at the campaign level, with targeted keywords and URLs linked as related nodes in the graph.

The challenge here is that the graph can contain roughly 176,000 data points, which makes rendering extremely slow. Performing calculations over 176,000 points is also extremely time consuming.

So the proposed solution was to perform the calculations as a background process, which could be refreshed at any time at the click of a button. Background processing was sufficient for the data-freshness requirements.

To give an idea of the calculations involved, the background process took 8 hours to complete on a dedicated server with 4GB RAM. The server was configured to allow long-running background processes.

Force Directed Network Graph

The data on the graph is restricted by a minimum number of related nodes, which can be specified by the administrator. For example, only those data points related to at least 5 other keywords will be shown. The top matching results can also be filtered.
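The filtering rule above can be sketched in a few lines. This is a hypothetical illustration, not the project's actual code; the edge-list input shape, `min_degree`, and `top_n` are assumptions.

```python
# Keep only keywords related to at least `min_degree` other keywords,
# then optionally keep just the top-N best-connected results.

def filter_nodes(edges, min_degree=5, top_n=None):
    """edges: list of (keyword_a, keyword_b) pairs from the campaign data."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    kept = [node for node, d in degree.items() if d >= min_degree]
    kept.sort(key=lambda n: degree[n], reverse=True)  # best-connected first
    return kept[:top_n] if top_n is not None else kept
```

Pre-filtering like this in the background job means the browser only ever receives the reduced node set, which is what keeps the force-directed layout responsive.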

Squash Logs – Amazon Web Services

Posted in Big Data, Cloud

Right from the start, this project required a design that scales in the cloud.

The data-store needed to be cloud based.

The concept is that on a web request, the HTML page is served as a static file from CloudFront, which then uses JavaScript to populate the Opponents and Locations. So each user never hits the web server or the database at all, unless they add a new Opponent/Location or save a game result. On saving, a call to DynamoDB is made to store the squash game data.

Basically, each user has two data sets – an opponent list and a location list – and can add values to either at any time through the log form.

For each user, we created a JSON location file and a JSON opponent file, stored on S3/CloudFront. User data therefore loads very fast, with no server load. Whenever a user edits this data, the JSON files are updated via an Amazon Web Services API call.

An example of the implemented URL format would be:  Here “user1.json” is a random name, so users cannot enumerate other users’ opponents or locations by guessing file names.
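Generating an unguessable file name is straightforward with a cryptographically strong token. This is a sketch of the idea, not the project's actual naming scheme; the prefix and token length are assumptions.

```python
import secrets

def make_user_filename(prefix="user", entropy_bytes=16):
    # token_urlsafe(16) yields ~128 bits of entropy, encoded URL-safe,
    # e.g. "user-9f8aX2...json" -- infeasible to guess or enumerate.
    return f"{prefix}-{secrets.token_urlsafe(entropy_bytes)}.json"
```

Because the file sits behind CloudFront with no directory listing, the random name is the only thing protecting it, so the token must come from `secrets` (or equivalent), never from a predictable counter or timestamp.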

Once the user clicks save, the data is sent to the DynamoDB data store via the Amazon Web Services API. The admin can also update or delete these logs via API calls.
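A saved game could be shaped as a DynamoDB item along these lines. The table schema and attribute names here are assumptions for illustration; the actual write would go through the AWS API (e.g. boto3's `put_item`).

```python
# DynamoDB's low-level API types every attribute: "S" = string, "N" = number.
# Attribute names below are hypothetical, not the project's real schema.

def build_game_item(user_id, opponent, location, score, played_at):
    return {
        "user_id":   {"S": user_id},
        "played_at": {"S": played_at},   # ISO-8601 timestamp, e.g. sort key
        "opponent":  {"S": opponent},
        "location":  {"S": location},
        "score":     {"S": score},       # e.g. "3-1"
    }
```

Keying on user plus timestamp lets both the user and the admin query one player's full game history in a single range query.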

You can read more about this project here.

Adserver – Pretargeting

Posted in Big Data, Cloud

Handling millions of ad requests per hour

The diagram below summarizes the implementation of the ad targeting system. Ad requests and responses are handled by an Apache server. Each ad request is saved in a log file on the filesystem, with a new log file generated every hour. The system is designed to handle at least a million requests per hour, and the maximum server response time must not exceed 50 milliseconds.
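The hourly log rotation can be sketched as a path whose name encodes the current hour, so a new file simply starts when the hour changes. The directory and file-name format are assumptions, not the project's actual layout.

```python
from datetime import datetime

def log_path_for(ts, base="/var/log/adserver"):
    # e.g. /var/log/adserver/requests-2014-03-07-13.log
    return f"{base}/requests-{ts.strftime('%Y-%m-%d-%H')}.log"

def log_request(line, ts=None, base="/var/log/adserver"):
    path = log_path_for(ts or datetime.utcnow(), base)
    with open(path, "a") as f:   # append is cheap; keeps response time low
        f.write(line + "\n")
```

Encoding the hour in the file name means the completed previous hour's file can be shipped downstream (to Kafka/Hadoop) without any locking against the writer.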

Kafka is used to handle high loads. Cassandra is used as a scalable NoSQL solution. MySQL is used to store summary data in the system.
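One detail worth sketching is how events could be keyed for Kafka so that all requests for a given campaign land on the same partition, which keeps per-campaign aggregation local to one consumer. The partition count and choice of campaign ID as the key are assumptions; a real producer library applies its own partitioner, and this only illustrates the hashing idea.

```python
import hashlib

def partition_for(campaign_id, num_partitions=12):
    # Stable hash -> partition index; md5 keeps the mapping consistent
    # across processes (unlike Python's randomized built-in hash()).
    digest = hashlib.md5(campaign_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```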


Synchronization of Hadoop Tasks

Summary data is also generated daily using the Amazon EMR implementation of Hadoop, with Amazon SWF for task synchronization. For reporting, recent data that has not yet been summarized is fetched from Cassandra. The workflow below explains the implemented flow.
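The reporting read path described above amounts to merging two sources: already-summarized days (from MySQL) and today's not-yet-summarized counts (from Cassandra). A minimal sketch, with assumed data shapes:

```python
def merge_report(summary_rows, recent_rows):
    """Both inputs: dicts mapping (date, campaign_id) -> impression count.
    summary_rows comes from the daily Hadoop summary; recent_rows from
    the live store. Overlapping keys are summed."""
    merged = dict(summary_rows)
    for key, count in recent_rows.items():
        merged[key] = merged.get(key, 0) + count
    return merged
```

Summing on overlap (rather than preferring one source) assumes the summary job and the live store never double-count the same hour; in practice the cut-over boundary between the two sources is the part SWF's task synchronization has to get right.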

workflow diagram