Algorithms, Design, Code and more: Amazon

Showing posts with label Amazon. Show all posts

Wednesday, April 17, 2013

Upload to Amazon S3 Bucket via Signed Url with Server Side Encryption

Continuing further from my previous post on upload & download from Amazon S3 bucket via signed url's, here is how to enable Server Side Encryption (SES) with the file being uploaded to S3.

Add a x-amz-server-side-encryption request parameter with the GeneratePresignedUrlRequest before getting the signed url:

Monday, April 1, 2013

Upload and Download from Amazon AWS S3 Bucket via Signed Url

While the code snippets are using the Java AWS SDKs, principally these will work with the other SDKs as well.

1. Get hold of FederatedCredentials using your AWS credentials:

Pass in proper access Policy settings for the FederatedCredentials on the S3 Bucket and/ or Item.

E.g.

For Download you could additionally set up ResponseHeaderOverrides for withContentDisposition, ContentType, etc.

2. Get BasicSessionCredentials using the Federated Credentials

3. Generate GeneratePresignedUrlRequest

4. Finally, generate a pre-signed url via the S3Client object:

5. To test this:
- Download:
Get the url.toString() & hit it from a browser

- Upload:

Wednesday, March 20, 2013

Uploading Large Files In Chunks To Amazon S3

A collection of best practices based on my experience building a scaled out solution for the server side file upload handler.

1. Authentication/ Authorization

2. Chunking

3. Stateless upload & Session

4. Shared memory for post file operations

5. Retries & Failover

6. Bulk operations

To be completed..

Wednesday, February 6, 2013

Mocking AWS ELB Behaviour Locally For Testing

Once hosted out of Amazon, you make use of the AWS Elastic Load Balancer (ELB) for balancing load across your EC2's within or acroos Availability Zones (AZ). Since code gets developed and tested locally (outside of Amazon), at times you might want to test load balancer scenarios before deploying to production. Here's one way to mock up the load balancer behaviour for local testing.

Use Apache (you could very well use something like Nginx instead) in a reverse proxy, load balancer set up via mod_proxy & mod_proxy_balancer. Fairly simple for anyone with slight experience with configuring Apache. We used Apache as a load balancer front-end to IIS on local, exactly the way ELB would load balance in front of production IIS.

Additionally, since ELB was also an SSL end point for our production servers, we set up Apache to be the SSL end point (via mod_ssl) on local. Apache was configured to listen on port 443 (using a self-signed certificate), and would forward all traffic from port 443 to backend IIS on port 80.

Once we had that set-up going, we were quickly able to reproduce an issue with application generated Secure cookies not getting set properly across client request/ response. Once we had the fix on the local (which was to set the flag on the cookies in the request, not response) the same worked flawlessly on the AWS as well.

Wednesday, October 10, 2012

Brewer's CAP Theorem

Brewer's CAP theorem talks about Consistency (C), Availability (A), Partition (P) tolerance, as the constraints that primarily govern the design of all distributed systems. There's a lot of literature available online explaining the theorem. The summary is that given that network partitions (P) will happen, pick one of the other two - Consistency (C) or Availability (A) for designing your system on a case by case basis (since you can't have all three)!

A partition could be caused by the failure of some kind of component - hardware (routers, gateway, cables, physical boxes/ nodes, disks, etc.) and/or software. When that happens:

- If you pick Consistency (C) => All your systems, processing, etc. is blocked/ held up until the failed component(s) recovers.

This has been the default with traditional RDBMS (thanks to their being ACID compliant). For financial & banking applications this normally has to be the choice.

- On the other hand, if you pick Availability (A) => All systems, other than the currently partitioned/ failed systems, continue to function as is within their own partitions. Seems good? Well not quite, cause this obviously results in inconsistencies across the two (or more) partitioned sections.

Systems thus designed with Availability (A) as their selection (over C), must be able to live with inconsistencies across different partitions. Such systems also have some automated way to later get back to consistent state (eventual consistency) once the partitioned/ failed systems have recovered.

This is mostly the design choice with the NoSqls. Also with services such as Amazon AWS where eventual consistency within some reasonable time window (of a few seconds to a minutes) is acceptable.