Wednesday, November 27, 2013

Learning Guava -- Google Guava blog series

I have been a huge fan of Google Guava since I came across it three years ago.
For starters, Guava is a project containing many of Google's core Java libraries: collections, caching, math, primitives, concurrency, networking, common annotations, string processing, I/O, reflection and many others.
It is a very well-designed API, designed, implemented and maintained by Google engineers such as Kevin Bourrillion and Kurt Alfred Kluever.

Guava follows almost all the excellent patterns and practices from Effective Java by Joshua Bloch, who designed the impeccable Java Collections API while at Sun and later joined Google. Under his mentorship, Google Guava got wings and became a very well-designed and effective API, useful in many situations and scenarios, with an ever-growing feature list. Adding the Guava dependency is the first thing I do in my Gradle or Maven build scripts. Guava makes Java code a lot more readable, clean, simple and elegant, and it makes very good use of Java generics.

Consider the following example, which I tweeted a few months back.
[Image: Google Guava sample code]

Which of the two versions looks better? Obviously the second, isn't it?
There are many such examples where Guava wins by a clear margin over plain Java code or other libraries such as Apache Commons.
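As an illustration of the kind of comparison I mean (this is not the original tweet, and the class and method names are my own), here is joining non-null strings with a separator, first in plain Java and then with Guava's Joiner:

```java
import com.google.common.base.Joiner;

public class JoinerDemo {

    // Plain Java: a manual loop with null checks and separator bookkeeping.
    static String joinPlain(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            if (part == null) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(part);
        }
        return sb.toString();
    }

    // Guava: one readable line; skipNulls() takes care of the null elements.
    static String joinGuava(String[] parts) {
        return Joiner.on(", ").skipNulls().join(parts);
    }

    public static void main(String[] args) {
        String[] parts = {"Larry", null, "Sergey"};
        System.out.println(joinPlain(parts)); // Larry, Sergey
        System.out.println(joinGuava(parts)); // Larry, Sergey
    }
}
```

Both methods produce the same result, but the Guava version states its intent directly instead of burying it in loop mechanics.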

Guava also helps, in a way, with functional programming; it has a few options that are really helpful there as well. Having said that, Guava's creators implore developers not to litter code with too much functional style, which can lead to unreadable code.
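As a sketch of that functional flavor (my own example, not taken from the Guava docs), here is filtering and transforming a list with FluentIterable, used sparingly as the Guava authors advise:

```java
import com.google.common.base.Function;
import com.google.common.base.Predicates;
import com.google.common.collect.FluentIterable;
import com.google.common.collect.ImmutableList;

import java.util.Arrays;
import java.util.List;

public class FluentDemo {

    // Keep the non-null words and map each one to its length.
    static ImmutableList<Integer> wordLengths(List<String> words) {
        return FluentIterable.from(words)
                .filter(Predicates.notNull())
                .transform(new Function<String, Integer>() {
                    @Override
                    public Integer apply(String word) {
                        return word.length();
                    }
                })
                .toList();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("guava", null, "java");
        System.out.println(wordLengths(words)); // [5, 4]
    }
}
```

The chain reads top to bottom as a small pipeline; anything more elaborate than a filter and a transform is usually clearer as a plain loop.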

I will start writing a few posts on Google Guava under the tag "LearningGuava". I have been using Guava extensively in almost every project of mine for a few years now. This will not only help anyone looking for information on, or starting out with, Google Guava, but will also serve me as a quick reference whenever I need a snippet for some specific Guava usage. With that motivation, I hope it will be a good experience for you and me alike.

This post will list all the posts written for Google Guava, serving as an index and quick reference for my Google Guava posts.

  1. Load properties file using Guava
  2. Calibrating time using Stopwatch 

Update on 5th April, 2015: After a fair bit of time here, I have moved on to GitHub-hosted Octopress blogs. Please find me there henceforth for all new updates.

Monday, July 15, 2013

Presented Storm at The Fifth Elephant, 2013

On 11th July, 2013, I hosted a workshop on "Big Data, Real-time Processing and Storm" at The Fifth Elephant, 2013 here in Bangalore.

I have also uploaded the slides of the presentation deck to SpeakerDeck.

The source code I used in this workshop can be found on my GitHub account.

I have also started curating a bundle for Storm and Big Data as such.


Saturday, May 25, 2013

My upcoming workshop on Storm at The Fifth Elephant, 2013

The Fifth Elephant is a conference in Bangalore, India, which focuses on Big Data and Analytics. It's a community-powered conference. This means, as highlighted on their website, it is "Of the Community, By the Community, For the Community". So anyone can propose a session for the conference in the Funnel, and participants who have purchased conference tickets can vote on session proposals.

In the 2012 edition, The Fifth Elephant, with more than 50 sessions, attracted 600+ participants from MNCs and startups alike. The 2-day conference was preceded by a one-day workshop due to the overwhelming demand. The biggest USP of any of the HasGeek-organized conferences [other than being community powered] is that they live-stream most of the sessions and also upload all the recorded videos to YouTube and/or HasGeek TV.

I gave a session on Introduction to Pig at The Fifth Elephant last year. I have written about it previously here.

The Fifth Elephant is back this year. It's even better, with a dedicated day for workshops at the same venue, the regular 2-day conference on Big Data, Storage and Analytics, and also product demos and hacker corners. There are some wonderful sessions proposed, including ones on Neo4j and Julia, and a few sessions have already been selected by the Program Committee.

I have been working on Big Data, especially the Hadoop ecosystem, for more than two years now. I am fascinated by Big Data and the various tools and frameworks which help analyze such large amounts of data. During this time, I came across Storm, which not only analyzes Big Data, but analyzes it in real time, very unlike Hadoop, which is basically batch processing. I worked on a couple of use cases and processed live streaming data in real time using Storm, with streaming tweets as my main source of real-time data.

This year I proposed a session on Storm titled "Big Data, Real-time Processing and Storm", and it has been accepted as the first workshop this time around. I will be speaking on 11th July, 2013. It will be a live-coding session, which will help the participants understand and appreciate Storm as one of the better alternatives to Hadoop. Below is the outline of this workshop.

I have also uploaded the slides of the outline of this workshop to SpeakerDeck.

Check the above slides and do let me know if you have any feedback and/or comments on this outline for the workshop.

Wish me good luck at @P7h. And if you happen to be at the conference, do come and say hi.

Please find the complete slides of this workshop session here.


Monday, May 20, 2013

Open Source licenses

Understanding Open Source licenses turns out to be rather difficult. At least, I have always had trouble understanding which Open Source licenses are too restrictive and which are a bit more liberal.

After looking around for some time, I found the following three resources for easily understanding the terms of a few of the Open Source licenses.


Brian Fitzpatrick and Dan Bentley have made a brilliant flowchart for Open Source licenses. It is pretty simple and easy to understand.


Marakana has another interesting flowchart for understanding Open Source licenses.


And finally, another option for understanding Open Source licenses is the tl;drLegal website, which summarizes and explains Open Source licenses in simple terms and plain English. It's a pretty decent website: fast, intuitive and easy to use. Just key in the name of the license you want to read about, and the website will do the rest, giving you a quick summary and also the full text of that particular license.

I always turn to one of these resources when I am in doubt about the licensing terms of a particular Open Source license. And when I need to check the complete text of a license, I usually use the tl;drLegal website.

Hope this is of some help to you too.


Saturday, January 26, 2013

My presentation on "Introduction to Pig"

A few months back, on 26th July, 2012, I conducted a 2-hour workshop on "Introduction to Pig" at The Fifth Elephant in Bangalore, India, a community-powered conference on the Big Data ecosystem.

As part of this workshop, I touched briefly on Hadoop, MapReduce and Hive, but as the title says, the focus was on Apache Pig. I demoed a few use cases of executing Java MapReduce, Hive and Pig, and also gave a brief overview and demo of Twitter's Ambrose UI for visualizing Pig MapReduce jobs.

Here are the slides of my presentation. This presentation gives a basic understanding of
  1. Big Data
  2. Basics of Hadoop and MapReduce
  3. Landscape of Hadoop ecosystem
  4. Introduction to Apache Pig
  5. Basics of Pig and Pig Latin
  6. Pig vs. Hadoop MR
  7. Pig vs. SQL and Pig vs. Hive
  8. Twitter Ambrose for visualizing Pig MR Jobs

I have also posted the same slides on Speaker Deck.
Code developed for the demos in this workshop can be found on GitHub.


Sunday, January 20, 2013

Aaron Swartz Memorial at New York

Aaron Swartz, a very accomplished and highly talented nerd, committed suicide last Friday, i.e. 11th January, 2013, at the young age of 26. So many people have written so much about him. Check his complete Wikipedia page, and its References section, for more info and other articles written by his friends. He left wonderful impressions through Reddit, Creative Commons, the fight against SOPA, Markdown and RSS, to name a few of his outstanding contributions.
Aaron Swartz
Aaron's memorial was held on Saturday, 19th January, 2013 at Cooper Union, New York. Here is the complete set of recorded videos from his memorial, hosted on Livestream. There are six videos in total, which you must check out to understand his contributions, how talented a person he was, and what impressions he left people with, at such a young age.
If you are hard-pressed for time, then ensure you at least watch the message from Taren Stinebrickner-Kauffman, Aaron's partner, embedded below. It is really inspiring.

Rest in peace Aaron!!