Easy explanation of the difference between Spark’s aggregate functions (reduceByKey, groupByKey and combineByKey)
Spark comes with many easy-to-use aggregate functions out of the box, which is one reason Spark is such a powerful technology for ETL on BigData. Grouping data is a very common use case in the world of ETL. Just […]
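To make the difference concrete, here is a minimal plain-Python model of the three operations. It is not Spark itself: the function names `combine_by_key`, `reduce_by_key`, `group_by_key` and `map_side_combine`, and the representation of an RDD as a list of partitions, are hypothetical and only mirror Spark's RDD methods. In Spark, `combineByKey` takes three functions (`createCombiner`, `mergeValue`, `mergeCombiners`); `reduceByKey` merges values inside each partition before the shuffle, while `groupByKey` sends every value across the network.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical model: an "RDD" is a list of partitions, each a list of (key, value) pairs.
Pair = Tuple[Any, Any]
Partitions = List[List[Pair]]


def map_side_combine(partitions: Partitions,
                     create_combiner: Callable[[Any], Any],
                     merge_value: Callable[[Any, Any], Any]) -> List[Dict[Any, Any]]:
    """Within each partition, fold all values of a key into a single combiner."""
    combined = []
    for part in partitions:
        local: Dict[Any, Any] = {}
        for key, value in part:
            if key in local:
                local[key] = merge_value(local[key], value)
            else:
                local[key] = create_combiner(value)
        combined.append(local)
    return combined


def combine_by_key(partitions: Partitions,
                   create_combiner: Callable[[Any], Any],
                   merge_value: Callable[[Any, Any], Any],
                   merge_combiners: Callable[[Any, Any], Any]) -> Dict[Any, Any]:
    """Map-side combine, then merge per-partition combiners for each key
    (this loop stands in for the shuffle and the reduce side)."""
    result: Dict[Any, Any] = {}
    for local in map_side_combine(partitions, create_combiner, merge_value):
        for key, comb in local.items():
            result[key] = merge_combiners(result[key], comb) if key in result else comb
    return result


def reduce_by_key(partitions: Partitions, func: Callable[[Any, Any], Any]) -> Dict[Any, Any]:
    """reduceByKey: combineByKey where the combiner is just the value itself."""
    return combine_by_key(partitions, lambda v: v, func, func)


def group_by_key(partitions: Partitions) -> Dict[Any, List[Any]]:
    """groupByKey: no map-side reduction, every value is collected per key.
    (In real Spark the order of values within a group is not guaranteed.)"""
    grouped: Dict[Any, List[Any]] = defaultdict(list)
    for part in partitions:
        for key, value in part:
            grouped[key].append(value)
    return dict(grouped)


if __name__ == "__main__":
    data = [
        [("a", 1), ("b", 2), ("a", 3)],  # partition 0
        [("a", 4), ("b", 5), ("b", 6)],  # partition 1
    ]
    print(reduce_by_key(data, lambda x, y: x + y))  # {'a': 8, 'b': 13}
    print(group_by_key(data))                       # {'a': [1, 3, 4], 'b': [2, 5, 6]}

    # combineByKey with a (sum, count) combiner, e.g. to compute a per-key average
    sums = combine_by_key(data,
                          lambda v: (v, 1),
                          lambda acc, v: (acc[0] + v, acc[1] + 1),
                          lambda a, b: (a[0] + b[0], a[1] + b[1]))
    print({k: s / n for k, (s, n) in sums.items()})

    # Records crossing the (simulated) shuffle: combining ops vs groupByKey
    combined = map_side_combine(data, lambda v: v, lambda x, y: x + y)
    print(sum(len(d) for d in combined), "vs", sum(len(p) for p in data))  # 4 vs 6
```

The last line shows why, for large datasets, reduceByKey or combineByKey is usually preferred over groupByKey followed by a reduction: combining values inside each partition first means fewer records are shuffled.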
Month: May 2016
How to Insert Data into a Remote Hive Server from Spark
Spark is the buzzword in the world of BigData now. So what makes Spark so unique? As we know, Spark is fast: it uses in-memory computation on special data objects called RDDs (Resilient Distributed Datasets). Spark can run in multiple modes: standalone, local (without even a Hadoop server), on a cluster […]