Word counting with a Spark program from the Windows 7 command prompt. Create a file named “SPARK_WORD_COUNT” and save it on the C drive. Here we count the occurrences of the word “HADOOP” in the saved file. I added 17 “HADOOP” words to this file, and at the end of the steps the Spark program counts 17 “HADOOP” […]
Category: Spark
Spark Getting started – Local development using eclipse
This article will help you jump-start Spark development on your PC or laptop (Windows) without a fully functional Hadoop cluster installed. I have a Windows 10 machine with 8 GB of RAM and 128 GB of storage. These days I try to isolate development in various […]
Apache Spark – tuning spark jobs-Optimal setting for executor, core and memory
Executor, memory and core settings for optimal performance on Spark. Spark has been adopted by tech giants to bring intelligence to their applications. Predictive analysis and machine learning, along with traditional data warehousing, use Spark as the execution engine behind the scenes. I have been exploring Spark since its incubation and I have used Spark core […]
Jumpstart Real-Time Projects with Apache Spark
Real-time analytics and fast big data processing are now a reality with Apache Spark and Talend 6. The Talend Real-Time Big Data Sandbox is the easiest, fastest and most powerful way to get data into Spark, Hadoop and NoSQL. With the Talend Sandbox, you can jumpstart real-time streaming and operational big data projects such as […]
Getting Started with Spark (in Python)
Hadoop is the standard tool for distributed computing across really large data sets and is the reason why you see “Big Data” on advertisements as you walk through the airport. It has become an operating system for Big Data, providing a rich ecosystem of tools and techniques that allow you to use a large cluster […]