Hadoop And Big Data

Published in: Big Data & Hadoop

Important Points On PIG Programming.

Priyashree B / Mumbai

35 years of teaching experience

Qualification: M.Tech (RGPV BHOPAL, MP - 2016)

Teaches: Mental Maths, All Subjects, EVS, Mathematics, School Level Computer, Science, Social Studies

  1. Pig
     - If you don't know Java, Pig is a good option: roughly 200 lines of MapReduce code can be replaced by about 10 lines of Pig.
     - Pig has built-in operations such as join, group, filter, and sort.

     Who developed Pig? Yahoo.
     - Pig is an open-source, high-level dataflow system.
     - It provides a simple language for queries and data manipulation, Pig Latin, which is compiled into MapReduce jobs that run on Hadoop.
     - Companies like Yahoo, Google, and Microsoft collect enormous data sets in the form of click streams, search logs, and web crawls. All of this information needs some form of ad-hoc processing and analysis.

     Why was Pig created?
     - To provide an ad-hoc way of creating and executing MapReduce jobs on very large data sets.
     - No Java is required.
     - Rapid development.

     Why should I go for Pig if MapReduce is there?
     - Development time is roughly 1/16 of that for MapReduce.
     - Lines of code are roughly 1/20 of those for MapReduce.

     MapReduce
     - A powerful model for parallelism.
     - Based on a rigid procedural structure.
     - Provides a good opportunity to parallelize algorithms.
  2. Pig
     - It is desirable to have a higher-level declarative language, similar to an SQL query, where the user specifies the "what" and leaves the "how" to the underlying processing engine.

     Where should I use Pig?
     - Pig is a dataflow language.
     - It sits on top of Hadoop and lets you create jobs that process large volumes of data quickly and efficiently.
     - We use Pig for the following operations:
       - Time-sensitive data loads
       - Processing many data sources
       - Analytic insight through sampling

     Where not to use Pig:
     - Really nasty data formats or completely unstructured data (video, audio, raw human-readable text).
     - When raw speed matters: Pig is slower than equivalent MapReduce jobs.
     - When you would like more power to optimize your code.
  3. Real-world examples where Pig is used:
     - Processing of web logs
     - Data processing for search platforms
     - Support for ad-hoc queries across large datasets
     - Quick prototyping of algorithms for processing large datasets
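
The web-log use case can be pictured as a small dataflow: filter the records, group them, count each group, and sort the result. The sketch below is plain Python, not Pig itself, and is only meant to show the shape of such a flow. The record layout (ip, url, status), the sample data, and the function name `top_urls` are all assumptions made up for this illustration. The comments name the Pig Latin operator that each step loosely corresponds to.

```python
from collections import Counter

# Hypothetical sample records: (ip, url, status). Not from the original article.
SAMPLE_LOGS = [
    ("10.0.0.1", "/home", 200),
    ("10.0.0.2", "/home", 200),
    ("10.0.0.1", "/search", 200),
    ("10.0.0.3", "/missing", 404),
    ("10.0.0.2", "/search", 200),
    ("10.0.0.4", "/home", 200),
]


def top_urls(logs, status=200):
    """Count hits per URL for one status code, most-visited first."""
    # FILTER: keep only records with the wanted status code.
    ok = (rec for rec in logs if rec[2] == status)
    # GROUP + COUNT: tally the surviving records by URL.
    hits = Counter(url for _ip, url, _status in ok)
    # ORDER ... DESC: sort by hit count, then by URL to break ties.
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for url, n in top_urls(SAMPLE_LOGS):
        print(url, n)
```

In Pig Latin the same flow would be a handful of relational statements (LOAD, FILTER, GROUP, FOREACH ... GENERATE, ORDER), which Pig then compiles into MapReduce jobs that run on Hadoop.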