Enterprise Data Workflows with Cascading
Author | : | Paco Nathan |
Rating | : | 4.35 (589 Votes) |
Asin | : | 1449358721 |
Format Type | : | paperback |
Number of Pages | : | 170 Pages |
Publish Date | : | 2016-12-03 |
Language | : | English |
DESCRIPTION:
There is an easier way to build Hadoop applications. With this hands-on book, you’ll learn how to use Cascading, the open source abstraction framework for Hadoop that lets you easily create and manage powerful enterprise-grade data processing applications, without having to learn the intricacies of MapReduce.

Working with sample apps based on Java and other JVM languages, you’ll quickly learn Cascading’s streamlined approach to data processing, data filtering, and workflow optimization. This book demonstrates how this framework can help your business extract meaningful information from large amounts of distributed data.

- Start working on Cascading example projects right away
- Model and analyze unstructured data in any format, from any source
- Build and test applications with familiar constructs and reusable components
- Work with the Scalding and Cascalog Domain-Specific Languages
- Easily deploy applications to Hadoop, regardless of cluster location or data size
- Build workflows that integrate several big data frameworks and processes
- Explore common use cases for Cascading, including features and tools that support them
- Examine a case study that uses a dataset from the Open Data Initiative
Sujit Pal said "Good overview of Cascading and friends." This book is a good overview of Cascading and related software such as Scalding, Cascalog, Lingual, Pattern, etc. It will not make you a Cascading expert, but it will get you started writing Cascading code, probably much quicker than if you relied solely on online documentation and blog posts to get started. Cascading is a compact but very rich API that is also...

M B said "Good book, the author has clearly worked with real teams at real companies." At first glance, the book describes how to use the Cascading data pipe assemblies with Java, Scalding, and Cascalog. But it also includes rationale (in the form of computer science, software engineering, business risk, inter-team workflow, design patterns, etc.) for why Cascading can be a best practice for enterprises. The book is casually infused with the autho...

"It's the same FREE user documentation, just for a price" according to asarkar. I needed to learn Cascading fast for a new project and since this was the only book in the market, I purchased it without a second thought. Big mistake, should've looked at the Cascading website first! It's everything that's FREELY available in the user documentation, only for a price. The author didn't even bother to change the code, every single line is copie...
ABOUT THE AUTHOR:
Paco Nathan is a Data Scientist at Concurrent, Inc., and heads up the developer outreach program there. He has a dual background from Stanford in math/stats and distributed computing, with 25+ years of experience in the tech industry. He has presented twice on the AWS Start-Up Tour, and gives talks often about Hadoop, Data Science, and Cloud Computing. As an expert in Hadoop, R, predictive analytics, machine learning, and natural language processing, Paco has built and led several expert Data Science teams, with data infrastructure based on large-s...