Download E-books Programming Hive PDF

By Edward Capriolo, Dean Wampler

Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop's data warehouse infrastructure. You'll quickly learn how to use Hive's SQL dialect, HiveQL, to summarize, query, and analyze large datasets stored in Hadoop's distributed filesystem.

This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You'll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data.

  • Use Hive to create, alter, and drop databases, tables, views, functions, and indexes
  • Customize data formats and storage options, from files to external databases
  • Load and extract data from tables, and use queries, grouping, filtering, joining, and other conventional query methods
  • Gain best practices for creating user defined functions (UDFs)
  • Learn Hive patterns you should use and anti-patterns you should avoid
  • Integrate Hive with other data processing programs
  • Use storage handlers for NoSQL databases and other datastores
  • Learn the pros and cons of running Hive on Amazon's Elastic MapReduce



Best Computers books

Digital Design and Computer Architecture, Second Edition

Digital Design and Computer Architecture takes a unique and modern approach to digital design. Beginning with digital logic gates and progressing to the design of combinational and sequential circuits, Harris and Harris use these fundamental building blocks as the basis for what follows: the design of an actual MIPS processor.

The Linux Programmer's Toolbox

Master the Linux tools that will make you a more productive, effective programmer. The Linux Programmer's Toolbox helps you tap into the vast collection of open source tools available for GNU/Linux. Author John Fusco systematically describes the most useful tools available on most GNU/Linux distributions, using concise examples that you can easily modify to meet your needs.

Algorithms in C++, Parts 1-4: Fundamentals, Data Structure, Sorting, Searching, Third Edition

Robert Sedgewick has thoroughly rewritten and substantially expanded and updated his popular work to provide current and comprehensive coverage of important algorithms and data structures. Christopher Van Wyk and Sedgewick have developed new C++ implementations that both express the methods in a concise and direct manner, and also provide programmers with the practical means to test them on real applications.

Introduction to Machine Learning (Adaptive Computation and Machine Learning series)

The goal of machine learning is to program computers to use example data or past experience to solve a given problem. Many successful applications of machine learning already exist, including systems that analyze past sales data to predict customer behavior, optimize robot behavior so that a task can be completed using minimum resources, and extract knowledge from bioinformatics data.

Extra resources for Programming Hive


Groovysh provides a REPL for interactive programming. Groovy code is similar to Java code, although it has additional features, including closures. For the most part, you can write Groovy as you would write Java.

Connecting to HiveServer

From the REPL, import Hive- and Thrift-related classes. These classes are used to connect to Hive and create an instance of HiveClient. HiveClient has the methods users will typically use to interact with Hive:

    $ $HOME/groovy/groovy-1.8.0/bin/groovysh
    Groovy Shell (1.8.0, JVM: 1.6.0_23)
    Type 'help' or '\h' for help.
    groovy:000> import org.apache.hadoop.hive.service.*;
    groovy:000> import org.apache.thrift.protocol.*;
    groovy:000> import org.apache.thrift.transport.*;
    groovy:000> transport = new TSocket("localhost", 10000);
    groovy:000> protocol = new TBinaryProtocol(transport);
    groovy:000> client = new HiveClient(protocol);
    groovy:000> transport.open();
    groovy:000> client.execute("show tables");

Getting Cluster Status

The getClusterStatus method retrieves information from the Hadoop JobTracker. This can be used to gather performance metrics, and can also be used to wait for a lull before launching a job:

    groovy:000> client.getClusterStatus()
    ===> HiveClusterStatus(taskTrackers:50, mapTasks:52, reduceTasks:40,
    maxMapTasks:480, maxReduceTasks:240, state:RUNNING)

Result Set Schema

After executing a query, you can get the schema of the result set using the getSchema() method. If you call this method before a query, it may return a null schema:

    groovy:000> client.getSchema()
    ===> Schema(fieldSchemas:null, properties:null)
    groovy:000> client.execute("show tables");
    ===> null
    groovy:000> client.getSchema()
    ===> Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string,
    comment:from deserializer)], properties:null)

Fetching Results

After a query is run, you can fetch results with the fetchOne() method.
Retrieving large result sets with the Thrift interface is not recommended. However, it does provide several methods for retrieving data using a one-way cursor. The fetchOne() method retrieves an entire row:

    groovy:000> client.fetchOne()
    ===> cookjar_small

Instead of retrieving rows one by one, the entire result set can be retrieved as a string array using the fetchAll() method:

    groovy:000> client.fetchAll()
    ===> [macetest, missing_final, one, time_to_serve, ]

Also available is fetchN, which fetches N rows at a time.

Retrieving the Query Plan

After a query has started, the getQueryPlan() method is used to retrieve status information about the query. The information includes details on counters and the state of the job:

    groovy:000> client.execute("SELECT * FROM time_to_serve");
    ===> null
    groovy:000> client.getQueryPlan()
    ===> QueryPlan(queries:[Query(queryId:hadoop_20120218180808_...-aedf367ea2f3,
    queryType:null, queryAttributes:{queryString=SELECT * FROM time_to_serve},
    queryCounters:null, stageGraph:Graph(nodeType:STAGE, roots:null,
    adjacencyList:null), stageList:null, done:true, started:true)],
    done:false, started:false)

(A long number was elided.)
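The fetchOne(), fetchN, and fetchAll() calls described above all walk a one-way cursor: rows are consumed forward, singly or in batches, with no rewind. As a rough, self-contained illustration of that contract, here is a toy Java sketch; it models the cursor over an in-memory row list and is not the Hive Thrift API itself (the class ToyCursor and its methods are invented for this example):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a one-way result cursor: rows can only be consumed
// forward, one row or one batch at a time, with no way to rewind.
class ToyCursor {
    private final List<String> rows;
    private int pos = 0;

    ToyCursor(List<String> rows) {
        this.rows = rows;
    }

    // Analogous to fetchOne(): return the next row, or null when exhausted.
    String fetchOne() {
        return pos < rows.size() ? rows.get(pos++) : null;
    }

    // Analogous to fetchN: return up to n rows from the current position.
    List<String> fetchN(int n) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < n && pos < rows.size()) {
            batch.add(rows.get(pos++));
        }
        return batch;
    }
}

public class CursorDemo {
    public static void main(String[] args) {
        ToyCursor cursor = new ToyCursor(
            Arrays.asList("macetest", "missing_final", "one", "time_to_serve"));
        System.out.println(cursor.fetchOne());  // macetest
        System.out.println(cursor.fetchN(2));   // [missing_final, one]
        System.out.println(cursor.fetchN(2));   // [time_to_serve]
        System.out.println(cursor.fetchN(2));   // []
    }
}
```

Batching with fetchN rather than looping on fetchOne() reduces the number of round trips to the server, which is why it is preferred for anything beyond trivial result sets.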
