• Impala SQL: a bit more about Cloudera's new contraption
    As of the initial Impala release(s):
    • Impala will run against a variety of storage managers, choices among which will have different performance implications. HDFS (Hadoop Distributed File System) and HBase will both be supported. Multiple HDFS formats will be supported, both row-based and columnar. (See the Trevni comments in my first Impala post.)
    • In the simplest of scanning scenarios, Impala can read row-based data at close to the theoretical optimum speed, while Hive runs at about 1/3 of that.
    • Initially, all Impala joins will be (distributed) hash joins. These seem to start at 10X Hive’s performance and go up from there.
    • Even the fastest Impala queries take more than 1 second.
    • One test showed Impala surviving a load of 100 concurrent queries. Another test showed Impala running 10 cloned copies of a query with roughly 25% performance degradation.
    • Impala will have MicroStrategy support on Day 1, so it obviously can handle fairly complex SQL. (Pentaho, Tableau, and QlikView will be supported as well.)
    • Column statistics and the like are under active development, which will help in query optimization. A true cost-based optimizer is, of course, further off.
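
For readers unfamiliar with the "(distributed) hash joins" mentioned above, here is a minimal single-process sketch of the general technique in Python. It is not Impala's actual implementation. The function names, the `num_nodes` parameter, and the sample `customers`/`orders` rows are all hypothetical and chosen only for illustration. The idea is that both inputs are partitioned by a hash of the join key, so that matching rows land on the same node, and each node then builds an in-memory hash table on one side and probes it with the other.

```python
from collections import defaultdict


def partition(rows, key_index, num_nodes):
    """Route each row to a simulated node by hashing its join key."""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[hash(row[key_index]) % num_nodes].append(row)
    return parts


def local_hash_join(build_rows, probe_rows, build_key, probe_key):
    """Classic build/probe hash join on one node (inner join)."""
    # Build phase: index the build side by its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)
    # Probe phase: look up each probe row and emit the combined rows.
    out = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            out.append(match + row)
    return out


def distributed_hash_join(left, right, left_key, right_key, num_nodes=3):
    """Partition both sides by join key, then join each partition locally."""
    left_parts = partition(left, left_key, num_nodes)
    right_parts = partition(right, right_key, num_nodes)
    result = []
    # In a real cluster each pair would be processed on a different machine;
    # here the "nodes" are just list slots handled one after another.
    for left_part, right_part in zip(left_parts, right_parts):
        result.extend(local_hash_join(left_part, right_part, left_key, right_key))
    return result


# Hypothetical sample data: (customer_id, name) and (order_id, customer_id).
customers = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(101, 1), (102, 3), (103, 1), (104, 4)]

joined = distributed_hash_join(customers, orders, left_key=0, right_key=1)
for row in sorted(joined):
    print(row)
```

Because rows with equal keys always hash to the same partition, no node needs to see the whole of either table. This is one reason such a join can scale with the number of nodes, although the sketch says nothing about the actual speedups quoted above.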
