
Amazon EMR 6.6.0 - Hive release notes

Amazon EMR 6.6.0 - Hive changes

Type Description
Upgrade Upgrade Parquet to 1.12.1.
Upgrade Upgrade Jetty JARs to version 9.4.43.v20210629.
Bug Fixed an issue that caused Hive to be installed on all task/core nodes when LLAP was enabled on a Hive cluster.
Backport HIVE-25942: Upgrade commons-io to 2.8.0 due to CVE-2021-29425
Backport HIVE-25726: Upgrade velocity to 2.3 due to CVE-2020-13936
Backport HIVE-25680: Authorize #get_table_meta HiveMetastore Server API to use any of the HiveMetastore Authorization models.
Backport HIVE-25554: Upgrade arrow version to 0.15
Backport HIVE-25242: Query performs extremely slow with vectorized.adaptor = chosen
Backport HIVE-25085: MetaStore Clients no longer shared across sessions.
Backport HIVE-24827: Hive aggregation query returns incorrect results for non text files.
Backport HIVE-24683: Hadoop23Shims getFileId prone to NPE for non-existing paths
Backport HIVE-24656: CBO fails for queries with is null on map and array types
Backport HIVE-24556: Optimize DefaultGraphWalker for case with no grandchild
Backport HIVE-24408: Upgrade Parquet to 1.11.1
Backport HIVE-24391: Fix TestOrcFile failures in branch-3.1
Backport HIVE-24362: AST tree processing is suboptimal for tree with large number of nodes
Backport HIVE-24316: Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1
Backport HIVE-24307: Beeline with property-file and -e parameter is failing
Backport HIVE-24245: Vectorized PTF with count and distinct over partition producing incorrect results.
Backport HIVE-24224: Fix skipping header/footer for Hive on Tez on compressed file
Backport HIVE-24157: Strict mode to fail on CAST timestamp ↔ numeric
Backport HIVE-24113: NPE in GenericUDFToUnixTimeStamp
Backport HIVE-23987: Upgrade arrow version to 0.11.0
Backport HIVE-23972: Add external client ID to LLAP external client
Backport HIVE-23806: Avoid clearing column stat states in all partition in case schema is extended. This improves runtime of alter table add columns statement.
Backport HIVE-23779: BasicStatsTask Info is not getting printed in beeline console
Backport HIVE-23306: RESET command does not work if there is a config set by System.getProperty
Backport HIVE-23164: Server is not properly terminated because of non-daemon threads
Backport HIVE-22967: Support hive.reloadable.aux.jars.path for Hive on Tez
Backport HIVE-22934: Hive server interactive log counters to error stream
Backport HIVE-22901: Variable substitution can lead to OOM on circular references
Backport HIVE-22769: Incorrect query results and query failure during split generation for compressed text files
Backport HIVE-22716: Reading to ByteBuffer is broken in ParquetFooterInputFromCache
Backport HIVE-22648: Upgrade Parquet to 1.11.0
Backport HIVE-22640: Decimal64ColumnVector: ClassCastException when partition column type is Decimal
Backport HIVE-22621: unstable testcase: TestLlapSignerImpl.testSigning
Backport HIVE-22533: Fix possible LLAP daemon web UI vulnerabilities
Backport HIVE-22532: PTFPPD may push limit incorrectly through Rank/DenseRank function
Backport HIVE-22514: HiveProtoLoggingHook might consume lots of memory
Backport HIVE-22476: Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none
Backport HIVE-22429: Migrated clustered tables using bucketing_version 1 on hive 3 uses bucketing_version 2 for inserts
Backport HIVE-22412: StatsUtils throw NPE when explain
Backport HIVE-22360: MultiDelimitSerDe returns wrong results in last column when the loaded file has more columns than those in table schema
Backport HIVE-22332: Hive should ensure valid schema evolution settings since ORC-540
Backport HIVE-22331: unix_timestamp without argument returns timestamp in millisecond instead of second
Backport HIVE-22275: OperationManager.queryIdOperation does not properly clean up multiple queryIds
Backport HIVE-22273: Access check is failed when a temporary directory is removed
Backport HIVE-22270: Upgrade commons-io to 2.6
Backport HIVE-22241: Implement UDF to interpret date/timestamp using its internal representation and Gregorian-Julian hybrid calendar
Backport HIVE-22232: NPE when hive.order.columnalignment is set to false
Backport HIVE-22231: Hive query with big size via knox fails with Broken pipe Write failed
Backport HIVE-22221: Llap external client - Need to reduce LlapBaseInputFormat#getSplits
Backport HIVE-22208: Column name with reserved keyword is unescaped when query including join on table with mask column is re-written
Backport HIVE-22197: Common Merge join throwing class cast exception.
Backport HIVE-22170: from_unixtime and unix_timestamp should use user session time zone
Backport HIVE-22169: Tez: SplitGenerator tries to look for plan files which won't exist for Tez
Backport HIVE-22168: Remove very expensive logging from the llap cache hotpath
Backport HIVE-22161: UDF: FunctionRegistry synchronizes on org.apache.hadoop.hive.ql.udf.UDFType class
Backport HIVE-22120: Fix wrong results/ArrayOutOfBound exception in left outer map joins on specific boundary conditions
Backport HIVE-22115: Prevent the creation of query routing appender if property is set to false
Backport HIVE-22113: Prevent LLAP shutdown on AMReporter related RuntimeException
Backport HIVE-22106: Remove cross-query synchronization for the partition-eval
Backport HIVE-22099: Several date related UDFs can't handle Julian dates properly since HIVE-20007
Backport HIVE-22037: HS2 should log when shutting down due to OOM
Backport HIVE-21976: Offset should be null instead of zero in Calcite HiveSortLimit
Backport HIVE-21924: Split text files even if header/footer exists
Backport HIVE-21913: GenericUDTFGetSplits should handle usernames in the same way as LLAP
Backport HIVE-21905: Generics improvement around the FetchOperator class
Backport HIVE-21902: HiveServer2 UI: jetty response header needs X-Frame-Options
Backport HIVE-21888: Set hive.parquet.timestamp.skip.conversion default to true
Backport HIVE-21868: Vectorize CAST...FORMAT
Backport HIVE-21864: LlapBaseInputFormat#closeAll
Backport HIVE-21863: Improve Vectorizer type casting for WHEN expression
Backport HIVE-21862: ORC ppd produces wrong result with timestamp
Backport HIVE-21846: Create a thread in TezAM which periodically fetches LlapDaemon metrics
Backport HIVE-21837: MapJoin is throwing exception when selected column is having completely null values
Backport HIVE-21834: Avoid unnecessary calls to simplify filter conditions
Backport HIVE-21832: New metrics to get the average queue/serving/response time
Backport HIVE-21827: Multiple calls in SemanticAnalyzer do not go through getTableObjectByName method
Backport HIVE-21822: Expose LlapDaemon metrics through a new API method
Backport HIVE-21818: CBO: Copying TableRelOptHiveTable has metastore traffic
Backport HIVE-21815: Stats in ORC file are parsed twice
Backport HIVE-21805: HiveServer2: Use the fast ShutdownHookManager APIs
Backport HIVE-21799: NullPointerException in DynamicPartitionPruningOptimization, when join key is on aggregation column
Backport HIVE-21794: Add materialized view parameters to sqlStdAuthSafeVarNameRegexes
Backport HIVE-21768: JDBC: Strip the default union prefix for un-enclosed UNION queries
Backport HIVE-21746: ArrayIndexOutOfBoundsException during dynamically partitioned hash join, with CBO disabled
Backport HIVE-21717: Rename is failing for directory in move task.
Backport HIVE-21685: Wrong simplification in query with multiple IN clauses
Backport HIVE-21681: Describe formatted shows incorrect information for multiple primary keys
Backport HIVE-21651: Move protobuf serde into hive-exec.
Backport HIVE-21619: Print timestamp type without precision in SQL explain extended
Backport HIVE-21592: OptimizedSql is not shown when the expression contains CONCAT
Backport HIVE-21576: Introduce CAST...FORMAT and limited list of SQL:2016 datetime formats
Backport HIVE-21573: Binary transport shall ignore principal if auth is set to delegationToken
Backport HIVE-21550: TestObjectStore tests are flaky - A lock could not be obtained within the time requested
Backport HIVE-21544: Constant propagation corrupts coalesce/case/when expressions during folding
Backport HIVE-21539: GroupBy + where clause on same column results in incorrect query rewrite
Backport HIVE-21538: Beeline: password source through the console reader did not pass to connection param
Backport HIVE-21509: LLAP may cache corrupted column vectors and return wrong query result
Backport HIVE-21499: should not remove the function from registry if create command failed with AlreadyExistsException
Backport HIVE-21496: Automatic sizing of unordered buffer can overflow
Backport HIVE-21468: Case sensitivity in identifier names for JDBC storage handler
Backport HIVE-21467: Remove deprecated junit.framework.Assert imports
Backport HIVE-21435: LlapBaseInputFormat should get task number from TASK_ATTEMPT_ID conf if present, while building SubmitWorkRequestProto
Backport HIVE-21389: Hive distribution miss javax.ws.rs-api.jar after HIVE-21247
Backport HIVE-21385: Allow disabling pushdown of non-splittable computation to JDBC sources
Backport HIVE-21383: JDBC storage handler: Use catalog and schema to retrieve tables if specified
Backport HIVE-21382: Group by keys reduction optimization - keys are not reduced in query23
Backport HIVE-21362: Add an input format and serde to read from protobuf files.
Backport HIVE-21340: CBO: Prune non-key columns feeding into a SemiJoin
Backport HIVE-21332: Purge the non locked buffers instead of locked ones
Backport HIVE-21329: Custom Tez runtime unordered output buffer size depending on operator pipeline
Backport HIVE-21295: StorageHandler shall convert date to string using Hive convention
Backport HIVE-21294: Vectorization: 1-reducer Shuffle can skip the object hash functions
Backport HIVE-21255: Remove QueryConditionBuilder in JdbcStorageHandler
Backport HIVE-21253: Support DB2 in JDBC StorageHandler
Backport HIVE-21232: LLAP: Add a cache-miss friendly split affinity provider
Backport HIVE-21214: MoveTask : Use attemptId instead of file size for deduplication of files compareTempOrDuplicateFiles
Backport HIVE-21184: Add explain and explain formatted CBO plan with cost information
Backport HIVE-21182: Skip setting up hive scratch dir during planning
Backport HIVE-21171: Skip creating scratch dirs for tez if RPC is on
Backport HIVE-21126: Allow session level queries in LlapBaseInputFormat#getSplit
Backport HIVE-21107: "Cannot find field" error during dynamically partitioned hash join
Backport HIVE-21061: CTAS query fails with IllegalStateException for empty source
Backport HIVE-21041: NPE, ParseException in getting schema from logical plan
Backport HIVE-21013: JdbcStorageHandler fail to find partition column in Oracle
Backport HIVE-21006: Extend SharedWorkOptimizer to remove semijoins when there is a reutilization opportunity
Backport HIVE-20992: Split the config hive.metastore.dbaccess.ssl.properties into more meaningful configs
Backport HIVE-20989: JDBC - The GetOperationStatus + log can block query progress via sleep
Backport HIVE-20988: Wrong results for group by queries with primary key on multiple columns
Backport HIVE-20985: If select operator inputs are temporary columns vectorization may reuse some of them as output
Backport HIVE-20978: "hive.jdbc.*" should add to sqlStdAuthSafeVarNameRegexes
Backport HIVE-20953: Remove a function from function registry when it can not be added to the metastore when creating it.
Backport HIVE-20952: Cleaning VectorizationContext.java
Backport HIVE-20951: LLAP: Set Xms to 50% always
Backport HIVE-20949: Improve PKFK cardinality estimation in physical planning
Backport HIVE-20944: Not validate stats during query compilation
Backport HIVE-20940: Bridge cases in which Calcite's type resolution is stricter than Hive.
Backport HIVE-20937: Postgres jdbc query fail with "LIMIT must not be negative"
Backport HIVE-20926: Semi join reduction hint fails when bloom filter entries are high or when there are no stats
Backport HIVE-20920: Use SQL constraints to improve join reordering algorithm
Backport HIVE-20918: Flag to enable/disable pushdown of computation from Calcite into JDBC connection
Backport HIVE-20915: Make dynamic sort partition optimization available to HoS and MR
Backport HIVE-20910: Insert in bucketed table fails due to dynamic partition sort optimization
Backport HIVE-20899: Keytab URI for LLAP YARN Service is restrictive to support HDFS only
Backport HIVE-20898: For time related functions arguments may not be cast to a non nullable type
Backport HIVE-20881: Constant propagation oversimplifies projections
Backport HIVE-20880: Update default value for hive.stats.filter.in.min.ratio
Backport HIVE-20873: Use Murmur hash for VectorHashKeyWrapperTwoLong to reduce hash collision
Backport HIVE-20868: SMB Join fails intermittently when TezDummyOperator has child op in getFinalOp in MapRecordProcessor
Backport HIVE-20853: Expose ShuffleHandler.registerDag in the llap daemon API
Backport HIVE-20850: Push case conditional from projections to dimension tables if possible
Backport HIVE-20842: Fix logic introduced in HIVE-20660 to estimate statistics for group by
Backport HIVE-20839: "Cannot find field" error during dynamically partitioned hash join
Backport HIVE-20835: Interaction between constraints and MV rewriting may create loop in Calcite planner
Backport HIVE-20834: Hive QueryResultCache entries keeping reference to SemanticAnalyzer from cached query
Backport HIVE-20830: JdbcStorageHandler range query assertion failure in some cases
Backport HIVE-20829: JdbcStorageHandler range split throws NPE
Backport HIVE-20827: Inconsistent results for empty arrays
Backport HIVE-20826: Enhance HiveSemiJoin rule to convert join + group by on left side to Left Semi Join
Backport HIVE-20821: Rewrite SUM0 into SUM + COALESCE combination
Backport HIVE-20815: JdbcRecordReader.next shall not eat exception
Backport HIVE-20813: udf to_epoch_milli need to support timestamp without time zone as well.
Backport HIVE-20804: Further improvements to group by optimization with constraints
Backport HIVE-20792: Inserting timestamp with zones truncates the data
Backport HIVE-20788: Extended SJ reduction may backtrack columns incorrectly when creating filters
Backport HIVE-20778: Join reordering may not be triggered if all joins in plan are created by decorrelation logic
Backport HIVE-20772: record per-task CPU counters in LLAP
Backport HIVE-20768: Adding Tumbling Window UDF
Backport HIVE-20767: Multiple project between join operators may affect join reordering using constraints
Backport HIVE-20762: NOTIFICATION_LOG cleanup interval is hardcoded as 60s and is too small
Backport HIVE-20761: Select for update on notification_sequence table has retry interval and retries count too small
Backport HIVE-20751: Upgrade arrow version to 0.10.0
Backport HIVE-20746: HiveProtoHookLogger does not close file at end of day.
Backport HIVE-20744: Use SQL constraints to improve join reordering algorithm
Backport HIVE-20740: Remove global lock in ObjectStore.setConf method. This cherry-pick backports HIVE-20740, originally intended for Hive 3.2 and 4.x, to 3.1.x.
Backport HIVE-20734: Beeline: When beeline-site.xml is present and hive CLI redirects to beeline, it should use the system username/dummy password instead of prompting for one
Backport HIVE-20731: keystore file in JdbcStorageHandler should be authorized
Backport HIVE-20720: Add partition column option to JDBC handler
Backport HIVE-20719: SELECT statement fails after UPDATE with hive.optimize.sort.dynamic.partition optimization and vectorization on
Backport HIVE-20718: Add perf cli driver with constraints
Backport HIVE-20716: Set default value for hive.cbo.stats.correlated.multi.key.joins to true
Backport HIVE-20712: HivePointLookupOptimizer should extract deep cases
Backport HIVE-20710: Constant folding may not create null constants without types
Backport HIVE-20706: external_jdbc_table2.q failing intermittently
Backport HIVE-20704: Extend HivePreFilteringRule to support other functions
Backport HIVE-20703: Put dynamic sort partition optimization under cost based decision
Backport HIVE-20702: Account for overhead from datastructure aware estimations during mapjoin selection
Backport HIVE-20692: Enable folding of NOT x IS (NOT) [TRUE|FALSE] expressions
Backport HIVE-20691: Fix org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[cttl]
Backport HIVE-20682: Async query execution can potentially fail if shared sessionHive is closed by master thread
Backport HIVE-20676: HiveServer2: PrivilegeSynchronizer is not set to daemon status
Backport HIVE-20660: Group by statistics estimation could be improved by bounding the total number of rows to source table
Backport HIVE-20652: JdbcStorageHandler push join of two different datasource to jdbc driver
Backport HIVE-20651: JdbcStorageHandler password should be encrypted
Backport HIVE-20649: LLAP aware memory manager for Orc writers
Backport HIVE-20648: LLAP: Vector group by operator should use memory per executor
Backport HIVE-20646: Partition filter condition is not pushed down to metastore query if it has IS NOT NULL
Backport HIVE-20644: Avoid exposing sensitive information through a Hive Runtime exception
Backport HIVE-20636: Improve number of null values estimation after outer join
Backport HIVE-20632: Query with get_splits UDF fails if materialized view is created on queried table
Backport HIVE-20627: Concurrent async queries intermittently fails with LockException and cause memory leak
Backport HIVE-20623: Shared work: Extend sharing of map-join cache entries in LLAP
Backport HIVE-20619: Include MultiDelimitSerDe in HiveServer2 By Default
Backport HIVE-20618: During join selection BucketMapJoin might be chosen for non bucketed tables
Backport HIVE-20617: Fix type of constants in IN expressions to have correct type
Backport HIVE-20612: Create new join multi-key correlation flag for CBO
Backport HIVE-20603: "Wrong FS" error when inserting to partition after changing table location filesystem
Backport HIVE-20601: EnvironmentContext null in ALTER_PARTITION event in DbNotificationListener
Backport HIVE-20583: Use canonical hostname only for kerberos auth in HiveConnection
Backport HIVE-20582: Make hflush in hive proto logging configurable
Backport HIVE-20563: Vectorization: CASE WHEN expression fails when THEN/ELSE type and result type are different
Backport HIVE-20558: Change default of hive.hashtable.key.count.adjustment to 0.99
Backport HIVE-20552: Get Schema from LogicalPlan faster
Backport HIVE-20550: Switch WebHCat to use beeline to submit Hive queries
Backport HIVE-20537: Multi-column joins estimates with uncorrelated columns different in CBO and Hive
Backport HIVE-20524: Schema Evolution checking is broken in going from Hive version 2 to version 3 for ALTER TABLE VARCHAR to DECIMAL
Backport HIVE-20522: HiveFilterSetOpTransposeRule may throw assertion error due to nullability of fields
Backport HIVE-20521: HS2 doAs=true has permission issue with hadoop.tmp.dir, with MR and S3A filesystem
Backport HIVE-20515: Empty query results when using results cache and query temp dir, results cache dir in different filesystems
Backport HIVE-20508: Hive does not support user names of type "user@realm"
Backport HIVE-20507: Beeline: Add a utility command to retrieve all uris from beeline-site.xml
Backport HIVE-20505: upgrade org.openjdk.jmh:jmh-core to 1.21
Backport HIVE-20503: Use datastructure aware estimations during mapjoin selection
Backport HIVE-20498: Support date type for column stats autogather
Backport HIVE-20496: Vectorization: Vectorized PTF IllegalStateException
Backport HIVE-20494: GenericUDFRestrictInformationSchema is broken after HIVE-19440
Backport HIVE-20477: OptimizedSql is not shown if the expression contains INs
Backport HIVE-20467: Allow IF NOT EXISTS/IF EXISTS in Resource plan creation/drop
Backport HIVE-20462: "CREATE VIEW IF NOT EXISTS" fails if view already exists
Backport HIVE-20455: Log spew from security.authorization.PrivilegeSynchronizer.run
Backport HIVE-20439: Use the inflated memory limit during join selection for llap
Backport HIVE-20433: Implicit String to Timestamp conversion is slow
Backport HIVE-20432: Rewrite BETWEEN to IN for integer types for stats estimation
Backport HIVE-20423: Set NULLS LAST as the default null ordering
Backport HIVE-20418: LLAP IO may not handle ORC files that have row index disabled correctly for queries with no columns selected
Backport HIVE-20412: NPE in HiveMetaHook
Backport HIVE-20406: Nested Coalesce giving incorrect results
Backport HIVE-20399: CTAS w/a custom table location that is not fully qualified fails for MM tables
Backport HIVE-20393: Semijoin Reduction : markSemiJoinForDPP behaves inconsistently
Backport HIVE-20391: HiveAggregateReduceFunctionsRule may infer wrong return type when decomposing aggregate function
Backport HIVE-20383: Invalid queue name and synchronisation issues in hive proto events hook.
Backport HIVE-20367: Vectorization: Support streaming for PTF AVG, MAX, MIN, SUM
Backport HIVE-20366: TPC-DS query78 stats estimates are off for is null filter
Backport HIVE-20364: Update default for hive.map.aggr.hash.min.reduction
Backport HIVE-20352: Vectorization: Support grouping function
Backport HIVE-20347: hive.optimize.sort.dynamic.partition should work with partitioned CTAS and MV
Backport HIVE-20345: Drop database may hang if the tables get deleted from a different call
Backport HIVE-20343: Hive 3: CTAS does not respect transactional_properties
Backport HIVE-20340: Druid Needs Explicit CASTs from Timestamp to STRING when the output of timestamp function is used as String
Backport HIVE-20339: Vectorization: Lift unneeded restriction causing some PTF with RANK not to be vectorized
Backport HIVE-20337: CachedStore: getPartitionsByExpr is not populating the partition list correctly
Backport HIVE-20336: Masking and filtering policies for materialized views
Backport HIVE-20326: Create constraints with RELY as default instead of NO RELY
Backport HIVE-20321: Vectorization: Cut down memory size of 1 col VectorHashKeyWrapper to <1 CacheLine
Backport HIVE-20320: Turn on hive.optimize.remove.sq_count_check flag
Backport HIVE-20315: Vectorization: Fix more NULL / Wrong Results issues and avoid unnecessary casts/conversions
Backport HIVE-20314: Include partition pruning in materialized view rewriting
Backport HIVE-20312: Allow arrow clients to use their own BufferAllocator with LlapOutputFormatService
Backport HIVE-20302: LLAP: non-vectorized execution in IO ignores virtual columns, including ROW__ID
Backport HIVE-20300: VectorFileSinkArrowOperator
Backport HIVE-20299: potential race in LLAP signer unit test
Backport HIVE-20296: Improve HivePointLookupOptimizerRule to be able to extract from more sophisticated contexts
Backport HIVE-20294: Vectorization: Fix NULL / Wrong Results issues in COALESCE / ELT
Backport HIVE-20292: Bad join ordering in tpcds query93 with primary constraint defined
Backport HIVE-20290: Lazy initialize ArrowColumnarBatchSerDe so it doesn't allocate buffers during GetSplits
Backport HIVE-20281: SharedWorkOptimizer fails with 'operator cache contents and actual plan differ'
Backport HIVE-20277: Vectorization: Case expressions that return BOOLEAN are not supported for FILTER
Backport HIVE-20267: Expanding WebUI to include form to dynamically config log levels
Backport HIVE-20263: Typo in HiveReduceExpressionsWithStatsRule variable
Backport HIVE-20260: NDV of a column shouldn't be scaled when row count is changed by filter on another column
Backport HIVE-20252: Semijoin Reduction : Cycles due to semi join branch may remain undetected if small table side has a map join upstream.
Backport HIVE-20245: Vectorization: Fix NULL / Wrong Results issues in BETWEEN / IN
Backport HIVE-20241: Support partitioning spec in CTAS statements
Backport HIVE-20240: Semijoin Reduction: Use local variable to check for external table condition
Backport HIVE-20226: HMS getNextNotification will throw exception when request maxEvents exceed table's max_rows
Backport HIVE-20225: SerDe to support Teradata Binary Format
Backport HIVE-20213: Upgrade Calcite to 1.17.0
Backport HIVE-20212: Hiveserver2 in http mode emitting metric default.General.open_connections incorrectly
Backport HIVE-20210: Simple Fetch optimizer should lead to MapReduce when filter on non-partition column and conversion is minimal
Backport HIVE-20209: Metastore connection fails for first attempt in repl dump
Backport HIVE-20207: Vectorization: Fix NULL / Wrong Results issues in Filter / Compare
Backport HIVE-20204: Type conversion during IN
Backport HIVE-20203: Arrow SerDe leaks a DirectByteBuffer
Backport HIVE-20197: Vectorization: Add DECIMAL_64 testing, add Date/Interval/Timestamp arithmetic, and add more GROUP BY Aggregation
Backport HIVE-20193: cboInfo is not present in the explain plan json
Backport HIVE-20192: HS2 with embedded metastore is leaking JDOPersistenceManager objects
Backport HIVE-20183: Inserting from bucketed table can cause data loss, if the source table contains empty bucket
Backport HIVE-20177: Vectorization: Reduce KeyWrapper allocation in GroupBy Streaming mode
Backport HIVE-20174: Vectorization: Fix NULL / Wrong Results issues in GROUP BY Aggregation Functions
Backport HIVE-20172: StatsUpdater failed with GSS Exception while trying to connect to remote metastore
Backport HIVE-20153: Count and Sum UDF consume more memory in Hive 2+
Backport HIVE-20152: reset db state, when repl dump fails, so rename table can be done
Backport HIVE-20149: TestHiveCli failing/timing out
Backport HIVE-20130: Better logging for information schema synchronizer
Backport HIVE-20129: Revert to position based schema evolution for orc tables
Backport HIVE-20118: SessionStateUserAuthenticator.getGroupNames
Backport HIVE-20116: TezTask is using parent logger
Backport HIVE-20115: Acid tables should not use footer scan for analyze
Backport HIVE-20103: WM: Only Aggregate DAG counters if at least one is used
Backport HIVE-20101: BloomKFilter: Avoid using the local byte[] arrays entirely
Backport HIVE-20100: OpTraits : Select Optraits should stop when a mismatch is detected
Backport HIVE-20098: Statistics: NPE when getting Date column partition statistics
Backport HIVE-20095: Fix feature to push computation to jdbc external tables
Backport HIVE-20093: LlapOutputFormatService: Use ArrowBuf with Netty for Accounting
Backport HIVE-20090: Extend creation of semijoin reduction filters to be able to discover new opportunities
Backport HIVE-20088: Beeline config location path is assembled incorrectly
Backport HIVE-20082: HiveDecimal to string conversion doesn't format the decimal correctly
Backport HIVE-20069: Fix reoptimization in case of DPP and Semijoin optimization
Backport HIVE-20051: Skip authorization for temp tables
Backport HIVE-20044: Arrow Serde should pad char values and handle empty strings correctly
Backport HIVE-20028: Metastore client cache config is used incorrectly
Backport HIVE-20025: Clean-up of event files created by HiveProtoLoggingHook
Backport HIVE-20020: Hive contrib jar should not be in lib
Backport HIVE-20013: Add an Implicit cast to date type for to_date function
Backport HIVE-20011: Move away from append mode in proto logging hook
Backport HIVE-20005: acid_table_stats, acid_no_buckets, etc - query result change on the branch
Backport HIVE-20004: Wrong scale used by ConvertDecimal64ToDecimal results in incorrect results
Backport HIVE-19995: Aggregate row traffic for acid tables
Backport HIVE-19993: Using a table alias which also appears as a column name is not possible
Backport HIVE-19992: Vectorization: Follow-on to HIVE-19951 --> add call to SchemaEvolution.isOnlyImplicitConversion to disable encoded LLAP I/O for ORC only when data type conversion is not implicit
Backport HIVE-19989: Metastore uses wrong application name for HADOOP2 metrics
Backport HIVE-19981: Managed tables converted to external tables by the HiveStrictManagedMigration utility should be set to delete data when the table is dropped
Backport HIVE-19967: SMB Join : Need Optraits for PTFOperator ala GBY Op
Backport HIVE-19935: Hive WM session killed: Failed to update LLAP tasks count
Backport HIVE-19924: Tag distcp jobs run by Repl Load
Backport HIVE-19891: inserting into external tables with custom partition directories may cause data loss
Backport HIVE-19850: Dynamic partition pruning in Tez is leading to 'No work found for tablescan' error
Backport HIVE-19806: Sort qtests output to avoid flakiness in test results
Backport HIVE-19770: Support for CBO for queries with multiple same columns in select
Backport HIVE-19769: Create dedicated objects for DB and Table names
Backport HIVE-19765: Add Parquet specific tests to BlobstoreCliDriver
Backport HIVE-19759: Flaky test: TestRpc#testServerPort
Backport HIVE-19711: Refactor Hive Schema Tool
Backport HIVE-19701: getDelegationTokenFromMetaStore doesn't need to be synchronized
Backport HIVE-19694: Create Materialized View statement should check for MV name conflicts before running MV's SQL statement.
Backport HIVE-19674: Group by Decimal Constants push down to Druid table
Backport HIVE-19668: Over 30% of the heap wasted by duplicate org.antlr.runtime.CommonToken's and duplicate strings
Backport HIVE-19663: refactor LLAP IO report generation
Backport HIVE-19661: switch Hive UDFs to use Re2J regex engine
Backport HIVE-19628: possible NPE in LLAP testSigning
Backport HIVE-19568: Active/Passive HS2 HA: Disallow direct connection to passive HS2 instance
Backport HIVE-19564: Vectorization: Fix NULL / Wrong Results issues in Arithmetic
Backport HIVE-19552: Enable TestMiniDruidKafkaCliDriver#druidkafkamini_basic.q
Backport HIVE-19432: GetTablesOperation is too slow if the hive has too many databases and tables
Backport HIVE-19360: CBO: Add an "optimizedSQL" to QueryPlan object
Backport HIVE-19326: stats auto gather: incorrect aggregation during UNION queries
Backport HIVE-19313: TestJdbcWithDBTokenStoreNoDoAs tests are failing
Backport HIVE-19285: Add logs to the subclasses of MetaDataOperation
Backport HIVE-19235: Update golden files for Minimr tests
Backport HIVE-19104: When test MetaStore is started with retry the instances should be independent
Backport HIVE-18986: Table rename will run java.lang.StackOverflowError in dataNucleus if the table contains large number of columns
Backport HIVE-18920: CBO: Initialize the Janino providers ahead of 1st query
Backport HIVE-18873: Skipping predicate pushdown for MR silently at HiveInputFormat can cause storage handlers to produce erroneous result
Backport HIVE-18871: hive on tez execution error due to set hive.aux.jars.path to hdfs://
Backport HIVE-18725: Improve error handling for subqueries if there is wrong column reference
Backport HIVE-18696: The partition folders might not get cleaned up properly in the HiveMetaStore.add_partitions_core method if an
Backport HIVE-18453: ACID: Add "CREATE TRANSACTIONAL TABLE" syntax to unify ACID ORC & Parquet support
Backport HIVE-18201: Disable XPROD_EDGE for sq_count_check
Backport HIVE-18140: Partitioned tables statistics can go wrong in basic stats mixed case
Backport HIVE-17921: Aggregation with struct in LLAP produces wrong result
Backport HIVE-17896: TopNKey: Create a standalone vectorizable TopNKey operator
Backport HIVE-17840: HiveMetaStore eats exception if transactionalListeners.notifyEvent fail
Backport HIVE-17043: Remove non unique columns from group by keys if not referenced later
Backport HIVE-17040: Join elimination in the presence of FK relationship
Backport HIVE-16839: Unbalanced calls to openTransaction/commitTransaction when alter the same partition concurrently
Backport HIVE-16100: Dynamic Sorted Partition optimizer loses sibling operators
Backport HIVE-15956: StackOverflowError when drop lots of partitions
Backport HIVE-15177: Authentication with hive fails when kerberos auth type is set to fromSubject and principal contains _HOST
Backport HIVE-14898: HS2 shouldn't log callstack for an empty auth header error
Backport HIVE-14493: Partitioning support for materialized views
Backport HIVE-14431: Recognize COALESCE as CASE
Backport HIVE-13457: Create HS2 REST API endpoints for monitoring information
Backport HIVE-12342: Set default value of hive.optimize.index.filter to true
Backport HIVE-10296: Cast exception observed when hive runs a multi join query on metastore
Backport HIVE-6980: Drop table by using direct sql

Amazon EMR 6.6.0 - Hive configuration changes

  • As part of OSS change HIVE-20703, the property to sort dynamic partitions, hive.optimize.sort.dynamic.partition, has been replaced with hive.optimize.sort.dynamic.partition.threshold.

    The hive.optimize.sort.dynamic.partition.threshold configuration accepts the following values:

    Value              Description

    0 (default)        Makes the optimization to sort dynamic partitions a cost-based decision when ORC files are used. The maximum number of writers allowed in INSERT queries is computed as (executor/container memory) * (percentage of memory taken by ORC) divided by the maximum memory taken by a single writer (the stripe size).

    -1                 Disables the optimization to sort dynamic partitions completely.

    1                  Enables global sorting of dynamic partitions. This keeps only one record writer open for each partition value in the reducer, thereby reducing memory pressure on reducers.

    2 (or greater)     Tells Hive to use the specified integer as the threshold for the maximum number of writers.
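For the default value 0, the writer cap described above can be written as a formula, with the ORC percentage expressed as a fraction. The worked numbers below are hypothetical (a 4096 MB container, ORC allowed 50% of it, and a 64 MB stripe size); they only illustrate the arithmetic, and these notes don't say how Hive rounds a non-integer result.

```latex
W_{\max} = \frac{M_{\text{container}} \times f_{\text{ORC}}}{S_{\text{stripe}}}
         = \frac{4096\ \text{MB} \times 0.5}{64\ \text{MB}}
         = \frac{2048\ \text{MB}}{64\ \text{MB}}
         = 32
```

Under these assumed settings, setting the threshold explicitly to 32 (a value of 2 or greater) would fix the same writer cap instead of deriving it from the memory settings.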

Amazon EMR 6.6.0 - Hive known issues

  • Queries that use windowing functions on the same column as a join may lead to invalid transformations, as reported in HIVE-25278, and can cause incorrect results or query failures. As a workaround, you can disable CBO at the query level for such queries. Contact Amazon support for further information.

  • Amazon EMR 6.6.0 includes Hive software version 3.1.2. Hive 3.1.2 introduces a feature that splits text files even if they contain a header or footer (HIVE-21924). The Apache Tez App Master reads each of your files to determine offset points in the data range. Combined, these behaviors can degrade performance if your queries read a large number of small text files. As a workaround, use CombineHiveInputFormat and tune the maximum split size by configuring the following properties:

    SET hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
    SET mapreduce.input.fileinputformat.split.maxsize=16777216;
  • With Amazon EMR 6.6.0 through 6.9.x, INSERT queries with dynamic partitions and an ORDER BY or SORT BY clause always have two reducers. This issue is caused by OSS change HIVE-20703, which puts dynamic sort partition optimization under a cost-based decision. If your workload doesn't require sorting of dynamic partitions, we recommend that you set the hive.optimize.sort.dynamic.partition.threshold property to -1 to disable the new feature and get the correctly calculated number of reducers. This issue is fixed in OSS Hive by HIVE-22269 and in Amazon EMR 6.10.0.
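The two SET-based workarounds in the known issues above might look like the following in a Hive session. This is a sketch: the tables (orders, customers, web_logs, web_logs_by_day) and their columns are hypothetical, and the dynamic-partition session settings are included only because a fully dynamic INSERT normally needs them.

```sql
-- Known issue HIVE-25278: disable CBO only for the affected query.
SET hive.cbo.enable=false;

-- Hypothetical query: a windowing function over the join column.
SELECT o.customer_id,
       SUM(o.amount) OVER (PARTITION BY o.customer_id) AS customer_total
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;

-- Restore CBO afterwards (true is the Hive default).
SET hive.cbo.enable=true;

-- Known issue in EMR 6.6.0 through 6.9.x: turn off the cost-based
-- dynamic partition sort so the reducer count is calculated normally.
-- Only appropriate if the workload doesn't need sorted dynamic partitions.
SET hive.optimize.sort.dynamic.partition.threshold=-1;

-- Session settings for a fully dynamic partition INSERT.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Hypothetical tables: web_logs (source), web_logs_by_day (partitioned by dt).
INSERT OVERWRITE TABLE web_logs_by_day PARTITION (dt)
SELECT user_id, url, event_ts, dt
FROM web_logs
SORT BY event_ts;
```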