

# Apache Pig
<a name="emr-pig"></a>

Apache Pig is an open-source Apache library that runs on top of Hadoop. It provides a scripting language that you can use to transform large data sets without having to write complex code in a lower-level language such as Java. The library takes SQL-like commands written in a language called Pig Latin and converts them into either Tez jobs, which are based on directed acyclic graphs (DAGs), or MapReduce programs. Pig works with structured and unstructured data in a variety of formats. For more information about Pig, see [http://pig.apache.org/](http://pig.apache.org/).

You can execute Pig commands interactively or in batch mode. To use Pig interactively, create an SSH connection to the master node and submit commands using the Grunt shell. To use Pig in batch mode, write your Pig scripts, upload them to Amazon S3, and submit them as cluster steps. For more information on submitting work to a cluster, see [Submit work to a cluster](https://docs.amazonaws.cn/emr/latest/ManagementGuide/emr-work-with-steps.html) in the *Amazon EMR Management Guide*.
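As a rough illustration of batch mode, the following Python sketch builds a step definition for a Pig script stored in Amazon S3. The helper name `pig_step`, the step name, the bucket, and the script path are hypothetical. The `pig-script --run-pig-script --args` argument layout for `command-runner.jar` is an assumption based on how Pig steps commonly appear in step descriptions; verify it against your release before relying on it.

```python
def pig_step(name, script_uri, params=None):
    """Build a step definition dict for a Pig script stored in Amazon S3.

    Assumption: Pig steps run through command-runner.jar with the
    'pig-script --run-pig-script --args' prefix shown below.
    """
    args = ["pig-script", "--run-pig-script", "--args", "-f", script_uri]
    # Each script parameter is passed to Pig as '-p KEY=VALUE'.
    for key, value in (params or {}).items():
        args += ["-p", f"{key}={value}"]
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": args},
    }


# Hypothetical bucket and script path, for illustration only.
step = pig_step(
    "Example Pig step",
    "s3://amzn-s3-demo-bucket/scripts/example.pig",
    {"INPUT": "s3://amzn-s3-demo-bucket/input"},
)
```

A dict of this shape could then be passed in the `Steps` list of the boto3 EMR client's `add_job_flow_steps` call, together with the ID of your cluster.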

When you use Pig to write output to an HCatalog table in Amazon S3, disable Amazon EMR direct write by setting the `mapred.output.direct.NativeS3FileSystem` and `mapred.output.direct.EmrFileSystem` properties to `false`. Within a Pig script, you can set both properties with the `SET mapred.output.direct.NativeS3FileSystem false` and `SET mapred.output.direct.EmrFileSystem false` commands. For more information, see [Using HCatalog](emr-hcatalog-using.md).
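One way to make sure both properties are always set is to prepend the two `SET` commands when you generate a script. The following is a minimal sketch of that idea. The helper `with_direct_write_disabled`, the table name, and the S3 path are hypothetical; only the property names and the `SET` commands come from the text above.

```python
# Property names taken from the text above.
DIRECT_WRITE_PROPERTIES = (
    "mapred.output.direct.NativeS3FileSystem",
    "mapred.output.direct.EmrFileSystem",
)


def with_direct_write_disabled(pig_script):
    """Return pig_script with SET commands that disable direct write prepended."""
    set_lines = [f"SET {name} false" for name in DIRECT_WRITE_PROPERTIES]
    return "\n".join(set_lines) + "\n" + pig_script


# Hypothetical script body: the S3 path and table name are placeholders.
body = (
    "records = LOAD 's3://amzn-s3-demo-bucket/input/' AS (line:chararray);\n"
    "STORE records INTO 'example_table' "
    "USING org.apache.hive.hcatalog.pig.HCatStorer();\n"
)
script = with_direct_write_disabled(body)
```

The resulting text can be saved as a `.pig` file and uploaded to Amazon S3 like any other script.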

The following table lists the version of Pig included in the latest release of the Amazon EMR 7.x series, along with the components that Amazon EMR installs with Pig.

For the versions of the components installed with Pig in this release, see [Release 7.13.0 Component Versions](emr-7130-release.md).


**Pig version information for emr-7.13.0**  

| Amazon EMR Release Label | Pig Version | Components Installed With Pig | 
| --- | --- | --- | 
| emr-7.13.0 | Pig 0.17.0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-hdfs-zkfc, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, pig-client, tez-on-yarn, tez-on-worker | 

The following table lists the version of Pig included in the latest release of the Amazon EMR 6.x series, along with the components that Amazon EMR installs with Pig.

For the versions of the components installed with Pig in this release, see [Release 6.15.0 Component Versions](emr-6150-release.md).


**Pig version information for emr-6.15.0**  

| Amazon EMR Release Label | Pig Version | Components Installed With Pig | 
| --- | --- | --- | 
| emr-6.15.0 | Pig 0.17.0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, pig-client, tez-on-yarn, tez-on-worker | 

The following table lists the version of Pig included in the latest release of the Amazon EMR 5.x series, along with the components that Amazon EMR installs with Pig.

For the versions of the components installed with Pig in this release, see [Release 5.36.2 Component Versions](emr-5362-release.md).


**Pig version information for emr-5.36.2**  

| Amazon EMR Release Label | Pig Version | Components Installed With Pig | 
| --- | --- | --- | 
| emr-5.36.2 | Pig 0.17.0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, pig-client, tez-on-yarn | 

**Topics**
+ [Submit Pig work](emr-pig-launch.md)
+ [Call user-defined functions from Pig](emr-pig-udf.md)
+ [Pig release history](Pig-release-history.md)