

# FlagDuplicatesInColumn 类
<a name="aws-glue-api-pyspark-transforms-FlagDuplicatesInColumn"></a>

`FlagDuplicatesInColumn` 转换会返回一个新列，每行都有指定值，指示该行的源列中的值是否与源列前面的行中的值匹配。找到匹配项后，它们会被标记为重复项。第一次出现不会被标记，因为它与前面的行不匹配。

## 示例
<a name="pyspark-FlagDuplicatesInColumn-examples"></a>

```
from pyspark.context import SparkContext
from pyspark.sql import SparkSession      
from awsgluedi.transforms import *

sc = SparkContext()
spark = SparkSession(sc)

datasource1 = spark.read.json("s3://${BUCKET}/json/zips/raw/data")

try:
    df_output = column.FlagDuplicatesInColumn.apply(
        data_frame=datasource1,
        spark_context=sc,
        source_column="city",
        target_column="flag_col",
        true_string="True",
        false_string="False"
    )
except:
    print("Unexpected Error happened ")
    raise
```

## Output
<a name="pyspark-FlagDuplicatesInColumn-output"></a>

 `FlagDuplicatesInColumn` 转换会将新列“flag\$1col”添加至“df\$1output”DataFrame。此列将包含一个字符串值，表示相应行在“city”列中是否有重复的值。如果某行具有重复的“city”值，则“flag\$1col”将包含“true\$1string”值“True”。如果某行具有唯一的“city”值，则“flag\$1col”将包含“false\$1string”值“False”。

 生成的“df\$1output”DataFrame 将包含原始“datasource1”DataFrame 中的所有列，以及用于表示重复的“city”值的“flag\$1col”附加列。

## 方法
<a name="aws-glue-api-pyspark-transforms-FlagDuplicatesInColumn-_methods"></a>
+ [\$1\$1call\$1\$1](#aws-glue-api-pyspark-transforms-FlagDuplicatesInColumn-__call__)
+ [apply](#aws-glue-api-crawler-pyspark-transforms-FlagDuplicatesInColumn-apply)
+ [name](#aws-glue-api-crawler-pyspark-transforms-FlagDuplicatesInColumn-name)
+ [describeArgs](#aws-glue-api-crawler-pyspark-transforms-FlagDuplicatesInColumn-describeArgs)
+ [describeReturn](#aws-glue-api-crawler-pyspark-transforms-FlagDuplicatesInColumn-describeReturn)
+ [describeTransform](#aws-glue-api-crawler-pyspark-transforms-FlagDuplicatesInColumn-describeTransform)
+ [describeErrors](#aws-glue-api-crawler-pyspark-transforms-FlagDuplicatesInColumn-describeErrors)
+ [describe](#aws-glue-api-crawler-pyspark-transforms-FlagDuplicatesInColumn-describe)

## \$1\$1call\$1\$1(spark\$1context, data\$1frame, source\$1column, target\$1column, true\$1string=DEFAULT\$1TRUE\$1STRING, false\$1string=DEFAULT\$1FALSE\$1STRING)
<a name="aws-glue-api-pyspark-transforms-FlagDuplicatesInColumn-__call__"></a>

`FlagDuplicatesInColumn` 转换会返回一个新列，每行都有指定值，指示该行的源列中的值是否与源列前面的行中的值匹配。找到匹配项后，它们会被标记为重复项。第一次出现不会被标记，因为它与前面的行不匹配。
+ `source_column` – 源列的名称。
+ `target_column` – 目标列的名称。
+ `true_string` – 当源列的值与该列中前面的值重复时，在目标列中插入该字符串。
+ `false_string` – 当源列的值与该列中前面的值不同时，在目标列中插入该字符串。

## apply(cls, \$1args, \$1\$1kwargs)
<a name="aws-glue-api-crawler-pyspark-transforms-FlagDuplicatesInColumn-apply"></a>

继承自 `GlueTransform` [apply](aws-glue-api-crawler-pyspark-transforms-GlueTransform.md#aws-glue-api-crawler-pyspark-transforms-GlueTransform-apply)。

## name(cls)
<a name="aws-glue-api-crawler-pyspark-transforms-FlagDuplicatesInColumn-name"></a>

继承自 `GlueTransform` [名称](aws-glue-api-crawler-pyspark-transforms-GlueTransform.md#aws-glue-api-crawler-pyspark-transforms-GlueTransform-name)。

## describeArgs(cls)
<a name="aws-glue-api-crawler-pyspark-transforms-FlagDuplicatesInColumn-describeArgs"></a>

继承自 `GlueTransform` [describeArgs](aws-glue-api-crawler-pyspark-transforms-GlueTransform.md#aws-glue-api-crawler-pyspark-transforms-GlueTransform-describeArgs)。

## describeReturn(cls)
<a name="aws-glue-api-crawler-pyspark-transforms-FlagDuplicatesInColumn-describeReturn"></a>

继承自 `GlueTransform` [describeReturn](aws-glue-api-crawler-pyspark-transforms-GlueTransform.md#aws-glue-api-crawler-pyspark-transforms-GlueTransform-describeReturn)。

## describeTransform(cls)
<a name="aws-glue-api-crawler-pyspark-transforms-FlagDuplicatesInColumn-describeTransform"></a>

继承自 `GlueTransform` [describeTransform](aws-glue-api-crawler-pyspark-transforms-GlueTransform.md#aws-glue-api-crawler-pyspark-transforms-GlueTransform-describeTransform)。

## describeErrors(cls)
<a name="aws-glue-api-crawler-pyspark-transforms-FlagDuplicatesInColumn-describeErrors"></a>

继承自 `GlueTransform` [describeErrors](aws-glue-api-crawler-pyspark-transforms-GlueTransform.md#aws-glue-api-crawler-pyspark-transforms-GlueTransform-describeErrors)。

## describe(cls)
<a name="aws-glue-api-crawler-pyspark-transforms-FlagDuplicatesInColumn-describe"></a>

继承自 `GlueTransform` [describe](aws-glue-api-crawler-pyspark-transforms-GlueTransform.md#aws-glue-api-crawler-pyspark-transforms-GlueTransform-describe)。