
Migration from a third-party schema registry to Amazon Glue Schema Registry

Migrating from a third-party schema registry to the Amazon Glue Schema Registry depends on the existing third-party schema registry remaining available. If an Apache Kafka topic contains records that were sent using a third-party schema registry, consumers still need that registry to deserialize those records. The AWSKafkaAvroDeserializer lets you specify a secondary deserializer class, which points to the third-party deserializer and is used to deserialize those records.

You can retire the third-party schema registry once either of two criteria is met. First, the records in Apache Kafka topics that were written using the third-party schema registry are no longer required by any consumer. Second, those records have aged out of the Apache Kafka topics, which depends on the retention period specified for those topics. Note that if you have topics with infinite retention, you can still migrate to the Amazon Glue Schema Registry, but you will not be able to retire the third-party schema registry. As a workaround, you can use an application or MirrorMaker 2 to read from the current topic and produce to a new topic with the Amazon Glue Schema Registry.
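
The aging-out criterion can be estimated with simple date arithmetic. The following is a minimal sketch, not part of the Schema Registry library: the class and method names are hypothetical, and it assumes you know the timestamp of the last record produced with the third-party schema registry and the topic's retention.ms value (where -1 means infinite retention in Apache Kafka). Because Kafka removes data by log segment, actual deletion can happen later than the computed instant, so treat the result as a lower bound.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

/** Hypothetical helper: estimates when third-party-encoded records have aged out. */
final class RegistryRetirement {

    /** Kafka's topic-level retention.ms uses -1 to mean "retain forever". */
    static final long INFINITE_RETENTION_MS = -1L;

    /**
     * @param lastLegacyRecord timestamp of the last record produced with the third-party registry
     * @param retentionMs      the topic's retention.ms value
     * @return the earliest instant at which that record may have aged out,
     *         or empty if the topic never expires records
     */
    static Optional<Instant> earliestRetirement(Instant lastLegacyRecord, long retentionMs) {
        if (retentionMs == INFINITE_RETENTION_MS) {
            // Infinite retention: the records never age out on their own.
            return Optional.empty();
        }
        return Optional.of(lastLegacyRecord.plus(Duration.ofMillis(retentionMs)));
    }

    public static void main(String[] args) {
        // Example values only: the moment the producer was switched, and a 7-day retention.
        Instant producerSwitched = Instant.parse("2024-01-01T00:00:00Z");
        long sevenDaysMs = Duration.ofDays(7).toMillis();

        System.out.println(earliestRetirement(producerSwitched, sevenDaysMs));
        System.out.println(earliestRetirement(producerSwitched, INFINITE_RETENTION_MS));
    }
}
```

An empty result corresponds to the infinite-retention case, where the re-publishing workaround is the only way to stop depending on the third-party schema registry.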

To migrate from a third-party schema registry to the Amazon Glue Schema Registry:

  1. Create a registry in the Amazon Glue Schema Registry, or use the default registry.

  2. Stop the consumer. Modify it to use the Amazon Glue Schema Registry deserializer as the primary deserializer, and the third-party schema registry deserializer as the secondary.

    • Set the consumer properties. In this example, secondary_deserializer is set to the third-party deserializer class. The behavior is as follows: the consumer retrieves records from Amazon MSK and first tries to use the AWSKafkaAvroDeserializer. If it cannot read the magic byte that contains the Avro schema ID for the Amazon Glue Schema Registry schema, the AWSKafkaAvroDeserializer then tries the deserializer class provided in secondary_deserializer. The properties specific to the secondary deserializer, such as schema_registry_url_config and specific_avro_reader_config, must also be provided in the consumer properties, as shown below.

      consumerProps.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
      consumerProps.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, AWSKafkaAvroDeserializer.class.getName());
      consumerProps.setProperty(AWSSchemaRegistryConstants.AWS_REGION, KafkaClickstreamConsumer.gsrRegion);
      consumerProps.setProperty(AWSSchemaRegistryConstants.SECONDARY_DESERIALIZER, KafkaAvroDeserializer.class.getName());
      consumerProps.setProperty(KafkaAvroDeserializerConfig.SCHEMA_REGISTRY_URL_CONFIG, "URL for third-party schema registry");
      consumerProps.setProperty(KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG, "true");

  3. Restart the consumer.

  4. Stop the producer and point the producer to the Amazon Glue Schema Registry.

    1. Set the producer properties. In this example, the producer uses the default-registry and automatically registers schema versions.

      producerProps.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
      producerProps.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, AWSKafkaAvroSerializer.class.getName());
      producerProps.setProperty(AWSSchemaRegistryConstants.AWS_REGION, "us-east-2");
      producerProps.setProperty(AWSSchemaRegistryConstants.AVRO_RECORD_TYPE, AvroRecordType.SPECIFIC_RECORD.getName());
      producerProps.setProperty(AWSSchemaRegistryConstants.SCHEMA_AUTO_REGISTRATION_SETTING, "true");

  5. (Optional) Manually move existing schemas and schema versions from the current third-party schema registry to the Amazon Glue Schema Registry, either to the default-registry or to a specific non-default registry. To do this, export the schemas from the third-party schema registry in JSON format, then create new schemas in the Amazon Glue Schema Registry using the Amazon Web Services Management Console or the Amazon CLI.

    This step can be important if you need compatibility checks against previous schema versions, either when you create new schema versions using the Amazon CLI or the Amazon Web Services Management Console, or when producers send messages with a new schema while auto-registration of schema versions is turned on.

  6. Start the producer.
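
The primary/secondary fallback described in step 2 can be pictured with a small, dependency-free model. This is only an illustrative sketch, not the AWSKafkaAvroDeserializer implementation: the class FallbackDeserializer is hypothetical, the two deserializers are stand-in functions, and the leading byte value 3 used to recognize a Glue-encoded record is an assumption made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/** Simplified stand-in for the primary/secondary fallback; not the library's code. */
final class FallbackDeserializer {

    // Assumption for this sketch: first byte that marks a Glue Schema Registry-encoded record.
    static final byte GLUE_HEADER_VERSION = 3;

    private final Function<byte[], String> primary;
    private final Function<byte[], String> secondary;

    FallbackDeserializer(Function<byte[], String> primary, Function<byte[], String> secondary) {
        this.primary = primary;
        this.secondary = secondary;
    }

    /** Routes a record to the primary deserializer if its header matches, else to the secondary. */
    String deserialize(byte[] record) {
        if (record == null || record.length == 0) {
            return null; // nothing to decode, for example a tombstone record
        }
        if (record[0] == GLUE_HEADER_VERSION) {
            return primary.apply(record);
        }
        // Header not recognized: assume the record was written with the third-party registry.
        return secondary.apply(record);
    }

    /** Decodes everything after the first byte as UTF-8 text (demo payload only). */
    private static String payload(byte[] record) {
        return new String(record, 1, record.length - 1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        FallbackDeserializer deserializer = new FallbackDeserializer(
                bytes -> "glue:" + payload(bytes),
                bytes -> "third-party:" + payload(bytes));

        byte[] newRecord = {GLUE_HEADER_VERSION, 'n', 'e', 'w'};
        byte[] oldRecord = {0, 'o', 'l', 'd'};

        System.out.println(deserializer.deserialize(newRecord));
        System.out.println(deserializer.deserialize(oldRecord));
    }
}
```

Because routing is decided per record, a consumer configured this way can read a topic that mixes records written before and after the producer is switched in step 4, which is why the consumer is migrated first.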