Saturday, February 25, 2017

Apache Kafka: Multiple Ways to Consume or Read Messages from a Kafka Topic

In our previous post, we used Apache Avro to produce messages to a Kafka topic. In this post, we are going to create Kafka consumers that read those Avro-encoded messages back from the topic. Just as there were multiple ways to produce the messages, there are multiple ways to consume them.

As in the previous post, we are using the Confluent Schema Registry to manage the Avro message schemas. Before deserializing a message, the Kafka consumer reads the corresponding schema from the registry server and uses it to deserialize the payload with type safety. To check whether our schema is registered on the schema server, we can hit the following endpoints with any REST tool (a small Java sketch follows the list):


  • http://localhost:8081/subjects (lists all registered schemas)
  • http://localhost:8081/subjects/<schema-name>/versions/1 (shows the detailed structure of version 1 of a schema)
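
The same check can also be done from Java when no REST tool is handy. This is only a minimal sketch using java.net.HttpURLConnection against a registry assumed to be running on localhost:8081; any HTTP client works just as well.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class SchemaRegistryCheck {

    public static void main(String[] args) throws Exception {
        // Ask the registry for all registered subjects.
        URL url = new URL("http://localhost:8081/subjects");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");

        // The registry answers with a JSON array of subject names.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}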

I: Simple Consumer

Like the simple producer, we have a simple consumer for consuming the messages produced by the simple producer. As a best practice, the consumer should mirror the producer: the key and value deserializers must match the serializers that were used to write to the topic.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.Arrays;
import java.util.Properties;

public class SimpleConsumer {

    private static Properties kafkaProps = new Properties();

    static {
        kafkaProps.put("bootstrap.servers", "localhost:9092");
        kafkaProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        kafkaProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        kafkaProps.put(ConsumerConfig.GROUP_ID_CONFIG, "CustomerCountryGroup");
    }

    private static void infinitePollLoop() {
        try (KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(kafkaProps)) {
            kafkaConsumer.subscribe(Arrays.asList("CustomerCountry"));
            while (true) {
                ConsumerRecords<String, String> records = kafkaConsumer.poll(100);
                records.forEach(record ->
                        System.out.printf("Topic: %s, Partition: %s, Offset: %s, Key: %s, Value: %s%n",
                                record.topic(), record.partition(), record.offset(),
                                record.key(), record.value()));
            }
        }
    }

    public static void main(String[] args) {
        infinitePollLoop();
    }
}
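
One caveat with this infinite poll loop: there is no clean way out of it. A common pattern, shown here only as a sketch on top of the SimpleConsumer properties above, is to call wakeup() from a JVM shutdown hook; wakeup() is the one KafkaConsumer method that is safe to call from another thread, and it makes the blocked poll() throw a WakeupException that we catch to leave the loop and close the consumer.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.Arrays;
import java.util.Properties;

public class GracefulConsumer {

    public static void main(String[] args) {
        Properties kafkaProps = new Properties();
        kafkaProps.put("bootstrap.servers", "localhost:9092");
        kafkaProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        kafkaProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        kafkaProps.put(ConsumerConfig.GROUP_ID_CONFIG, "CustomerCountryGroup");

        KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(kafkaProps);
        Thread mainThread = Thread.currentThread();

        // On JVM shutdown, abort the blocked poll() and wait for the loop to finish.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            kafkaConsumer.wakeup();
            try {
                mainThread.join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }));

        try {
            kafkaConsumer.subscribe(Arrays.asList("CustomerCountry"));
            while (true) {
                ConsumerRecords<String, String> records = kafkaConsumer.poll(100);
                records.forEach(record -> System.out.println(record.value()));
            }
        } catch (WakeupException e) {
            // Expected on shutdown, nothing to do.
        } finally {
            kafkaConsumer.close();
        }
    }
}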

II: Apache Avro Deserialization Generic Format:

Like the generic format producer, we have a matching generic format consumer. The consumer needs the same Avro schema, or a POJO class generated from it by a build-tool code generator or plugin.

import io.confluent.kafka.serializers.KafkaAvroDeserializer;
import io.confluent.kafka.serializers.KafkaAvroDeserializerConfig;
import org.apache.avro.specific.SpecificData;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.io.IOException;
import java.util.Arrays;
import java.util.Properties;

public class AvroGenericConsumer {

    private static Properties kafkaProps = new Properties();

    static {
        // auto.offset.reset accepts 'latest', 'earliest' or 'none';
        // 'earliest' makes a new consumer group read the topic from the beginning.
        kafkaProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        kafkaProps.put("bootstrap.servers", "localhost:9092");
        kafkaProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        kafkaProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, KafkaAvroDeserializer.class);
        kafkaProps.put(ConsumerConfig.GROUP_ID_CONFIG, "AvroGenericConsumer-GroupOne");
        kafkaProps.put("schema.registry.url", "http://localhost:8081");
        // false (the default) makes the deserializer return GenericRecord;
        // we convert it into the generated class below with SpecificData.deepCopy.
        kafkaProps.put(KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG, false);
    }

    public static void infiniteConsumer() throws IOException {
        try (KafkaConsumer<String, Object> kafkaConsumer = new KafkaConsumer<>(kafkaProps)) {
            kafkaConsumer.subscribe(Arrays.asList("AvroGenericProducerTopics"));

            while (true) {
                ConsumerRecords<String, Object> records = kafkaConsumer.poll(100);

                records.forEach(record -> {
                    // Convert the GenericRecord into the generated POJO class.
                    CustomerGeneric customer = (CustomerGeneric) SpecificData.get()
                            .deepCopy(CustomerGeneric.SCHEMA$, record.value());
                    System.out.println("Key : " + record.key());
                    System.out.println("Value: " + customer);
                });
            }
        }
    }

    public static void main(String[] args) throws IOException {
        infiniteConsumer();
    }
}
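
If you do not have a generated class at all, the GenericRecord (org.apache.avro.generic.GenericRecord) returned by the deserializer can be read directly by field name. This alternative body for the forEach above is only a sketch; the field names 'id' and 'name' are assumptions, so use the fields of your actual schema:

// Alternative loop body: read fields straight off the GenericRecord
// instead of converting it into a generated class.
// The field names 'id' and 'name' are assumptions.
GenericRecord customer = (GenericRecord) record.value();
System.out.println("Key : " + record.key());
System.out.println("id  : " + customer.get("id"));
System.out.println("name: " + customer.get("name"));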

III: Apache Avro Deserialization Specific Format One:

This example deserializes Kafka messages that were written in a specific record format. It is the consumer counterpart of the Apache Avro Serialization Specific Format One producer in our previous post. Here we use Kafka Streams to deserialize the messages; the code below is a minimal sketch that assumes Confluent's SpecificAvroSerde from the kafka-streams-avro-serde artifact. There is a simpler way to deserialize the messages, which we will look at in the next example.

import io.confluent.kafka.streams.serdes.avro.SpecificAvroSerde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Collections;
import java.util.Properties;

public class AvroSpecificDeserializerOne {

    public static void main(String[] args) {
        Properties streamsProps = new Properties();
        streamsProps.put(StreamsConfig.APPLICATION_ID_CONFIG, "AvroSpecificDeserializerOne");
        streamsProps.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // Configure the value serde against the schema registry;
        // 'false' means this serde is used for record values, not keys.
        SpecificAvroSerde<Customer> customerSerde = new SpecificAvroSerde<>();
        customerSerde.configure(
                Collections.singletonMap("schema.registry.url", "http://localhost:8081"),
                false);

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, Customer> customers = builder.stream(
                "AvroSpecificProducerOneTopic",
                Consumed.with(Serdes.String(), customerSerde));

        customers.foreach((key, customer) -> {
            System.out.println("Key : " + key);
            System.out.println("Value: " + customer);
        });

        KafkaStreams streams = new KafkaStreams(builder.build(), streamsProps);
        streams.start();

        // Stop the streams application cleanly on JVM shutdown.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}

IV: Apache Avro Deserialization Specific Format Two:

This example deserializes the messages written by the same Apache Avro Serialization Specific Format One producer. The previous example used Kafka Streams; here we use a plain KafkaConsumer instead. With specific.avro.reader set to true, the KafkaAvroDeserializer returns ready-to-use instances of the generated Customer class.

import io.confluent.kafka.serializers.KafkaAvroDeserializer;
import io.confluent.kafka.serializers.KafkaAvroDeserializerConfig;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.io.IOException;
import java.util.Arrays;
import java.util.Properties;

public class AvroSpecificDeserializerTwo {

    private static Properties kafkaProps = new Properties();

    static {
        // auto.offset.reset accepts 'latest', 'earliest' or 'none';
        // 'earliest' makes a new consumer group read the topic from the beginning.
        kafkaProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        kafkaProps.put("bootstrap.servers", "localhost:9092");
        kafkaProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        kafkaProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, KafkaAvroDeserializer.class);
        kafkaProps.put(ConsumerConfig.GROUP_ID_CONFIG, "AvroSpecificDeserializerTwo-GroupOne");
        kafkaProps.put("schema.registry.url", "http://localhost:8081");
        // true: the deserializer returns instances of the generated Customer class.
        kafkaProps.put(KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG, true);
    }

    public static void infiniteConsumer() throws IOException {
        try (KafkaConsumer<String, Customer> kafkaConsumer = new KafkaConsumer<>(kafkaProps)) {
            kafkaConsumer.subscribe(Arrays.asList("AvroSpecificProducerOneTopic"));

            while (true) {
                ConsumerRecords<String, Customer> records = kafkaConsumer.poll(100);

                records.forEach(record -> {
                    Customer customer = record.value();
                    System.out.println("Key : " + record.key());
                    System.out.println("Value: " + customer);
                });
            }
        }
    }

    public static void main(String[] args) throws IOException {
        infiniteConsumer();
    }
}
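
All the consumers in this post rely on automatic offset commits, since enable.auto.commit defaults to true. For more control over when consumption progress is recorded, auto-commit can be switched off and offsets committed by hand. This is only a sketch of the poll loop, assuming kafkaProps additionally contains ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG set to false:

// Assumes: kafkaProps.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
while (true) {
    ConsumerRecords<String, Customer> records = kafkaConsumer.poll(100);
    records.forEach(record -> {
        // Process the record before its offset is committed.
        System.out.println("Value: " + record.value());
    });
    if (!records.isEmpty()) {
        // Synchronously commit the offsets returned by the last poll().
        kafkaConsumer.commitSync();
    }
}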



V: Apache Avro Deserialization Specific Format Three:

In this example we deserialize the messages ourselves using Avro's DatumReader API. It consumes what the Apache Avro Serialization Specific Format Two producer from our previous post wrote; since that producer published the raw Avro binary encoding (no registry framing), we receive the value as a plain byte array and decode it with a SpecificDatumReader.

import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.io.IOException;
import java.util.Arrays;
import java.util.Properties;

public class AvroSpecificDeserializerThree {

    private static Properties kafkaProps = new Properties();

    static {
        // auto.offset.reset accepts 'latest', 'earliest' or 'none';
        // 'earliest' makes a new consumer group read the topic from the beginning.
        kafkaProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        kafkaProps.put("bootstrap.servers", "localhost:9092");
        kafkaProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        // The producer wrote the raw Avro binary encoding, so we take the value
        // as plain bytes and decode it ourselves below.
        kafkaProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class);
        kafkaProps.put(ConsumerConfig.GROUP_ID_CONFIG, "AvroSpecificDeserializerThree-GroupOne");
    }

    public static void infiniteConsumer() throws IOException {
        // The reader is reusable across records, so create it once.
        DatumReader<Customer> customerDatumReader = new SpecificDatumReader<>(Customer.SCHEMA$);

        try (KafkaConsumer<String, byte[]> kafkaConsumer = new KafkaConsumer<>(kafkaProps)) {
            kafkaConsumer.subscribe(Arrays.asList("AvroSpecificProducerTwoTopic"));

            while (true) {
                ConsumerRecords<String, byte[]> records = kafkaConsumer.poll(100);

                records.forEach(record -> {
                    BinaryDecoder binaryDecoder = DecoderFactory.get().binaryDecoder(record.value(), null);
                    try {
                        Customer customer = customerDatumReader.read(null, binaryDecoder);
                        System.out.println("Key : " + record.key());
                        System.out.println("Value: " + customer);
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                });
            }
        }
    }

    public static void main(String[] args) throws IOException {
        infiniteConsumer();
    }
}
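
Note that this manual decoding only works because the producer wrote the plain Avro binary encoding. Messages produced with the Confluent KafkaAvroSerializer instead carry a small framing header that must be skipped first. The helper below is a sketch of that framing, following Confluent's documented wire format: one magic byte (0), a 4-byte schema id, then the standard Avro binary encoding.

import java.nio.ByteBuffer;

public class ConfluentFraming {

    // Strips the Confluent wire-format header and returns the raw Avro payload.
    public static byte[] avroPayload(byte[] confluentBytes) {
        ByteBuffer buffer = ByteBuffer.wrap(confluentBytes);
        if (buffer.get() != 0) {
            throw new IllegalArgumentException("Unknown magic byte");
        }
        int schemaId = buffer.getInt(); // the writer schema can be fetched from the registry by this id
        byte[] payload = new byte[buffer.remaining()];
        buffer.get(payload);
        return payload;
    }
}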

There are still many other ways to produce and consume messages from Kafka. We have discussed some of them here and may cover others in future posts. Please feel free to send feedback and post comments.



Comments:

  1. Hi Harmeet, is it possible to have a different serializer/deserializer per topic? In the sense that for topic1 I need deserializer1 and for topic2 I need deserializer2?

     Reply: Hi Vinod, for different serializers/deserializers you need different KafkaConsumer instances: all the properties are defined when a consumer is created, and it reads every subscribed topic with those deserializers. So to use deserializer1 for topic1 and deserializer2 for topic2, create a separate KafkaConsumer with its own properties for each topic.
  2. Hi Harmeet, it looks like there are a few mistakes in the code you have posted here.
     Under the class "AvroGenericConsumer" the attribute "SPECIFIC_AVRO_READER_CONFIG" is set to true, yet you are expecting a schema to be used for generic deserialization. I believe that attribute should be set to false in this case.

     And under the sub-post "III: Apache Avro Deserialization Specific Format One" you talk about deserialization, however the code snippet shows a Kafka producer; shouldn't it be consumer-related code instead?

     Please clarify, as you have been addressing the same problem I am trying to find the answer for. The rest of the code snippets also look jumbled up; please take a look and let me know if I am understanding this correctly. Thank you.

     Warm Regards,
     Kamesh.