Serialization of Enum Values in Avro
2025-04-28 00:10 UTC by Andrei Branza

1. Introduction

Apache Avro is a data serialization framework that provides rich data structures and a compact, fast, binary data format. When working with Avro in Java applications, we often need to serialize enum values. This can prove to be tricky if we don’t approach it correctly.

In this tutorial, we’ll explore how to properly serialize Java enum values using Avro. Furthermore, we’ll address common challenges we may face when working with enums in Avro.

2. Understanding Avro Enum Serialization

In Avro, enums are defined with a name and a set of symbols. When serializing Java enums, we must ensure that the symbols in the schema match the constants of our Java enum. This matters because Avro validates enum values against the schema during serialization.

Avro uses a schema-based approach, meaning the schema defines the structure of the data, including field names, types, and, in the case of enums, the permitted symbol values. As such, the schema serves as a contract between the serializer and deserializer, thus helping with data consistency.
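
To make the matching requirement concrete, here’s a minimal sketch in plain Java that compares the constant names of a Java enum with a list of schema symbols. The Color enum and the matchesSymbols helper are hypothetical names introduced for illustration. They aren’t part of Avro’s API, and the symbol list is passed in by hand rather than read from a Schema:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative helper only; Avro performs its own validation internally.
final class EnumSymbolCheckSketch {

    // Hypothetical Java enum mirroring the Color symbols used in this tutorial.
    enum Color { UNKNOWN, GREEN, RED, BLUE }

    // Returns true when the enum's constant names and the schema symbols form the same set.
    static <E extends Enum<E>> boolean matchesSymbols(Class<E> enumClass, List<String> schemaSymbols) {
        Set<String> constantNames = Arrays.stream(enumClass.getEnumConstants())
          .map(Enum::name)
          .collect(Collectors.toCollection(LinkedHashSet::new));
        return constantNames.equals(new LinkedHashSet<>(schemaSymbols));
    }

    public static void main(String[] args) {
        // Same names on both sides
        System.out.println(matchesSymbols(Color.class, List.of("UNKNOWN", "GREEN", "RED", "BLUE")));
        // The symbol list is missing BLUE
        System.out.println(matchesSymbols(Color.class, List.of("UNKNOWN", "GREEN", "RED")));
    }
}
```

The comparison deliberately ignores order and looks at names only. A check like this could run in a unit test to catch drift between a Java enum and its schema early.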

Let’s start by adding the necessary Avro Maven dependency to our project:

<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.12.0</version>
</dependency>

3. Defining Enums in Avro Schema

First, let’s look at how to correctly define an enum when creating an Avro schema:

Schema colorEnum = SchemaBuilder.enumeration("Color")
  .namespace("com.baeldung.apache.avro")
  .symbols("UNKNOWN", "GREEN", "RED", "BLUE");

This creates an enum schema with four available values. The namespace helps prevent naming conflicts. In addition, the symbols define the valid enum values.
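
The order of the symbols matters too. According to the Avro specification, the binary encoding writes an enum value as the zero-based position of its symbol in the schema, encoded as a zigzag variable-length int. The following plain-Java sketch models that rule for our four symbols. The class and method names are hypothetical, and the code is a simplified model rather than Avro’s own encoder:

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

// Simplified model of Avro's binary enum encoding; not Avro's own implementation.
final class EnumIndexEncodingSketch {

    // An enum value is written as the zero-based position of its symbol.
    static byte[] encodeSymbol(List<String> symbols, String symbol) {
        int index = symbols.indexOf(symbol);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown symbol: " + symbol);
        }
        return encodeInt(index);
    }

    // Ints are zigzag-mapped, then written 7 bits at a time, lowest bits first.
    static byte[] encodeInt(int value) {
        int zigzag = (value << 1) ^ (value >> 31);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((zigzag & ~0x7F) != 0) {
            out.write((zigzag & 0x7F) | 0x80); // high bit set: more bytes follow
            zigzag >>>= 7;
        }
        out.write(zigzag);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        List<String> symbols = List.of("UNKNOWN", "GREEN", "RED", "BLUE");
        byte[] encoded = encodeSymbol(symbols, "RED");
        // RED sits at position 2, which zigzag-maps to the single byte 4
        System.out.println(encoded.length + " byte(s), first byte = " + encoded[0]);
    }
}
```

Because only the position is written, a reader needs the writer’s schema to turn the bytes back into a symbol name.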

Now, let’s use this enum in a record schema:

Schema recordSchema = SchemaBuilder.record("ColorRecord")
  .namespace("com.baeldung.apache.avro")
  .fields()
  .name("color")
  .type(colorEnum)
  .noDefault()
  .endRecord();

This creates a record schema named ColorRecord with a single field, color, of the enum type we defined earlier.

4. Serializing Enum Values

Now that we’ve defined our enum schema, let’s explore how we can serialize the enum values.

In this section, we’ll discuss the standard approach for basic enum serialization. In addition, we’ll address the common challenge of handling enums within union types, which is often a cause of confusion.

4.1. Correct Approach for Basic Enum Serialization

To serialize an enum value correctly, we need to create an EnumSymbol object from the appropriate enum schema (colorEnum):

public void serializeEnumValue(File file) throws IOException {
    GenericRecord record = new GenericData.Record(recordSchema);
    // The symbol is created from the enum schema, not the record schema
    GenericData.EnumSymbol colorSymbol = new GenericData.EnumSymbol(colorEnum, "RED");
    record.put("color", colorSymbol);
    
    DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(recordSchema);
    try (DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<>(datumWriter)) {
        // Write to the file supplied by the caller so that it can be read back later
        dataFileWriter.create(recordSchema, file);
        dataFileWriter.append(record);
    }
}

First, we create a GenericRecord based on our recordSchema. Next, we create an EnumSymbol with our enum schema (colorEnum) and the value “RED”. Finally, we add it to our record and serialize the record to the given file using DatumWriter and DataFileWriter.

Now, let’s test our implementation, writing to a file inside a temporary directory:

@Test
void whenSerializingEnum_thenSuccess() throws IOException {
    File file = tempDir.resolve("color.avro").toFile();
    serializeEnumValue(file);
    DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(recordSchema);
    try (DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(file, datumReader)) {
        GenericRecord result = dataFileReader.next();
        assertEquals("RED", result.get("color").toString());
    }
}

This test confirms that we can successfully serialize and deserialize an enum value.

4.2. Handling Union Types With Enums

Now, let’s see how we can handle a common issue we may face – serializing enums within union types:

Schema colorEnum = SchemaBuilder.enumeration("Color")
  .namespace("com.baeldung.apache.avro")
  .symbols("UNKNOWN", "GREEN", "RED", "BLUE");
    
Schema unionSchema = SchemaBuilder.unionOf()
  .type(colorEnum)
  .and()
  .nullType()
  .endUnion();
    
Schema recordWithUnionSchema = SchemaBuilder.record("ColorRecordWithUnion")
  .namespace("com.baeldung.apache.avro")
  .fields()
  .name("color")
  .type(unionSchema)
  .noDefault()
  .endRecord();

Let’s analyze the defined schemas. We’ve defined a union schema that can be either our enum type or null. This pattern is common when a field is optional. Next, we’ve created a record schema with a field using this union type.

When we serialize an enum within a union, we still use an EnumSymbol, created from the enum schema:

GenericRecord record = new GenericData.Record(recordWithUnionSchema);
GenericData.EnumSymbol colorSymbol = new GenericData.EnumSymbol(colorEnum, "RED");
record.put("color", colorSymbol);

One important aspect to keep in mind here is that we’ve created the EnumSymbol with the enum schema, not the union schema. Passing the union schema instead is a common mistake that leads to serialization errors.
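
It helps to picture what ends up on the wire. Per the Avro specification, a union value is encoded as the zero-based position of the selected branch, followed by the value encoded with that branch’s schema. The plain-Java sketch below models this for a union of our Color enum (branch 0) and null (branch 1). The names are hypothetical, and this is a simplified model rather than Avro’s implementation:

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

// Simplified model of how a [Color, null] union value is laid out in Avro's binary format.
final class UnionEncodingSketch {

    static final List<String> COLOR_SYMBOLS = List.of("UNKNOWN", "GREEN", "RED", "BLUE");

    // Encodes either a Color symbol (branch 0) or null (branch 1).
    static byte[] encodeOptionalColor(String symbolOrNull) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (symbolOrNull == null) {
            writeInt(out, 1); // branch 1 is null, which has no payload
            return out.toByteArray();
        }
        int index = COLOR_SYMBOLS.indexOf(symbolOrNull);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown symbol: " + symbolOrNull);
        }
        writeInt(out, 0);     // branch 0 is the Color enum
        writeInt(out, index); // then the enum value as its symbol position
        return out.toByteArray();
    }

    // Zigzag mapping followed by variable-length encoding, 7 bits per byte.
    static void writeInt(ByteArrayOutputStream out, int value) {
        int zigzag = (value << 1) ^ (value >> 31);
        while ((zigzag & ~0x7F) != 0) {
            out.write((zigzag & 0x7F) | 0x80);
            zigzag >>>= 7;
        }
        out.write(zigzag);
    }

    public static void main(String[] args) {
        System.out.println(encodeOptionalColor("RED").length + " bytes for RED");
        System.out.println(encodeOptionalColor(null).length + " byte for null");
    }
}
```

The enum payload is always encoded with the enum branch’s schema, which is why the EnumSymbol has to carry the enum schema rather than the union schema.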

Now, let’s test our implementation for the union handling:

@Test
void whenSerializingEnumInUnion_thenSuccess() throws IOException {
    File file = tempDir.resolve("colorUnion.avro").toFile();
    GenericRecord record = new GenericData.Record(recordWithUnionSchema);
    GenericData.EnumSymbol colorSymbol = new GenericData.EnumSymbol(colorEnum, "GREEN");
    record.put("color", colorSymbol);
    DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(recordWithUnionSchema);
    try (DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<>(datumWriter)) {
        dataFileWriter.create(recordWithUnionSchema, file);
        dataFileWriter.append(record);
    }
    DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(recordWithUnionSchema);
    try (DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(file, datumReader)) {
        GenericRecord result = dataFileReader.next();
        assertEquals("GREEN", result.get("color").toString());
    }
}

Furthermore, we can also test handling null values in the union:

@Test
void whenSerializingNullInUnion_thenSuccess() throws IOException {
    File file = tempDir.resolve("colorNull.avro").toFile();
    GenericRecord record = new GenericData.Record(recordWithUnionSchema);
    record.put("color", null);
    DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(recordWithUnionSchema);
    assertDoesNotThrow(() -> {
        try (DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<>(datumWriter)) {
            dataFileWriter.create(recordWithUnionSchema, file);
            dataFileWriter.append(record);
        }
    });
}

5. Schema Evolution With Enums

Schema evolution is a particularly sensitive area when dealing with enums, as adding or removing enum values can lead to compatibility issues. In this section, we’ll explore how to update our data structures as requirements change. We’ll focus on working with enum types and maintaining compatibility between schema versions through proper default value configuration.

5.1. Adding New Enum Values

When we expand our schema, adding new enum values requires careful consideration, because readers that still use the older schema don’t know the new symbols. To stay compatible, adding a default value is crucial:

@Test
void whenSchemaEvolution_thenDefaultValueUsed() throws IOException {
    // Evolved (writer) schema: adds YELLOW and declares the default on the enum type
    String evolvedSchemaJson = "{\"type\":\"record\","
      + "\"name\":\"ColorRecord\","
      + "\"namespace\":\"com.baeldung.apache.avro\","
      + "\"fields\":[{\"name\":\"color\","
      + "\"type\":{\"type\":\"enum\","
      + "\"name\":\"Color\","
      + "\"symbols\":[\"UNKNOWN\",\"GREEN\",\"RED\",\"BLUE\",\"YELLOW\"],"
      + "\"default\":\"UNKNOWN\"}}]}";
    
    Schema evolvedRecordSchema = new Schema.Parser().parse(evolvedSchemaJson);
    Schema evolvedEnum = evolvedRecordSchema.getField("color").schema();
    
    // Write a record that uses the new YELLOW symbol
    File file = tempDir.resolve("colorEvolved.avro").toFile();
    GenericRecord record = new GenericData.Record(evolvedRecordSchema);
    GenericData.EnumSymbol colorSymbol = new GenericData.EnumSymbol(evolvedEnum, "YELLOW");
    record.put("color", colorSymbol);
    DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(evolvedRecordSchema);
    try (DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<>(datumWriter)) {
        dataFileWriter.create(evolvedRecordSchema, file);
        dataFileWriter.append(record);
    }
    
    // Original (reader) schema: no YELLOW, same enum-level default
    String originalSchemaJson = "{\"type\":\"record\","
      + "\"name\":\"ColorRecord\","
      + "\"namespace\":\"com.baeldung.apache.avro\","
      + "\"fields\":[{\"name\":\"color\","
      + "\"type\":{\"type\":\"enum\","
      + "\"name\":\"Color\","
      + "\"symbols\":[\"UNKNOWN\",\"GREEN\",\"RED\",\"BLUE\"],"
      + "\"default\":\"UNKNOWN\"}}]}";
    
    Schema originalRecordSchema = new Schema.Parser().parse(originalSchemaJson);
    
    // First argument is the writer's schema, second is the reader's schema
    DatumReader<GenericRecord> datumReader = 
      new GenericDatumReader<>(evolvedRecordSchema, originalRecordSchema);
    try (DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(file, datumReader)) {
        GenericRecord result = dataFileReader.next();
        assertEquals("UNKNOWN", result.get("color").toString());
    }
}

Now, let’s analyze the code above. We’ve evolved our schema (evolvedSchemaJson) by adding a new symbol, “YELLOW”. Next, we’ve created a record with the “YELLOW” enum value and written it to a file.

Then, we’ve parsed the original schema (originalSchemaJson), which lacks “YELLOW” but declares the same default value. As noted earlier, this default is what keeps the two schema versions compatible.

Finally, when we read the data with the original schema, we verify that the default value “UNKNOWN” is used instead of “YELLOW”.

For proper schema evolution with enums, we need to specify the default value at the enum type level rather than at the field level. That’s why our example defines the schemas as JSON strings, which give us direct control over the structure.
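
The rule behind this behavior comes from the schema resolution section of the Avro specification: if the writer’s symbol isn’t present in the reader’s enum and the reader’s enum has a default, the default is used. Otherwise, an error is signalled. Here’s a minimal plain-Java sketch of that decision, with hypothetical names and plain lists standing in for the two schemas. It models the rule only and isn’t Avro’s own resolver:

```java
import java.util.List;

// Simplified model of Avro's enum resolution rule between a writer and a reader schema.
final class EnumResolutionSketch {

    // readerDefault may be null, meaning the reader's enum declares no default.
    static String resolve(List<String> writerSymbols, int writtenIndex,
                          List<String> readerSymbols, String readerDefault) {
        // The data carries only a position, so the writer's schema supplies the name
        String written = writerSymbols.get(writtenIndex);
        if (readerSymbols.contains(written)) {
            return written; // symbols are matched by name, not by position
        }
        if (readerDefault != null) {
            return readerDefault; // unknown symbol falls back to the enum-level default
        }
        throw new IllegalStateException("No match for symbol " + written + " and no enum default");
    }

    public static void main(String[] args) {
        List<String> writer = List.of("UNKNOWN", "GREEN", "RED", "BLUE", "YELLOW");
        List<String> reader = List.of("UNKNOWN", "GREEN", "RED", "BLUE");
        System.out.println(resolve(writer, 4, reader, "UNKNOWN")); // YELLOW is unknown to the reader
        System.out.println(resolve(writer, 2, reader, "UNKNOWN")); // RED exists on both sides
    }
}
```

Without the enum-level default, the first call would fail instead of falling back, which mirrors why the default in our reader schema is essential.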

6. Conclusion

In this article, we’ve explored how to properly serialize enum values using Apache Avro. We’ve looked at basic enum serialization, handling unions with enums, and addressing schema evolution challenges.

When working with enums in Avro, we should remember some key points. First, we need to define our enum schema with the correct namespace and symbols. Second, we should create each GenericData.EnumSymbol with the matching enum schema.

Furthermore, for union types, we create the enum symbol with the enum schema, not the union schema.

Lastly, regarding schema evolution, we need to place the default value at the enum type level for appropriate compatibility.

As always, the code is available over on GitHub.

The post Serialization of Enum Values in Avro first appeared on Baeldung.