1. Overview

In this tutorial, we’ll explore various ways of iterating through large data sets retrieved with Spring Data JPA.

Firstly, we’ll use paginated queries, and we’ll see the difference between a Slice and a Page. After that, we’ll learn how to stream and process the data from the database, without collecting it.

2. Paginated Queries

A common approach for this situation is to use paginated queries. To do this, we need to define a batch size and execute multiple queries. As a result, we’ll be able to process all the entities in smaller batches and avoid loading large amounts of data in memory.

2.1. Pagination Using Slices

For the code examples in this article, we’ll use the Student entity as the data model:

@Entity
public class Student {

    @Id
    @GeneratedValue
    private Long id;

    private String firstName;
    private String lastName;

    // consturctor, getters and setters

}

Let’s add a method that queries all the students by firstName. With Spring Data JPA, we simply need to add to the JpaRepository a method that receives a Pageable as a parameter and returns a Slice:

@Repository
public interface StudentRepository extends JpaRepository<Student, Long> {
    Slice<Student> findAllByFirstName(String firstName, Pageable page);
}

We can notice that the return type is Slice<Student>. The Slice object allows us to process the first batch of Student entities. The slice object exposes a hasNext() method that allows us to check if the batch we’re processing is the last one of the result set.

Moreover, we can move from one slice to the next one with the help of the method nextPageable(). This method returns the Pageable object needed for requesting the next slice. Therefore, we can retrieve all the data, slice by slice, with a combination of the two methods inside a while loop:

void processStudentsByFirstName(String firstName) {
    Slice<Student> slice = repository.findAllByFirstName(firstName, PageRequest.of(0, BATCH_SIZE));
    List<Student> studentsInBatch = slice.getContent();
    studentsInBatch.forEach(emailService::sendEmailToStudent);

    while(slice.hasNext()) {
        slice = repository.findAllByFirstName(firstName, slice.nextPageable());
        slice.get().forEach(emailService::sendEmailToStudent);
    }
}

Let’s run a short test using a small batch size and follow the SQL statements. We’ll expect multiple queries to be executed:

[main] DEBUG org.hibernate.SQL - select student0_.id as id1_0_, student0_.first_name as first_na2_0_, student0_.last_name as last_nam3_0_ from student student0_ where student0_.first_name=? limit ?
[main] DEBUG org.hibernate.SQL - select student0_.id as id1_0_, student0_.first_name as first_na2_0_, student0_.last_name as last_nam3_0_ from student student0_ where student0_.first_name=? limit ? offset ?
[main] DEBUG org.hibernate.SQL - select student0_.id as id1_0_, student0_.first_name as first_na2_0_, student0_.last_name as last_nam3_0_ from student student0_ where student0_.first_name=? limit ? offset ?

2.2. Pagination Using Pages

As an alternative to Slice<>, we can also use Page<> as the return type of the query:

@Repository
public interface StudentRepository extends JpaRepository<Student, Long> {
    Slice<Student> findAllByFirstName(String firstName, Pageable page);
    Page<Student> findAllByLastName(String lastName, Pageable page);
}

The Page interface extends Slice, adding two other methods to it: getTotalPages() and getTotalElements().

Pages are often used as the return type when the paginated data is requested over the network. This way, the caller will know exactly how many rows are left and how many additional requests will be needed.

On the other hand, using Pages results in additional queries that count the rows meeting the criteria:

[main] DEBUG org.hibernate.SQL - select student0_.id as id1_0_, student0_.first_name as first_na2_0_, student0_.last_name as last_nam3_0_ from student student0_ where student0_.last_name=? limit ?
[main] DEBUG org.hibernate.SQL - select count(student0_.id) as col_0_0_ from student student0_ where student0_.last_name=?
[main] DEBUG org.hibernate.SQL - select student0_.id as id1_0_, student0_.first_name as first_na2_0_, student0_.last_name as last_nam3_0_ from student student0_ where student0_.last_name=? limit ? offset ?
[main] DEBUG org.hibernate.SQL - select count(student0_.id) as col_0_0_ from student student0_ where student0_.last_name=?
[main] DEBUG org.hibernate.SQL - select student0_.id as id1_0_, student0_.first_name as first_na2_0_, student0_.last_name as last_nam3_0_ from student student0_ where student0_.last_name=? limit ? offset ?

Consequently, we should only use Page<> as the return type if we need to know the total number of entities.

3. Streaming From the Database

Spring Data JPA also allows us to stream the data from the result set:

Stream<Student> findAllByFirstName(String firstName);

As a result, we’ll process the entities one by one, without loading them in memory all at the same time. However, we’ll need to manually close the stream created by the Spring Data JPA, with a try-with-resource block. Furthermore, we’ll have to wrap the query in a read-only transaction.

Lastly, even if we process the rows one by one, we’ve to make sure the persistence context isn’t keeping the reference to all the entities. We can achieve this by manually detaching the entities before consuming the stream:

private final EntityManager entityManager;

@Transactional(readOnly = true)
public void processStudentsByFirstNameUsingStreams(String firstName) {
    try (Stream<Student> students = repository.findAllByFirstName(firstName)) {
        students.peek(entityManager::detach)
            .forEach(emailService::sendEmailToStudent);
    }
}

4. Conclusion

In this article, we explored various ways of processing large data sets. Initially, we achieved this through multiple, paginated, queries. We learned that we should use Page<> as the return type when the caller needs to know the total number of elements and Slice<> otherwise. After that, we learned how to stream the rows from the database and process them individually.

As always, the code samples can be found over on GitHub.

Course – LSD (cat=Persistence)

Get started with Spring Data JPA through the reference Learn Spring Data JPA course:

>> CHECK OUT THE COURSE
res – Persistence (eBook) (cat=Persistence)
6 Comments
Oldest
Newest
Inline Feedbacks
View all comments
Comments are open for 30 days after publishing a post. For any issues past this date, use the Contact form on the site.