[JPA] Batch Insert

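Inserting large volumes of data into a database is a problem that comes up often.
This post looks at ways to optimize bulk insert performance with Spring Boot and PostgreSQL, and compares them experimentally.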
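Batch inserts through JDBC Template were clearly faster than the JPA Repository approaches: with reWriteBatchedInserts=true they took 601 ms, versus 3498 ms for saveAll() and 5513 ms for save().
Just enabling PostgreSQL's `reWriteBatchedInserts` option also brought a substantial improvement, up to 76.3% for the JDBC batch insert.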

Hands-on

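This section compares the performance of several ways to insert large amounts of data in a Spring Boot, Spring Data JPA, and PostgreSQL environment:
- Saving entities one by one (Repository.save())
- Saving a collection of entities (Repository.saveAll())
- Batch insert using JDBC Template (batchInsert())
It also looks at the effect of the PostgreSQL JDBC driver's reWriteBatchedInserts option and of the batch size.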

The entity and the methods that save the data are shown below. Note that if the ID generation strategy is set to IDENTITY, Hibernate's insert batching does not work.

@Entity
@SequenceGenerator(
        name = "issued_coupon_code_gen",
        sequenceName = "issued_coupon_code_coupon_code_sn_seq",
        initialValue = 1,
        allocationSize = 100
)
public class IssuedCouponCode {
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "issued_coupon_code_gen")
    private Long couponCodeSn;

    private String serialCode;
}

@Transactional
public void save(int size) {
    List<IssuedCouponCode> issuedCouponCodes = generateCode(size);
    issuedCouponCodes.forEach(issuedCouponCodeRepository::save);
}

@Transactional
public void saveAll(int size) {
    List<IssuedCouponCode> issuedCouponCodes = generateCode(size);
    issuedCouponCodeRepository.saveAll(issuedCouponCodes);
}

@Transactional
public void batchInsert(int size) {
    List<IssuedCouponCode> issuedCouponCodes = generateCode(size);
    issuedCouponCodeRepository.batchInsert(issuedCouponCodes);
}

Ways to insert data

Repository.save() - saving one entity at a time

The most basic approach: each entity is saved individually.

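- Each save() call produces its own INSERT statement (inside one transaction these are still sent at flush time, as the results section notes)
- JPA's full entity lifecycle management applies
- Every entity is kept in the persistence context (higher memory usage)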

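Repository.saveAll() - saving a collection

Saves multiple entities in a single call.

- Internally it still calls save() for each entity
- Hibernate's batch processing optimization can still be applied
- INSERTs are grouped into batches according to the hibernate.jdbc.batch_size setting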
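
The grouping that hibernate.jdbc.batch_size describes can be pictured with plain Java. The sketch below is only a conceptual illustration and not Hibernate's actual implementation; the class name BatchGroupingSketch and the partition helper are hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch only: shows how N pending rows fall into groups of batchSize.
// This is NOT how Hibernate is implemented internally.
public final class BatchGroupingSketch {
    private BatchGroupingSketch() {}

    /** Splits items into consecutive groups of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }
}
```

With 100,000 items and a batch size of 100, partition returns 1,000 groups, which is roughly the number of JDBC batches one would expect in the test described later.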

JDBC Template batchInsert() - direct batch processing

Implements the batch insert directly with JDBC Template.

- Bypasses the JPA/Hibernate layer
- Uses PreparedStatement's addBatch() directly
- Executes plain SQL with minimal memory overhead
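
The post does not show the body of the repository's batchInsert(), so the following is only a minimal sketch of the addBatch()/executeBatch() pattern it refers to. It uses plain JDBC instead of JDBC Template to stay dependency-free (JdbcTemplate's batchUpdate methods wrap the same pattern). The class name JdbcBatchInsertSketch is hypothetical, the table and column names are taken from the SQL examples in this post, and the sketch assumes the coupon_code_sn column is filled in by the database, which is an assumption and not something the post states.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper, not the author's repository code.
public final class JdbcBatchInsertSketch {
    private JdbcBatchInsertSketch() {}

    /**
     * Queues one INSERT per serial code and sends them in groups of batchSize.
     * Assumes the id column is populated by the database (assumption).
     * Returns how many times executeBatch() was called.
     */
    public static int batchInsert(Connection connection, List<String> serialCodes, int batchSize)
            throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        String sql = "INSERT INTO issued_coupon_code(serial_code) VALUES(?)";
        int executedBatches = 0;
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            int pending = 0;
            for (String code : serialCodes) {
                ps.setString(1, code);
                ps.addBatch();          // queue the row on the client side
                pending++;
                if (pending == batchSize) {
                    ps.executeBatch();  // send the queued rows together
                    executedBatches++;
                    pending = 0;
                }
            }
            if (pending > 0) {          // send the final partial batch
                ps.executeBatch();
                executedBatches++;
            }
        }
        return executedBatches;
    }
}
```

Because no entity is ever registered in a persistence context, nothing beyond the input list has to stay in memory, which matches the low-overhead point above.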

Options

# application.properties

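# Hibernate batch settings (in Spring Boot these are passed through with the spring.jpa.properties prefix)
spring.jpa.properties.hibernate.jdbc.batch_size=100
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
spring.jpa.properties.hibernate.jdbc.batch_versioned_data=true

# reWriteBatchedInserts option (set on the JDBC URL; use one of the two per test run)
# spring.datasource.url=jdbc:postgresql://localhost:5432/testdb?reWriteBatchedInserts=false
spring.datasource.url=jdbc:postgresql://localhost:5432/testdb?reWriteBatchedInserts=true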

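With hibernate.jdbc.batch_size set to 100, the time taken to insert 100,000 rows was measured.

- Test data: 100,000 coupon code entities
- Metric: total execution time (milliseconds)
- Repetitions: 3 runs per method, using the average
- Controlled environment: measured on the same hardware, network, and DB state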
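
The measurement code itself is not included in the post. A minimal sketch of the "run 3 times and average" procedure could look like the following; TimingSketch and averageMillis are hypothetical names, and the task passed in would be one of the save(), saveAll(), or batchInsert() calls.

```java
// Hypothetical timing helper; the post does not show how the times were taken.
public final class TimingSketch {
    private TimingSketch() {}

    /** Runs the task the given number of times and returns the mean elapsed time in ms. */
    public static double averageMillis(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();   // monotonic clock, suited to elapsed time
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / runs;
    }
}
```

A call such as TimingSketch.averageMillis(() -> service.saveAll(100_000), 3) would then give one cell of the results table, where service is a placeholder for the bean holding the methods above. For the runs to be comparable, the table would also need to be emptied between runs.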

Results

Insert method	reWriteBatchedInserts=false	reWriteBatchedInserts=true	Improvement (time reduction)
save	6524ms	5513ms	15.5%
saveAll	5062ms	3498ms	30.9%
batchInsert	2537ms	601ms	76.3%
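
The improvement column is the reduction in elapsed time relative to the reWriteBatchedInserts=false run, that is (before - after) / before * 100. As a small check of the arithmetic, using a hypothetical helper named ImprovementSketch:

```java
// Hypothetical helper that reproduces the "improvement" column of the table.
public final class ImprovementSketch {
    private ImprovementSketch() {}

    /** Percentage by which the elapsed time dropped going from beforeMs to afterMs. */
    public static double timeReductionPercent(long beforeMs, long afterMs) {
        if (beforeMs <= 0) {
            throw new IllegalArgumentException("beforeMs must be positive");
        }
        return (beforeMs - afterMs) * 100.0 / beforeMs;
    }
}
```

Rounded to one decimal place, timeReductionPercent(6524, 5513) gives 15.5, timeReductionPercent(5062, 3498) gives 30.9, and timeReductionPercent(2537, 601) gives 76.3, matching the table.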

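save() vs saveAll()

saveAll() was about 22% faster than save() with reWriteBatchedInserts=false (5062 ms vs 6524 ms) and about 37% faster with reWriteBatchedInserts=true (3498 ms vs 5513 ms).
The gap is likely per-call overhead: every save() call from the service passes through the repository proxy, while saveAll() enters it once and loops internally.
Batching was applied to both save() and saveAll().
- Both run inside the same @Transactional scope, so the INSERTs of both are batched
- Spring's transaction management: when the transaction ends, all pending changes in the persistence context are flushed together, which is what produces the batching effect.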

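JPA vs JDBC

JDBC batchInsert() was far faster than the JPA approaches (601 ms vs 3498 ms for saveAll() with reWriteBatchedInserts=true).
Sources of overhead in JPA:
- Entity lifecycle management
- Holding 100,000 objects in the first-level cache
- Dirty checking
- Flushing the persistence context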

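The results confirm that the PostgreSQL JDBC driver's reWriteBatchedInserts option has a considerable effect on batch insert performance. When set to true, the driver rewrites multiple individual INSERT statements in a batch into a single multi-row INSERT statement.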

-- reWriteBatchedInserts=false
INSERT INTO issued_coupon_code(serial_code) VALUES('1');
INSERT INTO issued_coupon_code(serial_code) VALUES('2');
INSERT INTO issued_coupon_code(serial_code) VALUES('3');

-- reWriteBatchedInserts=true
INSERT INTO issued_coupon_code(serial_code) VALUES('1'),('2'),('3');

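Handling multiple rows in one statement like this is what produced the dramatic improvement.

- Less SQL parsing and execution overhead (a small number of multi-row INSERTs instead of 100,000 single-row ones)
- Less network traffic
- More room for optimization inside the database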
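
To make the rewrite concrete, the sketch below builds the kind of multi-row statement shown above from a row count. It only mimics the idea and is not the PostgreSQL driver's actual code; MultiRowInsertSketch and multiRowInsert are hypothetical names, and the table and column come from the SQL examples in this post.

```java
import java.util.StringJoiner;

// Illustration of the multi-row INSERT shape only; NOT the pgjdbc implementation.
public final class MultiRowInsertSketch {
    private MultiRowInsertSketch() {}

    /** Builds a parameterized single-column INSERT with rowCount value groups. */
    public static String multiRowInsert(String table, String column, int rowCount) {
        if (rowCount <= 0) {
            throw new IllegalArgumentException("rowCount must be positive");
        }
        StringJoiner rows = new StringJoiner(",");
        for (int i = 0; i < rowCount; i++) {
            rows.add("(?)");            // one placeholder group per row
        }
        return "INSERT INTO " + table + "(" + column + ") VALUES" + rows;
    }
}
```

For example, multiRowInsert("issued_coupon_code", "serial_code", 3) returns INSERT INTO issued_coupon_code(serial_code) VALUES(?),(?),(?), so three rows travel in one statement instead of three.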

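For bulk-processing logic where performance matters, choosing a JDBC-based batchInsert() instead of the JPA Repository may be the best option.

Of course, bulk data processing is not only about speed; the choice should also take into account business requirements, system stability, maintainability, and other factors.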

