Hierarchical Structures and Paging (feat. Comments)


studyHub 2026. 1. 23. 19:08

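Using a comment feature as the example, let's look at how to paginate a hierarchical structure.

There are two main ways to represent a comment hierarchy in a database, chosen here by the depth that has to be supported:

Adjacency List (used in this post for at most 2 depth) and Path Enumeration (unbounded depth). The two approaches store different data.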

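Comment list query - max 2 depth

Adjacency List approach

Comments cannot simply be listed in chronological order; the hierarchy has to be taken into account.

Sorting only by creation time (commentId) does not work, because a reply written later must still be shown directly under its parent, ahead of top-level comments that were written earlier.

Let's look at the rules that determine the display order.

1. A parent comment is always created before its child comments (replies).

2. Child comments that share the same parent are sorted by creation time.

For a top-level comment (depth 1), parent_comment_id is set to the comment's own comment_id.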
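The two rules plus the self-referencing parent_comment_id can be checked with a small in-memory sort. This is only a sketch: the `Row` record and the sample IDs are made up for illustration and stand in for the real comment table.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TwoDepthOrder {
    // Hypothetical minimal row: only the two columns that drive the ordering.
    record Row(long commentId, long parentCommentId) {}

    // Sort by (parent_comment_id asc, comment_id asc) and return the comment IDs.
    static List<Long> displayOrder(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingLong(Row::parentCommentId)
                .thenComparingLong(Row::commentId));
        return sorted.stream().map(Row::commentId).toList();
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row(1, 1),  // top-level: parent is itself
                new Row(2, 2),  // top-level
                new Row(3, 1),  // reply to 1, written after 2
                new Row(4, 2),  // reply to 2
                new Row(5, 1)); // reply to 1
        System.out.println(displayOrder(rows)); // [1, 3, 5, 2, 4]
    }
}
```

Because a top-level comment uses its own ID as the parent ID, it sorts first within its group, and the replies follow in creation order.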

Index

The list is therefore ordered by (parent_comment_id ascending, comment_id ascending),

and because the shard key is article_id, the comment list of one article can be read from a single shard.

The corresponding index looks like this.

create index idx_article_id_parent_comment_id_comment_id on comment (
    article_id asc,
    parent_comment_id asc,
    comment_id asc
);

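Paging queries

Paging queries for a hierarchy are similar to the paging queries for an ordinary article list.

Recap: paging queries for ordinary articles

Page numbers

1. Paging query - use a covering index so that skipped rows do not cost a second lookup in the primary index

2. Count only as many rows as needed to decide which page numbers to activate

Infinite scroll

1. First page

2. Second page onward - use the last row that was loaded as the reference point

Paging queries for a hierarchy

Page numbers

1. Fetch M comments on page N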

select * 
 from (
        select comment_id 
          from comment
         where article_id = {article_id}
         order by parent_comment_id asc, comment_id asc
         limit {limit} offset {offset}
) t left join comment on t.comment_id = comment.comment_id;

2. Count only the minimum needed

select count(*) 
  from ( 
         select comment_id 
           from comment 
          where article_id = {article_id} 
           limit {limit}
       ) t;

Infinite scroll

1. First page

select * 
  from comment
 where article_id = {article_id}
 order by parent_comment_id asc, comment_id asc
 limit {limit};

2. Second page onward

There are two reference points: last_parent_comment_id and last_comment_id.
When parent_comment_id equals last_parent_comment_id, comment_id is compared as well.

select * 
  from comment
 where article_id = {article_id} 
   and ( parent_comment_id > {last_parent_comment_id} or
        (parent_comment_id = {last_parent_comment_id} and comment_id > {last_comment_id}) )
 order by parent_comment_id asc, comment_id asc
 limit {limit};
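To make the two-column cursor condition concrete, here is a sketch that applies the same predicate to an in-memory list. `Row`, `isAfter`, and `nextPage` are illustrative names, not part of the original code; the real filtering happens in the SQL above.

```java
import java.util.Comparator;
import java.util.List;

class TwoDepthCursor {
    // Hypothetical minimal row with just the ordering columns.
    record Row(long commentId, long parentCommentId) {}

    // Mirrors the WHERE clause: the row lies strictly after the cursor
    // in (parent_comment_id, comment_id) order.
    static boolean isAfter(Row row, long lastParentCommentId, long lastCommentId) {
        return row.parentCommentId() > lastParentCommentId
                || (row.parentCommentId() == lastParentCommentId
                    && row.commentId() > lastCommentId);
    }

    // Filter, sort, and cut to the page size, like the query does.
    static List<Row> nextPage(List<Row> all, long lastParentCommentId,
                              long lastCommentId, int limit) {
        return all.stream()
                .filter(r -> isAfter(r, lastParentCommentId, lastCommentId))
                .sorted(Comparator.comparingLong(Row::parentCommentId)
                        .thenComparingLong(Row::commentId))
                .limit(limit)
                .toList();
    }
}
```

With rows (1,1), (3,1), (5,1), (2,2), (4,2) written as (comment_id, parent_comment_id) and a cursor of parent 1 / comment 3, a page of size 2 contains comments 5 and 2. Comparing only comment_id against the cursor would wrongly drop comment 2.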

Implementation

public class CommentService {
 
    // List comments (page-number style)
    public CommentPageResponse readAll(Long articleId, Long page, Long pageSize) {
        return CommentPageResponse.of(
            commentRepository.findAll(articleId, (page - 1) * pageSize, pageSize).stream()
                .map(CommentResponse::from)
                .toList(),
            commentRepository.count(articleId, PageLimitCalculator.calculatePageLimit(page, pageSize, 10L))
        );
    }

    // List comments (infinite-scroll style)
    public List<CommentResponse> readAll(Long articleId, Long lastParentCommentId, Long lastCommentId, Long limit) {
        List<Comment> comments = lastCommentId == null || lastParentCommentId == null ?
            commentRepository.findAllInfiniteScroll(articleId, limit) :
            commentRepository.findAllInfiniteScroll(articleId, lastParentCommentId, lastCommentId, limit);
        return comments.stream()
            .map(CommentResponse::from)
            .toList();
    }

}
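The service calls `PageLimitCalculator.calculatePageLimit(page, pageSize, 10L)`, but the helper itself is not shown. The sketch below is an assumption about what it might compute: the third argument is treated as the number of page buttons shown at once (called `movablePageCount` here, a made-up name), and the limit is the row count needed to know whether a next block of pages exists.

```java
final class PageLimitCalculator {
    private PageLimitCalculator() {
    }

    // Assumed formula: rows in all page blocks up to and including the current
    // one, plus 1 extra row that signals "there is a next block".
    static Long calculatePageLimit(Long page, Long pageSize, Long movablePageCount) {
        return (((page - 1) / movablePageCount) + 1) * pageSize * movablePageCount + 1;
    }
}
```

Under this assumption, with pageSize 30 and 10 page buttons, pages 1 to 10 count at most 301 rows and pages 11 to 20 count at most 601, so the count query never scans the whole article.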

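Comment list query - unbounded depth

Now let's consider unbounded depth instead of a maximum of 2 depth.

With n levels, parent and child comments can nest recursively without limit, so the parent comment ID and the comment ID alone are no longer enough to sort by.

The comments should be displayed in the order 1, 2, 6, 4, 5, 7, but sorting by (parent comment ID, comment ID) returns 1, 2, 4, so comment 4 comes back before comment 6.

Information about every ancestor comment is needed.

Putting every ancestor into the index would require too many columns and would degrade index performance.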

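Path Enumeration approach

By introducing a single string column, we can rely on string ordering.

The position at each depth is written as a string, and these strings are concatenated in order to form a path.

Each depth is encoded as a 5-character string, and the path stores the full route from the top-level ancestor down to the comment itself. (A comment at depth N has a path of N*5 characters.)

Each path inherits the path of its parent, and every comment gets its own distinct path that is sequential in string order.

With path enumeration, sorting the path strings is enough to sort the comments in display order.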
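As a quick check of that claim, the sketch below sorts a handful of hand-written paths with plain string comparison. The sample paths are invented for illustration; Java's `String.compareTo` is used as a stand-in for a binary collation, which holds for the ASCII characters used here.

```java
import java.util.ArrayList;
import java.util.List;

class PathOrder {
    static final int DEPTH_CHUNK_SIZE = 5;

    // Plain lexicographic sort: a parent is a prefix of its descendants,
    // so it sorts immediately before them.
    static List<String> displayOrder(List<String> paths) {
        List<String> sorted = new ArrayList<>(paths);
        sorted.sort(String::compareTo);
        return sorted;
    }

    // Depth is implied by the path length.
    static int depth(String path) {
        return path.length() / DEPTH_CHUNK_SIZE;
    }

    public static void main(String[] args) {
        List<String> inCreationOrder = List.of(
                "00000",            // first top-level comment
                "00001",            // second top-level comment
                "0000000000",       // reply to the first comment
                "0000100000",       // reply to the second comment
                "000000000000000",  // reply to the reply
                "0000000001");      // another reply to the first comment
        for (String path : displayOrder(inCreationOrder)) {
            System.out.println("  ".repeat(depth(path) - 1) + path);
        }
    }
}
```

The output lists each comment directly above its own subtree, regardless of creation order.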

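Database collation

When each depth is represented by 5 characters,

using only the 10 digits limits each depth to 10^5 values (00000~99999).

String ordering also works on letters, so using 62 characters, 0~9 (10), A~Z (26), and a~z (26), allows 62^5 values (about 916 million) per depth. (String order = 0~9 < A~Z < a~z)

For the database to sort strings in this order, the collation has to be configured.

A collation is a set of rules for sorting and comparing strings.

It covers case sensitivity, accent sensitivity, language-specific sort orders, and so on.
It can be set at the database, table, or column level.
The MySQL 8.0 default is utf8mb4_0900_ai_ci, but utf8mb4_bin is needed so that upper and lower case compare as different characters.
- utf8mb4 = UTF-8 with up to 4 bytes per character
- 0900 = version of the collation algorithm (Unicode Collation Algorithm 9.0.0)
- ai = accent-insensitive
- ci = case-insensitive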
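The effect of the collation choice can be illustrated outside the database. In this sketch, Java `char` comparison plays the role of a binary collation and `String.CASE_INSENSITIVE_ORDER` plays the role of a `_ci` collation; both are analogies, not MySQL behavior itself, and `ChunkAlphabet` is a made-up helper.

```java
class ChunkAlphabet {
    static final String CHARSET =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Number of distinct chunks of the given length: 62^chunkSize.
    static long capacity(int chunkSize) {
        long n = 1;
        for (int i = 0; i < chunkSize; i++) {
            n *= CHARSET.length();
        }
        return n;
    }

    // True when every character is strictly greater than the previous one
    // under binary (code point) comparison, i.e. 0-9 < A-Z < a-z.
    static boolean isAscendingInBinaryOrder() {
        for (int i = 1; i < CHARSET.length(); i++) {
            if (CHARSET.charAt(i - 1) >= CHARSET.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(capacity(5));                 // 916132832
        System.out.println(isAscendingInBinaryOrder());  // true
        // A case-insensitive comparison treats these two chunks as equal,
        // which is why a _ci collation cannot be used for the path column.
        System.out.println(String.CASE_INSENSITIVE_ORDER.compare("0000A", "0000a")); // 0
    }
}
```

Under a case-insensitive collation, "0000A" and "0000a" would also collide in a unique index, so the binary collation matters for uniqueness as well as for ordering.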

Table design

-- create the table
create table comment_v2 (
    comment_id bigint not null primary key,
    content varchar(3000) not null,
    article_id bigint not null,
    writer_id bigint not null,
    path varchar(25) character set utf8mb4 collate utf8mb4_bin not null,
    deleted bool not null,
    created_at datetime not null
);

-- check that the collation was applied
select table_name, column_name, collation_name
  from information_schema.COLUMNS
 where table_schema = 'comment' and table_name = 'comment_v2' and column_name = 'path';
 
-- create the index
create unique index idx_article_id_path on comment_v2(article_id asc, path asc);

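A path column is added to represent each comment's path, and the collation is set on it.
For development convenience and as a service limit, depth is capped at 5, so the size of path is VARCHAR(5*5).
Each comment has its own distinct path, so the index is created as a unique index. (This should also guard against concurrency problems in the application, since two concurrent inserts cannot both claim the same path.)
The index on path keeps the rows in sorted order and is used for paging.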

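Creating a path

A path is built by starting from the empty string ("") and appending the chunk for the new comment.

How can the path be generated?

Determining the path of a new comment

Suppose a request arrives to create a new comment under comment 00a0z. How should its path be generated?

Find the largest path among the children of 00a0z (childrenTopPath), here 00a0z 00002, and add 1 to it.

1) Finding childrenTopPath

How can childrenTopPath (00a0z 00002) be found?

We can use the fact that the path of every descendant starts with the path of its ancestor (parentPath) as a prefix.

Among all descendants whose path has the prefix 00a0z (parentPath), find the largest path (descendantsTopPath). descendantsTopPath may be deeper than the new comment, but it always starts with childrenTopPath.

Truncating descendantsTopPath to its first (depth of the new comment * 5) characters yields childrenTopPath.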
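The truncation step is a single substring call. The sketch below takes parentPath as an explicit argument to stay self-contained; the class and method shape are illustrative only.

```java
class ChildrenTopPath {
    static final int DEPTH_CHUNK_SIZE = 5;

    // Keep only the first (depth of the new comment * 5) characters.
    static String findChildrenTopPath(String parentPath, String descendantsTopPath) {
        int newCommentDepth = parentPath.length() / DEPTH_CHUNK_SIZE + 1;
        return descendantsTopPath.substring(0, newCommentDepth * DEPTH_CHUNK_SIZE);
    }

    public static void main(String[] args) {
        // The largest descendant sits one level below the largest child.
        System.out.println(findChildrenTopPath("00a0z", "00a0z0000200000")); // 00a0z00002
        // For a top-level comment the parent path is the empty string.
        System.out.println(findChildrenTopPath("", "0000300001"));           // 00003
    }
}
```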

Query for descendantsTopPath

select path 
  from comment_v2
 where article_id = {article_id}
   and path like {parentPath}% -- every descendant whose path has parentPath as a prefix
   and path > {parentPath} -- exclude the parent itself
order by path desc limit 1; -- the largest path in the result

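But the index was defined with path asc. Will order by path desc still use it, or will the rows be sorted again in descending order?

Contrary to that concern, the execution plan shows that the idx_article_id_path index is used.

Extra = Using index shows that the query ran as a covering index. It also says Backward index scan.

A backward index scan reads the index in reverse order, using the doubly linked pointers between the leaf nodes of the index tree.

Because of the conditions path > '00a0z' and path like '00a0z%', the query only scans the range between those two bounds.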

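2) Adding 1 to a path

The addition has to be performed on strings. Once the ordering 0-9 < A-Z < a-z is understood, the addition can be written in code.

"00000" -> "00001" -> … -> "AAAA9" -> "AAAAA" -> … -> "zzzzz"

There are two ways to do this; here we implement the numeric one.

Adding on the string
Define the ordering from 0 to z and replace the rightmost character with its successor (increment by 1).
If there is a carry (that is, z wraps around to 0), process the next character to the left as well.
Adding as a number
Convert the base-62 string to a decimal number, add 1, and convert the number back to the corresponding string.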
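For comparison with the numeric approach used in this post, here is a sketch of the other option, the string addition with a carry. `ChunkIncrement` is a hypothetical helper written for this illustration; it works on a single 5-character chunk.

```java
class ChunkIncrement {
    static final String CHARSET =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Increment a chunk by one, walking from the rightmost character.
    static String increase(String chunk) {
        char[] chars = chunk.toCharArray();
        for (int i = chars.length - 1; i >= 0; i--) {
            int index = CHARSET.indexOf(chars[i]);
            if (index < 0) {
                throw new IllegalArgumentException("invalid character: " + chars[i]);
            }
            if (index < CHARSET.length() - 1) {
                // No carry: move to the next character and stop.
                chars[i] = CHARSET.charAt(index + 1);
                return new String(chars);
            }
            // 'z' wraps to '0' and the carry moves one position to the left.
            chars[i] = CHARSET.charAt(0);
        }
        // Every position carried, so the chunk was already "zzzzz".
        throw new IllegalStateException("chunk overflowed");
    }

    public static void main(String[] args) {
        System.out.println(increase("00009")); // 0000A
        System.out.println(increase("0000Z")); // 0000a
        System.out.println(increase("0000z")); // 00010
    }
}
```

This version needs no integer arithmetic, so it would keep working if the chunk length grew beyond what fits in an `int` or `long`; the numeric version is shorter to write for 5 characters.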

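Edge cases

Let's also look at a few edge cases when generating a new path.

If there are no child comments yet, so this is the first one,
descendantsTopPath does not exist, so 00000 is appended to the parent path.
If childrenTopPath has already reached zzzzz under that parent,
the representable range (62^5 values) is exhausted and no more comments can be created there. (overflow)
This can be addressed by enlarging the character set (0-9, A-Z, a-z) or the number of characters per depth (5).

Implementation

In the Comment table, path is modeled as a value object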

@Getter
@Setter
@NoArgsConstructor(access = AccessLevel.PROTECTED)
@Embeddable
public class CommentPath {
    private String path;

    private static final String CHARSET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    private static final int DEPTH_CHUNK_SIZE = 5;
    private static final int MAX_DEPTH = 5;

    // MIN_CHUNK = "00000"
    private static final String MIN_CHUNK = String.valueOf(CHARSET.charAt(0)).repeat(DEPTH_CHUNK_SIZE);
    // MAX_CHUNK = "zzzzz"
    private static final String MAX_CHUNK = String.valueOf(CHARSET.charAt(CHARSET.length() - 1)).repeat(DEPTH_CHUNK_SIZE);

	...
 
    public CommentPath createChildCommentPath(String descendantsTopPath) {
        if (descendantsTopPath == null) {
            return CommentPath.create(path + MIN_CHUNK);
        }

        String childrenTopPath = findChildrenTopPath(descendantsTopPath);
        return CommentPath.create(increase(childrenTopPath));
    }

    private String findChildrenTopPath(String descendantsTopPath) {
        return descendantsTopPath.substring(0, (getDepth() + 1) * DEPTH_CHUNK_SIZE);
    }

    private String increase(String path) {
        String lastChunk = path.substring(path.length() - DEPTH_CHUNK_SIZE);
        if (isChunkOverflowed(lastChunk)) {
            throw new IllegalStateException("chunk overflowed");
        }

        int charsetLength = CHARSET.length();

        int value = 0; // convert to decimal, add 1, then convert back to the charset
        for (char ch : lastChunk.toCharArray()) {
            value = value * charsetLength + CHARSET.indexOf(ch);
        }

        value = value + 1;

        String result = "";
        for (int i = 0; i < DEPTH_CHUNK_SIZE; i++) {
            result = CHARSET.charAt(value % charsetLength) + result;
            value = value / charsetLength;
        }

        return path.substring(0, path.length() - DEPTH_CHUNK_SIZE) + result;
    }
}

@Service
@RequiredArgsConstructor
public class CommentServiceV2 {
    private final Snowflake snowflake = new Snowflake();
    private final CommentRepositoryV2 commentRepository;

    @Transactional
    public CommentResponse create(CommentCreateRequestV2 request) {
        CommentV2 parent = findParent(request);
        CommentPath parentCommentPath = parent == null ? CommentPath.create("") : parent.getCommentPath();
        CommentV2 comment = commentRepository.save(
            CommentV2.create(
                snowflake.nextId(),
                request.getContent(),
                request.getArticleId(),
                request.getWriterId(),
                parentCommentPath.createChildCommentPath(
                    commentRepository.findDescendantTopPath(request.getArticleId(), parentCommentPath.getPath())
                        .orElse(null))
            )
        );

        return CommentResponse.from(comment);
    }
}

@Repository
public interface CommentRepositoryV2 extends JpaRepository<CommentV2, Long> {
    @Query("select c from CommentV2 c where c.articleId = :articleId and c.commentPath.path = :path")
    Optional<CommentV2> findByArticleIdAndPath(@Param("articleId") Long articleId, @Param("path") String path);

    @Query(
        value = "select path from comment_v2 " +
            "	 where article_id = :articleId " +
            "	 and path > :pathPrefix  " +
            "	 and path like :pathPrefix% " +
            "	 order by path desc limit 1 ",
        nativeQuery = true
    )
    Optional<String> findDescendantTopPath(@Param("articleId") Long articleId, @Param("pathPrefix") String pathPrefix);
}

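Paging

Page-number style

1. List query: uses a covering index

2. Count query: counts only the minimum needed to show the active page numbers

Infinite scroll

1. First page

2. Subsequent pages: use the last path as the reference point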

/**
 * List comments (page-number style)
 */
@Query(
    value = "select comment_v2.comment_id, comment_v2.content, comment_v2.path, comment_v2.article_id, " +
        "comment_v2.writer_id, comment_v2.deleted, comment_v2.created_at " +
        "from ( " +
        // comment_id is the PK, so the secondary index covers this subquery
        "    select comment_id from comment_v2 where article_id = :articleId " +
        "    order by path asc " +
        "    limit :limit offset :offset " +
        ") t left join comment_v2 on t.comment_id = comment_v2.comment_id ",
    nativeQuery = true
)
List<CommentV2> findAll(
    @Param("articleId") Long articleId,
    @Param("offset") Long offset,
    @Param("limit") Long limit
);

@Query(
    value = "select count(*) from (" +
        "	select comment_id from comment_v2 where article_id = :articleId limit :limit" +
        ") t",
    nativeQuery = true
)
Long count(@Param("articleId") Long articleId, @Param("limit") Long limit);

/**
 * List comments (infinite-scroll style)
 */
@Query(
    value = "select comment_v2.comment_id, comment_v2.content, comment_v2.path, comment_v2.article_id, " +
        "		comment_v2.writer_id, comment_v2.deleted, comment_v2.created_at " +
        "	from comment_v2 " +
        "	where article_id = :articleId " +
        "   order by path asc " +
        "	limit :limit ",
    nativeQuery = true
)
List<CommentV2> findAllInfiniteScroll(
    @Param("articleId") Long articleId,
    @Param("limit") Long limit
);

@Query(
    value = "select comment_v2.comment_id, comment_v2.content, comment_v2.path, comment_v2.article_id, " +
        "		comment_v2.writer_id, comment_v2.deleted, comment_v2.created_at " +
        "	from comment_v2 " +
        "	where article_id = :articleId and path > :lastPath " +
        "   order by path asc " +
        "	limit :limit ",
    nativeQuery = true
)
List<CommentV2> findAllInfiniteScroll(
    @Param("articleId") Long articleId,
    @Param("lastPath") String lastPath,
    @Param("limit") Long limit
);
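With path enumeration the cursor shrinks to a single value, the last path. The sketch below imitates the two infinite-scroll queries in memory with a sorted set; `PathCursor` and its sample data are illustrative assumptions and do not touch the repository above.

```java
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class PathCursor {
    // lastPath == null -> first page; otherwise rows with path > lastPath.
    static List<String> page(NavigableSet<String> paths, String lastPath, int limit) {
        NavigableSet<String> source =
                lastPath == null ? paths : paths.tailSet(lastPath, false);
        return source.stream().limit(limit).toList();
    }

    public static void main(String[] args) {
        // TreeSet keeps strings in natural (binary-like) order, similar to the index.
        NavigableSet<String> paths = new TreeSet<>(List.of(
                "00000", "00001", "0000000000", "0000100000",
                "000000000000000", "0000000001"));
        List<String> first = page(paths, null, 2);
        List<String> second = page(paths, first.get(first.size() - 1), 2);
        System.out.println(first);  // [00000, 0000000000]
        System.out.println(second); // [000000000000000, 0000000001]
    }
}
```

The client only has to send back the path of the last comment it received, in contrast to the two-value cursor of the 2 depth design.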

References & image source
스프링부트로 직접 만들면서 배우는 대규모 시스템 설계 - 게시판
