ELION Lab

Language Intelligence & Representation

Guided by the vision of Elucidating Language Intelligence & RepresentatiON (ELION), our research group explores the frontiers of natural language processing and AI systems. We study how language models represent knowledge, reason, and generate meaning, working toward interpretable and controllable language intelligence for real-world interaction.

연ꡬ λΆ„μ•Ό (Research Areas)

Toward the vision of All Languages, One Mind, we focus on the following research topics.

🧠

Language Models & World Models

We develop and analyze language models to study how scale, structure, and representations support reasoning, generalization, controllability, and world modeling.

πŸ’­

Thinking & Reasoning

We study the internal representations and processes that enable language models to perform multi-step thinking and reasoning, aiming to interpret how conclusions are formed and where they fail.

πŸ”

Hallucination Detection & Mitigation

We investigate hallucination as a representational and epistemic failure in language models, developing methods to detect, analyze, and mitigate ungrounded or misleading generations.

✏️

Model Editing

We explore model editing as a means of modifying internal knowledge representations, enabling correction, updating, and control of language model behavior without retraining.

🌐

Multilinguality & Multimodality

We study how language models represent and align meaning across languages and modalities, with the goal of building unified models that generalize beyond linguistic and modal boundaries.

πŸ€–

AI Agents

We design and analyze language-model-based AI agents, focusing on how internal representations support planning, decision-making, and interaction in dynamic environments.

λͺ¨μ§‘ 쀑! (NOW HIRING!)

πŸš€ Ongoing Projects

ELION Lab is dedicated to pushing the boundaries of language models to build next-generation AI that interacts with the real world.

1. Continual Representation Learning 🧠

We investigate methods for language models to continuously update and refine their internal knowledge without experiencing "catastrophic forgetting." Expanding this into Life2Vec-style research, we aim to build lifelong learning systems that utilize AnyType data to predict events and risks across both human and model lifecycles.

* Status: Multiple papers currently under review at top-tier global conferences.
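As a toy illustration of the consolidation idea behind this project, the sketch below trains a single parameter on one task, then on a second task with and without an EWC-style quadratic anchor toward the first task's optimum. The losses, the Fisher proxy, and all names are illustrative assumptions, not the lab's actual method.

```python
# Minimal sketch of an EWC-style penalty against catastrophic forgetting
# (illustrative assumptions only): one parameter, two quadratic "tasks".

def grad_descent(grad, w, lr=0.1, steps=500):
    """Plain gradient descent on a scalar parameter."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Task A: loss (w - 2)^2  ->  optimum w_A = 2
w_A = grad_descent(lambda w: 2 * (w - 2), 0.0)

# Fisher proxy: curvature of task A's loss at w_A (here a constant 2)
fisher = 2.0
lam = 1.0  # strength of the consolidation penalty

# Task B alone: loss (w + 1)^2 drifts to -1, "forgetting" task A.
w_plain = grad_descent(lambda w: 2 * (w + 1), w_A)

# Task B with the penalty: loss (w + 1)^2 + lam * fisher * (w - w_A)^2
w_ewc = grad_descent(lambda w: 2 * (w + 1) + 2 * lam * fisher * (w - w_A), w_A)

# w_ewc settles between the two task optima, staying closer to w_A
# than w_plain does.
```

Real continual-learning methods apply this per-parameter with an estimated Fisher information matrix; the scalar version only shows the trade-off the penalty creates.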

2. Reducing Hallucination & RAG πŸ›‘οΈ

We explore cutting-edge methodologies to detect and mitigate the hallucination phenomenon in LLMs. Internally, we focus on measuring model confidence through calibration and uncertainty estimation. Externally, we strive to maximize the reliability and consistency of generated outputs by developing advanced grounding techniques within precise Retrieval-Augmented Generation (RAG) pipelines.

* Project: Supported by IITP (2024–2026, Research on the Reliability and Coherence of Generative AI Outcomes).
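The calibration side of this work can be illustrated with a minimal expected calibration error (ECE) computation over confidence bins. The binning scheme and the toy predictions below are illustrative assumptions, not project code.

```python
# Hedged sketch: expected calibration error (ECE), the gap between a
# model's stated confidence and its actual accuracy, averaged over bins.

def expected_calibration_error(confidences, correct, n_bins=5):
    """Weighted average of |accuracy - mean confidence| per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece, n = 0.0, len(confidences)
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece

# A model that claims 90% confidence but is right 25% of the time
# is poorly calibrated; one whose confidence matches its accuracy is not.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 0, 0, 0])
calibrated = expected_calibration_error([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
```

A lower ECE means the model's confidence is a more trustworthy signal for deciding when a generation may be hallucinated.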

3. Multi-Agent Systems & Orchestration πŸ€–

Moving beyond the limitations of single models, we aim to develop practical and precise multi-agent systems. Through efficient orchestration, we integrate planning and ontology exploration to achieve high performance even with lightweight models, enabling them to execute complex, multi-step tasks.

* Status: New research proposals in progress; actively expanding the agentic AI agenda.
* Collaboration: Joint research with the Korea University Agentic AI Team.
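A minimal sketch of the orchestration idea, under the assumption that specialist "agents" can be modeled as plain functions behind a router; all names here are hypothetical, not the systems under development.

```python
# Toy orchestration sketch (illustrative only): a router runs a plan of
# named steps, threading each step's output into the next agent.

def retriever(query):
    """Stand-in for a retrieval agent."""
    return f"docs for '{query}'"

def summarizer(text):
    """Stand-in for a summarization agent."""
    return text[:20] + "..."

AGENTS = {"retrieve": retriever, "summarize": summarizer}

def orchestrate(plan, payload):
    """Execute a list of agent names in order, passing the payload along."""
    for step in plan:
        payload = AGENTS[step](payload)
    return payload

result = orchestrate(["retrieve", "summarize"], "multi-agent systems")
```

In a real system the plan itself would be produced by a planning model and each agent would be an LLM call or tool; the dictionary-dispatch skeleton is the part that stays the same.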

4. Multimodal LM & World Understanding πŸ‘οΈ

We conduct research on integrating diverse modalities, such as vision, to understand physical and social contexts beyond text. By enhancing dependency parsing and semantic chunking in Vision-Language Models (VLMs), we advance multimodal document retrieval and understanding, and empower AI to reason with real-world common sense.

* Collaboration: Ongoing joint research with the Korea University Document AI & Multimodal Team.
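The semantic chunking idea can be sketched as a similarity-based splitter: start a new chunk wherever consecutive sentences stop resembling each other. The bag-of-words "embedding" and the threshold below are toy assumptions standing in for a real encoder.

```python
import math

# Hedged sketch of semantic chunking: split a document where the
# similarity between consecutive sentences drops below a threshold.

def embed(sentence):
    """Toy embedding: a bag-of-words count vector."""
    vec = {}
    for word in sentence.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(sentences, threshold=0.2):
    chunks, current = [], [sentences[0]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) < threshold:
            chunks.append(current)  # similarity dropped: close the chunk
            current = []
        current.append(cur)
    chunks.append(current)
    return chunks

doc = ["the model reads text", "the model writes text", "stock prices fell today"]
chunks = semantic_chunks(doc)  # splits before the unrelated sentence
```

With a real sentence encoder the same loop yields retrieval units that respect topic boundaries rather than fixed token windows.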

5. World-Interactive Data Augmentation 🌐

To address the era of "data exhaustion" as online data becomes saturated, we explore world-interactive data augmentation. We research innovative data engines that autonomously generate high-level reasoning and multi-dimensional language data through direct feedback and interaction with real-world environments.

* Collaboration: Upcoming international collaboration with Singapore A*STAR Research and Microsoft Research Asia (MSRA).
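A toy sketch of the environment-verified generation loop: a proposer emits candidate QA pairs, and only pairs whose answers an "environment" can independently confirm are kept. Here the environment is simple arithmetic that can be recomputed; all function names are hypothetical.

```python
import random

# Illustrative sketch (assumptions only): build training pairs whose
# answers are checked by interacting with an environment, rather than
# scraped from saturated web text.

def propose(rng):
    """Candidate generator: a question and its claimed answer."""
    a, b = rng.randint(1, 99), rng.randint(1, 99)
    return f"What is {a} + {b}?", a + b

def environment_check(question, answer):
    """The environment recomputes the ground truth from the question."""
    a, b = [int(t) for t in question.replace("?", "").split() if t.isdigit()]
    return a + b == answer

def generate_dataset(n, seed=0):
    rng = random.Random(seed)
    data = []
    while len(data) < n:
        q, ans = propose(rng)
        if environment_check(q, ans):  # keep only verified pairs
            data.append((q, ans))
    return data

dataset = generate_dataset(3)
```

The point of the pattern is the filter: a richer environment (a simulator, a code executor, a tool API) plays the same verifying role for higher-level reasoning data.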

ELION Labμ—μ„œ 인곡지λŠ₯ 연ꡬ에 열정을 μ§€λ‹Œ 인턴, 석사, 박사과정 학생을 λͺ¨μ§‘ν•©λ‹ˆλ‹€
(We are looking for talented M.S./Ph.D. students and research interns.)

APPLY

μ΅œμ‹  λ‰΄μŠ€ (Latest News)

Stay updated with our recent achievements and announcements.

  • Mar 2026 πŸ”₯ 2 papers accepted at CVPR 2026.
  • Mar 2026 πŸŽ‰ Established the ELION Lab at Konkuk University.
  • Aug 2025 πŸ”₯ 5 papers accepted at EMNLP 2025.
  • May 2025 πŸ”₯ 1 paper accepted at ACL 2025.
  • Feb 2025 πŸ”₯ 1 paper accepted at ICLR 2025.
  • Feb 2025 πŸ”₯ 2 papers accepted at NAACL 2025.

μ£Όμš” λ…Όλ¬Έ (Featured Publications)

Selected recent papers from our research group.

CVPR 2026

🌟 Evidential Transformation Network: Turning Pretrained Models into Evidential Models for Uncertainty Estimation

Yongchan Chun, Chanhee Park, Jeongho Yoon, Jaehyung Seo*, Heuiseok Lim*

Conference on Computer Vision and Pattern Recognition (CVPR), 2026

CVPR 2026

🌟 M3DocDep: Multi-modal, Multi-page, Multi-document Dependency Chunking with Large Vision-Language Models

Joongmin Shin, Jeongbae Park, Jaehyung Seo*, Heuiseok Lim*

Conference on Computer Vision and Pattern Recognition (CVPR), 2026

EMNLP 2025

🌟 The Impact of Negated Text on Hallucination with Large Language Models

Jaehyung Seo, Hyeonseok Moon, Heuiseok Lim*

Empirical Methods in Natural Language Processing (EMNLP), 2025

EMNLP 2025 Findings

🌟 KoLEG: On-the-Fly Korean Legal Knowledge Editing with Continuous Retrieval

Jaehyung Seo, Dahyun Jung, Jaewook Lee, Yongchan Chun, Dongjun Kim, Hwijung Ryu, Donghoon Shin, Heuiseok Lim*

Empirical Methods in Natural Language Processing (EMNLP) Findings, 2025

EMNLP 2025

🌟 MultiDocFusion: Hierarchical and Multimodal Chunking Pipeline for Enhanced RAG on Long Industrial Documents

Joong Min Shin, Chanjun Park, Jeongbae Park, Jaehyung Seo*, Heuiseok Lim*

Empirical Methods in Natural Language Processing (EMNLP), 2025

EMNLP 2025

🌟 Metric Calculating Benchmark: Code-Verifiable Complicate Instruction Following Benchmark for Large Language Models

Hyeonseok Moon, Seongtae Hong, Jaehyung Seo*, Heuiseok Lim*

Empirical Methods in Natural Language Processing (EMNLP), 2025

ICLR 2025

🌟 K-HALU: Multiple Answer Korean Hallucination Benchmark for Large Language Models

Jaehyung Seo, Heuiseok Lim*

International Conference on Learning Representations (ICLR), 2025

HCLT 2024 πŸ† Best Paper

🌟 Post-negation Text Induce New Hallucinations in Large Language Models

Jaehyung Seo, Aram So, Heuiseok Lim*

Annual Conference on Human and Cognitive Language Technology (HCLT), 2024

ACL 2024 Findings

🌟 KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models

Jaehyung Seo, Jaewook Lee, Chanjun Park, SeongTae Hong, Seungjun Lee, Heuiseok Lim*

Annual Meeting of the Association for Computational Linguistics (ACL) Findings, 2024

EMNLP 2023

🌟 CHEF in the Language Kitchen: A Generative Data Augmentation Leveraging Korean Morpheme Ingredients

Jaehyung Seo, Hyeonseok Moon, Jaewook Lee, Sugyeong Eo, Chanjun Park, Heuiseok Lim*

Empirical Methods in Natural Language Processing (EMNLP), 2023

Knowledge-Based Systems

🌟 PU-GEN: Enhancing generative commonsense reasoning for language models with human-centered knowledge

Jaehyung Seo, Dongsuk Oh, Sugyeong Eo, Chanjun Park, Kisu Yang, Hyeonseok Moon, Kinam Park, Heuiseok Lim*

Knowledge-Based Systems, 2022

IEEE Access

🌟 Plain Template Insertion: Korean-Prompt-Based Engineering for Few-Shot Learners

Jaehyung Seo, Hyeonseok Moon, Chanhee Lee, Sugyeong Eo, Chanjun Park, Jihoon Kim, Changwoo Chun, Heuiseok Lim*

IEEE Access, 2022

NAACL 2022 Findings

🌟 A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation

Jaehyung Seo*, Seounghoon Lee*, Chanjun Park, Yoonna Jang, Hyeonseok Moon, Sugyeong Eo, Seonmin Koo, Heuiseok Lim*

North American Chapter of the ACL (NAACL) Findings, 2022

Mathematics

🌟 Dense-to-Question and Sparse-to-Answer: Hybrid Retriever System for Industrial Frequently Asked Questions

Jaehyung Seo, Taemin Lee, Hyeonseok Moon, Chanjun Park, Sugyeong Eo, Imatitikua D Aiyanyo, Kinam Park, Aram So, Sungmin Ahn, Jeongbae Park*

Mathematics, 2022

HCLT 2021 πŸ† Outstanding Paper

🌟 KommonGen: A Dataset for Korean Generative Commonsense Reasoning Evaluation

Jaehyung Seo, Chanjun Park, Hyeonseok Moon, Sugyeong Eo, Myunghoon Kang, Seounghoon Lee, Heuiseok Lim*

Annual Conference on Human and Cognitive Language Technology (HCLT), 2021