BFSI insights

NMIXX: Domain-Adapted Neural Embeddings for Cross-Lingual eXploration of Finance

Published 7 Nov 2025 · arXiv · Hanwool Lee

Overview

NMIXX is a suite of domain-adapted neural embedding models for cross-lingual financial text analysis, aimed in particular at low-resource languages such as Korean. It targets two challenges: capturing finance-specific semantics and aligning vocabulary across languages.

Key Insights

  • Performance Improvement: NMIXX's multilingual bge-m3 variant achieves Spearman's rho gains of +0.10 on English FinSTS and +0.22 on KorFinSTS, outperforming the other models compared.
  • Benchmark Release: KorFinSTS, a Korean financial semantic textual similarity (STS) benchmark, is introduced to expose domain-specific nuances that general-purpose benchmarks miss.
  • Model Adaptation: Models with richer Korean token coverage adapt more effectively, which underscores the role of tokenizer design.
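
To make the Spearman's rho figures above concrete, here is a minimal sketch of how an STS-style score is typically computed: cosine similarity between the two sentence embeddings of each pair, then rank correlation against human similarity labels. The toy vectors and gold scores are invented for illustration and are not drawn from FinSTS or KorFinSTS; the function names are hypothetical, and the paper's exact evaluation protocol may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def sts_spearman(embedding_pairs, gold_scores):
    """Score an embedding model on STS-style data (hypothetical helper)."""
    predicted = [cosine(u, v) for u, v in embedding_pairs]
    return spearman_rho(predicted, gold_scores)


# Toy data (assumption): in practice each vector would come from the
# embedding model under test, and gold scores from human annotators.
pairs = [
    ([1.0, 0.0], [1.0, 0.1]),  # near-paraphrase
    ([1.0, 0.0], [0.5, 0.5]),  # loosely related
    ([1.0, 0.0], [0.0, 1.0]),  # unrelated
]
gold = [4.8, 3.0, 0.5]

rho = sts_spearman(pairs, gold)
print(f"Spearman's rho: {rho:.3f}")
```

Because rho depends only on rank order, a reported gain such as +0.22 means the adapted model orders sentence pairs more like the human annotators do; it says nothing about the absolute scale of the similarity scores.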

BFSI Relevance

  • Why Relevant: Stronger cross-lingual embeddings improve multilingual financial text analysis, which is crucial for global financial operations.
  • Primary Sector: Financial Services
  • Subsectors: Asset Management, Corporate Banking
  • Actionable Implications:
    • Adopt NMIXX for improved cross-lingual financial document analysis.
    • Use KorFinSTS to benchmark and improve financial text processing tools.