NMIXX: Domain-Adapted Neural Embeddings for Cross-Lingual eXploration of Finance
Published 7 Nov 2025 · arXiv · Hanwool Lee
Overview
NMIXX is a suite of domain-adapted neural embedding models designed to improve cross-lingual financial text analysis, particularly for low-resource languages such as Korean. It addresses challenges in capturing financial semantics and vocabulary alignment across languages.
Key Insights
- Performance Improvement: NMIXX's multilingual bge-m3 variant gains +0.10 in Spearman's rho on English FinSTS and +0.22 on KorFinSTS relative to its pre-adaptation checkpoint, outperforming the other models evaluated.
- Benchmark Release: KorFinSTS, a Korean financial STS benchmark, is introduced to highlight nuances missed by general benchmarks.
- Model Adaptation: Models with richer Korean token coverage adapt more effectively, emphasizing the role of tokenizer design.
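The Spearman's rho figures above measure how well the ranking of model similarity scores matches the ranking of human-annotated scores on an STS benchmark. The following is a minimal sketch of that metric, not the paper's evaluation code: the helper names (`cosine`, `average_ranks`, `spearman_rho`, `sts_spearman`) and the toy vectors are hypothetical, and cosine similarity between sentence embeddings is assumed as the scoring function.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def sts_spearman(emb_a, emb_b, gold_scores):
    """Score each sentence pair by cosine similarity, then correlate with gold."""
    predicted = [cosine(u, v) for u, v in zip(emb_a, emb_b)]
    return spearman_rho(predicted, gold_scores)


# Toy 2-d vectors standing in for sentence embeddings (illustrative only).
toy_a = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
toy_b = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
toy_gold = [5.0, 3.0, 0.0]
toy_rho = sts_spearman(toy_a, toy_b, toy_gold)
```

Because the metric depends only on rank order, a gain such as +0.22 means the adapted model orders sentence pairs more like the annotators do. It does not say anything about the absolute scale of the similarity scores.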
BFSI Relevance
- Why Relevant: Enhances multilingual financial analysis capabilities, crucial for global financial operations.
- Primary Sector: Financial Services
- Subsectors: Asset Management, Corporate Banking
- Actionable Implications:
  - Adopt NMIXX for improved cross-lingual financial document analysis.
  - Utilize KorFinSTS for benchmarking and improving financial text processing tools.