MERaLiON is part of Singapore's National Multimodal Large Language Model (LLM) Programme to expand Singapore's capabilities in Artificial Intelligence (AI) research and innovation. The Programme was launched in collaboration with Singapore's Infocomm Media Development Authority (IMDA) and AI Singapore (AISG), leveraging the high-performance computing resources of the National Supercomputing Centre (NSCC) Singapore.
A cornerstone of this Programme is the development of multimodal LLMs that are localized for Singapore and the region to understand context and values related to the diverse cultures and languages of Southeast Asia.
MERaLiON draws on the Institute for Infocomm Research's (I2R) transformative work in speech and language research, which has been widely applied in language transcription and translation to support various public agencies and private sector companies.
Developed to enhance the understanding of human communication dynamics through multimodal integration, MERaLiON marks a significant step forward in extending AI capabilities for Singapore and the Southeast Asia region.
For better contextual understanding and versatility across different tasks, MERaLiON harnesses cutting-edge AI techniques to process and learn complementary patterns from diverse data sources in a single unified framework. The data sources include various forms of verbal, visual, auditory and audiovisual communication.
The MERaLiON series excels in speech summarization, stance detection, inference, and contextual understanding, making it a versatile tool to power applications that demand a deep understanding of context and intent, and interpretation of speech cues and paralinguistic nuances.
We have designed our data pipelines, model training, and evaluation frameworks with a strong emphasis on scalability, robustness, and adaptability, ensuring the model's effectiveness across different tasks and environments.
The first phase leverages multimodal and multilingual representation learning and alignment for more effective training and better model generalization, so that the model can comprehend colloquial language and solve downstream tasks. Unique to MERaLiON, the model caters for code-switching and offers key features that include:
Multilingual Speech Transcription and Translation
Accurately transcribes and translates speech across multiple languages, ensuring seamless communication in diverse linguistic settings.
Speech Summarization
Generates concise and coherent summaries of lengthy speech recordings, enhancing accessibility and productivity.
Speech Question and Answer
Provides accurate and contextually relevant answers to user queries by analysing and understanding spoken input.
Audio Scene Understanding
Identifies and interprets the auditory environment to provide context-aware insights, such as recognizing background sounds and events.
Paralinguistic Understanding
Analyses non-verbal elements of speech, such as tone, pitch, volume, intonation and non-lexical vocables, to gain deeper insights into speaker intent and sentiment.
Support for Local Speech Understanding
Specializes in accurately processing speech from the diverse linguistic landscape of Singapore and Southeast Asia, including Singlish, regional dialects, and accents, to promote inclusive and effective communication across multicultural communities.
Customer Interfacing Automation
Aids the interaction between callers and call-takers by automatically transcribing and analysing calls in multiple languages and dialects, and extracting critical information so that urgent cases are promptly followed up. This improves customer satisfaction and overall efficiency in sectors such as retail, e-commerce, banking and public services.
Knowledge Management and Discovery
Enables businesses to analyse and discover new, valuable insights from multimodal data (in text, speech, emotion and non-verbal formats) to gain a deeper understanding of customers, provide personalized experiences and achieve better outcomes, such as in education or telemedicine.
Agentic Decision Making
Enables informed and autonomous choices for real-time, evidence-based decision making, with the LLM managing and synthesizing insights from enormous datasets that exceed human capacity. Applications include anomaly detection for surveillance, as well as real-time analysis and recommendations for workflow automation.