
Towards the Automated Population of Thesauri Using BERT: A Use Case on the Cybersecurity Domain

Conference Paper
Publication Date:
2024
abstract:
The present work explores methodologies that leverage the widely used BERT model to populate and enrich domain-oriented controlled vocabularies such as thesauri. Starting from BERT's embeddings, we extract information from a sample corpus of cybersecurity-related documents and present a novel Natural Language Processing-inspired pipeline that combines neural language models, knowledge graph extraction, and natural language inference to identify domain concepts and implicit relations (adaptable to thesaural relationships) with which to populate a domain thesaurus. Preliminary results are promising: they indicate the effectiveness of the proposed methodology and thus the applicability of LLMs, BERT in particular, to enriching specialized controlled vocabularies with new knowledge.
Iris type:
04.01 Contribution in conference proceedings
Keywords:
Thesauri; Domain-specific language modeling; Semantic analysis; Knowledge Extraction; LLMs
List of contributors:
Lanza, Claudia; Guarasci, Raffaele; Portaro, Alessio; Taverniti, Maria; Cardillo, Elena
Authors of the University:
CARDILLO ELENA
GUARASCI RAFFAELE
TAVERNITI MARIA
Handle:
https://iris.cnr.it/handle/20.500.14243/450089