Many systems exist for developing and managing structured taxonomies and/or SKOS models for a given domain for which small document sets are accessible, yet the production and maintenance of these domain knowledge bases is still a very expensive and time-consuming process. OSIM proposes a solution for assisting expert users in the development and management of knowledge bases, including SKOS and ontologies that model structures and relationships. The proposed solution accelerates knowledge production by crawling and exploiting different kinds of sources (in multiple languages and with several inconsistencies among them). The proposed tool supports the experts in defining relationships among the most recurrent concepts, reducing the time needed for SKOS production and allowing assisted production. The validity of the produced knowledge base has been assessed by using a SPARQL query interface and a precision and recall model. The solution has been developed for Open Space Innovative Mind (OSIM), with the aim of creating a portal that allows industries to pose semantic queries to discover potential competences in a large institution such as the University of Florence, in which each of several distinct domains is associated with its own departments.
LINK TO THE OSIM KNOWLEDGE ENGINE AND SEARCH INTERFACE.
The main objective of the Open Space Innovative Mind project is the realization of a portal on which industries, students and interested researchers can pose questions with the aim of identifying competences, in terms of researchers and groups, within the large body of knowledge of the University of Florence. In the literature, a number of systems have been proposed to solve the above problem of helping to model knowledge bases, possibly by matching the demand (semantic query) against the offer (knowledge about the domain). The accessible version is part of a wider project called OSIM, which has been partially funded by Fondazione Monte dei Paschi di Siena. The OSIM project is presently under development; this page gives access only to the latest public version of OSIM.
As previously stated, the main goal of OSIM is to provide a service on which industries can pose questions with the aim of identifying the researchers and groups with the needed competences and knowledge among those of the University of Florence. The University of Florence includes more than 50 different departments belonging to all the scientific sectors and areas, hosting about 2000 researchers and more than 400 labs with their web pages. Each researcher may also teach 2-3 courses, so there are about 6000 course programs that may be considered competence descriptors as well. Moreover, the research departments and researchers participate in research projects, for a total of about 20,000 further descriptors, etc. In such a context, it is very hard to identify a manageable number of people with the skills needed to create a shared common SKOS. This is because the whole knowledge model has to be extracted from a huge amount of information, ranging from health care to geometry and math, from engineering to agriculture, from mechanics to statistics and pharmaceutics, etc. In addition, the sources of this knowledge change quite dynamically: every year the courses are updated, people's CVs change, new publications and projects arrive, etc.
OSIM General Architecture
On the basis of the above description, the available information can be ingested from a large number of different sources. This highly dynamic collection of sources can be gathered automatically through software agents and crawling tasks. The information gained can be used by a semantic search engine to answer user queries with a high degree of precision, for example through an assisted semantic query interface with natural language query engines.
The domain knowledge is composed of three self-supporting ontologies, which are related by semantic relationships. The basic elements of the knowledge base are therefore:
- Friend of a Friend (FOAF) ontology: used to model many properties of the Person and Organization classes (professors, PhD students, students, researchers, contractors, their relationships, research classifications such as SSD, CUN, etc.), including the name, surname and e-mail properties and the knows relationship (applicable to individuals belonging to the Person class).
- Academic life ontology: an ontology developed by DISIT specifically for the structure and terminology of the Italian university system. It defines elements for describing universities and the activities that occur at them (labs, departments, faculties, research centers, groups, projects, courses, curricula, subjects, integrated labs, etc.). The main OWL entities and classes described by the ontology are:
  - Organization classes, which describe physical structures of the university such as research centers, departments and laboratories;
  - People and role classes, which describe instances such as full professors, researchers and PhD students, related to and derived from FOAF concepts;
  - Activity entities, which cover concepts such as past projects, ongoing projects and academic publications. Each person's publications are added as well, which also establishes relationships among the different authors.
- Competences SKOS: the SKOS ontology that describes the hierarchy of the technical skills of structures and people belonging to the given application context, taking into account multilingual aspects, synonyms, etc. This part of the knowledge is the most dynamic.
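To make the role of the Competences SKOS more concrete, the following minimal sketch models two concepts with multilingual preferred labels, a synonym and a broader relation. It uses plain Python tuples rather than an RDF library, and the concept identifiers, labels and helper functions are hypothetical, not taken from the OSIM knowledge base. Only the property names (skos:prefLabel, skos:altLabel, skos:broader) come from the SKOS vocabulary.

```python
# Hypothetical competence concepts stored as (subject, predicate, object) triples.
# Labels are (text, language) pairs; all identifiers are invented for illustration.
TRIPLES = [
    ("ex:engineering", "skos:prefLabel", ("engineering", "en")),
    ("ex:engineering", "skos:prefLabel", ("ingegneria", "it")),
    ("ex:signal_processing", "skos:prefLabel", ("signal processing", "en")),
    ("ex:signal_processing", "skos:prefLabel", ("elaborazione dei segnali", "it")),
    ("ex:signal_processing", "skos:altLabel", ("DSP", "en")),
    ("ex:signal_processing", "skos:broader", "ex:engineering"),
]

LABEL_PROPERTIES = ("skos:prefLabel", "skos:altLabel")


def find_concepts(label):
    """Return the concepts whose preferred or alternative label matches, in any language."""
    wanted = label.casefold()
    return sorted({
        s for s, p, o in TRIPLES
        if p in LABEL_PROPERTIES and o[0].casefold() == wanted
    })


def broader_chain(concept):
    """Follow skos:broader links upwards and return the ancestors in order."""
    chain = []
    seen = {concept}  # guards against accidental cycles in the hierarchy
    current = concept
    while True:
        parents = [o for s, p, o in TRIPLES if s == current and p == "skos:broader"]
        if not parents or parents[0] in seen:
            return chain
        current = parents[0]
        seen.add(current)
        chain.append(current)
```

With this toy data, a search for the synonym "DSP" or for the Italian label "ingegneria" resolves to the same concepts that an English query would find, and `broader_chain("ex:signal_processing")` walks up to the more general engineering concept.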
The components related to the Academic life ontology and to FOAF are initialized and directly populated by gathering information from the University database and from other institutions, among them the central CINECA servers. This operation is performed by a set of crawling tasks built on a SOAP client implemented in Java with JAX-WS.
On the basis of the described architecture, the most critical aspect is the modeling and population of the above-mentioned Competences SKOS for the whole university. Typically, the solution proposed in these cases is to produce a coarse classification manually. What is really needed, however, is a SKOS strongly related to the real sources of descriptors, so as to allow automated classification and reasoning.
For these reasons we started with the idea of producing a solution for assisting expert users in the collaborative development and management of a Competences SKOS, the Collaborative SKOS Accelerator and Manager (CoSKOSAM), with the aim of accelerating the process of SKOS production and population. In the next section the identified requirements are presented.
Furthermore, the ontology is produced according to the OWL/RDF/SKOS rules and can benefit from emerging technologies and innovations offered by the semantic web and natural language processing. The generated ontology is used as the information domain of a demand and supply system about academic skills. It is currently stored in a semantic database that is queried through SPARQL, allowing:
- semantic search to retrieve ranked information. The ranking algorithm can use term frequency as a weighting factor, which makes the ranking robust to uncertainty;
- semantic indexing for search engine optimization and fuzzy queries, thus correcting possible typos;
- use of an inferential engine to increase the intelligence of the system, improving robustness via similarities and relationships among terms;
- an improved engine for providing results to the users and letting them navigate the mesh of relationships among FOAF entities and results.
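The ranked, typo-tolerant search listed above can be illustrated with a minimal sketch. It is not the OSIM ranking algorithm: it assumes a toy in-memory index, uses raw term frequency as the only ranking factor, and swaps in `difflib.get_close_matches` from the Python standard library to correct misspelled query terms. The researcher identifiers, keywords and the 0.8 similarity cutoff are hypothetical.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical index: person -> keywords extracted from CVs, courses, etc.
DOCUMENTS = {
    "researcher_a": ["ontology", "semantic", "ontology", "crawler"],
    "researcher_b": ["statistics", "semantic", "regression"],
    "researcher_c": ["ontology", "chemistry"],
}

# All known keywords, used as the dictionary for typo correction.
VOCABULARY = sorted({kw for kws in DOCUMENTS.values() for kw in kws})


def correct_term(term, cutoff=0.8):
    """Map a possibly misspelled term to the closest known keyword, or None."""
    matches = get_close_matches(term.lower(), VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def rank(query):
    """Rank people by the summed term frequency of the corrected query terms."""
    terms = [t for t in (correct_term(w) for w in query.split()) if t]
    scores = {}
    for person, keywords in DOCUMENTS.items():
        counts = Counter(keywords)
        score = sum(counts[t] for t in terms)
        if score > 0:
            scores[person] = score
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For instance, `rank("ontolgy")` corrects the typo to `ontology` and places researcher_a, who has the keyword twice, ahead of researcher_c. A production engine would normally combine term frequency with other factors, which this sketch leaves out on purpose.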
Project status
The status of the project can be summarized as follows:
- Departments: formerly 49; since 1st January 2013 the University has been reorganized into 24 new departments
- Keywords: 249000 from documents, and 140746 from CVs, courses, etc.
- Documents: more than 18000 (among them CVs, courses, etc.)
- People with courses: 2344; researchers: more than 1700
- Publications: all those present in the CINECA database, amounting to about 80000 publications and 30000 authors, reconstructed from the registrations made by more than 4000 people of UNIFI (professors, researchers, PhD students, etc.) in the CINECA database of research products
- The whole semantic database consists of some tens of millions of triples.
Status of the main departments on which the early validation has been performed (data updated in September 2013).
Department (UniFI, new organization) | No. of keywords | No. of documents | No. of people |
Dipartimento di Architettura (DiDA) | 4385 | 1120 | 123 |
Biologia | 5713 | 258 | 42 |
Chirurgia e Medicina Traslazionale (DCMT) | 7231 | 1174 | 62 |
Chimica "Ugo Schiff" | 11147 | 489 | 88 |
Fisica e Astronomia | 12688 | 457 | 64 |
Gestione Sistemi Agrari, Alimentari e Forestali (GESAAF) | 4181 | 324 | 56 |
Ingegneria Civile e Ambientale (DICEA) | 3569 | 341 | 43 |
Ingegneria Industriale | 4796 | 579 | 58 |
Ingegneria dell'Informazione | 6462 | 399 | 59 |
Lettere e Filosofia | 2457 | 645 | 72 |
Lingue, Letterature e Studi Interculturali | 2826 | 647 | 49 |
Matematica e Informatica "Ulisse Dini" | 4597 | 570 | 90 |
Medicina Sperimentale e Clinica | 11738 | 2449 | 157 |
Neuroscienze, Psicologia, Area del Farmaco e Salute del Bambino (NEUROFARBA) | 10649 | 883 | 84 |
Storia, Archeologia, Geografia, Arte e Spettacolo (SAGAS) | 3372 | 882 | 88 |
Scienze Biomediche, Sperimentali e Cliniche | 8744 | 1263 | 100 |
Scienze per l'Economia e l'Impresa | 4527 | 1109 | 107 |
Scienze della Terra | 5685 | 286 | 42 |
Scienze della Formazione e Psicologia | 2114 | 368 | 40 |
Scienze Giuridiche (DSG) | 1448 | 1201 | 90 |
Statistica, Informatica, Applicazioni "G. Parenti" (DiSIA) | 4596 | 563 | 49 |
Scienze delle Produzioni Agroalimentari e dell'Ambiente (DISPAA) | 9099 | 405 | 76 |
Scienze Politiche e Sociali | 2332 | 945 | 51 |
Scienze della Salute (DSS) | 6390 | 830 | 63 |
TOTAL | 140746 | 18187 | 1753 |
The validation has been performed against queries on these departments, while all the remaining departments have been processed for keyword extraction and document analysis. The proposed tool has been used to develop the full knowledge base for indexing the knowledge of all the structures of the University of Florence. It presently covers 24 departments, about 250000 distinct keywords (140000 of them indexed) coming from about 18000 documents (such as CVs, courses, etc.), and 1753 people who have courses and CVs, while the total number of researchers is much larger. Moreover, about 80000 publications have been collected from CINECA, with about 30000 authors including professors, PhD students, visiting professors, temporary researchers, contractors, etc.
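The precision and recall model used for validation can be sketched for a single query as follows. This is a minimal illustration of the two standard measures, assuming that an expert has marked which results are relevant. The result identifiers are hypothetical and the numbers are not OSIM validation results.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    retrieved: items returned by the search engine
    relevant:  items an expert judged relevant for the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant items actually returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query outcome: four results returned, three judged relevant overall.
retrieved = ["r1", "r2", "r3", "r4"]
relevant = ["r2", "r3", "r5"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four returned results are relevant, and two of the three relevant results were found. Averaging these values over a set of test queries per department gives one overall figure for the quality of the knowledge base.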
References
- A. Bellandi, P. Bellini, A. Cappuccio, P. Nesi, G. Pantaleo, N. Rauch, "Assisted Knowledge Base Generation, Management and Competence Retrieval", International Journal of Software Engineering and Knowledge Engineering, World Scientific Publishing Company, vol. 22, no. 8, pp. 1007-1038, Dec. 2012, DOI: 10.1142/S021819401240013X
- P. Bellini, A. Cappuccio, P. Nesi, "Collaborative and Assisted SKOS Generation and Management", Proc. of the 17th International Conference on Distributed Multimedia Systems (DMS 2011), Convitto della Calza, Florence, Italy, 18-20 August 2011, organized and published by Knowledge Systems Institute (KSI), Skokie, IL, USA. The paper received the best paper award of the conference.
- OSIM technical reference manuals, confidential documentation of the DISIT team, available on request.
- CoSKOSAM user manuals, accessible to the groups that are working on the SKOS manager.
Project team
The OSIM framework is part of the project "Development of a Connection Platform Between University and Small and Medium Enterprises"; it has been partially funded by Fondazione MPS and is coordinated by Prof. M. Lombardi. The teams of Prof. M. Lombardi and Prof. P. Rissone are working on creating the competence knowledge models for their departments by using the CoSKOSAM tools. The design and development of the OSIM infrastructure and solution have been performed by DISIT of DSI (https://www.disit.org/disitmn/) under the coordination of Prof. Paolo Nesi. The OSIM project is presently under development and validation in its new shape and semantic model. Several other aspects not mentioned here are under development and will be presented once they are integrated into the publicly accessible version of OSIM.
The activities of the OSIM project have been partially moved into the new SACVAR project.