Building a Universal Ontology for Vietnamese Language

Abstract- An ontology is a way to conceptualize a particular knowledge domain in a standard form that allows users and computer systems to communicate concisely by supporting information exchange based on semantics rather than merely syntax. With the aim of contributing to Vietnamese language processing research, the first version of OVL (Ontology for Vietnamese Language) has been developed under a project of the Natural Language Processing of Knowledge Engineering Group at the University of Information Technology. In this paper, we present the enhanced methodology used to develop OVL, including the structure of OVL and its evaluation method.

Keywords- Ontology, Knowledge Engineering, Natural Language Processing.

Introduction

An ontology is a way to conceptualize a particular knowledge domain in a standard form that allows users and computer systems to communicate concisely by supporting information exchange based on semantics rather than merely syntax. An ontology consists of entities, attributes, relationships and axioms that provide a common understanding of a specific area of knowledge [2]. A powerful ontology is essentially one that can hold a huge knowledge domain and describe its internal concepts at the most specific level. In the last decade, several studies in this area have produced some of the largest published English ontologies, such as SUMO [1] and YAGO [2]. Up to now, there have been some studies and applications for the Vietnamese language; however, none of them has been widely published (according to current statistics) by the Vietnamese ontology research community.

Therefore, with the aim of contributing to Vietnamese language processing research, the first version of OVL (Ontology for Vietnamese Language) has been developed under a project of the Natural Language Processing of Knowledge Engineering Group at the University of Information Technology. In this paper, we present the enhanced methodology used to develop OVL, including the structure of OVL and its evaluation method. Building on this first version, OVL will continue to be extended in future versions.

Structure of Ontology for Vietnamese Language

The structure of the OVL ontology is defined as:

$$O := \{C, I, E, T, R, A\}$$

where:

C: set of classes (or concepts). Concepts can be primitive or defined in terms of other concepts. For example:

“Thể_thao_trí_tuệ” (mind sports)

“Thể_thao_võ_thuật” (martial arts)

“Thể_thao_đồng_đội” (team sports)

“Thể_thao_dưới_nước” (water sports)

“Thể_thao_mạo_hiểm” (extreme sports)

“Thể_thao_bãi_biển” (beach sports)

I: set of individuals, also called instances of classes. They are considered the ‘data’ of OVL.

$E = C \cup I$, where E is the set of entities in OVL. Classes (concepts) and individuals are collectively called entities.

T: concept hierarchy (taxonomy), which organizes concepts into sub-concepts and super-concepts. Organizing concepts into a hierarchy expresses the generalization or specialization of each concept in OVL.

R: set of relations between entities in OVL. These relations are also called properties of a given class; they apply to instances of classes.

A: set of restrictions on classes, individuals and relations. These restrictions are expressed through quantifiers, logical operations and cardinalities supported by OWL-DL, e.g. ∀ and ∃; ⊓, ⊔ and ¬; =, ≥ and ≤.
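To make the six components concrete, here is a minimal in-memory sketch of O := {C, I, E, T, R, A} in Python. It is not the authors' representation (OVL itself is stored in OWL-DL and edited in Protégé); the field names, the sample class and property names, and the choice to treat E as the union of classes and individuals are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class OVLStructure:
    """Hypothetical in-memory model of O := {C, I, E, T, R, A}."""

    classes: set[str] = field(default_factory=set)                                   # C
    individuals: dict[str, str] = field(default_factory=dict)                        # I: individual -> its class
    taxonomy: dict[str, str] = field(default_factory=dict)                           # T: class -> direct superclass
    relations: dict[str, tuple[str, str]] = field(default_factory=dict)              # R: property -> (domain, range)
    restrictions: list[tuple[str, str, str, object]] = field(default_factory=list)   # A: (class, property, kind, value)

    @property
    def entities(self) -> set[str]:
        # E, assumed here to be the union of classes and individuals (C ∪ I)
        return self.classes | set(self.individuals)


o = OVLStructure()
o.classes |= {"Thể_thao", "Thể_thao_đồng_đội"}
o.taxonomy["Thể_thao_đồng_đội"] = "Thể_thao"
o.individuals["Bóng_đá"] = "Thể_thao_đồng_đội"
o.relations["số_người_mỗi_đội"] = ("Thể_thao_đồng_đội", "xsd:int")
o.restrictions.append(("Thể_thao_đồng_đội", "số_người_mỗi_đội", "minCardinality", 1))
print(sorted(o.entities))
```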

Constructing an Ontology for Vietnamese Language

Methodologies for ontology development have been researched for years, and the results so far have been a variety of different approaches [1, 3, 4, 6, 7, 8]. In 1997, Fernandez et al. proposed a methodology named METHONTOLOGY [3]. It is fairly detailed and covers the stages of ontology development such as knowledge acquisition, integration, implementation, evaluation and documentation. Each phase results in a document that describes the ontology developed so far. However, this methodology lacks a maintenance phase.

Noy and McGuinness were the first to discuss naming conventions and explain why they are important [5]. Their methodology, proposed in 2001, specifically describes detailed steps in the ontology life cycle, except for the evaluation and maintenance phases.

In 2005, Annika Öhgren and Kurt Sandkuhl presented an ontology development model consisting of four phases: requirements analysis, building, implementation, and evaluation and maintenance. However, for the evaluation and maintenance phase, they did not clearly discuss execution methods or supporting tools.

Based on an analysis of the advantages and disadvantages of these approaches, we propose an enhanced methodology with specific phases that ensure the life cycle of OVL and make it easy to extend.

Requirements analysis stage

The scope of OVL is universal knowledge in the Vietnamese language. This kind of ontology is developed to provide one of the most important knowledge resources in the field of Vietnamese language processing. The content of OVL is collected from the most reputable websites in Vietnam [9].

In OVL, naming conventions for all concepts and relations are defined. We strictly adhere to these conventions because they not only make our ontology easier to understand but also help other ontology developers approach OVL more conveniently in the future. For example, we capitalize the first letter of class names and use lower case for slot names (the system is case-sensitive). Property names in OVL often start with abbreviated letters that reveal which class they belong to. Spaces between words are replaced by a special character (‘_’) because our supporting ontology editor, Protégé version 3.4.1, does not allow spaces to separate words.
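The naming conventions above can be captured in two small helpers. This is a sketch under assumptions: the function names, the ‘_’ separator after a property's class abbreviation and the sample abbreviation “TT” are hypothetical, and only the first character of a class name is capitalized. Because names are case-sensitive, “Thể_thao” and “thể_thao” would be two distinct identifiers.

```python
def to_class_name(label: str) -> str:
    """Hypothetical helper: capitalize the first letter and join words with '_'."""
    words = label.split()
    if not words:
        raise ValueError("empty label")
    name = "_".join(words)
    return name[0].upper() + name[1:]


def to_property_name(class_abbrev: str, label: str) -> str:
    """Hypothetical helper: lower-case slot name prefixed with an abbreviation of its class."""
    words = label.lower().split()
    if not words:
        raise ValueError("empty label")
    return class_abbrev.lower() + "_" + "_".join(words)


print(to_class_name("thể thao đồng đội"))        # Thể_thao_đồng_đội
print(to_property_name("TT", "Số người chơi"))  # tt_số_người_chơi
```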

OVL is developed in OWL-DL, a standardized language. OWL-DL is one of the three OWL sublanguages (OWL-Lite, OWL-DL and OWL-Full) recommended by the W3C (World Wide Web Consortium) for building ontologies. OWL-Lite supports users who primarily need a classification hierarchy and simple constraint features, whereas OWL-Full is meant for users who need maximum expressiveness and the syntactic freedom of RDF with no computational guarantees; it is unlikely that any reasoning software will be able to support every feature of OWL-Full. We chose OWL-DL because it strikes a balance between the two: it offers maximum expressiveness while retaining computational completeness and decidability.

Building stage

At the beginning of this phase, we make a list of common concepts in general knowledge (the source of this knowledge was determined in the first phase). From this list, we look for possible relations and any properties related to these concepts.

The next two tasks in this phase, defining the classes and organizing them into a hierarchy, and defining the properties of each concept, are closely intertwined: it is hard to complete one of them first and then do the other. Typically, we create a few definitions of concepts in the hierarchy, then continue by describing the properties of these concepts, and so on. These two steps are also the most important ones in the ontology design process. They are carried out iteratively throughout the life cycle of OVL until it essentially satisfies the ontology's initial requirements.

Specify the classes and the class hierarchy

It is useful to organize the concepts in a hierarchy rather than a flat list, because a hierarchy supports better navigation and lets ontology users easily figure out the level of generality or specificity of a concept that suits their situation. Therefore, we create a class hierarchy in OVL based on the list of defined concepts. There are several possible approaches to developing a class hierarchy [5]:

Top-down: The development process starts with the definition of the most general concepts in the domain and then specializes them.

Bottom-up: In contrast to the top-down process, it begins with the definition of the most specific classes, the leaves of the hierarchy, and then groups these classes into more general concepts.

Combination: A combination of the top-down and bottom-up approaches. We define the more salient concepts first, then generalize and specialize them appropriately.

None of these three methods is inherently better than the others. The choice of approach depends strongly on the developer's view of the domain. In OVL, we employ the combination approach, since the concepts “in the middle” tend to be the more descriptive concepts in the domain.
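The combination (middle-out) approach can be illustrated with a toy single-parent hierarchy: a salient mid-level concept is entered first, a more general class is then inserted above it, and siblings or subclasses are attached afterwards. The `Taxonomy` class and its methods are hypothetical and ignore multiple inheritance, which OWL allows; `depth` gives a rough measure of how specific a concept is.

```python
from typing import Optional


class Taxonomy:
    """Hypothetical single-parent concept hierarchy (the T component)."""

    def __init__(self) -> None:
        self.parent: dict[str, Optional[str]] = {}

    def add(self, concept: str, parent: Optional[str] = None) -> None:
        # Specialization step: attach a concept under an existing class (or as a root).
        if parent is not None and parent not in self.parent:
            raise KeyError(f"unknown parent {parent!r}")
        self.parent[concept] = parent

    def generalize(self, concept: str, general: str) -> None:
        # Generalization step: insert a new, more general class directly above `concept`.
        if general in self.parent:
            raise ValueError(f"{general!r} already exists")
        self.parent[general] = self.parent[concept]
        self.parent[concept] = general

    def ancestors(self, concept: str) -> list[str]:
        chain, node = [], self.parent[concept]
        while node is not None:
            chain.append(node)
            node = self.parent[node]
        return chain

    def depth(self, concept: str) -> int:
        # 0 for a root; larger values mean a more specific concept.
        return len(self.ancestors(concept))


# Middle-out: start from a salient mid-level concept, then generalize and specialize.
t = Taxonomy()
t.add("Thể_thao_đồng_đội")
t.generalize("Thể_thao_đồng_đội", "Thể_thao")
t.add("Thể_thao_dưới_nước", "Thể_thao")
print(t.ancestors("Thể_thao_dưới_nước"), t.depth("Thể_thao"))
```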

Whichever approach we choose, we usually start by defining classes. From the list created in the previous step, we extend or break down the concepts into many more concepts. These concepts become named classes in the OVL ontology and are organized into a class hierarchy. While creating the class hierarchy, we also need to determine whether a class has subclasses. Two common relations help us verify the subclasses of a class: the ‘is-a’ relation and the ‘kind-of’ relation. In addition, there are several criteria for defining subclasses in the OVL ontology:

Subclasses of a class usually have additional properties that the superclass does not have.

Subclasses have restrictions that differ from those of the superclass.

Subclasses participate in relationships that their superclass does not.

Classes are often declared disjoint from each other, a practice supported by OWL-DL as well as by the ontology editor Protégé. Disjoint classes cannot share any instance (i.e. an instance of one class cannot belong to another class if the two classes are disjoint).
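The effect of disjointness can be checked mechanically: because membership propagates to superclasses, an individual whose asserted classes (plus their ancestors) include both members of a disjoint pair violates the declaration. The sketch below is a simplified stand-in for what a DL reasoner does, written for illustration; the function names and the sample individuals are hypothetical, and the class hierarchy is assumed to be an acyclic child-to-parent map.

```python
from itertools import combinations


def with_superclasses(cls: str, parent: dict) -> set:
    # The class itself plus every ancestor reachable through the child -> parent map.
    out = {cls}
    while parent.get(cls) is not None:
        cls = parent[cls]
        out.add(cls)
    return out


def disjointness_violations(types: dict, parent: dict, disjoint: set) -> list:
    """types: individual -> asserted classes; disjoint: set of frozenset({A, B}) pairs."""
    found = []
    for individual, asserted in types.items():
        closure = set()
        for cls in asserted:
            closure |= with_superclasses(cls, parent)  # membership propagates upward
        for a, b in combinations(sorted(closure), 2):
            if frozenset((a, b)) in disjoint:
                found.append((individual, a, b))
    return found


parent = {"Thể_thao": None, "Thể_thao_trí_tuệ": "Thể_thao", "Thể_thao_dưới_nước": "Thể_thao"}
disjoint = {frozenset(("Thể_thao_trí_tuệ", "Thể_thao_dưới_nước"))}
types = {
    "Bơi_lội": {"Thể_thao_dưới_nước"},
    "Cờ_vua": {"Thể_thao_trí_tuệ", "Thể_thao_dưới_nước"},  # a data-entry mistake
}
print(disjointness_violations(types, parent, disjoint))
```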

Specify the properties of classes (slots)

In this step, we create a list of properties based on the named classes. Each element of the property list is attached to one or more classes in OVL.

There are two types of properties in OVL: datatype properties and object properties.

Datatype properties: Their domains are named classes, whereas their range is one of the XML Schema data types (e.g. int, boolean, float, string, date, dateTime). A datatype property describes the relation between instances of classes and specific values (such as RDF literals).

Object properties: We set their domain and range to particular classes. In addition, OWL-DL allows users to assign characteristics to this kind of property (functional, inverse functional, symmetric, or transitive). In OVL, these characteristics help us impose tighter restrictions. They are also one of the main factors in accomplishing more specific tasks such as reasoning and question answering.
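The sketch below models object-property characteristics and a strict datatype check in plain Python. It is illustrative only: the property names are hypothetical, only symmetric and transitive closure are computed, the inverse-functional flag is recorded but not checked, the functional check merely flags subjects with several values (an OWL reasoner would instead infer that those values denote the same individual unless they are declared different), and the XML Schema mapping covers just four types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectProperty:
    name: str
    domain: str
    range_: str                      # trailing underscore avoids shadowing the built-in range
    functional: bool = False
    inverse_functional: bool = False  # recorded only; not checked in this sketch
    symmetric: bool = False
    transitive: bool = False


def infer(prop: ObjectProperty, pairs: set) -> set:
    """Close (subject, object) pairs under the property's symmetric/transitive characteristics."""
    facts = set(pairs)
    while True:
        new = set()
        if prop.symmetric:
            new |= {(o, s) for s, o in facts}
        if prop.transitive:
            new |= {(s, o2) for s, o in facts for s2, o2 in facts if o == s2}
        if new <= facts:  # fixed point reached
            return facts
        facts |= new


def functional_conflicts(prop: ObjectProperty, facts: set) -> set:
    """Subjects that have more than one value for a functional property."""
    if not prop.functional:
        return set()
    values: dict = {}
    for s, o in facts:
        values.setdefault(s, set()).add(o)
    return {s for s, objs in values.items() if len(objs) > 1}


XSD_TO_PY = {"xsd:int": int, "xsd:boolean": bool, "xsd:float": float, "xsd:string": str}


def valid_literal(value, xsd_type: str) -> bool:
    """Strict check that a Python value fits one of a few XML Schema datatypes."""
    if xsd_type == "xsd:int" and isinstance(value, bool):  # bool is a subclass of int
        return False
    return isinstance(value, XSD_TO_PY[xsd_type])


nam_trong = ObjectProperty("nằm_trong", "Địa_điểm", "Địa_điểm", transitive=True)
giap_ranh = ObjectProperty("giáp_ranh", "Quốc_gia", "Quốc_gia", symmetric=True)
co_thu_do = ObjectProperty("có_thủ_đô", "Quốc_gia", "Thành_phố", functional=True)

print(infer(nam_trong, {("Quận_1", "TP_Hồ_Chí_Minh"), ("TP_Hồ_Chí_Minh", "Việt_Nam")}))
print(infer(giap_ranh, {("Việt_Nam", "Lào")}))
print(functional_conflicts(co_thu_do, {("Việt_Nam", "Hà_Nội"), ("Việt_Nam", "Huế")}))
print(valid_literal(11, "xsd:int"), valid_literal(True, "xsd:int"))
```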

Create restrictions

Based on the types of properties (object properties and datatype properties) and the restrictions supported by OWL-DL (allValuesFrom ∀, someValuesFrom ∃, hasValue ∋, cardinality =, minCardinality ≥, maxCardinality ≤), we define possible class restrictions between classes and user-defined properties. These classes are marked as defined classes, and their restrictions allow instances to be classified into them more logically and quickly. In addition, these class restrictions partially express the semantics of the OVL ontology.
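The following sketch checks an individual's property values against the six restriction kinds listed above. Note that it is a closed-world validation, a swapped-in technique useful for data entry: an OWL-DL reasoner works under the open-world assumption, so a missing value does not by itself violate someValuesFrom or minCardinality. The class, property and individual names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Restriction:
    prop: str
    kind: str   # allValuesFrom | someValuesFrom | hasValue | cardinality | minCardinality | maxCardinality
    value: Any  # a class name, an individual or literal, or an integer bound


def satisfies(values: set, r: Restriction, type_of: Callable[[Any], set]) -> bool:
    """Closed-world check of one restriction against an individual's values for r.prop."""
    if r.kind == "allValuesFrom":   # ∀: every value belongs to the class
        return all(r.value in type_of(v) for v in values)
    if r.kind == "someValuesFrom":  # ∃: at least one value belongs to the class
        return any(r.value in type_of(v) for v in values)
    if r.kind == "hasValue":        # ∋: a specific value is present
        return r.value in values
    if r.kind == "cardinality":     # =
        return len(values) == r.value
    if r.kind == "minCardinality":  # ≥
        return len(values) >= r.value
    if r.kind == "maxCardinality":  # ≤
        return len(values) <= r.value
    raise ValueError(f"unknown restriction kind {r.kind!r}")


def failed_restrictions(props: dict, restrictions: list, type_of) -> list:
    """props maps property name -> set of values; returns the restrictions that fail."""
    return [r for r in restrictions if not satisfies(props.get(r.prop, set()), r, type_of)]


TYPES = {"Hà_Nội": {"Thành_phố"}, "Huế": {"Thành_phố"}}


def type_of(value) -> set:
    return TYPES.get(value, set())


quoc_gia = [  # hypothetical restrictions on a hypothetical class Quốc_gia (country)
    Restriction("có_thủ_đô", "cardinality", 1),
    Restriction("có_thủ_đô", "allValuesFrom", "Thành_phố"),
]
print(failed_restrictions({"có_thủ_đô": {"Hà_Nội"}}, quoc_gia, type_of))
print(failed_restrictions({"có_thủ_đô": {"Hà_Nội", "Huế"}}, quoc_gia, type_of))
```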

Create instances/individuals

Creating instances of classes in the class hierarchy is the final step in preparing data for ontology development. In OVL, defining instances (or individuals) involves these steps:

Choose the appropriate class to hold the instance.

Create the instance.

Fill in the property values of the instance. Naturally, the instance has the properties defined for its class.
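The three steps for creating an instance can be sketched as follows, assuming a child-to-parent class map and a map from each property to its domain; both structures, the function names and the sample data are hypothetical. A property may be filled in only if its domain is the chosen class or one of its superclasses, which is one way to read “the instance has the properties defined for its class”.

```python
def applicable_properties(cls: str, parent: dict, domains: dict) -> set:
    """Properties whose domain is `cls` or one of its superclasses."""
    chain, node = set(), cls
    while node is not None:
        chain.add(node)
        node = parent.get(node)
    return {prop for prop, dom in domains.items() if dom in chain}


def create_instance(name: str, cls: str, values: dict, parent: dict, domains: dict, kb: dict):
    # Step 1: the chosen class must already exist in the hierarchy.
    if cls not in parent:
        raise KeyError(f"unknown class {cls!r}")
    # Guard for step 3: only properties available to the class may be filled in.
    unknown = set(values) - applicable_properties(cls, parent, domains)
    if unknown:
        raise ValueError(f"not defined for {cls}: {sorted(unknown)}")
    # Steps 2 and 3: create the individual and record its property values.
    kb[name] = (cls, {prop: set(vals) for prop, vals in values.items()})
    return kb[name]


parent = {"Thể_thao": None, "Thể_thao_đồng_đội": "Thể_thao", "Thể_thao_dưới_nước": "Thể_thao"}
domains = {
    "tên_gọi": "Thể_thao",
    "số_người_mỗi_đội": "Thể_thao_đồng_đội",
    "độ_sâu_tối_đa": "Thể_thao_dưới_nước",
}
kb: dict = {}
create_instance("Bóng_đá", "Thể_thao_đồng_đội",
                {"tên_gọi": {"bóng đá"}, "số_người_mỗi_đội": {11}}, parent, domains, kb)
print(kb["Bóng_đá"])
```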

Whether a particular concept should be a class or an instance in an ontology depends on the potential applications of the ontology. Instances are the most specific concepts represented in OVL and are considered its “data”. The more “data” OVL holds, the more competent it will be at the semantic annotation of Vietnamese texts. These individuals also contribute to the ability to answer Wh-questions in Vietnamese queries.

Implementation stage

At this point, we already have “data” consisting of defined classes, properties and restrictions. In this phase, we use Protégé, which we consider the most convenient ontology editor. Before the defined classes, properties and restrictions are entered into Protégé, they must be properly standardized (meaningless special characters are removed or replaced with underscores). Protégé also automatically generates models that represent the named elements of the OVL ontology as frames.
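Standardizing names before entering them into an editor, and serializing the result, might look like the sketch below. It is an illustration under assumptions rather than the authors' pipeline: the namespace `http://example.org/ovl#` is a placeholder, not OVL's real IRI, and the output is Turtle using the standard `owl:Class` and `rdfs:subClassOf` terms; a different serialization (such as RDF/XML) may be required depending on the editor version.

```python
import re

OVL_NS = "http://example.org/ovl#"  # hypothetical placeholder namespace


def standardize(name: str) -> str:
    # Replace runs of whitespace/punctuation with '_'; \w keeps Vietnamese letters and digits.
    return re.sub(r"\W+", "_", name.strip()).strip("_")


def to_turtle(parent: dict) -> str:
    """Serialize a child -> parent class map as OWL class declarations in Turtle."""
    lines = [
        f"@prefix ovl: <{OVL_NS}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for cls in sorted(parent):
        name = standardize(cls)
        lines.append(f"ovl:{name} a owl:Class .")
        if parent[cls] is not None:
            lines.append(f"ovl:{name} rdfs:subClassOf ovl:{standardize(parent[cls])} .")
    return "\n".join(lines)


ttl = to_turtle({"Thể thao": None, "Thể thao đồng đội": "Thể thao"})
print(ttl)
```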

Evaluation and maintenance stage

After OVL is fully developed, it is necessary to consider methods for evaluating the ontology's correctness.

In the evaluation process, an OWL 2 reasoner named Pellet [3] is used to ensure that OVL does not contain contradictory facts and to check whether each class can have any instances. If a class is unsatisfiable, then defining an instance of that class will make the whole ontology inconsistent. Furthermore, we sketch a list of questions that a knowledge base built on the ontology should be able to answer. These are called competency questions [4]. They serve as clues for evaluation: Does the ontology contain enough information to answer these types of questions? Do the answers require a particular level of detail or the representation of a particular area? These competency questions are just a sketch and do not need to be exhaustive. By taking full advantage of the table of competency questions, we can figure out which aspects of knowledge our ontology lacks and which concepts are still ambiguous or not yet defined.

Pellet is a complete reasoner for OWL-DL with acceptable performance. It is open-source Java software released under a very liberal license, and it is used in a number of commercial projects as well as in academic research. During the development of OVL, it not only detects contradictory concepts but also serves as a syntax checker. It also relies on class restrictions to decide whether the ontology's class hierarchy is logically consistent.
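The table of competency questions described above can also be kept executable. In this sketch each question is paired with a single triple pattern (None acts as a variable, loosely like one pattern in a SPARQL query), and the questions that return nothing point to missing knowledge. It does not replace Pellet's consistency checking; the triples, property names and questions are hypothetical.

```python
Triple = tuple[str, str, str]


def match(kb: set, s=None, p=None, o=None) -> list:
    # None plays the role of a variable in a single triple pattern.
    return [t for t in kb
            if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)]


def unanswered(kb: set, questions: list) -> list:
    """questions: (text, (s, p, o) pattern) pairs; returns the texts with no answer in kb."""
    return [text for text, pattern in questions if not match(kb, *pattern)]


kb: set[Triple] = {
    ("Bóng_đá", "rdf:type", "Thể_thao_đồng_đội"),
    ("Bóng_đá", "số_người_mỗi_đội", "11"),
}
questions = [
    ("Which team sports does OVL know?", (None, "rdf:type", "Thể_thao_đồng_đội")),
    ("What is the capital of Viet Nam?", ("Việt_Nam", "có_thủ_đô", None)),
]
print(unanswered(kb, questions))
```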

Because information changes every day, OVL needs to be maintained regularly. It is also useful to provide basic operations such as adding, removing and editing content in OVL. Maintenance will be performed monthly by one of our team members: during each period, new knowledge sources and feedback from debuggers are accumulated and handed over to the team member responsible for maintenance at a specific time.
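The basic maintenance operations (adding, removing and editing) can be batched so that accumulated requests are applied in one monthly run, with a log entry for each change. This is a hypothetical sketch over a set of triples; the `ChangeRequest` type, the `apply_batch` function and the sample data are illustrative names, and requests are applied in order, so a failing request stops the run after the earlier ones have already been applied.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ChangeRequest:
    op: str                              # "add", "remove" or "edit"
    triple: tuple
    replacement: Optional[tuple] = None  # new triple, used only by "edit"


def apply_batch(kb: set, requests: list, log: list, when: date) -> None:
    """Apply accumulated change requests in order and log each applied change."""
    for req in requests:
        if req.op == "add":
            kb.add(req.triple)
        elif req.op in ("remove", "edit"):
            if req.triple not in kb:
                raise KeyError(f"{req.triple} is not in the ontology")
            if req.op == "edit" and req.replacement is None:
                raise ValueError("an edit needs a replacement triple")
            kb.remove(req.triple)
            if req.op == "edit":
                kb.add(req.replacement)
        else:
            raise ValueError(f"unknown operation {req.op!r}")
        log.append((when.isoformat(), req.op, req.triple))


kb = {("Việt_Nam", "có_thủ_đô", "Huế")}
log: list = []
apply_batch(kb, [
    ChangeRequest("edit", ("Việt_Nam", "có_thủ_đô", "Huế"), ("Việt_Nam", "có_thủ_đô", "Hà_Nội")),
    ChangeRequest("add", ("Bóng_đá", "rdf:type", "Thể_thao_đồng_đội")),
], log, date(2010, 3, 1))
print(sorted(kb))
print(log)
```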

Documentation

Documentation should be produced after each phase in the development of the OVL ontology; any problems encountered or alternatives considered must be recorded as documents, as in software project management. These documents help debuggers trace errors found in the ontology. They also make it easier for entirely new members to extend and develop OVL later.

Development

While developing this universal ontology for the Vietnamese language, we also considered taking full advantage of existing ontologies that meet our proposed criteria. Resources from these ontologies (if any) would be selected or filtered into appropriate entities and extended so as to be integrated into a larger-scale ontology. If such resources were developed with a different language standard, converting them to our standard ontology language is entirely feasible. The main benefit of such integration is a lower cost of obtaining a voluminous knowledge base. However, no existing ontology matches the aims of OVL at this time. Consequently, integration from other sources has been set aside.

Version 1.0 of OVL was built between October 2009 and February 2010. It consists of:

Classes: 395

Individuals: 5,780

Properties: 60

Restrictions: 30

Editor: Protégé version 3.4.1

In addition, we developed a utility tool to manage OVL and help evaluate it. The tool was written in Java and uses the open API of Protégé to work with the .owl file exported from OVL. Its main purposes are reporting statistics on the number of classes, restrictions, individuals and properties, and providing basic operations such as editing and adding. Furthermore, the tool supports querying the ontology (using the SPARQL query language) and viewing parts of OVL as graphs.
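The statistics part of such a tool can be sketched independently of Protégé's API. The Python sketch below (the authors' tool is in Java) counts entities in a set of (subject, predicate, object) triples by their `rdf:type`; the triple representation and the sample data are assumptions, and an individual is counted only if its type is a class declared in the same set.

```python
KIND_OF = {
    "owl:Class": "classes",
    "owl:ObjectProperty": "properties",
    "owl:DatatypeProperty": "properties",
    "owl:Restriction": "restrictions",
}


def statistics(triples: set) -> dict:
    """Count classes, properties, restrictions and individuals in a set of (s, p, o) triples."""
    typed = [(s, o) for s, p, o in triples if p == "rdf:type"]
    classes = {s for s, o in typed if o == "owl:Class"}
    buckets = {"classes": set(), "properties": set(), "restrictions": set(), "individuals": set()}
    for s, o in typed:
        if o in KIND_OF:
            buckets[KIND_OF[o]].add(s)
        elif o in classes:  # typed by a user-defined class, so s is an individual
            buckets["individuals"].add(s)
    return {name: len(members) for name, members in buckets.items()}


triples = {
    ("ovl:Thể_thao", "rdf:type", "owl:Class"),
    ("ovl:Thể_thao_đồng_đội", "rdf:type", "owl:Class"),
    ("ovl:số_người_mỗi_đội", "rdf:type", "owl:DatatypeProperty"),
    ("ovl:Bóng_đá", "rdf:type", "ovl:Thể_thao_đồng_đội"),
    ("ovl:Bóng_đá", "rdf:type", "ovl:Thể_thao"),
    ("_:r1", "rdf:type", "owl:Restriction"),
}
print(statistics(triples))
```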

Conclusions

In this paper, we have presented the structure of OVL, a universal ontology for the Vietnamese language aimed at building an extensive knowledge base covering many fields. By analyzing existing methodologies for ontology development along with the requirements expressed earlier, we have also planned specific processes to guarantee the applicability of OVL and to allow it to be extended by many other members.

At present, a module that automatically generates multiple-choice questions based on OVL is also planned. This module is being developed in parallel with the ongoing development of the ontology, partly to demonstrate the real-world applicability of OVL and partly to provide a tool for evaluating how correctly our ontology describes its concepts.

The next important task is to apply the proposed methodology to increase the amount of data in OVL, an essential step in strengthening the competency of any ontology. In the second version of OVL, we plan to extend it to 5,000 classes, 20,000 individuals and at least 10,000 relations among these entities.
