Natural Language Generation is a subfield of Computational Linguistics and language-oriented Artificial Intelligence research devoted to studying and simulating the production of written or spoken discourse. The study of human language generation is a multidisciplinary enterprise, requiring expertise in areas of linguistics, psychology, engineering and computer science. One of the central goals is to investigate how computer programs can be made to produce high-quality natural language text from computer-internal representations of information.
Natural language generation is often characterized as a process that has to start from the communicative goals of the writer or speaker and needs to employ some kind of planning to progressively convert them into written or spoken words. In this view, the general aims of the language producer are refined into goals that are increasingly linguistic in nature, culminating in low-level goals to produce particular words. Usually, a modularization of the generation process is assumed which roughly distinguishes between a strategic (deciding what to say) and a tactical (deciding how to say it) part. This strategy-tactics distinction is partly mirrored by a distinction between text planning and sentence generation. Text planning is concerned with working out the large-scale structure of the text to be produced and may also comprise content selection. The result of this subprocess is usually taken to be a tree-like discourse structure, which has at each leaf an instruction to produce a single sentence. These instructions are then passed in turn to a sentence generator, whose task can be further subdivided into sentence planning, i.e. organizing the content of each sentence, and the final step of surface realization, i.e. converting sentence-sized chunks of representation into grammatically correct sentences.
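The hand-off from text planning to sentence generation can be pictured as a small tree whose leaves carry sentence instructions. The Python sketch below is illustrative only: the class names (DiscourseNode, SentenceSpec), the relation labels, and the toy realizer are assumptions made for this example, not the design of any particular system.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class SentenceSpec:
    """Leaf instruction: the content of one sentence (hypothetical format)."""
    subject: str
    verb: str
    complement: str


@dataclass
class DiscourseNode:
    """Internal node of the discourse tree; 'relation' is an illustrative label."""
    relation: str
    children: List[Union["DiscourseNode", SentenceSpec]] = field(default_factory=list)


def leaves(node):
    """Yield sentence instructions left to right, the order they are passed on."""
    if isinstance(node, SentenceSpec):
        yield node
    else:
        for child in node.children:
            yield from leaves(child)


def realize(spec: SentenceSpec) -> str:
    """Toy surface realization: linearize, capitalize, add a full stop."""
    text = f"{spec.subject} {spec.verb} {spec.complement}"
    return text[0].upper() + text[1:] + "."


def generate(plan: DiscourseNode) -> str:
    """Tactical step: realize each leaf in turn and join the sentences."""
    return " ".join(realize(spec) for spec in leaves(plan))


plan = DiscourseNode("sequence", [
    SentenceSpec("the printer", "is", "out of paper"),
    DiscourseNode("elaboration", [
        SentenceSpec("the tray", "holds", "250 sheets"),
    ]),
])
print(generate(plan))
```

A real sentence generator would do far more at the leaves (sentence planning, agreement, morphology); the point here is only the division of labour between the tree and the per-sentence step.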
The different types of generation techniques can be classified into four main categories:
Canned text systems constitute the simplest approach for single-sentence and multi-sentence text generation. They are trivial to create, but very inflexible.
Template systems, the next level of sophistication, rely on the application of pre-defined templates or schemas and are able to support flexible alterations. The template approach is used mainly for multi-sentence generation, particularly in applications whose texts are fairly regular in structure.
Phrase-based systems employ what can be seen as generalized templates. In such systems, a phrasal pattern is first selected to match the top level of the input, and then each part of the pattern is recursively expanded into a more specific phrasal pattern that matches some subportion of the input. At the sentence level, the phrases resemble phrase structure grammar rules and at the discourse level they play the role of text plans.
Feature-based systems, which are as yet restricted to single-sentence generation, represent each possible minimal alternative of expression by a single feature. Accordingly, each sentence is specified by a unique set of features. In this framework, generation consists in the incremental collection of features appropriate for each portion of the input. Feature collection itself can either be based on unification or on the traversal of a feature selection network. The expressive power of the approach is very high since any distinction in language can be added to the system as a feature. Sophisticated feature-based generators, however, require very complex input and make it difficult to maintain feature interrelationships and control feature selection.
Many natural language generation systems follow a hybrid approach by combining components that utilize different techniques.
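To make the phrase-based idea concrete, the following minimal Python sketch selects a pattern for the top level of the input and recursively expands each slot against the matching sub-portion. The pattern table, the "$slot" notation, and the input format are all assumptions invented for this illustration.

```python
# Hypothetical phrasal patterns: each pattern is a list of literal words
# and slots (strings starting with "$") that are expanded recursively.
PATTERNS = {
    "event": ["$agent", "$action", "$object"],
    "entity": ["the", "$head"],
}


def expand(kind, data):
    """Expand the pattern for `kind` against the matching part of the input."""
    words = []
    for part in PATTERNS[kind]:
        if not part.startswith("$"):
            words.append(part)          # literal word in the pattern
            continue
        value = data[part[1:]]          # sub-portion of the input for this slot
        if isinstance(value, dict):
            # Nested input: select a more specific pattern and recurse.
            words.extend(expand(value["kind"], value))
        else:
            words.append(str(value))
    return words


message = {
    "kind": "event",
    "agent": {"kind": "entity", "head": "user"},
    "action": "opens",
    "object": {"kind": "entity", "head": "file"},
}
sentence = " ".join(expand(message["kind"], message))
print(sentence)
```

A canned-text system would store the finished sentence, and a plain template system would stop after one level of slot filling; the recursion is what makes the patterns behave like grammar rules.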
http://en.wikipedia.org/wiki/Natural_language_generation
Natural Language Generation (NLG) is the natural language processing task of generating natural language from a machine representation system such as a knowledge base or a logical form.
In a sense, one can say that an NLG system is like a translator that converts a computer-based representation into a natural language representation. However, the methods to produce the final language are very different from those of a compiler due to the inherent expressivity of natural languages.
NLG may be viewed as the opposite of natural language understanding. The difference can be put this way: whereas in natural language understanding the system needs to disambiguate the input sentence to produce the machine representation language, in NLG the system needs to make decisions about how to put a concept into words.
The simplest (and perhaps trivial) examples are systems that generate form letters. Such systems do not typically involve grammar rules, but may generate a letter to a consumer, e.g. stating that a credit card spending limit is about to be reached. More complex NLG systems dynamically create texts to meet a communicative goal. As in other areas of natural language processing, this can be done using either explicit models of language (e.g., grammars) and the domain, or using statistical models derived by analyzing human-written texts.
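A form-letter generator of the kind mentioned above needs little more than string substitution. The sketch below uses Python's standard string.Template; the letter wording, the field names, and the 90% threshold are assumptions chosen for illustration.

```python
from string import Template

# Hypothetical template; the wording and field names are illustrative.
LETTER = Template(
    "Dear $name, you have spent $spent of your $limit credit limit "
    "and are close to reaching it."
)


def spending_letter(name, spent, limit, threshold=0.9):
    """Return a warning letter when spending reaches the threshold, else None."""
    if spent < threshold * limit:
        return None
    return LETTER.substitute(
        name=name,
        spent=f"{spent:,.2f} USD",
        limit=f"{limit:,.2f} USD",
    )


print(spending_letter("Ms. Lee", 950, 1000))
```

No grammar rules are involved: the only decision the program makes is whether to send the letter at all, which is why such systems are easy to build but hard to vary.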
NLG is a fast-evolving field. The best single source for up-to-date research in the area is the SIGGEN portion of the ACL Anthology. Perhaps the closest the field comes to a specialist textbook is Reiter and Dale (2000) [1], but this book does not describe developments in the field since 2000.
The process to generate text can be as simple as keeping a list of canned text that is copied and pasted, possibly linked with some glue text. The results may be satisfactory in simple domains such as horoscope machines or generators of personalized business letters. However, a sophisticated NLG system needs to include stages of planning and merging of information to enable the generation of text that looks natural and does not become repetitive. Typical stages are:
Content determination: Deciding what information to mention in the text. For instance, in a pollen forecast, deciding whether to explicitly mention that the pollen level is 7 in the south east.
Discourse planning: Overall organization of the information to convey. For example, deciding to describe the areas with high pollen levels first, instead of the areas with low pollen levels.
Aggregation: Merging of similar sentences to improve readability and naturalness. For instance, merging the two sentences "Grass pollen levels for Friday have increased from the moderate to high levels of yesterday" and "Grass pollen levels will be around 6 to 7 across most parts of the country" into the single sentence "Grass pollen levels for Friday have increased from the moderate to high levels of yesterday with values of around 6 to 7 across most parts of the country".
Lexical choice: Putting words to the concepts. For example, deciding whether "medium" or "moderate" should be used when describing a pollen level of 4.
Referring expression generation: Creating referring expressions that identify objects and regions. For example, deciding to use "in the Northern Isles and far northeast of mainland Scotland" to refer to a certain region in Scotland. This task also includes making decisions about pronouns and other types of anaphora.
Realization: Creating the actual text, which should be correct according to the rules of syntax, morphology, and orthography. For example, using "will be" for the future tense of "to be".
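The stages listed above can be chained into a very small pipeline. The Python sketch below is a toy under stated assumptions: the pollen readings, the level thresholds, and the sentence wording are invented for illustration, and each function is far simpler than the corresponding stage in a real system.

```python
# Hypothetical input: pollen level (1-10) per region for tomorrow.
READINGS = {"north": 2, "south east": 7, "south west": 7, "midlands": 4}


def determine_content(readings, min_level=4):
    """Content determination: keep only readings worth mentioning."""
    return {region: lvl for region, lvl in readings.items() if lvl >= min_level}


def plan_discourse(content):
    """Discourse planning: highest pollen levels first."""
    return sorted(content.items(), key=lambda item: -item[1])


def aggregate(ordered):
    """Aggregation: merge regions that share a level into one message."""
    groups = {}
    for region, lvl in ordered:
        groups.setdefault(lvl, []).append(region)
    return list(groups.items())  # dicts keep insertion order


def choose_word(level):
    """Lexical choice: map a numeric level to a word (thresholds are assumed)."""
    return "high" if level >= 7 else "moderate" if level >= 4 else "low"


def refer(regions):
    """Referring expression generation: name a set of regions."""
    return " and ".join(f"the {region}" for region in regions)


def realize(level, regions):
    """Realization: future tense of 'to be', capitalization, punctuation."""
    return f"Pollen levels will be {choose_word(level)} ({level}) in {refer(regions)}."


def generate(readings):
    """Run the stages in order and join the realized sentences."""
    messages = aggregate(plan_discourse(determine_content(readings)))
    return " ".join(realize(lvl, regions) for lvl, regions in messages)


print(generate(READINGS))
```

Keeping each stage as a separate function mirrors the modular view of generation; in practice the stages interact, and many systems merge or reorder them.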
The popular media has been especially interested in NLG systems which generate jokes (see computational humor). But from a commercial perspective, the most successful NLG applications have been data-to-text systems which generate textual summaries of databases and data sets; these systems usually perform data analysis as well as text generation. In particular, several systems have been built that produce textual weather forecasts from weather data. The earliest such system to be deployed was FoG [3], which was used by Environment Canada to generate weather forecasts in French and English in the early 1990s. The success of FoG triggered other work, both research and commercial. Recent research in this area includes an experiment which showed that users sometimes preferred computer-generated weather forecasts to human-written ones, in part because the computer forecasts used more consistent terminology [4], and a demonstration that statistical techniques could be used to generate high-quality weather forecasts [5]. Recent applications include the ARNS system used to summarize conditions in US ports.
In the 1990s there was considerable interest in using NLG to summarize financial and business data. For example, the SPOTLIGHT system developed at A.C. Nielsen automatically generated readable English text based on the analysis of large amounts of retail sales data [6]. More recently there is growing interest in using NLG to summarize electronic medical records. Commercial applications in this area are starting to appear [7], and researchers have shown that NLG summaries of medical data can be effective decision-support aids for medical professionals [8]. There is also growing interest in using NLG to enhance accessibility, for example by describing graphs and data sets to blind people.
Evaluation
As in other scientific fields, NLG researchers need to be able to test how well their systems, modules, and algorithms work. This is called evaluation. There are three basic techniques for evaluating NLG systems:
task-based (extrinsic) evaluation: give the generated text to a person, and assess how well it helps him or her perform a task (or otherwise achieves its communicative goal). For example, a system which generates summaries of medical data can be evaluated by giving these summaries to doctors, and assessing whether the summaries help doctors make better decisions [8].
human ratings: give the generated text to a person, and ask him or her to rate the quality and usefulness of the text.
metrics: compare generated texts to texts written by people from the same input data, using an automatic metric such as BLEU.
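As a rough illustration of what an automatic metric computes, the Python sketch below implements clipped (modified) n-gram precision, one ingredient of BLEU. It is deliberately not full BLEU, which also combines several n-gram orders, applies a brevity penalty, and can use multiple references; the two example sentences are invented.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """Modified n-gram precision: candidate counts are clipped by reference counts."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


generated = "pollen levels will be high in the south"
human = "pollen levels will be very high in the south"
print(clipped_precision(generated, human, 1))  # every generated word appears in the reference
print(clipped_precision(generated, human, 2))  # one generated bigram ("be high") is not in the reference
```

The metric only measures surface overlap with the human text, which is one reason it may not track how useful a text is to its readers.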
Generally speaking, what we ultimately want to know is how useful NLG systems are at helping people, which is what the first of the above techniques measures. However, task-based evaluations are time-consuming and expensive, and can be difficult to carry out (especially if they require subjects with specialized expertise, such as doctors). Hence (as in other areas of NLP) task-based evaluations are the exception, not the norm.
In recent years researchers have started trying to assess how well human ratings and metrics correlate with (predict) task-based evaluations. Much of this work is being conducted in the context of Generation Challenges shared-task events. Initial results suggest that human ratings are much better than metrics in this regard. In other words, human ratings usually do predict task-effectiveness at least to some degree (although there are exceptions [9]), while ratings produced by metrics often do not predict task-effectiveness well. These results are very preliminary; hopefully better data will be available soon. In any case, human ratings are currently the most popular evaluation technique in NLG; this is in contrast to machine translation, where metrics are very widely used.
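Checking whether one kind of score predicts another usually comes down to a correlation coefficient computed over a set of systems. The sketch below computes Pearson's r in plain Python; the score lists are invented toy numbers used only to exercise the function and are not experimental results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length (at least 2 items)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Invented toy scores for four systems; they are not experimental results.
task_success = [0.2, 0.4, 0.6, 0.8]
human_rating = [1.0, 2.0, 3.0, 4.0]
print(round(pearson(human_rating, task_success), 3))
```

A value near 1 would mean the cheaper score ranks systems much as the task-based evaluation does; a value near 0 would mean it tells us little about task effectiveness.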
http://adsabs.harvard.edu/abs/1988STIN… 8912800M
The goal of natural language generation is to replicate human writers or speakers: to generate fluent, grammatical, and coherent text or speech. Produced language, using both explicit and implicit means, must clearly and effectively express some intended message. This demands the use of a lexicon and a grammar together with mechanisms which exploit semantic, discourse and pragmatic knowledge to constrain production. Furthermore, special processors may be required to guide focus, extract presuppositions, and maintain cohesion. As with interpretation, generation may require knowledge of the world, including information about the discourse participants as well as knowledge of the specific domain of discourse. All of these processes and knowledge sources must cooperate to produce well-written, unambiguous language. Natural language generation has received less attention than language interpretation due to the nature of language: it is important to interpret all the ways of expressing a message but we need to generate only one. Furthermore, the generative task can often be accomplished by canned text (e.g., error messages or user instructions). The advent of more sophisticated computer systems, however, has intensified the need to express multisentential English.
http://www.cs.umanitoba.ca/~ckemke/74.793/public_html/Notes-2004/Presentations/NLG_Final_Copy.pdf
http://www.aaai.org/AITopics/pmwiki/pmwiki.php/AITopics/NaturalLanguageUnderstanding
http://www.osmania.ac.in/sanskritacademy/Research/data/E-LIB/E-books/nlp-panini.pdf
Natural Language Processing
A Paninian Perspective
http://www.itri.brighton.ac.uk/topics/cl/generation.html
Natural Language Generation
Many issues need to be addressed before a computer can be made to generate language. For example: How does a speaker produce a text to achieve some goal? What makes a text or a dialogue coherent? What linguistic processes and resources are required and how can they be obtained? What computational tools can be employed?
Our current research addresses these issues mostly in the context of automatic software documentation, the generation of instructions, and summarization. In doing so, we look at generating not only English, but also various other European languages. We therefore study the phenomena involved from a multilingual perspective. We are also concerned with identifying how language varies depending on its context of use, which includes the person for whom it is intended. For example, how is a set of instructions different when aimed at different readers? Such variation is crucial in order to design useful and flexible systems. Our research involves active interaction between AI specialists, linguists, psycholinguists, and experts in technical writing and translating.
Multilingual generation of instructions
In two closely related projects, DRAFTER and GIST, we are concerned with the automatic drafting of multilingual instructional texts from an underlying knowledge base that represents relevant objects, functions, processes and actions. Both projects are committed to delivering prototype systems, in which an author or a domain expert specifies the information to be included in a given section of the text from the underlying knowledge base. The different language versions are then automatically generated in parallel, according to defined requirements which might match a house style or a controlled language. These drafts can be revised as necessary. The AGILE project aims to extend DRAFTER to three Eastern European languages (Bulgarian, Czech and Russian).
As a development of this approach the CLIME project aims to provide natural language answers and explanations to legal queries. Using a natural language generator interfaced to a legal inference engine, it will generate responses to queries from ship engineers and surveyors relating to shipping regulations and laws.
In addition to these applied generation projects, we also pursue more generic, theoretical research in natural language generation. In the RAGS project, we aim to develop (in collaboration with the University of Edinburgh) a standard 'reference architecture' for generation systems, and to provide standard resources to support the development, testing and evaluation of such systems. The GNOME project (also with Edinburgh and Durham) aims to develop specific techniques for one particularly important component of generation technology: the generation of nominal expressions.
In another project, we are looking at the processes involved in producing summaries. More specifically, we are concerned with issues of conciseness of the text and explicitness of the information. Using discourse theory, we are studying the structural and grammatical decisions involved in the compression of a text. This project is being conducted jointly with the Instituto de Fisica, São Carlos, USP, Brazil. Financial support is from the National Council for Scientific and Technological Development (CNPq), and the Fapesp Project, Brazil.
For further information, please contact Donia Scott (+44 1273 642901); see our contact page for full contact details.
Maintained by Roger Evans (Roger.Evans@itri.brighton.ac.uk).
http://www.calvin.edu/~kvlinden/distributions/vanderlinden-nlg-draft-1999.pdf
Kernighan and Ritchie, The C Programming Language
http://books.google.com.my/books?hl=en&lr=&id=VoOLvxyX0BUC&oi=fnd&pg=PA147&dq=natural+language+generation&ots=wse0KN0Kn2&sig=2mo0pEbWl6ULtzIK3_HjORcRU0M#v=onepage&q=natural%20language%20generation&f=false