Using Corpus Analysis Software to Analyse Specialised Texts
What is a corpus?
In corpus linguistics, a corpus can be broadly defined as ‘a collection of naturally-occurring texts in a computer-readable format which can be retrieved and analyzed using corpus analysis software’ (Kennedy, 1998; McEnery & Wilson, 2001; O’Keeffe, McCarthy, & Carter, 2007; Teubert & Cermakova, 2007).
Sources of language corpora
http://www.lexically.net/wordsmith/) ‘Paraconc’
Designing a specialized corpus
Corpus size
· There are no fixed rules; the appropriate size depends on the research purpose, the availability of data, and the time available.
· Large, general corpora may be less useful than small, focused corpora if searches are made on context-specific terms.
· Corpora that are ‘too small’ have limitations, e.g. too few occurrences of the concepts, terms, or patterns under investigation.
· It is preferable to create a ‘monitor’ or ‘open’ corpus (one that is continually updated) because specialized vocabulary and usage change over time.
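One rough way to judge whether a corpus is ‘too small’ is to count its tokens and types and to check how often the terms of interest actually occur. The following is only a minimal sketch: the function names `tokenize` and `corpus_profile`, the simple regular-expression tokenizer, and the two sample sentences are all assumptions made for illustration, not part of any particular corpus tool.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens.

    Assumption: a basic regex tokenizer is good enough for a rough count;
    real corpus tools use more careful tokenization.
    """
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def corpus_profile(texts, terms):
    """Return the token count, type count, and raw frequency of each term."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return {
        "tokens": sum(counts.values()),  # running words
        "types": len(counts),            # distinct word forms
        "term_freq": {term: counts[term.lower()] for term in terms},
    }


# Hypothetical mini-corpus, for illustration only
texts = [
    "The corpus contains specialised texts. Each text is tokenised.",
    "A concordance lists every occurrence of a term in the corpus.",
]
profile = corpus_profile(texts, ["corpus", "concordance", "collocation"])
print(profile)
```

A term with a frequency of zero or close to zero (here, "collocation") suggests that the corpus needs more texts before that term can be studied.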
Text extracts vs. full texts
· Depends on the aim of corpus compilation.
· Whole texts offer more coverage because the words or terms of interest may be distributed unevenly throughout a text.
· Specific sections may be helpful if we are looking for words or phrases in particular content areas or want to create purposeful sub-corpora.
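To see why extracts can miss occurrences that a full text would capture, one can check how a term is spread across a text. The sketch below, under the assumption that splitting a text into equal-sized parts is an acceptable approximation of its sections, counts occurrences of a term in each part; the function name `dispersion` and the sample text are hypothetical.

```python
import re


def dispersion(text, term, parts=4):
    """Count occurrences of `term` in each of `parts` equal-sized slices of the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Ceiling division, so that every token falls into one of the slices
    size = max(1, -(-len(tokens) // parts))
    counts = [0] * parts
    for i, tok in enumerate(tokens):
        if tok == term.lower():
            counts[min(i // size, parts - 1)] += 1
    return counts


# Hypothetical 16-token text, for illustration only
sample = "term a b c d e f term g h i j k l term m"
print(dispersion(sample, "term"))
```

If a term occurs in most slices, an extract taken from only one part of the text would miss the remaining occurrences, which is the argument for collecting full texts.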
Number of texts
· Choices can be made between collecting a few large texts or a larger number of smaller texts.
· Choices can also be made between selecting texts written by one or two key writers or sources, or texts retrieved from different sources or written by different authors.
· Depends on your research focus, e.g. whether you want to study overall language use or the idiosyncrasies and linguistic choices preferred by particular writers.
Medium
· Can be spoken texts, written texts, or a mix of both.
· Depends on research questions.
· Some practical factors should also be considered, e.g. compiling spoken corpora can be time-consuming and needs special types of tagging.
Subject and text type
· Should mainly focus on the specialized text under investigation, although this is less clear-cut in multidisciplinary subjects.
· Texts may come from different subjects if the research focus is on the study of particular language features rather than term extraction.
· Text types within a specialized subject field may vary from ‘expert-to-expert’ texts to ‘expert-to-non-expert’ texts, or in other words, from technical to popular texts.
Other considerations
· Authorship: Texts written by experts in a field tend to present more reliable and authentic examples of specialized language.
· Language: Specialized texts can be stored and retrieved in the form of monolingual, comparable, or parallel corpora.
· Publication date: Texts should come from recent publications unless queries are made in relation to particular periods of time.
Sources of specialized texts
· Printed materials
· Word documents
· CD-ROMs
· Texts on the Web
· Online databases
Getting started with AntConc