Tuesday, 14 August 2018 (B.E. 2561)

Using Corpus Analysis week 3

Using Corpus Analysis Software to Analyse Specialised Texts


What is a corpus?

In corpus linguistics, a corpus can be generally defined as 'a collection of naturally occurring texts in a computer-readable format which can be retrieved and analyzed using corpus analysis software' (Kennedy, 1998; McEnery & Wilson, 2001; O'Keeffe, McCarthy, & Carter, 2007; Teubert & Cermakova, 2007).
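
To make "retrieved and analyzed" concrete, here is a minimal sketch of a keyword-in-context (KWIC) search, the kind of display a concordancer produces. It is only an illustration: the two-sentence sample text, the function names, and the very simple tokenizing rule are all assumptions made for this example, not how any particular corpus tool works internally.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each hit."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


# Hypothetical two-sentence "corpus", used only for illustration.
sample = (
    "A corpus is a collection of texts. "
    "Researchers search the corpus with concordance software."
)
tokens = tokenize(sample)
for left, hit, right in kwic(tokens, "corpus"):
    print(f"{left:>25} [{hit}] {right}")
```

Each output line shows the search word in brackets with up to three words of context on either side, which is the basic view used to study how a word behaves in real text.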

Sources of language corpora

· http://www.natcorp.ox.ac.uk/
· http://corpus.leeds.ac.uk/protected/query.html
· http://corpus.byu.edu/
· http://lextutor.ca/conc/eng/
· AntConc (http://www.antlab.sci.waseda.ac.jp/software.html)
· WordSmith Tools (http://www.lexically.net/wordsmith/)
· ParaConc

Designing a specialized corpus
Corpus size

· There are no fixed rules; the size depends on the research purposes, the availability of data, and time.
· Large, general corpora may be less useful than small, focused corpora if searches are made on context-specific terms.
· Corpora that are too small have limitations, e.g. they may not contain enough of the concepts, terms, or patterns under investigation.
· It is preferable to create a monitor or open corpus, because specialized words and usage change over time.
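
One rough way to see whether a small corpus is "big enough" for a given study is to count how often the terms of interest actually occur in it. The sketch below assumes a tiny made-up corpus and made-up target terms; the function name and the simple letters-only tokenizer are hypothetical choices for illustration.

```python
import re
from collections import Counter


def corpus_profile(texts, target_terms):
    """Count tokens, distinct word types, and hits for each target term."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return {
        "tokens": sum(counts.values()),
        "types": len(counts),
        "term_hits": {term: counts[term] for term in target_terms},
    }


# Hypothetical mini-corpus; a real one would be read from files.
texts = [
    "The enzyme binds the substrate.",
    "Each enzyme has an active site.",
]
profile = corpus_profile(texts, ["enzyme", "substrate", "inhibitor"])
print(profile)
```

A term with zero or very few hits (here "inhibitor") suggests the corpus needs more texts before that term can be studied.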
Text extracts vs. full texts

· The choice depends on the aim of the corpus compilation.
· Whole texts offer more coverage, because the words or terms to be examined may be randomly distributed throughout the text.
· Specific sections may be helpful if we are looking for words or phrases under particular content areas, or want to create purposeful sub-corpora.
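
The point about terms being spread through a whole text can be checked with a simple dispersion count: cut the text into roughly equal slices and count the term in each one. This is a minimal sketch with a hypothetical function name and an invented sample text, not a standard dispersion measure.

```python
import re


def dispersion(text, term, parts=4):
    """Count occurrences of `term` in up to `parts` roughly equal slices."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return []
    size = -(-len(tokens) // parts)  # ceiling division
    slices = [tokens[i:i + size] for i in range(0, len(tokens), size)]
    return [chunk.count(term) for chunk in slices]


# Invented 12-word text, used only to show the idea.
text = "dose one two dose three four five six seven eight nine dose"
print(dispersion(text, "dose"))
```

If the counts are spread across the slices, an extract taken from only one part of the text would miss some of the occurrences.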
Number of texts

· Choices can be made between collecting a few large texts or a larger number of smaller texts.
· Choices can also be made between selecting texts written by one or two key writers or sources, or texts retrieved from different sources or written by different authors.
· The choice depends on your research focus, e.g. studying overall language use, or studying the idiosyncrasies or linguistic choices preferred by particular writers.
Medium

· Texts can be spoken, written, or a mix of both.
· The choice depends on the research questions.
· Some practical factors should also be considered, e.g. compiling spoken corpora can be time-consuming and needs special types of tagging.
Subject and text type

· The corpus should mainly focus on the specialized texts under investigation, although this is less clear-cut in multidisciplinary subjects.
· Texts may come from different subjects if the research focus is on the study of particular language features rather than term extraction.
· Text types within a specialized subject field may vary from expert-to-expert texts to expert-to-non-expert texts, or in other words, from technical to popular texts.
Other considerations

· Authorship: Texts written by experts in a field tend to present more reliable and authentic examples of specialized language.
· Language: Specialized texts can be stored and retrieved in the form of monolingual, comparable, or parallel corpora.
· Publication date: Texts should come from recent publications unless queries are made in relation to particular periods of time.
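
The design criteria above (medium, text type, authorship, language, publication date) can be recorded as metadata for each text, which makes it easy to pull out sub-corpora later. The sketch below is one possible way to do that; every field name, file name, and value in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TextRecord:
    """Metadata for one corpus text; all field names are hypothetical."""
    filename: str
    medium: str        # "written" or "spoken"
    text_type: str     # e.g. "expert-to-expert", "expert-to-non-expert"
    author: str
    language: str      # e.g. "en"
    year: int


def select(records, **criteria):
    """Return the records whose fields match every given criterion."""
    return [
        r for r in records
        if all(getattr(r, field) == value for field, value in criteria.items())
    ]


# Invented records, used only for illustration.
records = [
    TextRecord("a01.txt", "written", "expert-to-expert", "Author A", "en", 2017),
    TextRecord("a02.txt", "written", "expert-to-non-expert", "Author B", "en", 2016),
    TextRecord("a03.txt", "spoken", "expert-to-expert", "Author A", "en", 2009),
]

expert_written = select(records, medium="written", text_type="expert-to-expert")
recent = [r for r in records if r.year >= 2015]
print([r.filename for r in expert_written])
print([r.filename for r in recent])
```

Keeping this information from the start means the same collection can serve several research questions, for example comparing technical and popular texts or limiting a query to recent publications.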

Sources of specialized texts

· Printed materials
· Word documents
· CD-ROMs
· Texts on the Web
· Online databases

Getting started with AntConc

Download the latest version of AntConc and watch the YouTube tutorials at http://www.antlab.sci.waseda.ac.jp/antconc_index.html

