Here are my notes from week 1 of Lancaster University’s MOOC ‘Corpus Linguistics’ (I haven’t got time to do the practical exercise, but this is sparking some thinking about my PhD thesis database!):


What is a corpus?

Why might I use corpus linguistics?

What is your research question/hypothesis?

What is annotation and markup?

Computers do not have the cultural knowledge that we have, so we have to mark up the text so that they can read the nuances, etc.
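To make the idea of annotation concrete, here is a minimal sketch in Python. It assumes a simple word_TAG format and made-up tag names (DET, NOUN, VERB, ADP) purely for illustration; the course may use a different scheme. The point is only that once a human (or a tagger) has labelled each word, a computer can pull out, say, all the nouns without understanding the sentence.

```python
# Hypothetical annotated sentence: each token is written as word_TAG.
# The tag names here are illustrative, not a specific real tagset.
annotated = "The_DET bank_NOUN collapsed_VERB by_ADP the_DET river_NOUN"

def parse_annotated(text):
    """Split 'word_TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST underscore, so words that
        # themselves contain underscores are still handled.
        word, _, tag = token.rpartition("_")
        pairs.append((word, tag))
    return pairs

def words_with_tag(pairs, tag):
    """Return the lowercased words that carry the given tag."""
    return [word.lower() for word, t in pairs if t == tag]

pairs = parse_annotated(annotated)
nouns = words_with_tag(pairs, "NOUN")
print(nouns)
```

Without the tags, the program could not tell that "bank" is being used as a noun here; the annotation carries the knowledge the computer lacks.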

Types of Corpora

Corpora come in different flavours, so different things can be assessed, e.g. date, time, genre, etc. Specific corpora cover a defined area, but there are also general corpora, especially for language as a whole (note the difference between spoken and written).

Frequency data, concordances and collocation

A search shows how often a word appears, but also how frequently it appears per million words, and what kinds of documents and contexts it appears in.
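The three ideas above (raw count, frequency per million words, and seeing a word in context) can be sketched in a few lines of Python. This is my own rough illustration, not the tool used on the course: the sample text, the simple tokeniser and the function names are all assumptions.

```python
import re
from collections import Counter

def tokenize(text):
    """Very rough tokeniser: lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def per_million(count, total_tokens):
    """Normalise a raw count to a rate per million words."""
    return count / total_tokens * 1_000_000

def concordance(tokens, node, window=3):
    """Return key-word-in-context lines for every hit of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Made-up sample text standing in for a real corpus.
sample = "The cat sat on the mat. The dog chased the cat across the garden."
tokens = tokenize(sample)
counts = Counter(tokens)

raw = counts["cat"]                    # raw frequency
rate = per_million(raw, len(tokens))   # normalised frequency
lines = concordance(tokens, "cat", window=2)

print(raw, round(rate))
for line in lines:
    print(line)
```

The per-million figure is what lets you compare corpora of different sizes: a raw count means little until you know how many words were searched.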

Corpora and Language Teaching

This is less relevant to me, but it is interesting that you need to identify which words are used most frequently, and so which should come first in a textbook. This could be helpful for digital literacy training.

What can’t we do with a corpus?
