Completing #CORPUSMOOC

During interactions with JISC and ALT in particular, MOOCs have been hot news for quite some time. MOOC is an acronym for ‘Massive Open Online Course’ – freely available to all. They don’t have the best reputation for completion rates, which has opened up a number of discussions at JISC/ALT events as to whether completion, and particularly full completion, of a MOOC is the point of these things. In 2012, JISC ran a session ‘What is a MOOC?’ – one of the early slides here:

whatisamooc

Picking a Course

Last year, I decided to get my head around these, wondering whether CODEC could usefully develop a MOOC (as the financial imperative is not clear, except as a marketing exercise, for many of these courses). I cheerily signed up to about 3 courses, and… didn’t get started on a single one of them, as other work priorities took over. As we were developing potential funding bids earlier this year related to ‘religious identity online’, Pete suggested that I undertake the ‘Corpus Linguistics’ module that he’d had a go at last year. I had no idea what that was – so as all good academic researchers do, popped across to Wikipedia first for a definition:

Corpus linguistics is the study of language as expressed in samples (corpora) of “real world” text. This method represents a digestive approach to deriving a set of abstract rules by which a natural language is governed or else relates to another language. Originally done by hand, corpora are now largely derived by an automated process.

So I could see how this would be useful for analysing words collected from Twitter, Facebook, etc. to explore large social and cultural questions.

Corpus Linguistics Online

The course, hosted on Futurelearn, and presented by Tony McEnery, Lancaster University, was designed as a practical course for humanities and social science researchers, with the following stated aims:

  • Demonstrate that corpus approaches to social science can offer valuable insight into social reality by investigating the use and manipulation of language in society.
  • Equip social scientists with skills necessary for collecting and analysing large digital collections of text (corpora).
  • Provide educational support for those who want to use the corpus method.
  • Demonstrate the use of corpus linguistics in the humanities, especially History.
  • Give a sense of the incredibly wide uses that corpora have been put to.
  • Allow those with an interest in language, who have not heard of the corpus approach before, a new way of looking at language.

Combined with all the recent changes in CODEC, the first week’s material seemed a little overwhelming. Early on, I made a conscious decision to focus on the theoretical explanations each week (likely to take around 60-90 minutes per week, rather than 3-4 hours), so that I could grasp the method and the kinds of questions it allows us to ask, rather than the practical aspects of the software (also provided). Having just finished the course, I am now checking out some of the optional videos, especially one from Claire Hardaker on trolling, as recently I’ve been asked to contribute to debates about trolling, bullying (and the place of restorative justice in these debates):

trolling-etc

The Process

First things first was making time in the diary for this. I originally expected it to take around 3 hours a week, and although this did drop, I wanted the process to be not just about the course content, but about thinking how a MOOC works and what it contributes to our learning. Much of this may have been absorbed at gut level rather than laid out here, so this is more of a ‘quick and dirty’ response!

  • Sign up for the course on Futurelearn (there are other providers, including independently hosted MOOCs, e.g. this ‘open theology’ module I’m undertaking with WTC)
  • Make time, and put reminders in my todoist! Get clear on what I actually wanted to get out of this, so that I would focus time/energy on those areas.

It was really easy to sign up for Futurelearn, and everything comes in via email, so it was simple to search and find the course I wanted (and you’ll see from the screenshot how easy it is to leave it too):

sign-up

Week by Week

Each week the available material would appear. Clearly, it is technologically possible for all the material to appear at the same time, but there’s a need to encourage people to work on the material together, with a start date, etc. encouraging use of the well-structured (and well-used) ‘commenting’ space, which Tony himself contributes to frequently (and is clearly gaining insights into his own research), and with a number of mentors who have been assigned and are highly active (I’ve had replies to several, but without having asked for permission, thought I should just share my own comment!):

comments

 

So, the material appeared each week, looking like this (I’m assuming if I’d completed all the practical activities, those lines underneath would have got longer!), with the most basic, introductory material (usually in the form of videos from Tony) at the top – which was the stuff I was really interested in.

 

weekly

 

Assessment

Being able to see how much more there is to go is always a good incentive – below the fold there was much extra material – more videos, readings, practical software help, etc. but I usually finished at the point of the quiz (which isn’t assessed, but helps you “know” that you have “learnt” some material that week, and where you might want to go back and re-assess):

quiz

 

 

Journalling

Users are encouraged to keep a journal throughout the project, which I did through notes kept in ‘Word’ and then transferred across to this blog, and shared using the hashtag #CorpusMOOC.

What have I gained?

Well, I may have more to say about this in the longer-term, but for now

  1. I’ve started a MOOC
  2. I’ve finished a MOOC
  3. I’ve done the bits of the MOOC I wanted (if you know me, you know I’m a bit of a completer/over-achiever, and initially thought I can’t just do the bits I want!) and no more.
  4. I think I’ve got a good sense of what Corpus Linguistics is capable of, and could see that I could use it in my research, although I would have to spend more time learning the practicalities/partner with A.N.Other.

I thought the material was well-presented, manageable (once I stuck to the first bits), the intentional interaction was good as well as the usability of the software, and I can see how more can be done in this subject area.

40percent

Thank you Tony and team!

 

#CORPUSMOOC: Week 8: A Swearing Extravaganza

This week looking at ‘swearing’ as it is used within language .. so there’s a disclaimer, some of the comments:

The use of ‘bad language’ seems to me to be very cultural specific. For example, young people seems to use it more often than old people. And I see variation of what’s considered as ‘bad language’ between registers and dialects. For example, the same person would never use bad language at work but he probably uses it when he is with friends; and what’s considered bad in some areas would not have this consideration in others.

Of course, you have to define what is meant by ‘bad language’; obscenity is very culturally specific (Northern Europeans: body parts, coition and excrement, Southern Europeans religion, mothers, aspersions re sexuality – the Victorians found the phrase ‘what a cunning hat’ rather racy). The point is well put, though.

Oh dear, the warm up activity is to listen out for the use of bad language in conversation around us … probably more than you’d expect even in my own context! Interesting conversations online about whether language teachers should teach this, as students will come across it (don’t we all remember how funny it was once we learnt ‘merde’ in French classes!)

amazing what you can get used to after a while and how much these words lose their strength through over use.

Part 1: Looking at Bad Language

Why say ‘bad language’ and not ‘swearing’? Definitions of what is ‘swearing’ = complex!

swearing

Words developed for the Lancaster Corpus of abusive words – including animal, intelligence, sexuality focused insults. Then had to develop an annotation system for the material – including class, gender, age, etc. Can provide some quite useful distinctions that can be researched. Metalinguistic word – am not using the word, but I’m talking about it/describing it, or quoting someone else saying something.

Who knew there were so many different ways to use ‘fuck’ – fascinating…

fuck

Final category = a ‘dustbin category’ for those that didn’t fit any of these categories, and didn’t really need further work.

A commentator suggests that the video http://youtu.be/BsRUQCN2lak helps give further insights into the use of swearing in language – jocular uses and ‘fillers’ have been mentioned by other commentators.

Another kind of ‘MOOC’ – http://www.urbandictionary.com/define.php?term=mook&defid=6165831 – such dictionaries allow us to see language develop.

Part 2: Swearing and Gender

We can use such corpora to see how such language is actually used – but we’ll likely approach such questions with a number of assumptions – e.g. that men swear more than women. In early 1990s, there was no statistical difference in usage, but in looking at the individual words themselves, these are different… words used by men tended to be stronger. 

gender

There are levels of ‘strength’ seen, but there are possibilities that these might be used differently … e.g. ‘religious people’ more offended by God/Jesus than the general population [Note to second year housemates, yes…]

Commentators mentioning encouraging people to rethink phrases that have become everyday

  • Someone being ‘a bit gay’
  • Someone having ‘a blonde moment’
  • Someone ‘running like a girl’

Is there ‘surgical cleaning’ where such words become sanitised? Corpus tools, of course, are good at identifying the change in language of words e.g. ‘gay’.

Different people will probably see some of the words as more offensive than others… e.g. people say ‘God’ without thinking – probably more offensive to ‘religious people’ than many realize.

Part 3: Swearing and Interaction

How do the genders interact when it comes to the use of ‘bad language’ words? Is there a difference between or across? Intra-gender use of swearing is the norm (e.g. men direct swearing at other men more than at women, and vice versa), but men do this much more than women (have they been cultured to swear less in front of women?).

What kind of words are targeted? E.g. ‘cow’ exclusively at women…

gendered-terms

Wow… so much complex!

Part 4: Strength of Swearing

Different categories of words (e.g. general annoyance) = much milder words, but ‘destinational category’ (reached end of tether = “go away”) = much stronger!

Discussions mentioning the new British National Corpus coming this year, where it will be interesting to see how words are used/re-used and reclaimed – e.g. African-Americans claiming ‘n****r’, gay people claiming ‘queer’ and women claiming ‘bitch’ as positive interaction words. Also lots of discussion as to regional/cultural differences and how the right corpus might help explore those.

Part 5: Swearing and Age

Assumption is that younger people tend to swear more, and data seems to bear that out:

age-bands

Is it down to age? It’s not necessarily their age that is the issue. The cultural environment older speakers grew up in may have meant that swearing was less accepted, so people don’t necessarily swear less as they get older! Are they possibly using ‘swear words’ that are so mild that they’ve not been measured as swear words (e.g. golly, blimey)? The data doesn’t show this, either. What about the strength of swear words/categories? Mirrors the distribution from the graph above. Frequency/strength distributions are similar.

Commentator notes: ‘When angry, count to four; when very angry, swear.’ (Mark Twain). Also questioning whether the extra drop-off is down to being in the presence of children/grandchildren, when people seek to rein themselves in.

Part 6: Swearing and Class

How do we draw out the nuances here? Do lower classes select stronger words, and higher class = weaker ones?

AB: 1.81, C1 1.76, C2 2.16, DE 2.47 (General pattern, but AB = stronger than C1)…

What about the type of bad language use? AB/C1 and C2/DE = inverted.

Lots of discussion about whether upper classes = rules don’t apply, and middle classes more cautious…

Part 7: Combining Factors

What happens when we try to combine the data – e.g. male AB aged 25-34 = use most? BNC was balanced to get roughly similar amounts of data on single factors. May be no examples combining particular factors… that particular group = 2,259 words uttered in the spoken BNC.
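To make the small-sample worry concrete, here is a minimal sketch (my own, not from the course) of normalising a raw count by the size of the subcorpus. The 2,259-word figure is from the notes above; the counts of 10 and 11 hits are invented purely for illustration and are not real BNC figures.

```python
def per_thousand(count, corpus_size):
    """Return occurrences per 1,000 words of running text."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count * 1000 / corpus_size


# Size of the male / AB / aged 25-34 slice of the spoken BNC (from the notes above).
SUBCORPUS_WORDS = 2259

# Hypothetical hit counts, NOT real BNC data.
for hypothetical_count in (10, 11):
    rate = per_thousand(hypothetical_count, SUBCORPUS_WORDS)
    print(f"{hypothetical_count} hits -> {rate:.2f} per 1,000 words")
```

With so few words, a single extra hit shifts the rate by roughly 0.44 per thousand, which is why a combination of factors can look dramatic on very little evidence.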

How many types of speakers are in the BNC? Not many, but we can combine particular types of data to give insights.

Part 8: Combining Factors – 2 Case Studies

Age/Gender combination:

combinations

Class/Gender combination:

combination-class

Class/Gender/Age combination:

combination3

Do you want to argue – are women pre-disposed to use fewer swearwords? Surely socially constructed, it’s an artifact of the society within which these 2 genders are operating, nothing to do with genetics. Debate? Where did the distinctions come from? What were the social processes that constructed this?

Commentator: People are willing to say things in other languages they’re not prepared to say in their own – http://io9.com/why-its-easier-to-swear-at-people-in-another-language-1536262864.

Final Words from Tony

The start of a journey into language .. with an overview of the kind of things you should have learned, and in a position to build your own corpora [though I didn’t use the practical elements!]… and don’t think that this course has given you everything…

We often want to study language in its social context, rather than in isolation. Contemporary social issues or historical issues are typically the most interesting.

Overwork = Less Creativity? Milking the academic cow dry?

A fascinating piece on the culture of overwork within academia finishes:

We need coalitions of the sane to lead discussions about what can reasonably be expected of academics, to recruit and promote accordingly and to mentor younger academics into a way of thinking that says: “Enough is enough. If you want to do extra, we won’t reward you for it.”

You might assume that institutions run by coalitions of the sane would automatically fall behind those run by further achievers. But think again. Universities vitally depend on academics’ ability to productively use their intellect, curiosity and creativity. In business-speak, ensuring a sane working environment therefore safeguards their supply of academic human resources.

A dairy farmer might streamline his delivery routes or negotiate discounts on milk bottles. He won’t run the health of his cows into the ground by demanding that they produce ever greater yields. But that, in essence, is what universities are currently doing to their academics. Fingers crossed that voices like Schell’s will wake them up to how counterproductive that is – preferably before the cows come home.

Read full story.

WTC Mooc: Week 3: Living the Christian Story – Exodus (@WTCTheology)

exodus

Core Reading

Exodus 1:1-15 (or chapters 1-15, but there’s only 10!) through to the story of the Plagues … but overall a story of oppression leading to recreation?

Dempster

Israel is becoming ‘many’ as ordained at creation… but there are several obstacles to be overcome first. Seen alone, misses the ‘new humanity’ destined to restore creation blessing to the world. With Moses, see how it harks back to e.g. Noah, and forward to Israelite experiences. Moses says ‘I am’, which has led to a lot of discussions as to whether he was divine – or God’s representative. Either way it’s important to understand that God was present.

Moving from a time of chronos to kairos (exile is almost over):

Such is the case with chronos and kairos. Both are Greek words which mean time, but they imply different things.

Chronos refers to minutes and seconds. It refers to time as a measurable resource.

Kairos is the word used for time in Ephesians 5:16 (which I examined in more detail here). Kairos means an appointed time, an opportune moment, or a due season.

http://www.lifeofasteward.com/chronos-kairos/

Constant battle between Israel/Egypt, pre-ordination of what will happen when Jesus comes, and defining the Israelite nation – the importance of the ‘firstborn’ sons, the gradual admittance of the ‘superior power of Yahweh’. The parting of the waves of the ‘Reed Sea’ answers the question of who is in control. The goal of the journey to re-establish the Abrahamic covenant, having passed through the water, now to pass through the human gauntlet, to re-establish ‘Edenic sanctuary’, where God can dwell again with his people. The importance of Mount Sinai to the Torah… a new covenant with God – where obedience leads to blessing/fullness of life, disobedience to curse and death – set apart as a ‘holy nation’, imaging God to the nations. Moses as the mediator between God/the people as God is too powerful, a covenant marked by blood/a sacrifice. An important part of this covenant is to work and rest. The development of the Tabernacle = similarities with Eden, but even before the covenant is made, the Israelites have broken it with false idols. The freedom emphasised by the divine is ‘grace and mercy’. In the 10 commandments, the judgements are given first, but the ‘rear view’ of God emphasises mercy and forgiveness. The golden calf doesn’t represent God, but human beings in converse with him.

Ekblad

Pharaoh felt threatened by immigrants (such as Jacob’s family) – so became ruthless in the tasks they imposed upon them – but despite this – the numbers increased. Considers how this is in parallel with the Latin American experiences (of men currently in jail) – especially when they are given ‘education’ but not opportunities to take jobs, so go back to what they know – a life of crime – in order to live. The ‘baby boys’ were killed before they could become a problem. The Egyptians needed the immigrants because they did the work that the Egyptians didn’t want to do, but the general thinking was for deportation or down-trodden-ness. The midwives were expected to kill the boys, but refused, so were blessed by God. By worldly standards, Pharaoh was at the top, but God was on the side of the oppressed/weak (not the oppressors) and they survived.

Often when people read the Bible, they read it through oppressive interpretations, which can be subverted by careful reading of the text itself. Guided readings can question assumptions and invite unexpected identifications. When Moses impulsively kills a harsh taskmaster, he has to flee – originally a ‘saviour’ he is now absent from the scene, and failing to intervene – how many see God also… but a deeper reading can see a bigger picture coming to fruition.

When working with Latino prisoners, the author – a Caucasian pastor – representative of the prison system, and of God … many in the prison see God as hyper-sovereign – distant judge who has pre-ordained everything, so lives cannot be re-mapped… all negative aspects of their lives are ‘God’s will’. Their theology assumes that God is just/good and therefore that they must be bad/deserving of all the calamities that have befallen them. No redemption is expected. .. and people attend the sessions for reasons such as social interaction, especially once they find that ‘accepting Christ’ does not instantly solve all their problems. They may think that attending will give a lighter sentence. Belief is, however – people are hungry for an authentic encounter whatever the original reason.

The facilitator has a careful role, which subverts those barriers, replacing the old, paralysing theology. .. with a need to distance himself from ‘taskmasters’, and make the prisoners realise that he’s on their side. God’s will happens through covert disobedience, non-compliance, etc… God listened to the groans of slaves, but they remained slaves … author discusses his experience of working with those seeking to find liberation from e.g. heroin addiction – not in terms of the ‘heroic victor’, but weakness/ignorance on how to heal. In Mexico – often become violent – taking frustration out on someone – as Moses did – direct experience of poverty/oppression = violent. No long-term respect however… required for that = respect and humility. God shows up where Moses is (wherever that wilderness is). Moses when called to go back, said that he wasn’t worthy (he was human). “There’s another really important guy in Israel’s history who didn’t feel cut out for this. Look, God used him. God can use me too.” [Imposter syndrome?]

Session 3:1 Exodus: Setting up the Conflict (Matt Lynch)

Theology and Social Action…. Exodus – reflected in the way they rebuilt their society – institutionally built into their laws, etc. Moving from being a family, to becoming a nation. There was fear-based oppression … being fruitful and multiplying = in the wrong place/time = threatening to Egyptians. Pharaoh does not know God, and does not let the people rest. God cares and hears the cry of the afflicted. God hears it and responds.

Session 3:2: Exodus – The Plagues

God has power of creation, has concern for poor and the afflicted. Conflict between Pharaoh/God. God displays his power over Pharaoh and over creation with the boils. God “you can’t act this way towards my people and get off the hook.” Why does God harden Pharaoh’s heart? The plagues are not to get the Israelites out of Egypt, but to show his power to Pharaoh… who has fraudulent power… shows future generations his power?

Session 3:3 Exodus – Concern for the Vulnerable (Bob Ekblad)

What actions lead to liberation (Exodus 1)? Do we see the presence of God in this text? Confusing – God blesses people who lie in order to protect the innocent? [Makes me think of Corrie Ten Boom – ‘they are under the table’]. What does God do? He’s on the side of the underdog? How does God liberate? Who are the main characters from most powerful to weakest? Who is God with?

exodus-power

  • Shiphrah and Puah (midwives) → non-compliance and deception
  • Moses’ Mother → Hides the baby rather than obey the law
  • Moses’ Sister → Spies for Moses (proactive)
  • Pharaoh’s Daughter → Seeing/having compassion

Session 3.4: Exodus: Liberation Today

Understanding contemporary law enforcement. A challenge to those who believe that deception is a sin – most characters are undertaking this in this story. What does the use of these weak characters tell us about God? How does God save now? It tends to indicate non-compliance. Pharaoh’s daughter uses her privilege to rescue. What would that look like today? Funding people to look after other people? Should we be giving work to denied asylum seekers? Educating them? What are the limits of compliance for us?

Session 3.5: Exodus: Seeing the Vulnerable

The importance of “seeing” human beings (rather than ‘the mass’). [Seeing individuals/telling people’s stories?]. Moses (and most on the margins) assume that God is on the side of law enforcement and the status quo, but Moses’ violent response doesn’t disqualify him from God’s mission to be an agent of liberation. God is looking for similar characters in contemporary culture.

#CorpusMooc: Week 7 Notes

What languages did you learn and how?

Only test I’ve ever got 100% on is a language aptitude test – apparently I’m good at identifying patterns and working it out from there … which I’ve probably noticed “in real language interactions”

French to GCSE level, text books, but to get through the exams = extra spoken lessons, where saying the correct thing was abandoned for getting ‘the right word’

German for a couple of years – got confused between that and French, very particular words and grammar focused

Latin for 3 years – grammar grammar grammar and vocab

Italian – tried an online course – didn’t need to put it into practice

Brazilian Portuguese – Linkwords (linking words to really silly sentences), gave me something to start with for 5 months in Brazil, then have to use language to progress. Now using an iPad app to get back into things – where everything is gamified – largely vocab focused.

Learner Corpora

Contains data by those learning a particular language… Native corpora don’t refer to the problems that learners tend to encounter (as natives don’t tend to make the mistakes that learners do). Identifying errors in essays, etc. allows development of new learner corpora. There can be bigger complications than frequency, and what is the background of the original language, so what translates/makes sense, etc.? What about under/overuse of words (especially compared to native speakers)?
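A rough sketch of how under/overuse might be measured (my own illustration, not the course’s method): compare how often a word turns up per word of text in a learner corpus against a native-speaker corpus. The function name and all the numbers are hypothetical.

```python
def usage_ratio(learner_count, learner_size, native_count, native_size):
    """Ratio of the word's relative frequency in learner vs native text.

    > 1 suggests learners overuse the word, < 1 suggests underuse.
    """
    if native_count == 0 or learner_size <= 0 or native_size <= 0:
        raise ValueError("need a non-zero native count and positive corpus sizes")
    # Cross-multiplied form of (learner_count / learner_size) / (native_count / native_size).
    return (learner_count * native_size) / (native_count * learner_size)


# Hypothetical figures: 30 hits in 10,000 learner words vs 10 hits in 10,000 native words.
ratio = usage_ratio(30, 10_000, 10, 10_000)
print("overuse" if ratio > 1 else "underuse or parity", ratio)
```

A ratio on its own says nothing about whether the difference could be down to chance, so it would need pairing with a significance test before drawing conclusions.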

Interesting differences between the keywords that were used in discussions re the use of mobile phones – where are the different cultural emphases?

language-keywords

 

Interesting – Americans tend to use personal pronouns (I/individual experiences), whilst Polish tended to use (we/group) – speculates whether Polish is a more ‘academic’ writing style [or is it the cultural expectations – definitely assume that Americans talk individually] … or Polish have fewer mobile phones so probably use them in the group, and they rely on abstract nouns anyway = more generalisations. Rhetorical style – can be practical reasons, can be teaching style/vocabulary, societal differences.

More common to come across written than spoken corpus data … more difficult to capture, and also captures a larger range of words than there are, as computer doesn’t recognise spelling mistakes, etc. If analysis is just at a lexical level misses the range of uses. Too much research is not shared.

#CORPUSMOOC Week 6 (Notes)

Before you watch the lecture, create two short dictionary definitions: One is for the word ‘threadbare’ the other is for the word ‘luckily’. Do not consult a dictionary or other reference resource – just use your own intuitions. If you do not think you know either word, just make a list of words that you think may be associated with each. Then watch the lecture.

Threadbare: A condition in which clothes are worn through, nearly to rags.

Luckily: Where a situation could have gone wrong, but the outcome was positive.

VIDEO MATERIAL: History & Development of Corpus Linguistics.

Use large corpora to identify the words that are most frequently used. The most efficient form of language learning ties to the words that people use most frequently. Studies are corpus based in their philosophy.
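As a quick illustration of what “identify the words that are most frequently used” looks like in practice, here is a minimal sketch (mine, not the course software). The tokeniser is a deliberately crude assumption, and the sample sentence is made up.

```python
import re
from collections import Counter


def frequency_list(text, top_n=5):
    """Return the top_n most frequent lower-cased word tokens in text."""
    # Crude tokeniser (an assumption): runs of letters and straight apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(top_n)


sample = "The cat sat on the mat. The dog sat on the cat."
print(frequency_list(sample, top_n=3))
```

On a real corpus the top of the list is dominated by grammatical words (the, of, and), which is part of why a small number of words covers so much of what people actually say.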

Early – most was written rather than spoken data, and much was not on contemporary texts (e.g. 19th Century novels and the Bible). By end of 1950s from teaching words to teaching rules (grammar). Verbs = 60% of what we use, but are hard to teach, + irregular verbs. Look for the popular/typify speech words.

Listening to these videos as a piece of history, as the studies have developed over time, identifying various elements of text and speech, and how to focus on the words that people actually use etc – a very small number, with a large number of common lexical bundles (less common in academic writing). Developments of the dictionary – large numbers of words, especially rare words, are not helpful – that’s only required for [e.g. Countdown]. Writing definitions – need examples of how the word is used in context… I like what https://www.wordnik.com/ is able to do with this in making the dictionary digital.
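For the ‘lexical bundles’ idea, a small sketch under my own assumptions: treat a bundle as any run of n consecutive words that recurs at least a set number of times. The function name, thresholds and sample text are all hypothetical, and this simple version happily counts runs that straddle sentence boundaries.

```python
import re
from collections import Counter


def lexical_bundles(text, n=3, min_count=2):
    """Return (bundle, count) pairs for n-word sequences seen at least min_count times."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Slide a window of n tokens across the text.
    ngrams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(gram) for gram in ngrams)
    return [(bundle, c) for bundle, c in counts.most_common() if c >= min_count]


sample = "I do not know what you mean. I do not know what to say."
print(lexical_bundles(sample))
```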

I’m not a linguist (but wanting to interrogate tweets), so I’m multi-tasking on this material and taking fewer notes!

 

 

 

#CORPUSMOOC Week 5 (Notes)

When taking a statement from a witness or suspect, what kinds of factors about them, the crime, or the larger social context should we take into account? One example to get you started: the interviewee’s age – children and the very elderly should be treated especially carefully.

Suspect many would say ideally classless, but their suspected role in the crime, the level of evidence, age, race, gender, religion, class, education level, the recency of the crime?

Forensic Linguistics (Claire Hardaker)

Narrow View – forensics = court room views, etc.

Broad View = anything from criminal/civil trial or part of the investigative procedure (they may not have been expected to be forensic data, but they become some).

What is the meaning of this text? (what is the purpose of it?)

Who authored a given text? (actually written by x)

Language of legal texts/processes (e.g. was consent truly informed). Huge area, restricted only by the questions you ask…

Physical Evidence

Analyse ink/paper, etc. to see if appropriate to era, etc.

Historical Evidence

Knowing what the author of a particular document knew; most people specialise in only one author, given the depth of knowledge needed

Cipher-based decryption

Author has deliberately encoded their name into texts (particular to Bible studies/Shakespeare studies). Not particularly serious method of analysis

Manual/Qualitative analysis

Conversation/discourse analysis, syntax, stylistic choices, etc. Look in depth at the language being used. Drawback = cherry picking – in a court can support offense/defence.

Automated/quantitative analysis

Computational linguistics, computational stylometry, and today’s focus… Multi-variate approaches…

Combining forensic linguistics and corpus linguistics

What are the benefits or drawbacks? Combining approaches – don’t just celebrate the strengths, but also understand the pitfalls (especially if it’s evidence for a court case).

Looking at ‘disputed authorship’…

Corpus data = large datasets that have often been cleaned for consistency of spelling, etc.

Forensic data = often small, e.g. a text, so difficult to analyse. Often quite messy.

May allow to set e.g. a text against a larger dialect set.

Looking at ‘style’, are looking for things that are ‘unconscious’ and therefore unchanged from general style – e.g. a forged suicide note. Can be hard to identify unconscious material.

Corpus – easy to search large datasets, whereas forensic information is difficult to encode – e.g. a thread, sarcasm, etc.

Adopting a corpus means that have made assumptions – e.g. that you are going to have something to count/that count will be meaningful. On e.g. Twitter how account for variations of e.g. ‘and’ = +, &, n, etc… Looking at texts, if always seems to write xx at end of texts, but doesn’t on this text –therefore not theirs = needs more context.
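To show why counting ‘and’ on Twitter is less simple than it sounds, here is a minimal sketch (my own) that folds the variants mentioned above into one form before counting. The variant list only covers the three examples in these notes, the tokenisation is plain whitespace splitting, and the tweet is invented.

```python
# Variants of 'and' mentioned in the notes; real data would need a longer, checked list.
AND_VARIANTS = {"&", "+", "n"}


def normalise_and(text):
    """Replace stand-alone variant tokens with 'and' (whitespace tokenisation)."""
    return " ".join("and" if tok.lower() in AND_VARIANTS else tok for tok in text.split())


def count_and(text):
    """Count 'and' after normalising its variants."""
    return normalise_and(text).lower().split().count("and")


tweet = "fish n chips & mushy peas + tea and cake"
print(normalise_and(tweet))
print(count_and(tweet))
```

The catch is that a blanket rule like this will also rewrite a ‘+’ in a sum or an ‘n’ used as an initial, which is exactly the kind of assumption that needs checking before any count is treated as meaningful.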

If no restriction on author, not going to be able to identify this. Corpus forensics works better at narrowing between a & b, rather than across the sector. Words have range of meanings, can end up with redundant data.

fine

These 2 can still work together

  • Shared goal = objective, quantitative facts of yes/no
  • Is it common? What does it typically mean? Is it significant?
  • Corpus = how likely is it that this occurred by pure chance alone.
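One common way corpus linguists put a number on “how likely is it that this occurred by pure chance” is the log-likelihood statistic, comparing a word’s frequency in two corpora. The sketch below is my own and I’m assuming this is the sort of test meant here; the course may have used a different measure. All the counts are hypothetical.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood (G2) for one word's frequency in corpus A vs corpus B."""
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected counts if the word were spread evenly across both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


# Hypothetical: 50 hits in a 1,000-word disputed text vs 10 hits in a 1,000-word comparison text.
score = log_likelihood(50, 1000, 10, 1000)
print(round(score, 2), "significant at p < 0.05" if score > 3.84 else "not significant")
```

Values above 3.84 correspond to p < 0.05 (one degree of freedom), so the bigger the score, the less plausible it is that the difference is down to chance alone.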

The Case of Derek Bentley: the crime

Diagnosed with mental age of 10, reading age of 8, 66 on IQ test (unusual). Who armed him (with knife/knuckle dusters), what he said, ‘diminished responsibility’ was not recognized.

http://en.wikipedia.org/wiki/Derek_Bentley_case

The case of Derek Bentley: the evidence

Saying ‘the gun’ = was assumed (shared knowledge) that there was a gun that Bentley knew about. Police had to write down longhand, but couldn’t ask substantive questions (ask for repeat, but not, what time was that?). Bentley ‘witness statement’ was presented to court = a faithful witness of what he’d said. Throughout the trial Bentley said he didn’t write it himself, but 3 police officers said he did. The statement clearly demonstrates that a conversation has been turned into a witness statement = crucial to his conviction.

The Case of Derek Bentley: the analysis and conclusions

Note the use of ‘then’ (temporal = sequences of events). Typically monologic statements don’t display this, so suggests that there was intervention.

 

 

unusual

Can’t use this alone, but is another indicator, also see the pre/post-positioning and which is un/usual constructions (I then/then I). 1,000 times more often in Bentley’s statement than in entire COBUILD corpus.

Along with other features, if it becomes clear that Bentley hadn’t written the statement, and he was convicted largely on ‘the gun’, then how reliable is that evidence? Not fully pardoned until 1998. Can’t give Bentley back his life, but can challenge a miscarriage of justice.

PART 6: Other cases and datasets

http://www.mirror.co.uk/news/world-news/california-shooting-roommates-virgin-killer-3610816

Look at language he used – doesn’t help prevent a crime, but does help understand triggers, etc. and may provide notifications for other crimes.

wordcloud

Offers a set of forensic data, including Old Bailey, Unabomber, OJ Simpson, Harold Shipman, David Irving vs Penguin, Enron, Anders Breivik, Paul Ceglia vs Mark Zuckerberg, Conrad Murray, etc.

Be aware of version control, ethical nature of the material, whether edits have been made, etc. Ensure rigorous nature of the work that you do, as others’ prison sentences could depend upon it.

#CORPUSMOOC – Corpus Linguistics Week 3

Find an article in which the word ‘refugee’ is mentioned – make notes about how refugees, migrants, asylum seekers, etc are talked about. Chose: http://www.ibtimes.co.uk/lebanon-stems-influx-refugees-minister-claims-countrys-internal-security-risk-1470743

  • Referred to in terms of numbers (large numbers)
  • Range of words indicating a ‘problem’ to be solved, stemmed, halted, stop them infiltrating, as a danger, etc.
  • Refugees = a destabilising influence
  • Humanitarian refugees (criteria unknown) only allowed.
  • ¼ people in Lebanon = refugees, highest number in the world = straining infrastructure and driving down wages.
  • Need ££ to deal with “influx”.

Oh, maybe it was supposed to be a British newspaper – ah well, pretty familiar!

Video 1: Refugees and Asylum Seekers in the UK Press

Methodologically – need large amounts of data, frequency data, hunt for co-occurrences, annotation/grouping, quantification and statistical significance.

Merits – helps us get ‘the big picture’, identify the ‘aboutness’/areas of interest that can be interrogated – can work qualitatively/quantitatively and check on ‘gut instinct’

Core terms – keywords, cluster, collocation, semantic prosody, discourse prosody.

Video 2: Building the Corpus and Initial Analysis

In UK universities there is access to many newspapers, but you need to define the keywords [x OR x OR x AND NOT x]
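The [x OR x OR x AND NOT x] pattern can be sketched as a simple filter over article texts. This is a minimal illustration, not the query the researchers actually ran: the function name and the example terms are hypothetical, and it uses plain substring matching, which a real newspaper database would handle with its own query syntax.

```python
def matches_query(text, any_of, none_of=()):
    """Return True if text has at least one 'any_of' term and no 'none_of' term.

    Matching is case-insensitive substring matching, so 'refugee' also
    matches 'refugees'. That is a simplification chosen for this sketch.
    """
    lowered = text.lower()
    has_wanted = any(term.lower() in lowered for term in any_of)
    has_unwanted = any(term.lower() in lowered for term in none_of)
    return has_wanted and not has_unwanted
```

For example, `matches_query(article, ["refugee", "asylum seeker", "migrant"], ["bird"])` would keep articles about people and drop ones about migrant birds, assuming that exclusion term suits the data.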

How to derive a query? Collected a quick corpus of texts from a pilot study, then compared to ‘general English’ to define the ‘aboutness’, then used keywords/intuitions/concordancing to include/exclude from collection. Data was split into ‘tabloids’ and ‘broadsheets’ (interesting distinction). More data in the broadsheets, but articles in broadsheets = longer (so they are not ‘more obsessed’ about them).

Finding ‘topoi’ = finding key ‘theme’ in the data. How do ‘collocates’ (associated words) help construct that theme?

Statistical significance important. Red = tabloids; blue = broadsheets.

entry

TOPOI:

  • Generally about entry (mode, place, legality) – discourse largely established by the TABLOIDS
  • Number, Abuse, Numbers, Finance (cost/abuse), threat – also tabloids (except large numbers)
  • Residence, legality, issues with system, unwelcome (authentic and legitimacy only mentioned by broadsheets).
  • PLIGHT – much larger in the broadsheets (so more sympathetic?)

VIDEO 3: Tabloids, Broadsheets and Key Clusters

High probability for collocates. Red = tabloids, blue = broadsheets; black = equal.

plight

Related to numbers/quantity – different ways of doing it, but both speak in quantity metaphors, and also in the idea of ‘plight’ (based on number of collocates).

To look at the word ‘illegal’ – manually checked it, then right-sorted to see what followed the word ‘illegal’. Identifying origin, ethnicity, religion, age, type of work, etc.
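Right-sorting a concordance can be sketched in a few lines. This is an illustrative toy, not the tool used on the course: the function names and the context width of 3 are assumptions, and tokenisation is just whitespace splitting.

```python
def concordance(tokens, node, width=3):
    """Return (left context, node, right context) tuples for each hit of node."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((left, tok, right))
    return lines

def right_sorted(lines):
    """Sort concordance lines alphabetically by the words that follow the node."""
    return sorted(lines, key=lambda line: [w.lower() for w in line[2]])
```

Sorting on the right-hand context puts identical followers next to each other, which is what makes repeated clusters after ‘illegal’ easy to spot by eye.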

right-align

Number of clusters – some are more ‘emblematic’ of tabloids…

illegal

Equivalence is being ‘forced’ – terrorism, crime, fraud, etc. all being brought together in the discourse, rather than representing ‘reality’.

How many occurrences per million ‘normalised’ amongst words? Expect to see more in the tabloids than the broadsheets.
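Normalising to occurrences per million words is simple arithmetic, sketched here so that corpora of different sizes can be compared. The function name is made up for illustration, and any figures passed to it would be your own counts.

```python
def per_million(raw_count, corpus_size):
    """Normalise a raw count to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * 1_000_000 / corpus_size
```

With purely hypothetical numbers, 50 hits in a 2-million-word corpus is 25 per million, while the same 50 hits in a 500,000-word corpus is 100 per million, so the raw counts alone would be misleading.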

illegal-residence

VIDEO 4: IN FOCUS. The expression ‘pose as’.

Who uses the term ‘Pose As’ in relation to RASIM? Tabloids use it 8 times more than broadsheets…

Beggars, crooks, etc. are identified as ‘posing as RASIM’ = taken ‘as fact’, and therefore positive stance towards ‘tougher measures’ – this is particularly in the tabloids. It’s there in the broadsheets too, but the opposite view is presented (if with fewer words).

Identifying problems in the asylum system by police/reporters ‘posing as’ RASIM.

The tabloids focus particularly upon asylum seekers ‘posing’ as nurses, etc…

Criminals may pose as RASIM to harm RASIM – also in tabloids, but very low numbers…

VIDEO 5: Summing Up

Focus upon words ‘suffocated’ and ‘drowned’ – focus upon whether they were represented as ‘illegal’ – directly (illegal immigrants) or indirectly (sneaking)?

suffocated-drowned

Dictionary may have a range of different meanings, but the press gives a range of terms that ‘mean similar’ … used in a particular way continuously.

Remember that there are distinctions within newspapers, rather than labelling ‘tabloids’. Question how helpful your distinctions are.

Move between largescale analysis, and closer/more-detailed readings of the text.

#CORPUSMOOC : Week 2 Notes from @drbexl

The second week of the MOOC ‘Corpus Linguistics‘ via Lancaster University:

Exercise:

I want you to think of two words – ‘diamond’ and ‘cause’.

Without consulting anybody else, or looking at any reference resources, write two short definitions for these words. Take no more than two minutes to complete this task.

A diamond is a compressed mineral whose rarity ensures that it has high value. It has gained meaning in recent centuries as a valuable gift, especially to signify love, and is commonly used in engagement rings. As an anniversary it signifies a long marriage.

The word cause may refer to ‘a cause’ that one supports, including charitable causes, or ‘to cause’, as in cause something to happen.

Recap and Introduction to Collocation

  • How can we manipulate and exploit that frequency data in order to gain insights?
  • Collocation is one way to do this – systematic co-occurrence of words in use, and may influence each other’s meanings, e.g. back/front, telephone/operator = the result of hunches.
  • Hunches can be right, but not always, as things may be more/less important than we think they are.
  • See, e.g. diamond, and we’ll be reminded of a range of meanings.


  • How close do these have to be to collocate? +/- 5 words seems to work, with a minimum of 10 occurrences, and be aware of sentence boundaries.
  • Know these words ‘by the words that they keep’, and can be before/after.
  • Frequency can’t be the only measurement – seek mutual information value, and identify if words rarely occur with other words.
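The window, minimum-count and mutual information ideas above can be sketched together. This is a rough illustration under stated assumptions: it uses the basic formula MI = log2(observed / expected), with expected = f(node) × f(collocate) / corpus size and no correction for window size, and it ignores sentence boundaries, which the notes warn about. The function name is hypothetical.

```python
import math
from collections import Counter

def collocates(tokens, node, window=5, min_count=10):
    """Score words co-occurring with `node` within +/- `window` tokens.

    Returns {collocate: (co-occurrence count, mutual information)} for
    collocates seen at least `min_count` times near the node.
    Sentence boundaries are ignored in this sketch.
    """
    tokens = [t.lower() for t in tokens]
    freq = Counter(tokens)
    total = len(tokens)
    near = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            start = max(0, i - window)
            near.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    scores = {}
    for word, observed in near.items():
        if observed >= min_count:
            # Expected co-occurrences if the two words were independent.
            expected = freq[node] * freq[word] / total
            scores[word] = (observed, math.log2(observed / expected))
    return scores
```

Raw frequency tends to favour very common words such as ‘the’, whereas MI tends to favour rare pairings, which is why the minimum-count cut-off is usually applied alongside it.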

Collocation, colligation and related features

  • What about grammatical words? Words do have strong affinities for certain prepositions or occasional articles. Colligation – affinity with a grammatical class (rather than meaning).
  • “For now, word form refers to any word that you may find in a corpus. So ‘fighting’ and ‘fought’ are both word forms. On the other hand, a lemma is what we might call the base form of a word – so the lemma ‘fight’ gives rise to multiple word forms, including ‘fighting’ and ‘fought’.”
  • Semantic preferences – e.g. diamond (often part of a class of gems), but ‘a glass of’ includes drinkable liquids.
  • Discourse Prosody – expresses speaker attitude = important for ‘discourse analysis’.
    • ‘Cause’ often associated with trouble, pain, suffering – subconsciously the word has negative discourse prosody.
    • “The way that words in a corpus can collocate with a related set of words or phrases, often revealing (hidden) attitudes.”

Keywords

  • Are there words that appear more frequently in Corpus A than in Corpus B? Can use statistical significance tests.
  • What words are ‘unusually frequent’ in this particular dataset? [I’m thinking here if we did research into words used by those of different religions on Twitter – what words would appear ‘unusually frequently’ in each religion?]
  • Analysts often cut off the top 50-100 keywords to create manageable data, and there must be 20+ keywords, and those distributed across the range of texts (and not bunched in one text/paragraph)
  • Typical keywords: Proper nouns (names), Style/genre markers (grammatical words), spelling idiosyncrasies (British/American English) – for discourse analysis = “the aboutness” of the text – the gist of the text.
    • Once identifying salient words – identifies interesting factors and explain ‘meaning’ and why those words are there.
    • Discover words (especially once run through computing power) that our conscious cognitive abilities would not identify as salient.
    • Can the experiment be replicated – follow the same process, and it should come out the same ‘objectively’.
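The notes don’t say which significance test the course used, so as an assumption this sketch uses log-likelihood, one test commonly used for keyword comparison. It compares a word’s count in Corpus A and Corpus B against what would be expected if the word were equally frequent in both. The function name is made up for illustration.

```python
import math

def log_likelihood(count_a, size_a, count_b, size_b):
    """Log-likelihood keyness score for one word across two corpora.

    count_a/count_b are the word's raw counts; size_a/size_b are the
    corpus sizes in words. Higher scores mean a bigger difference.
    """
    combined = count_a + count_b
    expected_a = size_a * combined / (size_a + size_b)
    expected_b = size_b * combined / (size_a + size_b)
    score = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score
```

A score above 3.84 corresponds to p < 0.05 on the chi-squared distribution with one degree of freedom. The score says nothing about direction, so the per-million frequencies still need checking to see which corpus the word is ‘key’ in.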

Change over time and lock words

  • Which words have become steadily less/more frequent – or stayed the same (locked in place) – and what this tells us about cultural values.
  • The Brown Corpus – what were the key shifts happening in language 1931-2006 (4 sample points). E.g. Mrs down, health up and money largely ‘locked’.
  • What have declined?
    • A more informal society as less use of Mr, Miss, etc.
    • A modal verb – less comfortable with ‘imposing’ on people, so this is declining also.
    • Longer forms are contracting – as people seek to squeeze as much as possible into as short a space as possible [e.g. Twitter!]
  • What are lock-words?
    • Weaker modality
    • Wh – question words
    • Body parts
    • Other nouns, including money (we’re still obsessed)
  • Increased use
    • Contracted forms, such as it’s
    • Numbers as 34, rather than thirty-four
    • Social terms
  • Why has the word ‘children’ increased over time?
    • 1990s – fear of danger to children, promoting/supporting children and families… children are being problematized… [That fits with Raising Children in a Digital Age]
    • 2006 corpus – lots of moral panics…


  • Dominant discourse arising in Britain relating to children.
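The idea of words rising, falling or staying ‘locked’ across sample points can be sketched crudely. This is an assumption-heavy toy, not the method used in the course: it only compares the first and last sample points, and the 10% tolerance is an arbitrary choice made for this illustration.

```python
def classify_trend(freqs_per_million, tolerance=0.1):
    """Label a word 'rising', 'falling' or 'locked' from first to last sample.

    freqs_per_million is a list of normalised frequencies, one per sample
    point, in date order. tolerance is the relative change allowed before
    a word stops counting as 'locked' (an arbitrary value in this sketch).
    """
    first, last = freqs_per_million[0], freqs_per_million[-1]
    if first == 0:
        return "rising" if last > 0 else "locked"
    change = (last - first) / first
    if change > tolerance:
        return "rising"
    if change < -tolerance:
        return "falling"
    return "locked"
```

Looking only at the endpoints would miss a word that dips and recovers, so the full series of sample points is still worth inspecting.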

Multi-Methods

  • Corpora give us insights into the mechanics of language, and of the society within which that language is being used.
  • They can answer some questions really well, but others not so much – be mindful!
  • Corpora should be linked with other methods for study of language, society, history, etc… which expand the range of studies/findings?
  • Mesh qualitative/quantitative data…
  • Toolbox – use the right tool, in the right combination…

Should McCann Twitter abuser have been doorstepped on TV? for @ConversationUK

mccann

A recent piece, published for The Conversation UK, under Creative Commons licence (republished on Durham University):

Should McCann Twitter abuser have been doorstepped on TV?

By Bex Lewis, Durham University

Brenda Leyland, a 63-year old woman from Leicestershire who had been accused of publishing a stream of internet abuse about the family of missing child Madeleine McCann, has been found dead in a hotel room.

Her death raises important questions about the wrongs and rights of how we handle people who express unpalatable views online.

Leyland had been exposed in a Sky News report as the person behind the Twitter account @sweepyface, which had been used to post offensive messages about the McCanns. These included the accusation that Madeleine’s parents were responsible for her disappearance. When confronted by a Sky News reporter about whether she should have posted such messages, Leyland said: “I’m entitled to do that.”

Days before Leyland’s death, BBC Radio 4 ran a story about how the police were investigating abusive social media messages sent to, or published about, the McCanns. Madeleine’s father Gerry McCann featured, suggesting that these messages are fuelled by press reporting. He added that he thinks more people should be charged for internet abuse and revealed that his family tends to avoid the internet because of the nature of threats and insults they receive.

For obvious reasons, the McCanns had encouraged a high-profile press campaign after Madeleine’s disappearance. But without answers about what happened to Madeleine, conspiracy theories have abounded. Brenda Leyland was one of many to discuss the McCann case online. As Rev Pam Smith, one of my Facebook connections said, are we really saying that people are not “entitled” to share adverse views online?

Leyland said she “hoped she hadn’t broken any laws”, but the Malicious Communications Act 1988, which covers Twitter, notes that it is an offence to send messages to another person which are “indecent or grossly offensive”, threatening or false. If the message is intended to cause distress or anxiety to the recipient, they breach the law.

We have to consider whether Sky has a case to answer in this particular situation too though. The broadcaster’s correspondent approached Mrs Leyland on her own doorstep in a live broadcast. She evidently had no idea that she was going to be confronted or that the footage would be broadcast to the world.

Whether or not we like what Leyland had been doing, she was clearly just one of several people who had been expressing their opinions online. She was certainly not the worst. Is doorstepping people, outing them on TV, and ensuring that their face circulates the internet, really the answer? Had Sky done any research into this woman before they put her face in the public domain? Did they know anything about her mental state? Did she just have the misfortune to be the first person who could be made an example of?

Her case carried echoes of the recent media treatment of Cliff Richard. The BBC was heavily rapped for broadcasting live from his home as police raided it. The police of course need to investigate such stories but it is a worrying sign of our culture that trial by media and even trial by gossip appear to have become acceptable.

Media ethics are typically concerned with truthfulness, accuracy, objectivity, impartiality, fairness, public accountability and limitation of harm. After the Leveson inquiry, there has been increased emphasis on press responsibility. But in a time of rapid media change and fast-moving news, broadcasters must ensure they too meet their ethical responsibilities.

Bex Lewis does not work for, consult to, own shares in or receive funding from any company or organisation that would benefit from this article, and has no relevant affiliations.

This article was originally published on The Conversation.
Read the original article.

#CORPUSMOOC : Week 1 Notes from @drbexl

Here are my notes from week 1 of Lancaster University’s MOOC ‘Corpus Linguistics’ (haven’t got time to do the practical exercise, but this is twigging some thinking re my PhD thesis database!):


What is a corpus?

  • A collection of words?
  • It’s a methodology but not a theory of language.

Why might I use corpus linguistics?

  • Look at language ‘as it is’
  • Large amounts of data which are difficult to ID with intuition/anecdotes
  • Large amounts of data show us things we’re doing that we don’t even realise.
  • ID rare/exceptional cases not identifiable in other ways
  • Human beings are slower/less accurate than computers for purposes of this kind of research.

What is your research question/hypothesis?

  • Is the corpus ‘off the shelf’ useful to your question?
  • If you’re developing a corpus – how will you need to define it?
  • 30,000 → billions of words.
  • Needs to be representative of the corpus – e.g. http://www.natcorp.ox.ac.uk
  • Must be machine-readable (not just a photo of the text) – so that the computer can identify the words
  • It may act as a standard reference for what is typical in language.
  • May be annotated with extra linguistic codes (e.g. grammar)

What is annotation and markup?

Computers do not have the cultural knowledge that we have, so we have to mark-up the text so it can read the nuances, etc.

  • Delimit particular sections as e.g. a ‘heading’, a ‘sentence’, etc. allowing computer to analyse just those areas, etc.
  • Understand how this is done, as the computer can automatically do this, etc. then allows sophisticated searches through the data.

Types of Corpora

Come in different flavours, so different things can be assessed – e.g. date, time, genre, etc. Specific = outline the areas, but are also general ‘corpora’ – especially language (note difference between spoken/written).

  • Think about the shape of spoken language – especially the differences between e.g. the different people you talk to.
  • Parallel, new language, historic material, on-going corpus…

Frequency data, concordances and collocation

A search, how often does it appear, but also how frequently per million words, and what kind of documents/context does it appear within.

  • Think you see a pattern emerging, can ‘sort’ so can start to see patterns emerging [on the basis of which attain themes to identify].
  • Needs a cycle of extraction of data, and analysis, and close reading of relevant parts of the text.
  • Collocation – co-occurrence – from which meaning (and possibly grammar) appears – words are not randomly put together – words ‘shade one another’s meanings’ and ‘co-construct meaning’ – seek patterns in language.

Corpora and Language Teaching

This is less relevant to me, but interesting that need to identify the right words that help people understand which words are used frequently, so which to come first within a textbook – could be helpful within digital literacy training.

What can’t we do with corpus?

  • Just because it doesn’t exist in the corpus doesn’t mean it can’t be used – may be rare.
  • As with any scientific method, we are making deductions, not facts.
  • No visual information (pictures or body language) – traditionally people have set aside the visual and focused on the written language, but tools are being developed. * See database methodology – visual material collated in the 1990s for PhD research.

Speaking Up Required in Higher Education…


A really interesting piece about the place of ‘introverts’ in higher education where action learning means not speaking up is no longer an option:

However, university students are no longer allowed to be shy. “Active learning” has become a modern mantra. Students must ask questions, express opinions, lead oral presentations and participate enthusiastically in community projects. To collaborate is sacrosanct. Passivity, on the other hand, is considered the enemy of learning. They must be vocal, expressive and assertive. The extrovert ideal, as Cain calls it, is all the rage.

Read full story, and see my PGCLTHE work re: group work… and this letter response.

Summer Fruitfulness in Academia?

Great piece on the joys of academic pressure:

It’s not true, of course, that September is entirely grim. Being back at work means colliding with cheery colleagues in the corridor, each of us ruefully clutching course-packs and exchanging sympathetic smiles; it also means spotting suntanned students with newly stylish “second-year haircuts” sauntering around all the familiar spaces. I’ve missed them all. But there is a certain pang that strikes you like the memory of a lost youth – or, more precisely, the memory of an article you were meant to finish, a book proposal you intended to start, a research proposal you thought you would concoct and never did as July sank into August and then, whoops, suddenly September started impudently banging on your door.

Read full article.

Value in a PhD?

An interesting piece on the ‘value’ and employability of a PhD:

Who would do a PhD? Who would willingly submit to spending endless hours, over three or four years, in the laboratory or library, racked by self-doubt and money worries, in preparation for a career for which vacancies were never more oversubscribed? …

But do doctoral students really feel prepared for life beyond the ivory tower? And how ready are they to embrace it? Here, we speak to five current and former doctoral students from a range of disciplines and universities about why they did their PhDs, what their experience was like and where they see their futures now.

Read full piece, with a couple of more positive responses 28/8, 11/9.

 

Group Work: Not Pulling Your Weight?

I’m really interested in group dynamics (see mini project undertaken as part of PGCLTHE), so an interesting case study here:

“Academically ambitious students contribute to moving the average up – unenthusiastic students and shirkers do just the opposite,” he added.

Indeed, some hard-working students may even defect to “the dark side” of the slackers if they saw their classmates getting away with minimal effort, he continued. To combat this problem, students were offered the chance to gain a higher mark for their group assignment if they managed to raise the grade scored by the weakest student in individual tests on the same subject.

Read full piece.

Work less, do more, live better (@timeshighered)

I am SO EXCITED to be reading this piece in Times Higher Education – over the last year I’ve been seeking to work in a healthier pattern (although ironically this week has been a 6 day-week & I need to do some more over the weekend so that I can take a week off… to write a book proposal … carefully planned this is though!)

Great Intro:

Some years ago, I heard that a colleague characterised me as “someone who didn’t work weekends”. This description was not meant as a compliment. It’s true that I make a concerted effort to keep something approximating normal working hours of 9 to 5, Monday to Friday. But I haven’t always worked like this. As a postgrad, I anxiously counted my hours and consulted with fellow students, worried that I wasn’t spending enough time at my desk. Eventually, I allowed myself one full day off weekly. When I became a lecturer, I stayed in the office until seven or eight in the evening, in part imitating the working patterns of my new colleagues, and continued to work weekends. Yet when I reduced my hours at the desk some years ago, my productivity did not decline. Instead, my mindfulness to follow regular hours means that my productivity is the same as or even greater than it was before, when I worked 50, 60 or whatever hours it was per week.

Further down, there’s a series of historical figures, and their living styles (most were writers)

The common feature in these workday schedules is walking, bipedalism, that form of locomotion that distinguishes us from the other primates. Walking and thinking seem to go together so naturally that perhaps it’s walking that made us thinkers. Aristotle famously taught while walking along the colonnade connecting the temple of Apollo and the shrine of the Muses. That link between philosophy and walking has stuck and was memorably parodied in Monty Python’s sketch about the Philosophers’ Football Match. Rebecca Solnit, author of Wanderlust: A History of Walking (2000), concurs that walking is good for thinking: she concludes “a desk is no place to think on a large scale”.

Exercise and sleep are highlighted as of key importance to being creative (and I’ve certainly been working on my sleep), and then How to Write a Lot: A Practical Guide to Productive Academic Writing (2007) is quoted:

His suggestions are simple: write and do your research daily in small blocks of time (schedule it in and don’t cheat on that schedule); keep track of what you do in that time; stay attentive to your writing goals and, ideally, get yourself a group that will help you keep to these goals. You might protest, what good are small blocks of time? But small, regular amounts of work build up to significant productivity. A few pages often make a big difference. If you were learning how to tap dance or play the French horn, you wouldn’t set aside one full day a week for practice or cram it into your Saturday afternoons; instead you’d practise for short periods, daily. Why should research and writing be any different?

As academics, we are used to research, so we should research our own habits (oh yes – and being ‘completely detached’ from the good and the bad of a job – is key) –

It is in our best interest to not only be productive but satisfied with our work, because work is vital to our identity and self-definition. We need work not just to put bread on the table but to feel of use, to serve, to contribute, to make and to connect. But the long-hours culture and the cult of busyness saps meaning away, as we tick through never-ending “to do” lists, becoming chronically tired and working less efficiently with each overtime hour.

There’s mention that even in the factory shorter hours have demonstrated increased productivity .. and I remember this from my research in the Second World War – it was SO essential to get arms out, that an extra day was added to the ‘week-cycle’ … productivity went down! And, even in times of crisis, this is key:

Before, Red Cross workers put in as many hours as necessary until the job was finished. Now the Red Cross recognises that workers need breaks in order to be able to respond effectively to the humanitarian crises they face.

Yes, yes, yes… (read full article – and also the opinion piece Know Your Worth).

How Not To Treat a Guest Speaker (Via @TimesHigherEd)


This is an enjoyable hop, at which I am wincing in recognition at 10 ways that it is possible to badly treat a guest speaker, who has usually spent quite some time (and probably years of study) in preparation of a talk for your delectation… but also some thoughts for academic speakers as to ways around some issues:

I later realised that my title of professor had probably intimidated them. Most bird-keepers are working men: miners, steelworkers, bricklayers and decorators, with little experience of the academy. Once they knew I had kept birds myself and cared about the same things that they cared about, there was no problem. Their close contact with birds gave them extraordinary insight and several of them went on to help me with my research on this topic.

Read full story.

A student’s lecture to professors (via @timeshighered)

This looks like it could be worth a read – a student tells his lecturers what they should do with their lectures in order to ‘allow learning’:

The question “Why am I here?” often strikes in the 73rd minute of a droning lecture. Don’t misunderstand – I love lectures. But only if the person delivering it knows how to allow learning. And yes, I do mean “allow”, for academics don’t create learning – only the student can do that. Unfortunately, most if not all lecturers are crippled by misunderstandings about their students and ill-founded assumptions about education itself. If we can filter the mud from the Pierian Spring, then they will have far less frustration in their lives and students will stop wishing that they were somewhere else. So one afternoon, after a particularly frustrating day with my professors, I sat down and wrote my lecture to them. I pray that they are taking notes.

Read full story in Times Higher Education.

The value of PhD supervision…


Interesting piece on PhD supervision:

When a PhD supervision session constitutes just another blocked-out hour in a besieged diary, it can be all too easy to forget that it could make an impression that stays with the student for the rest of their research career.

We asked five academics for their recollections of the PhD supervision they received, and the way it had informed their own approach to tutoring. Three had enjoyed excellent supervision that had deeply influenced their own practice. But two had not. One recalls exchanges with their tutor characterised by yawns and silences, while another was treated with a “cutting harshness”, valuable only as an exemplar of how not to conduct yourself.

Read full story, and accompanying editorial.