CODEC BOOK CLUB: How We Think by Katherine Hayles

Reblogging from the CODEC blog, posted earlier today:

This week the CODEC team focused upon the third chapter of Katherine Hayles’ How We Think: Digital Media and Contemporary Technogenesis (University of Chicago Press, 2012). The third chapter focuses upon ‘How We Read: Close, Hyper, Machine’, and certainly gave us lots to chew on.

Initial comments were that we liked what was written, but found the emphasis on all the negative reports about digital frustrating. An oft-heard argument is that our reading is worse ‘because of digital things’, and some members of the team felt that there were broader cultural factors at work rather than solely technological factors. There was agreement that the forms of technology may be changing the manner of reading, as we referred to the ‘F-Shaped Pattern for Reading Web Content’, noting that the further a user scrolls, the more eye attention tends to drop off. On p.66, Hayles noted that “Canny web designers use this information to craft web pages, and reading such web pages further intensifies this mode of reading” – so in a self-reinforcing manner, as this form of reading becomes common, more people write for it, so it becomes more common.

There was a short debate re p55, that people “are doing more screen reading of digital materials than ever before”, referring to other kinds of screen reading such as OHP, reel-to-reel cinema, and microfiche, but these were largely seen as recent forerunners of ‘screen’. Debating the etymology of the word ‘screen’, we wondered when/how it stopped being a word for something that stopped a user being burnt by fire, and became a word for something projected upon. Was it the safety screen at the theatre? We noted that many words for technology come from the analogue world, and referred to an earlier conversation that day about CODEC preparing to run the worship for Cranmer Hall later this month, in which we agreed that we a) didn’t want it to be a service of gimmicks (the technology serving the theology and not vice versa) and b) did not want the experience to be too far from the established format for the service (otherwise risking being seen as irrelevant). We wondered if screen was a printing technology term, as it certainly still uses a lot of the structure of print.

On p78, Hayles referred to Wordle as a form of machine reading. We debated whether Wordles are machine reading, or whether Wordle is a visual method that provides visual data, offering insights into e.g. frequency rather than tone. The @bigbible Tumblr, which takes each chapter of the Bible and represents it visually, has received feedback that it gives new insights into reading the Bible, and other feedback that it is not the Bible, and shouldn’t be interpreted as such. We then questioned how often we use these kinds of tools in our everyday theological reflection, rather than as ‘something special’, and if they are ‘tools of our time’, whether we should be using them more to be relevant for our culture. If we read faster, is it necessarily distracted reading? Is this a different way of reading? This is part of what the chapter seeks to address. Are the programmes that exist signposts of the digital age? Why did someone create a programme that allows text to be grouped like Wordle? Was it for fun, or was culture changing and this was something produced as a response? Are they beneficial, and do they help us?
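To make the ‘frequency rather than tone’ point concrete, here is a minimal Python sketch of the kind of counting that sits behind a word cloud. It is not Wordle’s actual code: the function name `word_frequencies`, the tiny stop-word list and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only; real tools use longer lists.
STOP_WORDS = {"the", "and", "of", "a", "to", "in", "was", "is", "it"}

def word_frequencies(text, top_n=5):
    """Count how often each non-stop-word appears, most frequent first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

sample = "In the beginning was the Word, and the Word was with God, and the Word was God."
print(word_frequencies(sample, top_n=2))
```

A word cloud would draw the most frequent words largest. Nothing in the counting knows anything about tone, irony or context, which is the limitation we were debating.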

With Pete having recently been to an event on DarkNet, we turned to looking at recent events in which GCHQ and NSA have been identified as downloading emails. They indicate that they are not reading them, but just storing them. The machine ‘reads’ those emails, and algorithms will have been set up in which keywords identified with particular terrorist activities will send up a red flag. A level of human reading is then required to contextualise that email to identify if it is a threat or not. Surveillance and machine reading aren’t an either/or, they are different and complementary. Machine reading gets through vast amounts of data, close reading gives deeper insights into human intricacies.
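As a rough sketch of that two-stage idea (a machine flags, a human then reads closely), the following Python shows simple keyword flagging. It makes no claim about how any agency actually works: the watch list, the function name `flag_for_review` and the sample messages are hypothetical and purely illustrative.

```python
import re

# Hypothetical watch list, for illustration only.
WATCH_TERMS = {"detonator", "explosive"}

def flag_for_review(messages, watch_terms=WATCH_TERMS):
    """Return (message index, matched terms) for each message containing a watch term."""
    flagged = []
    for index, message in enumerate(messages):
        words = set(re.findall(r"[a-z]+", message.lower()))
        hits = sorted(words & watch_terms)
        if hits:
            flagged.append((index, hits))
    return flagged

inbox = [
    "Lunch at noon?",
    "The explosive finale of that film was brilliant",
    "Bring the detonator and the explosive",
]
for index, hits in flag_for_review(inbox):
    print(index, hits)
```

The film review is flagged alongside the genuinely worrying message, which is exactly why a level of human reading is still needed to put each flag in context.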

Josh referred to the classic ‘How to Read a Book’ as we questioned whether hyper-reading is something particularly digital, or whether it is in fact a very familiar form of reading to academics, particularly those who need to get through a large volume of data. Bex noted that her PhD, focused on 20th century history, required hyper-reading of vast quantities of data, whereas earlier periods of history have to work with sparse sets of data. She noted that when she started her PhD, the Public Record Office (now The National Archives) used paper-based indexing systems, whereas part-way through it converted to digital indexing. We concluded this hadn’t specifically changed the way she read, but allowed easier access to (even more) material.

As we returned to notions of what machine reading is, we referred to the fact that CPUs don’t currently match human brain-power, but there is an expectation that in 15-20 years they will. As we listened to some computer-reading, we questioned what the loss of intonation changes. We got involved in a debate as to in what sense a machine understands something. Can it close read and “understand”, or does it have to work within the limits of the fact that it is programmed by humans? What is speech recognition software set up to recognise? Does Siri work by understanding meaning? Are we born tabula rasa, and how do we learn language – can a computer do the same? We are typically limited to thinking about the machines on our desks, but we need to think about bigger systems such as Watson: what do they understand? Do they just understand what they are told to understand, or can the AI take over and self-learn? What about the film Robot & Frank, in which a human-computer relationship developed – until the computer was rebooted? Has AI already become scary? The computer is still asking one question, but is learning more efficient ways of gathering that data. How do SatNavs use data to produce a coherent narrative? AIs typically question if something is good/bad, but are now starting to question ethical decisions or say “I don’t know”: they are moving beyond cognitive binary decisions.

Earlier that morning, a story had circulated that “teens who use screens more sleep less”, which Bex questioned – drawing on her book, and the fact that she grew up screen free but stayed up late reading books – asking whether all the variables had been considered, and whether it was the screen, or the staying up late, that was the problem (acknowledging that the issue of melatonin changing body clocks has been well researched) – see this opposing view article last year. Is there a difference between staying up to read a print book, in which case you will be tired, or if reading a screen, will the body-clock have been fooled into thinking that it is daytime, and be impacted in other ways? Have we changed our behaviours in many ways because technology is available, but also to make technology more workable? The world has become technologized so that we can get more out of the workers, and we think we can cope with this, but we have seen that songbirds are being negatively affected by a 24/7 lifestyle. Some bloggers have referred to the invention of the lightbulb (rather than the screen) as disrupting our sleep patterns, but we can see further back that activities continued with candles/firelight.

For all the reflection there is on technology, is part of the role of digital theology to try and get a birdseye view of the situation? There’s no sign of it slowing down, so what is the harm that we may do if we are not aware of the affordances and constraints of digital technology (in a similar way to the way that smoking was publicised as healthy)? Do we need to look more deeply, think about what we have been ‘forced’ to do, where we have choices, and how much information we need to make decisions (and where the information comes from to inform those decisions)? Are our brains being rewired, and is that a problem if so? Where are the positives – for example, with mobile devices, typically people are reading significantly more as they are not tied to a desktop machine.

With the book now three years since publication, does this well-written text already feel a little old fashioned? If we look at efferent/aesthetic readings, are there more modern ways of approaching this? Are questions in the digital realm moving so fast that we need to be focusing on on-going technical reports, rather than books?

Dr Bex Lewis, Research Fellow in Social Media and Online Learning

Professor Katherine Hayles is IAS Fellow at St Mary’s College, Durham University (January – March 2015)

“Good Works” = “Academic Citizenship”


The realities of academic life – more than research and teaching … and incredibly difficult with short-term funding/contracts

For Mary Evans, centennial professor at LSE’s Gender Institute, the rewards of fully engaging in these diverse areas of academic life have been personal and political. “For many women of my generation it was very important to construct networks within the academy, hence motivation for ‘citizenship’ was very much about establishing a ‘voice’,” she says, adding that “building friendships through work as a ‘citizen’ is a huge help in limiting that sense of isolation that is part and parcel of being an academic”.

Susan Bassnett, professor of comparative literature at the University of Warwick, outlines her “citizenship” workload in the first half of 2015: “I have three PhDs to examine, only one in the UK; an appointing committee overseas; three meetings of projects on whose advisory board I serve; two plenary lectures abroad and a week-long workshop in Italy; plus reviewing for journals, funding bodies, references, etc.”

Read full article.

CODEC BOOK CLUB: Defining Digital Humanities

Reblogging from CODEC blog, posted earlier today:

Back in October the CODEC team discussed Melissa Terras’ inaugural professorial lecture on digital humanities, whilst this week we focused on the associated book Defining Digital Humanities: A Reader (Ashgate, 2013) edited by Terras, Nyhan, Vanhoutte. We focused upon the introduction (pp1-7), and a series of definitions of ‘digital humanities’, covering the years 2009 to 2012 (pp279-297).

Discussions started with a questioning of which of the definitions most resonated with members of the CODEC team, seeking to clarify that what CODEC is doing is actually “digital humanities”. Bearing in mind that the text indicates that “we make no attempt to imply that one view is more correct than another, nor do we believe this to be the case” (p279), it is unsurprising that there was a range of views. Digital humanities allows us to pursue questions humanists have always pursued, but faster and on a larger scale; it allows us to focus on digital culture, including cyberculture and posthumanism; and other projects allow us to create new online materials for future use.

Stephen Ramsay, from the University of Nebraska-Lincoln, indicates that digital humanists both build and theorise the built, demonstrating optimistic futurism, and scepticism of the posthuman condition. Kim Lacey from Wayne State University focused on increased cross-pollination between and across academic fields, with new interactive tools allowing us to rediscover ideas in a new way, whilst ‘Mia’ noted that we need to think critically about the impact of digitality on scholarly practice.

We questioned what it is about the subject that makes it hard to define, although funding has determined that the making of tools has been prioritised, so it’s a tactical convenience to follow those kinds of projects. Are we truly burning down academic walls, or simply jumping on the latest fashionable bandwagon (as the number of centres defined as ‘digital humanities’ has grown hugely in recent years) – whether that be digital humanities or interdisciplinarity? The book seems to focus more on what technology can bring to the humanities, rather than the other way round. Is the computer simply ‘slave labour’, producing data that humans then analyse? What about the computer’s ability to interact – how does that change things?

Bearing in mind that ‘digital humanities’ emerged from the field ‘Humanities Computing’, the definitions given by a range of academics appear to follow an interesting line between 2009 and 2012. In 2009, the majority of definitions were focused upon ‘the computer’; by 2010, the focus had moved to ‘functionalism’; by 2011, to experimentation, collaboration and interdisciplinarity; whereas by 2012 the conversation appeared to have moved onto changing practice, and questioning the impact of digital, with more resistance to the idea that ‘digital humanities’ was any more than a phase in the humanities. In many ways this follows the same route as “e-learning” conversations, which view the “e” as simply a transitional term, as expertise is on the learning and teaching.

We discussed whether ‘digital’ has rebooted the humanities, and whether there are stop/start points, or whether it is part of a continuum (which doesn’t take away the history of what has already happened). We agreed that we are definitely building upon what has been done before, and that the range of expertise in the room (theological, biblical, IT, history, media) offered co-laboratory opportunities and new ways of doing. Rather than rebooting, the digital provides a new ‘operating system’ on top of what has already happened. The majority of those engaging in this field have been trained in the humanities and then start tinkering with computers and code. Within CODEC we’re certainly looking for collaborations for the back-end technology, as we focus more on the front-end questions.

If we throw computer science and theology together, or put theology into the digital spaces, what happens? How do we bring the expertise that we already have into the digital age/spaces and change the conversations? How do we define ‘digital theology’? Theology involves talking about God, in historical, contemporary, experiential and sociological ways (to name but a few). Does all of this have to be interdisciplinary by nature? Not necessarily. It’s also important to remember that those things that appeared cutting edge a few years ago (such as online journals), are now part of everyday practice.

Interdisciplinarity offers opportunities to break out of disciplinary silos, potentially seeking to make research more meaningful to society. We examined how far development in the digital humanities is driven by the disciplines or funding organisations. Technological tools offer opportunities to make money, so is it all driven by politics or economics, rather than academic drivers? How far has the ‘impact agenda’ changed the questions we ask, and the research outcomes we seek? Within academic systems such as the REF, interdisciplinarity, although officially encouraged, can make it hard to place those who truly work across disciplines.

We have to ask whether we are actually asking something new with the research that we’re doing. Why do others see value in the work that we do? Opportunities may be driven (or constrained) by funding, but the desire has to be there amongst those doing the work, and amongst those we are working with to produce the most interesting results. When we look at ‘digital discipleship’, what is digital about it? Is it just the technology we’re using? What are the bigger questions about digital culture that change the way we do things/think? With regard to the Bible, Erasmus led to the production of the Bible/texts, leading to the production of scholarly resources which influence modernist Bible techniques and scholarly Bible interpretation. The changes are pedagogical rather than sociological.

Are we in an age of post-postmodernism, is “the” digital age a new umbrella term? We have seen this across “the ages” of history, including the development of writing in 5th Century Athens, the printing press in the reformation/renaissance, or was it pre-dated by post-modernism in the 1960s/70s? Are we part of the first movement towards a ‘greater age’, or is this, as Jameson would say, the last blast of capitalism? We drew on Todorov (the reader is in charge of interpreting the text), Derrida and Richard Worthy in questioning whether the ‘consumer’ can read a text how they choose to, rather than it having a fixed meaning.

If there is a ‘digital age’, can we define it? Is it possible to create a simple list? What are the characteristics of e.g. ‘digital theology’? What are the differences between a/the digital age? Is it a new “age”? Does it have to be global? Is it different from ‘the information age’? Has it/does it have the potential to be distinct? Both volume and accessibility have increased with computers. Was transportation a building block of where we are now at, leading to the industrial age, cities, societies in which skills could be agglomerated? It’s a modernist notion to place time in ‘ages’.

Digital Humanities: Part of its way of thinking is to interrogate what it is. Most self-define, but do we need to agree? Funding places those constraints, and we can expect challenges to our definitions. Are there better things to do than argue over this definition? It was agreed that it’s important to have this conversation, as it’s happening, although it can be hard to define such things whilst riding the wave itself, anticipating self-reflection later on. It’s also important to think about who you are in conversation with, seeking points of connection with the disciplines, seeking to challenge/work-with the assumptions pre-coded into each discipline (e.g. sociology focuses on texts, materiality and communities). CODEC places itself firmly within the boundaries of digital humanities, and continues to work to develop ‘digital theology’, drawing on the range of expertise in the team.

Dr Bex Lewis, Research Fellow in Social Media and Online Learning

WTC Theology: Week 5: The Holy Spirit (@WTCTheology)



“The term pneumatology comes from two Greek words, namely, pneuma meaning “wind,” “breath,” or “spirit” (used of the Holy Spirit) and logos meaning “word,” “matter,” or “thing.” As it is used in Christian systematic theology, “pneumatology” refers to the study of the biblical doctrine of the Holy Spirit. Generally this includes such topics as the personality of the Spirit, the deity of the Spirit, and the work of the Spirit throughout Scripture.”

Pinnock, Clark. Flame of Love: A Theology of the Holy Spirit.

Understanding the anointing of the Spirit is more central than much theological study makes it. The Spirit was visible in all of Jesus’ life and mission – he was anointed, and a man of the Spirit.

Logos Christology has ‘towered’ over other interpretations: the divine word became flesh.

Spirit Christology – views Christ as an aspect of the Spirit’s mission, rather than the Spirit as part of Christ’s.

“The Almighty has inserted himself into history and humanity in Jesus – as weak, powerless and dependent on the Spirit – in order to become what we were meant to be, the communion of God and humanity. By the Spirit he has also become through resurrection the first fruits of the new humanity.” (p81)

Jesus’ coming opened a door for humanity to enter God’s presence – transformed and glorified – unity with God, which is the destiny of creation.

The Spirit is not subordinate to the Son, the two are partners in the redemptive process.

Luke, in describing the birth of Jesus, describes the Spirit of God hovering over Mary – reminders of the Spirit brooding over the waters of creation (Lk 1:35). The Spirit has always been present/working in the world. Jesus offered the same grace that has always been there, but is being explicitly offered – unambiguously.

Creator Spirit? Brought forth intelligent creations for communication and fellowship with God – capable of appreciating more dimensions of the sound. The Spirit is working to orient people towards the ‘mystery of divine love’.

(p84) As shown in the story of the prodigal son. “Love is not forced on the beloved, who is allowed freedom to make his own choices, even if it means siding with the darkness.” In having ‘allowed’ evil, God made salvation available, creating hope, reissuing an invitation to glory. We can see evidence of the Spirit in the Old Testament – rescuing Israel from danger and distress again and again. Jesus in coming to earth made himself as dependent on the Spirit as everyone that he was living with [and us].

Jesus rarely spoke about the Spirit, but demonstrated its reality (rather than developing a doctrine). The conception of Jesus was an act of new creation. There is a fresh start for humanity, restored into communion with God.

Baptism of the Spirit in water … baptism … Noah and Jesus’ anointing – see the spiritual and the physical flowing together. Jesus’ time of temptation – the Spirit led him down the path of suffering, as with us, not “led” to avoid it. Jesus was only able to resist temptation because of his dependency on the Holy Spirit (was not play-acting, or living as a God = fully human) – modelling a lifestyle of faith and trust for us all. Defining Kenosis: “He emptied himself, taking the form of a bond-servant, and being made in the likeness of men.”

The Spirit enabled Jesus to live within the limits of human nature – to be ‘truly human’, rather than drawing on divine attributes. The Spirit was clearly with Jesus in his Ministry – providing healing – “a God who wills human wholeness”, setting people free from entrapment, bringing hope, and liberating relationships. Because the relationship between the Spirit and Jesus is so critical, those who dismiss/discredit it (or see it as Satanic) are judged harshly. Miracles, etc. were not performed to impress, but for salvation of the body and soul – demonstrating the entrance of the Kingdom into the present. Jesus’ activities of liberation were powerful, but a threat to the status quo – leading to the Cross, where Jesus had to rely on the Spirit to get him through, and the Spirit enabled the Resurrection.

Spirit/Logos Christology are complementary, not antithetical. Logos = the Person, Spirit = his work. Neither is subordinate to the other, but reciprocal.

Why did the outpouring of the Spirit/Pentecost wait until Jesus’ death? “End time salvation could not come into play until this mission was completed and this representation had taken place.” Salvation = becoming one person with Christ, dying with him to sin and sharing the promise of his resurrection. Jesus’ death was an act of atonement that includes us (not excludes) rather than replaces us. “Christ became what we are in order that we might become what he is.” It can be hard to grasp that Christ came to stand alongside us, as we tend towards individualistic thinking. Others act on our behalf all the time – e.g. govt leaders.

Soteriology is the study of the doctrine of salvation. Soteriology discusses how Christ’s death secures the salvation of those who believe. It helps us to understand the doctrines of redemption, justification, sanctification, propitiation, and the substitutionary atonement.

Theologically – “once we grasp the fact that we are saved by Christ’s life, we may be open to fresh thinking about how we are saved by his death.” (p99). Too much theology focuses on the idea that we are saved by Christ’s DEATH on the Cross, whereas it’s his resurrection that saves us… his power OVER death. Both sin and death are problems for humankind, because death entered the world with sin. The RESURRECTION is not just proof of God’s divinity, but cause for salvation and transformation as we share life with him.

“We were created in the image of God, with a view to growing into the likeness of God”. Lost with Adam, Christ restores this likeness, and is our representative on the journey.

(p101) “Humans are open to the future. They make plans and strive to realize goals. But we are mortal, and ultimately the future lies in God’s hands.”

Representation/solidarity – God entered deeply into the human situation to overcome all our alienation. Spirit Christology – centralises resurrection – conceptualise the cross as recapitulation, and give the Spirit back the world of atonement.

Popular view sees Father as Judge and Son as victim, rather than understanding that this was a united action… the enemy defeated by an act of defenceless love. Jesus did not give himself as an appeasement, but surrendered himself to God on our behalf. C.S. Lewis spoke of Christ as the carrier of good infection – we need to get close enough to catch the virus of new life. Delicate topic = “divine wrath” – but Christ was both victim and victor on the Cross. Strange theologically as grace then appears conditional upon penal satisfaction… but it was the Father who took the initiative in reconciling the world. Jesus was not the solitary victim but representative of the whole of Adam’s race.

Vindictive anger … no … God’s saving action = serving grace. The Cross and resurrection is a trinitarian event. Doesn’t discount, but wants to raise the question of a judge that loves us and desires our friendship – the two can work together.

Fee, Gordon D. Paul, the Spirit, and the People of God. (Chapter 2 ‘God Revisits his People’.)

[Interesting in a digital age] – presence (what people miss, and what cannot be replaced by any form of media (phone calls, photos, etc.)) – for shared life, loved ones need to be present. God made us in his own image because he is a personal, relational being – we have lost our vision of God, and therefore our relationship with God. For Paul – the coming of Christ/the Spirit changed this forever. The Spirit represents both continuity and discontinuity between the old and new covenants. The Bible focuses a lot on Presence – from Genesis to Revelation – The Israelites saw themselves as a people of Presence, among whom God had chosen to dwell on earth. Old Testament = Tabernacle/Temple = where God is seen as most Present. The Fall of Jerusalem – lost the presence of God in their midst. For Paul it was important that this was seen as the Holy Spirit.

If people reject Paul’s call to holy living, they are essentially rejecting the Holy Spirit in the New Testament – as this is where the New Covenant is… the Holy Spirit will indwell individuals – who are the new temple(s)? The gathered/corporate church is God’s temple in the community = incredibly important.

Hellenistic dualism – understanding amongst the Corinthians about the distinctions between physical, material reality and the immaterial, invisible realm – the human spirit not affected by what is happening with the body. God, however, created us in his image – in body as well as spirit. We have been purchased by God, and therefore our minds/bodies are not our own to do with as we please. The Spirit-filled life is not just contemplation, but the ethical life that the Spirit produces. The Spirit has removed the veil – between us and God, so that we can face the glory of God in full. We are in the image of God in the ‘now but not yet’.

The Spirit is not an impersonal force or influence or power – it is the fulfilment of the promise that God would once again be present with his people. Do we therefore need to downplay the impersonal images of wind & fire, etc. and think instead of the Spirit as the personal presence of the eternal God?

Session 5.1: The Spirit in the Gospels

Jesus doesn’t have a God-switch to turn on e.g. miracles – he does his works through the Holy Spirit as he is fully man. Jesus wants us to know that it’s to our advantage that he goes away… if we wanted to meet him face-to-face we may be able to meet with him once in our lifetime, queuing with billions of others – whereas the Holy Spirit means that He lives in each of us individually.

Session 5.2: The Holy Spirit Given

We will go on to be God’s representatives within the world – looking at various readings from Acts to show how the Holy Spirit of God can be received by all.

Session 5.3: The Spirit’s Love Gifts

Some of the gifts include actual people? (Ephesians) These people are sent to help equip us and train us to represent God. Ministries (1 Cor), Spirit’s graces (Romans). The Holy Spirit’s character – love above all (1 Cor) and fruits of the Spirit (Galatians). We may give to the poor, share information/knowledge, etc., but if it’s not done through love, it’s pointless. Fruits of the Spirit are more than ‘emotions’. The character of Christ imparted to us as we grow – so we can deal with those we don’t like, want to hit, etc. but still deal gently and lovingly with them.

Session 5.4: The Spirit in Luke and Paul

We are given the Holy Spirit at conversion as a gift of the New Covenant (baptismal), although Pentecostals would note a ‘second blessing’ for those who were already believers. The Holy Spirit as character, but also ‘being filled with’ (is it given, came upon them, received, etc.).

Session 5.5: The Spirit and Christian Identity

The Nicene Creed – we believe in the Holy Spirit (the triune God). In some churches known as a troublemaker, in heaven he’s known as God. When I’m praying, am I praying as me, or is the Spirit praying – once we can’t tell, that’s a good place to be. Is our flesh in communion with God’s spirit, or in communion with the world? We have each been anointed, and have the teacher within us… relationship not mediated through the Bible, priests, etc (which may help), but within you. 1 John 2 – identifying discernment. There are Christians or false prophets – as all Christians have the prophetic Christian spirit within them.


Completing #CORPUSMOOC

During interactions with JISC and ALT in particular, MOOCs have been hot news for quite some time. MOOC is an acronym for ‘Massive Open Online Course’ – freely available to all. They don’t have the best reputation for completion rates, which has opened up a number of discussions at JISC/ALT events as to whether completion, and particularly full completion, of a MOOC is the point of these things. In 2012, JISC ran a session ‘What is a MOOC?’ – one of the early slides here:


Picking a Course

Last year, I decided to get my head around these, wondering whether CODEC could usefully develop a MOOC (as the financial imperative is not clear, except as a marketing exercise, for many of these courses). I cheerily signed up to about 3 courses, and… didn’t get started on a single one of them, as other work priorities took over. As we were developing potential funding bids earlier this year related to ‘religious identity online’, Pete suggested that I undertake the ‘Corpus Linguistics’ module that he’d had a go at last year. I had no idea what that was – so as all good academic researchers do, popped across to Wikipedia first for a definition:

Corpus linguistics is the study of language as expressed in samples (corpora) of “real world” text. This method represents a digestive approach to deriving a set of abstract rules by which a natural language is governed or else relates to another language. Originally done by hand, corpora are now largely derived by an automated process.

so I could see how this would be useful for analysing words collected from Twitter, Facebook, etc. to analyse large social and cultural questions.
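As a rough illustration of what that might look like in practice, here is a minimal Python sketch of a concordance (keyword-in-context) listing, one common corpus technique. It is not taken from the course or its software: the function name `concordance`, the tokenising rule and the sample posts are all assumptions made for illustration.

```python
import re

def concordance(text, keyword, width=3):
    """List each occurrence of keyword with up to `width` words either side."""
    # Keep # and @ so hashtags and handles survive as single tokens.
    tokens = re.findall(r"[\w'#@]+", text.lower())
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

# Hypothetical sample standing in for collected social media posts.
posts = "I pray every morning. We pray together online, and they pray for us."
for left, word, right in concordance(posts, "pray", width=2):
    print(f"{left:>12} | {word} | {right}")
```

Lining up every use of a word with its neighbours makes patterns of usage visible across a volume of text that nobody could close read in full.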

Corpus Linguistics Online

The course, hosted on Futurelearn, and presented by Tony McEnery, Lancaster University, was designed as a practical course for humanities and social science researchers, with the following stated aims:

  • Demonstrate that corpus approaches to social science can offer valuable insight into social reality by investigating the use and manipulation of language in society.
  • Equip social scientists with skills necessary for collecting and analysing large digital collections of text (corpora).
  • Provide educational support for those who want to use the corpus method.
  • Demonstrate the use of corpus linguistics in the humanities, especially History.
  • Give a sense of the incredibly wide uses that corpora have been put to.
  • Allow those with an interest in language, who have not heard of the corpus approach before, a new way of looking at language.

Combined with all the recent changes in CODEC, the first week’s material seemed a little overwhelming, so early on I made a conscious decision to focus on the theoretical explanations each week (likely to take around 60-90 minutes per week, rather than 3-4 hours). That way I could grasp the method, and the kinds of questions it allows us to ask, rather than the practical aspects of the software (also provided). Having just finished the course, I am now checking out some of the optional videos, especially one from Claire Hardaker on trolling, as recently I’ve been asked to contribute to debates about trolling and bullying (and the place of restorative justice in these debates):


The Process

First things first was making time in the diary for this. I originally expected it to take around 3 hours a week, and although this did drop, I wanted the process to be not just about the course content, but about thinking how a MOOC works, and what it contributes to our learning. Much of this may have been absorbed at gut level rather than laid out here, so this is more of a ‘quick and dirty’ response!

  • Sign up for the course on Futurelearn (there are other providers, including independently hosted MOOCs, e.g. this ‘open theology’ module I’m undertaking with WTC)
  • Make time, and put reminders in my todoist! Get clear on what I actually wanted to get out of this, so that would focus time/energy on those areas.

It was really easy to sign up for Futurelearn, and everything comes in via email, so it was simple to search and find the course I wanted (and you’ll see from the screenshot how easy it is to leave it too):


Week by Week

Each week the available material would appear. Clearly, it is technologically possible for all the material to appear at the same time, but there’s a need to encourage people to work on the material together, with a start date, etc. encouraging use of the well-structured (and well-used) ‘commenting’ space, which Tony himself contributes to frequently (and is clearly gaining insights into his own research), and with a number of mentors who have been assigned and are highly active (I’ve had replies to several, but without having asked for permission, thought I should just share my own comment!):



So, the material appeared each week, looking like this (I’m assuming if I’d completed all the practical activities, those lines underneath would have got longer!), with the most basic, introductory material (usually in the form of videos from Tony) at the top – which was the stuff I was really interested in.





Being able to see how much more to go is always a good incentive – below the fold there was much extra material – more videos, readings, practical software help, etc. but I usually finished at the point of the quiz (which isn’t assessed, but helps you “know” that you have “learnt” some material that week, and where one might want to go back and re-assess):





Users are encouraged to keep a journal throughout the project, which I did through notes kept in ‘Word’ and then transferred across to this blog, and shared using the hashtag #CorpusMOOC.

What have I gained?

Well, I may have more to say about this in the longer-term, but for now

  1. I’ve started a MOOC
  2. I’ve finished a MOOC
  3. I’ve done the bits of the MOOC I wanted (if you know me, you know I’m a bit of a completer/over-achiever, and initially thought I can’t just do the bits I want!) and no more.
  4. I think I’ve got a good sense of what Corpus Linguistics is capable of, and could see that I could use it in my research, although I would have to spend more time learning the practicalities/partner with A.N.Other.

I thought the material was well-presented, manageable (once I stuck to the first bits), the intentional interaction was good as well as the usability of the software, and I can see how more can be done in this subject area.


Thank you Tony and team!


#CORPUSMOOC: Week 8: A Swearing Extravaganza

This week looking at ‘swearing’ as it is used within language .. so there’s a disclaimer, some of the comments:

The use of ‘bad language’ seems to me to be very cultural specific. For example, young people seems to use it more often than old people. And I see variation of what’s considered as ‘bad language’ between registers and dialects. For example, the same person would never use bad language at work but he probably uses it when he is with friends; and what’s considered bad in some areas would not have this consideration in others.

Of course, you have to define what is meant by ‘bad language’; obscenity is very culturally specific (Northern Europeans: body parts, coition and excrement; Southern Europeans: religion, mothers, aspersions re sexuality – the Victorians found the phrase ‘what a cunning hat’ rather racy). The point is well put, though.

Oh dear, the warm up activity is to listen out for the use of bad language in conversation around us … probably more than you’d expect even in my own context! Interesting conversations online about whether language teachers should teach this, as students will come across it (don’t we all remember how funny it was once we learnt ‘merde’ in French classes!)

amazing what you can get used to after a while and how much these words lose their strength through over use.

Part 1: Looking at Bad Language

Why say ‘bad language’ and not ‘swearing’? Definitions of what is ‘swearing’ = complex!


Words developed for the Lancaster Corpus of abusive words – including animal, intelligence, sexuality focused insults. Then had to develop an annotation system for the material – including class, gender, age, etc. Can provide some quite useful distinctions that can be researched. Metalinguistic word – am not using the word, but I’m talking about it/describing it, or quoting someone else saying something.

Who knew there were so many different ways to use ‘fuck’ – fascinating…


Final category = a ‘dustbin category’ for those that didn’t fit any of these categories, and didn’t really need further work.

Commentator suggests that video helps give further insights into the use of swearing in language – jocular, and ‘fillers’ have been mentioned by other commentators.

Another kind of ‘MOOC’ – such dictionaries allow us to see language develop.

Part 2: Swearing and Gender

We can use such corpora to see how such language is actually used – but we’ll likely approach such questions with a number of assumptions – e.g. that men swear more than women. In early 1990s, there was no statistical difference in usage, but in looking at the individual words themselves, these are different… words used by men tended to be stronger. 


There are levels of ‘strength’ seen, but there are possibilities that these might be used differently … e.g. ‘religious people’ more offended by God/Jesus than the general population [Note to second year housemates, yes…]

Commentators mentioning encouraging people to rethink phrases that have become everyday

  • Someone being ‘a bit gay’
  • Someone having ‘a blonde moment’
  • Someone ‘running like a girl’

Is there ‘surgical cleaning’ where such words become sanitised? Corpus tools, of course, are good at identifying the change in language of words e.g. ‘gay’.

Different people will probably see some of the words as more offensive than others… e.g. people say ‘God’ without thinking – probably more offensive to ‘religious people’ than many realize.

Part 3: Swearing and Interaction

How do the genders interact when it comes to the use of ‘bad language’ words? Is there a difference between or across? Intra-gender use of swearing is the norm (e.g. men direct swearing at other men more than at females and vice-versa), but men do this much more than women (have they been cultured to swear less in front of women?).

What kind of words are targeted? E.g. ‘cow’ exclusively at women…


Wow… so much complex!

Part 4: Strength of Swearing

Different categories of words (e.g. general annoyance) = much milder words, but ‘destinational category’ (reached end of tether = “go away”) = much stronger!

Discussions mentioning new British National Corpus coming this year, where it will be interesting to see how words are used/re-used and reclaimed – e.g. African-Americans claiming ‘n****r’, gay people claiming ‘queer’ and women claiming ‘bitch’ as positive interaction words. Also lots of discussion as to regional/cultural differences and how the right corpus might help explore those.

Part 5: Swearing and Age

Assumption is that younger people tend to swear more, and data seems to bear that out:


Is it down to age? It’s not necessarily their age that is the issue. The cultural environment may have meant that swearing was less accepted, so don’t swear less as get older! Are they possibly using ‘swear words’ that are so mild that they’ve not been measured as swear words (e.g. golly, blimey), although this doesn’t exist, either. What about the strength of swear words/categories? Mirrors the distribution from the graph above. Frequency/strength distribution are similar.

Commentator notes: ‘When angry, count to four; when very angry, swear.’ (Mark Twain). Also questioning whether the extra drop-off is down to being in the presence of children/grandchildren, when people seek to rein themselves in.

Part 6: Swearing and Class

How do we draw out the nuances here? Do lower classes select stronger words, and higher class = weaker ones?

AB: 1.81, C1 1.76, C2 2.16, DE 2.47 (General pattern, but AB = stronger than C1)…

What about the type of bad language use? AB/C1 and C2/DE = inverted.

Lots of discussion about whether upper classes = rules don’t apply, and middle classes more cautious…

Part 7: Combining Factors

What happens when try and combine the data – e.g. male AB aged 25-34 = use most? BNC was balanced to get roughly similar amounts of data on single data. May be no examples combining particular factors… that particular group = 2,259 words uttered in the spoken BNC.

How many types of speakers are in the BNC? Not many, but we can combine particular types of data to give insights.

Part 8: Combining Factors – 2 Case Studies

Age/Gender combination:


Class/Gender combination:


Class/Gender/Age combination:


Do you want to argue – are women pre-disposed to use less swearwords? Surely socially constructed, it’s an artifact of the society within which these 2 genders are operating, nothing to do with genetics. Debate? Where did the distinctions come from? What were the social processes that constructed this?

Commentator: People are willing to say things in other languages they’re not prepared to say in their own –

Final Words from Tony

The start of a journey into language .. with an overview of the kind of things you should have learned, and in a position to build your own corpora [though I didn’t use the practical elements!]… and don’t think that this course has given you everything…

We often want to study language in their social contexts, rather than in isolation. Contemporary social issues or historical issues typically the most interesting.

Overwork = Less Creativity? Milking the academic cow dry?

A fascinating piece on the culture of overwork within academia finishes:

We need coalitions of the sane to lead discussions about what can reasonably be expected of academics, to recruit and promote accordingly and to mentor younger academics into a way of thinking that says: “Enough is enough. If you want to do extra, we won’t reward you for it.”

You might assume that institutions run by coalitions of the sane would automatically fall behind those run by further achievers. But think again. Universities vitally depend on academics’ ability to productively use their intellect, curiosity and creativity. In business-speak, ensuring a sane working environment therefore safeguards their supply of academic human resources.

A dairy farmer might streamline his delivery routes or negotiate discounts on milk bottles. He won’t run the health of his cows into the ground by demanding that they produce ever greater yields. But that, in essence, is what universities are currently doing to their academics. Fingers crossed that voices like Schell’s will wake them up to how counterproductive that is – preferably before the cows come home.

Read full story.

WTC Mooc: Week 3: Living the Christian Story – Exodus (@WTCTheology)


Core Reading

Exodus 1:1-15 (or chapters 1-15, but there’s only 10!) through to the story of the Plagues … but overall a story of oppression leading to recreation?


Israel is becoming ‘many’ as ordained at creation… but there are several obstacles to be overcome first. Seen alone, misses the ‘new humanity’ destined to restore creation blessing to the world. With Moses, see how it harks back to e.g. Noah, and forward to Israelite experiences. Moses says ‘I am’, which has led to a lot of discussions as to whether he was divine – or God’s representative. Either way it’s important to understand that God was present.

Moving from a time of chronos to kairos (exile is almost over):

Such is the case with chronos and kairos. Both are Greek words which mean time, but they imply different things.

Chronos refers to minutes and seconds. It refers to time as a measurable resource.

Kairos is the word used for time in Ephesians 5:16 (which I examined in more detail here). Kairos means an appointed time, an opportune moment, or a due season.

Constant battle between Israel/Egypt, pre-ordination of what will happen when Jesus comes, and defining the Israelite nation – the importance of the ‘firstborn’ sons, the gradual admittance of the ‘superior power of Yahweh’. The parting of the waves of the ‘Reed Sea’ answers the question of who is in control. The goal of the journey to re-establish the Abrahamic covenant, having passed through the water, now to pass through the human gauntlet, to re-establish ‘Edenic sanctuary’, where God can dwell again with his people. The importance of Mount Sinai to the Torah… a new covenant with God – where obedience leads to blessing/fullness of life, disobedience to curse and death – set apart as a ‘holy nation’, imaging God to the nations. Moses as the mediator between God/the people as God is too powerful, a covenant marked by blood/a sacrifice. An important part of this covenant is to work and rest. The development of the Tabernacle = similarities with Eden, but even before the covenant is made, the Israelites have broken it with false idols. The freedom emphasised by the divine is ‘grace and mercy’. In the 10 commandments, the judgements are given first, but the ‘rear view’ of God emphasises mercy and forgiveness. The golden calf doesn’t represent God, but human beings in converse with him.


Pharaoh felt threatened by immigrants (such as Jacob’s family) – so became ruthless in the tasks they imposed upon them – but despite this – the numbers increased. Considers how this is in parallel with the Latin American experiences (of men currently in jail) – especially when they are given ‘education’ but not opportunities to take jobs, so go back to what they know – a life of crime – in order to live. The ‘baby boys’ were killed before they could become a problem. The Egyptians needed the immigrants because they did the work that the Egyptians didn’t want to do, but the general thinking was for deportation or down-trodden-ness. The midwives were expected to kill the boys, but refused, so were blessed by God. By worldly standards, Pharaoh was at the top, but God was on the side of the oppressed/weak (not the oppressors) and they survived.

Often when people read the Bible, they read it through oppressive interpretations, which can be subverted by careful reading of the text itself. Guided readings can question assumptions and invite unexpected identifications. When Moses impulsively kills a harsh taskmaster, he has to flee – originally a ‘saviour’ he is now absent from the scene, and failing to intervene – how many see God also… but a deeper reading can see a bigger picture coming to fruition.

When working with Latino prisoners, the author – a Caucasian pastor – representative of the prison system, and of God … many in the prison see God as hyper-sovereign – distant judge who has pre-ordained everything, so lives cannot be re-mapped… all negative aspects of their lives are ‘God’s will’. Their theology assumes that God is just/good and therefore that they must be bad/deserving of all the calamities that have befallen them. No redemption is expected. .. and people attend the sessions for reasons such as social interaction, especially once they find that ‘accepting Christ’ does not instantly solve all their problems. They may think that attending will give a lighter sentence. Belief is, however – people are hungry for an authentic encounter whatever the original reason.

The facilitator has a careful role, which subverts those barriers, replacing the old, paralysing theology. .. with a need to distance himself from ‘taskmasters’, and make the prisoners realise that he’s on their side. God’s will happens through covert disobedience, non-compliance, etc… God listened to the groans of slaves, but they remained slaves … author discusses his experience of working with those seeking to find liberation from e.g. heroin addiction – not in terms of the ‘heroic victor’, but weakness/ignorance on how to heal. In Mexico – often become violent – taking frustration out on someone – as Moses did – direct experience of poverty/oppression = violent. No long-term respect however… required for that = respect and humility. God shows up where Moses is (wherever that wilderness is). Moses when called to go back, said that he wasn’t worthy (he was human). “There’s another really important guy in Israel’s history who didn’t feel cut out for this. Look, God used him. God can use me too.” [Imposter syndrome?]

Session 3:1 Exodus: Setting up the Conflict (Matt Lynch)

Theology and Social Action…. Exodus – reflected in the way they rebuilt their society – institutionally built into their laws, etc. Moving from being a family, to becoming a nation. There was fear-based oppression … being fruitful and multiplying = in the wrong place/time = threatening to Egyptians. Pharaoh does not know God, and does not let the people rest. God cares and hears the cry of the afflicted. God hears it and responds.

Session 3:2: Exodus – The Plagues

God has power of creation, has concern for poor and the afflicted. Conflict between Pharaoh/God. God displays his power over Pharaoh and over creation with the boils. God “you can’t act this way towards my people and get off the hook.” Why does God harden Pharaoh’s heart? The plagues are not to get the Israelites out of Egypt, but to show his power to Pharaoh… who has fraudulent power… shows future generations his power?

Session 3:3 Exodus – Concern for the Vulnerable (Bob Ekblad)

What actions lead to liberation (Exodus 1)? Do we see the presence of God in this text? Confusing – God blesses people who lie in order to protect the innocent? [Makes me think of Corrie Ten Boom – ‘they are under the table’]. What does God do? He’s on the side of the underdog? How does God liberate? Who are the main characters from most powerful to weakest? Who is God with?


  • Shiphrah and Puah (midwives) → non-compliance and deception
  • Moses’ Mother → Hides the baby rather than obey the law
  • Moses’ Sister → Spies for Moses (proactive)
  • Pharaoh’s Daughter → Seeing/having compassion

Session 3.4: Exodus: Liberation Today

Understanding contemporary law enforcement. A challenge to those who believe that deception is a sin – most characters are undertaking this in this story. What does the use of these weak characters tell us about God? How does God save now? It tends to indicate non-compliance. Pharaoh’s daughter uses her privilege to rescue. What would that look like today? Funding people to look after other people? Should we be giving work to denied asylum seekers? Educating them? What are the limits of compliance for us?

Session 3.5: Exodus: Seeing the Vulnerable

The importance of “seeing” human beings (rather than ‘the mass’). [Seeing individuals/telling people’s stories?]. Moses (and most on the margins) assume that God is on the side of law enforcement and the status quo, but Moses’ violent response doesn’t disqualify him from God’s mission to be an agent of liberation. God is looking for similar characters in contemporary culture.

#CorpusMooc: Week 7 Notes

What languages did you learn and how?

Only test I’ve ever got 100% on is a language aptitude test – apparently I’m good at identifying patterns and working it out from there … which probably have noticed “in real language interactions”

French to GCSE level, text books, but to get through the exams = extra spoken lessons, where saying the correct thing was abandoned for getting ‘the right word’

German for a couple of years – got confused between that and French, very particular words and grammar focused

Latin for 3 years – grammar grammar grammar and vocab

Italian – tried an online course – didn’t need to put it into practice

Brazilian Portuguese – Linkwords (linking words to really silly sentences), gave me something to start with for 5 months in Brazil, then have to use language to progress. Now using an iPad app to get back into things – where everything is gamified – largely vocab focused.

Learner Corpora

Contains data by those learning a particular language… Native corpora don’t refer to the problems that learners tend to encounter (as natives don’t tend to make the mistakes that learners do). Identifying errors in essays, etc. allows development of new learner corpora. There can be bigger complications than frequency, and what is the background of the original language, so what translates/makes sense, etc.? What about under/overuse of words (especially compared to native speakers)?

Interesting differences between the keywords that were used in discussions re the use of mobile phones – where are the different cultural emphases?



Interesting – Americans tend to use personal pronouns (I/individual experiences), whilst Polish tended to use (we/group) – speculates whether Polish is more ‘academic’ writing style [or is it the cultural expectations – definitely assume that Americans talk individually] … or Polish have less mobile phones so probably use in the group, and they rely on abstract nouns anyway = more generalisations. Rhetorical style – can be practical reasons, can be teaching style/vocabulary, societal differences.

More common to come across written than spoken corpus data … more difficult to capture, and also captures a larger range of words than there are, as computer doesn’t recognise spelling mistakes, etc. If analysis is just at a lexical level misses the range of uses. Too much research is not shared.

#CORPUSMOOC Week 6 (Notes)

Before you watch the lecture, create two short dictionary definitions: One is for the word ‘threadbare’ the other is for the word ‘luckily’. Do not consult a dictionary or other reference resource – just use your own intuitions. If you do not think you know either word, just make a list of words that you think may be associated with each. Then watch the lecture.

Threadbare: A condition in which clothes are worn through, nearly to rags.

Luckily: Where a situation could have gone wrong, but the outcome was positive.

VIDEO MATERIAL: History & Development of Corpus Linguistics.

Use large corpora to identify the words that are most frequently used. The most efficient form of language learning ties to the words that people use most frequently. Studies are corpus based in their philosophy.

Early – most was written rather than spoken data, and much was not on contemporary texts (e.g. 19th Century novels and the Bible). By end of 1950s from teaching words to teaching rules (grammar). Verbs = 60% of what we use, but are hard to teach, + irregular verbs. Look for the popular/typify speech words.

Listening to these videos as a piece of history, as the studies have developed over time, identifying various elements of text, speech, and how focus on the words that people actually use etc – a very small number, with a large number of common lexical bundles (less common in academic writing). Developments of dictionary – large numbers of words, especially rare words is not helpful – that’s required for [e.g. Countdown]. Writing definitions – need examples of how the word is used in context… I like what is able to do with this in making the dictionary digital.

I’m not a linguist (but wanting to interrogate tweets), so I’m multi-tasking on this material and taking fewer notes!




#CORPUSMOOC Week 5 (Notes)

When taking a statement from a witness or suspect, what kinds of factors about them, the crime, or the larger social context should we take into account? One example to get you started: the interviewee’s age – children and the very elderly should be treated especially carefully.

Suspect many would say ideally classless, but their suspected role in the crime, the level of evidence, age, race, gender, religion, class, education level, the recency of the crime?

Forensic Linguistics (Claire Hardaker)

Narrow View – forensics = court room views, etc.

Broad View = anything from criminal/civil trial or part of the investigative procedure (they may not have been expected to be forensic data, but they become some).

What is the meaning of this text? (what is the purpose of it?)

Who authored a given text? (actually written by x)

Language of legal texts/processes (e.g. was consent truly informed). Huge area, restricted only by the questions you ask…

Physical Evidence

Analyse ink/paper, etc. to see if appropriate to era, etc.

Historical Evidence

Knowing what the author of a particular document knew; usually most people specialise in only one author, given the depth of knowledge needed

Cipher-based decryption

Author has deliberately encoded their name into texts (particular to Bible studies/Shakespeare studies). Not particularly serious method of analysis

Manual/Qualitative analysis

Conversation/discourse analysis, syntax, stylistic choices, etc. Look in depth at the language being used. Drawback = cherry picking – in a court can support offense/defence.

Automated/quantitative analysis

Computational linguistics, computational stylometry, and today’s focus… Multi-variate approaches…

Combining forensic linguistics and corpus linguistics

What are the benefits or drawbacks? Combining approaches – don’t just celebrate the strengths, but also understand the pitfalls (especially if it’s evidence for a court case).

Looking at ‘disputed authorship’…

Corpus data = large datasets, that has often been cleaned for consistency of spelling, etc.

Forensic data = often small, e.g. a text, so difficult to analyse. Often quite messy.

May allow to set e.g. a text against a larger dialect set.

Looking at ‘style’, are looking for things that are ‘unconscious’ and therefore unchanged from general style – e.g. a forged suicide note. Can be hard to identify unconscious material.

Corpus – easy to search large datasets, whereas forensic information is difficult to encode – e.g. a thread, sarcasm, etc.

Adopting a corpus means that have made assumptions – e.g. that you are going to have something to count/that count will be meaningful. On e.g. Twitter how account for variations of e.g. ‘and’ = +, &, n, etc… Looking at texts, if always seems to write xx at end of texts, but doesn’t on this text –therefore not theirs = needs more context.
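The ‘and’ problem above can be sketched in a few lines: before counting, map each variant spelling onto one canonical form. The variant table (`AND_VARIANTS`) and the sample tweet are my own invention, and treating a bare ‘n’ as ‘and’ is an assumption that will misfire on some texts.

```python
# Hypothetical variant table: surface forms we choose to count as "and".
AND_VARIANTS = {"and", "&", "+", "n", "'n'"}

def normalise_tokens(text):
    """Split on whitespace and map each 'and' variant to the canonical form."""
    tokens = text.lower().split()
    return ["and" if tok in AND_VARIANTS else tok for tok in tokens]

def count_and(text):
    """Count occurrences of 'and' after normalising its variants."""
    return normalise_tokens(text).count("and")

# Invented tweet using four different ways of writing "and".
tweet = "Fish n chips & mushy peas + a cuppa and cake"
print(count_and(tweet))
```

Every entry in such a table is an analyst’s decision, which is exactly the kind of assumption that would need defending if the count ended up as evidence.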

If no restriction on author, not going to be able to identify this. Corpus forensics works better at narrowing between a & b, rather than across the sector. Words have range of meanings, can end up with redundant data.


These 2 can still work together

  • Shared goal = objective, quantitative facts of yes/no
  • Is it common? What does it typically mean? Is it significant?
  • Corpus = how likely is it that this occurred by pure chance alone.

The Case of Derek Bentley: the crime

Diagnosed with mental age of 10, reading age of 8, 66 on IQ test (unusual). Who armed him (with knife/knuckle dusters), what he said, ‘diminished responsibility’ was not recognized.

The case of Derek Bentley: the evidence

Saying ‘the gun’ = was assumed (shared knowledge) that there was a gun that Bentley knew about. Police had to write down longhand, but couldn’t ask substantive questions (ask for repeat, but not, what time was that?). Bentley ‘witness statement’ was presented to court = a faithful witness of what he’d said. Throughout the trial Bentley said he didn’t write it himself, but 3 police officers said he did. The statement clearly demonstrates that a conversation has been turned into a witness statement = crucial to his conviction.

The Case of Derek Bentley: the analysis and conclusions

Note the use of ‘then’ (temporal = sequences of events). Typically monologic statements don’t display this, so suggests that there was intervention.




Can’t use this alone, but is another indicator, also see the pre/post-positioning and which is un/usual constructions (I then/then I). 1,000 times more often in Bentley’s statement than in the entire COBUILD corpus.
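As a rough sketch of the kind of comparison involved, the snippet below counts the two word orders (‘I then’ and ‘then I’) and turns a count into a rate per 1,000 words, so that a short text can be set against a much larger one. The sample sentences are invented, not taken from any real statement, and the helper names are hypothetical.

```python
import re

def tokens(text):
    """Lowercase the text and pull out runs of letters (a crude tokeniser)."""
    return re.findall(r"[a-z']+", text.lower())

def bigram_count(words, first, second):
    """Count adjacent occurrences of `first` immediately followed by `second`.

    Note: this simple version also counts pairs that span a sentence break.
    """
    return sum(1 for a, b in zip(words, words[1:]) if (a, b) == (first, second))

def rate_per_thousand(count, total_words):
    """Express a raw count as occurrences per 1,000 words."""
    return 1000 * count / total_words

# Invented sentences, for illustration only.
sample = "I then went home. Then I made tea. I then sat down."
words = tokens(sample)
i_then = bigram_count(words, "i", "then")
then_i = bigram_count(words, "then", "i")
print(i_then, then_i, rate_per_thousand(i_then, len(words)))
```

Working with rates rather than raw counts is what makes it possible to compare one short statement with a reference corpus of millions of words.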

Along with other features, if it becomes clear that Bentley hadn’t written the statement, and he was convicted largely on ‘the gun’, then how reliable is that evidence. Not fully pardoned til 1998. Can’t give Bentley back his life, but can challenge a miscarriage of justice.

PART 6: Other cases and datasets

Look at language he used – doesn’t help prevent a crime, but does help understand triggers, etc. and may provide notifications for other crimes.


Offers a set of forensic data, including Old Bailey, Unabomber, OJ Simpson, Harold Shipman, David Irving vs Penguin, Enron, Anders Breivik, Paul Ceglia vs Mark Zuckerberg, Conrad Murray, etc.

Be aware of version control, ethical nature of the material, whether edits have been made, etc. Ensure rigorous nature of the work that you do, as others’ prison sentences could depend upon it.

#CORPUSMOOC – Corpus Linguistics Week 3

Find an article in which the word ‘refugee’ is mentioned – make notes about how refugees, migrants, asylum seekers, etc are talked about. Chose:

  • Referred to in terms of numbers (large numbers)
  • Range of words indicating a ‘problem’ to be solved, stemmed, halted, stop them infiltrating, as a danger, etc.
  • Refugees = a destabilising influence
  • Humanitarian refugees (criteria unknown) only allowed.
  • ¼ people in Lebanon = refugees, highest number in the world = straining infrastructure and driving down wages.
  • Need ££ to deal with “influx”.

Oh, maybe it was supposed to be a British newspaper – ah well, pretty familiar!

Video 1: Refugees and Asylum Seekers in the UK Press

Methodologically – need large amounts of data, frequency data, hunt for co-occurrences, annotation/grouping, quantification and statistical significance.

Merits – helps us get ‘the big picture’, identify the ‘aboutness’/areas of interest that can be interrogated – can work qualitatively/quantitatively and check on ‘gut instinct’

Core terms – keywords, cluster, collocation, semantic prosody, discourse prosody.

Video 2: Building the Corpus and Initial Analysis

In UK universities is access to many newspapers, but need to define the keywords [x OR x OR x AND NOT x]
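The query shape [x OR x OR x AND NOT x] can be sketched as a simple filter over article texts. The search terms and sample headlines below are invented purely to show the shape; they are not the terms the study used, and real newspaper databases have their own query syntax.

```python
def matches_query(text, any_of, none_of=()):
    """True if the text contains at least one `any_of` term and no `none_of` term.

    Uses plain substring matching, so "migrant" also matches "immigrant".
    """
    lowered = text.lower()
    has_wanted = any(term in lowered for term in any_of)
    has_excluded = any(term in lowered for term in none_of)
    return has_wanted and not has_excluded

# Hypothetical terms, chosen only to illustrate the query shape.
ANY_OF = ["refugee", "asylum seeker", "migrant"]
NONE_OF = ["migrant bird"]

# Invented headlines.
articles = [
    "Asylum seekers arrive at the port",
    "Migrant birds return for the spring",
    "Council tax to rise again",
]
selected = [a for a in articles if matches_query(a, ANY_OF, NONE_OF)]
print(selected)
```

The exclusion list is what keeps irrelevant senses of a keyword out of the collection, which is why the query has to be piloted and refined before the main corpus is built.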

How derive a query? Collected a quick corpus of texts from a pilot study, then compared to ‘general English’ to define the ‘aboutness’, then used keywords/intuitions/concordancing to include/exclude from collection. Data was split into ‘tabloids’ and ‘broadsheets’ (interesting distinction). More data in the broadsheets, but articles in broadsheets = longer (so they are not ‘more obsessed’ about them).
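Comparing a pilot corpus with ‘general English’ comes down to asking, word by word, whether a word is unusually frequent in one corpus relative to the other. My notes don’t record which statistic the study used, so the sketch below swaps in log-likelihood, a commonly used keyness measure, with invented counts.

```python
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness score for one word across two corpora.

    freq_* are the word's counts; size_* are the corpus sizes in tokens.
    A score of 0 means the word is equally frequent (relatively) in both.
    """
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    score = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score

# Invented counts: a word seen 50 times in one 1,000-word corpus, 10 in another.
print(log_likelihood(50, 1000, 10, 1000))
```

Words with the highest scores become candidate keywords, which is one way of putting a number on the ‘aboutness’ of a collection.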

Finding ‘topoi’ = finding key ‘theme’ in the data. How do ‘collocates’ (associated words) help construct that theme?

Statistical significance important. Red = tabloids; blue = broadsheets.



  • Generally about entry (mode, place, legality) – discourse largely established by the TABLOIDS
  • Number, Abuse, Numbers, Finance (cost/abuse), threat – also tabloids (except large numbers)
  • Residence, legality, issues with system, unwelcome (authentic and legitimacy only mentioned by broadsheets).
  • PLIGHT – much larger in the broadsheets (so more sympathetic?)

VIDEO 3: Tabloids, Broadsheets and Key Clusters

High probability for collocates. Red = tabloids, blue = broadsheets; black = equal.


Related to numbers/quantity – different ways of doing it, but both speak in quantity metaphors, and also in the idea of ‘plight’ (based on number of collocates).

To look at the word ‘illegal’ – manually checked it, then right-sorted the concordance to see what followed the word ‘illegal’. Identifying origin, ethnicity, religion, age, type of work, etc.
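Right-sorting a concordance can be sketched roughly as below. The helper names (`concordance`, `right_sorted`) and the sample sentences are invented for illustration, and this is not the tool used in the course. Note that this simple version strips punctuation, so context can run across sentence boundaries.

```python
import re

def concordance(text, node, width=3):
    """Return (left, node, right) context tuples for each hit of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((left, tok, right))
    return lines

def right_sorted(lines):
    """Sort concordance lines alphabetically by the words that follow the node."""
    return sorted(lines, key=lambda line: line[2])

# Invented sample text, for illustration only.
sample = ("Police stopped illegal workers at the port. "
          "The paper described illegal immigrants arriving. "
          "Ministers discussed illegal entry routes.")

for left, node, right in right_sorted(concordance(sample, "illegal")):
    print(f"{' '.join(left):>30} [{node}] {' '.join(right)}")
```

Sorting on the right-hand context groups together lines where the same word follows the node, which makes recurring patterns easier to spot by eye.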


Number of clusters – some are more ‘emblematic’ of tabloids…


Equivalence is being ‘forced’ – terrorism, crime, fraud, etc. all being brought together in the discourse, rather than representing ‘reality’.

How many occurrences per million ‘normalised’ amongst words? Expect to see more in the tabloids than the broadsheets.
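Normalising to occurrences per million words is simple arithmetic; a minimal sketch follows. The function name `per_million` and the figures are hypothetical and are not taken from the study.

```python
def per_million(raw_count: int, corpus_size: int) -> float:
    """Return how often an item occurs per million words of corpus."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * 1_000_000 / corpus_size

# Hypothetical figures, for illustration only.
tabloid_rate = per_million(raw_count=300, corpus_size=2_000_000)
broadsheet_rate = per_million(raw_count=400, corpus_size=8_000_000)
print(tabloid_rate, broadsheet_rate)
```

In this made-up case the broadsheets have more raw hits, but the tabloids have the higher rate once corpus size is taken into account, which is why normalised figures are needed for comparison.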


VIDEO 4: IN FOCUS. The expression ‘pose as’.

Who uses the term ‘Pose As’ in relation to RASIM? Tabloids use it 8 times more than broadsheets…

Beggars, crooks, etc. are identified as ‘posing as RASIM’ = taken ‘as fact’, and therefore positive stance towards ‘tougher measures’ – this is particularly in the tabloids. It’s there in the broadsheets too, but the opposite view is presented (if with fewer words).

Identifying problems in the asylum system by police/reporters ‘posing as’ RASIM.

The tabloids focus particularly upon asylum seekers ‘posing’ as nurses, etc…

Criminals may pose as RASIM to harm RASIM – also in tabloids, but very low numbers…

VIDEO 5: Summing Up

Focus upon words ‘suffocated’ and ‘drowned’ – focus upon whether they were represented as ‘illegal’ – directly (illegal immigrants) or indirectly (sneaking)?


Dictionary may have a range of different meanings, but the press gives a range of terms that ‘mean similar’ … used in a particular way continuously.

Remember that there are distinctions within newspapers, rather than labelling ‘tabloids’. Question how helpful your distinctions are.

Move between largescale analysis, and closer/more-detailed readings of the text.

#CORPUSMOOC : Week 2 Notes from @drbexl

The second week of the MOOC ‘Corpus Linguistics’ via Lancaster University:


I want you to think of two words – ‘diamond’ and ‘cause’.

Without consulting anybody else, or looking at any reference resources, write two short definitions for these words. Take no more than two minutes to complete this task.

A diamond is a compressed mineral whose rarity ensures that it has high value. It has gained meaning in recent centuries as a valuable gift, especially to signify love, and is commonly used in engagement rings. As an anniversary it signifies a long marriage.

The word cause may refer to ‘a cause’ that one supports, including charitable causes, or ‘to cause’, as in cause something to happen.

Recap and Introduction to Collocation

  • How can we manipulate and exploit that frequency data in order to gain insights?
  • Collocation is one way to do this – systematic co-occurrence of words in use, and may influence each other’s meanings, e.g. back/front, telephone/operator = the result of hunches.
  • Hunches can be right, but not always, as things may be more/less important than we think they are.
  • See, e.g. diamond, and we’ll be reminded of a range of meanings.


  • How close do these have to be to collocate? +/- 5 words seems to work, with a minimum of 10 occurrences, and be aware of sentence boundaries.
  • Know these words ‘by the words that they keep’, and can be before/after.
  • Frequency can’t be the only measurement – seek mutual information value, and identify if words rarely occur with other words.
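The points above (a window of +/- 5 words, a minimum of 10 co-occurrences, and a mutual information score) can be sketched as below. This is one common textbook formulation of MI, log2(observed / expected), and is assumed rather than taken from the course; the function name `collocates` and the toy data are hypothetical. Sentence boundaries are ignored in this simple version.

```python
import math
from collections import Counter

def collocates(tokens, node, span=5, min_count=10):
    """Score words co-occurring with `node` within +/- `span` tokens.

    Returns {word: (co_occurrence_count, mutual_information)}.
    """
    n = len(tokens)
    freq = Counter(tokens)
    co = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            co.update(window)
    scores = {}
    for word, observed in co.items():
        if observed < min_count:
            continue  # too rare to trust
        # Expected co-occurrences if words were scattered at random.
        expected = freq[node] * freq[word] * 2 * span / n
        scores[word] = (observed, math.log2(observed / expected))
    return scores

# Toy data, for illustration only.
toy = ["diamond", "ring"] * 12 + ["filler"] * 76
print(collocates(toy, "diamond", span=1))
```

A positive MI score means the pair turns up together more often than chance would predict; the minimum count stops very rare pairs from getting inflated scores.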

Collocation, colligation and related features

  • What about grammatical words? Words do have strong affinities for certain prepositions or occasional articles. Colligation – affinity with a grammatical class (rather than meaning).
  • “For now, word form refers to any word that you may find in a corpus. So ‘fighting’ and ‘fought’ are both word forms. On the other hand, a lemma is what we might call the base form of a word – so the lemma ‘fight’ gives rise to multiple word forms, including ‘fighting’ and ‘fought’.”
  • Semantic preferences – e.g. diamond (often part of a class of gems), but ‘a glass of’ includes drinkable liquids.
  • Discourse Prosody – expresses speaker attitude = important for ‘discourse analysis’.
    • ‘Cause’ often associated with trouble, pain, suffering – subconsciously the word has negative discourse prosody.
    • “The way that words in a corpus can collocate with a related set of words or phrases, often revealing (hidden) attitudes.”


  • Are there words that appear more frequently in Corpus A than they are in Corpus B? Can use statistical significance tests.
  • What words are ‘unusually frequent’ in this particular dataset? [I’m thinking here if we did research into words used by those of different religions on Twitter – what words would appear ‘unusually frequently’ in each religion?]
  • Analysts often cut off the top 50-100 keywords to create manageable data, and there must be 20+ keywords, and those distributed across the range of texts (and not bunched in one text/paragraph)
  • Typical keywords: Proper nouns (names), Style/genre markers (grammatical words), spelling idiosyncrasies (British/American English) – for discourse analysis = “the aboutness” of the text – the gist of the text.
    • Once identifying salient words – identifies interesting factors and explain ‘meaning’ and why those words are there.
    • Discover words (especially once run through computing power) that our conscious cognitive abilities would not identify as salient.
    • Can the experiment be replicated – follow the same process, and it should come out the same ‘objectively’.
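A keyword comparison of this kind can be sketched with the log-likelihood statistic, which is one widely used significance test for comparing corpora. The notes above only say "statistical significance tests", so the choice of test is an assumption here. The function names and toy corpora are hypothetical, and 3.84 is the conventional cut-off for p < 0.05 with one degree of freedom.

```python
import math
from collections import Counter

def log_likelihood(a, b, size_a, size_b):
    """Log-likelihood for a word occurring a times in A and b times in B."""
    total = size_a + size_b
    expected_a = size_a * (a + b) / total
    expected_b = size_b * (a + b) / total
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / expected_a)
    if b > 0:
        ll += b * math.log(b / expected_b)
    return 2 * ll

def keywords(tokens_a, tokens_b, threshold=3.84):
    """Words unusually frequent in corpus A relative to corpus B."""
    if not tokens_a or not tokens_b:
        raise ValueError("both corpora must be non-empty")
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    size_a, size_b = len(tokens_a), len(tokens_b)
    found = []
    for word, a in freq_a.items():
        b = freq_b[word]
        if a / size_a > b / size_b:  # relatively more frequent in A
            score = log_likelihood(a, b, size_a, size_b)
            if score >= threshold:
                found.append((word, score))
    return sorted(found, key=lambda pair: pair[1], reverse=True)

# Toy corpora, for illustration only.
corpus_a = ["refugee"] * 20 + ["the"] * 80
corpus_b = ["the"] * 100
print(keywords(corpus_a, corpus_b))
```

Because the procedure is mechanical, anyone running the same corpora through the same steps should get the same keyword list, which is the replicability point made above.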

Change over time and lock words

  • Which words have become steadily less/more frequent – or stayed the same (locked in place) – and what this tells us about cultural values.
  • The Brown Corpus – what were the key shifts happening in language 1931-2006 (4 sample points). E.g. Mrs down, health up and money largely ‘locked’.
  • What have declined?
    • A more informal society as less use of Mr, Miss, etc.
    • A modal verb – less comfortable with ‘imposing’ on people, so this is declining also.
    • Longer forms are contracting – as people seek to squeeze as much as possible into as short a space as possible [e.g. Twitter!]
  • What are lock-words?
    • Weaker modality
    • Wh – question words
    • Body parts
    • Other nouns, including money (we’re still obsessed)
  • Increased use
    • Contracted forms, such as it’s
    • Numbers as 34, rather than thirty-four
    • Social terms
  • Why has the word ‘children’ increased over time?
    • 1990s – fear of danger to children, promoting/supporting children and families… children are being problematized… [That fits with Raising Children in a Digital Age]
    • 2006 corpus – lots of moral panics…


  • Dominant discourse arising in Britain relating to children.


  • Corpora give us insights into the mechanics of language, and of the society within which that language is being used.
  • They can answer some questions really well, but others not so much – be mindful!
  • Corpora should be linked with other methods for study of language, society, history, etc… which expand the range of studies/findings?
  • Mesh qualitative/quantitative data…
  • Toolbox – use the right tool, in the right combination…

Should McCann Twitter abuser have been doorstepped on TV? for @ConversationUK


A recent piece, published for The Conversation UK, under Creative Commons licence (republished on Durham University):

Should McCann Twitter abuser have been doorstepped on TV?

By Bex Lewis, Durham University

Brenda Leyland, a 63-year old woman from Leicestershire who had been accused of publishing a stream of internet abuse about the family of missing child Madeleine McCann, has been found dead in a hotel room.

Her death raises important questions about the wrongs and rights of how we handle people who express unpalatable views online.

Leyland had been exposed in a Sky News report as the person behind the Twitter account @sweepyface, which had been used to post offensive messages about the McCanns. These included the accusation that Madeleine’s parents were responsible for her disappearance. When confronted by a Sky News reporter about whether she should have posted such messages, Leyland said: “I’m entitled to do that.”

Days before Leyland’s death, BBC Radio 4 ran a story about how the police were investigating abusive social media messages sent to, or published about, the McCanns. Madeleine’s father Gerry McCann featured, suggesting that these messages are fuelled by press reporting. He added that he thinks more people should be charged for internet abuse and revealed that his family tends to avoid the internet because of the nature of threats and insults they receive.

For obvious reasons, the McCanns had encouraged a high-profile press campaign after Madeleine’s disappearance. But without answers about what happened to Madeleine, conspiracy theories have abounded. Brenda Leyland was one of many to discuss the McCann case online. As Rev Pam Smith, one of my Facebook connections said, are we really saying that people are not “entitled” to share adverse views online?

Leyland said she “hoped she hadn’t broken any laws”, but the Malicious Communications Act 1988, which covers Twitter, notes that it is an offence to send messages to another person which are “indecent or grossly offensive”, threatening or false. If the message is intended to cause distress or anxiety to the recipient, they breach the law.

We have to consider whether Sky has a case to answer in this particular situation too though. The broadcaster’s correspondent approached Mrs Leyland on her own doorstep in a live broadcast. She evidently had no idea that she was going to be confronted or that the footage would be broadcast to the world.

Whether or not we like what Leyland had been doing, she was clearly just one of several people who had been expressing their opinions online. She was certainly not the worst. Is doorstepping people, outing them on TV, and ensuring that their face circulates the internet, really the answer? Had Sky done any research into this woman before they put her face in the public domain? Did they know anything about her mental state? Did she just have the misfortune to be the first person who could be made an example of?

Her case carried echoes of the recent media treatment of Cliff Richard. The BBC was heavily rapped for broadcasting live from his home as police raided it. The police of course need to investigate such stories but it is a worrying sign of our culture that trial by media and even trial by gossip appear to have become acceptable.

Media ethics are typically concerned with truthfulness, accuracy, objectivity, impartiality, fairness, public accountability and limitation of harm. After the Leveson inquiry, there has been increased emphasis on press responsibility. But in a time of rapid media change and fast-moving news, broadcasters must ensure they too meet their ethical responsibilities.

Bex Lewis does not work for, consult to, own shares in or receive funding from any company or organisation that would benefit from this article, and has no relevant affiliations.

This article was originally published on The Conversation.
Read the original article.

#CORPUSMOOC : Week 1 Notes from @drbexl

Here are my notes from week 1 of Lancaster University’s MOOC ‘Corpus Linguistics’ (Haven’t got time to do the practical exercise, but this is twigging some thinking re my PhD thesis database!):


What is a corpus?

  • A collection of words?
  • It’s a methodology but not a theory of language.

Why might I use corpus linguistics?

  • Look at language ‘as it is’
  • Large amounts of data which are difficult to ID with intuition/anecdotes
  • Large amounts of data show us things we’re doing that we don’t even realise.
  • ID rare/exceptional cases not identifiable in other ways
  • Human beings are slower/less accurate than computers for purposes of this kind of research.

What is your research question/hypothesis?

  • Is the corpus ‘off the shelf’ useful to your question?
  • If you’re developing a corpus – how will you need to define it?
  • 30,000 → billions of words.
  • Needs to be representative of the corpus – e.g.
  • Must be machine-readable (not just a photo of the text) – so that the computer can identify the words
  • It may act as a standard reference for what is typical in language.
  • May be annotated with extra linguistic codes (e.g. grammar)

What is annotation and markup?

Computers do not have the cultural knowledge that we have, so we have to mark-up the text so it can read the nuances, etc.

  • Delimit particular sections as e.g. a ‘heading’, a ‘sentence’, etc. allowing computer to analyse just those areas, etc.
  • Understand how this is done, as the computer can automatically do this, etc. then allows sophisticated searches through the data.

Types of Corpora

Come in different flavours, so different things can be assessed – e.g. date, time, genre, etc. Specific = outline the areas, but are also general ‘corpora’ – especially language (note difference between spoken/written).

  • Think about the shape of spoken language – especially the differences between e.g. the different people you talk to.
  • Parallel, new language, historic material, on-going corpus…

Frequency data, concordances and collocation

A search, how often does it appear, but also how frequently per million words, and what kind of documents/context does it appear within.

  • Think you see a pattern emerging, can ‘sort’ so can start to see patterns emerging [on the basis of which attain themes to identify].
  • Needs a cycle of extraction of data, and analysis, and close reading of relevant parts of the text.
  • Collocation – co-occurrence – from which meaning (and possibly grammar) appears – words are not randomly put together – words ‘shade one another’s meanings’ and ‘co-construct meaning’ – seek patterns in language.

Corpora and Language Teaching

This is less relevant to me, but interesting that need to identify the right words that help people understand which words are used frequently, so which to come first within a textbook – could be helpful within digital literacy training.

What can’t we do with corpus?

  • Just because it doesn’t exist in the corpus doesn’t mean it can’t be used – may be rare.
  • As with any scientific method, we are making deductions, not facts.
  • No visual information (pictures or body language) – traditionally people have set aside the visual and focused on the written language, but tools are being developed. * See database methodology – visual material collated in the 1990s for PhD research.

Speaking Up Required in Higher Education…


A really interesting piece about the place of ‘introverts’ in higher education where action learning means not speaking up is no longer an option:

However, university students are no longer allowed to be shy. “Active learning” has become a modern mantra. Students must ask questions, express opinions, lead oral presentations and participate enthusiastically in community projects. To collaborate is sacrosanct. Passivity, on the other hand, is considered the enemy of learning. They must be vocal, expressive and assertive. The extrovert ideal, as Cain calls it, is all the rage.

Read full story, and see my PGCLTHE work re: group work… and this letter response.

Summer Fruitfulness in Academia?

Great piece on the joys of academic pressure:

It’s not true, of course, that September is entirely grim. Being back at work means colliding with cheery colleagues in the corridor, each of us ruefully clutching course-packs and exchanging sympathetic smiles; it also means spotting suntanned students with newly stylish “second-year haircuts” sauntering around all the familiar spaces. I’ve missed them all. But there is a certain pang that strikes you like the memory of a lost youth – or, more precisely, the memory of an article you were meant to finish, a book proposal you intended to start, a research proposal you thought you would concoct and never did as July sank into August and then, whoops, suddenly September started impudently banging on your door.

Read full article.

Value in a PhD?

An interesting piece on the ‘value’ and employability of a PhD:

Who would do a PhD? Who would willingly submit to spending endless hours, over three or four years, in the laboratory or library, racked by self-doubt and money worries, in preparation for a career for which vacancies were never more oversubscribed? …

But do doctoral students really feel prepared for life beyond the ivory tower? And how ready are they to embrace it? Here, we speak to five current and former doctoral students from a range of disciplines and universities about why they did their PhDs, what their experience was like and where they see their futures now.

Read full piece, with a couple of more positive responses (28/8, 11/9).


Group Work: Not Pulling Your Weight?

I’m really interested in group dynamics (see mini project undertaken as part of PGCLTHE), so an interesting case study here:

“Academically ambitious students contribute to moving the average up – unenthusiastic students and shirkers do just the opposite,” he added.

Indeed, some hard-working students may even defect to “the dark side” of the slackers if they saw their classmates getting away with minimal effort, he continued. To combat this problem, students were offered the chance to gain a higher mark for their group assignment if they managed to raise the grade scored by the weakest student in individual tests on the same subject.

Read full piece.